Background jobs evolution: Rake, custom daemons, Resque

Background Rake tasks

Whenever we start a new project, there is always a need for some background tasks: updating online/offline user counts, making remote API calls, fetching a fresh version of the GeoIP database, and so on.
The easiest and most straightforward solution was to create the appropriate Rake tasks and schedule them with Cron. This is easy to implement and easy to test (just run the Rake task), and we used the excellent Whenever gem to define those Cron jobs in clean Ruby syntax.
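
A minimal sketch of that setup, assuming a Rails app with the Whenever gem installed. The task names, the `OnlineStats` model, and the schedule are hypothetical. The Rake task depends on `:environment`, which is the part that boots Rails, and `config/schedule.rb` is the Whenever file that gets translated into crontab entries:

```ruby
# lib/tasks/stats.rake (hypothetical task and model names)
namespace :stats do
  desc "Refresh online/offline user counts"
  # Depending on :environment is what makes Rake load the whole Rails app.
  task update_online_counts: :environment do
    OnlineStats.refresh! # hypothetical application code
  end
end

# config/schedule.rb, read by the Whenever gem and turned into crontab lines
every 1.hour do
  rake "stats:update_online_counts"
end

every 1.day, at: "4:30 am" do
  rake "geoip:update" # hypothetical task that fetches a fresh GeoIP database
end
```

Running `whenever` on its own prints the generated crontab for review, and `whenever --update-crontab` writes it into the current user's crontab.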

As our projects progressed, the number of background tasks grew. Most of our Rake tasks had one big disadvantage: they depended on Rails itself, so every launch initialized a new Rails instance. That was acceptable while we had just 2-3 jobs per hour, even though each Rake job spent, say, 5-7 seconds and 100 MB of RAM initializing Rails and then only a few seconds doing the actual work.
Once we had about 3-4 different tasks running, each taking 5-10 seconds per job, Rails initialization time and memory consumption became a problem, so we had to find a thriftier solution.
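
To see why this stopped scaling, here is a back-of-the-envelope calculation. The figures are assumptions picked from the ranges above (6 s to boot Rails, 2 s of real work, 100 MB per Rails instance), not measurements:

```ruby
# Back-of-the-envelope cost of Rake jobs that boot Rails on every run.
# ASSUMPTION: illustrative values picked from the ranges quoted in the text.
BOOT_SECONDS   = 6.0 # Rails initialization per run (5-7 s)
WORK_SECONDS   = 2.0 # actual job work per run ("a few seconds")
BOOT_MEMORY_MB = 100 # RAM used by each booted Rails instance

# Share of each run spent booting Rails instead of doing the job.
def boot_overhead_fraction(boot_s, work_s)
  boot_s / (boot_s + work_s)
end

# RAM held by Rails instances when several jobs run at the same time.
def overlapping_memory_mb(concurrent_jobs, per_instance_mb)
  concurrent_jobs * per_instance_mb
end

share = boot_overhead_fraction(BOOT_SECONDS, WORK_SECONDS)
puts format("%.0f%% of each run is Rails boot", share * 100)
puts "#{overlapping_memory_mb(4, BOOT_MEMORY_MB)} MB if 4 jobs overlap"
```

With these numbers, three quarters of every run goes to booting Rails, and four overlapping jobs hold about 400 MB just for their Rails instances.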

Custom Ruby daemons

Custom Ruby daemons (as shown in a Railscast by Ryan Bates) seemed a good alternative to the Rake and Cron combination. We simply moved the scheduling from Cron into daemon classes and moved the task calls there from the Rake tasks. We ended up with a few daemons, each handling a group of background tasks (it’s possible to put all the tasks into a single daemon, but that complicates the daemon logic dramatically). We also found it useful to have Cron restart the daemons a few times per day to fight Ruby memory bloat.
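
The core of such a daemon is a loop that knows each task’s interval and runs whatever is due. Here is a minimal pure-Ruby sketch of that loop; the `IntervalScheduler` class and the task names are hypothetical, and daemonizing the process (which we did with the Daemons gem) is left out:

```ruby
# Minimal sketch of a scheduling daemon's main loop (hypothetical class).
class IntervalScheduler
  Task = Struct.new(:name, :interval, :action, :next_run_at)

  # The clock is injectable so the loop can be tested without sleeping.
  def initialize(clock: -> { Time.now })
    @clock = clock
    @tasks = []
  end

  # Register a task to run every `interval` seconds, starting immediately.
  def every(interval, name, &action)
    @tasks << Task.new(name, interval, action, @clock.call)
    self
  end

  # Run every task that is due, one after another, and return their names.
  def tick
    now = @clock.call
    due = @tasks.select { |task| task.next_run_at <= now }
    due.each do |task|
      task.action.call
      task.next_run_at = now + task.interval
    end
    due.map(&:name)
  end

  # The daemon's main loop: run due tasks, then sleep briefly.
  def run_forever(poll_seconds = 1)
    loop do
      tick
      sleep poll_seconds
    end
  end
end

# Usage (hypothetical task bodies):
#   scheduler = IntervalScheduler.new
#   scheduler.every(60, :online_counts) { OnlineStats.refresh! }
#   scheduler.run_forever
```

Every task runs inside this one process, so a slow or stuck task holds up all the others in the same daemon.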
This solution was quite resource-friendly, but it brought a lot of drawbacks:

The Daemons gem (or its launcher) was quite buggy. For example, it didn’t restart daemons properly: after each restart, the previous daemon processes were not unloaded from memory.
Debugging these daemons was hard as hell!
Sometimes daemons died or froze unexpectedly, so instead of blanket Cron restarts we had to use a monitoring tool (God) to check daemon health and restart them when needed.
It was impossible to check job progress and statistics without implementing additional logic in the daemon and/or the web app.
The solution was not scalable out of the box: to distribute daemons across different server nodes, we would have had to implement some kind of job queue logic.
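
For reference, a God watch for one of these daemons might look like the following, modeled on the introductory example in God’s documentation. The watch name, script path, and limits are hypothetical; `keepalive` restarts the process if it dies or exceeds the given memory or CPU limits:

```ruby
# stats_daemon.god: hypothetical watch for one of the scheduling daemons.
God.watch do |w|
  w.name  = "stats-daemon"                              # hypothetical name
  w.start = "ruby /path/to/app/daemons/stats_daemon.rb" # hypothetical path
  # Restart if the process dies, or if it grows past these (illustrative) limits.
  w.keepalive(:memory_max => 150.megabytes, :cpu_max => 50.percent)
end
```

The file is loaded with `god -c path/to/stats_daemon.god`.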

But still, we were manly men: we struggled, we fought the bugs and found workarounds, and we were even ready to build statistics support and a job queue into our daemons.

Soon, however, our project required a different type of asynchronous task: unscheduled, nonrecurring tasks that have to be handled as soon as possible. Usually these come from slow controller actions whose work can be offloaded from the request cycle to speed up request handling, for example actions that make remote API calls to render part of a page.
That was a turning point: we decided to stop just being manly men and start being a little smarter. We could either keep improving our daemons or pick a solution that didn’t require reinventing the wheel, such as an existing background job library like Workling or Delayed Job.

Resque rescued us from pain

Luckily, at about that time GitHub released Resque, a tool inspired by Delayed Job (but more advanced) that uses the Redis database for its job queue. Resque suited our needs very well. We picked it instead of our custom daemons and were finally happy!

So, why do we love Resque so much?

It just works!
It uses Redis as the job queue storage, and Redis is blazing fast.
It supports multiple job queues and queue priorities.
Horizontal scaling is a piece of cake: you can just spread the workers over your server nodes.
Resque is lightweight, and the code is extremely well documented, so there is never any mystery about “why does it work this way and not that way?”: you can read through the entire Resque source in less than an hour.
It has a nice built-in Rack-compatible application for tracking job and worker statistics. In addition, you can plug in HopToad (following the instructions in the Resque source code) to track worker errors.
As previously mentioned, Resque is lightweight and extremely well documented, so it is easy to extend! Just fork it on GitHub and do whatever you want.
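
Priorities come from the order in which a worker is told to check its queues: a worker started with `QUEUE=critical,high,low rake resque:work` always looks at `critical` first. The toy model below, with plain Ruby arrays standing in for Redis lists and a hypothetical `PriorityPoller` class, shows that polling strategy:

```ruby
# Toy model of Resque-style queue priorities (hypothetical class).
# Plain arrays stand in for the Redis lists that real Resque uses.
class PriorityPoller
  def initialize(queues, queue_order)
    @queues = queues           # e.g. { "critical" => [...], "low" => [...] }
    @queue_order = queue_order # e.g. %w[critical high low]
  end

  # Take the first job from the first non-empty queue, in priority order.
  # Returns [queue_name, job], or nil when every queue is empty.
  def reserve
    @queue_order.each do |name|
      jobs = @queues.fetch(name, [])
      return [name, jobs.shift] unless jobs.empty?
    end
    nil
  end
end
```

One consequence of strict ordering is that a constantly busy high-priority queue can starve the lower ones; starting extra workers that listen only on the low-priority queues, possibly on other nodes, is one way around it.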

OK, I see. So, what’s bad about Resque?

It uses Redis for job queue storage. There is currently no clean way to run Redis as a clustered database, only master-slave replication, so Redis may become a single point of failure unless you add failover logic to your application.
Workers are pure Ruby, so you still need to monitor them with God, Monit, or another monitoring tool.
As a pure Ruby tool, Resque is not well suited to multi-technology enterprise environments where you would rather define tasks as huge XML documents. Still, you can achieve this weird stuff with a few small workarounds. Remember? Resque is easy to extend!
It has no mechanism for scheduling tasks to run at specific times or intervals. Again, you can fix this with Cron and a few lines of code in your app; I’ll show you how below.

How do we use it?

We have two kinds of task setups. The first is for nonrecurring tasks: the workers load Rails, and tasks are enqueued directly from the Rails app.
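
In the Rails app, queuing such a task is a one-liner like `Resque.enqueue(FetchRemoteWidget, page.id)` in the controller, where `FetchRemoteWidget` is a hypothetical job class that names its queue in `@queue` and does the work in `self.perform`. To show what happens in between without requiring Redis, the sketch below pairs such a job with a hypothetical in-memory stand-in for the queue; real Resque stores a similar JSON payload (class name plus arguments) in Redis:

```ruby
require "json"

# A job in Resque's style: @queue names the queue, self.perform does the work.
# The job itself is hypothetical.
class FetchRemoteWidget
  @queue = :remote_api

  def self.perform(page_id)
    # In the real app this would call the remote API and cache the fragment.
    "widget for page #{page_id}"
  end
end

# Hypothetical in-memory stand-in for Resque.enqueue plus a worker.
class InMemoryQueue
  def initialize
    @queues = Hash.new { |hash, name| hash[name] = [] }
  end

  # Store the job class name and JSON-encoded arguments on the job's queue.
  def enqueue(klass, *args)
    queue = klass.instance_variable_get(:@queue)
    @queues[queue] << JSON.generate({ "class" => klass.name, "args" => args })
  end

  # Pop one payload, look the class up by name, and call perform.
  def work_one(queue)
    payload = @queues[queue].shift
    return nil if payload.nil?

    job = JSON.parse(payload)
    Object.const_get(job["class"]).perform(*job["args"])
  end
end
```

Because arguments travel as JSON, it is safer to pass plain values such as record IDs rather than ActiveRecord objects, and to load the record inside `perform`.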

The second kind is recurring tasks that need to be scheduled. We use a dedicated Rails controller that lets tasks be enqueued from outside the main application: a RESTful HTTP client, launched as a Cron job, calls it to enqueue tasks on Resque. Some people may consider this overengineering, since even a curl-based client may be enough.
We use the Whenever gem to manage the Cron schedules for these recurring tasks.
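
A minimal sketch of the core logic of such an endpoint, assuming the Cron-launched client sends just a job name. `ScheduledJobEndpoint`, the job names, and the status codes are hypothetical; in the Rails app the same logic would live in a controller action calling `Resque.enqueue`, while here the enqueue step is injected so the sketch runs on its own:

```ruby
# Hypothetical core of the "enqueue from outside" controller action.
class ScheduledJobEndpoint
  # Maps a public job name to a (hypothetical) job class name.
  ALLOWED_JOBS = {
    "update_online_counts" => "UpdateOnlineCounts",
    "refresh_geoip"        => "RefreshGeoip"
  }.freeze

  # The block does the actual enqueueing (Resque.enqueue in the real app).
  def initialize(&enqueue)
    @enqueue = enqueue
  end

  # Returns an HTTP-style status code for the request.
  def call(job_name)
    job_class = ALLOWED_JOBS[job_name]
    return 404 unless job_class # unknown job names are rejected

    @enqueue.call(job_class)
    202 # Accepted: the job is queued, not yet run
  end
end
```

The allowlist matters because the endpoint is reachable over HTTP: without it, any caller could enqueue any class. The Cron side can then be a Whenever `command` job that runs curl against the endpoint.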

Resque and PostgreSQL

There is an issue if you use a PostgreSQL database and access it via ActiveRecord from Resque workers. It looks like Resque forks its workers and they try to reuse the parent’s PostgreSQL connection, while PostgreSQL does not allow the same connection to be used from different processes. Of course, you can modify the Resque source code or tweak its workers, but we found a much easier workaround: PgPool-II for PostgreSQL connection pooling.
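
If you would rather tweak the workers than add PgPool-II, the general idea is to make every forked process open its own connection instead of inheriting the parent’s. The pure-Ruby sketch below shows that pattern with a hypothetical `FakeConnection` in place of a real database connection. In Resque 1.x the `Resque.after_fork` hook is a natural place for the real equivalent, commonly `ActiveRecord::Base.establish_connection`; this is an alternative, not the workaround we used:

```ruby
# Hypothetical stand-in for a database connection that remembers which
# process opened it. Real connections must not be shared across forks.
class FakeConnection
  attr_reader :owner_pid

  def initialize
    @owner_pid = Process.pid
  end
end

# Caches a connection, but opens a new one whenever the current process is
# not the one that opened the cached connection (i.e. after a fork).
class ForkAwareConnection
  def initialize(&connect)
    @connect = connect
    @connection = nil
  end

  def connection
    if @connection.nil? || @connection.owner_pid != Process.pid
      @connection = @connect.call
    end
    @connection
  end
end
```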

What are the alternatives?

The guys at Ratepoint have much higher loads and need more flexibility, so they implemented their own AMQP-consuming daemons based on the Daemon-kit gem and Nanite, with RabbitMQ as the job queue. And I know that they are quite happy with this custom solution.