When we started using Resque two years ago we were impressed by two things: its power out-of-the-box and its opportunities for scalability. Over the past two years, we’ve explored Resque internals and plugins. We’d like to share what we’ve learned from our practical experience using Resque during different phases of the web application life cycle.
Working in development and running tests
The first challenge we encountered was deciding how to organize development for non-Ruby developers and for test environments. Our aims were to spare most of the team from having to know about Resque and workers, and to avoid stubs in tests. This idea resulted in the inline mode for Resque, which solved the problem perfectly.
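In Resque itself, inline mode is switched on with `Resque.inline = true`, typically in the development and test environment configuration. The sketch below does not use Resque at all. It is a tiny stand-in (the names `TinyQueue` and `WelcomeEmailJob` are hypothetical) that only illustrates the idea: in inline mode an enqueue call performs the job immediately in the calling process, so no worker has to be running.

```ruby
# Hypothetical stand-in for a job queue; this is not Resque's implementation.
module TinyQueue
  class << self
    attr_accessor :inline

    def pending
      @pending ||= []
    end

    # In inline mode the job runs immediately in the calling process;
    # otherwise it is only recorded for a worker to pick up later.
    def enqueue(job_class, *args)
      if inline
        job_class.perform(*args)
      else
        pending << [job_class, args]
      end
      true
    end
  end
end

# A job in the Resque style: a class with a class-level `perform`.
class WelcomeEmailJob
  class << self
    def delivered
      @delivered ||= []
    end

    def perform(user_id)
      delivered << user_id
    end
  end
end

TinyQueue.inline = false
TinyQueue.enqueue(WelcomeEmailJob, 1) # queued, nothing runs yet

TinyQueue.inline = true
TinyQueue.enqueue(WelcomeEmailJob, 2) # runs right away, no worker needed
```

Because the job body executes synchronously, tests can assert on its side effects directly instead of stubbing the queue.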
First step in production: ActiveRecord and Resque
Since there are many articles covering how to deploy Resque, we’ll focus on issues that haven’t been thoroughly described before.
After two weeks in production it was clear to us that there was an issue with Resque and ActiveRecord. In some cases you may enqueue a Resque job while inside a database transaction, but Redis commands are independent of database transactions. As a result, a worker sometimes starts processing a job before the transaction that enqueued it has committed, so the records the job needs are not yet visible. After a few ugly solutions that forced us to restructure the code, we found what we needed in the ar_after_transaction gem. The Resque FAQ details how to make a Resque job wait for an ActiveRecord transaction commit, so that it can see all the changes made by that transaction. Of course, if you make sure database transactions are committed before enqueuing jobs, you can structure your application in any manner you desire.
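To make the race easier to picture, here is a minimal sketch that uses neither ActiveRecord nor Redis. `FakeDatabase` and its `after_transaction` method are hypothetical stand-ins written for this illustration. They only mimic the general idea of deferring work until the surrounding transaction commits, and dropping it on rollback.

```ruby
# Hypothetical in-memory "database" with deferred after-commit callbacks.
class FakeDatabase
  attr_reader :rows

  def initialize
    @rows = []      # committed rows, the only ones a worker could see
    @staged = nil   # rows written inside an open transaction
    @callbacks = []
  end

  def transaction
    @staged = []
    @callbacks = []
    yield
    @rows.concat(@staged) # commit
    callbacks = @callbacks
    @staged = nil
    @callbacks = []
    callbacks.each(&:call) # deferred work runs only after the commit
  rescue StandardError
    @staged = nil # rollback: drop staged rows and deferred work
    @callbacks = []
    raise
  end

  def insert(row)
    if @staged
      @staged << row
    else
      @rows << row
    end
  end

  # Runs the block after commit, or immediately if no transaction is open.
  def after_transaction(&block)
    if @staged
      @callbacks << block
    else
      block.call
    end
  end
end

db = FakeDatabase.new
seen_by_worker = []

db.transaction do
  db.insert({ id: 1 })
  # Enqueueing directly here could let a worker run before the commit.
  # Deferring the enqueue means the "worker" sees the committed row.
  db.after_transaction { seen_by_worker << db.rows.map { |row| row[:id] } }
end
```

In a real application the deferred block would contain the `Resque.enqueue` call.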
Second step in production: External HTTP APIs with Resque
External HTTP calls are a common bottleneck for web requests. Because the response time and availability of these APIs are unpredictable, the calls need to be moved to the background. You may find the resque-retry plugin (with the resque-scheduler plugin as a dependency) useful: it lets you retry jobs that fail with specific exceptions after a customizable delay.
Here are some common HTTP errors in the “just try again to fix” category:
@retry_exceptions = [
  # errors from your favorite
  # Net::HTTP wrapping library goes here
]
N.B. Errno codes are platform-specific; make sure you understand how portable your code needs to be.
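As a rough illustration of the "just try again" idea, the following plain-Ruby sketch retries a block only for a whitelist of exception classes. It is not how resque-retry works internally (that plugin re-enqueues the job through resque-scheduler instead of sleeping in place). The helper name `with_retries` and the exceptions chosen for `RETRYABLE_ERRORS` are assumptions made for this example.

```ruby
require "timeout"

# Illustrative list only; which errors are safe to retry is an assumption.
RETRYABLE_ERRORS = [Errno::ECONNREFUSED, Errno::ETIMEDOUT, Timeout::Error].freeze

# Calls the block, retrying up to `limit` extra times on retryable errors.
# `delay` seconds pass between attempts; other errors propagate at once.
def with_retries(limit: 3, delay: 0, retry_on: RETRYABLE_ERRORS)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue *retry_on
    raise if attempts > limit
    sleep(delay) if delay > 0
    retry
  end
end

calls = 0
result = with_retries(limit: 3) do |attempt|
  calls += 1
  raise Errno::ECONNREFUSED if attempt < 3 # fail twice, then succeed
  "ok after #{attempt} attempts"
end
```

Keeping the list explicit means programming errors such as `ArgumentError` still fail fast instead of being retried pointlessly.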
Third step in production: Email sending
If you are using an external SMTP server to send email, you will need to move email delivery to the background, with Resque's help of course. There are a number of solutions available, such as ar_mailer. We decided to use resque_mailer. The first problem we encountered was Timeout::Error exceptions that appeared randomly while sending email. We found resque-retry useful here as well. With resque_mailer we wanted every Mailer class to share one resque-retry configuration. This was not easy, because Resque has historically been configured through class-level instance variables, which subclasses do not inherit. We needed a base class that could share all of these instance variables with any child class:
class AsyncApplicationMailer < ActionMailer::Base
  # All notifiers inherit from this class and require the same
  # resque-retry options.
  # Resque jobs are classes, not instances of classes. That is why
  # resque-retry reads its options from class-level instance variables,
  # which subclasses do not inherit.
  # To set the same resque-retry variables on every inherited class
  # we need this hack: an `inherited` hook is one way to do it.
  def self.inherited(subclass)
    super
    subclass.class_eval do
      @retry_exceptions = [Net::SMTPServerBusy, Timeout::Error, Resque::DirtyExit]
      @retry_limit = 3
      @retry_delay = 60 # seconds
    end
  end
end
Use this class as the base class for all your mailers and retry configuration will be shared among them.
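The underlying Ruby behavior can be checked without Resque or ActionMailer. In the sketch below all class names are hypothetical. It first shows that a class-level instance variable set on a base class is not visible from a subclass. It then shows one way around that: an `inherited` hook that sets the same variables on each new subclass.

```ruby
# A class-level instance variable belongs to one class object only.
class BaseJob
  @retry_limit = 3

  def self.retry_limit
    @retry_limit
  end
end

class ChildJob < BaseJob; end

# Base class that sets the shared options on every subclass as it is defined.
class SharedConfigJob
  def self.retry_limit
    @retry_limit
  end

  def self.inherited(subclass)
    super
    subclass.class_eval do
      @retry_limit = 3
      @retry_delay = 60 # seconds
    end
  end
end

class InvoiceJob < SharedConfigJob; end

BaseJob.retry_limit    # => 3
ChildJob.retry_limit   # => nil, the variable was not inherited
InvoiceJob.retry_limit # => 3, set by the inherited hook
```

Note that the hook only runs for subclasses, so in this sketch `SharedConfigJob` itself carries no retry options.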
Play Minesweeper: Bug fixing along the way
As the number of users and the load grew, we decided it was a good idea to include other plugins like resque-loner (to track job uniqueness) and resque-cleaner (to clean up failed jobs). This required fixing and improving these libraries:
- Fix resque-scheduler process death after pushing an invalid Resque job class
- Fix infinite recursion in edge case usage of ar_after_transaction
- Make resque-mailer respect the
- Fix resque-retry suppression in resque failures for jobs with custom identifier
- Modularize resque-loner to be compatible with other plugins
Business requirements increase: Returning results from jobs
The original Resque design does not let you get anything back after a worker completes a job. This is fine for most use cases. However, for our use case (payment checkout through an external authorization gateway) it was important to know whether the job was still in progress and, if it had finished, whether it had succeeded.
Many Resque plugins introduce a job identifier based on arguments passed to this job, but there is no standardization regarding how it should be done. Here are two examples:
# @abstract You may override to implement a custom identifier,
# you should consider doing this if your job arguments
# are many/long or may not cleanly convert to strings.
# Builds an identifier using the job arguments. This identifier
# is used as part of the redis key.
# Payload is what Resque stored for this job along with the job's class name.
# On a Resque with no plugins installed, this is a hash containing :class and :args
In order to synchronize a job identifier across plugins, we implemented our own interface for jobs with completion status. It may be helpful if you need to know how a job ended. However, don't confuse this with execution status in resque-status.
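To give a feel for what a shared identifier plus completion status could look like, here is a minimal sketch. It is not the interface described above and not part of Resque or any plugin. The module `CompletionStatus`, its method names, and the status symbols are hypothetical, and a plain Hash stands in for Redis.

```ruby
require "digest/sha1"
require "json"

# Hypothetical sketch; a Hash stands in for Redis, and none of these
# names come from Resque or its plugins.
module CompletionStatus
  STORE = {}

  # One identifier per (job class, arguments) pair that plugins could share.
  def job_identifier(*args)
    Digest::SHA1.hexdigest(JSON.generate([name, args]))
  end

  def status(*args)
    STORE.fetch(job_identifier(*args), :unknown)
  end

  # Wraps the job body and records how it ended.
  def perform_with_status(*args)
    key = job_identifier(*args)
    STORE[key] = :in_progress
    begin
      perform(*args)
      STORE[key] = :succeeded
    rescue StandardError
      STORE[key] = :failed
      raise
    end
  end
end

class CheckoutJob
  extend CompletionStatus

  def self.perform(order_id)
    raise ArgumentError, "missing order" if order_id.nil?
  end
end

CheckoutJob.status(42)              # => :unknown before the job runs
CheckoutJob.perform_with_status(42)
CheckoutJob.status(42)              # => :succeeded
```

Hashing the class name together with the arguments keeps the key short even when the arguments are long, which is the concern raised in the first plugin comment quoted above.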
Contribution to open source
Last but not least, thank you to the maintainers who reviewed and accepted our patches:
- @bvandenbos/resque-scheduler: Merged. Thanks. Will ship in 1.9.8.
- @defunkt/resque: Love it. … This is a great patch – docs, tests, and code! Thanks.
- @zapnap/resque_mailer: Good point. I just pushed a change to the repository that should take care of this and released a new gem (1.0.1)
- @jayniz/resque-loner: Awesome, thanks! Will pull ASAP :)