
URL shortening. How we do it.

The growing demand to share content across different social networks has quickly created the need to shorten URLs, both to free up space within a post (Twitter is the obvious example) and because it simply looks better.

We didn’t want to reinvent the wheel – so we have been using popular external services for this purpose.
However, since we put a fairly heavy load on the service, we ran into some well-known problems: hourly request limits, external service timeouts, service downtime, and other nasty things.

The shortening service was used as part of a few business tools.
Shortening was done both in the background, where it had to be highly reliable, and in the foreground, where it had to be fast.
Also, once shortening was finished, the new, truncated URLs had to remain accessible to consumers.

From time to time, we ran into every one of these issues, ending up with slow, unreliable, or inaccessible shortened URLs.

So, we had to do something.
The solution had to meet the following requirements:

1) In order to maintain high accessibility, the new method would require the ability to do a full fallback, just in case the main service went down completely;

2) There would have to be visual consistency in the shortened URLs; and

3) It had to be easy to develop, cheap to host, and require zero support effort.

The popular shortener service was set up to use a custom domain, xxx.aa, so that we kept control over accessibility.
The xxx.aa DNS TTL was set to 5 minutes, so we could switch it over at any time.

A very simple backup shortener service was to be developed and hosted on the xxx.bb domain for visual similarity.
It wouldn’t contain any extra functionality: just shorten a long URL and follow a shortened one.

A shortener client was to be developed to automatically balance between the services whenever URL creation failed.
First it would try the external service, and if that didn’t work, it would switch to the backup service.
For a quick response time, it would perform just one shortening attempt when called from the foreground, but would perform multiple attempts, with incremental back-off, from the background, to route as much shortening as possible through the external service.

The shortener client would also contain a caching mechanism to avoid duplicate requests to the external service, conserving our rate limit.
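Roughly, that layer could look like the sketch below. It assumes a Redis-backed store; the class, the key layout and the reverse mapping used later by the backup service are illustrative, not the actual gem internals.

    require 'redis'
    require 'digest/md5'

    class ShortenerCache
      def initialize(redis = Redis.new)
        @redis = redis
      end

      # Return the cached short URL for long_url, or run the block,
      # store its result and return it.
      def fetch(long_url)
        key = "shortener:long:#{Digest::MD5.hexdigest(long_url)}"
        cached = @redis.get(key)
        return cached if cached

        short_url = yield                                  # call the external or backup service here
        @redis.set(key, short_url)
        # Keep the reverse mapping too, so the backup service can later serve
        # redirects for URLs that were originally shortened by the external service.
        @redis.set("shortener:short:#{short_url}", long_url)
        short_url
      end
    end

So a call like cache.fetch(long_url) { try_get_service_short_url(long_url, service, options) } only hits the service on a cache miss.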

We would monitor the external service, and if it went down, the xxx.aa domain would be switched to the backup shortener service.
The backup shortener service would then use the cached data to keep serving all the URLs shortened by the external service, so they stayed accessible.

We rolled up our sleeves and got to work.

Backup shortener service

To make sure that both kinds of URLs were visually similar, we ran a custom domain (xxx.aa) on the external shortener while running our own backup shortener on xxx.bb. So when the fallback kicked in, the difference was hardly noticeable.

Results caching and the external service’s “bad” days

Initially memcached was used for caching.

A couple of hardware upgrades pushed us to switch to Redis fairly quickly.
That’s because Redis has persistent storage, while memcached “forgets” everything after a server restart.
The switch keeps us from overloading the external service with repeated requests in such cases.

On the “bad” days, or just at times when we had a large volume of URLs to shorten, the split between the external service and our backup was close to 50/50.
Overall, though, the backup service handled less than 1% of cases.

Backup shortener service architecture

The architecture of our solution is as simple as possible. At the time it was built, we used Rails 2, with Redis as storage for the short URLs.

We have two controllers – shortener and redirector.

The shortener takes the incoming parameters, parses them and tries to create a short URL.
We used a simple hashing algorithm (see below).
The output is JSON, including a status code, a message if needed, and a bunch of other stuff, among which is the shortened URL itself.
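To give an idea of the shape of it, a stripped-down version of such a controller could look like this. The ShortUrl.generate_hash helper (wrapping the MD5/Base62 code further below), the REDIS connection and the exact JSON fields are assumptions for the sketch, not the real code.

    class ShortenerController < ApplicationController
      def create
        long_url = params[:url].to_s.strip

        if long_url.empty?
          render :json => { :status => 422, :message => "url is required" }
          return
        end

        # ShortUrl.generate_hash is a hypothetical helper wrapping the hashing code below
        hash = ShortUrl.generate_hash(long_url)
        REDIS.set("short:#{hash}", long_url)            # REDIS: an assumed global connection

        render :json => { :status => 200, :short_url => "http://xxx.bb/#{hash}" }
      end
    end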

The redirector runs as a Rails Metal application. All it does is look up the short URL hash taken from the incoming parameters and send a redirect to the original URL. Nothing more, nothing less.
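A simplified sketch of such a Metal endpoint is below; the 6-character hash in the path, the REDIS constant and the key layout mirror the assumptions from the previous sketch.

    # app/metal/redirector.rb
    class Redirector
      def self.call(env)
        if env["PATH_INFO"] =~ /^\/([A-Za-z0-9]{6})$/
          long_url = REDIS.get("short:#{$1}")
          if long_url
            return [301, { "Location" => long_url, "Content-Type" => "text/html" }, []]
          end
        end
        # a 404 from a Metal app lets the request fall through to the regular Rails stack
        [404, { "Content-Type" => "text/html" }, ["Not Found"]]
      end
    end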

Shortener client

The client was created as a gem, so it could be reused across multiple projects.
The balancing between the services was done automatically from the code, using something like this:

    services.each_with_index do |service, i|
      next_service = services[i+1]
      begin
        return try_get_service_short_url(long_url, service, options)
      rescue ShortenerOutOfTriesError => e
        # out of tries can be logged in a file here
      end
    end

    def try_get_service_short_url(long_url, service, options)
      options = {:incremental_backoff => false, :retry_count => 3, :timeout => config.timeout}.merge(options)
      timeout = options[:timeout].to_i
      retry_count = options[:retry_count]
      incremental_backoff = options[:incremental_backoff]
      tries = 0
      errors = []
      begin
        return get_service_short_url(long_url, service, timeout)
      rescue ShortenerValidationError => e
        # give long url back if validation failed
        raise ShortenerValidationError, e.message if e.instance_of?(ShortenerValidationError)
      rescue Exception => e
        # log
        raise ShortenerOutOfTriesError.new(e.message, errors) if e.instance_of?(ShortenerHourlyLimitError)
        raise ShortenerOutOfTriesError.new("Retries count exceeded (#{retry_count})", errors) if (tries += 1) > retry_count
        sleep(2**tries) if tries > 1 && incremental_backoff
        retry
      end
    end

The get_service_short_url method implements the remote request to each service, along with response parsing.
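We won’t reproduce the real integration here, but in spirit it is a plain HTTP call plus parsing, something along these lines. The endpoint layout, the “403 means rate limited” assumption and the "short_url" response field are all illustrative, not the real service API.

    require 'net/http'
    require 'uri'
    require 'cgi'
    require 'timeout'
    require 'json'

    def get_service_short_url(long_url, service, timeout)
      # service.endpoint is an assumed attribute of the service object
      uri = URI.parse("#{service.endpoint}?url=#{CGI.escape(long_url)}")

      response = Timeout.timeout(timeout) { Net::HTTP.get_response(uri) }

      raise ShortenerHourlyLimitError, "hourly limit reached" if response.code == "403"
      raise "unexpected response #{response.code}" unless response.code == "200"

      JSON.parse(response.body)["short_url"]
    end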

Hashing algorithm

For consistency, we went with the popular approach of a Base62-encoded string 6 characters long.

After trying several approaches, we finally settled on the current one. It is based on MD5-hashing the incoming string.

    require 'digest/md5'
    url36 = Digest::MD5.hexdigest(long_url + "in salt we trust").slice(0..6)

Here we take the string, add salt, convert it to an MD5 hash and take the first 7 characters.
Why just 7, you’ll wonder? It’s simple: 7 chars of Base36 are enough to cover 6 chars of Base62, while 6 would not be: 36^6 < 62^6 < 36^7.
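If you want to double-check that, the numbers are easy to verify with plain Ruby arithmetic:

    36**6   # => 2176782336  -- 6 Base36 chars would not cover the Base62 space
    62**6   # => 56800235584 -- the 6-char Base62 space
    36**7   # => 78364164096 -- 7 Base36 chars do cover it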

Sample code to convert Base36 to Base10 and then to Base62

    base62 = ['0'..'9','A'..'Z','a'..'z'].map{|a| a.to_a}.flatten
    base36 = {}
    ['0'..'9','a'..'z'].map{|range| range.to_a}.flatten.each_with_index{|char, position| base36[char] = position}
    url10 = 0; url62 = ""
    # convert to base10
    url36.reverse.chars.to_a.each_with_index { |c,i| url10 += base36[c] * (36 ** i)}
    # convert to base62
    6.times{|i| url62 << base62[url10 % 62]; url10 = url10 / 62}

Great!
But what if a produced result has already been used by a previous conversion? You’ll have to repeat the process.

We decided to append a trailing “_” to the original URL before converting, so that the conversion produces a different result; a sketch of this is below.
A real problem would only arise once most of the Base62 variants are in use, and then you would have to be prepared to deal with an eternal conversion cycle :) But since this is a backup service handling less than 1% of cases, that will take a while.
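Put together, the retry loop could look something like this; generate_hash stands in for the MD5 and Base62 conversion shown above, and the Redis key layout is the same assumption as in the earlier sketches.

    def unique_short_hash(long_url)
      candidate = long_url.dup
      loop do
        hash = generate_hash(candidate)                   # MD5 -> first 7 chars -> 6 Base62 chars
        stored = REDIS.get("short:#{hash}")
        return hash if stored.nil? || stored == long_url  # hash is free, or already maps to this URL
        candidate << "_"                                  # append "_" and try again
      end
    end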

Conclusion

Since implementing all of the above, we haven’t had any issues.
At first we also set up some monitoring to see how often the balancing would kick in, checking a few cases a week to confirm it worked.
But eventually it all proved stable and we never had to look into the solution again.

Over time, the external service has become more stable and doesn’t go down as often as it used to, especially with the caching solution in place.

But when it does go down, our balancing and the backup service are right there, doing their job.