{"id":246,"date":"2011-03-30T18:57:17","date_gmt":"2011-03-30T16:57:17","guid":{"rendered":"http:\/\/blog.railsware.com\/?p=246"},"modified":"2021-08-12T17:14:03","modified_gmt":"2021-08-12T14:14:03","slug":"url-shortening-how-we-do-it-2","status":"publish","type":"post","link":"https:\/\/railsware.com\/blog\/url-shortening-how-we-do-it-2\/","title":{"rendered":"URL shortening. How we do it."},"content":{"rendered":"\n<p>The growing demand to share content through the different social networks has rapidly led us to the need to shorten URLs in order to free space within a post (Twitter is a good example here) and it actually looks good.<\/p>\n\n\n\n<p>We didn\u2019t want to reinvent the wheel \u2013 so we have been using popular external services for this purpose.<br>\nHowever, by producing a somewhat heavy load on the service, we\u2019ve run into some well known problems: request limit per hour, external service timeouts, service downtime, and other nasty things.<\/p>\n\n\n\n<p>The shortening service was used as a part of a few business tools.<br>\nShortening was done both in the background, meaning it had to be highly reliable, and in the foreground, necessitating speed of process.<br>\nAlso, when shortening was finished, the new, truncated URLs had to be accessible for the consumers.<\/p>\n\n\n\n<p>From time to time, we encountered all of the issues, resulting in slow, unreliable, inaccessible shortened URLs.<\/p>\n\n\n\n<p>So, we had to do something.<br>\nThe solution should meet the following requirements:<\/p>\n\n\n\n<p>1) In order to maintain high accessibility, the new method would require the ability to do a full fallback, just in case the main service went down completely;<\/p>\n\n\n\n<p>2) There would have to be a visual consistency in the shortened URLs; and<\/p>\n\n\n\n<p>3) It must be easy to develop, cheap to host and take zero support efforts.<\/p>\n\n\n\n<p>The popular shortener service was set to use a <strong>custom domain<\/strong> xxx.aa, so 
that we have control over its accessibility.<br>\nThe xxx.aa DNS TTL would be set to 5 minutes, so we could repoint it at any time.<\/p>\n\n\n\n<p>A very simple <strong>backup shortener service<\/strong> was to be developed and hosted on the xxx.bb domain for visual similarity.<br>\nIt wouldn&#8217;t contain any extra functionality: it would just shorten a long URL and redirect from a shortened URL.<\/p>\n\n\n\n<p>A <strong>shortener client<\/strong> was to be developed to automatically balance between the services when URL creation failed.<br>\nFirst, it would try the external service and, if that didn&#8217;t work, switch to the backup service.<br>\nFor a quick response time, it would make just one shortening attempt when called from the foreground, but multiple attempts with incremental back-off when called from the background, to use the external service as much as possible.<\/p>\n\n\n\n<p>The <strong>shortener client<\/strong> would also contain a caching mechanism to reduce duplicate requests to the external service, thus saving on its request limits.<\/p>\n\n\n\n<p>We would monitor the external service and, in case it went down, switch the xxx.aa domain to the <strong>backup shortener service<\/strong>.<br>\nTo keep existing links accessible, the <strong>backup shortener service<\/strong> would use the cached data to serve all the URLs shortened by the external service.<\/p>\n\n\n\n<div id=\"_mcePaste\">We rolled up our sleeves and got to work.<\/div>\n\n\n\n<p><strong>Backup shortener service<\/strong><\/p>\n\n\n\n<p>To make sure that both URLs were visually similar, we ran a custom domain (xxx.aa) on one of the external shorteners while running our own backup shortener on xxx.bb. 
So when the backup took over, the difference was barely noticeable.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Results caching and the external service &#8220;bad&#8221; days<\/h2>\n\n\n\n<p>Initially memcached was used for caching.<\/p>\n\n\n\n<p>Several hardware upgrades helped us to switch to Redis quickly.<br>\nThat&#8217;s because Redis can persist its data to disk, while memcached &#8220;forgets&#8221; everything after a server restart.<br>\nThis switch keeps us from flooding the external service with repeated requests after such restarts.<\/p>\n\n\n\n<p>On the &#8220;bad&#8221; days, or just at times when we had a big volume of URLs to shorten, the split between the external service and our backup was close to 50\/50.<br>\nBut overall, the backup service was used in less than 1% of cases.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Backup shortener service architecture<\/h2>\n\n\n\n<p>The architecture of our solution is as simple as possible. At the time of its creation we used Rails 2, with Redis as the storage for short URLs.<\/p>\n\n\n\n<p>We have two controllers &#8211; <em>shortener<\/em> and <em>redirector<\/em>.<\/p>\n\n\n\n<p><em>Shortener<\/em> takes the incoming parameters, parses them and tries to create a short URL.<br>\nWe used a simple hashing algorithm (see below).<br>\nThe output is JSON that includes a status code, a message if needed, and a few other fields, among them the shortened URL itself.<\/p>\n\n\n\n<p><em>Redirector <\/em>runs as a Rails Metal application. All it does is look up the short URL hash taken from the incoming parameters and send a redirect to the original URL. 
Nothing more, nothing less.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Shortener client<\/h2>\n\n\n\n<p>The client was created as a gem, so it could be reused in multiple projects.<br>\nThis basic balancing was done automatically in code, using something like this:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">services.each_with_index do |service, i|\n  next_service = services[i+1]\n  begin\n    return try_get_service_short_url(long_url, service, options)\n  rescue ShortenerOutOfTriesError => e\n    # out of tries can be logged in a file here; the loop then moves on to the next service\n  end\nend\n\ndef try_get_service_short_url(long_url, service, options)\n  options = {:incremental_backoff => false, :retry_count => 3, :timeout => config.timeout}.merge(options)\n  timeout = options[:timeout].to_i\n  retry_count = options[:retry_count]\n  incremental_backoff = options[:incremental_backoff]\n  tries = 0\n  errors = []\n  begin\n    return get_service_short_url(long_url, service, timeout)\n  rescue ShortenerValidationError => e\n    # give long url back if validation failed\n    raise ShortenerValidationError, e.message if e.instance_of?(ShortenerValidationError)\n  rescue Exception => e\n    # log the error here\n    # an exhausted hourly limit will not recover on retry, so give up at once\n    raise ShortenerOutOfTriesError.new(e.message, errors) if e.instance_of?(ShortenerHourlyLimitError)\n    raise ShortenerOutOfTriesError.new(\"Retries count exceeded (#{retry_count})\", errors) if (tries += 1) > retry_count\n    # back off exponentially between retries (background calls only)\n    sleep(2**tries) if tries > 1 &amp;&amp; incremental_backoff\n    retry\n  end\nend<\/pre>\n\n\n\n<p>The get_service_short_url method performs the remote request to each service and parses the response.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Hashing algorithm<\/h2>\n\n\n\n<p>For consistency, we use the popular format: a Base62-encoded string 6 characters long.<\/p>\n\n\n\n<p>After trying several approaches, we finally settled on the current one. 
It is based on an MD5 hash of the incoming string.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">require 'digest'\n\nurl36 = Digest::MD5.hexdigest(long_url + \"in salt we trust\").slice(0..6)<\/pre>\n\n\n\n<p>Here we take the string, add a salt, compute its MD5 hash and take the first 7 characters of the hex digest.<br>\nWhy just 7, you&#8217;ll wonder? It&#8217;s simple: treated as a Base36 number, 7 characters are enough to cover 6 characters of Base62, since 36^7 &gt; 62^6 (78,364,164,096 vs. 56,800,235,584). Keep in mind that a hex digest only contains the characters 0-9 and a-f, so this yields at most 16^7 distinct values.<\/p>\n\n\n\n<p>Sample code to convert Base36 to Base10 and then to Base62:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># digit tables: index -> char for base62, char -> index for base36\nbase62 = ['0'..'9','A'..'Z','a'..'z'].map{|a| a.to_a}.flatten\nbase36 = {}\n['0'..'9','a'..'z'].map{|range| range.to_a}.flatten.each_with_index{|char, position| base36[char] = position}\nurl10 = 0\nurl62 = \"\"\n\n# convert to base10\nurl36.reverse.chars.to_a.each_with_index { |c,i| url10 += base36[c] * (36 ** i)}\n\n# convert to base62 (digits are appended least significant first)\n6.times{|i| url62 &lt;&lt; base62[url10 % 62]; url10 = url10 \/ 62}<\/pre>\n\n\n\n<p>Great!<br>\nBut what if the produced result is already taken by a previous conversion? 
You&#8217;ll have to repeat the process.<\/p>\n\n\n\n<p>We decided to append a trailing &#8220;_&#8221; to the original URL before converting again, so that the result is different.<br>\nA problem will arise once most of the Base62 variants are used up, so you&#8217;ll have to be prepared to deal with an endless conversion cycle :), but since this is a backup service used in less than 1% of cases &#8211; that will take a while.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>After implementing all of the above, we haven&#8217;t had any issues.<br>\nAt first, we also set up some monitoring to see how often the balancing took place, checking a few cases a week to confirm that it worked.<br>\nBut eventually it all became stable, and we haven&#8217;t needed to look into the solution since.<\/p>\n\n\n\n<p>Over time, the external service became more stable and no longer goes down as often as it used to, especially with the caching solution in place.<\/p>\n\n\n\n<p>But if it does go down, our balancing and the backup service are in place, doing their job.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The growing demand to share content through the different social networks has rapidly led us to the need to shorten URLs in order to free space within a post (Twitter is a good example here) and it actually looks good. 
We didn\u2019t want to reinvent the wheel \u2013 so we have been using popular external&#8230;<\/p>\n","protected":false},"author":20,"featured_media":9469,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":""},"categories":[3],"tags":[],"coauthors":["Igor Antonyuk"],"class_list":["post-246","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-development"],"acf":[],"aioseo_notices":[],"categories_data":[{"name":"Engineering","link":"https:\/\/railsware.com\/blog?category=development"}],"post_thumbnails":"https:\/\/railsware.com\/blog\/wp-content\/themes\/railsware\/vendors\/images\/article-thumbnail-default.jpg","amp_enabled":true,"_links":{"self":[{"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/posts\/246","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/users\/20"}],"replies":[{"embeddable":true,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/comments?post=246"}],"version-history":[{"count":31,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/posts\/246\/revisions"}],"predecessor-version":[{"id":14001,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/posts\/246\/revisions\/14001"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/media\/9469"}],"wp:attachment":[{"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/media?parent=246"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/categories?post=246"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/tags?post=246"},{"taxonomy":"author","embeddable":tr
ue,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/coauthors?post=246"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}