{"id":1327,"date":"2012-03-13T13:24:09","date_gmt":"2012-03-13T11:24:09","guid":{"rendered":"http:\/\/blog.railsware.com\/?p=1327"},"modified":"2021-08-12T17:07:51","modified_gmt":"2021-08-12T14:07:51","slug":"ruby-2-0-enumerablelazy","status":"publish","type":"post","link":"https:\/\/railsware.com\/blog\/ruby-2-0-enumerablelazy\/","title":{"rendered":"Ruby 2.0 Enumerable::Lazy"},"content":{"rendered":"\n<p>My Enumerable::Lazy patch was accepted into ruby trunk few days ago. So, in ruby 2.0, we can go like this:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">a = [1,2,3,4,2,5].lazy.map { |x| x * 10 }.select { |x| x > 30 } #=> no evaluation\na.to_a #=> [40, 50], evaluation performed - no intermediate arrays generated.\n<\/pre>\n\n\n\n<!--more-->\n\n\n\n<h2 class=\"wp-block-heading\" id=\"why\">Why?<\/h2>\n\n\n\n<p>Ruby is awesome language, and while being an imperative one, still allows us to write an elegant, functional programming style code. For example:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">data.map(&amp;:split).map(&amp;:reverse)\n<\/pre>\n\n\n\n<p>looks way more readable than this:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">data.map { |l| l.split.reverse }\n<\/pre>\n\n\n\n<p>But there&#8217;s a serious performance drawback here in the first case: while maping <code>data<\/code> array twice, unnecessary intermediate array generated. Not a big deal while manipulating tiny arrays, but let&#8217;s say you want to parse a huge text file:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">File.open(\"text\") do |f|\n  f.each.flat_map(&amp;:split).grep(\/ruby\/)\nend\n<\/pre>\n\n\n\n<p>In this case you&#8217;d like to avoid unnecessary memory consumption, and that is when laziness come in handy. Having lazy <code>flat_map<\/code> and <code>grep<\/code> makes possible to perform evaluation of the whole chain only when we want to get an actual result. Moreover, iterating only once over the original data. That&#8217;s the purpose of Lazy enumerator. It overrides several <code>Enumerable<\/code> methods (<code>map<\/code>, <code>select<\/code>, etc) with their lazy analogues.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"update\">UPDATE:<\/h4>\n\n\n\n<p>Now, after Enumerator::Lazy is almost finished I&#8217;ve decided to measure it&#8217;s performance. After running some trivial benchmarks I was very upset &#8211; it&#8217;s almost 4 times (!) slower than the normal arrays are. The point is in blocks that extensively created while chaining enumerator together. See <a href=\"http:\/\/bugs.ruby-lang.org\/issues\/6183\">this bug report and comments for more info<\/a>. In this case the real benefit of Enumerator::Lazy is not as big a it may seem at the very beginning. But take look at this piece of code:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">Prime.lazy.select {|x| x % 4 == 3 }.take(10).to_a\n<\/pre>\n\n\n\n<p>instead of this one:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">a = []\nPrime.each do |x|\n  next if x % 4 != 3\n  a &lt;&lt; x\n  break if a.size == 10\nend\n<\/pre>\n\n\n\n<p>Now it totally makes sense &#8211; when evaluation of the whole chain is too expensive or even impossible (infinite sequences) then laziness is a must if we want to keep our code simple and elegant.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"when\">When?<\/h2>\n\n\n\n<p>It&#8217;s really hard to say who was the first to come up with an idea of lazy enumerations for ruby. Probably <a href=\"http:\/\/blade.nagaokaut.ac.jp\/cgi-bin\/scat.rb\/ruby\/ruby-core\/19679\">this post<\/a> back in 2008 is among the ground-breakers. The idea is quite simple and based on fact that Enumerators can be chained. Comments were added to enumerator.c file explaining how laziness can be achieved and since that <a href=\"http:\/\/spin.atomicobject.com\/2011\/03\/31\/enumerating-ruby-lazy-chains\/\">many<\/a> <a href=\"http:\/\/railsillustrated.com\/how-to-be-lazy-in-ruby-and-why-you-should.html\">many<\/a> <a href=\"http:\/\/www.michaelharrison.ws\/weblog\/?p=163\">many<\/a> great articles were published. <a href=\"http:\/\/bugs.ruby-lang.org\/issues\/708\">ruby-lang<\/a> discussion was started more than 3 (!) years ago, and finally <a href=\"http:\/\/bugs.ruby-lang.org\/issues\/708#note-7\">Matz vote<\/a> for <a href=\"http:\/\/bugs.ruby-lang.org\/issues\/4890\">implementation by Yataka Hara<\/a>.<\/p>\n\n\n\n<p><code>Enumerable::lazy<\/code> method were proposed. It returns instance of <code>Enumerable::Lazy<\/code> on the top of the enumerable object, that can be lazy-chained further. C patch was <a href=\"http:\/\/bugs.ruby-lang.org\/issues\/4890#note-18\">requested<\/a> and I found myself challenged to make a <a href=\"https:\/\/github.com\/ruby\/ruby\/pull\/101\">pull request<\/a> (I&#8217;m in to functional programming recently and interested in ruby internals too). The patch was slightly refactored and accepted a few days ago. It&#8217;s landed trunk and will be available since ruby 2.0 (see roadmap).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"how\">How?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"enumerator-skip-if-familiar-with\">Enumerator (skip if familiar with)<\/h3>\n\n\n\n<p>Just to give an insight of what Enumerator can do:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># enumerable as enumerator\nenum = [1, 2].each\nputs enum.next #=> 1\nputs enum.next #=> 2\nputs enum.next #=> StopIteration exception raised\n\n# custom enumerable\nenum = Enumerator.new do |yielder|\n  yielder &lt;&lt; 1\n  yielder &lt;&lt; 2 end puts enum.next #=> 1\nputs enum.next #=> 2\nputs enum.next #=> StopIteration exception raised\n\nenum = \"xy\".enum_for(:each_byte)\nenum.each { |b| puts b }\n# => 120\n# => 121\n\no = Object.new\ndef o.each\n  yield\n  yield 'hello'\n  yield [1, 2]\nend\nenum = o.to_enum\np enum.next #=> nil\np enum.next #=> 'hello'\np enum.next #=> [1, 2]\n\n# chaining enumerators\nenum = %w{foo bar baz}.map\nputs enum.with_index { |w, i| \"#{i}:#{w}\" } # => [\"0:foo\", \"1:bar\", \"2:baz\"]\n\n# protect an array from being modified by some_method\na = [1, 2, 3]\nsome_method(a.enum_for)\n\n# how about this one\n[1,2,3].cycle.take(10) #=> [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]\n<\/pre>\n\n\n\n<p>As you might already noticed <code>#to_enum<\/code> and <code>#enum_for<\/code> are <code>Kernel<\/code> module methods, thus available for any object. Examples are taken from enumerator.c directly, you can find more if you want, also check test\/ruby\/test_enumerator.rb. Well, Enumerator internals probably deserve a separate blog post, but worth to know it&#8217;s ruby fibers that makes all this &#8216;next&#8217; magic possible.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"lazy-enumerator\">Lazy enumerator<\/h3>\n\n\n\n<p>To understand how Enumerable::Lazy works just check this out:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">module Enumerable\n  class Lazy &lt; Enumerator\n\n    def initialize(obj)\n      super() do |yielder|\n        obj.each do |val|\n          if block_given?\n            yield(yielder, val)\n          else\n            yielder &lt;&lt; val\n          end\n        end\n      end\n    end\n\n    def map\n      Lazy.new(self) do |yielder, val|\n        yielder &lt;&lt; yield(val) end end end end a = Enumerable::Lazy.new([1,2,3]) a = a.map { |x| x * 10 }.map { |x| x - 1 } puts a.next #=> 9\nputs a.next #=> 19\n<\/pre>\n\n\n\n<p>There&#8217;s nothing new here &#8211; it&#8217;s a typical lazy enumerator ruby implementation that can be googled in a second. Same as provided by <a href=\"https:\/\/github.com\/yhara\/enumerable-lazy\/blob\/master\/lib\/enumerable\/lazy.rb\">Yutaka<\/a>. But did you notice &#8211; I&#8217;m not using <code>&amp;block<\/code> as a parameter (to call it as a proc inside each block) here but yielding directly instead. I love this hidden ruby power &#8211; you can <code>yield<\/code> inside another block! <code>block_given?<\/code> works as expected too. Moreover, you can call <code>self<\/code> inside another block or make a return from the function. Awesome &#8211; we are lucky guys here :-). See Yehuda Katz <a href=\"http:\/\/yehudakatz.com\/2010\/02\/07\/the-building-blocks-of-ruby\/\">post<\/a> (<a href=\"http:\/\/yehudakatz.com\/2012\/01\/10\/javascript-needs-blocks\/\">another one<\/a>) to have a better feeling.<\/p>\n\n\n\n<p>The code is self-explanatory, but let&#8217;s make it crystal-clear:<br>the basic idea is to chain enumerators &#8211; rather than perform evaluation directly, map returns another <code>Enumerable::Lazy<\/code>, having previous one as an argument. And only when we need to get an actual result (by calling <code>to_a<\/code>, <code>next<\/code>, <code>each<\/code> with block, <code>take<\/code>, etc) evaluation performed. To get next value ruby climbs back over this enumerators chain finally getting next value from the actual enumerable (Fig.1). Then this value pops-up back to you, while modified with blocks along the way (in the same order they were applied).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"c-patch\">C patch<\/h3>\n\n\n\n<p>The C <a href=\"https:\/\/github.com\/ruby\/ruby\/pull\/101\">patch<\/a> &#8211; mimics ruby code example. Except the fact that rather than calling <code>super<\/code> inside <code>lazy_initialize<\/code>,<br>I&#8217;m allocating generator with a block and then calling <code>enumerator_init<\/code> passing this new generator as an argument.<\/p>\n\n\n\n<p>In final patch <a href=\"https:\/\/github.com\/nobu\">nobu<\/a> refactored code a little bit &#8211; instead of having if-else condition inside a block, he extracted two methods (<code>lazy_init_block_i<\/code> and <code>lazy_init_block<\/code>) and moved if-else into <code>lazy_initialize<\/code> directly. Also, I was passing a ruby array as a block parameter, but it&#8217;s better to construct and pass a simple C array. Thus, no need to use <code>rb_ary_entry<\/code> to get yielder and value inside a block, like this:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">static VALUE lazy_map_func(VALUE val, VALUE m, int argc, VALUE *argv)\n{\n    VALUE result = rb_yield_values2(argc - 1, &amp;argv[1]);\n\n    return rb_funcall(argv[0], id_yield, 1, result);\n}\n<\/pre>\n\n\n\n<p>instead of this:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">static VALUE lazy_map_func(VALUE val, VALUE m, int argc, VALUE *argv)\n{\n    VALUE result = rb_funcall(rb_block_proc(), id_call, 1,\n    rb_ary_entry(val, 1));\n\n    return rb_funcall(rb_ary_entry(val, 0), id_yield, 1, result);\n}\n<\/pre>\n\n\n\n<p>Another lesson for me to learn from ruby core guy. Frankly speaking, I was a total newbie in ruby patching. So it took me two weekends to come up with the (fairly trivial) pull request. First weekend I came up with another <a href=\"https:\/\/github.com\/ruby\/ruby\/pull\/100\/files\">crazy pull request<\/a> &#8211; I was storing all blocks as procs inside enumerator itself. And when next value (using <code>Enumerable#next<\/code>) is requested &#8211; all blocks are applied one by one. Lazy map and select were working great, but when trying to adjust Enumerator#each I realized that it&#8217;s a road to nowhere (is it?).<\/p>\n\n\n\n<p>Well, you are a tough guy if you made this far, so If you are planning to start patching there are <a href=\"http:\/\/www.ruby-doc.org\/docs\/ProgrammingRuby\/html\/ext_ruby.html\">plenty<\/a> of great <a href=\"http:\/\/clalance.blogspot.com\/2011\/01\/writing-ruby-extensions-in-c-part-1.html\">articles<\/a> for you. Also, bonus <a href=\"http:\/\/railsillustrated.com\/how-to-be-lazy-in-ruby-and-why-you-should.html\">article<\/a> showcasing why we should be lazy.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"conclusion\">Conclusion<\/h2>\n\n\n\n<p>We have 5 lazy methods so far &#8211; <code>select<\/code>, <code>map<\/code>, <code>reject<\/code>, <code>grep<\/code> (added by first patch) and <code>flat_map<\/code> (<a href=\"https:\/\/github.com\/ruby\/ruby\/commit\/a21d0f72c280a3effe15554279ae006918cd97ce\">added<\/a> later on by <a href=\"https:\/\/github.com\/shugo\">shugo<\/a>). Additionally &#8211; rather than doing <code>Enumerable::Lazy.new([1,2,3,4])<\/code> you can use handy shortcut <code>[1,2,3,4,5].lazy<\/code>. If you want to get your hands on &#8211; just compile ruby trunk and feel free to play.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"update-1\">UPDATE:<\/h3>\n\n\n\n<p>A lot of commits was made into Lazy enumerator during this week. In particular:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Nesting changed: Enumerable::Lazy is now Enumerator::Lazy<\/li><li>Additional lazy methods added, here&#8217;s full list so far: <code>map<\/code>, <code>flat_map<\/code>, <code>select<\/code>, <code>reject<\/code>, <code>grep<\/code>, <code>zip<\/code>, <code>take<\/code>, <code>take_while<\/code>, <code>drop<\/code>, <code>drop_while<\/code>, <code>cycle<\/code><\/li><li>Enumerator::Lazy#lazy method added &#8211; just returns self<\/li><li>Enumerator::Lazy#force as alias to <code>to_a<\/code><\/li><\/ul>\n\n\n\n<p>Stay tuned!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>My Enumerable::Lazy patch was accepted into ruby trunk few days ago. So, in ruby 2.0, we can go like this:<\/p>\n","protected":false},"author":34,"featured_media":9464,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":""},"categories":[3],"tags":[],"coauthors":["Innokenty Mihailov"],"class_list":["post-1327","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-development"],"acf":[],"aioseo_notices":[],"categories_data":[{"name":"Engineering","link":"https:\/\/railsware.com\/blog?category=development"}],"post_thumbnails":"https:\/\/railsware.com\/blog\/wp-content\/themes\/railsware\/vendors\/images\/article-thumbnail-default.jpg","amp_enabled":true,"_links":{"self":[{"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/posts\/1327","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/users\/34"}],"replies":[{"embeddable":true,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/comments?post=1327"}],"version-history":[{"count":53,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/posts\/1327\/revisions"}],"predecessor-version":[{"id":13996,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/posts\/1327\/revisions\/13996"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/media\/9464"}],"wp:attachment":[{"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/media?parent=1327"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/categories?post=1327"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/tags?post=1327"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/railsware.com\/blog\/wp-json\/wp\/v2\/coauthors?post=1327"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}