ruby

Ruby 2.0 Enumerable::Lazy

| 21 Comments

My Enumerable::Lazy patch was accepted into ruby trunk few days ago. So, in ruby 2.0, we can go like this:

Why?

Ruby is awesome language, and while being an imperative one, still allows us to write an elegant, functional programming style code. For example:

looks way more readable than this:

But there’s a serious performance drawback here in the first case: while maping data array twice, unnecessary intermediate array generated. Not a big deal while manipulating tiny arrays, but let’s say you want to parse a huge text file:

In this case you’d like to avoid unnecessary memory consumption, and that is when laziness come in handy. Having lazy flat_map and grep makes possible to perform evaluation of the whole chain only when we want to get an actual result. Moreover, iterating only once over the original data. That’s the purpose of Lazy enumerator. It overrides several Enumerable methods (map, select, etc) with their lazy analogues.

UPDATE:

Now, after Enumerator::Lazy is almost finished I’ve decided to measure it’s performance. After running some trivial benchmarks I was very upset – it’s almost 4 times (!) slower than the normal arrays are. The point is in blocks that extensively created while chaining enumerator together. See this bug report and comments for more info. In this case the real benefit of Enumerator::Lazy is not as big a it may seem at the very beginning. But take look at this piece of code:

instead of this one:

Now it totally makes sense – when evaluation of the whole chain is too expensive or even impossible (infinite sequences) then laziness is a must if we want to keep our code simple and elegant.

When?

It’s really hard to say who was the first to come up with an idea of lazy enumerations for ruby. Probably this post back in 2008 is among the ground-breakers. The idea is quite simple and based on fact that Enumerators can be chained. Comments were added to enumerator.c file explaining how laziness can be achieved and since that many many many great articles were published. ruby-lang discussion was started more than 3 (!) years ago, and finally Matz vote for implementation by Yataka Hara.

Enumerable::lazy method were proposed. It returns instance of Enumerable::Lazy on the top of the enumerable object, that can be lazy-chained further. C patch was requested and I found myself challenged to make a pull request (I’m in to functional programming recently and interested in ruby internals too). The patch was slightly refactored and accepted a few days ago. It’s landed trunk and will be available since ruby 2.0 (see roadmap).

How?

Enumerator (skip if familiar with)

Just to give an insight of what Enumerator can do:

As you might already noticed #to_enum and #enum_for are Kernel module methods, thus available for any object. Examples are taken from enumerator.c directly, you can find more if you want, also check test/ruby/test_enumerator.rb. Well, Enumerator internals probably deserve a separate blog post, but worth to know it’s ruby fibers that makes all this ‘next’ magic possible.

Lazy enumerator

To understand how Enumerable::Lazy works just check this out:

There’s nothing new here – it’s a typical lazy enumerator ruby implementation that can be googled in a second. Same as provided by Yutaka. But did you notice – I’m not using &block as a parameter (to call it as a proc inside each block) here but yielding directly instead. I love this hidden ruby power – you can yield inside another block! block_given? works as expected too. Moreover, you can call self inside another block or make a return from the function. Awesome – we are lucky guys here :-). See Yehuda Katz post (another one) to have a better feeling.

The code is self-explanatory, but let’s make it crystal-clear:
the basic idea is to chain enumerators – rather than perform evaluation directly, map returns another Enumerable::Lazy, having previous one as an argument. And only when we need to get an actual result (by calling to_a, next, each with block, take, etc) evaluation performed. To get next value ruby climbs back over this enumerators chain finally getting next value from the actual enumerable (Fig.1). Then this value pops-up back to you, while modified with blocks along the way (in the same order they were applied).



Fig.1 Enumerable::Lazy chain

C patch

The C patch – mimics ruby code example. Except the fact that rather than calling super inside lazy_initialize,
I’m allocating generator with a block and then calling enumerator_init passing this new generator as an argument.

In final patch nobu refactored code a little bit – instead of having if-else condition inside a block, he extracted two methods (lazy_init_block_i and lazy_init_block) and moved if-else into lazy_initialize directly. Also, I was passing a ruby array as a block parameter, but it’s better to construct and pass a simple C array. Thus, no need to use rb_ary_entry to get yielder and value inside a block, like this:

instead of this:

Another lesson for me to learn from ruby core guy. Frankly speaking, I was a total newbie in ruby patching. So it took me two weekends to come up with the (fairly trivial) pull request. First weekend I came up with another crazy pull request – I was storing all blocks as procs inside enumerator itself. And when next value (using Enumerable#next) is requested – all blocks are applied one by one. Lazy map and select were working great, but when trying to adjust Enumerator#each I realized that it’s a road to nowhere (is it?).

Well, you are a tough guy if you made this far, so If you are planning to start patching there are plenty of great articles for you. Also, bonus article showcasing why we should be lazy.

Conclusion

We have 5 lazy methods so far – select, map, reject, grep (added by first patch) and flat_map (added later on by shugo). Additionally – rather than doing Enumerable::Lazy.new([1,2,3,4]) you can use handy shortcut [1,2,3,4,5].lazy. If you want to get your hands on – just compile ruby trunk and feel free to play.

UPDATE:

A lot of commits was made into Lazy enumerator during this week. In particular:

  • Nesting changed: Enumerable::Lazy is now Enumerator::Lazy
  • Additional lazy methods added, here’s full list so far: map, flat_map, select, reject, grep, zip, take, take_while, drop, drop_while, cycle
  • Enumerator::Lazy#lazy method added – just returns self
  • Enumerator::Lazy#force as alias to to_a

Stay tuned!

Share
* Railsware is a premium software development consulting company, focused on delivering great web and mobile applications. Learn more about us.
  • http://twitter.com/webdirect_srl WEB DIRECT

    Nice job guys!

  • http://twitter.com/rwz Pavel Pravosud

    errr.. Is there actually any reason not to use lazy by default?

    • http://mike-burns.com/ Mike Burns

      Lazy evaluation is a nightmare when it comes to performance debugging, especially when it comes to memory consumption.

    • Innokenty Mihailov

      There was a discussion on ruby-lang.org about the defaults. The key point stated by Matz is compatibility.
      See this for more info http://bugs.ruby-lang.org/issues/4890#note-25

  • http://github.com/dkubb Dan Kubb

    I’m considering sending a pull request to the backports gem so we can have this feature in 1.8 and 1.9: https://github.com/marcandre/backports/issues/51 .. I don’t know if it’ll be accepted, but I’m pretty optimistic since it does backport lots of other 1.9 Enumerable features.

    If anyone has any feedback or suggestions on the code spike I’ve written I’d be happy to hear it.

  • Pingback: Making Bioruby lazy with Enumerators « Biolateral

  • trans

    Nice article. Very respectably done too by giving nods to history of development. Well done.

    Oh, definitely add take method –should be an easy one.

    • Innokenty Mihailov

      Thanks, the history is really an interesting and important part of the story for sure!
      take is already added along with another lazy methods.
      Guys (shugo, nobu) are working hard now on Enumerator::Lazy improvements. Can’t wait for ruby 2.0 :-)

  • http://eregon.myopenid.com/ Benoit Daloze

    Nice article!

    I would just add a comment about performance: lazy is slower (about 10 times) if you take all the result (but that’s a wrong use).

    I found out that #take is now defined and lazy for Enumerator::Lazy. I can’t find the reason, and it means you need to ary.lazy.select { … }.take(10).to_a to get the resulting Array. Does it make sense to you?

    • Innokenty Mihailov

      Thanks!
      At first sight it really looks handy to have take(n) (drop(n)) evaluate directly, however there are use cases when you still want to continue building the chain.Let’s say lazy behavior is default – then we expect each and every method to be lazy.
      Unfortunately – compatibility is in the first place so we’ll have to call to_a.
      I’ve added ‘Update’ section – more lazy methods were added during past few days.

  • http://freemusicformormons.com/lds-ward-choir-music roger pack

    Is there a gem to bring in the functionality for 1.9/1.8 users? Also does it still need to create a fiber internally and use that? http://en.wikibooks.org/w/index.php?title=Ruby_Programming/Reference/Objects/Enumerable&stable=0#3._As_an_external_iterator

    • http://twitter.com/woollyams woollyams

      Roger, my “enumerating” gem provides similar functionality, and works in 1.8/1.9. The API is different, though.

      Other options include “lazing”, and “Enumerable#defer” from the “facets” gem. No doubt there are others, too – as Innokenty mentions, it’s not a new idea :-)

    • http://github.com/dkubb Dan Kubb

      I’m working on some patches to backports to bring this to 1.8/1.9 users https://github.com/marcandre/backports/issues/51

      The process with backports is you make use there is rubyspec coverage for the feature (which there isn’t, but I’m working on it), and then you submit the patch to backports. I should have something within the next few days though.

  • bwv549

    I’ve found myself wanting to use something like this for some time now.  The solution is elegant and straightforward.  Very nicely done.

  • Pingback: 046 RR Objects in Rails Part 2

  • Innokenty Mihailov

    I did a series of bench-marking yesterday too (see Update section in Why? block for more info). The performance is really poor. It’s not the fibers but blocks that called while evaluating the actual result (when calling to_a, each, etc).
    Fibers are used to support #next method magic.

  • Pingback: NoName Podcast S04E05 | crowler-pcworld

  • Pingback: telepong » Ruby 2.0.0 preview1

  • Pingback: Ruby 2.0.0 preview1 - Linux в Беларуси

  • Pingback: Ruby 2.0.0 preview1 « Все обо всем

  • Pingback: 099 RR Ruby 2

Want to get more of Railsware blog?

RSS FEED

We're always ready to help!

CONTACT US