Saturday, April 22, 2017

Streaming Functional

Suneido has functional aspects like map and filter. They're not a major part of the language but they're definitely useful. However, it has always bothered me that they aren't "streaming". An operation like Map or Filter always generated a new collection. For small collections and many usages that works ok. Obviously it doesn't allow you to work with infinite streams but that's not a common requirement for our applications. But it's annoyingly inefficient if the desired result isn't a collection. e.g. If you end up joining the results into a string or writing them to a file.

I've thought about how I could change Suneido's functional methods to be streaming but it alway seemed like it would be a big change. When Java 8 introduced its streaming methods (for similar reasons) it was another reminder. However I didn't really like their approach - having to start by calling .stream and end by with some kind of collector. It makes sense and works well but it's verbose. One of the things I like about functional style is that it tends to be quite succinct.

The other day I was thinking about it again and I realized I might be closer than I thought, that Suneido might already have most of what I needed.

Early on in Suneido's development I'd face a similar problem with iterating over the members of an object. The Members method returned a new collection but that was wasteful if you just wanted to iterate. So I'd introduced "sequences" which could be iterated over without instantiating a new collection. Basically they just wrapped an iterator. However, they could also be treated as collections. If you called a collection method on them, they would instantiate the collection automatically. This meant a single method like Members could be used both to iterate lazily, and to generate a new collection, automatically and invisibly.

Later I added a Sequence function so that you could write these kinds of lazy sequences in user code. For example, we used it for a string lines method. But it never got used extensively.

My insight (as usual, obvious in hindsight) was that Sequence provided almost everything I needed to make the functional methods streaming. I simply rewrote Map and Filter as iterators and wrapped them in Sequence.

I wrote them as standalone functions, but also made them methods on collections. I prefer to write list.Filter(...).Map(...) than Map(Filter(list))

But I immediately ran into a problem. list.Filter(...).Map(...) worked, but it wasn't streaming. i.e. The result of Filter was still getting instantiated. The reason was that the sequence/iterator didn't have a Map method so it would instantiate the collection which did.

That was easy to fix, although it required a change to the runtime implementation of Sequence. There was no reason that you couldn't call user defined collection methods on sequences. It was only the built-in collection methods that required instantiation.

The next problem I ran into was that people were misusing Map purely for side effects. They should have been using Each instead. That was easy to fix, but somewhat tedious to examine every use of Map.

Then I found that Join would trigger an instantiation because it was a built-in collection method. So I added a built-in Join method on sequences.

I also found that laziness (as with these streams) and mutability don't play well together. I ended up needing to explicitly instantiate sequences in a few places because otherwise things would change out from under them. This is ugly and I'm not really happy with it, although thankfully it seems quite rare. But I could see it stumping someone who wasn't aware of how sequences work.

Being explicit about when to transition between collections and streams (like Java 8) avoids many of these issues, but doing it automatically fits better with Suneido's philosophy.

This project was a bit of a roller coaster (aren't they all?) alternating between thinking it was easy and I was done, with finding "fundamental" flaws without obvious solutions.  Unusually, even quite flawed implementations would handle most (simple) scenarios. It was the corner cases that gave me grief.

After hanging Suneido a few times I ended up adding a way to mark infinite sequences and refusing to instantiate them or display their contents. For fun I wrote a Primes infinite sequence.

Overall I'm pretty happy with how it turned out. I haven't done many benchmarks. I'm sure I can come up with scenarios where it will be a big win, but I doubt it'll make much difference in the larger picture.

One weakness is that Suneido doesn't have any easy way to write sequence operations. Currently they have to be written as iterators, which is ok for simple stuff but not very nice for things like tree traversals. Ideally you'd have some kind of "generators" with "yield". A project for another time!