The Red Black Moose

For some time I've been wanting to implement the Enfilade, a really
cool data structure discovered by the Xanadu people among others. It's
like a balanced tree, except that it's indexed relatively rather than
absolutely – making it ideal for things like text editor buffers, as
you can insert and delete with impunity, knowing that the changes to
the address of pieces to your right will percolate up the tree.

As there isn't already an implementation on Perl's CPAN I've occasionally planned to do it for fun.

Problems: 1) I'm lazy, and 2) I'm weak in Computer Science, so I
don't even know how to do the balanced binary tree that I need as a
basis. I looked on CPAN some time back, which has some implementations
of trees, but I couldn't figure out how to get them relatively indexed,
and besides which, it's always instructive to implement something for
yourself.

I bought Okasaki's “Purely Functional Data Structures” some time
back, in a fit of enthusiasm, read the first chapter (a very readable
introduction) and been utterly terrified by the rest of it. But
recently I picked it up off the shelf, read and mostly understood the
chapter on Red Black trees, and had a go implementing it with Perl.

Why a purely functional version? As Okasaki says, the algorithm is
simpler than the destructive one (and potentially very fast). Also I
think that having a persistent, functional tree will make it trivial to
implement things like “undo”.

A quick look at the algorithm suggested that I might want lazy
building of the tree structure, and that reminded me of the overview in
Moose's cookbook about a lazy binary tree.

Oddly, the Moose example has “parent” fields, which rather defeat
the point of sharing, but then again Moose is an OO base class, not an
FP language so fair enough. But I've been meaning to learn Moose for a
while so this seemed like a perfect opportunity.

Five hours later (I'm a bit slow, but I suppose that's not bad as
I'm grokking a new library and an algorithm at the same time) I have a
working implementation.

Some notes:

  • Moose's “lazy” fields are cute! Actually,
    in the end, I didn't use them as the algorithm itself was strict (I
    just created a base case using an Empty node class instead.
  • “coerce” is lovely. I let the left and right branches be coercable from
    • a node (oddly, you have to specify this case explicitly!)
    • undef (for an empty node)
    • hashref (to create a node initialised with that data)
    • other (set the data field to that value)

    which made the code very neat and DWIMmy.

  • Perl doesn't have algebraic datatypes, so I simulated them with object classes:
        Node
    |-> Node::E - empty node
    `-> Node::T - full node
  • Pattern matching allows the MLian languages to do the
    Red-Black balancing in 4 lines. But they are 4 dense, ugly lines, that
    check 2×2 cases manually, and from which it's very hard to understand
    the intent. Okasaki's diagram was much more helpful. I contemplated
    • implementing pattern matching for Nodes (or a generic pattern matching for Perl :-)
    • doing a hand coded mess of if {} else {} blocks just to get it working

    In the end I rejected those and looped the cases [L,L], [L,R], [R,L], [R,R]
    storing the details of the path on the way down, and then working back
    up to do the rotation. I think the intent is much clearer, but it is a
    fair bit more verbose.

  • Debugging an algorithm that you don't
    understand is hard. I scratched my head for an hour with an unpleasant
    bug because I'd forgotten to clone a node before balancing it. (This is
    part of the mismatch between FP and Perl – you can do Purely
    Functional in Perl, but the language doesn't enforce it in any way, so
    you end up maintaining a situation whereby the public API is
    functional, but within a subroutine you can mutate the thing you're
    working on. End result of this is that you have to keep track of
    whether you've mutated something and therefore need to clone it first.
    And that means you'll sometimes get it wrong…)
  • I don't
    really understand how the Red-Black invariants are enforced by the
    rotations, but working through this on paper (and later with a pretty
    printer) you can see that it really works… wheee!
  • That is
    to say, I can see the constraint about 2 reds in a row, as it means
    that you can't ever have a mismatch of depths without triggering a
    rebalance. But the “constant number of blacks to the leaf” is
    mysterious.
  • I don't understand the optimizations Okasaki
    suggests in Exercises 2.2 and 3.10 re not doing too many comparisons,
    and not rebalancing uselessly. Will come back to this.
  • Okasaki suggests that the algorithm is very fast, even without those
    optimizations… but my Mooseperl version was not… Of course I'm
    doing “Object Functional” but I had the feeling that Moose was giving a
    fair amount of overhead.

    This isn't a criticism of Moose of course – it's designed for OO
    systems, and there, the amount of work that Moose does makes your code
    powerful and readable, and some Very Clever People in the Perl world
    are developing it, using it, and making it fast.

    But an FP algorithm uses many short lived structures, which may make
    this overhead untenable. Certainly my test script was noticeably slow,
    and profiling may have pointed to Moose as the culprit.

  • I say may but Devel::DProf is a buggy piece of shit and suggests that my script completed in negative time, which is not confidence-inspiring.

    Actually the other profiling modules (like DProfLB – where LB stands for “Less Bad” seem to do the same thing) Pah.

  • Moose is beautiful, powerful, and well documented. It seems to be
    robust, and gives a lot of the goodness of Perl6 and Ruby, while being
    respectively not vapourware, and still Perl.

    It makes certain aspects of implementing algorithms in an FP style
    very neat to prototype, but seems to be impractical due to the overhead
    on short-lived objects. I wouldn't have any reservations to using it
    for an OO class though!

  • I rewrote the class in
    non-moosey Object/Functional Perl, and as expected, this runs in a more
    reasonable time. Still need to optimize etc., but it's faster, while
    some of the code is clunkier than the Moose equivalent.

You can see the Moosey and non-Moosey code in my svn repo.