The Red Black Moose — Just another lambdabananacamel,

For some time I've been wanting to implement the Enfilade, a really
cool data structure discovered by the Xanadu people among others. It's
like a balanced tree, except that it's indexed relatively rather than
absolutely – making it ideal for things like text editor buffers, as
you can insert and delete with impunity, knowing that the changes to
the address of pieces to your right will percolate up the tree.

As there isn't already an implementation on Perl's CPAN I've occasionally planned to do it for fun.

Problems: 1) I'm lazy, and 2) I'm weak in Computer Science, so I
don't even know how to do the balanced binary tree that I need as a
basis. I looked on CPAN some time back, which has some implementations
of trees, but I couldn't figure out how to get them relatively indexed,
and besides which, it's always instructive to implement something for
yourself.

I bought Okasaki's â€œPurely Functional Data Structuresâ€ some time
back, in a fit of enthusiasm, read the first chapter (a very readable
introduction) and been utterly terrified by the rest of it. But
recently I picked it up off the shelf, read and mostly understood the
chapter on Red Black trees, and had a go implementing it with Perl.

Why a purely functional version? As Okasaki says, the algorithm is
simpler than the destructive one (and potentially very fast). Also I
think that having a persistent, functional tree will make it trivial to
implement things like â€œundoâ€.

A quick look at the algorithm suggested that I might want lazy
building of the tree structure, and that reminded me of the overview in
Moose's cookbook about a lazy binary tree.

Oddly, the Moose example has â€œparentâ€ fields, which rather defeat
the point of sharing, but then again Moose is an OO base class, not an
FP language so fair enough. But I've been meaning to learn Moose for a
while so this seemed like a perfect opportunity.

Five hours later (I'm a bit slow, but I suppose that's not bad as
I'm grokking a new library and an algorithm at the same time) I have a
working implementation.

Some notes:

Moose's â€œlazyâ€ fields are cute! Actually,
in the end, I didn't use them as the algorithm itself was strict (I
just created a base case using an Empty node class instead.
â€œcoerceâ€ is lovely. I let the left and right branches be coercable from
- a node (oddly, you have to specify this case explicitly!)
- undef (for an empty node)
- hashref (to create a node initialised with that data)
- other (set the data field to that value)
which made the code very neat and DWIMmy.

Perl doesn't have algebraic datatypes, so I simulated them with object classes:

    Node
    |-> Node::E   - empty node
    `-> Node::T   - full node

Pattern matching allows the MLian languages to do the
Red-Black balancing in 4 lines. But they are 4 dense, ugly lines, that
check 2×2 cases manually, and from which it's very hard to understand
the intent. Okasaki's diagram was much more helpful. I contemplated
- implementing pattern matching for Nodes (or a generic pattern matching for Perl :-)
- doing a hand coded mess of if {} else {} blocks just to get it working
In the end I rejected those and looped the cases [L,L], [L,R], [R,L], [R,R]
storing the details of the path on the way down, and then working back
up to do the rotation. I think the intent is much clearer, but it is a
fair bit more verbose.
Debugging an algorithm that you don't
understand is hard. I scratched my head for an hour with an unpleasant
bug because I'd forgotten to clone a node before balancing it. (This is
part of the mismatch between FP and Perl – you can do Purely
Functional in Perl, but the language doesn't enforce it in any way, so
you end up maintaining a situation whereby the public API is
functional, but within a subroutine you can mutate the thing you're
working on. End result of this is that you have to keep track of
whether you've mutated something and therefore need to clone it first.
And that means you'll sometimes get it wrong…)
I don't
really understand how the Red-Black invariants are enforced by the
rotations, but working through this on paper (and later with a pretty
printer) you can see that it really works… wheee!
That is
to say, I can see the constraint about 2 reds in a row, as it means
that you can't ever have a mismatch of depths without triggering a
rebalance. But the â€œconstant number of blacks to the leafâ€ is
mysterious.
I don't understand the optimizations Okasaki
suggests in Exercises 2.2 and 3.10 re not doing too many comparisons,
and not rebalancing uselessly. Will come back to this.
Okasaki suggests that the algorithm is very fast, even without those
optimizations… but my Mooseperl version was not… Of course I'm
doing â€œObject Functionalâ€ but I had the feeling that Moose was giving a
fair amount of overhead.

This isn't a criticism of Moose of course – it's designed for OO
systems, and there, the amount of work that Moose does makes your code
powerful and readable, and some Very Clever People in the Perl world
are developing it, using it, and making it fast.

But an FP algorithm uses many short lived structures, which may make
this overhead untenable. Certainly my test script was noticeably slow,
and profiling may have pointed to Moose as the culprit.
I say may but Devel::DProf is a buggy piece of shit and suggests that my script completed in negative time, which is not confidence-inspiring.
Actually the other profiling modules (like DProfLB – where LB stands for â€œLess Badâ€ seem to do the same thing) Pah.
Moose is beautiful, powerful, and well documented. It seems to be
robust, and gives a lot of the goodness of Perl6 and Ruby, while being
respectively not vapourware, and still Perl.

It makes certain aspects of implementing algorithms in an FP style
very neat to prototype, but seems to be impractical due to the overhead
on short-lived objects. I wouldn't have any reservations to using it
for an OO class though!
I rewrote the class in
non-moosey Object/Functional Perl, and as expected, this runs in a more
reasonable time. Still need to optimize etc., but it's faster, while
some of the code is clunkier than the Moose equivalent.

You can see the Moosey and non-Moosey code in my svn repo.