For some time I've been wanting to implement the Enfilade, a really cool data structure discovered by the Xanadu people among others. It's like a balanced tree, except that it's indexed relatively rather than absolutely - making it ideal for things like text editor buffers, as you can insert and delete with impunity, knowing that the changes to the address of pieces to your right will percolate up the tree.

As there isn't already an implementation on Perl's CPAN I've occasionally planned to do it for fun.

Problems: 1) I'm lazy, and 2) I'm weak in Computer Science, so I don't even know how to do the balanced binary tree that I need as a basis. I looked on CPAN some time back, which has some implementations of trees, but I couldn't figure out how to get them relatively indexed, and besides which, it's always instructive to implement something for yourself.

I bought Okasaki's “Purely Functional Data Structures” some time back, in a fit of enthusiasm, read the first chapter (a very readable introduction) and been utterly terrified by the rest of it. But recently I picked it up off the shelf, read and mostly understood the chapter on Red Black trees, and had a go implementing it with Perl.

Why a purely functional version? As Okasaki says, the algorithm is simpler than the destructive one (and potentially very fast). Also I think that having a persistent, functional tree will make it trivial to implement things like “undo”.

A quick look at the algorithm suggested that I might want lazy building of the tree structure, and that reminded me of the overview in Moose's cookbook about a lazy binary tree.

Oddly, the Moose example has “parent” fields, which rather defeat the point of sharing, but then again Moose is an OO base class, not an FP language so fair enough. But I've been meaning to learn Moose for a while so this seemed like a perfect opportunity.

Five hours later (I'm a bit slow, but I suppose that's not bad as I'm grokking a new library and an algorithm at the same time) I have a working implementation.

Some notes:

  • Moose's “lazy” fields are cute! Actually, in the end, I didn't use them as the algorithm itself was strict (I just created a base case using an Empty node class instead.
  • “coerce” is lovely. I let the left and right branches be coercable from
    • a node (oddly, you have to specify this case explicitly!)
    • undef (for an empty node)
    • hashref (to create a node initialised with that data)
    • other (set the data field to that value)
    which made the code very neat and DWIMmy.
  • Perl doesn't have algebraic datatypes, so I simulated them with object classes:
        Node
    |-> Node::E - empty node
    `-> Node::T - full node
  • Pattern matching allows the MLian languages to do the Red-Black balancing in 4 lines. But they are 4 dense, ugly lines, that check 2x2 cases manually, and from which it's very hard to understand the intent. Okasaki's diagram was much more helpful. I contemplated
    • implementing pattern matching for Nodes (or a generic pattern matching for Perl :-)
    • doing a hand coded mess of if {} else {} blocks just to get it working
    In the end I rejected those and looped the cases [L,L], [L,R], [R,L], [R,R] storing the details of the path on the way down, and then working back up to do the rotation. I think the intent is much clearer, but it is a fair bit more verbose.
  • Debugging an algorithm that you don't understand is hard. I scratched my head for an hour with an unpleasant bug because I'd forgotten to clone a node before balancing it. (This is part of the mismatch between FP and Perl - you can do Purely Functional in Perl, but the language doesn't enforce it in any way, so you end up maintaining a situation whereby the public API is functional, but within a subroutine you can mutate the thing you're working on. End result of this is that you have to keep track of whether you've mutated something and therefore need to clone it first. And that means you'll sometimes get it wrong...)
  • I don't really understand how the Red-Black invariants are enforced by the rotations, but working through this on paper (and later with a pretty printer) you can see that it really works... wheee!
  • That is to say, I can see the constraint about 2 reds in a row, as it means that you can't ever have a mismatch of depths without triggering a rebalance. But the “constant number of blacks to the leaf” is mysterious.
  • I don't understand the optimizations Okasaki suggests in Exercises 2.2 and 3.10 re not doing too many comparisons, and not rebalancing uselessly. Will come back to this.
  • Okasaki suggests that the algorithm is very fast, even without those optimizations... but my Mooseperl version was not... Of course I'm doing “Object Functional” but I had the feeling that Moose was giving a fair amount of overhead.

    This isn't a criticism of Moose of course - it's designed for OO systems, and there, the amount of work that Moose does makes your code powerful and readable, and some Very Clever People in the Perl world are developing it, using it, and making it fast.

    But an FP algorithm uses many short lived structures, which may make this overhead untenable. Certainly my test script was noticeably slow, and profiling may have pointed to Moose as the culprit.

  • I say may but Devel::DProf is a buggy piece of shit and suggests that my script completed in negative time, which is not confidence-inspiring.

    Actually the other profiling modules (like DProfLB - where LB stands for “Less Bad” seem to do the same thing) Pah.

  • Moose is beautiful, powerful, and well documented. It seems to be robust, and gives a lot of the goodness of Perl6 and Ruby, while being respectively not vapourware, and still Perl.

    It makes certain aspects of implementing algorithms in an FP style very neat to prototype, but seems to be impractical due to the overhead on short-lived objects. I wouldn't have any reservations to using it for an OO class though!

  • I rewrote the class in non-moosey Object/Functional Perl, and as expected, this runs in a more reasonable time. Still need to optimize etc., but it's faster, while some of the code is clunkier than the Moose equivalent.

You can see the Moosey and non-Moosey code in my svn repo.