This week’s simple longest path exercise seems to have had more mileage in it than I expected. Thanks to everyone’s comments and suggestions, I’ve updated with a number of times with, among other things, an improved Haskell version that acts on path elements instead of just characters.
But I had intended to do a version of this in Perl. I mentioned last time that the typical approach in this language would be nested hashes… which elicited the comment:
Ugh, one of the absolute worst things with perl: nonsensical handling of nested data structures. Every other modern language does this vastly better.
Em’s fightin’ words! So let’s try it in Perl and see how it looks…
First of all we need to create the multilevel hash. We want to end up with something like this:
{ '' => { 'a' => { 'a' => {} }, 'qux' => { 'wibble' => {} }, 'foo' => { 'bar' => { 'baz' => {} } }, 'aa' => {} } };
The usual way to do this is to maintain an idea of the node that we’re currently in, and follow it to the next item of the path, changing the $node variable to the next level down. So, if we had a path /foo/bar/baz, we’d start off with an empty hash {}, create the next node { foo=> {} } and start the process again with the new empty hash {} and the remaining path bar/baz.
Remind you of anything? I thought it looked a little like a fold with an accumulator variable: the $node is the accumulator, which is initialised to {} and which gets set to each corresponding node in turn. The path is the list that we fold over. And it tickled me that we can write:
use List::Util 'reduce'; sub mk_hash { my %hash; reduce { $a->{$b} ||= {} } \%hash, split '/' for @_; return \%hash; }
In polite company (well, in Perlish polite company at last), calling map in a void context is frowned upon, as it’s abusing a functional operator for its side effects. Here we are abusing reduce, making it destructive on a tree, and discarding its (useless) return value (the final leaf $node element). But it does work exactly as required: (the for @_ means that we run this sick fold on each path in turn) and returns a multilevel hash.
I’m aware that the example above might confirm the original commenter’s dislike of Perl hackishness. Personally I think it’s quite cute, mixing a beautiful functional idiom with pragmatic mutation. This may be brought on by my dabbling with Perl and FP programming languages, and is probably incurable. If you are either a) disgusted or b) confused by the Perl code above, you might like to try implementing the traditional algorithm (please let me know!) We’ll have a look at the equivalent in Haskell shortly.
There isn’t a leaves function built in for multilevel hashes (we do find some in the Tree:: and Graph:: namespaces, but let’s stick with the core datastructures for now), so let’s build one. This version is actually very similar to an FP language definition I think.
sub leaves { my ($node, $path) = @_; if (keys %$node) { # We still have to descend into all the leaves map { my ($k,$v)=@$_; leaves($v, [@$path, $k] ); } kv_list $node } else { # the base case - we are at a leaf, so return path! join '/', @$path; } }
Sadly, though there’s an inbuilt iterator each, there isn’t a flat list of keys/values, so we’ll have to define the function kv_list used above. The easiest way would be:
sub kv_list { my $hashref = shift; map { [$_ => $hashref->{$_}] } keys %$hashref; }
Though we could also create an “iterator_to_list” function that acts on each.
And now, the Haskell version: let’s start with creating the tree. Instead of a hash, we’ll use Data.Map which is similar, but has an implementation better suited to purely functional usage. Of course, we can’t use the same technique as the Perl version: we don’t have destructive mutation, and if we did a fold, it would merely return the final leaf node of each path. So we start with a simpler recursive definition. I say “simpler”, but I needed a lot of help with this. Luckily #haskell came to the rescue: rwbarton pointed out that I’d need a newtype to do a recursive map, Heffalump improved it with unNode, sjanssen and quicksilver helped with a missing node constructors etc. So eventually my naive version of the code looked like:
import qualified Data.Map as M import Text.Regex newtype Node = Node { unNode :: M.Map String Node } deriving Show dive :: Node -> [String] -> Node dive n [] = n dive n (s:ss) = let v = M.lookup s (unNode n) :: Maybe Node n' = case v of Nothing -> Node M.empty Just v' -> v' :: Node in Node $ M.insert s (dive n' ss) (unNode n) splitpath = splitRegex $ mkRegex "/" make_tree ps = foldl dive (Node M.empty) $ map splitpath ps
This is quite clumsy, and could be improved by replacing the lookup/insertion with a single insertWith. The Data.Map API is quite large and you need to play with it a bit to get your head around it! And of course I got a better version, from sjanssen:
import qualified Data.Map as M import Text.Regex import Data.Monoid newtype Node = Node { unNode :: M.Map String Node } instance Monoid Node where mempty = Node M.empty mappend (Node x) (Node y) = Node $ M.unionWith mappend x y splitPath = splitRegex $ mkRegex "/" nodeFromPath = foldr (\x n -> Node $ M.singleton x n) mempty . splitPath nodeFromPaths :: [FilePath] -> Node nodeFromPaths = mconcat . map nodeFromPath
There are a number of clever things here! First of all, he returns the root node using a fold, which I was dubious about being able to do. That’s because instead of diving from the root, he starts from the right (foldr) and constructs the leaf node, then inserts that as the value of the node above, all the way up to the top. Very cute. Then note that he’s using the Data.Map API elegantly: notice how he uses M.singleton where I would have naively done a fromList of a single element, for example. Also, instead of having to descend each tree, updating, he simply creates a set of single path trees, then merges them together at the end with M.unionWith. Finally, it’s using Monoids, which (as well as being a classic Doctor Who monster) is a fancy way of saying “they behave like appendable things” (more or less). In fact you could write the snippet above without, but it does give us the convenient mempty and mconcat functions.
So, which version do you prefer? I mentioned to sjanssen that I thought the Haskell one had more “concepts”, but actually both versions use folds and some sort of mapping data structure. One is destructive, the other pure.
Of course, as he pointed out, not knowing Perl, my version looked incomprehensible.
But if you didn’t know how to write the sick fold above in Perl, you could have easily written a simple recursive version. Whereas the Haskell version does require some knowledge, like the use of recursive newtypes (which I found very confusing — and hard to compile/debug) and as I’ve mentioned, the Data.Map API is large, as befits its power.
I’m too tired to attempt the leaves function in Haskell — please do comment if you give it a go!