Perl, Haskell, stuff
Will on #geekup has been working on a Countdown letters and numbers game solver written in Python. I thought it'd be fun to try to do it in Haskell, and started with the letters game (anagram) solver.
Starting with a string of jumbled letters, the goal is to make the longest possible anagram. I remember the first time I tried to solve anagrams I jumped into the problem without thinking and got mixed up in all kinds of complicated combinatorial mess. The actual answer is very simple: let's take two words which are anagrams of each other:
Both of them contain the same letters, so they are identical in some form of "canonical representation", for example
So for example:
This function is called a powerset. I'm lazy so I googled a definition. We want the longest words first. The definition of powerset I found does a depth first search so it's not in order of length. What we want to do is to work on a list like thispmqnrdzoa
...
m n d oa
...
where canonicalize is just sort . map toLower . filter isLetter.list s =
sortBy (flip $ comparing length) -- longest first
. nub -- unique entries only
. powerset -- all combinations of
. canonicalize -- canonical (sorted) string
$ s
comparing is a nice litle utility sub that makes the above effectively the same as (\a b -> length a `compare` length b). We then flip it to reverse the ordering (and this is actually a good use for flip ;-).
Ordering by length is potentially inefficient — it checks the length of each element twice, and unlike Perl (where a string knows its own length), a string is just a list, so it has to descend the list to find it out. This is easy to optimize by precalculating the lengths, using a technique that in Perl we call the "Schwartzian transform", and I'll probably come back to this.
OK, so we have a list of subsets to compare, now we need to find a dictionary of canonical representations of words. Luckily most unixy distributions ship with one, often /usr/dict/words, but Ubuntu sticks is elsewhere.
I asked on #haskell, and was told I should use a Data.Map, Haskell's basic equivalent of a hash or associative array, but implemented using a Functional Programming friendly tree representation. In actual fact, quicksilver, mrs, and mmorrow told me the answer straight away, but let's pretend for the purpose of this post that we're working it out now :-)
Assuming I load that module, as is common, as M, I'd essentially want to call M.insertWith (++) on each element. The (++) is the concatenation operator, and it's the right thing to use because the dictionary is mapping String -> [String], for example
insertWith returns a new copy of the Map each time. It's like an accumulator which gradually takes on the entries from the list of words. And whenever we think about accumulators, we can think about folds.fromList [("admno", ["nomad","monad","Damon"]),...]
mempty is shorthand here for "an empty Data.Map". But we can go one better as apparently fold/insertWith is so common that there is a shorthand, fromListWith!foldl' (\m x -> M.insertWith (++) (canonicalize x) [x] m) mempty listOfWords
Woah! That's quite compact, and I just introduced some new syntax too: The &&& is basically saying "let's make a tuple with the result of calling these 2 functions on my input!" so it's the same asfromListWith (++) . map (canonicalize &&& return)
And return just means "wrap this value in the appropriate Monad". So it's a scary way of saying [a], because we're "in" the List monad. (In the same way that mempty above was an empty Data.Map, because it was "in" the Map Monad.)fromListWith (++) . map (\a -> (canonicalize a, return a))
Whenever I play with Map, I get angry errors about the monomorphism restriction. The way around that is to add an explicit type signature. If, like me, you're not quite sure what to put there, you can add a compiler directive to quell the error, then work out what the signature would be by calling :t my_function from the GHCI command line. (You'll often find afterwards that you can remove the signatures if you wanted to, because later on the compiler has more information to work out the types of things. It's only really during incremental development that you get the problem.
{-# LANGUAGE NoMonomorphismRestriction #-}
-- (that's the compiler directive, you can comment this out later)
makeAnag s = do
d <- dict
return $ take 4 $ getAnagrams s d
dict = do file <- readFile "/etc/dictionaries-common/words"
return $ mkdict $ lines file
mkdict :: [String] -> M.Map String [String]
mkdict = M.fromListWith (++) . map (canonicalize &&& return) . filter longEnough
longEnough = (>=3) . length
As you can see, for all the perceived difficulty of doing IO in a pure language
like Haskell, it doesn't seem all that hard in this simple case. readFile
reads the file, and lines splits it into an array of lines.
The final thing is to check each powerset against the dictionary. To extract the value, we use M.lookup. This function fails if it can't find a value. So we could do
With an empty list for each failure. We can use concatMap to join these together. So it's:[ ["anagram"], [], [], ["anagram 1", "anagram 2"], [] ]
Though that actually returns:concatMap (\v -> M.lookup v dict) listOfPowersets
which I hadn't expected. (M.lookup returned a list like ["anagram 1", "anagram 2"]. Quite literally it returned it, which in List context means it actually passed [["anagram 1", "anagram 2"]], which is why the list isn't completely flattened by concatMap. I get around this by using join. This is another of those monadic functions: in List context it does exactly what we want here, flattening this list.[ ["anagram"], ["anagram 1", "anagram 2"], ]
You can look at the final Haskell Countdown code. I'll look at optimizing the sort and the powersets soon, any comments on other improvements (including better algorithms) very welcome.getAnagrams s d = join
. concatMap (flip M.lookup $ d)
$ filter longEnough -- 3 or more letters
. sortBy (flip $ comparing length) -- longest first
. nub -- unique entries only
. powerset -- all combinations of
. canonicalize -- canonical (sorted) string
$ s
Osfameron's blog on Haskell, Perl programming, stuff.
Leave a reply