Countdown words game solver in Haskell

Will on #geekup has been working on a
Countdown
letters and numbers game
solver
written in Python. I thought it'd be fun to try to do it in Haskell,
and started with the letters game (anagram) solver.

Starting with a string of jumbled letters, the goal is to make the longest possible
anagram. I remember the first time I tried to solve anagrams I jumped into the
problem without thinking and got mixed up in all kinds of complicated combinatorial
mess. The actual answer is very simple: let's take two words which are anagrams of
each other:

  • monad
  • nomad

Both of them contain the same letters, so they are identical in some form of "canonical
representation", for example

  • {a:1, d:1, m:1, n:1, o:1} — dictionary mapping letter to number of times used
  • "admno" — a string with the letters sorted

So we just need to consider all the subsets of the original jumbled letters in turn, and
compare them against a map of:: canonical representation -> [list of words].

So for example:

pmqnrdzoa
...
 m n d oa
...

This function is called a powerset. I'm lazy so I googled
a definition.
We want the longest words first. The definition of powerset I found does a
depth first search so it's not in order of length. What we want to do
is to work on a list like this

list s =
sortBy (flip $ comparing length) -- longest first
. nub -- unique entries only
. powerset -- all combinations of
. canonicalize -- canonical (sorted) string
$ s

where canonicalize is just sort . map toLower . filter isLetter.

comparing is a nice litle utility sub that makes the above effectively the
same as (\a b -> length a `compare` length b). We then flip it to reverse the
ordering (and this is actually a good use for flip ;-).

Ordering by length is potentially inefficient — it checks the length of each
element twice, and unlike Perl (where a string knows its own length), a string is
just a list, so it has to descend the list to find it out. This is easy to optimize
by precalculating the lengths, using a technique that in Perl we call the "Schwartzian
transform", and I'll probably come back to this.

OK, so we have a list of subsets to compare, now we need to find a dictionary of
canonical representations of words. Luckily most unixy distributions ship with
one, often /usr/dict/words, but Ubuntu sticks is elsewhere.

I asked on #haskell, and was told I should use a Data.Map, Haskell's
basic equivalent of a hash or associative array, but implemented using a
Functional Programming friendly tree representation. In actual fact,
quicksilver, mrs, and mmorrow told me the answer straight away, but let's
pretend for the purpose of this post that we're working it out now :-)

Assuming I load that
module, as is common, as M, I'd essentially want to call
M.insertWith (++) on each element. The (++) is the
concatenation operator, and it's the right thing to use because the
dictionary is mapping String -> [String], for example

fromList [("admno", ["nomad","monad","Damon"]),...]

insertWith returns a new copy of the Map each time. It's like an accumulator
which gradually takes on the entries from the list of words. And whenever we think
about accumulators, we can think about folds.

foldl' (\m x -> M.insertWith (++) (canonicalize x) [x] m) mempty listOfWords

mempty is shorthand here for "an empty Data.Map". But we can go one better as
apparently fold/insertWith is so common that there is a shorthand, fromListWith!

fromListWith (++) . map (canonicalize &&& return)

Woah! That's quite compact, and I just introduced some new syntax too: The
&&& is basically saying "let's make a tuple with
the result of calling these 2 functions on my input!" so it's the same as

fromListWith (++) . map (\a -> (canonicalize a, return a))

And return just means "wrap this value in the appropriate Monad". So it's
a scary way of saying [a], because we're "in" the List monad. (In the same way
that mempty above was an empty Data.Map, because it was "in" the Map Monad.)

Whenever I play with Map, I get angry errors about the monomorphism restriction.
The way around that is to add an explicit type signature. If, like me, you're not
quite sure what to put there, you can add a compiler directive to quell the error,
then work out what the signature would be by calling :t my_function from the
GHCI command line. (You'll often find afterwards that you can remove the signatures
if you wanted to, because later on the compiler has more information to work out the
types of things. It's only really during incremental development that you
get the problem.

{-# LANGUAGE NoMonomorphismRestriction #-}
-- (that's the compiler directive, you can comment this out later)

makeAnag s = do
d <- dict
return $ take 4 $ getAnagrams s d

dict = do file <- readFile "/etc/dictionaries-common/words"
return $ mkdict $ lines file

mkdict :: [String] -> M.Map String [String]
mkdict = M.fromListWith (++) . map (canonicalize &&& return) . filter longEnough

longEnough = (>=3) . length

As you can see, for all the perceived difficulty of doing IO in a pure language
like Haskell, it doesn't seem all that hard in this simple case. readFile
reads the file, and lines splits it into an array of lines.

The final thing is to check each powerset against the dictionary.
To extract the value, we use M.lookup. This function fails if it
can't find a value. So we could do

  • For each powerset in the list
  • Check if it's present
  • And add it to the list if so

Which of course we could do with the Maybe type and a filter.
But we want a list, and in the List type, failure is represented by an empty list
[]. So we can just map and we'd get something like:

[ ["anagram"], [], [], ["anagram 1", "anagram 2"], [] ]

With an empty list for each failure. We can use concatMap to join these together. So it's:

concatMap (\v -> M.lookup v dict) listOfPowersets

Though that actually returns:

[ ["anagram"], ["anagram 1", "anagram 2"], ]

which I hadn't expected. (M.lookup returned a list like ["anagram
1", "anagram 2"]
.
Quite literally it returned it, which in List context means it
actually passed [["anagram 1", "anagram 2"]], which is why the list isn't completely
flattened by concatMap. I get around this by using join. This is another of those
monadic functions: in List context it does exactly what we want here, flattening this list.

getAnagrams s d = join 
. concatMap (flip M.lookup $ d)
$ filter longEnough -- 3 or more letters
. sortBy (flip $ comparing length) -- longest first
. nub -- unique entries only
. powerset -- all combinations of
. canonicalize -- canonical (sorted) string
$ s

You can look at the final Haskell Countdown code. I'll look at optimizing the sort and the powersets soon, any comments on
other improvements (including better algorithms) very welcome.