Will on #geekup has been working on a

Countdown

letters and numbers game

solver written in Python. I thought it'd be fun to try to do it in Haskell,

and started with the letters game (anagram) solver.

Starting with a string of jumbled letters, the goal is to make the longest possible

anagram. I remember the first time I tried to solve anagrams I jumped into the

problem without thinking and got mixed up in all kinds of complicated combinatorial

mess. The actual answer is very simple: let's take two words which are anagrams of

each other:

- monad
- nomad

Both of them contain the same letters, so they are identical in some form of "canonical

representation", for example

`{a:1, d:1, m:1, n:1, o:1}`— dictionary mapping letter to number of times used`"admno"`— a string with the letters sorted

So we just need to consider all the subsets of the original jumbled letters in turn, and

compare them against a map of:: `canonical representation -> [list of words]`.

So for example:

pmqnrdzoa

...

m n d oa

...

This function is called a *powerset*. I'm lazy so I googled

a definition.

We want the longest words first. The definition of powerset I found does a

*depth first search* so it's not in order of length. What we want to do

is to work on a list like this

list s =

sortBy (flip $ comparing length) -- longest first

. nub -- unique entries only

. powerset -- all combinations of

. canonicalize -- canonical (sorted) string

$ s

where *canonicalize* is just `sort . map toLower . filter isLetter`.

*comparing* is a nice litle utility sub that makes the above effectively the

same as (\a b -> length a `compare` length b). We then *flip* it to reverse the

ordering (and this is actually a good use for flip ;-).

Ordering by length is potentially inefficient â€” it checks the length of each

element twice, and unlike Perl (where a string knows its own length), a string is

just a list, so it has to descend the list to find it out. This is easy to optimize

by precalculating the lengths, using a technique that in Perl we call the "Schwartzian

transform", and I'll probably come back to this.

OK, so we have a list of subsets to compare, now we need to find a dictionary of

canonical representations of words. Luckily most unixy distributions ship with

one, often `/usr/dict/words`, but Ubuntu sticks is elsewhere.

I asked on #haskell, and was told I should use a `Data.Map`, Haskell's

basic equivalent of a hash or associative array, but implemented using a

Functional Programming friendly tree representation. In actual fact,

quicksilver, mrs, and mmorrow told me the answer straight away, but let's

pretend for the purpose of this post that we're working it out now :-)

Assuming I load that

module, as is common, as `M`, I'd essentially want to call

`M.insertWith (++)` on each element. The `(++)` is the

concatenation operator, and it's the right thing to use because the

dictionary is mapping `String -> [String]`, for example

fromList [("admno", ["nomad","monad","Damon"]),...]

*insertWith* returns a new copy of the Map each time. It's like an accumulator

which gradually takes on the entries from the list of words. And whenever we think

about accumulators, we can think about *fold*s.

foldl' (\m x -> M.insertWith (++) (canonicalize x) [x] m) mempty listOfWords

*mempty* is shorthand here for "an empty Data.Map". But we can go one better as

apparently fold/insertWith is so common that there is a shorthand, *fromListWith*!

fromListWith (++) . map (canonicalize &&& return)

Woah! That's quite compact, and I just introduced some new syntax too: The

`&&&` is basically saying "let's make a tuple with

the result of calling these 2 functions on my input!" so it's the same as

fromListWith (++) . map (\a -> (canonicalize a, return a))

And *return* just means "wrap this value in the appropriate Monad". So it's

a scary way of saying `[a]`, because we're "in" the List monad. (In the same way

that `mempty` above was an empty Data.Map, because it was "in" the Map Monad.)

Whenever I play with Map, I get angry errors about the monomorphism restriction.

The way around that is to add an explicit type signature. If, like me, you're not

quite sure what to put there, you can add a compiler directive to quell the error,

then work out what the signature would be by calling `:t my_function` from the

GHCI command line. (You'll often find afterwards that you can remove the signatures

if you wanted to, because later on the compiler has more information to work out the

types of things. It's only really *during* incremental development that you

get the problem.

{-# LANGUAGE NoMonomorphismRestriction #-}

-- (that's the compiler directive, you can comment this out later)

makeAnag s = do

d <- dict

return $ take 4 $ getAnagrams s d

dict = do file <- readFile "/etc/dictionaries-common/words"

return $ mkdict $ lines file

mkdict :: [String] -> M.Map String [String]

mkdict = M.fromListWith (++) . map (canonicalize &&& return) . filter longEnough

longEnough = (>=3) . length

As you can see, for all the perceived difficulty of doing IO in a pure language

like Haskell, it doesn't seem all that hard in this simple case. `readFile`

reads the file, and `lines` splits it into an array of lines.

The final thing is to check each powerset against the dictionary.

To extract the value, we use `M.lookup`. This function `fail`s if it

can't find a value. So we could do

- For each powerset in the list
- Check if it's present
- And add it to the list if so

Which of course we could do with the `Maybe` type and a `filter`.

But we want a list, and in the List type, failure is represented by an empty list

`[]`. So we can just map and we'd get something like:

[ ["anagram"], [], [], ["anagram 1", "anagram 2"], [] ]

With an empty list for each failure. We can use concatMap to join these together. So it's:

concatMap (\v -> M.lookup v dict) listOfPowersets

Though that actually returns:

[ ["anagram"], ["anagram 1", "anagram 2"], ]

which I hadn't expected. (`M.lookup` returned a list like `["anagram
1", "anagram 2"]`.

Quite literally it

`ed it, which in List context means it`

**return**actually passed

`[["anagram 1", "anagram 2"]]`, which is why the list isn't completely

flattened by concatMap. I get around this by using

`join`. This is another of those

monadic functions: in List context it does exactly what we want here, flattening this list.

getAnagrams s d = join

. concatMap (flip M.lookup $ d)

$ filter longEnough -- 3 or more letters

. sortBy (flip $ comparing length) -- longest first

. nub -- unique entries only

. powerset -- all combinations of

. canonicalize -- canonical (sorted) string

$ s

You can look at the final Haskell Countdown code. I'll look at optimizing the sort and the powersets soon, any comments on

other improvements (including better algorithms) very welcome.