The Countdown code I showed you isn't really taking advantage of Haskell's laziness. We should only have to check entries up until the point that we have enough matches (in the current code 'take 4 $ getAnagrams') and that's good. However, we have to generate the whole powerset first, so that we can sort it in reverse order of length. Ideally, we'd generate the powerset breadth-first, in order of length.

OK, so generating the powerset doesn't take all that much time and this isn't a good optimization as such, but I did think it might be fun trying to do the breadth first powerset, as all the examples I'd seen had been depth first.

But first: an interlude to marvel at one of the scariest things I've seen this year. A monadic definition of powerset.

import Control.Monad (filterM)
powerset = filterM (const [True, False])

I'm not even going to attempt to twist my head around that now, but it's very beautiful, though it's impossible to tell just by reading it what the intent of the code is.

I asked on #london.pm if anyone knew good breadth-first algorithms for powerset. Joel helpfully pasted the following

(define (combinations set n)
(if (zero? n)
(list '())
(let ((n2 (- n 1)))
(pair-fold-right
(lambda (pr acc)
(let ((first (car pr)))
(append (map (cut cons first <>)
(combinations (cdr pr) n2))
acc)))
'()
set))))

(define (power-set set)
(let ((size (length set)))
(let loop ((i 0))
(if (> i size)
'()
(append (combinations set i)
(loop (+ i 1)))))))
I think it surprised Joel (and actually, it surprised me a little) that I was more or less unable to read this at all. Yes, I know that the syntax of Lisp is incredibly simple, and you can learn all the syntax of Scheme in 20 minutes or whatever. The noise of the parenthesis, the cdrs etc. is just noise, you can filter it out if you look at it calmly. But I still don't understand where the pairs are in pair-fold-right, and cut apparently does something similar to currying, but what does that mean in context?

To cut a long story short, I was reading this as a foreign language rather than a piece of generic pseudocode. When I was very little, I read with my mother a picture book about frogs, in German. With the help of the pictures and a little imagination, it was easy to tell which word meant "frog", which meant "tree", and what the sense of the story was. After we finished, very puffed up with just how damn clever I was, I started trying to read it again, and got utterly confused about all these strange words like 'in' and 'auf' and 'dem' that I just hadn't worried about the first time around.

So... trying to see the frog for the trees, we can see that combinations gives every set of combinations of a particular length, and that power-set merely loops through 0..length, appending the combinations found for that length.

We can write combinations as a variant on the original powerset function, but which refuses to carry on once it's got enough letters:

combinations xs n = comb' xs n (length xs)

comb' xss@(x:xs) n l | n == 0 = [[]]
| l == n = [xss]
| otherwise = (map (x:) $ comb' xs (n-1) (l-1)) ++ (comb' xs n (l-1))
comb' [] _ _ = [[]]

And powerset is easy as:

powerset xs = powerset' xs (length xs)
powerset' xs l = if l < minLength
then []
else (combinations xs l) ++ (powerset' xs (l-1))

minLength = 3

We can now remove out the lines with sortBy and filter longEnough, as the new definition already presents the items in the right order.

Does this make it any faster? Apparently not: as I guessed, powerset is not the hotspot. I guess that the problem is the repeated lookups in the Data.Map — any suggestions on how to profile the Haskell code, and better algorithms to deal with it?