More Countdown: laziness, Scheme, and German frogs

The Countdown code I showed you isn't really taking advantage of Haskell's laziness.
We should only have to check entries until we have enough matches (the
'take 4 $ getAnagrams' in the current code), and that part is fine. However, we
have to generate the whole powerset first, so that we can sort it in reverse order
of length. Ideally, we'd generate the powerset breadth-first, in order of length.
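To make that concrete, here's a minimal sketch of the problem, using subsequences
from Data.List as a stand-in for the powerset from the last post (so illustrative only):

import Data.List (sortBy, subsequences)
import Data.Ord (comparing)

-- sortBy has to look at every subset before it can hand back even the first
-- one, so a later 'take' can't short-circuit the powerset generation at all
longestFirst :: [a] -> [[a]]
longestFirst = sortBy (comparing (negate . length)) . subsequences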

OK, so generating the powerset doesn't take all that much time and this isn't a good
optimization as such, but I did think it might be fun to try writing a breadth-first
powerset, as all the examples I'd seen had been depth-first.

But first: an interlude to marvel at one of the scariest things I've seen this year. A
monadic definition of powerset.

import Control.Monad (filterM)
powerset = filterM (const [True, False])

I'm not even going to attempt to twist my head around that now, but it's very beautiful,
though it's impossible to tell just by reading it what the intent of the code is.
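A quick GHCi check at least confirms what it produces (and note that the subsets
don't come out in order of length):

> filterM (const [True, False]) "abc"
["abc","ab","ac","a","bc","b","c",""]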

I asked on #london.pm if anyone knew good breadth-first algorithms for powerset. Joel
helpfully pasted the following:

(define (combinations set n)
  (if (zero? n)
      (list '())
      (let ((n2 (- n 1)))
        (pair-fold-right
          (lambda (pr acc)
            (let ((first (car pr)))
              (append (map (cut cons first <>)
                           (combinations (cdr pr) n2))
                      acc)))
          '()
          set))))

(define (power-set set)
  (let ((size (length set)))
    (let loop ((i 0))
      (if (> i size)
          '()
          (append (combinations set i)
                  (loop (+ i 1)))))))

I think it surprised Joel (and actually, it surprised me a little) that I was more or less
unable to read this at all. Yes, I know that the syntax of Lisp is incredibly simple, and
you can learn all the syntax of Scheme in 20 minutes or whatever. The parentheses,
the cdrs and so on are just noise; you can filter them out if you look at it calmly.
But I still don't understand where the pairs are in pair-fold-right,
and cut apparently does something similar to currying, but what does that mean
in context?

To cut a long story short, I was reading this as a foreign language rather than a piece of
generic pseudocode. When I was very little, my mother and I read a picture book about
frogs, in German. With the help of the pictures and a little imagination, it was easy to
tell which word meant "frog", which meant "tree", and what the sense of the story was.
After we finished, very puffed up with just how damn clever I was, I started trying to
read it again, and got utterly confused about all these strange words like 'in'
and 'auf' and 'dem' that I just hadn't worried about the first time
around.

So… trying to see the frog for the trees, we can see that
combinations gives all the combinations of a particular length,
and that power-set merely loops through 0..length, appending the
combinations found for that length.

We can write combinations as a variant on the original powerset function, one which
refuses to carry on once it's got enough letters:

combinations xs n = comb' xs n (length xs)

-- comb' tracks the remaining length l of the list so it can stop early
comb' xss@(x:xs) n l
  | n == 0    = [[]]    -- we already have enough letters
  | l == n    = [xss]   -- we have to take everything that's left
  | otherwise = map (x:) (comb' xs (n-1) (l-1)) ++ comb' xs n (l-1)
comb' [] n _ = if n == 0 then [[]] else []   -- ran out of letters
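A quick sanity check in GHCi, just to see it does what it says:

> combinations "abcd" 2
["ab","ac","ad","bc","bd","cd"]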

And powerset is as easy as:

powerset xs = powerset' xs (length xs)
powerset' xs l = if l < minLength
                 then []
                 else (combinations xs l) ++ (powerset' xs (l-1))

minLength = 3
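
And a small GHCi check that, with minLength = 3, the subsets really do arrive
longest first, with the too-short ones never generated at all:

> powerset "abcd"
["abcd","abc","abd","acd","bcd"]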

We can now remove the lines with sortBy and filter longEnough,
as the new definition already presents the items in the right order.

Does this make it any faster? Apparently not: as I guessed, powerset is not the
hotspot. I suspect the problem is the repeated lookups in the Data.Map; any
suggestions on how to profile the Haskell code, and better algorithms to deal with it?
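
(One way to find out would be GHC's built-in profiler, along these lines; the exact
flags vary with the GHC version, and Countdown.hs is just a stand-in name here:)

$ ghc --make -prof -auto-all Countdown.hs
$ ./Countdown +RTS -p -RTS
$ less Countdown.prof     # time and allocation per cost centre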