Crossword puzzles in Haskell

Every year or so I come back to the problem of writing a crossword puzzle
compiler/player. I think Javascript would be the most promising for a
web-based player, though I’ve given it a go in Java and Perl too.
Modeling the problem is interesting in an Object Oriented language – I would
find myself getting bogged down with “Lines” and the similarities between
rows (Across) and columns (Down). I have a suspicion that OO Roles might be
a more expressive way to model this. Anyway, given that I’ve not been writing
much about Haskell 1, this is a good time to redress the balance.

In the OO implementations, Cells would refer to the “Light” (group of adjacent
cells running Across/Down) that they’re in. And the Light would of course refer
to the cells… this idea filled me with terror in Haskell, as it involves
“Tying the Knot”, which seems terribly clever and confusing. As it happens, you
can often get away with having mutual references in Functional Programming, as
they just spontaneously turn out not to be necessary. So far, this seems to be
the case, though I think that if I take it to the point of navigating the grid
from a UI, I may need an explicit structure to manage this, like a zipper.

This is a “literate Haskell” post. You should be able to save it as
“crossword.lhs” and run it with runghc or ghci (calling
the main function to test it).

We start off with some imports: the List and Char modules
(which I’ve naughtily not specified) and the handy join
function to flatten a list.

> import Data.List
> import Data.Char
> import Control.Monad (join)

They say that much of the work in Haskell is defining the types.
Direction is an Enum for the direction of a Light.
Cell is either a Block (a black square) or a Cell (either blank,
or already filled in). I guess I could model Block | Blank | Cell
but don’t yet see an advantage to that.

> data Direction = Across | Down
>     deriving (Show, Eq, Ord)
> data Cell = Cell  (Maybe Char) Coord
>           | Block              Coord
>     deriving (Show, Eq)
> type Coord  = (Int, Int)
> coord (Cell _ c) = c
> coord (Block  c) = c

Directions can be sorted (which may come in useful for showing Across
clues before Down ones), and this can be done automatically by deriving
Eq and Ord. But how would we sort a Cell? We’ll do it by the (x,y)
coordinates, which means I can’t use automatic derivation. Perhaps
I could swap the order of arguments and still use that, but for now I
defined a custom compare.

> instance Ord Cell
>     where compare l r = compare (coord l) (coord r)

Lights (distinct from Clues, which may be spread over 1 or more
Lights) are a list of cells in a given direction. (It would be nice to
specify that the cells really are contiguous, not sure if this is something
that fundeps would be useful for?)

> data Light     = Light [Cell] Direction
>     deriving (Show, Eq, Ord)
> -- we'll sometimes want to know the first cell in a Light:
> headC (Light (c:_) _) = c

Of course Lights are numbered… but with the algorithm that I’m using, we
don’t know the number (like 5 Across) at the time we create it. I created
a new type, LightN, but perhaps I should have modeled with a
Maybe Int instead.

> data LightN    = LightN Int Light
>     deriving Show

Now we need some test data. I started with a grid from the wonderful
Crossword No. 24298

> grid = textToCells [ 
>     "TRIPOD#        ",
>     "# # # # # # # #",
>     "# # # # # # # #",
>     "###            ",
>     "# # # # # ### #",
>     "ANNA###        ",
>     "# # # # # # # #",
>     "        ###PARE",
>     "# ### # #U# # #",
>     "         S  ###",
>     "# # # # #E# # #",
>     "      #INFERNAL",
>     "# # # # #U# # #",
>     "FRAGMENT#L     "
>     ]

We’re going to want to output the grid, lights, and other objects, so let’s
define some functions to do that.

> showCell (Block _)         = '#'
> showCell (Cell (Just c) _) = c
> showCell _                 = ' '

> showLightN :: LightN -> String
> showLightN (LightN n l@(Light cs d)) =
>        show n ++ " " 
>     ++ show d 
>  -- ++ " " ++ show (coord $ headC l) 
>     ++ ": "
>     ++ show (map showCell cs)
>     ++ " (" ++ (show $ length cs) ++ ")"

Similarly, we want to parse the list of strings above into a list of
crossword cells. textToCells threads the row and column
number with every character in the grid by zipping the list with the
infinite list [1..], which I think is quite cute, though
there are no doubt more elegant versions (list comprehensions?)

> charToCell :: Char -> Coord -> Cell
> charToCell '#' = Block
> charToCell ' ' = Cell Nothing
> charToCell  c
>     | isAlpha c = Cell (Just c)
>     | otherwise = error $ "Invalid character '" ++ [c] ++ "'"
> textToCells :: [[Char]] -> [[Cell]]
> textToCells                     = zipWith  makeRow       [1..] 
>     where makeRow  row          = zipWith (makeCell row) [1..] 
>           makeCell row col char = charToCell char (row,col)

But working out what cells are is the easy part! We now want to
know which cells form a light — i.e. groups of more than 1 non-block
cell in either direction Across/Down. To get data for both directions,
it’s easiest to run in two passes, one in the normal direction, the
other transposed. (I did consider trying to do both at the
same time, but it hurt my brain: a one pass solution involving magical fumplicative
arroids or somesuch is left as an exercise to the very clever reader).

> lights dir grid = concatMap 
>                     (flip lightsInLine dir) 
>                     $ rot dir grid
>     where rot Across = id
>           rot Down   = transpose
> lightsInLine :: [Cell] -> Direction -> [Light]
> lightsInLine cells dir = 
>     let l  = filter isMultiCell 
>                $ groupBy areCells cells
>     in  map (\c -> Light c dir) l
> areCells x y = isCell x && isCell y 
> isCell (Cell _ _) = True
> isCell  _         = False
> isMultiCell (x:y:_) | areCells x y = True
> isMultiCell _ = False

So… where have we got with all of this modeling? Well, we can now
find all the Across and Down lights. But then we’ll want to number
them. To do that, we’d have to sort them (by the coordinate). Across and
Down lights can have the same number (like 5 Across and 5 Down in our
example grid) so we want to group by lights that have the same head cell.
Then we can thread the light number again, using the zipWith ... [1..]

> allLights = join $ zipWith (map . LightN) [1..] gs
>         where gs = groupBy eqHead ls
>               ls = sort $ (lights Across grid)
>                        ++ (lights Down   grid)
>               eqHead l r = (headC l) == (headC r)

And finally, we can see the result of all the hard work, with a list of
all the lights, and their current (partial) solutions:

> main = mapM_ putStrLn $ map showLightN allLights

Obviously this is only a start on the problem. For modeling, we now need a
concept of a Clue (Clue String [Light]) and a solution – should the
solution belong to the clue? or to the [Light]s that it’s made up of.
How do we link the answer grid (where the lights contain the correct characters)
with the play grid, which contains the current letters that the player believes
to be right? And how do we update the cells, lights, and grid while playing
(or creating) a crossword?

Suggestions on these questions, and improvements or advice on the current code
are greatly appreciated!

  1. I had a complaint about this from a Planet Haskell reader: and though
    FAQ does suggest
    that it’s ok, or even encouraged to write about other things, perhaps
    I should also write a little about Haskell… ;-)