Currying in Perl

“Currying” is a simple idea that is surprisingly powerful on the one hand,
and which was surprisingly hard (at least for me) to get my head around
- possibly because the concept didn’t exist natively in the languages I
learnt first.

When you declare a function in currying style, each argument is taken
one at a time, returning a new, more specialised function each time,
until the function is finally executed at the end.

Consider this function:

    sub add ($left, $right) {
        return $left + $right;
    }

(Note we’re assuming that Perl has argument lists… which it can do
with Devel::Declare, but we’ll come to that in a bit). If this was
currying style then we wouldn’t call it with

    my $answer = add (1, 3);  # 4

but instead

    my $answer = add(1) #  a function with $left bound to 1
                  ->(3) #  now executed with $right bound to 3

Implementation

This is actually quite simple to implement in pure vanilla Perl.

    sub add {
        my $left = shift;
        return sub {
            my $right = shift;
            return $left + $right;
        };
    }

That isn’t very pretty or convenient though… handily there are several
modules to encapsulate this behaviour on CPAN. One of them is my

Sub::Curried
. The docs mention some of the other modules with
similar functionality: mine, which uses the shiny goodness of

Devel::Declare
, has the advantage that you can declare a
curried subroutine with a simple, perlish syntax: in fact you do it more
or less exactly like the example I gave above. (The difference is that
we can’t override the sub keyword, so we create a new one,
curry‘:

    curry add ($left, $right) {
        return $left + $right;
    }

Using D::D, the moment the Perl parser sees a symbol ‘curry’ being
compiled, it hands control to our custom parser, which then injects code
into the source while it’s still being compiled. We can do some cunning
stuff, including telling it “Hey, when you get to the end of the scope
you’re compiling, inject some more text!”

I used to keep hold of an array @filled of the partially
applied arguments and then apply all in one go at the end. But it seems
to be more elegant to actually transform into sometihng like the
“vanilla” example I gave above. I say “something like” but it’s
actually little more complicated:

    sub add {
        return ROUTINE unless @_; # a reference to this subroutine
        check_args(2, @_);        # check we weren't called with >2 args
        my $f = sub {
            my $left = shift;
            return ROUTINE ...
            check_args...         # check we weren't called with >1 arg
            my $f = sub {
                my $right = shift;
                return $left + $right; # actually do the thing
                };
            $f = ...
            };
        # now call the subroutine for each 
        $f = $f->(shift) for @_;
        return $f;
    }

Yikes! The extra boilerplate is there to make sure that we get some
niceties:

  • Die with informative error message if we’re called with too many
    arguments.

  • Handle being called with multiple arguments: i.e. treat add(1,2)
    the same as add(1)->(2)

  • When called with zero arguments, the sub returned is logically (and
    cutely!) an alias. (I think the ROUTINE trick isn’t needed, will
    probably disappear with a restructure)

  • Handle functions that return multiple values smoothly (not shown
    above)

Uses of currying

OK, so we’ve done a lot of furious work behind the scenes to make
something look very simple while doing a whole lot of extra work. But
why? When you program in Haskell, you’ll find currying useful at every
turn: it’s a hunch that it’d be useful in Perl too. There are certainly
some cute examples of it in common use:

Currying the invocant

Modules that do setup on a class would often use class methods: for
example:

    package My::Class;
    use base 'Class::Accessor';
    __PACKAGE__->add_accessor('foo');
    __PACKAGE__->add_accessor('bar');
    __PACKAGE__->add_accessor('baz');

__PACKAGE__ refers to the current package, so this is the same as
writing:

    My::Class->add_accessor('foo');
    ...

It’s also utterly hideous. Moose on the other hand provides a syntax
like this:

    package My::Class;
    use Moose;
    has 'foo' => ...
    has 'bar' => ...
    has 'baz' => ...

What’s going on? Incredibly, ‘has’ is actually a class method just like
‘add_accessor’ was! But the leftmost argument (the invocant, usually
referred to as $self for object methods, or $class for
class methods) has been curried into it. This is because Perl’s
importing is dynamic and instead of just copying the method.

    *{CALLER::has} = \&has;

It can do somethingl like this:

    *{CALLER::has} = has($CALLER); # assuming a currying 'has'

Sections

In Haskell you can take references not just to functions, but to
operators.

    add = (+)  -- alias
    add 1 2    -- result is 3
    (+) 1 2    -- also 3

You can also take ‘sections’ of these operators, by ‘partially applying’
either the left or the right hand side.

    add2       = (+ 2)
    halve      = (/ 2)
    reciprocal = (1.0 /) 

This isn’t the same as currying, though you could implement sections
with curried functions:

    curry divide ($left, $right) {
        $left / $right
    }

    my $reciprocal = divide(1);         # 1    / $ARG
    my $halve      = flip(divide)->(2); # $ARG / 2

That’s rather ugly though, and remembering to flip the
arguments is annoying. So, again with Devel::Declare I
implemented
Sub::Section
(not yet on CPAN, the github repo is linked for now).

    my $add2         = op(+ 2);
    my $halve        = op(/ 2);
    my $contains_foo = op(=~ 'foo');

And of course:

    my $greet        = op("Hello " .);
    say $greet->('World');

Talk

I’ll be talking about Functional Perl at the
href="http://northwestengland.pm.org/meetings/004.html">NorthWestEngland
perlmongers tomorrow Tues 5th in Manchester, UK.

“Functional Pe(a)rls” at NorthWestEngland.pm, Manchester, 5th May

The nice chaps at NorthWestEngland PerlMongers have organized a technical meeting on 5th May, at the Manchester Digital Development Agency.

I’ll be doing a 3rd version of my Functional Pe(a)rls talk, about Haskell-inspired craziness in Pure Perl. And Matt Trout and Ian Norton will be talking about OO Database design, and Maildir migration, so there’s something for everyone.

Please come and say hello! Perl meetups are quite informal and friendly, and we’ll be meeting up for drinks at the Salisbury Pub afterwards (Perl talk strictly optional. Come for the beer, or to tell us how much you prefer C#, or that you’d rather be at PHPNW just down the road… d’oh!)

(rough) Grids in Haskell

(This isn’t a full blog post, but a note of a few things about implementing game grids in Haskell).

  • A [[Cell]] structure seems to make sense for a lot of boards. In fact, even the problems I’m looking at might be approached simply indexing into row then col each time you want to access a list. However it feels inelegant, and some of the things I want to do (looking at lines of game pieces in a given direction N/S/E/W) would be more elegant if I can simply traverse the grid in an arbitrary direction.
  • morrow remembered there had been a discussion on haskell-cafe about grids (+ +) recently. It seems to be mainly about infinite grids, but has lots of interesting stuff to absorb.
  • paolino remembered that comonads were useful in grid representation. I don’t understand what they are but google finds http://blog.sigfpe.com/2006/12/evaluating-cellular-automata-is.html which looks interesting.
  • I think I want a zipper, but didn’t know how it would work on [[Cell]]. Saizan clarified: “if Zipper [] a is the zipper for [a], then the one for [[a]] is Zipper [] (Zipper [] a), and depending on which layer you use the next/previous operations you move in one dimension or the other”. This isn’t quite true though, as when you move up/down by rows, you have to somehow remember the columnwise index.
  • (Saizan referred to fmap’ing operations to keep the columns in sync, which is, I think, a more higher-order way of doing the same thing :-)
  • dolio posted an ADT representation of a grid. I don’t understand that at all yet, but it is pretty… He also wrote a Traversable instance (this would appear to be a generic way to expose functions to move around in your arbitrary datastructure)
  • I decided to try to fold a [[Cell]] into a Cell { value=x, down=d, right=r }. This took me several hours last night, during which I cursed my feeble brain, and the fact that mutable references would have made the task trivial in other languages. (However, mutable references would make the grid useless for many useful things, like keeping intermediate copies of grid state). My no-doubt crappy code is on github (I’ll come back to it in more detail later)
  • While I’m whining about haskell being “too hard” I should mention that I’ve got bogged down in representing grids (for games or spreadsheet) in Java, Perl, and Javascript too, so I think it may be more a case of my brain being feeble. That or I’m just overcomplicating things…
  • (As I remember it, the OO representation of grids is simpler to get started with but then I get bogged down with issues of multiple inheritance etc.)

Comments and suggestions welcome, I should write a full post on this soon.

More longest paths, and sick folds.

This week’s simple longest path exercise seems to have had more mileage in it than I expected. Thanks to everyone’s comments and suggestions, I’ve updated with a number of times with, among other things, an improved Haskell version that acts on path elements instead of just characters.

But I had intended to do a version of this in Perl. I mentioned last time that the typical approach in this language would be nested hashes… which elicited the comment:

Ugh, one of the absolute worst things with perl: nonsensical handling of nested data structures. Every other modern language does this vastly better.

Em’s fightin’ words! So let’s try it in Perl and see how it looks…

First of all we need to create the multilevel hash. We want to end up with something like this:

        {
          '' => {
                  'a' => {
                           'a' => {}
                         },
                  'qux' => {
                             'wibble' => {}
                           },
                  'foo' => {
                             'bar' => {
                                        'baz' => {}
                                      }
                           },
                  'aa' => {}
                }
        };

The usual way to do this is to maintain an idea of the node that we’re currently in, and follow it to the next item of the path, changing the $node variable to the next level down. So, if we had a path /foo/bar/baz, we’d start off with an empty hash {}, create the next node { foo=> {} } and start the process again with the new empty hash {} and the remaining path bar/baz.

Remind you of anything? I thought it looked a little like a fold with an accumulator variable: the $node is the accumulator, which is initialised to {} and which gets set to each corresponding node in turn. The path is the list that we fold over. And it tickled me that we can write:

use List::Util 'reduce';
sub mk_hash {
    my %hash;
    reduce { $a->{$b} ||= {} }
           \%hash,
           split '/'
               for @_;
    return \%hash;
}

In polite company (well, in Perlish polite company at last), calling map in a void context is frowned upon, as it’s abusing a functional operator for its side effects. Here we are abusing reduce, making it destructive on a tree, and discarding its (useless) return value (the final leaf $node element). But it does work exactly as required: (the for @_ means that we run this sick fold on each path in turn) and returns a multilevel hash.

I’m aware that the example above might confirm the original commenter’s dislike of Perl hackishness. Personally I think it’s quite cute, mixing a beautiful functional idiom with pragmatic mutation. This may be brought on by my dabbling with Perl and FP programming languages, and is probably incurable. If you are either a) disgusted or b) confused by the Perl code above, you might like to try implementing the traditional algorithm (please let me know!) We’ll have a look at the equivalent in Haskell shortly.

There isn’t a leaves function built in for multilevel hashes (we do find some in the Tree:: and Graph:: namespaces, but let’s stick with the core datastructures for now), so let’s build one. This version is actually very similar to an FP language definition I think.

sub leaves {
    my ($node, $path) = @_;
    if (keys %$node) {
        # We still have to descend into all the leaves
        map {
              my ($k,$v)=@$_;
              leaves($v, [@$path, $k] );
            }
        kv_list $node
    } else {
        # the base case - we are at a leaf, so return path!
        join '/', @$path;
    }
}

Sadly, though there’s an inbuilt iterator each, there isn’t a flat list of keys/values, so we’ll have to define the function kv_list used above. The easiest way would be:

sub kv_list {
    my $hashref = shift;
    map { [$_ => $hashref->{$_}] } keys %$hashref;
}

Though we could also create an “iterator_to_list” function that acts on each.

And now, the Haskell version: let’s start with creating the tree. Instead of a hash, we’ll use Data.Map which is similar, but has an implementation better suited to purely functional usage. Of course, we can’t use the same technique as the Perl version: we don’t have destructive mutation, and if we did a fold, it would merely return the final leaf node of each path. So we start with a simpler recursive definition. I say “simpler”, but I needed a lot of help with this. Luckily #haskell came to the rescue: rwbarton pointed out that I’d need a newtype to do a recursive map, Heffalump improved it with unNode, sjanssen and quicksilver helped with a missing node constructors etc. So eventually my naive version of the code looked like:

import qualified Data.Map as M
import Text.Regex

newtype Node = Node { unNode :: M.Map String Node }
    deriving Show

dive :: Node -> [String] -> Node
dive n     [] = n
dive n (s:ss) = let v  = M.lookup s (unNode n) :: Maybe Node
                    n' = case v of
                            Nothing -> Node M.empty
                            Just v' -> v'
                            :: Node
                in Node $ M.insert s (dive n' ss) (unNode n)

splitpath = splitRegex $ mkRegex "/"

make_tree ps = foldl dive (Node M.empty) $ map splitpath ps

This is quite clumsy, and could be improved by replacing the lookup/insertion with a single insertWith. The Data.Map API is quite large and you need to play with it a bit to get your head around it! And of course I got a better version, from sjanssen:

import qualified Data.Map as M
import Text.Regex
import Data.Monoid

newtype Node = Node { unNode :: M.Map String Node }

instance Monoid Node where
    mempty = Node M.empty
    mappend (Node x) (Node y) = Node $ M.unionWith mappend x y

splitPath = splitRegex $ mkRegex "/"

nodeFromPath = foldr (\x n -> Node $ M.singleton x n) mempty . splitPath

nodeFromPaths :: [FilePath] -> Node
nodeFromPaths = mconcat  . map nodeFromPath

There are a number of clever things here! First of all, he returns the root node using a fold, which I was dubious about being able to do. That’s because instead of diving from the root, he starts from the right (foldr) and constructs the leaf node, then inserts that as the value of the node above, all the way up to the top. Very cute. Then note that he’s using the Data.Map API elegantly: notice how he uses M.singleton where I would have naively done a fromList of a single element, for example. Also, instead of having to descend each tree, updating, he simply creates a set of single path trees, then merges them together at the end with M.unionWith. Finally, it’s using Monoids, which (as well as being a classic Doctor Who monster) is a fancy way of saying “they behave like appendable things” (more or less). In fact you could write the snippet above without, but it does give us the convenient mempty and mconcat functions.

So, which version do you prefer? I mentioned to sjanssen that I thought the Haskell one had more “concepts”, but actually both versions use folds and some sort of mapping data structure. One is destructive, the other pure.
Of course, as he pointed out, not knowing Perl, my version looked incomprehensible.
But if you didn’t know how to write the sick fold above in Perl, you could have easily written a simple recursive version. Whereas the Haskell version does require some knowledge, like the use of recursive newtypes (which I found very confusing — and hard to compile/debug) and as I’ve mentioned, the Data.Map API is large, as befits its power.

I’m too tired to attempt the leaves function in Haskell — please do comment if you give it a go!

There’s the nub (snippet in Perl and Haskell)

Here’s a simple problem, with solutions in Perl and Haskell.

@joel: suppose I have a list of strings (they happen to be paths) -
       how might I find only the longest instance of each path?
@joel: that is give /foo /foo/bar /foo/bar/baz /qux I only want back
       /foo/bar/baz and /qux
@joel: do I need to build a trie myself? my cpanfu is failing me

As this was a Perl channel, Penfold suggested splitting on “/” into a multilevel hash, then
outputting the leaf nodes. Then pjcj suggested, performance permitting, to go through the
list deleting each entry that is a substring of the previous entry: that is, treating the
strings simply as strings of characters.

This makes it sound like you have to compare each string with the strings shorter than it,
but it turns out that you can get exactly the right behaviour on an asciibetically sorted
list. For example, a set of paths similar to the ones Joel quoted might get sorted as:

  • /foo
  • /foo/bar
  • /foo/bar/baz
  • /qux
  • /qux/wibble

You can see that this should reduce to

  • /foo/bar/baz
  • /qux/wibble

as the list collapses the previous element every time we realise that it is a substring
prefix of the following element.

We can do this in a single pass using a fold, which is a pattern that’s less common in
Perl than other idioms from functional programming like map and grep, but
which is implemented in List::Util‘s reduce. A typical example of reduce
might be this:

  sub sum {
    reduce { $a+$b } @_;
  }
  print sum (1,2,3); # = 6

Of course there the two elements $a and $b are the same kind of thing (a number) and the
reducing function ($a+$b) returns a number in turn. But for the longest paths problem,
we need something else: a list of results. This is a common idiom in Haskell: using an
“accumulator” as the initial value. However Perl’s reduce uses the first element
of the list as the initial value. That’s often a very good strategy often known as
foldl1, but for a long time I thought that the Perl implementation was weak as it
didn’t allow a more flexible arbitrary value. I was wrong. LeoNerd pointed out in #moose
that you can just pass the init value as the first element of the list! (This works in Perl
of course, as lists are untyped). For example, this subroutine would be a (rather silly)
way of turning a list into a list reference:

  sub as_list {
    reduce { [@$a,$b] } [], @_
  }

So, we’re going to call our function like this:

  my $longest_list = longest qw( /foo /qux/wibble /foo/bar/baz /qux /foo/bar );

And we can implement it like this:

  sub longest {
    reduce {
        my ($acc, $val) = ($a, $b); # 'accumulator' and 'value'
        my $last = pop @$acc || ''; # The "last" value is an empty string
                                    # the first time around

        # We return a list reference, which will be the accumulator 
        # the next time around
        [ @$acc,
          $val =~ /^\Q$last\E/ ? () : ($last), # collapsed, if substring
          $val ]
      }
      [],      # initial acumulator (empty list)
      sort @_; # sorted input list
  }

OK, so I promised myself I’d sketch this in Haskell too. It should be slightly simpler
as the hackish popping off the end of the list is replaced with a nice
pattern match.

  longest :: [String] -> [String]
  longest = foldl aux [] . sort
    where aux []         v = [v]
          aux xss@(x:xs) v = v :
                            (if x `isPrefixOf` v
                                then xs
                                else x:xs)

As often happens when converting an algorithm to Haskell, the result list is
backwards, as we’re consing the new result rather than appending to it. But in
fact, it’s not especially elegant: the conditional concatenation is a bit
clumsy for example. I asked for feedback on #haskell. And, as so often
happens on that channel, I got an altogether more compact version: augustss
suggested something like:

  longest2 = nubBy (flip isPrefixOf) . sortBy (flip compare)

nubBy is precisely the pattern that reduces a list based on whether adjacent elements
are to be collapsed or not. (Apparently you may need to reverse the sense of nubBy in
GHC 6.10 — I’m using 6.6.1 — which simplifies to nubBy isPrefixOf. Nothing
like backwards compatibility…)

Update: sortBy (flip compare) is dolio’s suggested improvement to the original reverse . sort. The original version does read better I think, but flipping the comparison is a bit more efficient than having to reverse the list after sorting it. It also starts to produce values lazily before finishing the sort.

Update: Pumpkin noted that a trie would be more efficient than nub… D’oh! nub, while it looks elegant and compact above is actually losing the benefit of us having sorted the list in the first place (it’s implemented using a set of chinese box filter functions – effectively the same as a linked list of comparisons. If we match the nub, this is very efficient — that’s the outermost box already! However if we’re not a prefix, then we’ll uselessly check if we’re a prefix of every other path, even though we couldn’t possibly be.

What we really need is precisely the semantics of Unix’s uniq, which expects a sorted list, and can therefore do a single pass, like our reduce version. Olathe pointed out a solution with map head . group which is almost what we want… except we need the -By version. Given a new function uniqBy, we just need to substitute it for nubBy and we’re suddenly efficient again!

  uniqBy eq = map head . groupBy eq
  longest3 = uniqBy (flip isPrefixOf) . sortBy (flip compare)

Update 2009-01-27: It occurred to me that instead of all the flipping, we could simplify uniqBy by making it take the last element instead of the head.

  longest4 = uniqBy' isPrefixOf . sort
  uniqBy' eq = map last . groupBy eq

Update: X pointed out that there’s a bug: ["/a", "/aa", "/aaa"] will all get smushed
together as they’re substrings. Actually whether this is a bug to the original specification
is debatable – Joel did mention “strings”, however he did also mention that they are file
paths, so let’s try to make them do the right thing:

In fact the basic core uniqBy' isPrefixOf . sort can remain exactly the same.
But instead of a single string "/foo/bar/baz" we’ll want to be passing
it a list of strings like ["foo","bar","baz"]. As Haskell’s list processing
is completely generic, all the building blocks like isPrefixOf and sort
will Just Work exactly as before! The only complication is splitting and joining the
strings, something that’s trivial in Perl. I
wrote about
splitting words in Haskell
previously, but in the mean time Brent Yorgey has created
Data.List.Split
which should do exactly what we want. However I can’t install it on my laptop’s ubuntu packaged
ghc 6.6.1, so for now I’ll fall back to Text.Regex.splitRegex. Then, to join the
path separator again we need to intersperse it, but that doesn’t flatten the list,
so we end up with something like ["foo", "/", "bar"]. Control.Monad‘s
join function, generic as it is, actually does the right thing here. So we end
up with:

  longest5 :: [String] -> [String]
  longest5 = map rejoin . uniqBy' isPrefixOf . sort . map split
      where split   = splitRegex $ mkRegex "/"
            rejoin  = join . intersperse "/"

We can test this against an input like [ "/a", "/aa", "/a/a" ] to check it’s all
correct.

The same approach should work in Perl, except that splitting/joining will be slightly simpler,
while the equivalent isPrefixOf logic is more complex, as it doesn’t have the same
generic list comparisons. I’ll leave that as an exercise to the reader :-)

Talking of which, thanks to Will for the Python/Erlang versions in the
comments, “Anon” for an Ocaml solution, and “Programmer” for a Perl challenge
I’ll come back to if time allows.

Update 2009-01-30: Will pointed out that my “improved” testcase isn’t. Better would be ["/a", "/aa", "/b", "/b/b"] to check that /a and /aa are distinct, but /b is merged into /b/b.

He also sent an improved Erlang version, thanks!

Crossword puzzles in Haskell

Every year or so I come back to the problem of writing a crossword puzzle
compiler/player. I think Javascript would be the most promising for a
web-based player, though I’ve given it a go in Java and Perl too.
Modeling the problem is interesting in an Object Oriented language – I would
find myself getting bogged down with “Lines” and the similarities between
rows (Across) and columns (Down). I have a suspicion that OO Roles might be
a more expressive way to model this. Anyway, given that I’ve not been writing
much about Haskell 1, this is a good time to redress the balance.

In the OO implementations, Cells would refer to the “Light” (group of adjacent
cells running Across/Down) that they’re in. And the Light would of course refer
to the cells… this idea filled me with terror in Haskell, as it involves
“Tying the Knot”, which seems terribly clever and confusing. As it happens, you
can often get away with having mutual references in Functional Programming, as
they just spontaneously turn out not to be necessary. So far, this seems to be
the case, though I think that if I take it to the point of navigating the grid
from a UI, I may need an explicit structure to manage this, like a zipper.

This is a “literate Haskell” post. You should be able to save it as
“crossword.lhs” and run it with runghc or ghci (calling
the main function to test it).

We start off with some imports: the List and Char modules
(which I’ve naughtily not specified) and the handy join
function to flatten a list.


> import Data.List
> import Data.Char
> import Control.Monad (join)

They say that much of the work in Haskell is defining the types.
Direction is an Enum for the direction of a Light.
Cell is either a Block (a black square) or a Cell (either blank,
or already filled in). I guess I could model Block | Blank | Cell
but don’t yet see an advantage to that.


> data Direction = Across | Down
>     deriving (Show, Eq, Ord)
> 
> data Cell = Cell  (Maybe Char) Coord
>           | Block              Coord
>     deriving (Show, Eq)
>
> type Coord  = (Int, Int)
>
> coord (Cell _ c) = c
> coord (Block  c) = c

Directions can be sorted (which may come in useful for showing Across
clues before Down ones), and this can be done automatically by deriving
Eq and Ord. But how would we sort a Cell? We’ll do it by the (x,y)
coordinates, which means I can’t use automatic derivation. Perhaps
I could swap the order of arguments and still use that, but for now I
defined a custom compare.


> instance Ord Cell
>     where compare l r = compare (coord l) (coord r)

Lights (distinct from Clues, which may be spread over 1 or more
Lights) are a list of cells in a given direction. (It would be nice to
specify that the cells really are contiguous, not sure if this is something
that fundeps would be useful for?)


> data Light     = Light [Cell] Direction
>     deriving (Show, Eq, Ord)
>
> -- we'll sometimes want to know the first cell in a Light:
> headC (Light (c:_) _) = c

Of course Lights are numbered… but with the algorithm that I’m using, we
don’t know the number (like 5 Across) at the time we create it. I created
a new type, LightN, but perhaps I should have modeled with a
Maybe Int instead.


> data LightN    = LightN Int Light
>     deriving Show

Now we need some test data. I started with a grid from the wonderful
Araucaria,
Cryptic
Crossword No. 24298
.


> grid = textToCells [ 
>     "TRIPOD#        ",
>     "# # # # # # # #",
>     "PARSIFAL#RESCUE",
>     "# # # # # # # #",
>     "###            ",
>     "# # # # # ### #",
>     "ANNA###        ",
>     "# # # # # # # #",
>     "        ###PARE",
>     "# ### # #U# # #",
>     "         S  ###",
>     "# # # # #E# # #",
>     "      #INFERNAL",
>     "# # # # #U# # #",
>     "FRAGMENT#L     "
>     ]

We’re going to want to output the grid, lights, and other objects, so let’s
define some functions to do that.


> showCell (Block _)         = '#'
> showCell (Cell (Just c) _) = c
> showCell _                 = ' '

> showLightN :: LightN -> String
> showLightN (LightN n l@(Light cs d)) =
>        show n ++ " " 
>     ++ show d 
>  -- ++ " " ++ show (coord $ headC l) 
>     ++ ": "
>     ++ show (map showCell cs)
>     ++ " (" ++ (show $ length cs) ++ ")"
>     

Similarly, we want to parse the list of strings above into a list of
crossword cells. textToCells threads the row and column
number with every character in the grid by zipping the list with the
infinite list [1..], which I think is quite cute, though
there are no doubt more elegant versions (list comprehensions?)


> charToCell :: Char -> Coord -> Cell
> charToCell '#' = Block
> charToCell ' ' = Cell Nothing
> charToCell  c
>     | isAlpha c = Cell (Just c)
>     | otherwise = error $ "Invalid character '" ++ [c] ++ "'"
> 
> textToCells :: [[Char]] -> [[Cell]]
> textToCells                     = zipWith  makeRow       [1..] 
>     where makeRow  row          = zipWith (makeCell row) [1..] 
>           makeCell row col char = charToCell char (row,col)

But working out what cells are is the easy part! We now want to
know which cells form a light — i.e. groups of more than 1 non-block
cell in either direction Across/Down. To get data for both directions,
it’s easiest to run in two passes, one in the normal direction, the
other transposed. (I did consider trying to do both at the
same time, but it hurt my brain: a one pass solution involving magical fumplicative
arroids or somesuch is left as an exercise to the very clever reader).


> lights dir grid = concatMap 
>                     (flip lightsInLine dir) 
>                     $ rot dir grid
>     where rot Across = id
>           rot Down   = transpose
>
> lightsInLine :: [Cell] -> Direction -> [Light]
> lightsInLine cells dir = 
>     let l  = filter isMultiCell 
>                $ groupBy areCells cells
>     in  map (\c -> Light c dir) l
>
> areCells x y = isCell x && isCell y 
> isCell (Cell _ _) = True
> isCell  _         = False
> 
> isMultiCell (x:y:_) | areCells x y = True
> isMultiCell _ = False

So… where have we got with all of this modeling? Well, we can now
find all the Across and Down lights. But then we’ll want to number
them. To do that, we’d have to sort them (by the coordinate). Across and
Down lights can have the same number (like 5 Across and 5 Down in our
example grid) so we want to group by lights that have the same head cell.
Then we can thread the light number again, using the zipWith ... [1..]
trick:


> allLights = join $ zipWith (map . LightN) [1..] gs
>         where gs = groupBy eqHead ls
>               ls = sort $ (lights Across grid)
>                        ++ (lights Down   grid)
>               eqHead l r = (headC l) == (headC r)

And finally, we can see the result of all the hard work, with a list of
all the lights, and their current (partial) solutions:


> main = mapM_ putStrLn $ map showLightN allLights

Obviously this is only a start on the problem. For modeling, we now need a
concept of a Clue (Clue String [Light]) and a solution – should the
solution belong to the clue? or to the [Light]s that it’s made up of.
How do we link the answer grid (where the lights contain the correct characters)
with the play grid, which contains the current letters that the player believes
to be right? And how do we update the cells, lights, and grid while playing
(or creating) a crossword?

Suggestions on these questions, and improvements or advice on the current code
are greatly appreciated!



  1. I had a complaint about this from a Planet Haskell reader: and though
    the
    FAQ does suggest
    that it’s ok, or even encouraged to write about other things, perhaps
    I should also write a little about Haskell… ;-)

    back

Pocket Perl (introduction in Italian): short review


Pocket Perl cover

First of all, a disclaimer: I know “larsen” (Stefano Rodighiero) not just as an
ex-colleague (he is one of the most respected senior programmers at DADA in
Italy), or through the Italian Perl
community
, but also as a good friend.

But of course, those were excellent reasons to look forward to his book: an
introduction to Perl in a pocket series of roughly the same format and price
point as the UK Teach Yourself
series. The community’s perl.it website lists
various resources including tutorials and books in English and Italian, but till now
they haven’t recommended an introductory book in Italian. I think that could be about
to change.

Modern Perl good practices

I’ve seen a lot of criticism of introductory books for propagating the kind of
cowboy Perl that gives the language a bad name. Right from the first chapter,
Larsen introduces:

  • CPAN for installing modules
  • make test
  • the Perl Community: where to find out more, get help, and improve your skills
  • strict and warnings
  • perldoc

He continues with this throughout: every chapter finishes with a list of
PODs and other resources to learn more. Variables are my‘d and code
is well formatted. Small sections going into good practices are dotted
strategically throughout the text, rather than consigned to an unread appendix:
for example, commenting and POD are detailed by page 55.

Teaching

The first half of the book, Chapters 2-4 cover basic string and array
operations, control structures (with foreach covered before C-style
for, references and subroutines, scope, context, subroutine
references, map/grep (with an example on Schwartzian
Transform) and exceptions.

The book also covers Perl 5.10 features like say (which it uses
throughout) and given/when.

This is obviously quite a whistlestop tour, but is clearly structured,
with concepts introduced as they are needed, and with explicatory
diagrams where helpful.

Chapters 5-6 introduce packages and OO programming. While it’s a short
overview, it covers the bases with inheritance, DESTROY, AUTOLOAD
(though is that a good thing? ;-), and some good practices like Module::Starter
and Class::Accessor.

Chapter 7 is a nice 25-page section on Regular Expressions with helpful diagrams,
examples and reference tables. Pleasingly, he finishes off by mentioning
Regexp::Common and cases not to use regexes for (HTML::TokeParser)

Finally, Chapter 8 takes us through Perl and the outside world. This is a mere
10 or so pages on file system tasks, and the same again split between CGI and
GUI programming. Given the importance of Perl for sysadmin glue and web
programming, this could be considered the book’s principal (only?) weakness.
On the other hand, trying to teach these disciplines in a short introduction
would make it a completely different book, and would have left out the
excellent introduction to the core matter. The interested reader who has
absorbed the contents of the book will have a much better base to learn the
next steps in whichever direction they need to go, so on balance, this is very
much the right decision.

But it doesn’t quite stop there. The appendix lists many main books like the
classic O’Reilly Camel Book

Conway’s
Object Oriented Perl,
Dominus’s
Higher Order Perl
etc.
It also gives a short guide to how to choose between the multiple modules
available on CPAN, and a brief overview of some useful modules and frameworks:
(Catalyst, TT etc.) Even if you disagree with some of the
featured choices (I did ;-) at least you know that the student has enough
pointers to learn better for themselves: as if to confirm that point, the book
ends with a 2-page reminder on the Perl community and the resources it offers.

Look and feel

Briefly, the typography is excellent, very clean and easy to read. The
diagrams are well laid out, and my only minor quibble are the little grey icons
which are unattractive and hard to distinguish from each other (they each have
a different symbol and apparently have distinct meanings, but I ended up
parsing them as “interesting note”). There are some oddities with punctuation
that lead me to suspect the editorial proofreading may not have been especially
thorough? Likewise, there are some minor errors in the diagrams on regexes,
which is a shame as those are otherwise very useful for a learner.

Conclusion

I think it’s excellent news for the state of Perl in Italy that this book has
been published. For a mere EUR 7.90 an Italian reader can now pick up a
compelling and thorough introduction to Perl in their native language.
Apparently it is in general release in Italian bookshops, otherwise you can
buy it online from one of the links on
the Pocket Perl site.

Functional Pe(a)rls v2 (now with Monads!) at the London Perl Workshop 2008

On Saturday I gave an updated version of my Functional Pe(a)rls talk. This time around I cut the whistlestop tour of builtin FP techniques in Perl (map/grep/join) and added a section on Monads – what they are and how to implement them. I’d originally worried that the slides might have been over-academic and hard to understand, but luckily Dave Cantrell had presented closures that morning, and lots of people had spoken about the cool and evil things you can do with Devel::Declare. Also the Monad talk did handwave over some minor details (like munit/join :-) and it helped to play a little for laughs – I should probably subtitle the talk

“Imperative programming… in Pure Perl!”

I think the talk went down well and I also won a book (of maps of Old London Town) from the nice people at Nestoria for “Best Topic”, which I guess means I can refer to it as an “award-winning” talk.

I’ve uploaded the slides of my award-winning talk on functional programming in Perl to slideshare ;-)

The conference

Mark Keating, the Shadowcat team, all the volunteers, sponsors, and Josette’s O’Reilly bookstall, and of course the speakers, delivered a fantastic event which was not only free, but even had free beer! (Courtesy of Venda and Shadowcat, we drank the pub out of Witchfinder ale within 30 minutes, but the free booze was flowing till around 10pm, which was incredibly generous and appreciated — less so the following morning).

It was great to see some quality talks, a uniformly excellent lineup of lightning talks (including one which broached Italy’s candidacy for YAPC::EU::2010), meet up with old friends and colleagues, and put some more names to nicks and faces.

Update 2009-02-06: Mark uploaded the videos from room 1 this week, and today they’re up on yapc.tv, including Functional Perls. Yay, mdk++, andy.sh++! The sound is better than many conference videos, so you can hear me um and er in high quality, but the slides aren’t included on the video, so you may wish to follow along on slideshare or download the slides.

London Perl Workshop tomorrow!

I’ll be at the London Perl Workshop tomorrow: it’s a free 1-day conference, and was lots of fun last time I attended. I’ll be talking on “Functional Pe(a)rls”, an extended and updated version of the talk I gave at the Italian workshop, and I’ll be discussing Sub::Curried and Monads in Perl. See you there ;-)