Italian Perl Workshop 2008 looking tempting

1 Aug 2008 In: perl

(Cross posted from my use.perl blog)

The Italian Perlmongers are finalising their preparations for IPW 2008, Pisa. I managed to get to the last 2 while I was working in Florence, and the organizers have always managed to get a great venue, coffee breaks with unusually nice biscuits, and put on a fantastic mix of talks and a lively hallway track.

Like all the national conferences, talks in previous years have been largely in the local language, with maybe a handful in English from expats and visitors. This year though, the organizers have also managed to get great sponsorship from Opera, Booking.com and others and as well as making the conference free, they're able to sponsor some fantastic guest speakers. Bepi has already confirmed:

  • Marcus Ramberg of Catalyst fame (speaking in English. Or possibly Norwegian :-)
  • Rafael Garcia Suarez, Perl 5.10 pumpking (speaking in English... or French -- or Spanish? ;-)
  • ... other speakers to be advised (Update Oops, I pre-announced one of the speakers currently in discussion with Bepi, must have misunderstood the email, apologies to all, but I do hope that does get confirmed as it's tremendously exciting).
  • And looking at the talk schedule, it looks like Andrew Shitov (ash) from Moscow.pm and organizer of the Russian Perl Workshop will be talking on distributed programming with WWW::Page and Gearman, as part of his European tour (also in English)
All in all, there has never been a better time to go to an Italian Perl Workshop as a visitor, even if you don't speak Italian. It's all looking quite tempting, though I already have YAPC::EU the month before, hmmm...

Some notes on logistics if you're thinking of travelling to Pisa from outside Italy:

  • Pisa airport flies various low-cost routes (Ryanair, Easyjet and others) as well as some real airlines (I like the Meridiana Gatwick-Pisa flight). Some airlines fly to nearby Florence, Bologna, or Rome, and Pisa is a convenient transport hub with trains from these cities, and Europe (Paris, Vienna, Geneva). The conference page summarizes travel options to Pisa.
  • Some useful links for accomodation in Pisa
  • The most important tourist sites in Pisa are conveniently clumped together in the "Campo dei Miracoli": the leaning tower, the Duomo and the Baptistery.
  • Florence is about 1h30 away by train or coach. Damn I miss Florence...

CPAN updates

27 Jul 2008 In: perl
I recently posted some things to Perl's CPAN that I've previously discussed in this blog.

10,25,50… sequence fun

27 Jul 2008 In: perl
Debolaz asked in #moose about the best way to create:
a list of numbers like 10,25,50,100,250,500,1000,etc without tracking any other state than the number itself
Nope, this isn't A020179 but the much more prosaic A112024. Debolaz wanted this sequence, or something similar to it, for the same reason the US used it for currency - the numbers are human meaningful, and are useful as the numbers get exponentially bigger: in his case, for the labels for a chart. This may not be the most interesting of sequences, but, as autarch pointed out, there are many algorithms that work, and exploring some of them is fun and/or instructive.

First steps

My first idea was as follows:
  • if the number is a power of 10, multiply by 2.5
  • otherwise multiply by 2
Of course, knowing whether the number is a power of 10 is the only problem here. The most trivial way is to treat it as a string, and check if it begins with "1".
sub seq_1 {
    my $n = shift;
    my $val = 10;
    my @ret;
    for (1..$n) {
        push @ret, $val;
        $val *= $val =~ /^1/ ?
                2.5
              : 2;
    }
    return @ret;
}

say for seq_1(10);
autarch suggested dividing by 10 until you get to 1 (or not) as an alternative. So we were back to a state variable like the position of the element in the sequence.

Mappy

Rather than doing it recursively, you can calculate the value from each position trivially:
    map { 10 * 10**(int $_/3) * [1,2.5,5]->[$_ % 3] } 0..10
That is to say
    10    10    10    100    100    100
*    1   2.5     5      1    2.5      5
=   10    25    50    100    250    500

i.e
10 * 10 ^
     0     0     0      1      1      1
*
     1   2.5     5      1    2.5      5
Not much else to say really, though I think indexing into [1,2.5,5] with the modded value is cute. awwaid suggested an improvement:
    map { 10**(int $_ / 3 +2) * 2**($_ % 3 -2) } -1..9
Which hurt my head for a while: it comes from the realization that the sequence is actually
    10   100   100   100   1000   1000
/    1     4     2     1      4      2

Recursive

frodwith suggested a recursive solution with an explicit parameter:
 sub your_seq {
  my $n = shift;
  return 10 unless $n;
  my @prev = your_seq($n-1);
  return (@prev, $prev[-1] * ($n % 3 == 1 ? 2.5 : 2));
}
That has a rather odd recursion style: turning it into tail-recursive style, and then eliminating the recursion using the techniques from Higher Order Perl are left as an exercise to the reader.

Is Perl an acceptable Haskell?

Finally though, it is possible to do this without any additional state, by turning the function into a little state machine. On the first iteration, the value is multiplied by 2.5 and sets the next iteration to be one that multiplies by 2, leaving the next iteration to multiply by 2 but queue up the original (x2.5) function next. Instead of actually writing a state machine, I decided I'd rather have an infinite cycle of the 3 functions, and iterate through them. Haskell has this as a builtin, but we have to define it in Perl
sub cycle {
    my @list = @_;
    my @curr = @list;
    return sub {
        @curr = @list unless @curr;
        return shift @curr;
        };
}
Now we can create a list of functions that multiply by 2.5, 2, 2:
cycle([ map { times($_) } @multipliers ])
Where "times" is a function that given one multiplier, returns a function that multiplies by that number. That is to say, it's the * operator "curried". And by great coincidence, I just uploaded Sub::Curried to CPAN, so we'll use that:
curry times ($x,$y) { $x * $y }
We'll use the "curry" declarator for the rest of this example, largely because of the automatic argument unpacking magic (I love Matt Trout's Devel::Declare). Now that we have a sequence, we need to write something similar to Haskell's scanr, that repeatedly applies a transformation, returning a list of the intermediate values.
curry scan_iterator ($it, $start) {
    my $next = $start;
    return sub {
        my $ret = $next;
        $next = $it->()->($next); # prepare next value;
        return $ret;
        };
};
And as Perl doesn't like infinite calculations, we may as well transform a slice of this sequences into an array:
curry iterator_to_array ($it, $count) {
    return map { $it->() } 1..$count;
};
At which point we can put the whole thing together like so:
say for iterator_to_array(
        scan_iterator(
            mk_seq( [2.5, 2, 2] )
        )->(10) # start value
    )->(12); # number of iterations
I've uploaded the final source code as an example for Sub::Curried. Elegant? Overcomplicated? You decide!

Update:

  • Joeri Samson pointed out a thinko (multiple/power)
  • augustuss suggests the haskell version: scanl (*) 10 $ cycle [2.5,2,2] Much neater than my faffy version. But we can do the same in Perl (we just need to define scanl and rename iterator_to_array to "take". Link to complete source code is updated.
    say for take 12 => scanl(times)->(10 => cycle [2.5, 2, 2] );
    Notice how (times) with no arguments is equivalent to a function reference, due to the autocurrying!

More Perl hate, and what to do about it: AUTOLOAD

3 Jul 2008 In: perl

A couple of interesting comments to my Five things I hate about Perl. Quicksilver pointed out that I'd missed the difference between arrays and lists. Yup, that's pretty subtle, and of course it's bitten me a few times (though oddly enough, I don't have that strong a dislike of it). Luqui on the other hand pointed out that Perl is so much more fun if you do take the plunge and use language-mutating modules, irrespective of the fact that they're not standard. I do agree, though it can sometimes be difficult getting to use these modules for work, as everyone else has to buy in to the benefits and (perceived or real) risks (learning curve, stability, speed, etc.).

So here's an example of something that I'd forgotten I hated until I banged my head against it in work's codebase: AUTOLOAD.

AUTOLOAD like other languages' "method-missing" feature, allows you to dynamically respond to a method invocation without having explicitly defined the method. So, for example, someone might call get_name on an object, and the accessor for the 'name' field will be automatically be generated.

Actually, that sounds pretty cool, no? The problems are really in the implementation.

  • AUTOLOAD subs are ugly. You have to do something like
    use vars '$AUTOLOAD';
    sub AUTOLOAD {
        my $sub_name = $AUTOLOAD=~/(\w+)$/;
        ...
    }
  • Autoloading a sub is slower than normal dispatch. This isn't a big deal of course in Perl, as you can hack the symbol table. The first time the sub is called, you generate a coderef to deal with the thing, then register that sub. On subsequent invocations, AUTOLOAD will never get called.
  • ...Which does lead to the point that in very many cases you can just pre-generate a whole lot of methods during late-compilation/early runtime.
  • If AUTOLOAD doesn't handle a method, then what? It's all too easy to forget to handle this case, and suddenly you have method calls that do nothing silently, an insidious and subtle bug.
  • And if it didn't handle a method, but the next class in the inheritance chain could have... tough — it'll never get a chance to deal with it.
  • You can't use 'can' to find out whether your object provides the method or not.

Yuck. Of course, someone has already dealt with most of the problem in a module: Ben Tilly's Class::AutoloadCAN. Using this module, you just declare a CAN sub, which may or may not return a coderef (the subroutine that will deal with the problem). Behind the scenes, overriding of the standard 'can' meta-method and an auto-generate AUTOLOAD will automagically do the right thing. Yay!

The syntax isn't that pleasant though. If we want to handle multiple autoloaded subs, we have to stick them all into this CAN routine. I'd much rather do something like:

sub foo :handles(/^foo_(.*)/) {
    print "Fooing $1";
}
sub bar :handles(/^bar_(.*)/) {
    print "Bar'ing $1";
} 
So I thought, let's use Attribute::Handlers, which lets us define these handlers. But I don't really like Perl's attributes feature, and to be honest, I'm not keen on the redunancy of naming a sub and then the regex that it handles.

I'd rather have some syntax like:

autosub (get_(.*)) {
    ...
}
and of course that suggests Matt Trout's beautiful Devel::Declare.

And it's surprisingly simple* to knock up a prototype AutoSub module which implements this. From the accompanying test.pl:

autosub (^take_(.*)) {
    my($what, @args) = @_;
    return "Took a $what";
};
autosub (^get_([^_]+)_(.*)) {
    my($adj, $noun, @args) = @_;
    return "Got a $adj $noun";
};
autosub (^do_the_.*$) {
    my ($what, @args) = @_;
    return join "," => $what, @args;
};

* Where "surprisingly simple" glosses over a couple of points, noted here briefly for reference:

  1. Devel::Declare isn't all that well documented yet. Basically, install_declarators takes 2 subs:
    1. add code at the beginning of the sub, e.g. for argument unpacking. We don't use this, so we just return an empty string.
    2. install the subroutine reference. Other examples install this into the symbol table. We just add the sub into our list of @CANS.
  2. Yay for closures! Closing over @CANS means we don't have to mess about with the name of the variable.
  3. Yes, I don't really know what I'm doing with sub exporting. I think Ricardo Signes's Sub::Exporter would do the right thing, certainly with the CAN sub. But I'm not sure where to hang the Devel::Declare logic with it, I'll investigate.
  4. Oh, and there are certainly better ways of exporting the goodness of Class::AutoloadCAN into caller() than eval "package $package; use ..." Update: mst pointed me at export_to_level, which is a standard flag handled by Exporter and other modules. But C::AC has a custom import... The other trick he suggests works: goto &Class::AutoloadCAN::import at the end of my custom import sub.
  5. If you call a regex in list context, it returns a list of all the captures found. Unless there weren't any captures, in which case it returns a list containing (1). This sucks, as I want to return the captured text in that case. (I guess I'd prefer it if it returned () but true, but that doesn't exist in Perl...) So right now, I'm checking if $1 was defined, and using the despised $& if not. Suggestions?

Hello world

1 Jul 2008 In: Uncategorized

I've enjoyed using Vox.com over the last couple of months but I'm finally taking the plunge and moving the blog here to greenokapi.net/blog.

Though I've occasionally wanted more flexibility, the trade-off with a hosted service is that you waste no time tinkering, the only thing you can do is write. But the fact that Vox doesn't, at time of writing, handle comments gracefully reduces the value of the blog -- I've had some great comments and suggestions from readers, and I'd say that the feedback is one of the most useful effects of having a blog at all. Vox's insistence on signing up in order to comment is just a barrier to feedback, but I got tired of comments like "Hey, where did my last comment go?" because the default formatter apparently doesn't like code.

Of course, Wordpress has its hatefulness too. For a code-oriented blog you need the Text Control plugin, which is the only way to prevent it from moronizing newlines and making quotes "smart". (By default, Wordpress even turns ascii smileys into horrid yellow gifs, though this at least can be switched off easily).

Five things I hate about Perl

23 Jun 2008 In: perl

Enough advocacy, let's get to the nitty gritty of 5 things I hate about Perl. (After brian d foy.)

  1. The difference between list and scalar context is subtle. One bug that bites people too often is this (for some value of "people", including me, far too often):

    sub myfunc { return }
    my $scalar = myfunc(); # contains undef

    my %hash = (
    key => 'value',
    key => myfunc(),
    );
    In scalar context, we get undef. But in list context, myfunc returns a list of zero elements. Guess which context hash declarations are in...

    Luckily in this case we'll get a "Odd number of elements in hash declaration" warning. Perl's warnings are mostly very useful and surprisingly helpful. However:

  2. Some warnings suck. Yes, some of them almost always point out an error (the void context error is useful: I usually find it means I've written: my $x => 1 instead of my $x = 1) but some are more irritating.

    When was the last time an 'uninitialized' warning had any effect on your code apart from making you have to sprinkle $x ||= '' throughout your code? Yet I usually restrain myself from adding no warnings 'uninitialized' because maybe one time in a hundred there's a genuine bug I need to know about.

    Similarly 'once' warnings sound useful, but in practise are usually because you've referred to a magic global that's used plenty of times in the library you want to use. For example, from List::Util's documentation,

    my $sum = reduce { $a + $b } @list;
    will warn about $main::a and $main::b being used only once. $a and $b are the canonical magical variables, wtf is that about? (mst suggests that these warnings are inhibited for sort only, which was their first use case).
  3. Argument passing. Yes, it's really flexible to be able to unpack @_ in any way you want but I'd like a proper way of doing it. (And yes, there are all kinds of solutions including Devel::Declare, but none are standard yet). Oh, and we don't have full destructuring bind, so yes, you can do ($x,$y) = ($y,$x) and my ($self, %args)=@_, but not my ({foo=>$foo}, [undef,undef,$bar]) = @_.

  4. It's a big language, with lots of syntax, several powerful minilanguages, a vast array of standard idioms, a large set of standard libs (including various incompatible ways of doing similar tasks) and a truly staggering number of 3rd party libs, CPAN, with even more ways of doing it. Actually, learning a big language is fine, but the bigness is one of the things making Perl difficult to parse. The other is the sheer amount of flexibility that Perl gives you to shoot yourself in the foot, change the way the language is parsed etc. Only Perl can parse Perl, and usually only after executing part of your code first. Yay for exploits on your language-aware text editor!

  5. Functional programming isn't easy or elegant. Yes, we have first class functions, but things like argument unpacking being ugly make it less convenient. Little kludges like the functions passed to map and grep taking $_ rather than the usual argument list @_ just add to the fun ($_ seems like a special-cased points-free hack, and it's far less consistent or common than Haskell's currying)

    I also dislike that regex substitution etc. can't be done as a non-destructive function.

Hmmm, there are probably more things, but those are the main ones. In particular I don't hate references (yes, they take a little learning, and I'm aware that's a stumbling block for most learners, but they are quite sensible once you do) or OO (baroque, but quite capable, and see Moose for a cute modern take on it).

The Countdown code I showed you isn't really taking advantage of Haskell's laziness. We should only have to check entries up until the point that we have enough matches (in the current code 'take 4 $ getAnagrams') and that's good. However, we have to generate the whole powerset first, so that we can sort it in reverse order of length. Ideally, we'd generate the powerset breadth-first, in order of length.

OK, so generating the powerset doesn't take all that much time and this isn't a good optimization as such, but I did think it might be fun trying to do the breadth first powerset, as all the examples I'd seen had been depth first.

But first: an interlude to marvel at one of the scariest things I've seen this year. A monadic definition of powerset.

import Control.Monad (filterM)
powerset = filterM (const [True, False])

I'm not even going to attempt to twist my head around that now, but it's very beautiful, though it's impossible to tell just by reading it what the intent of the code is.

I asked on #london.pm if anyone knew good breadth-first algorithms for powerset. Joel helpfully pasted the following

(define (combinations set n)
(if (zero? n)
(list '())
(let ((n2 (- n 1)))
(pair-fold-right
(lambda (pr acc)
(let ((first (car pr)))
(append (map (cut cons first <>)
(combinations (cdr pr) n2))
acc)))
'()
set))))

(define (power-set set)
(let ((size (length set)))
(let loop ((i 0))
(if (> i size)
'()
(append (combinations set i)
(loop (+ i 1)))))))
I think it surprised Joel (and actually, it surprised me a little) that I was more or less unable to read this at all. Yes, I know that the syntax of Lisp is incredibly simple, and you can learn all the syntax of Scheme in 20 minutes or whatever. The noise of the parenthesis, the cdrs etc. is just noise, you can filter it out if you look at it calmly. But I still don't understand where the pairs are in pair-fold-right, and cut apparently does something similar to currying, but what does that mean in context?

To cut a long story short, I was reading this as a foreign language rather than a piece of generic pseudocode. When I was very little, I read with my mother a picture book about frogs, in German. With the help of the pictures and a little imagination, it was easy to tell which word meant "frog", which meant "tree", and what the sense of the story was. After we finished, very puffed up with just how damn clever I was, I started trying to read it again, and got utterly confused about all these strange words like 'in' and 'auf' and 'dem' that I just hadn't worried about the first time around.

So... trying to see the frog for the trees, we can see that combinations gives every set of combinations of a particular length, and that power-set merely loops through 0..length, appending the combinations found for that length.

We can write combinations as a variant on the original powerset function, but which refuses to carry on once it's got enough letters:

combinations xs n = comb' xs n (length xs)

comb' xss@(x:xs) n l | n == 0 = [[]]
| l == n = [xss]
| otherwise = (map (x:) $ comb' xs (n-1) (l-1)) ++ (comb' xs n (l-1))
comb' [] _ _ = [[]]

And powerset is easy as:

powerset xs = powerset' xs (length xs)
powerset' xs l = if l < minLength
then []
else (combinations xs l) ++ (powerset' xs (l-1))

minLength = 3

We can now remove out the lines with sortBy and filter longEnough, as the new definition already presents the items in the right order.

Does this make it any faster? Apparently not: as I guessed, powerset is not the hotspot. I guess that the problem is the repeated lookups in the Data.Map — any suggestions on how to profile the Haskell code, and better algorithms to deal with it?

Schwartzian transform in Haskell

18 Jun 2008 In: perl

In the last post, I mentioned that we might be able to improve the performance of our sort using a "http://en.wikipedia.org/wiki/Schwartzian_transform".

This basically involves precaching the expensive calculation (in this case, length), sorting by the cached value, then taking just the original values.

Let's test with a list of words:

words "If on a winter's night a traveller"

So instead of the original sortBy (flip $ comparing length), we'd have something like:

  map fst 
. sortBy (flip $ comparing snd)
. map (id &&& length)

Let's read it from the bottom. First of all we create a list of tuples using the rather wonderful &&& operator from Control.Arrow.

[("If",2),("on",2),("a",1),("winter's",8),("night",5),("a",1),("traveller",9)]

Then we sort comparing the new second field of this tuple.

[("traveller",9),("winter's",8),("night",5),("If",2),("on",2),("a",1),("a",1)]

Finally we map again, getting just the first element of the tuple.

["traveller","winter's","night","If","on","a","a"]

And we can easily abstract this with a new function

sortST cmp f = map fst
. sortBy (cmp snd)
. map (id &&& f)

Now we can write:

sortST       comparing  length listOfWords
sortST (flip.comparing) length listOfWords

This is very similar to the sortBy syntax, except that we've separated out the "comparing" from the "length" clause, in order to compose the two separately for the new transformation.

Will on #geekup has been working on a Countdown letters and numbers game solver written in Python. I thought it'd be fun to try to do it in Haskell, and started with the letters game (anagram) solver.

Starting with a string of jumbled letters, the goal is to make the longest possible anagram. I remember the first time I tried to solve anagrams I jumped into the problem without thinking and got mixed up in all kinds of complicated combinatorial mess. The actual answer is very simple: let's take two words which are anagrams of each other:

  • monad
  • nomad

Both of them contain the same letters, so they are identical in some form of "canonical representation", for example

  • {a:1, d:1, m:1, n:1, o:1} -- dictionary mapping letter to number of times used
  • "admno" -- a string with the letters sorted
So we just need to consider all the subsets of the original jumbled letters in turn, and compare them against a map of:: canonical representation -> [list of words].

So for example:

pmqnrdzoa
...
 m n d oa
...
This function is called a powerset. I'm lazy so I googled a definition. We want the longest words first. The definition of powerset I found does a depth first search so it's not in order of length. What we want to do is to work on a list like this
list s =
sortBy (flip $ comparing length) -- longest first
. nub -- unique entries only
. powerset -- all combinations of
. canonicalize -- canonical (sorted) string
$ s
where canonicalize is just sort . map toLower . filter isLetter.

comparing is a nice litle utility sub that makes the above effectively the same as (\a b -> length a `compare` length b). We then flip it to reverse the ordering (and this is actually a good use for flip ;-).

Ordering by length is potentially inefficient — it checks the length of each element twice, and unlike Perl (where a string knows its own length), a string is just a list, so it has to descend the list to find it out. This is easy to optimize by precalculating the lengths, using a technique that in Perl we call the "Schwartzian transform", and I'll probably come back to this.

OK, so we have a list of subsets to compare, now we need to find a dictionary of canonical representations of words. Luckily most unixy distributions ship with one, often /usr/dict/words, but Ubuntu sticks is elsewhere.

I asked on #haskell, and was told I should use a Data.Map, Haskell's basic equivalent of a hash or associative array, but implemented using a Functional Programming friendly tree representation. In actual fact, quicksilver, mrs, and mmorrow told me the answer straight away, but let's pretend for the purpose of this post that we're working it out now :-)

Assuming I load that module, as is common, as M, I'd essentially want to call M.insertWith (++) on each element. The (++) is the concatenation operator, and it's the right thing to use because the dictionary is mapping String -> [String], for example

fromList [("admno", ["nomad","monad","Damon"]),...]

insertWith returns a new copy of the Map each time. It's like an accumulator which gradually takes on the entries from the list of words. And whenever we think about accumulators, we can think about folds.
foldl' (\m x -> M.insertWith (++) (canonicalize x) [x] m) mempty listOfWords
mempty is shorthand here for "an empty Data.Map". But we can go one better as apparently fold/insertWith is so common that there is a shorthand, fromListWith!
fromListWith (++) . map (canonicalize &&& return)
Woah! That's quite compact, and I just introduced some new syntax too: The &&& is basically saying "let's make a tuple with the result of calling these 2 functions on my input!" so it's the same as
fromListWith (++) . map (\a -> (canonicalize a, return a))
And return just means "wrap this value in the appropriate Monad". So it's a scary way of saying [a], because we're "in" the List monad. (In the same way that mempty above was an empty Data.Map, because it was "in" the Map Monad.)

Whenever I play with Map, I get angry errors about the monomorphism restriction. The way around that is to add an explicit type signature. If, like me, you're not quite sure what to put there, you can add a compiler directive to quell the error, then work out what the signature would be by calling :t my_function from the GHCI command line. (You'll often find afterwards that you can remove the signatures if you wanted to, because later on the compiler has more information to work out the types of things. It's only really during incremental development that you get the problem.

{-# LANGUAGE NoMonomorphismRestriction #-}
-- (that's the compiler directive, you can comment this out later)

makeAnag s = do
d <- dict
return $ take 4 $ getAnagrams s d

dict = do file <- readFile "/etc/dictionaries-common/words"
return $ mkdict $ lines file

mkdict :: [String] -> M.Map String [String]
mkdict = M.fromListWith (++) . map (canonicalize &&& return) . filter longEnough

longEnough = (>=3) . length
As you can see, for all the perceived difficulty of doing IO in a pure language like Haskell, it doesn't seem all that hard in this simple case. readFile reads the file, and lines splits it into an array of lines.

The final thing is to check each powerset against the dictionary. To extract the value, we use M.lookup. This function fails if it can't find a value. So we could do

  • For each powerset in the list
  • Check if it's present
  • And add it to the list if so
Which of course we could do with the Maybe type and a filter. But we want a list, and in the List type, failure is represented by an empty list []. So we can just map and we'd get something like:
[ ["anagram"], [], [], ["anagram 1", "anagram 2"], [] ]
With an empty list for each failure. We can use concatMap to join these together. So it's:
concatMap (\v -> M.lookup v dict) listOfPowersets
Though that actually returns:
[ ["anagram"], ["anagram 1", "anagram 2"], ]
which I hadn't expected. (M.lookup returned a list like ["anagram 1", "anagram 2"]. Quite literally it returned it, which in List context means it actually passed [["anagram 1", "anagram 2"]], which is why the list isn't completely flattened by concatMap. I get around this by using join. This is another of those monadic functions: in List context it does exactly what we want here, flattening this list.
getAnagrams s d = join 
. concatMap (flip M.lookup $ d)
$ filter longEnough -- 3 or more letters
. sortBy (flip $ comparing length) -- longest first
. nub -- unique entries only
. powerset -- all combinations of
. canonicalize -- canonical (sorted) string
$ s
You can look at the final Haskell Countdown code. I'll look at optimizing the sort and the powersets soon, any comments on other improvements (including better algorithms) very welcome.

Monads in Perl (take 1)

13 Jun 2008 In: perl

I've been away for a while from Haskell so I thought I should do some revision and really get my head around Monads. While I plodded through the wonderful "meet the monads" tutorial, I decided that the best way to learn would be to do. By implementing Monads in Perl. I'd highly recommend trying to implement monads in Your Favourite Language, if it supports lambdas. Perl has already been done by Greg Buchholz and rather nicely too, but there's no Monad library on CPAN so I thought it would be worth a try.

First of all, the question of how to model "types" is easily resolved. We bless each monad into the Monad class or a subclass. These can then have methods for bind and return etc.

Now I do like the haskell >> and by a stroke of good fortune, Perl allows us to overload that symbol too.

use overload '>>'   => 'Bind';

I use the string 'Bind' rather than the reference \&Bind, so that the subclasses can easily override it.

Some default bind methods in Monad.pm and Monad::Maybe etc., available here and we have some simple examples like this one (in test.pl):

my $result = 
(Writer 2) >>
L { my $x = shift; (Writer $x*2, "Doubled. ") >>
L { my $y = shift; (Writer $y+1, "Plus 1. ") >>
L { my $z = shift; (Writer $z*3, "Tripled $z. ")
}}};

Woot! OK, that's not entirely beautiful, but it's been slightly improved by the overloading of >>.

The L lambda generator is also there for readability. It's basically defined as

sub L (&) { shift }

i.e. it's an identity function, but it's an L (like lambda) and to my mind, lined up on the left, it looks pleasingly like "and then".

Nests

This didn't just fall straight out of the text editor into fully working code, of course. A blow-by-blow account of me getting confused wouldn't be especially interesting, but one big "aha" moment is worth pointing out. I realised that I was thinking of monads as being a chain of lambdas, each one passing control to the next, like OO chaining:

Chain of lambdas?

But that doesn't work, as of course then the $x, $y, $z of each scope would be separate, whereas in fact, in "later" sections, you can refer to $x too. This implies that the model is more like a nest of lambdas:

Nest of lambdas

This is made fairly clear in the Perl above, with its delimited braces, if you look at where the closing "}" are, and which opening "{" they match up with.

This is an interesting mind shift, and one that I still haven't really fully grasped, as I'll demonstrate a bit later.

Polymorphic functions on monads

In Haskell, you can call "return" in a monadic block to "lift" a value to the appropriate monad. Similarly, you can call "fail", and the function will fail in the right way (returning Nothing in a Maybe, throwing an error in IO). This is a function call, not a method, so how does it know which monad to behave as?

Of course Haskell does this with its strong inferencing typechecker. The compiler "knows" that we are in Maybe, so "fail" will be fail :: Maybe.

Perl on the other hand doesn't have a strong type-inferencing compiler... Right now I'm doing some shonky magic with caller() that works in this very simple test case (and I believe only in this test case). I think I could just simplify things and set a dynamic variable "$Monad::current_monad" on the first occurrence of Bind. Yeah, global variables, yuck. The final alternative that occurs to me would be to run the whole thing in a Reader monad which just passes the name of the monad... but I'm fairly sure that's slightly insane.

So what can it do right now?

The test script shows the current capabilities. As of r246, I have Writer, Maybe, and List implemented (the Monad superclass is effectively Identity).

I think Maybe is very useful - with some wrapper functions that raise Perl functions to monadic ones using a variety of strategies (fail on undef/0/die etc.) it could be a useful addition to the toolbox, simplifying a nested set of if checks.

The List monad already does list comprehensions, albeit with a rather yucky syntax. Which is of course the big problem, 'cos Perl programmers (and this statement may surprise non Perl programmers :-) are often obsessive about syntax.

Making it look pretty

OK, so we already added a bit of sugar with the >> overloading, and the L function for lambda generators, but it's still rather ugly with the mix of Perlish argument unpacking (my $x = shift), scope delimiters (}}}) etc.

Source filters!

The original Perl monad tutorial used a source filter to give a monadic Do notation. It's a fairly nice one as they go, but I don't really want to treat my program as a string if I can help it, so let's look at some other techniques first!

Devel::Declare

Matt Trout has been working on some crazy parsing magic in Devel::Declare. This isn't a source filter, but (I think) hooks into Perl's parser to change the way that subroutine declarations are parsed. It'd designed to give us parameter unpacking, so that we could substitute:

L {my $x = shift; .... }

with:

L ($x) { .... }

In the current version this doesn't work (you can define L like that easily, but the overloaded >> evidences a minor parsing bug (you'd have to put the expression between parentheses to get the precedence right, which loses the syntactic advantage we gain).

Still, hopefully will be fixed in a future release.

Generators

"Valued Lessons" has a beautiful post on Monads in Python (with nice syntax!). The parenthesis is not hyperbole: the post describes a monadic do block which looks about as pretty as Haskell's, but which works in a different way. We spell 'bind' (Haskell's <-) as 'yield'. So a control sub calls the 'do' block, gets out monadic values one by one as they are yielded back, and deals with the nitty gritty of binding them to the rest of the generator.

It took quite a while to understand the Python code: in fact I'm not sure I understand it fully, I really don't buy into the "Python is so easy to read" meme, and certainly the "@whatever" syntax, which seems to be 'decorators' that modify the subroutine that follows them, are rather confusing at first. But it's quite impressive, and it took me a while to replicate in Perl.

First hurdle: Perl doesn't have generators. OK, that shouldn't be an issue, I thought, because we have the CPAN. And yes, I found Brock Wilcox's Coro::Generator.

This doesn't quite do what I want though. The yield only works one way, so

my $x = yield (Monad 3);

doesn't actually bind $x to 3. I asked Brock on IRC, and apparently this behaviour is desired (I'm not quite sure why) so I forked his code to play with it :-) Also, the coroutine restarts immediately it finishes, which is inconvenient. Brock suggested yielding undef at the end, which is fine, I can do that from the control sub. (The Python version deals with finishing by throwing an exception, so perhaps it has the same semantics?)

After a lot of ugly pain, I finally got this working, and we can now do:

my $result = Do {
my $x = yield (Just 3);
my $y = yield (Nothing);
my $z = yield (Just 5);
warn "x=$x, y=$y, z=$z";
Just 6;

Why the pain? Failing to understand coroutines while trying to use them to implement monads (which I understand only very slightly) was a bad start. I found myself using the Do function to repeatedly take a value from the generator and bind it with the next value (rather than letting the monadic bind deal with those details). And even when I'd realised that the sub that I needed to bind was a lambda that would abstract the details of invoking the coroutine, I still ended up flailing around more or less at random till I finally got it working.

The current code is ugly (declared inline in test.pl rather than modularized) but the result is pleasantly magical and readable.

Props of course to Python for having powerful techniques like yield and decorators in core!

Hold the champagne

Of course the final test example, in the List monad doesn't work. Why? The List monad's bind strategy is to call the function on every element of the list, so the coroutine will get called repeatedly. And every time it's called, the execution pointer will move on.

I wonder whether the Python version has the same problem? I looked again at the Coro modules on CPAN, and noted that they are advertised as being able to implement "(non-clonable) continuations". I think this is the problem: I want to be able to take the point at which the next Bind will be called, and call exactly that same point multiple times (for the List monad). I asked various people including Brock again, and Scott Walters (the authors of Continuity, a continuation-based web application framework in Perl) and got the answer that Perl really doesn't do proper continuations. (As far as I understood it, they're more or less practically impossible, due to the way Perl models its execution context).

So, unless I've misunderstood (and please let me know if I have!) this technique is limited to monads that only call the bound function once (e.g. most of them except List). That's a shame though, as the List comprehension semantics would be lovely to express in a monadic do block.

Meta continuations

The Valued Lesson post does implement continuations monadically... Could we do that and then implement monadic do using these monadic continuations? I think the answer might be "Yes but my brain would explode trying to implement it".

Plan B

I think that the most sensible method may be to take the contents of the monadic do block and use the B:: modules to convert them from what looks like

my $x = bind ...;

to

... >> sub { my $x = shift;

. Which is pretty much the approach of Greg Buchholz's source filter. But I think a parse tree transformation may be more elegant. (This said, I don't know the Perl source or understand the opcodes, so it may just be slightly crazy).

Update: Some discussion on reddit, as Vox still doesn't support OpenID

About this blog

Osfameron's blog.


Categories