Perl, Haskell, stuff
One of the advantages of demonstrating your ignorance in public is that you may receive useful corrections... thanks to everyone who replied on these recent posts, I found the comments very instructive, and thought it was worth writing up as a new post.
ddarius got in touch to mention that I might want to use “strict fields”. This might be an issue if I'm incrementing, say, turn, but not actually using the value. I'd end up building up a “thunk” (an unevaluated expression) like 1+1+1+1+1+1+1, which will get evaluated later (and if too much later, it could cause some problems like stack overflow). Actually, I don't think this will happen in this particular case (I'll be printing the turn count every time) but it's not hard to implement (just need to put a “!” before the strict fields)
> data GameState = GameState {
> turn :: !Integer,
> score :: Integer,
> location :: Location
> } deriving Show
Also, as nominolo suggested, in conjunction with -funbox-strict-fields, it can open up some possible optimizations.
Now this is an interesting one. I was whining in my last post about Data.Map.lookup
but which monad is it in, and more to the point, why?
As you might imagine, I really wasn't getting it... and the code I wrote around it rather reflects that...
Vincenz, Rich Neswold, and “rm” all pointed out in rapid succession that the function I'd created for parseMap was completely redundant. Here, for comparison, is my first version.
> parseMap m s | M.member s m = do v <- M.lookup s m
> Just v
> | otherwise = Nothing
I wrote this because, from the ghci command line, it looked like M.lookup threw an error if it couldn't find the key. The suggestion, which is rather briefer is as follows:
> parseMap = flip M.lookup
The flip is only there because parseMap and Data.Map.lookup take their arguments in opposite orders. Otherwise lookup and parseMap are identical!
But how does this return a “Just” or a “Nothing” appropriately? Apparently, on success, it returns a Monad of the appropriate type by default. If on the other hand it doesn't work, it will fail.
The IO monad maps a fail to an error (which is why I saw the exception I mentioned in the post!) But Maybe will map it to Nothing.
So from the ghci command line, we can create a small test Data.Map and run some lookups against it “in” various monads.
Prelude Data.Map> let m = Data.Map.fromList ("one", 1)
-- success
Prelude Data.Map> Data.Map.lookup "one" m :: Maybe String
Just "uno"
Prelude Data.Map> Data.Map.lookup "one" m :: [String]
["uno"]
Prelude Data.Map> Data.Map.lookup "one" m :: IO String
"uno"
-- fail
Prelude Data.Map> Data.Map.lookup "two" m :: Maybe String
Nothing
Prelude Data.Map> Data.Map.lookup "two" m :: [String]
[]
Prelude Data.Map> Data.Map.lookup "two" m :: IO String
*** Exception: user error (Data.Map.lookup: Key not found)
ddarius gave a name to this technique, “Not Just Maybe”. That is, if you were going to write a function that returns Maybe “Just 1“ and Maybe “Nothing”, then you might as well just write it as a generic monad. This will then be usable within Maybe, as planned, but also in IO and List too.
This sparked an interesting discussion about “Common Idioms” in Haskell. Apparently the pages that used to exist on this topic haven't yet been migrated to the new wiki. But there are some notes, for example on this snapshot of the NotJustMaybe page.
ddarius also suggested I could rewrite parseInt similarly
> import Control.Monad
>
> parseInt :: MonadPlus m => String -> m Int
> parseInt s | all isDigit s = return $ read s
> | otherwise = mzero
Using MonadPlus, 1) requires the Control.Monad import. 2) seems to require the type signature. 3) it looks like IO doesn't have an mzero, so you can't now type
*Main> parseInt "4"
<interactive>:1:0:
Ambiguous type variable `m' in the constraint:
`MonadPlus m'
at the command line and have it Do The Right Thing. I'd read that fail is considered bad style (for some reason), but it seems to be rather more convenient on these 3 counts at least:
> -- type will be inferred if omitted
> parseInt :: Monad m => String -> m Int
> parseInt s | all isDigit s = return $ read s
> | otherwise = fail "not an int"
This time around, we're going to look at how we'll turn user input into commands in Monad Wars.
I think that the easiest option to implement will also be very convenient to play with: a command line where we issue commands like:
$ buy 4 foo
$ sell 20 bar
$ jet bronx
or with abbreviations
$ b 4 f
and where it's unambiguous, collapse spaces:
$ b4f
We'll start by tokenising the command line string. This is almost as easy as using the Prelude function words. The only complication is that we want to tokenise alternate letters and numbers separately, like the “b4f” example above.
We'll use groupBy from Data.List, and isLetter from Data.Char.
> import Data.List
> import Data.Char
>
> tokenizeLine :: [Char] -> [[Char]]
> tokenizeLine = concatMap
> (groupBy ((==) `on` isLetter))
> . words
Annoyingly, on (in the Prelude in GHC 6.8) doesn't exist in my 6.6.1 installation, so we'll have to define it:
> op `on` p = (\a b -> p a `op` p b)
Let's just see how groupBy works:
*Main> groupBy (==) "aabbbcdd"
["aa","bbb","c","dd"]
*Main> groupBy (\a b -> isLetter a == isLetter b) "abc123def"
["abc","123","def"]
The on expression is a nicer way of writing the second case. We then map this grouping over each of the tokens, getting our final list.
*Main> tokenizeLine "jet quuxville"
["jet","quuxville"]
*Main> tokenizeLine "b3p"
["b","3","p"]
Now we'll want to do things with the tokens. Yes, there are libraries to do this (Parsec etc.) I know that in the Perl world I'd often tell other people not to reinvent the wheel and to use CPAN, so I do feel a little naughty that I'm going to ignore these and handroll something myself. In my defence m'ludd,
In an expression like buy 4 foo I'm imagining that “buy” will map to some command. Then we'll need to parse “4“ as a number, and “foo” as a some merchandise. The first case is the simplest:
> parseInt :: String -> Maybe Int
> parseInt s | all isDigit s = Just $ read s
> | otherwise = Nothing
This reads tantalisingly close to English: If all the string is made up of digits, we just read it as an Integer. Otherwise we return nothing. OK, I'm glossing rather over the “Just” and “Nothing” which indicate whether the parse succeeded using the “Maybe” monad.
*Main> parseInt "42"
Just 42
*Main> parseInt "wibble"
Nothing
Now for parsing the merchandise: if there is an item called “foo”, we'd want to match “foo”, “fo”, “f”. By amazing coincidence, I recently wrote about a function that does exactly what we need:
> import qualified Data.Map as M
> import Control.Arrow
> getPrefixes :: [([a], t)] -> [([a], t)]
> getPrefixes = concatMap $ uncurry zip .
> (tail . inits . fst &&& repeat . snd)
> getPrefixMap :: (Ord a) => [([a], t)] -> M.Map [a] t
> getPrefixMap = M.fromList . getPrefixes
So we'd need a list of merchandise. Which means we should think a little about what the datatype will look like. I'm going with a record type again, because I know there is more information that we'll need to store:
> data Merchandise = Merchandise {
> name :: String,
> min :: Int, -- minimum price
> max :: Int -- maximum price
> }
> deriving Show
Next, we define the products on offer, by looking at the Dope Wars configuration pages, we can copy the prices, but of course we have to theme the names...
> merchandise :: [Merchandise]
> merchandise = [
> Merchandise "Arrows" 1000 4400,
> Merchandise "Curry" 15000 29000,
> Merchandise "Kleisli" 480 1280,
> Merchandise "Haskell" 5500 13000,
> Merchandise "Lambdas" 11 60,
> Merchandise "STM" 1500 4400,
> Merchandise "Monads" 540 1250,
> Merchandise "GHC" 1000 2500,
> Merchandise "Peyton" 220 700,
> Merchandise "Fundeps" 630 1300,
> Merchandise "Zipper" 90 250,
> Merchandise "Endo" 315 890
> ]
Now this isn't in the form we need it yet. First of all, we'll map this list to a list of tuples of [(string, thing),...], which is exactly what getPrefixMap is expecting:
> merchandiseMap = getPrefixMap $
> map (\i -> (toLowerS $ name i, i))
> merchandise
> -- toLowerS over a string isn't defined by default, but it's just:
> toLowerS = map toLower
OK, for a bit of fun, we could write the map as:
> -- map (toLowerS . name &&& id)
(I don't yet understand why “arrows” are supposed to be an “even more generic model of computation than monads”, but they're certainly good for putting things in tuples :-)
Now we can build the parser parseMerchandise. Or rather (as we'll probably use this technique again, for example for the names of locations, and even the commands like “buy” and “jet”), we'll create parseMap
Interestingly Data.Map's lookup function throws an exception (as a “user error”!) if you try to look up a key that doesn't exist. (This is rather different from Perl hashes, but it makes sense in a strongly typed language - there is no “undef” value which is of the requested type!) So that I can avoid having to learn exceptions just yet, I'm going to check first of all if the key exists using member
> parseMap m s | M.member s m = do v <- M.lookup s m
> Just v
> | otherwise = Nothing
I spent about half an hour trying to get the above to work. After a number of rather unhelpful error messages about monads, I changed from my original attempt v = M.lookup s m to the do-notation form. This is rather odd, as it implies that lookup is monadic. And its type does indeed suggest that the result is in some monad...
*Main> :t M.lookup
M.lookup :: (Ord k, Monad m) => k -> M.Map k a -> m a
but which monad is it in, and more to the point, why?
In any case, now it's easy as pie(*) to create our merchandise parser as a specialization of our general function:
> parseMerchandise :: String -> Maybe Merchandise
> parseMerchandise = parseMap merchandiseMap
and we can now recognize the names and abbreviations of elements in the list!
*Main> parseMerchandise "pey"
Just (Merchandise {name = "Peyton", min = 220, max = 700})
*Main> parseMerchandise "vb"
Nothing
Next time around, we'll create “buy” and “sell” handlers that parse the whole command line, and stub in the actual interaction with the game state!
(*) Well, I say easy, but at this point (and ongoing) we get bitten by the “monomorphism restriction”, whatever that is. The error message suggests that you add explicit types to all the functions involved, but when I try that, I regularly get even stranger error messages about rigid variables and monads. The easier solution seems to be to add -fno-monomorphism-restriction to your ghci call. (I don't know enough to know whether this is a Bad Thing). Of course, now this error message doesn't come up. Pah!
A lot of learning projects involve writing games: people have written clones of Tetris, Asteroids, Space Invaders, and even first person shooters (Frag) in Haskell. As I'm far less clever than these people, I thought I'd start with something a bit simpler: Dope Wars.
Dope Wars is basically a trading game. In 30 turns, you move from one location to another, buying and selling, er, drugs on the streets of New York. It's a fairly simple concept, but one which includes elements like:
all of which seem like a good way to learn Monads and the other building blocks that you need to actually do anything useful in a functional programming language like Haskell.
So I had the idea, wrote up a couple of datatypes and some functions, and then forgot about it. Then, when Greg McCarroll mentioned that he'd accept talks about other languages for the London Perl Workshop on December 1st, I thought it would be a great opportunity to push myself to do it by proposing a lightning talk.
Only problem: I now have to actually write the game in order to talk about it (*). So... here goes. In this post, I'm going to show a first draft for the game prompt.
OK, people often find it most convenient to do State using “Monads”. I think I'm going to leave that for now, and just thread state explicitly. Mainly because I haven't yet got around to learning how to use the State Monad. Hopefully this will eventually become annoying enough that it will give me impetus to learn the monadic version.
Anyway, the idea here is that we'd have some function playTurn that will look a bit like this:
> playTurn :: GameState -> IO ()
> playTurn gs = let gs' = doSomething gs
> in playTurn gs'
(There is an interesting post on haskell-cafe about a more sophisticated monadic representation of the prompt).
So we just need to work out how to represent this GameState object. To start off with, we'll want to store information like
We could create a normal tuple:
> data GameState = GameState Integer Integer Location
> -- turn score location
and then pattern match on this, but it's going to get horrible if we add any fields later on! In Perl I'd just use a hash, but remember that Haskell Data.Map objects map from one type to another, and we might well have values of various types.
When I asked on #haskell, dons and firefly told me about the record syntax:
> data GameState = GameState {
> turn :: Integer,
> score :: Integer,
> location :: Location
> } deriving Show
>
> -- for now:
> type Location = Integer
Defining the original state is easy:
> startState = GameState {
> turn = 1,
> score = 0,
> location = 0
> }
And to “modify” it (or rather to clone it, overriding certain fields) there is a convenient syntax that just lets us declare those fields which have changed:
> movedToBronx = gs { location = bronx }
Setting a field to a value is easy, but we might want to define some mutators to change the field relative to its current value:
> nextTurn :: GameState -> GameState
> nextTurn gs = gs { turn = succ $ turn gs }
>
> modScore :: Integer -> GameState -> GameState
> modScore d gs = gs { score = score gs + d }
The record syntax will work even when we inevitably add new fields later. Yay!
Now, the game cycle is a bit more complicated than the version I suggested above, as it will allow IO actions in it. Something perhaps like this:
> playTurn gs = do showStatus gs
> putStr prompt
> s <- getLine
> let f = parseLine s
> let gs' = f gs
> if isEnd gs'
> then endGame gs'
> else playTurn gs'
For now we'll just stub some of the declarations we need. showStatus can just show the GameState record (which is why we derived the Show class).
> showStatus gs = putStrLn $ show gs
We may as well set the prompt to the dollar sign, appropriately:
> prompt = "$ "
Though we'll need to parse the line read from standard input to one of the commands like “Buy 4 X” or “Goto location 3“, right now, we'll just stub in a function that increments the score and the turn counter:
> parseLine s = nextTurn
> . modScore 10
We need to know if we're at the end of the game, and take action appropriately.
> isEnd gs = turn gs > maxTurns
>
> maxTurns = 3
>
> endGame gs = do putStrLn "Game over!"
> putStrLn $ "Your score was " ++ (show $ score gs)
> return ()
And here's a transcript
*Main> playTurn startState
GameState {turn = 1, score = 10, location = 0}
> buy 2 foo
GameState {turn = 2, score = 20, location = 0}
> sell 4 bar
GameState {turn = 3, score = 30, location = 0}
> goto quuxtown
Game over!
Your score was 40
(*) to be fair, it's only a 5 minute “Lightning Talk”, so I could probably get away without even writing the whole game, but I'll feel better if I actually know what I'm talking about...
This morning, Ranguard asked an interesting question on #london.pm:
11:27 <@ranguard> What's the best way of finding the common root of two paths,
e.g. for /a/b/c/d/e, /a/b/c/1/2/3 I want /a/b/c/ returned,
Path::Class::Dir has 'contains', but I'm not sure that's quite right?
In my copious free lunchtime, I thought I'd write a version of it in Haskell. Though Ranguard was very sensibly using Path::Class in Perl, I decided to just work on a list of values (in this case, characters).
First of all, let's zip the list of paths, together with a boolean indicating whether we matched or not:
> zipWith (\a b -> (a==b, a))
We'll want to only take the ones from the beginning for which the tuple contains True in its first element:
> takeWhile fst
And we only want the second element
> map snd
So we just build a pipeline of these by composing the functions with the dot operator:
> commonPrefix = map snd .
> takeWhile fst .:
> zipWith (\a b -> (a==b, a))
Of course I've cheated here. The map and the takeWhile are expecting just one argument, so can be composed fine in this pipeline, but zipWith takes two arguments. I've seen people rewrite their pipelines to accomodate this, but I think it's horrible and derived a function to compose a 1-arg function with a 2-arg one. Saizan suggested the conventional name (.:) and roconnor gave a cuter implementation than mine:
> infixr 8 .:
> (.:) = (.).(.)
(The infixr declaration is because (.) is infixr 9, but I couldn't get it to compile like that so played around with it at random. Yes, very scientific.)
> x = commonPrefix "abcde" "abc123"
I asked lambdabot for a points-free version of the lambda expression I pass to zipWith, but it came up with utter horrors. EvilTerran suggested the very clever looking:
> flip ((&&& id) . (==))
> -- or alternatively:
> ((,) =<<) . (==)
Vincenz came up with another, perhaps more sensible option.
> commonPrefix = map fst . takeWhile (uncurry (==)) .: zip
This just zips all the elements together, and takes only the equal ones.
Meanwhile, back in the Perl world, Ranguard posted a solution, which Ilmari modified to use the lovely List::MoreUtils:
#!/usr/bin/perl -l
use strict;
use warnings;
use File::Spec::Functions qw(catdir splitdir);
use List::MoreUtils qw(all each_arrayref);
my $paths = each_arrayref map { [ splitdir $_ ] } (
'/v/f/d/index.html',
'/v/f/d/i/a/s/s.gif',
'/v/f/a/s/s.gif',
);
my @root;
while (my ($part, @rest) = $paths->()) {
last unless all { $_ eq $part } @rest;
push @root, $part;
}
print catdir(@root);
Unlike the Haskell version, it takes any number of paths.
Which suggests an additional exercise: modify or create a Haskell function to find the longest common root of an arbitrary number of lists. Do let me know if you try this, and post your code (if you can get it past Vox's comments system ;-).
Greg confessed on #london.pm to having written a average function in Perl in FP style... recursively.
I asked him about this, as a perfectly good average function (using List::Util would be:
sub avg { sum(@_) / @_ }
which is perfectly fine FP style too. As far as I could understand it, he was reducing a tuple of sum and count. Though he said he wrote it just-for-fun, it's not entirely unreasonable if you're using a list-based language, where you'd otherwise have to traverse the list twice (in Perl, arrays know how long they are).
Out of curiosity, I tried to check how Haskell defined avg, but it seems that it doesn't.
LoganCapaldo suggested the following definition:
> import Data.List
> avg xs = sum xs / genericLength xs
We use genericLength because the normal length function insists on returning an integer, while genericLength returns a generic Number, which can be divided appropriately. (We'd otherwise have to write sum xs / (fromIntegral $ length xs))
The recursive function would work by summing a tuple, so that
avgs [1,3,5] => (9,3)
Once we have that we could divide the tuple using uncurry.
> avg' = uncurry (/) . avgs
As we're going to fold the number values into a tuple, we need a function like:
> sum_count s (t,c) = (t+s, c+1)
after which our definition is easily written as a fold
> avgs = foldr sum_count (0,0)
Which works just fine. But seeing the tuples, I thought of the comments from Joao and doserj, and thought of (&&&).
> import Control.Arrow
> sum_count s = (+s).fst &&& succ.snd
And byorgey commented that the .fst and .snd pair can be abstracted away using (***). So I tried
> sum_count s = (+s) *** succ
and was really rather surprised that it worked! Admittedly, the definition is now something that scares me and whose intent is far less clear than the first naive definition above:
> avg2 = uncurry (/) . foldr (\s -> (+s) *** succ) (0,0)
For extra bonus obfuscation points, making the lambda expression points-free is left as an exercise to the reader ;-)
Update: mauke pointed out that my Perl avg sub didn't work... added parens to sum (@_). Looks like naughty me didn't bother testing that code...
Some interesting comments on yesterday's Haskell post. I thought I'd write this in Perl to compare. Of course, we don't have an "inits" function in the standard library, but that's easily written:
#!/usr/bin/perl
use strict; use warnings;
use Data::Dumper;
my %hash = (
"key1" => 1,
"key2" => 2,
"other" => 3,
);
sub inits {
my $string = shift or return;
return ($string, inits( substr($string, 0, -1)) );
}
sub getPrefixMap {
my %hash = @_;
return map {
my $k = $_; my $v = $hash{$_};
map { ($_ => $v) } inits $k;
} keys %hash;
}
my %map = getPrefixMap( %hash );
print Dumper( \%map );
This is a bit noisier, but perhaps more straight-forward than the final haskell version...
Update: broquaint pointed out that the code listing above was missing the zero in the substr, breaking it. Thanks!
Here's a little snippet I worked on yesterday, while preparing my talk for the London Perl Workshop.
It takes a list of tuples ("string", whatever) and maps all the prefixes of the string
("string", "strin", "stri", etc.) to the whatever.
I was quite impressed at how easily this came together. The functional composition (pipelines connected with ".") is coming a bit more easily, but the heavy lifting is all done really by the "inits" function from Data.List.
Some of this is made "prettier" (but harder to understand) by various features. "uncurry" changes a function that takes a tuple (a,b) into one that takes 2 arguments "a b". The (flip (,) r) is actually harder to read than the equivalent lambda, (\s -> (s,r)). (Annoyingly, you can't section the comma operator to do "(,r)" because tuples are "special"...)
--
import Data.List
import qualified Data.Map as M
-- can contain duplicates!
getPrefixes :: [([a], t)] -> [([a], t)]
getPrefixes = concatMap (uncurry gps)
where gps s r = map (flip (,) r) -- make a tuple (s',r)
. reverse -- shortest prefixes last
. tail -- ignore the empty prefix ""
. inits -- get all the prefixes
$ s
getPrefixMap :: (Ord a) => [([a], t)] -> M.Map [a] t
getPrefixMap = M.fromList . getPrefixes
-- without the type, we'd have to write like so (because of the
-- monomorphism restriction)
-- > getPrefixMap xs = M.fromList $ getPrefixes xs
Update: after Joao's comments below, I got the version
> getPrefixes :: [([a], t)] -> [([a], t)]
> getPrefixes = concatMap gps
> where gps (s, r) = let ss = tail . inits $ s
> rs = repeat $ r
> in zip ss rs
Joao meanwhile mentioned the cross product, which I didn't
immediately get the point of.
> infix 1 ><
> (><) f g a = (f a, g a)
As you can see, this applies the remaining parameter "a"
to two functions. So it's useful in this case to curry
the "(s,r)" parameter. doserj pointed out that this is
spelt (&&&) so that we just need to:
> import Control.Arrow
> getPrefixes = concatMap $ uncurry zip .
> (tail . inits . fst &&& repeat . snd)
As often happens in Haskell you could argue whether this is
actually any clearer, but it's cute!
Haskell's prelude has a function words that splits a string by spaces.
Prelude> words "nice cup of tea"
["nice","cup","of","tea"]
Apparently the question comes up quite regularly on irc or haskell-cafe as to why this function is specialised to split only on whitespace. Perl's split, for example, can split on any character, or indeed string or regular expression.
As quicksilver has suggested, the split function is more complicated than you might think:
and therefore perhaps the reason that there is no general function, is that it would require a very complex type.
I thought about this some more, and if I've got anything at all about functional programming, it's that you can build up progressively more complicated functions from smaller pieces. I wondered if I could do the same with split.
After playing a little with unfoldr, I decided I was better off using a simple recursive solution for now until I understand it better. But the first thing I need is a function to split a string just once.
> splitOne = splitOne' []
> where splitOne' acc p [] = (acc, Nothing, Nothing)
> splitOne' acc p xs@(x:xs') =
> let m = p xs
> in case m of
> Just (s, rest) -> (acc, Just s, Just rest)
> Nothing -> splitOne' (acc++[x]) p xs'
splitOne takes a function which either returns the separator and the rest of the string, or Nothing. We iterate the list of characters, stopping at the first where this function matches. Examples of these functions:
> onCharP p xs@(x:xs') | p x = Just ([x], xs')
> | otherwise = Nothing
> onChar c = onCharP (==c)
> onSpace = onCharP isSpace
> onComma = onChar ','
At which point we can do:
*Main> splitOne onSpace "nice cup of tea"
("nice",Just " ",Just "cup of tea")
*Main> splitOne onSpace "nice"
("nice",Nothing,Nothing)
So we now need to run for the whole length of the string, which is where the actual split function comes in.
> split t p [] = []
> split t p xs = let (tok,sep,rest) = splitOne p xs
> res = t (tok,sep)
> in case rest of
> Nothing -> res
> Just rest' -> res ++ split t p rest'
split takes a transformation function as well as a predicate. This takes the lists of (separator,token) and transforms them as required.
> onlyToken :: (t, t1) -> [t]
> onlyToken (x,_) = [x]
> -- onlyWord ("",_) = []
> onlyWord ([],_) = []
> onlyWord (x, _) = [x]
> tokenAndSep (t, Nothing) = [t]
> tokenAndSep (t, Just s) = [t,s]
This means that you can write the words function, as well as a function to split on commas, with different behaviours.
> words = split onlyWord onSpace
> commas = split tokenAndSep onComma
As quicksilver suggested, split does indeed have a rather complicated type:
*Main> :t split
split :: (([a1], Maybe a2) -> [a])
-> ([a1] -> Maybe (a2, [a1]))
-> [a1]
-> [a]
but the final function is simple enough. I did promise that we'd be able to split on words as well as characters, and this is why splitOne runs the predicate against xs instead of just the head of the list.
> onPrefix :: Eq a => [a] -> [a] -> Maybe ([a], [a])
> onPrefix = onPrefix' []
> where onPrefix' :: Eq a => [a] -> [a] -> [a] -> Maybe ([a], [a])
> onPrefix' acc [] s2 = Just (acc, s2)
> onPrefix' acc _ [] = Nothing
> onPrefix' acc s1 s2
> | (head s1) == (head s2)
> = onPrefix' (acc++[head s1]) (tail s1) (tail s2)
> | otherwise = Nothing
Which gives us:
*Main> split onlyWord (onPrefix "and") "sex and drugs and rock and roll"
["sex "," drugs "," rock "," roll"]
OK, this is still missing two important things from the Perl function:
DB> x split /,/, "red,green,yellow,blue", 3
0 'red'
1 'green'
2 'yellow,blue'
This one, I haven't really thought enough about how to implement, without complicating things.
As usual when I start on Haskell, I'm missing some obvious way to make the API not suck horribly, so comments and criticism very welcome.
Update Nov 15:
Thanks to everyone who responded here and on reddit (it is a constant source of pleasure and amazement that people find these silly posts worth commenting on :-)
Andrew, vincentk, and geezusfreak commented that splitting a string is really a form of parsing, and therefore they would use a proper parser library like Parsec. Good point, though split in Perl is often “just good enough” as a lightweight solution. I'd like to see a solution in Parsec or similar though.
Mamie Camacho suggested to use Text.Regex:
I had a play with this and it's quite simple to do something like:
> splitRegex (mkRegex "\\s*,\\s*") "eggs,ham, whatever"
Haskell's string quoting is less pleasant than Perl's (having to quote the backtick) and this version doesn't seem to have the semantics of keeping the separator in capture brackets. (e.g. if you use the string “(\\s*,\\s*)“)
Other good suggestions included starting from the source of words and deriving a more general solution from that. Thanks again!
Functional programmers often seem to complain that for loops are better off done with maps, and that, in any case, they're just degenerate cases of the same thing, as all imperative constructs can be created in a fully functional way...
That being the claim, I thought it might be a fun exercise to try to implement for loops in Haskell.
First of all, I tried to work out the type, and guessed at:
> for :: a -> (a->Bool) -> (a->a) -> (a-> IO ()) -> IO ()
Where a usage might be something like:
> for 1 (<10) (+1) (\x -> putStr $ show x)
First of all, I tried the “straightforward” recursive version, and got as far as compiling:
> for i p pp f =
> do if p i
> then
> (f i)
> -- but now what?
> else
> return ()
The do‘s and the return are of course there pretty much at random, by adding, removing, and moving things till I could get it to compile. But it does work, partially, certainly it gets through a single iteration! I tried various options to recurse, and thought that something like this should work.
> for (pp i) p pp f
But it doesn't, and ghc‘s error messages are typically cryptic. Half an hour later I was about to give up, and then remembered the iterate function. After which the following higher order version sort of wrote itself.
We construct the list of iterators:
> iterate pp i
But of course we only need to take them while the predicate is true
> takeWhile p
And we need to sequence the actions
> mapM_ f
(OK, I remembered that someone had pointed mapM_ at me and found it in an earlier blog entry...)
Leading to the following definition:
> for i p pp f = mapM_ f $
> takeWhile p $
> iterate pp i
OK, so this version of a for loop doesn't have a last keyword to break out of the loop early, but otherwise it does pretty much what you'd expect. I don't know whether the earlier thunks (iterated values of i that have already been used) will be kept by ghc even after they have already been used or not, but that's an implementation / optimization detail.
I'm not sure why the recursive version seems harder to write: I guess because I don't understand what the do-notation is doing. (Generally speaking, programming tasks tend to be easier when you actually know what the code is doing rather than just trusting the voodoo.)
As a final note, this is the type that ghc infers from this definition (more general than the one I'd started with).
> for :: (Monad m) => a -> (a -> Bool) -> (a -> a) -> (a -> m b) -> m ()
The next obvious functions after “Pick” are “Unpick” to eject a value randomly from the list, and “Pick N” values from the list.
unpick is fairly straight-forward. To do in a single pass, you have to thread through the list of values, but otherwise it's more or less the same as the previous code for pick
> unpick :: [a] -> IO [a]
> unpick [] = undefined
> unpick [x] = do return []
> unpick (x:xs) = do zs <- unpick' [] [x] xs 2
> return (reverse zs)
>
> unpick' :: (Num p, Random p) => [t] -> [t] -> [t] -> p -> IO [t]
> unpick' curr orig [] _
> = do return curr
> unpick' curr orig (next:rest) prob
> = do r <- getStdRandom (randomR (1,prob))
> let curr' = if r == 1 then orig else (next:curr)
> unpick' curr' (next:orig) rest (prob+1)
To do pickN I'd originally thought of picking the required number, and then following an algorithm similar to that of pick but dropping one (with unpick) every time I selected a new entry. That would be quadratic, and also insane. I discussed with dakkar, and we agreed that it might just be simpler to call something like pick N times (maybe with a function that returned the picked and the rejected part).
But, if we know the length of the array things are much easier. We can just scan the array, selecting elements with a probability of (r/e) where r is the number of required elements left to be selected, and e the number of elements left in the array.
> pickN :: Int -> [a] -> IO [a]
> pickN n xs = let len = length xs
> in pickN' n len xs
And we just need
> pickN' :: Int -> Int -> [a] -> IO [a]
> pickN' n l [] = do return []
> pickN' n l (x:xs)
> = do b <- roll n l
> if b
> then x: (pickN' (n-1) (l-1) xs)
> else pickN' n (l-1) xs
and to define the auxhiliary function roll, because I got tired of writing it out all over the place:
> roll p q = do r <- getStdRandom (randomR (1,q))
> return $ r <= p
Which is all great apart from not working. The problem is that pickN’ returns a monadic value, while x is a pure one. I asked on #haskell and got help from oerjan, scook0, and DRMacIver.
First of all they suggested:
> if b
> then (x:) `liftM` (pickN' (n-1) (l-1) xs)
> else pickN' n (l-1) xs
Now I intuitively sort-of-understood what this was doing but not why. (Bear in mind that I'm still confused about things like “why do I return this but not that?” and “How does it know which Monad it's in?”)
It turns out that liftM lifts a function with one pure argument into a function with one monadic argument. But I'm concatenating x (pure) with picked xs (monadic)! So the section (x:) is actually a pure function specialised to concatenate “x” with a pure list (which is the single argument).
My first reaction was a mix of wonder and “good god that's horrible”. As they pointed out though, I could just use another do block.
> if b
> then do xs <- pickN' (n-1) (l-1) xs
> return (x:xs)
> else pickN' n (l-1) xs
Hurrah! Now I should go and check out a random monad like MonadRandom...