Perl, Haskell, stuff
Haskell's Prelude provides words, which splits a string on whitespace:

Prelude> words "nice cup of tea"
["nice","cup","of","tea"]

Apparently the question of why this function is specialised to split only on whitespace comes up quite regularly on IRC or haskell-cafe. Perl's split, for example, can split on any character, or indeed on any string or regular expression. As quicksilver has suggested, the split function is more complicated than you might think:
> splitOne = splitOne' []
>   where splitOne' acc p []         = (acc, Nothing, Nothing)
>         splitOne' acc p xs@(x:xs') =
>           let m = p xs
>           in case m of
>                Just (s, rest) -> (acc, Just s, Just rest)
>                Nothing        -> splitOne' (acc++[x]) p xs'

splitOne takes a function which either returns the separator and the rest of the string, or Nothing. We iterate over the list of characters, stopping at the first position where this function matches. Examples of these functions:
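> -- Match a single element satisfying p at the head of the list.
> onCharP _ []         = Nothing  -- nothing left to match
> onCharP p xs@(x:xs') | p x       = Just ([x], xs')
>                      | otherwise = Nothing
> onChar c = onCharP (==c)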
> -- isSpace comes from Data.Char
> onSpace = onCharP isSpace
> onComma = onChar ','

At which point we can do:
*Main> splitOne onSpace "nice cup of tea"
("nice",Just " ",Just "cup of tea")
*Main> splitOne onSpace "nice"
("nice",Nothing,Nothing)
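One caveat with splitOne as written: appending with acc++[x] recopies the accumulator at every step, so the cost grows quadratically with the length of the token. As a minimal sketch only (the names splitOneLazy and onSpace' are hypothetical and not part of the original code), the same triple can be built by consing onto the result of the recursive call instead:

```haskell
import Data.Char (isSpace)

-- Hypothetical variant of splitOne: same result shape, but the token is
-- built by consing onto the recursive result rather than appending.
splitOneLazy :: ([a] -> Maybe (s, [a])) -> [a] -> ([a], Maybe s, Maybe [a])
splitOneLazy _ [] = ([], Nothing, Nothing)
splitOneLazy p xs@(x:xs') =
  case p xs of
    -- The matcher fired here: the token ends and we report the separator.
    Just (s, rest) -> ([], Just s, Just rest)
    -- No match: x belongs to the token, keep scanning the tail.
    Nothing ->
      let (tok, sep, rest) = splitOneLazy p xs'
      in (x : tok, sep, rest)

-- Same behaviour as onSpace, repeated so this block stands alone.
onSpace' :: String -> Maybe (String, String)
onSpace' (c:cs) | isSpace c = Just ([c], cs)
onSpace' _                  = Nothing
```

Under these assumptions, splitOneLazy onSpace' "nice cup of tea" should give the same triple as the session above. Whether the difference matters depends on how long the tokens are; for short words the original is perfectly adequate.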
So we now need to run for the whole length of the string, which is where the actual split function comes in.
> split t p [] = []
> split t p xs = let (tok,sep,rest) = splitOne p xs
>                    res            = t (tok,sep)
>                in case rest of
>                     Nothing    -> res
>                     Just rest' -> res ++ split t p rest'

split takes a transformation function as well as a predicate. The transformation receives each (token, separator) pair and turns it into a list of results as required.
> onlyToken :: (t, t1) -> [t]
> onlyToken (x,_) = [x]

> -- onlyWord ("",_) = []
> onlyWord ([],_) = []
> onlyWord (x, _) = [x]
> tokenAndSep (t, Nothing) = [t]
> tokenAndSep (t, Just s)  = [t,s]

This means that you can write the words function, as well as a function to split on commas, with different behaviours.
> -- Note: this words clashes with Prelude.words unless the Prelude one is hidden.
> words  = split onlyWord onSpace
> commas = split tokenAndSep onComma

As quicksilver suggested, split does indeed have a rather complicated type:
*Main> :t split
split :: (([a1], Maybe a2) -> [a]) -> ([a1] -> Maybe (a2, [a1])) -> [a1] -> [a]

but the final function is simple enough. I did promise that we'd be able to split on words as well as characters, and this is why splitOne runs the predicate against xs instead of just the head of the list.
> onPrefix :: Eq a => [a] -> [a] -> Maybe ([a], [a])
> onPrefix = onPrefix' []
>   where onPrefix' :: Eq a => [a] -> [a] -> [a] -> Maybe ([a], [a])
>         onPrefix' acc [] s2 = Just (acc, s2)
>         onPrefix' acc _  [] = Nothing
>         onPrefix' acc s1 s2
>           | (head s1) == (head s2)
>               = onPrefix' (acc++[head s1]) (tail s1) (tail s2)
>           | otherwise = Nothing

Which gives us:
*Main> split onlyWord (onPrefix "and") "sex and drugs and rock and roll"
["sex "," drugs "," rock "," roll"]

OK, this is still missing two important things from the Perl function:
First, a limit on the number of fields returned:

DB> x split /,/, "red,green,yellow,blue", 3
0  'red'
1  'green'
2  'yellow,blue'

This one, I haven't really thought enough about how to implement without complicating things.
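One possible direction, offered only as a sketch and not as the approach this post settles on: thread a counter through the recursion and stop splitting once a single field remains. The name splitLimit is hypothetical. For brevity the sketch drops the transformation function and always discards separators, and it assumes a positive limit (Perl gives zero and negative limits special meanings that are not modelled here). splitOne and onChar are repeated, with type signatures, so the block stands alone.

```haskell
-- splitOne as in the post, with an explicit type signature.
splitOne :: ([a] -> Maybe (s, [a])) -> [a] -> ([a], Maybe s, Maybe [a])
splitOne = go []
  where
    go acc _ [] = (acc, Nothing, Nothing)
    go acc p xs@(x:xs') =
      case p xs of
        Just (s, rest) -> (acc, Just s, Just rest)
        Nothing        -> go (acc ++ [x]) p xs'

-- Matcher for a single separator element, equivalent to onChar in the post.
onChar :: Eq a => a -> [a] -> Maybe ([a], [a])
onChar c (x:xs) | x == c = Just ([x], xs)
onChar _ _               = Nothing

-- Hypothetical splitLimit: produce at most n fields; the last field
-- holds whatever remains unsplit. Assumes n is positive.
splitLimit :: Int -> ([a] -> Maybe (s, [a])) -> [a] -> [[a]]
splitLimit _ _ [] = []
splitLimit n p xs
  | n <= 1    = [xs]                      -- budget used up: keep the rest whole
  | otherwise =
      case splitOne p xs of
        (tok, _, Just rest) -> tok : splitLimit (n - 1) p rest
        (tok, _, Nothing)   -> [tok]      -- no further separator found
```

With these definitions, splitLimit 3 (onChar ',') "red,green,yellow,blue" evaluates to ["red","green","yellow,blue"], matching the Perl session above. Folding the limit into the general split, alongside the transformation function, is where the complication mentioned above would come in.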
Second, splitting on a regular expression. There is already splitRegex (from Text.Regex):

> splitRegex (mkRegex "\\s*,\\s*") "eggs,ham, whatever"

Haskell's string quoting is less pleasant than Perl's (having to escape the backslash), and this version doesn't seem to have the semantics of keeping the separator when it is in capture brackets (e.g. if you use the string "(\\s*,\\s*)").

Other good suggestions included starting from the source of words and deriving a more general solution from that. Thanks again!
Osfameron's blog on Haskell, Perl programming, stuff.