Your first Haskell project: building a monadic parser combinator library

The goal of this project is to build our own parser library from scratch.

You will understand how monads work by building your own Parser monad! and its actually useful + beautiful!

The first step is to understand what a ‘Parser a’ means

a value of type ‘Parser a’ is a ‘runnable computation’ that can consume a String or a part of String, and produce a value of type ‘a’ along with the remaining unconsumed String.

newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

i.e. if it succeeds, it consumes part of the input String and produces a value of type ‘a’ along with the remaining unconsumed String. If it fails, it returns Nothing.

for example, we can define a ‘just’ parser that tries to consume a specific character from the input String:

just :: Char -> Parser Char
just c = Parser $ \input -> case input of
    (x:xs) | x == c -> Just (c, xs)
    _               -> Nothing

There are also many simple parsers we can define, such as:

-- | This parser always succeeds without consuming any input,
unit :: a -> Parser a
unit a = Parser $ \input -> Just (a, input)

-- | This parser always fails no matter what the input is.
failure :: Parser a
failure = Parser $ \_ -> Nothing

For example, let’s see them in action:

-- Using the 'just' parser
runParser (just 'a') "abc"
-- >>> Just ('a', "bc"), the 'a' is consumed, "bc" is the remaining input
runParser (just 'a') "xyz"
-- >>> Nothing

-- Using the 'unit' parser
runParser (unit 42) "hello"  -- Just (42, "hello")
-- Using the 'failure' parser
runParser failure "anything"  -- Nothing

The interesting part comes when we see how to build more complex parsers by combining simpler ones. This is where the power of parser combinators comes into play.

Let’s have some ‘combining’ operations:

-- | it inputs two parsers, tries the first one; if it succeeds, returns its result;
-- otherwise, tries the second one.
(<|>) :: Parser a -> Parser a -> Parser a
(<|>) (Parser p1) (Parser p2) = Parser $ \input ->
    case p1 input of
        Just result -> Just result
        Nothing     -> p2 input

-- | tries a lot of parsers!
anyOf :: [Parser a] -> Parser a
anyOf = foldr (<|>) failure
-- see, the 'failure' is like the 'zero' element owo


-- we can now define something more interesting:
-- | parses a single letter (a-z or A-Z)
letter :: Parser Char
letter = anyOf $ map just $ ['a'..'z'] ++ ['A'..'Z']

-- | parses a single digit (0-9)
digit :: Parser Char
digit = anyOf $ map just ['0'..'9']

It is very clear that Parser is a Functor.

instance Functor Parser where
  fmap f (Parser p) = ... try to fill in here owo

But something stronger hold: it is a Monad o.o

that basically means, it can be ‘combined’, when something m is a monad, you can merge something like m (m a) into m a effectively wrapping two layers of m into one layer owo

In terms of Parsers, that means Parser (Parser a) can be merged into Parser a

so… a parser that returns a parser, can be treated as a single parser o.o

join :: Parser (Parser a) -> Parser a
join (Parser p) = Parser $ \input ->
    case p input of
        Just (Parser p', rest) -> p' rest
        Nothing                -> Nothing

It can be proved that, giving out this join :: m (m a) -> m a function is equivalent to giving the ‘binding operator’ (>>=) :: m a -> (a -> m b) -> m b, a ‘sequential’ combining operator: the first m a produces a value a, which is then fed into the function (a -> m b) to produce the next m b.

And in Haskell community, that (>>=) is what people use to define a Monad:

but Monad class has a super class Applicative, any Monad is naturally an Applicative.

-- | This is the definition for your reference, you don't need to write this in your code.
class Functor f => Applicative f where
  pure  :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b

class Applicative m => Monad m where
  (>>=) :: m a -> (a -> m b) -> m b
  -- ^ This is the 'multiplication' operator for Monads

  return :: a -> m a
  return = pure
  -- ^ This thing is the 'unit' for that multiplication
  -- that's why we defined our 'unit' to be called 'unit' earlier

instance Applicative Parser where
  pure = unit
  Parser pf <*> Parser pa = (...try owo)

instance Monad Parser where
  (Parser p) >>= f = (...try owo)

With the Applicative and Monad in hand, we can write parsers in a more convenient way, the do notation can replace (>>=) (as you already knew)

-- | This parser runs the parser in the input argument
-- zero or more times (until it fails / ends), collecting the results in a list.
many :: Parser a -> Parser [a]
many p =
  (do
    first <- p
    rest  <- many p
    return (first : rest)
  ) <|> unit [] -- actually, this unit is the samething as pure / return

-- | This parser runs the parser in the input argument
-- **One or more** times, collecting the results in a list.
some :: Parser a -> Parser [a]
some p = do
  first <- p
  rest  <- many p
  return (first : rest)

This many parser is very useful, it allows us to parse sequences of elements.

-- | parses a sequence of letters (a-z or A-Z)
letters :: Parser String
letters = some letter

digits :: Parser String
digits = some digit

space :: Parser Char
space = anyOf $ map just [' ', '\n', '\t']

spaces :: Parser String
spaces = many space

For example, we can now very easily parse something that is really useful, let’s say

an identifier: a letter followed by zero or more letters or digits
an IPV4 address: four groups of one to three digits separated by dots

identifier :: Parser String
identifier = do
  first <- letter
  rest  <- many (letter <|> digit)
  return (first : rest)

ipv4 :: Parser (Int, Int, Int, Int)
ipv4 = do
  part1 <- intRange 0 255
  _     <- just '.'
  part2 <- intRange 0 255
  _     <- just '.'
  part3 <- intRange 0 255
  _     <- just '.'
  part4 <- intRange 0 255
  return (part1, part2, part3, part4)

parseByRead :: Read a => Parser String -> Parser a
parseByRead p = do
  str <- p
  case readMaybe str of -- you need to import Text.Read (readMaybe)
    Just val -> return val
    Nothing  -> failure

intRange :: Int -> Int -> Parser Int
intRange low high = do
  n <- parseByRead (many digit)
  if n >= low && n <= high
    then return n
    else failure