static  ·  no trackers  ·  built to be lightweight

Tools of the Trade

Introduction

Hello again, this tutorial is going to cover the instances for the functor, monad and applicative for parsers. Then, we can begin the action with some primitive combintors that we will be able to use to construct more complex parsers soon.

Just as a little refresher, let’s remind ourselves of the type of a parser.

    Parser a :: Parser (String -> Maybe (a, String))

The Functor Instance

To begin, let’s implement a functor instance for our parsers. This means that we will need to create a definition for fmap :: (a -> b) -> Parser a -> Parser b . Essentially, if we think of our parsers as structures that take strings and turn them into some value of type a , then fmap so apply a function to the parser output to turn the output into a value of type b . Let’s try and write this !

    instance Functor Parser where 
      fmap (Parser p) f 
        = Parser (\inp -> case (p inp) of 
                            Just res -> Just (f res)
                            Nothing  -> Nothing) 

Hopefully it’s clear what is happening here. If our original parser succeeds, then we take the result and apply f to it, but if the result fails, then we just return a failure. However, we can make this much more elegant! The Maybe type is already a functor itself, so we can instead write

    instance Functor Parser where 
        fmap (Parser p) f = Parser (\inp -> f <$> p inp) 
        -- Note that f <$> p inp === fmap f (p inp)

This is very useful in itself - see the first exercise to see how this can be used in practice. But the truth is that this is stepping stone to what we really want to do, which is make our parsers monadic.

The Applicative Instance

To implement an applicative instance we need to implement pure and <*> . The definition for the former is not particularly complicated, the minimal definition is that pure x is the parser that doesn’t parse anything, and returns x

Next we want to implement <*> :: Parser (a -> b) -> Parser a -> Parser b . What this means is that the first parser returns a function when it successfully parses, so the semantics of <*> is that it creates a new parser that first parses the inout to return a function, then parses the remaining input to return a value of type a , and finally it returns a Maybe (b, String) by applying the function to the term of type a . Hopefully this gives an intuition of how to chain these parsers together, let’s first begin with a definition that is functionally correct, but terribly written:

    Parser a <*> Parser b 
      = Parser $ \inp -> case res of 
                     Nothing        -> Nothing 
                     Just (f, inp') -> case res' of 
                                         Nothing         -> Nothing 
                                         Just (x, inp'') -> Just (f x, inp'')
      where   
        res  = a inp  
        res' = b inp'

This is an extremely naive implementation, at each step we have to check if the computation has failed, which leads to these nested cases. Fortunately though, Maybe is the monad that represents the success or failure of a computation, and the implementation of its monadic functions represents this, so we can simply use do notation in the context of the Maybe monad to handle the case of a failed computation at any stage in the computation, so putting these ideas together, along with the definition of pure , we get

    instance Applicative Parser where 
      pure x = Parser $ \input -> Just (x, inp)

      Parser p <*> Parser q
        = Parser $ inp -> do 
            (f, inp') <- p inp 
            (x, inp'') <- q inp' 
            return (f x, inp'')

Hopefully you can understand how this new implementation works, if not, it may be worth experimenting with the Maybe monad for a bit - essentially though, if any step in the do notation returns a Nothing , the entire thing returns a Nothing .

The Monad Instance

Fortunately, the monad instance is very similar to the applicative instance, so we don’t have to spend much time thinking about this one. return has to have the same definition as pure , so the only thing we must worry about is the definition of (>>=) :: Parser a -> (a -> Parser b) -> Parser b .

From the type signature of bind, we can kind of get a idea of what’s going to happen. First, we parse the input to get a value of type a , then we use that to produce a Parser b , which we then use to parse the remaining input to produce a value of type b . Hopefully this gives a bit of an intuition and now we can look at implementation.

    instance Monad Parser where 
      return = pure 
      Parser a >>= f 
        = Parser $ \inp -> do 
            (x, inp') <- a inp 
            let Parser b = f x
            (y, inp'') <- b inp' 
            return (y, inp'') 

It’s worth taking some time meditating on this to fully understand what’s going on, but we use the monadic context of Maybe to catch failed computations like we did when we implemented the applicative instance.

For the most part, we want to use applicatives as much as possible as they give a static program structure, but monadic parsers allow us to perform branching execution like we would have in an if statement, look at the exercises for an example.

The Alternative Instance

At this point, you are probably sick of defining instances for our parsers, but I promise this is the last one. Now, what even is an Alternative ? If you’ve never heard of it, let’s ask ghci what Alternative is

  > :i Alternative
  class Applicative f => Alternative (f :: * -> *) where
  empty :: f a
  (<|>) :: f a -> f a -> f a
  some :: f a -> f [a]
  many :: f a -> f [a]
  {-# MINIMAL empty, (<|>) #-} 

Okay, so there’s quite a bit going on here. If you search up alternatives in the Hackage documentation, it says that an Alternative is “a monoid on applicative functors.” At this point, the algebraist is very happy, but for the non-algebraist, let’s take a look at this functions and try to understand what’s happening.

A way to get an intuition for what’s going on (although not the true mathematical way to understand the motivation behind the structure - but such is the way of Haskell, see monads…) is to think of Alternative as defining a choice scheme for a type. empty refers to making no choice at all (the “nothing choice” probably captures the concept better), and <|> refers to picking between two things, although generally we can see it as we try to pick the first argument, unless it returns a bad choice (what exactly a bad choice is depends on the implementation).

I think to understand this it would be useful to look at how Alternative is defined for Maybe . First, the nothing choice is very obviously Nothing , and then

  Just a  <|> _ = Just a 
  Nothing <|> x = x  

So if the first choice is good, we pick that, otherwise we just choose whatever the second choice is - so if both a “bad” choices, then we return Nothing . However, we also have functions called some and many . They are both very similar: many f keeps running f until it fails (ie zero or more times), and some f tries to run f one or more times. I think it’s a bit hard to understand with Maybe , but it’s very clear with our parsers, so we will go into more detail after we implement Alternative for our parsers.

The “nothing choice” in the context of parsers is the parser that always fails. Making a choice between two parsers corresponds to trying to parse the input with the first parser, and if that fails, trying with the second parser. What this looks like in code is

  instance Alternatice Parser where 
    empty = Parser $ \inp -> Nothing 
    Parser p <|> Parser q 
      = Parser $ \inp -> case p inp of 
                           Just (x, inp) -> Just (x, inp)
                           Nothing       -> q inp  

I wrote this implementation with the intention of being expressive, we can write a much prettier implementation though - the main improvement coming from the fact that Maybe is already alternative so this pattern match is unneeded.

  instance Alternatice Parser where 
    empty = Parser $ const Nothing
    Parser p <|> Parser q = Parser $ \inp -> p inp <|> q inp

This is much more elegant, the good thing here is that we now have the ability to choose between parsers. We also now have many and some , so many p creates a new parser that will repeatedly try and apply the parser p until it fails.

In terms of our parsers, the usefulness of alternatives are very intuitive, suppose we have a grammar like

  Instruction := "Up" | "Down"

What this means is that an instruction can be "Up" or "Down" . We can encode the choice between the two options very simply, it would just be

  pInstruction = pUp <|> pDown

So as you might imagine, this is going to be ubiqutous throughout our entire parsing journey.

Finishing Up

This part of the series might not have been the most exciting, but we know have all the setup we need to start doing anything we could ever want to do with our parsers, now we can start constructing more and more exciting parsers to be able to parse an entire grammar!