Parsing
CptS 355 - Programming Language Design
Washington State University
Today's main topic is to complete the discussion of grammars. This supplemental set of notes covers parsing: the process of going from a string to a parse tree. I won't lecture on this material and you are not responsible for it on the exam, but I thought you might be wondering how parsing works.

We looked at parsing strings representing reverse Polish expressions using the following grammar:

<expr> -> <expr> <expr> <binop>
| <expr> <unop>
| <num>
<binop> -> + | - | * | /
<unop> -> ~ -- meaning negation
Now consider the problem of constructing a parse tree that corresponds to the string
   6 7 8 + * 10 +
We will take two different approaches. First we look at a top-down approach -- we create the parse tree beginning at the root. For this language, that works if we parse from right to left.

Seeing the + at the right of the string and looking at the grammar we see that the only possible rule for the root of the tree is

<expr> -> <expr> <expr> <binop>
so we write
                                  expr 
                                /  |   \ 
                           expr   expr   binop 
                                           |
                                           +
Now we see 10 in the input. This must correspond to the second expr in the second row of the tree above, that is, the right operand of the root. So we now have:
                                  expr 
                                /  |   \ 
                           expr   expr   binop 
                                   |        |
                                  num       +
                                   |
                                  10
The next input symbol (moving to the left) is *, which must be derived from the first expr in the second row. Again the only way is to use the expr rule for binary operators, so
                                  expr 
                                /  |   \ 
                           expr   expr   binop 
                         /  |  \     |       |
                        /   |   \   num      +
                       /    |    |   |
                      /     |    |   10
                  expr   expr   binop 
                                 |
                                 *
The next input symbol, +, must be derived from the second expr in the bottom row (the one just to the left of the binop over *), and again the only way is to use the expr rule for binary operators.
                                  expr 
                                /  |   \ 
                           expr   expr   binop 
                         /  |  \     |       |
                        /   |   \   num      +
                       /    |    |   |
                      /     |    |   10
                  expr   expr   binop 
                        /  |  \   |
                       /   |   \  *
                      /    |    \
                    expr  expr  binop 
                                  |
                                  +
Now 8, 7 and 6 correspond, from right to left, to the three remaining expr nodes, each of which derives a num.
                                  expr 
                                /  |   \ 
                           expr   expr   binop 
                         /  |  \     |       |
                        /   |   \   num      +
                       /    |    |   |
                      /     |    |   10
                  expr   expr   binop 
                  /     /  |  \   |
                 /     /   |   \  *
                /     /    |    \
               num  expr  expr  binop 
                |    |     |      |
                6   num   num     +
                     |     |
                     7     8
This concludes the construction of the parse tree using a top-down technique. I'll stress again that the right-to-left input in this case is an artifact of the grammar and language. Programming languages are normally parsed left to right.
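
The top-down, right-to-left procedure above can be sketched in a few lines of Python. This is only an illustration, not part of the course material: the function names, the tuple representation of tree nodes, and the assumption that a num is a string of decimal digits (the grammar above does not define num) are all mine.

```python
BINOPS = {"+", "-", "*", "/"}
UNOPS = {"~"}


def parse_expr(tokens):
    """Remove one <expr> from the RIGHT end of tokens and return its tree."""
    if not tokens:
        raise ValueError("input ran out while an <expr> was still expected")
    tok = tokens.pop()  # rightmost unread symbol
    if tok in BINOPS:
        # <expr> -> <expr> <expr> <binop>
        # Reading right to left, the second operand is met first.
        second = parse_expr(tokens)
        first = parse_expr(tokens)
        return ("expr", first, second, ("binop", tok))
    if tok in UNOPS:
        # <expr> -> <expr> <unop>
        operand = parse_expr(tokens)
        return ("expr", operand, ("unop", tok))
    if tok.isdigit():
        # <expr> -> <num>  (assumption: a num is a run of decimal digits)
        return ("expr", ("num", tok))
    raise ValueError("unexpected symbol: " + tok)


def parse_top_down(text):
    """Parse a whitespace-separated reverse Polish string into a tree."""
    tokens = text.split()
    tree = parse_expr(tokens)
    if tokens:
        raise ValueError("symbols left over: " + " ".join(tokens))
    return tree
```

Calling parse_top_down("6 7 8 + * 10 +") builds the same tree as the final diagram above, with each node written as a tuple whose first element is the nonterminal name. The choice of rule is driven entirely by the symbol just read, which is why no backtracking is needed for this grammar.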

The other way to parse is bottom-up. In this case we look at the input and determine the lowest levels of the parse tree that could correspond to that input, gradually building up the tree. For our example we will parse left to right and bottom up.

The first input symbol is 6. This can only correspond to a num nonterminal so we get

    num 
     |
     6
A num nonterminal can only be derived in one way, from an expr so we get
    expr 
     |
    num 
     |
     6
Now there's nothing else that can be done, so we read the next symbol, 7. The same observations apply, so we end up with two "chunks" of tree.
    expr    expr 
     |       |
    num     num 
     |       |
     6       7
and similarly for 8
    expr    expr   expr 
     |       |      |
    num     num    num 
     |       |      |
     6       7      8
We get to the + and observe that it can only be a binop.
    expr    expr   expr   binop 
     |       |      |       |
    num     num    num      +
     |       |      |
     6       7      8
Observe now that at the tops of the tree fragments in our list we have
 expr  expr  binop 
which is the right-hand side (RHS) of the binary-operator rule for expr, so we can construct a new expr node over these three fragments
                   expr    
                /   |   \
    expr    expr   expr   binop 
     |       |      |       |
    num     num    num      +
     |       |      |
     6       7      8
After reading the next symbol, *, we again get a new binop fragment:
                   expr             binop 
                /   |   \             |
    expr    expr   expr   binop       *
     |       |      |       |
    num     num    num      +
     |       |      |
     6       7      8
And again we have expr expr binop at the tops of the tree fragments so we construct a new expr node over them
                   expr           
                 /  |  \   
       ---------   expr  ---------- 
      /         /   |   \          \
    expr    expr   expr   binop    binop
     |       |      |       |        |
    num     num    num      +        *
     |       |      |
     6       7      8
You can work out on your own the next steps to parse the 10 and the +.
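
The bottom-up procedure can also be sketched in Python as a loop over a list of tree fragments. As before, this is a rough illustration under my own assumptions: the names, the tuple representation of nodes, and the treatment of num as a string of decimal digits are not from the grammar above.

```python
BINOPS = {"+", "-", "*", "/"}
UNOPS = {"~"}


def parse_bottom_up(text):
    """Parse a reverse Polish string left to right, building the tree bottom-up."""
    fragments = []  # the list of tree "chunks", in left-to-right order
    for tok in text.split():
        # Read the next symbol and put the lowest-level node(s) over it.
        if tok in BINOPS:
            fragments.append(("binop", tok))
        elif tok in UNOPS:
            fragments.append(("unop", tok))
        elif tok.isdigit():
            # A num can only come from an expr, so build both nodes at once.
            fragments.append(("expr", ("num", tok)))
        else:
            raise ValueError("unexpected symbol: " + tok)

        # If the tops of the rightmost fragments match the RHS of a rule,
        # construct a new expr node over them.
        kinds = [frag[0] for frag in fragments[-3:]]
        if kinds == ["expr", "expr", "binop"]:
            fragments[-3:] = [("expr", *fragments[-3:])]
        elif kinds[-2:] == ["expr", "unop"]:
            fragments[-2:] = [("expr", *fragments[-2:])]

    if len(fragments) != 1 or fragments[0][0] != "expr":
        raise ValueError("input is not a single well-formed expression")
    return fragments[0]
```

One check after each symbol is enough here, because a newly built expr never completes the right-hand side of another rule: every rule with more than one symbol on the right ends in an operator. Running parse_bottom_up("6 7 8 + * 10 +") finishes the steps left as an exercise above and ends with a single fragment, the same tree that the top-down approach produced.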

Bottom-up parsing for infix arithmetic expressions is similar but involves looking ahead to the next symbol to decide at each step whether to construct a new node on top of the existing nodes or to just retain the current list of fragments and move on to the next symbol.
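
To make the role of the lookahead concrete, here is a small sketch for infix input. These notes do not give an infix grammar, so everything specific here is an assumption of mine: the rule expr -> expr binop expr, the usual precedences (* and / bind tighter than + and -), left associativity, no parentheses, and whitespace-separated symbols.

```python
# Assumed precedences; a higher number binds more tightly.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def parse_infix(text):
    """Bottom-up parse of an infix string, using one symbol of lookahead."""
    tokens = text.split()
    fragments = []
    for i, tok in enumerate(tokens):
        lookahead = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in PRECEDENCE:
            fragments.append(("binop", tok))
            continue
        if not tok.isdigit():
            raise ValueError("unexpected symbol: " + tok)
        fragments.append(("expr", ("num", tok)))

        # Build a new expr over "expr binop expr" only if the next symbol
        # does not bind more tightly than the operator already in the list;
        # otherwise keep the fragments as they are and read on.
        while (
            [frag[0] for frag in fragments[-3:]] == ["expr", "binop", "expr"]
            and (
                lookahead is None
                or PRECEDENCE.get(lookahead, 0) <= PRECEDENCE[fragments[-2][1]]
            )
        ):
            fragments[-3:] = [("expr", *fragments[-3:])]

    if len(fragments) != 1 or fragments[0][0] != "expr":
        raise ValueError("input is not a single well-formed expression")
    return fragments[0]
```

For 1 + 2 * 3, after reading 2 the fragment tops are expr binop expr, but the lookahead is *, so no node is built yet and 2 * 3 ends up grouped first. For 1 * 2 + 3 the lookahead is +, so the node over 1 * 2 is built right away.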

(c) 2003 Curtis Dyreson, (c) 2004-2006 Carl H. Hauser           E-mail questions or comments to Prof. Carl Hauser