Parsing
CptS 355 - Programming Language Design, Washington State University
Today's main topic is to complete the discussion of
grammars.
This supplemental set of notes covers parsing: the process of
going from a string to a parse tree.
I won't lecture on this material and you are not responsible for it on
the exam, but I thought you might be wondering how parsing works.
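We looked at strings representing reverse Polish expressions, using a grammar built from the nonterminals expr, binop and num. The steps below rely on three of its rules: an expr can be expr expr binop, an expr can be a num, and a binop can be an operator such as + or *. As an example, we will parse the string
    6 7 8 + * 10 +
We will take two different approaches. First we look at a top-down approach: we create the parse tree beginning at the root. For this language, that works if we parse from right to left. Seeing the + at the right end of the string and looking at the grammar, we see that the only possible rule for the root of the tree is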
        expr
      /  |  \
  expr  expr  binop
                |
                +
Now we see 10 in the input. This must correspond to the second expr child of
the root, derived through num. So we now have:
        expr
      /  |  \
  expr  expr  binop
         |      |
        num     +
         |
        10
The next input symbol (moving to the left) is *, which must be derived from the
first expr child of the root. Again the only way is to use the expr rule for
binary operators, so we get
                                   expr
                          /         |   \
              expr                 expr  binop
        /      |      \             |      |
expr          expr         binop   num     +
                             |      |
                             *     10
The next input symbol, +, must be derived from the second of the two new
expr nodes, the one next to the binop that derives *. Again it must use the
binop rule for expr.
                                   expr
                          /         |   \
              expr                 expr  binop
        /      |      \             |      |
expr          expr         binop   num     +
           /   |   \         |      |
       expr   expr  binop    *     10
                      |
                      +
Now the 8, 7 and 6 correspond to the three remaining expr nodes from right to left.
                                   expr
                          /         |   \
              expr                 expr  binop
        /      |      \             |      |
expr          expr         binop   num     +
 |         /   |   \         |      |
num    expr   expr  binop    *     10
 |      |      |      |
 6     num    num     +
        |      |
        7      8
This concludes the construction of the parse tree using a top-down
technique. I'll stress again that the right-to-left input in this case
is an artifact of the grammar and language. Programming languages are
normally parsed left to right.
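As a rough illustration, the top-down, right-to-left steps above could be sketched in Python as follows. This is only a sketch under stated assumptions: the function names (parse_expr, parse_top_down), the nested-tuple representation of tree nodes, and the operators other than + and * are choices made for this example, not something taken from the grammar we used in class.

```python
# Rules assumed from the walkthrough:
#   expr  -> expr expr binop | num
#   binop -> + | * | - | /      (- and / are assumptions)
#   num   -> an unsigned integer
BINOPS = {"+", "*", "-", "/"}


def parse_expr(tokens, pos):
    """Parse one expr whose rightmost token is tokens[pos].

    Scans right to left and returns (tree, next_pos), where next_pos is
    the index just left of the tokens consumed. A tree is a nested tuple
    whose first element is the nonterminal name.
    """
    if pos < 0:
        raise SyntaxError("ran out of input while expecting an expr")
    tok = tokens[pos]
    if tok in BINOPS:
        # An operator at the right end forces expr -> expr expr binop.
        op = ("binop", tok)
        second, pos = parse_expr(tokens, pos - 1)  # expr nearest the operator
        first, pos = parse_expr(tokens, pos)       # expr further to the left
        return ("expr", first, second, op), pos
    if tok.isdigit():
        # A number forces expr -> num.
        return ("expr", ("num", tok)), pos - 1
    raise SyntaxError(f"unexpected token {tok!r}")


def parse_top_down(text):
    """Build the parse tree for a whitespace-separated reverse Polish string."""
    tokens = text.split()
    tree, pos = parse_expr(tokens, len(tokens) - 1)
    if pos != -1:
        raise SyntaxError("unused input to the left of the expression")
    return tree
```

Calling parse_top_down("6 7 8 + * 10 +") returns a nested tuple with the same shape as the final tree above: the root expr has the expr for 6 7 8 + *, the expr for 10, and the binop for + as its three children.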
The other way to parse is bottom-up. In this case we look at the input and determine the lowest levels of the parse tree that could correspond to that input, gradually building up the tree. For our example we will parse left to right and bottom-up. The first input symbol is 6. This can only correspond to a num nonterminal, so we get
num
 |
 6
A num nonterminal can only be derived in one way, from an expr, so we get
expr
 |
num
 |
 6
Now there's nothing else that can be done, so we read the next symbol,
7. The same observations apply, so we end up with two "chunks" of
tree.
expr   expr
 |      |
num    num
 |      |
 6      7
and similarly for 8
expr   expr   expr
 |      |      |
num    num    num
 |      |      |
 6      7      8
We get to the + and observe that it can only be a binop.
expr   expr   expr  binop
 |      |      |      |
num    num    num     +
 |      |      |
 6      7      8
Observe now that the tops of the last three tree fragments in our list read
expr expr binop, which is the RHS of the binop expr rule, so we can construct a new expr node over these 3 fragments
              expr
           /   |   \
expr   expr   expr  binop
 |      |      |      |
num    num    num     +
 |      |      |
 6      7      8
After reading the next symbol, *, a similar thing happens:
              expr         binop
           /   |   \         |
expr   expr   expr  binop    *
 |      |      |      |
num    num    num     +
 |      |      |
 6      7      8
And again we have expr expr binop at the tops of the tree fragments so we
construct a new expr node over them
              expr
            /  |  \
   ---------- expr ---------
  /        /   |   \        \
expr   expr   expr  binop  binop
 |      |      |      |      |
num    num    num     +      *
 |      |      |
 6      7      8
You can work out on your own the next steps to parse the 10 and the +.
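If you would like to check your answer, the bottom-up procedure above can be sketched in Python as a loop over a list of tree fragments. This is a minimal sketch, not a definitive implementation: the name parse_bottom_up, the nested-tuple tree representation, and the operators other than + and * are assumptions made for this example.

```python
# Rules assumed from the walkthrough:
#   expr -> expr expr binop | num,  binop -> + | * | - | /  (- and / assumed)
BINOPS = {"+", "*", "-", "/"}


def parse_bottom_up(text):
    """Parse a reverse Polish string left to right, building the tree bottom-up.

    `fragments` plays the role of the list of tree fragments in the notes.
    Each fragment is a nested tuple whose first element names its top node.
    """
    fragments = []
    for tok in text.split():
        if tok.isdigit():
            # A number can only be a num, and a num can only come from an expr.
            fragments.append(("expr", ("num", tok)))
        elif tok in BINOPS:
            fragments.append(("binop", tok))
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
        # If the last three fragment tops read expr expr binop, that is the
        # RHS of the binop expr rule, so build a new expr node over them.
        tops = [frag[0] for frag in fragments[-3:]]
        if tops == ["expr", "expr", "binop"]:
            op = fragments.pop()
            second = fragments.pop()
            first = fragments.pop()
            fragments.append(("expr", first, second, op))
    if len(fragments) != 1 or fragments[0][0] != "expr":
        raise SyntaxError("input is not a single complete expr")
    return fragments[0]
```

One check per input symbol is enough here: right after a new expr node is built, the last fragment is an expr rather than a binop, so the pattern expr expr binop cannot match again until another operator is read.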
Bottom-up parsing for infix arithmetic expressions is similar, but it involves looking ahead to the next symbol to decide at each step whether to construct a new node over the existing nodes or to retain the current list of fragments and move on to the next symbol.
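To make the lookahead decision concrete, here is a small Python sketch for infix input. Everything specific in it is an assumption for illustration, since these notes give no infix grammar: the rule expr -> expr binop expr, the PRECEDENCE table, left associativity, the absence of parentheses, and the name parse_infix are all hypothetical.

```python
# Hypothetical infix setup (not from the notes):
#   expr -> expr binop expr | num, with * and / binding tighter than + and -.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def parse_infix(text):
    """Bottom-up parse of a whitespace-separated infix string.

    After each symbol, the next symbol (the lookahead) decides whether to
    build a node over the last three fragments or to keep reading.
    """
    tokens = text.split()
    fragments = []
    for i, tok in enumerate(tokens):
        if tok.isdigit():
            fragments.append(("expr", ("num", tok)))
        elif tok in PRECEDENCE:
            fragments.append(("binop", tok))
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
        lookahead = tokens[i + 1] if i + 1 < len(tokens) else None
        while [frag[0] for frag in fragments[-3:]] == ["expr", "binop", "expr"]:
            pending_op = fragments[-2][1]
            # If the upcoming operator binds tighter, leave the fragments
            # alone and move on to the next symbol.
            if PRECEDENCE.get(lookahead, 0) > PRECEDENCE[pending_op]:
                break
            right = fragments.pop()
            op = fragments.pop()
            left = fragments.pop()
            fragments.append(("expr", left, op, right))
    if len(fragments) != 1 or fragments[0][0] != "expr":
        raise SyntaxError("input is not a single complete expr")
    return fragments[0]
```

With this table, parse_infix("6 + 7 * 8") waits after reading 7 because the lookahead * binds tighter than the pending +, while parse_infix("6 * 7 + 8") builds the node for 6 * 7 as soon as it sees that the lookahead is +.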