```October 1

* Context-Free Grammars

Conventions
a,b,c,d,... are terminal symbols
A,B,C,D,... are nonterminal symbols
u,v,w, ... are strings consisting of terminal symbols
alpha, beta, gamma, ... are strings in the alphabet
X,Y,Z, ... is a symbol in the alphabet

Example

G = ({S,0,1}, {0,1}, R, S)
R = {S->aSb, S->e}

For any strings u,v in V*, we write u =>G v iff there are strings
x,y in V* and A in NT s.t. u = xAy, v=xv'y, and A ->G v'.

The relation =>_G* is the reflexive, transitive closure of =>G.

L(G) is the "language generated by G", or
{w in Sigma*:  S =>_G* w}
G generates each string in L(G).
A language L is a "context-free language" if L = L(G) for some
context-free grammar G.

If the choice of grammar is obvious, we will use -> instead of ->G and =>

S -> aSb -> aaSbb -> aabb

A sequence of this form is a "derivation" in G of aabb from S.
w0 =>G w =>G ... =>G wn is a derivation of wn in G from w0.

The natural number n here is the "length" of the derivation, and the
derivation has n "steps".

What is L(G)?

L(G) = {0^n 1^n : n >= 0}.

In fact, all regular languages are context free and some languages
that are not regular are context free.

Prove that S -> 0S0 | 1S1 | e | 0 | 1 generates palindromes.

1.  PAL <= L(G)

If w is a palindrom, then show by induction in |w| that S =>* w.

If |w| = 0, w = e and we are done because S -> e.

If |w| = n+1, then w = 1u1 or 0u0 where u is a palindrome of length n-1.

By (unstated) inductive hypothesis, S =>* u, so
S -> 1S1 => 1u1 = w.  Use similar argument for 0u0.

2.  L(G) <= PAL

If S=>* w then w is a palindrome.

Prove by induction on |w|.  Clear for |w| = 0,1.

If |w|>1, then in derivation of w, 1st production must be
S -> 0S0 or S -> 1S1

where the inner S generates middle subword of w of length
|w| - 2.  Because of the inductive hypothesis, this word is a
palindrome, and hence so is w.

Tricks for Grammars.

Union.  Use S->A, S->B for A v B
Repeated Characters.  Use A-> aA | a.
Repeated sets of characters in equal numbers.  Use A -> bAc | e.

Problem 3.1.3 part a.

Construct a context-free grammar for L(G) = {wcw^R:  w in {a,b}*}

G = ({S,a,b}, {a,b,c}, R, S)
R = S-> c | aSa | bSb

* Parse trees

Consider grammar S->E, E->E+E | E*E | a

Derivation 1
S -> E -> E+E -> E*E+E -> a*E+E -> a*a+E -> a*a+a
Show the parse tree
This corresponds to (a*a)+a

Derivation 2
S -> E -> E+a -> E*E+a -> a*E+a -> a*a+a
Show the parse tree
This corresponds to (a*a)+a

Derivation 3
S -> E -> E*E -> a*E -> a*E+E -> a*a+E -> a*a+a
Show the parse tree
This corresponds to a*(a+a)

Root of a parse tree is the start symbol
Each branch corresponds to the application of a rule
Internal nodes are nonterminals
Leaves are terminals

Two specific types of derivations:
1.  Leftmost derivation
Replace the leftmost nonterminal with each rule application
2.  Rightmost derivation
Replace the rightmost nonterminal with each rule application

Theorem 3.2.1
Let G = (V, Sigma, R, S) be a CFG, and let A be a NT and w in Sigma*.
Then the following statements are equivalent:

(a) A =>* w.
(b) There is a parse tree with root A and yield w.
(c) There is a leftmost derivation A =>^{L*} w.
(d) There is a rightmost derivation A =>^{R*} w.

Grammars with two or more distinct parse trees are "ambiguous".
Ambiguous grammars are difficult because more than one parsing of a
sentence is possible.
Programming languages are not "inherently ambiguous"
(not all CFG that generate the language must be ambiguous).

* Exam Review
```