October 1 * Context-Free Grammars Conventions a,b,c,d,... are terminal symbols A,B,C,D,... are nonterminal symbols u,v,w, ... are strings consisting of terminal symbols alpha, beta, gamma, ... are strings in the alphabet X,Y,Z, ... is a symbol in the alphabet Example G = ({S,0,1}, {0,1}, R, S) R = {S->aSb, S->e} For any strings u,v in V*, we write u =>G v iff there are strings x,y in V* and A in NT s.t. u = xAy, v=xv'y, and A ->G v'. The relation =>_G* is the reflexive, transitive closure of =>G. L(G) is the "language generated by G", or {w in Sigma*: S =>_G* w} G generates each string in L(G). A language L is a "context-free language" if L = L(G) for some context-free grammar G. If the choice of grammar is obvious, we will use -> instead of ->G and => instead of =>G. S -> aSb -> aaSbb -> aabb A sequence of this form is a "derivation" in G of aabb from S. w0 =>G w =>G ... =>G wn is a derivation of wn in G from w0. The natural number n here is the "length" of the derivation, and the derivation has n "steps". What is L(G)? L(G) = {0^n 1^n : n >= 0}. In fact, all regular languages are context free and some languages that are not regular are context free. Prove that S -> 0S0 | 1S1 | e | 0 | 1 generates palindromes. 1. PAL <= L(G) If w is a palindrom, then show by induction in |w| that S =>* w. If |w| = 0, w = e and we are done because S -> e. If |w| = n+1, then w = 1u1 or 0u0 where u is a palindrome of length n-1. By (unstated) inductive hypothesis, S =>* u, so S -> 1S1 => 1u1 = w. Use similar argument for 0u0. 2. L(G) <= PAL If S=>* w then w is a palindrome. Prove by induction on |w|. Clear for |w| = 0,1. If |w|>1, then in derivation of w, 1st production must be S -> 0S0 or S -> 1S1 where the inner S generates middle subword of w of length |w| - 2. Because of the inductive hypothesis, this word is a palindrome, and hence so is w. Tricks for Grammars. Union. Use S->A, S->B for A v B Repeated Characters. Use A-> aA | a. Repeated sets of characters in equal numbers. Use A -> bAc | e. Problem 3.1.3 part a. Construct a context-free grammar for L(G) = {wcw^R: w in {a,b}*} G = ({S,a,b}, {a,b,c}, R, S) R = S-> c | aSa | bSb * Parse trees Consider grammar S->E, E->E+E | E*E | a Derivation 1 S -> E -> E+E -> E*E+E -> a*E+E -> a*a+E -> a*a+a Show the parse tree This corresponds to (a*a)+a Derivation 2 S -> E -> E+a -> E*E+a -> a*E+a -> a*a+a Show the parse tree This corresponds to (a*a)+a Derivation 3 S -> E -> E*E -> a*E -> a*E+E -> a*a+E -> a*a+a Show the parse tree This corresponds to a*(a+a) Root of a parse tree is the start symbol Each branch corresponds to the application of a rule Internal nodes are nonterminals Leaves are terminals Two specific types of derivations: 1. Leftmost derivation Replace the leftmost nonterminal with each rule application 2. Rightmost derivation Replace the rightmost nonterminal with each rule application Theorem 3.2.1 Let G = (V, Sigma, R, S) be a CFG, and let A be a NT and w in Sigma*. Then the following statements are equivalent: (a) A =>* w. (b) There is a parse tree with root A and yield w. (c) There is a leftmost derivation A =>^{L*} w. (d) There is a rightmost derivation A =>^{R*} w. Grammars with two or more distinct parse trees are "ambiguous". Ambiguous grammars are difficult because more than one parsing of a sentence is possible. Programming languages are not "inherently ambiguous" (not all CFG that generate the language must be ambiguous). * Exam Review