January 11, 2008

* Context-free grammars
* Chomsky Normal Form
* Push-Down Automata
* Pumping Lemma for CFLs

CONTEXT-FREE LANGUAGES

Context-free grammars are a more powerful method for describing languages than regular grammars, which they include as a special case. They naturally capture fragments of natural language and of programming languages.

A CFG is a 4-tuple (V, Sigma, R, S), where
1. V is a finite set of variables
2. Sigma is a finite set of terminals, disjoint from V
3. R is a finite set of rules of the form v -> string of variables and terminals, with v in V
4. S in V is the start symbol

There are several forms for writing down context-free grammars. One canonical form is the Chomsky Normal Form. A CFG is in Chomsky Normal Form if every rule is of the form
    A -> BC
    A -> a
where a is any terminal and A, B, and C are any variables, except that B and C cannot be the start symbol S. In addition, the grammar may have the rule
    S -> eps

Regular languages are accepted by finite state automata. How about CFLs? They are accepted by pushdown automata (PDAs), which are equivalent in power to CFGs. A pushdown automaton is a nondeterministic finite state automaton plus a stack, which it can read and write in LIFO order.

Examples of CFLs:
-- Palindromes over {a,b}
-- {a^i b^j c^k | i,j,k >= 0 and (i = j or i = k)}

A CFG for Palindromes
    S -> eps
    S -> a
    S -> b
    S -> aSa
    S -> bSb

CNF for Palindromes
    S -> eps    S -> a    S -> b
    S -> AA     S -> BB   S -> AX    S -> BY
    P -> a      P -> b
    P -> AA     P -> BB   P -> AX    P -> BY
    X -> PA     Y -> PB
    A -> a      B -> b
Here P derives the nonempty palindromes, X derives Pa, and Y derives Pb. P duplicates the rules of S (other than S -> eps) because S may not appear on a right-hand side.

PDA for Palindromes
Nondeterministically guess whether the string has odd or even length, and guess the mid-point of the string. Push what is read in the first half onto the stack, skip the middle symbol if the length is odd, and then pop the stack while comparing with what is read in the second half. Accept if the stack is empty exactly when the input ends.

Pumping lemma for CFLs: If A is a context-free language, then there is a number p (the pumping length) such that every s in A of length at least p can be written as s = uvxyz satisfying the conditions
1. for each i >= 0, u v^i x y^i z is in A,
2. |vy| > 0, and
3. |vxy| <= p.

Examples of non-CFLs:
-- {a^i b^j c^k | 0 <= i <= j <= k}
-- {ww | w in {0,1}^*}

Deterministic PDAs do not have the same power as nondeterministic PDAs.
They are important from a programming-language point of view since many parsers implement deterministic PDAs (DPDAs). Languages accepted by DPDAs lie strictly between regular languages and CFLs.
-- {wcw^R} is accepted by a DPDA but is not regular
-- {ww^R} is a CFL but is not accepted by any DPDA

History: CFGs were proposed for natural languages by Chomsky (1956). For programming languages, the equivalent Backus-Naur form was used by Backus (1959) and Naur et al. (1960) to describe Algol. CFGs are essential for the implementation of compilers, and are also used to describe the structure of documents (DTDs in XML). PDAs were defined by Oettinger (1961) and Schutzenberger (1963). Their equivalence with CFGs is due to Chomsky (1961) and Evey (1963). LR(k) grammars, defined by Knuth (1965), generate exactly the languages accepted by DPDAs and form the basis for YACC.
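
Returning to the palindrome example: one way to sanity-check a Chomsky Normal Form grammar is to run the CYK membership algorithm on it and compare against a direct test. The sketch below is only an illustration, not part of the original notes. The rule set is written out in full inside the code and is an assumption of this example (S and P each get the terminal rules a and b and the pair rules AA, BB, AX, BY; X -> PA; Y -> PB), and the names cyk_accepts and all_strings are hypothetical helpers.

```python
from itertools import product

# Assumed CNF rule set for palindromes over {a, b}; P derives the nonempty ones.
TERMINAL_RULES = {            # rules of the form  A -> a, keyed by the terminal
    "a": {"S", "P", "A"},
    "b": {"S", "P", "B"},
}
BINARY_RULES = {              # rules of the form  A -> BC, keyed by (B, C)
    ("A", "A"): {"S", "P"},
    ("B", "B"): {"S", "P"},
    ("A", "X"): {"S", "P"},
    ("B", "Y"): {"S", "P"},
    ("P", "A"): {"X"},
    ("P", "B"): {"Y"},
}

def cyk_accepts(s):
    """CYK membership test: does the start symbol S derive s?"""
    n = len(s)
    if n == 0:
        return True           # covered by the special rule S -> eps
    # table[i][l] holds the variables that derive the substring s[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(s):
        table[i][1] = set(TERMINAL_RULES.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for left in table[i][split]:
                    for right in table[i + split][length - split]:
                        table[i][length] |= BINARY_RULES.get((left, right), set())
    return "S" in table[0][n]

def all_strings(max_len):
    """Yield every string over {a, b} of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)

# Strings on which the grammar and the direct palindrome test disagree.
mismatches = [s for s in all_strings(8) if cyk_accepts(s) != (s == s[::-1])]
print(mismatches)   # -> []
```

An empty list means the grammar agrees with the direct test on every string of length at most 8. This is evidence rather than a proof, which would go by induction on the length of the palindrome.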
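
The PDA for palindromes can also be sketched in code. Nondeterminism is simulated here by simply trying every guess (each mid-point, and both the odd and the even case) and accepting if any single branch accepts. This is a minimal illustration under that assumption, not an efficient recognizer, and the function names are hypothetical.

```python
def pda_branch_accepts(s, mid, skip_middle):
    """Run one nondeterministic branch of the palindrome PDA.

    mid         -- guessed number of symbols in the first half
    skip_middle -- True if the branch guesses that the length is odd
    """
    stack = []
    for ch in s[:mid]:            # phase 1: push the first half
        stack.append(ch)
    pos = mid
    if skip_middle:               # odd case: read the middle symbol, ignore it
        if pos >= len(s):
            return False          # there is no middle symbol to read
        pos += 1
    for ch in s[pos:]:            # phase 2: pop and compare with the second half
        if not stack or stack.pop() != ch:
            return False
    return not stack              # accept only if the stack is empty at the end

def pda_accepts(s):
    """Accept if some combination of guesses leads to an accepting branch."""
    return any(pda_branch_accepts(s, mid, skip)
               for mid in range(len(s) + 1)
               for skip in (False, True))

for w in ["", "a", "abba", "ababa", "ab", "abab"]:
    print(repr(w), pda_accepts(w))
```

The loop over guesses is exactly what a deterministic PDA cannot do: without a marker such as the c in {wcw^R}, it has no way to know when the first half ends.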
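
The pumping lemma argument for {a^i b^j c^k | 0 <= i <= j <= k} can be illustrated by brute force for one concrete value. The sketch below assumes p = 4 purely for illustration (the lemma only says that some p exists), takes s = a^p b^p c^p, and enumerates every split s = uvxyz with |vy| > 0 and |vxy| <= p, checking whether pumping with i = 0 and i = 2 stays in the language. The helper names are hypothetical.

```python
import re

def in_language(s):
    """Membership in {a^i b^j c^k | 0 <= i <= j <= k}."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    if m is None:
        return False
    i, j, k = (len(group) for group in m.groups())
    return i <= j <= k

def pumpable_decompositions(s, p, exponents=(0, 2)):
    """Return every split (u, v, x, y, z) of s with |vy| > 0 and |vxy| <= p
    whose pumped strings u v^i x y^i z are in the language for all given i."""
    n = len(s)
    found = []
    for start in range(n + 1):                           # u = s[:start]
        for end in range(start, min(start + p, n) + 1):  # vxy = s[start:end]
            for a in range(start, end + 1):              # v = s[start:a]
                for b in range(a, end + 1):              # x = s[a:b], y = s[b:end]
                    u, v, x, y, z = s[:start], s[start:a], s[a:b], s[b:end], s[end:]
                    if len(v) + len(y) == 0:
                        continue                         # violates |vy| > 0
                    if all(in_language(u + v * i + x + y * i + z) for i in exponents):
                        found.append((u, v, x, y, z))
    return found

p = 4                                   # assumed pumping length, for illustration
s = "a" * p + "b" * p + "c" * p
print(pumpable_decompositions(s, p))    # -> []
```

No split survives both pumping down (i = 0) and pumping up (i = 2). If vy contains no c, pumping up makes the count of a's or b's exceed the count of c's; if vy contains a c, then it contains no a, and pumping down leaves fewer b's than a's or fewer c's than b's. The actual proof runs this same case analysis for an arbitrary p.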