September 17 * Applications http://www.ibc.wustl.edu/~zuker/Bio-5495/sequences-html/node5.html Use FSM to look for a pattern in a DNA sequence * L(M) = ((ab v aab)*a*)* M = ? a b +-++--------+ v |v | --->+===+ a +-+ | +->|| ||--->| |-+ | +===+ +-+ | a| | v | +-+ b | | | | +-+ | a| | v | +-+ | | | | +-+ | ^ +----+ * NFAs with epsilon-transitions Allow change of state without reading input delta: K x (Sigma union {e}) -> P(S) Example: 0 1 +--+ +--+ v | v | +--+ e +--+ |q0|--->|q1| +--+ +--+ | e \ 1 | \ v v +--+ 0 +====+<+ |q3|--->||q2|| | 0 +--+ +====+-+ Provides flexibility in constructing FSM (0 v 11)101* +-+ 0 +-+ --->| |---->| | +-+ +-+ 1| | v | +-+ e| | | | +-+ | 1 1| | +-------------+ v | v | +-+ e +===+ 1 +-+ 0 +-+ | |---->|| ||--->| |--->| | +-+ +===+ +-+ +-+ * Equivalence of Techniques We have shown four methods of representing regular languages DFAs, NFAs, NFAs with epsilon-transitions, regular expressions We will now show that these are equivalent We will show methods of converting from one representation to another +-------->DFAs----------+ | | | | v v NFAs REs ^ | | | | | +------>NFAs with<------+ epsilon-transitions * DFAs <-> NFAs DFAs -> NFAs Theorem: if L is accepted by some DFA, then L is accepted by some NFA. Proof: Obvious. NFAs subsume DFAs. NFAs -> DFAs Theorem: if L is accepted by some NFA, then L is accepted by some DFA. Proof: Build a DFA that keeps track of all the states the NFA could be in at any time while reading the input. Note: Finite automata M1 and M2 are said to be "equivalent" iff L(M1) = L(M2). Example M1 = ({q0,q1,q2}, {0,1}, delta, q0, {q2}) d(q0,0) = q0, d(q0,0) = q1, d(q0,1) = q2, d(q1,0) = q2, d(q2,1) = q2, d(q2,1) = q1 M2 = ({{}, q0, q1, q2, q0q1, q0q2, q1q2, q0q1q2}, {0,1}, delta, q0, {q2, q0q2, q1q2, q0q1q2}) Note that * DFA states correspond to all possible sets of states of NFA * Final states correspond to all states containing NFA final states * Transitions correspond to NFA transitions d(q0,0) = q0q1 d(q0,1) = q2 d(q1,0) = q2 d(q1,1) = {} d(q2,0) = {} d(q2,1) = q1q2 d({},0) = {} d({},1) = {} d(q0q1,0) = q0q1q2 d(q0q1,1) = q2 d(q0q2,0) = q0q1 d(q0q2,1) = q1q2 d(q1q2,0) = q2 d(q1q2,1) = q1q2 d(q0q1q2,0) = q0q1q2 d(q0q1q2,1) = q1q2 Proof Let M = (K, Sigma, delta, s, F) be an NFA. Construct a DFA M' = (K', Sigma, delta', s', F') s.t. L(M) = L(M'). K' = P(K) = power set of K Note that we will remove states that cannot be reached s' = s F' = {k' in K': k' intersect F neq empty} delta'(q0q1...qi, a) = p0p1...pj iff delta({q0, q1, ..., qi}, a) = {p0, p1, ..., pj} Note that delta' is computed by applying delta to each state of K represented by q0q1...qi. When we apply delta to each of these states and take the union we get a new set of states, p0 ... pj. The new set of states has a representative state p0...pj, which is the value of delta'(q0q1...qi, a). Prove by induction on |w| that delta'(s', w) = q0q1...qi iff delta(s, w) = {q0, q1, ..., qi}. Base Case: Let |w| = 0. In this case s' = s and w must be e, and we know delta'(s',e) = s' and delta(s,e) = {s}. Inductive Hypothesis: Assume property holds for |w| <= n. Inductive Step: Prove property then holds for |w| = n+1. Let a represent the last symbol in the string (string is xa), |x| = n. delta'(s', xa) = delta'(delta'(s', x), a). From the inductive hypothesis we know delta'(s', x) = q0q1...qj iff delta(s, x) = {q0, q1, ..., qj}. From the definition of delta' we know delta'(q0q1...qj, a) = [r0r1...rk] iff delta({q0, q1, ..., qj}, a) = {r0, r1, ..., rk}. Thus we know that delta'(s', xa) = r0r1...rk iff delta(s, xa) = {r0, r1, ..., rk}. Thus the property holds. We also know that delta'(s', w) is in F' exactly when delta(s, w) contains a state of K that is in F. Thus L(M) = L(M'). Q.E.D. * NFAs <-> NFAs with epsilon-transitions NFAs -> NFAs with epsilon-transitions Theorem: if L is accepted by some NFA, then L is accepted by some NFA with epsilon-transitions. Proof: Obvious. NFAs with epsilon-transitions subsume NFAs. NFAs with epsilon-transitions -> NFAs Theorem: if L is accepted by some NFA with epsilon-transitions, then L is accepted by some NFA without epsilon-transitions. Proof: Given M = (K, Sigma, delta, s, F) which is an NFA with epsilon-transitions, construct M' = (K', Sigma, delta', s', F') which is an NFA without epsilon-transitions, s.t. L(M) = L(M'). K' = K s' = s delta' is an extension of delta. We add extra edges to compensate for lack of epsilon-edges. delta' has all non-epsilon-transitions that delta has. In addition, forall p,q in K, forall a in Sigma, if e a e p ~~~>--->~~~>q in M, where ~~~> indicates a finite-length path of e edges a then add transition p--->q in M' e F' = F unless s~~~>S_{accept} in M, in which case F' = F union {s}. For the previous example we note: q0 on 0 can lead to q0, q1, q2, q3 1 q1, q2, q3 q1 on 0 can lead to q2 1 q1, q2, q3 q2 on 0 can lead to q2 1 empty q3 on 0 can lead to q2 1 empty This defines our NFA without epsilon-transitions, F = {q2} because there was not a path from s to a final state consisting of epsilon-edges only in the original NFA. Leave the proof as an exercise. * Example of the conceptual power of nondeterminism Let L be a language. Define 1/2(L) to be {x: for some y such that |x| = |y|, xy is in L}. 1/2(L) is the set of first halves of strings in L. Prove that for each regular L, 1/2(L) is regular. Proof: Let M be a DFA for L. M' will be an NFA constructed to accept 1/2(L). M' uses the "finger method" as follows. M' places the left finger on s in M, and the right finger on s_i in M where s_i is a nondeterministic "guess" as to where input x will lead. As aharacters of x are read, M' moves the left finger along transitions as dictated by characters read, and simultaneously moves the right finger along a nondeterministically chosen edge (which may be labeled with any character, not necessarily the one of x being read). M' accepts x iff at the end its left finger is on s_i and its right finger is on the accept state. Thus M' accepts x iff * x leads to some state s_i in M, and * there exists y |y| = |x| s.t. y leads from s_i to an accept state in M. Formally: Use tuple for fingers (we need a middle finger to remember the guessed s_i state). M' = (K', Sigma, delta', s', F'), where K' = K_{left finger} x K{middle finger} x K{right finger} and s in K = new start state * delta'(s, epsilon) = {: si in S} "Guess" the state si to which input x will lead * delta'( , a) = { : delta(si, a) = sl and there exists b in Sigma s.t. delta(sk, b) = sm} Note that the middle finger does not change after the initial guess F' = { : si in S, sj in F} As you can see, the left finger moved on input x to where the middle finger guessed (si, si). The right finger moved from si to some accepting state sj of M (si, sj).