September 15 * FSM We begin the study of computing devices by studying "Finite State Machines". Computers consist of three main components: 1. cpu 2. auxiliary memory (distinct from cpu storage) 3. I/O devices FSM 1. cpu 2. no auxiliary memory at all 3. Input: string on input tape Output: yes / no This is a language recognition device Why study such limited machines? * Good foundation for studying machines with more abilities * Useful for some applications (lexical analysis in compilers, controllers) Input tape: +-+-+-+-+-+-+----------------------+ |0|1|1|0|0|1|... | +-+-+-+-+-+-+----------------------+ ^ | -------"Reading head" \ | / --- +-------/ o \-------+ + q4 --- q0 | | * | | \ |-----> ONE WAY | q10 \> q3 | | | | q7 | +-------------------+ O O Finite automaton M = (K, Sigma, delta, s, F) consists of * K, a finite set of states * Sigma, the alphabet * delta: K x Sigma -> 2^K, the "transition function", where 2^K is the set of all subsets of K * s in K is the "initial state" * F <= K is the set of "final states" Transition rules: "If in state q_i scanning 0, go to state q_j (and move right one cell)" If the light is on "after" reading the last character of the input, then M accepts the input. Otherwise, M rejects the input. We we first look at "Deterministic Finite Automata", because their operation is fully determined by their input. In a DFA, the transition function is constrained to be K x Sigma -> K, or with each rule there is exactly one resulting state. We can also view this as the transition delta(q, a) containing exactly one state for every (state, input) pair. A "nondeterministic" finite automaton is an unrestricted FA which may or may not satisfy this restriction. A DFA acts in the following manner: it begins in the starting state. Each input symbol causes a transition to a new state; if the machine is currently in state s and a is the input, the machine makes a transition to she state in the set delta(s, a). If delta(s, a) = empty (there is nowhere to go from state q with the input a), the machine immediately rejects the input string. This process continues until all of the input has been read. The machine then accepts the input string if it is in an accepting (final) state, and rejects otherwise. A convenient way to express a FA M = (K, Sigma, delta, s, F) is by its "state diagram" or "transition graph". This is a directed graph with * Sigma-labeled edges * Start vertex * Set of "final" vertices For example, let M = ({S,A}, {a,b}, delta, S, {S}), where delta(S,a) = {S}, delta(S,b) = {A}, delta(A,a) = {A}, delta(A,b) = {S} The transition graph for M is shown below. a a ----- --- v | b v | +===+---------->+-+ -->||S|| |A| +===+<----------+-+ b We represent the status of the machine at any moment using the DFA "configuration". The configuration of a DFA is any element of K x Sigma*. The configuration shown in the first machine is (q3, 01100). The binary relation |-M holds between two configurations of M iff the machine can pass from one to the other in a single move. (S, ab) |-M (A, b) for the machine above. We can also say that "(S, ab) yields (A, b)" in one step". If we use the symbol |-*M, we say that one configuration yields another after some number, possibly zero, of steps. A string w in Sigma* is "accepted" by M iff there is a state q in F s.t. (s, w) |-*M (q, e). The "language accepted by M", L(M), is the set of all strings accepted by M L(M) = {w: M accepts w} * Examples M = ({q0,q1,q2,q3}, {0,1}, d, q0, {q3}) d(q0,0) = {q1}, d(q0,1) = {q0} d(q1,0) = {q2}, d(q1,1) = {q0} d(q2,0) = {q3}, d(q2,2) = {q0} d(q3,0) = {q3}, d(q3,1) = {q3} Note that this machine is a DFA, because for each state s and each possible symbol a in Sigma, there is exactly one edge leaving s labeled with a. Final (accepting) states are double-circled. Q: Where does 001 lead? 10010? etc.? Q: Which strings end up in a final state? Prove it. M accepts w iff the unique path starting from s labeled with w ends at a final state. L is "regular" iff there exists a FA M s.t. L = L(M) (some DFA accepts exactly the strings in L). What is L(M)? M = ({q0,q1,q2,q3}, {0,1}, d, q0, {q0}) d(q0,0) = {q2}, d(q0,1) = {q1}, d(q1,0) = {q3}, d(q1,1) = {q0}, d(q2,0) = {q0}, d(q2,1) = {q3}, d(q3,0) = {q1}, d(q3,1) = {q2} All strings with an even number of 0s and 1s. M = ({q1,q1,q2}, {0,1}, d, q1, {q1,q2}) d(q1,0) = {q1}, d(q1,1) = {q2}, d(q2,0) = {q2}, d(q2,1) = {q3}, d(q3,0) = {q3}, d(q3,1) = {q3} All strings with at most one 1. Try to construct M from L. Problem 2.1.3 from textbook b) L(M) = {w in {a,b}*: w has abab as a substring} M = ? c) L(M) = {w in {a,b}*: w has neither aa nor bb as a substring} M = ? M = delta(q0,a) = q1 delta(q0,b) = q3 delta(q1,a) = q5 delta(q1,b) = q2 delta(q2,a) = q1 delta(q2,b) = q5 delta(q3,a) = q4 delta(q3,b) = q5 delta(q4,a) = q5 delta(q4,b) = q3 delta(q5,a) = 15 delta(q5,b) = q5 * NFA NFAs are an important concept in Computer Science. You see them in compiler construction and in Augmented Transition Networks (for Natural Language Processing). They seem unintuitive, because they allow many computation paths to be followed simultaneously. Graphical Definition: Allow 0, 1, or more edges labeled with a given symbol to emanate from the same state. Finite automaton M = (K, Sigma, delta, s, F) consists of * K, a finite set of states * Sigma, the alphabet * delta: K x Sigma -> 2^K, the "transition function", where 2^K is the set of all subsets of K * s in K is the "initial state" * F <= K is the set of "final states" M = ({q0,q1,q2,q3}, {0,1}, d, q0, {q3}) d(q0,0) = {q0}, d(q0,0) = {q1}, d(q0,1) = {q0} d(q1,0) = {q0}, d(q2,0) = {q0}, q1 has two edges labeled 0 q1,q2 have no edges labeled 1 q3 has no edges at all M accepts w iff there exists path labeled +-+ +===+ |s|-> ... -> || || +-+ +===+ L(M) = {w: exists path labeled w from q0 to accepting state }. In above example, L(M) = {u000: u in Sigma*} The NFA M accepts iff we can find any sequence of "choices" of transitions that will lead to an accepting state. It is helpful to think of M as splitting into several copies when it must make a choice on which transition to take, with each copy taking a different transition. Of course, a copy may split into further copies when confronted with another choice. M accepts iff any copy accepts. In this way, we explore all possible execution paths and accept if any path leads to acceptance. The term "non-deterministic" may not be the best term to use, because "non-determinism" implies a random or unpredictable process, which is not the case (as you will see when we show that for any NFA, a DFA can be constructed which accepts the same language). NFAs are valuable because they simplify many proofs and machine constructions. A transition can now also be represented as a triple (q, u, p), meaning that from state q a transition can be made to p when processing u. A configuration of M is an element of K x Sigma*, and the relation |-M (yields in one step) holds between (q,w) and (q',w') iff there is a u in Sigma* s.t. w = uw' and (q,u,q') in Delta (NFA transition rules). L(M) = {w: delta(s,w) intersect F neq empty} i.e., there exists a final state among the states you could end up in on reading w. So if there is any path from the start state to a final state parsing string w, then w in L(M). In the machine below, one path accepts 101 and others do not. As long as one does, 101 is in the language of the machine. To test a string, we would have to try all possible paths. 0,1 --- v | +-+ 0 +-+ 1 +-+ 0 +===+ |s|--->| |--->| |--->|| || +-+ +-+ +-+ +===+ | 1 v +-+ | | +-+ | 0 v +-+ | | +-+ | 1 v +===+ || || +===+ L(M) = (ab)*(ba)* v aa* M = ? L(M) = ((ab v aab)*a*)* M = ?