```September 29

* Algorithms for DFAs

* DFA State Minimization

Although FSM cannot perform all tasks, they are important building blocks
As such, we need to find ways to minimize FSM representation
(minimize the number of states in a DFA)

Minimization algorithm
1.  Delete unreachable states
2.  Merge indistinguishable (equivalent) states

b
+====+ a  +--+--->+====+  b +--+
--->||q1||--->|q2| a  ||q3||<---|q8|
+====+    +--+<---+====+    +--+
|  ^      |       |  ^     |  ^
b|  |a    a|      b|  |a   a|  |b
|  |      |       |  |     |  |
v  |      v       v  |     v  |
+--+  b  +--+  b  +--+  a +====+
|q4|---->|q5|<----|q6|<---||q7||
+--+     +--+     +--+    +====+
^  |
|  |
+--+
a,b

The language of this machine is (ab v ba)*

1.  Delete unreachable states

R = {s}
while there is a state p in R and a in Sigma s.t. delta(p,a) not in R do
add delta(p,a) to R

In this way we remove all states that have no path from the start state.
Reachable states are the closure of {s} under the relation
{(p,q): delta(p,a) = q for some a in Sigma}.

R = {q1}
R = {q1,q2,q4}
R = {q1,q2,q3,q4,q5}
R = {q1,q2,q3,q4,q5,q6}

Delete q7 and q8

b
+====+ a  +--+--->+====+
--->||q1||--->|q2| a  ||q3||
+====+    +--+<---+====+
|  ^      |       |  ^
b|  |a    a|      b|  |a
|  |      |       |  |
v  |      v       v  |
+--+  b  +--+  b  +--+
|q4|---->|q5|<----|q6|
+--+     +--+     +--+
^  |
|  |
+--+
a,b

2.  Merge indistinguishable (equivalent) states

Two states are equivalent if, from either state, the same set of
strings is accepted.

States x and y are "equivalent with respect to language L", L in Sigma*,
or x ~~_L y ("~~" is actually one ~~ on top of another), if for all
z in Sigma* xy in L iff yz in L.

~~_L here is an equivalence relation (clusters states).

So x ~~_L y if both strings belong to L or neither is in L AND
if appending any string to both of them results in strings both in L
or both not in L.

Remember that we denote an equivalence class by listing any member of
one of the classes in square brackets.

An equivalence class here is the set of strings such that for any two
elements x and y in the class, x ~~_L y

The example DFA has four equivalence classes:

(1) [e]    This is the language itself
(2) [a]    This is a string in the language followed by a single a
(3) [b]    This is a string in the language followed by a single b
(4) [aa]   This is a string in the language followed by either aa
or bb followed by anything (no string after this will lead
to a final state)

Two strings x,y in Sigma* are "equivalent with respect to M",
x ~_M y, if they both drive M from s to the same state.
x ~_M y if there is a state q such that (s,x) |-*M (q,e) and
(x,y) |-* (q,e).  The relation "~_M" is also an equivalence relation -
its equivalence classes can be identified by states of M that are
reachable from s and have at least one string in the corresponding
equivalence class.  The class corresponding to state qi is E_qi.

The example DFA has six equivalence classes:

(1) E_q1 = (ba)*   All of these strings end up in state q1
(2) E_q2 = abLa    String ab followed by any string in L followed by a
(3) E_q3 = abL
(4) E_q4 = b(ab)*
(5) E_q5 = L(b v aa)Sigma*
(6) E_q6 = abLb

For any DFA M and strings x,y in Sigma*, if x ~_M y, then x ~~_L y.
Thus ~_M is a "refinement" of ~~_L in that ~_M implies ~~_L but not
the other way around.

The equivalence relation that relates any two cities in the US that are
in the same county is a refinement of the relation between any two cities
that are in the same state.

If we can whittle the equivalence classes of ~_M down to the equivalence
classes of ~~_L, we have minimized the DFA.

Myhill-Nerode Theorem
Let L in Sigma* be a regular language.  Then there is a DFA
with the same number of states as there are equivalence classes in ~~L.

In our example, we can merge q1 and q3 and merge q4 and q6 yielding
a four-state DFA
b
+====+ a  +--+
--->||q1||--->|q2|
+====+    +--+
|  ^      |
b|  |a    a|
|  |      |
v  |      v
+--+  b  +--+
|q4|---->|q5|
+--+     +--+
^  |
|  |
+--+
a,b

Algorithm to distinguish states

(1) Every final state is distinguished from every nonfinal state
(2) Forall pairs p,q not distinguished, check if there exists a s.t.
delta(p,q), delta(q,a) are distinguished.
If so, make p,q distinguished.

Example

1         1
+-+     +------+
| |     |      |
v |     v      |
+-+ 0  +-+ 0  +-+ 0  +===+
|a|--->|b|--->|c|--->||e||
+-+    +-+    +-+    +===+
^ \1     ^-\ / 1|
|  \        /   | 0
1|   \      / \  |
|    \+v v+   \1v
|      +-+ 0  +===+
+------|d|--->||f||
+-+    +===+
^   |
|   |
+---+
0

a
+-+
b |3|
+-+-+
c |2|2|
+-+-+-+
d |2|2| |
+-+-+-+-+
e |1|1|1|1|
+-+-+-+-+-+
f |1|1|1|1| |
+-+-+-+-+-+
a b c d e

PAIRS	INPUT		RESULT		RESULT

a,b		0,1		b,c		a,d

a,c		0,1		*b,e*		a,b

a,d		0,1		*b,f*		a,b

b,c		0,1		*c,e*		b,d

b,d		0,1		*c,f*		d,b

c,d		0,1		e,f		b,b

e,f		0,1		f,f		d,c

a,b		0,1		*b,c*		a,c

c,d		0,1		e,f		b,b

e,f		0,1		f,f		d,c

1                       0
+-+                     +--+
| |                     |  |
v |       0,1       0   v  |
+-+ 0  +-+--->+---+--->+=====+
|a|--->|b|    |c,d|    ||e,f||
+-+    +-+<---+---+<---+=====+
1        1

* Context-Free Languages

Proper subset of regular languages
+-------+
|  CFL  |
|       |
| +---+ |
| |REG| |
| +---+ |
|       |
+-------+

For example, {0^n 1^n : n>=0} is a context-free language.

We will look at a language generator and a language recognizer for CFL.
Humans do this regularly, we know that "the cad is in the hat" is
syntactically correct, while "hat the the in is cat" is not.
Humans generate valid language when they speak and when they write.

Consider the re a(a* v b*)b.
How can we rewrite this re using a set of rules?
Let S be a new "start" symbol that we will eventually replace with
a string in the language.

S is replaced with a single a followed by a middle part followed by a b.
Let M be a new symbol that is a place holder for the middle part.  We can
then write our rule as

S -> aMb

Now we need a rule for the middle part.  M can be rewritten as a string of 0
or more "a"s or a string of 0 or more "b"s.  We will let A be a place holder
for the string of "a"s and let B be a place holder for the string of "b"s:

M -> A
M -> B
(either rule can be selected)

How do we rewrite A and B?  A can be replaced by e or by a single a
followed by a string of 0 or more "a"s.  We can rewrite B similarly.

A -> e
A -> aA
B -> e
B -> bB

To generate the string abbbbb we start with S and apply the rules
appropriately.

S -> aMb ->  aBb -> abBb -> abbBb -> abbbBb -> abbbbBb -> abbbbb

Context-Free Grammars (CFG)

G = (V, Sigma, R, S)

V:     alphabet
Sigma: Sigma in V (finite set of terminals)
R:     finite subset of (V - Sigma) x V* (rules)
S:     S in V - Sigma (start symbol)

Example

G = ({S}, {0,1}, {S->0, S->1, S->e, S->0S0, S->1S1}, S)
generates palindromes.

Note that these rules are context free because we only see a single
nonterminal on the left hand side of each rewrite rule.  We rewrite the
symbol independent of symbols around the nonterminal, or independently of the
symbol's context.
```