Thursday, January 7, 2010

Regexp => NFA => DFA problem for tomorrow's class

The main things I want to do in class Friday are:
  • work a few examples of translating regular expressions to non-deterministic finite-state acceptors
  • discuss a few practical complications that aren't apparent in the theory, like how the "maximum munch" rule and matching a lot of different patterns (with different actions) go together
  • discuss the limitations of lexical analysis (the pumping lemma, for those of you who took automata theory)
Here's a small example we can do on the board tomorrow, for the first two items above.  Imagine your language has integer constants and floating point constants.  Integer constants are a sequence of one or more integer digits.  It might be specified as
[0-9]+
or equivalently as
[0-9][0-9]+
Floating point constants consist of zero or more digits, followed by a decimal point, followed by one or more digits.  (So .00 is a floating point constant, but 0. is not, and neither is . )  This might be specified as
[0-9]*[.][0-9][0-9]*

We could carefully follow Thompson's construction to get the NFA for each of these and then combine them, but it's easier and less messy to just eyeball the pattern for integer constant and the pattern for floating point constant and create an automaton for each of them, then use Thompson's construction just to tie them together as
    ( integer constant ) |  (floating point constant )
and then we can go through the subset construction to get a single deterministic finite-state acceptor that is looking for both of them and succeeds if it matches either one.  You might want to try that in advance (but you don't have to).  If you do, consider what should happen with the following input:
998
99.foo
.998

No comments:

Post a Comment