Week 11.1

CourseCompilers & Interpreters
SemesterS1 2022

Week 11.1

🏷️ Replace this section with some keywords that can be used for searching, tagging and the indexing of this page.

Sections

🌱 Links to all Heading1 Blocks in this page.

  1. Regular Expressions
  2. Finite Automata (State Machines)

1.0 - Regular Expressions

1.1 - Defining Lexical Tokens using Regular Expressions

  • In JFlex, we have:
    • Lexical tokens are specified via a regular expression

    • e.g. an IDENTIFIER token is specified as:

      {Letter} ({Letter} | {Digit})*
      
    • A regular expression is translated into a deterministic finite automaton (DFA) to recognise the tokens

    • DFA recognisers are very efficient (O(n) in the length of the input length n)

    • JFlex doesn’t directly produce a DFA - it first produces a non-deterministic finite automaton (NFA) which is then translated to a DFA to minimise size.

    • The DFA initially produced isn’t as small as it could be, and can be minimised to reduce the size further.

  • For example, the following NFA (Left) and DFA (Right) are for matching the expression a b βˆ£ ca\ b\ |\ c

Untitled

Untitled

  • Non-Deteterministic Finite Automata are able to have transitions out of a state on the same symbol (e.g. transitioning out of state 1 to state 2 and 6 on the symbol Ο΅\epsilon)

1.2 - Regular Expressions Syntax

  • Regular Expressions are defined with respect to an alphabet (set of symbols) Ξ£\Sigma

    🌱 The syntax of regular expressions in BNF is as follows

\ \ \ \ \ \ \ \ \ \ \ \ \ \ \def\rq{\text{'}} \def\or{\ |\ } e::=a\or\varepsilon\or\empty\or e\lq|\rq e\or e\ e\or e^{\lq*\rq} \or \lq(\rq\ e\ \lq)\rq

where a stands for a symbol from $\Sigma$. Repetition has higher precedence than concatenation, which has higher precedence than alternation. $a:$ Symbol from alphabet ($a\in\Sigma)$ $\varepsilon:$ Empty sequence of symbols $\empty:$ Represent matching of no input strings at all.

\def\rq{\text{'}} e\ \lq|\rq\ e:

Choice between two regular expressions $e\ e:$ Concatenation of regular expressions

\def\rq{\text{'}} e^{\lq * \rq}:

Repetitionofregularexpressions Repetition of regular expressions

\def\rq{\text{'}} \lq(\rq\ e\ \lq)\rq:

Grouping of regular expressions - Note that regular expressions use β€˜|’ to express alternatives, but is also used to represent alternatives in BNF. Hence the use of quotes around the symbol in the REGEX pattern matching string shown above. ## 1.3 - Language of Regular Expressions - Given an alphabet $\Sigma$, a regular expression defines a language (i.e. a set of strings of symbols from the alphabet). - Let β€œa” be a symbol, and $e, f$ be regular expressions

\def\rq{\text{'}}
\def\L{\mathcal{L}}
\begin{aligned}
\text{Regular Expression}\ &\ \ \ \ \ \  \text{Language}\\
\L(a)&=\{\lq a\rq\}\\
\L(\varepsilon)&=\{\lq\ \rq\}\\
\L(\empty)&=\{\}\\
\L(e|f)&=\L(e)\cup\L(f)\\
\L(e\ f)&=\L(e)^\frown\L(f)\\
\L(e^*)&=\cup_{n\in\mathbb{N}}(\L(e))^n\\
\L((e))&=\L(e)
\end{aligned}
$$
  • Note here that the symbol β€˜βŒ’β€™\def\rq{\text{'}}\lq^\frown\rq indicates set concatenation. In this context, it means that we can take any string in language L(e)\mathcal{L}(e) and concatenate it with any string in L(f)\mathcal{L}(f) to form the new language.
  • Note here in the second-last definition, the set is essentially formed by concatenating L(e)\mathcal{L}(e) with itself nn times.
  • In the last definition, the regular expression of a grouping is just the regular expression of the grouping itself.

1.3.1 - Concatenation of Languages

  • The concatenation of two languages (sets of strings), L1L_1 and L2L_2 is defined as the set of all strings formed by taking a string s1∈L1s_1\in L_1 and s2∈L2s_2\in L_2 and concatenating them.

  • For example:

    {β€˜a’,β€˜b’}⌒{β€˜c’,β€˜d’}={β€˜ac’,β€˜ad’,β€˜bc’,β€˜bd’}\def\rq{\text{'}}\{\lq a\rq,\lq b\rq\}^\frown\{\lq c\rq, \lq d\rq\}=\{\lq ac\rq, \lq ad\rq, \lq bc\rq, \lq bd\rq\}

1.3.2 - Iteration of Languages

  • The iteration LnL^n of a language, LL to the power nn consists of LL concatenated with itself nn times.

    L0={β€˜β€™}L1=LL2=L⌒LL3=L⌒L⌒Lβ‹―\def\rq{\text{'}} \begin{aligned} L^0&=\{\lq\rq\}\\ L^1&=L\\ L^2&=L^\frown L\\ L^3&=L^\frown L^\frown L\\ &\cdots \end{aligned}

  • For example, L(a βˆ£ b)={β€˜a’,β€˜b’}\def\rq{\text{'}}L(a\ |\ b)=\{\lq a\rq , \lq b\rq\} and

\def\rq{\text{'}} {\lq a\rq, \lq b\rq}^0 = {\lq\rq}\ {\lq a\rq, \lq b\rq}^1 = {\lq a\rq, \lq b\rq}\ {\lq a\rq, \lq b\rq}^2={\lq aa\rq, \lq ab\rq, \lq ba\rq, \lq bb\rq}\ {\lq a\rq, \lq b\rq}^3={\lq aaa\rq, \lq aab\rq , \lq aba\rq, \lq abb\rq, \lq baa\rq,\lq bab\rq,\lq bba\rq,\lq bbb\rq}

βˆ’Hence: - Hence:

\def\rq{\text{'}} \begin{aligned} \mathcal{L}((a|b)^*)\ =\ {&\lq\rq, \ &\lq a\rq, \lq b\rq,\ &\lq aa\rq, \lq ab\rq, \lq ba\rq, \lq bb\rq,\ &\lq aaa\rq, \lq aab\rq, \lq aba\rq, \lq abb\rq, \lq baa\rq, \lq bab\rq, \lq bba\rq,\lq bbb\rq\ \ \ \ \ \ \ \ \ \ \ &\cdots} \end{aligned}

\begin{aligned}
&\color{gray}\text{iteration 0 times}\\
&\color{gray}\text{iteration 1 time}\\
&\color{gray}\text{iteration 2 times}\\
&\color{gray}\text{iteration 3 times}\\\\
\end{aligned}

- Note that this language does not include any infinite strings of symbols, because the iterations only generates finite concatenations # 2.0 - Finite Automata (State Machines) - A finite automata is a finite state machine consisting of a finite set of states, with labelled transitions between states - Each transition is labelled with either a symbol $(a\in\Sigma)$ or empty $(\varepsilon)$. - We distinguish between `deterministic` and `non-deterministic` finite automata - For a `deterministic finite automaton` (DFA), empty transitions are not allowed, and for each state and symbol, the next state (if there is one) is uniquely determined. - For a `nondeterministic finite automata` (NFA), empty transitions are allowed and there may be multiple transitions from a state to different next states on the same input symbol. - To create a DFA, we first create a NFA ## 2.1 - Deterministic Finite Automata (Formal) - A DFA, $D$ consists of - A finite alphabet of symbols, $\Sigma$ - A finite set of states, $S$ - A transition function, $T: \Sigma\times Sβ‡ΈS,S$ which maps an (input) symbol and a (current) state to the (next) state; - The function T may not be defined for all pairs of symbol and state - Since the mapping may not exist for every symbol, state pair, we use the partial mapping symbol $β‡Έ$ - A start state $s_0$ - A set of final (or accepting) states, $F$ --- 🌱 The language of a DFA is the set of strings that it can recognise. - The language $\mathcal{L}(D)$ of a DFA $D$ is the set of finite strings of symbols from $\Sigma$ such that $c\in\mathcal{L}(D)$ if and only if there exists a sequence of states $s_1, s_2, \cdots, s_n$ where $n$ is the length of $c,$ such that:

T(c_1,s_0)=s_1\\
T(c_2,s_1)=s_2\\
\vdots\\
T(c_n, s_{n-1})=s(n)\\
$$
  • and s0s_0 is the start state and sns_n is in the set of final (or accepting) states, FF’
  • Through the concatenation of the labels c1,c2,⋯ ,cnc_1, c_2, \cdots, c_n, we should get the string that we’re trying to match.

2.1.1 - DFA Example

  • The following is a DFA as:
    • Ξ£={a,b,c}\Sigma=\{a,b,c\}
    • S={1,2,3,4}S=\{1, 2, 3, 4\}

T(a,1)=2\ T(c,1)=4\ T(b,2)=3

- $s_0=1$ as denoted by the arrow pointing into the state (this is convention) - $F=\{3,4\}$ as denoted by the red double circles. ![Untitled](/images/notes/COMP4403/Untitled%202.png) --- 🌱 What language (set of strings) does this DFA match? $$\def\rq{\text{'}} \mathcal{L}(DFA)=\{\lq ab\rq, \lq c\rq\}

  • In traversing from a start state to a final / accepting state, we can generate the strings β€œab” and β€œc’.
  • Therefore, the language of the DFA is $$ \def\rq{\text{'}} {\lq ab\rq, \lq c\rq}

## 2.2 - Nondeterministic Finite Automata (NFA) 🌱 The differences between the formal definition of a DFA and NFA are very similar - the differences are highlighted here in green. - A NFA, N, consists of: - A finite alphabet of symbols, $\Sigma$ - A finite set of states, $S$ - A transition function, $T:(\Sigma\cup\{\varepsilon\})\times Sβ‡Έ\mathbb{P}S$ which maps an input symbol or the empty string $(\varepsilon)$ and a current state to a set of possible (next) states; The function T may not be defined for all pairs of symbol and state - A start state, $s_0$ - A set of final (or accepting) states --- 🌱 The language of a NFA is the set of strings that it can recognise. This is similar to the definition of a language of a DFA. The differences are highlighted here in green. - The language $\mathcal{L}(D)$ of a NFA $N$ is the set of finite strings of symbols from $\Sigma$ such that $c\in\mathcal{L}(D)$ if and only if there exists a sequence $c'\in(\Sigma\cup\{\varepsilon\})^*$, such that removing all the $\varepsilon$ elements from $c'$ gives $c$ where $n$ is the length of $c',$ such that:

T(c_1,s_0)=s_1\\
T(c_2,s_1)=s_2\\
\vdots\\
T(c_n, s_{n-1})=s(n)\\
$$
  • and s0s_0 is the start state and sns_n is in the set of final (or accepting) states, FF’
  • Through the concatenation of the labels c1,c2,⋯ ,cnc_1, c_2, \cdots, c_n, we should get the string that we’re trying to match.
  • Empty transitions are used to generate the strings, but then omitted from the strings.

2.2.1 - NDA Example

  • The following is a NDA as:
    • Ξ£={a,b,c}\Sigma=\{a,b,c\}
    • S={1,2,3,4}S=\{1,2,3,4\}

\begin{aligned} T(\varepsilon,1)&={2,6}\ \ \ \ \ \ \ \ \ \ T(b,3)&={4}\ T(c,6)&={7} \end{aligned} \begin{aligned} T(a,2)={3}\ T(\varepsilon,4}={5}\ T(\varepsilon,7}={5} \end{aligned}

- $s_0=1$ - $F=\{5\}$ ![Untitled](/images/notes/COMP4403/Untitled%203.png) ## 2.3 - Converting a Regular Expression to NFA - The translation of a regular expression to a non-deterministic finite automaton is based on the structure of the regular expression - For each of the syntactic forms of regular expressions given in Definition 1, an NFA is given for that form. - If the syntactic form contains sub-expressions, (e.g. $e\ |\ f$ contains sub-expressions $e$ and $f$) the NFA for that form is constructed from the NFAs for the sub-expressions $e$ and $f$. 1. Single Symbol / Empty Transition 2. Concatenation 3. Alternatives 4. Repetition ### 2.3.1 - NFA for Single Symbol or Empty Transition 🌱 This is the NFA for matching a single symbol $a$ or empty transition $\varepsilon$ - Here, the final (accepting) states are indicated by a double circle. - Here, the topmost NFA matches the string $\def\rq{\text{'}}\lq a\rq$ - Additionally, the bottommost NFA matches the empty string, $\varepsilon$ ![Untitled](/images/notes/COMP4403/Untitled%204.png) ### 2.3.2 - NFA for Concatenation 🌱 This is the NFA template for matching the concatenation of the expressions $e\ f$. 1. Firstly, we transform the expression $e$ into its NFA ![Untitled](/images/notes/COMP4403/Untitled%205.png) 2. We next obtain the NFA for the expression $f$ ![Untitled](/images/notes/COMP4403/Untitled%206.png) 3. Joining the final state of expression $e$’s NFA to the start state of the NFA for expression $f$ gives the NFA for the concatenation of the expressions $e\ f$. - Note here that the final state of the NFA for expression $e$ is no longer a final state in the concatenated NFA. ![Untitled](/images/notes/COMP4403/Untitled%207.png) ### 2.3.3 - NFA for Alternatives 🌱 This is the NFA template for matching the alternative expressions $e\ |\ f$ 1. Firstly, we obtain the NFA for the expression $e$ ![Untitled](/images/notes/COMP4403/Untitled%205.png) 2. We similarly obtain the NFA for the expression $f$ ![Untitled](/images/notes/COMP4403/Untitled%206.png) 3. Joining the two NFAs in the following configuration gives the following larger NFA for alternation ![Untitled](/images/notes/COMP4403/Untitled%208.png) - Note that the final states of the NFAs for both $e$ and $f$are no longer final states in the larger NFA for $e\ |\ f$ ### 2.3.4 - Repetition 🌱 This is the NFA template for matching the repeated expression $e^*$ 1. Firstly, we obtain the NFA for the expression $e$ ![Untitled](/images/notes/COMP4403/Untitled%205.png) 2. We the introduce a new start and end state: - The start state has the option to match the expression $e$ again, or jump to the start state - The old end state may loop back to the old start state, or jump to the new end state: ![Untitled](/images/notes/COMP4403/Untitled%209.png) - We do this so that the repetition construct can match zero or more occurrences. ### 2.3.5 - NFA Example - Repetition and Concatenation 🌱 Construct the NFA for the Regular Expression $a\ b\ |\ c$ We want to construct the NFA for the regular expression $a\ b\ |\ c$. We recognise that the outermost construct is the alternative construct. To construct this NFA, we first need to create the NFAs for the sub-expressions $a\ b$ and $c$ 1. We first construct the NFA for the expression $a\ b$ - we recognise that this is a concatenation, and we first need to construct the NFAs for each of the sub-expressions 1. We first construct the NFA for the expression $a$ ![Untitled](/images/notes/COMP4403/Untitled%2010.png) 2. We the construct the NFA for the expression ![Untitled](/images/notes/COMP4403/Untitled%2011.png) 3. Using the NFA concatenation rules, we join the two sub-NFAs into the NFA for matching the concatenation of expressions $a\ b$ ![Untitled](/images/notes/COMP4403/Untitled%2012.png) 2. We the construct the NFA for the expression $c$ - this is straightforward as we are only creating a NFA that matches a single character. ![Untitled](/images/notes/COMP4403/Untitled%2013.png) Joining these sub-expression NFAs using the alternation rule, we get: ![Untitled](/images/notes/COMP4403/Untitled%2014.png) And, finally numbering the states: ![Untitled](/images/notes/COMP4403/Untitled%2015.png) ### 2.3.6 - Example for Repetition of Alternatives 🌱 Construct the NFA for the Regular Expression $(a\ |\ b)^*$ 1. We first construct the NFA for the expression $a$ ![Untitled](/images/notes/COMP4403/Untitled%2010.png) 2. We the construct the NFA for the expression $b$ ![Untitled](/images/notes/COMP4403/Untitled%2011.png) 3. We then place the NFAs for expression $a$ and $b$ into the NFA for alternatives ![Untitled](/images/notes/COMP4403/Untitled%2016.png) 4. We then place the NFA for the expression $a\ |\ b$ into the NFA for ![Untitled](/images/notes/COMP4403/Untitled%2017.png) ## 2.4 - From Regular Expression to NFA In the translation from a regular expression, $r$ to an NFA, the generated NFA has a few properties that do not necessarily hold for an arbitrary NFA (i.e. one not generated from a regular expression) - The NFA has a single final (accepting) state - The initial state of the NFA only has outgoing transitions - The final state only has incoming transitions The translation rules preserve these properties. ## 2.5 - From NFA to DFA A DFA cannot have - More than one transition leaving a state on the same symbol - Any empty transitions A NFA N can be translated to an equivalent DFA $D$ - Equivalent in the sense that they accept the same languages, i.e. $\mathcal{L}(N)=\mathcal{L}(D)$ - That is, they can match the same sets of strings How is this done? - An NFA is transformed into a DFA in which the labels on the states of the DFA are sets of states from the NFA - The sets of states that label a DFA state are formed by collecting all the states that can be reached from NFA states by empty transitions. ### 2.5.1 - Empty Closure **Empty Closure of a State** - The empty closure of a state $x$ in an NFA $N$ $\varepsilon$-closure$(x,N)$, is the set of states in $N$ that are reachable from $x$ via any number of empty transitions **Empty Closure of a Set of States** - The empty closure of a set of sates $X$ in a NFA $N$, $\varepsilon$-closure$(x,N)$, is the set of states in $N$ that are reachable from any of the states in $X$ via any number of empty transitions:

\varepsilon-\text{closure}(X,N)=\bigcup_{x\in X} \varepsilon-\text{closure}(x,N)
$$

Empty Closure Example

🌱 What is the empty closure of the NFA shown?

State X\text{State}\ XΞ΅βˆ’closure(x,NFA)\varepsilon-\text{closure}(x, NFA)
1{1,2,6}\{1, 2, 6\}
2{2}\{2\}
3{3}\{3\}
4{4,5}\{4,5\}
5{5}\{5\}
6{6}\{6\}
7{7,5}\{7, 5\}

Untitled

  1. From state 1, we can go to state 2 or 6 only transitioning on Ξ΅\varepsilon. Therefore, Ξ΅βˆ’closure(x,NFA)={1,2,6}\varepsilon-\text{closure}(x,NFA)=\{1, 2, 6\}
  2. From state 2, we can not go to any other states only transitioning on Ξ΅\varepsilon. Therefore, Ξ΅βˆ’closure(x,NFA)={2}\varepsilon-\text{closure}(x,NFA)=\{2\}
  3. From state 3, we can not go to any other states only transitioning on Ξ΅\varepsilon. Therefore, Ξ΅βˆ’closure(x,NFA)={3}\varepsilon-\text{closure}(x,NFA)=\{3\}
  4. From state 4, we can go to state 5 only transitioning on Ξ΅\varepsilon. Therefore, Ξ΅βˆ’closure(x,NFA)={4,5}\varepsilon-\text{closure}(x,NFA)=\{4,5\}
  5. From state 5, we can not go to any other states only transitioning on Ξ΅\varepsilon. Therefore, Ξ΅βˆ’closure(x,NFA)={5}\varepsilon-\text{closure}(x,NFA)=\{5\}
  6. From state 6, we can not go to any other states only transitioning on Ξ΅\varepsilon. Therefore, Ξ΅βˆ’closure(x,NFA)={6}\varepsilon-\text{closure}(x,NFA)=\{6\}
  7. From state 7, we can go to state 55 only transitioning on Ξ΅\varepsilon. Therefore, Ξ΅βˆ’closure(x,NFA)={7,5}\varepsilon-\text{closure}(x,NFA)=\{7, 5\}

2.6 - Constructing the DFA from the NFA

🌱 The first step in constructing the DFA from a NFA is determining the label of the start state.

  • The label of the start state of the DFA consists of the set of states containing the start state s0s_0 of the NFA plus all the states in the NFA that are reachable from its start state by one more more empty transitions:

    S0=Ξ΅βˆ’closure(s0,N)S_0=\varepsilon-\text{closure}(s_0,N)

  • The process for forming a DFA works with a set of unmarked DFA states by selecting an unmarked DFA state and considering all transitions from it on symbols

  • The initial set of unmarked states contains just S0S_0

    • Starting from the initial state, we go through every state that is reachable from the initial state.

The following process is repeated until there are no unmarked DFA states left.

  • An unmarked DFA state SS is selected (the first one is S0S_0)
  • For each symbol a (every symbol in the alphabet)
    • We consider the set of states that can be reached from any state in S by a transition on a call a;
    • call this set of states X
    • if X is nonempty, we add a new state to the DFA labelled with Xβ€²=Ξ΅βˆ’closure(X,N)X'=\varepsilon-\text{closure}(X, N), unless a state with that label already exists (in which case it is re-used)
    • A transition from SS to Xβ€²X' is added to the DFA
  • The state SSis marked as having been processed.

2.6.1 - NFA to DFA Process - Simple Example

🌱 Consider the following NFA and follow the (above) process to turn it into a DFA

Untitled

Untitled

  1. The start state of the DFA is going to consist of the initial start state, and all of the states that can be reached by state 1 by empty transitions. That is, our new initial stages merges the old states {1,2,6}\{1, 2, 6\}
  2. We now need to determine what states can be reached from this new state:
    • We can go from 2β†’3 on β€œa”
    • We can go from 6β†’7 on β€œc”
  3. On that transition to state 3, we transition to a new state that is state 3 + any state that we can reach from state 3 + empty transitions.
    • Since there are no such states, we just leave it as is
    • We can transition out of this state on β€œb”
  4. On that transition to state 7, we transition to a new state that is state 7 + any empty transition.
    • We can transition to state 5, therefore, the new state is {7,5}\{7, 5\}
  5. Revisiting the transition out of state 3 on β€œb”, we can get to state 4
    • This state also has a null transition to state 5: {4,5}\{4, 5\}
  6. The states that include 55 are final or accepting states as 5 was the original accepting state.

2.6.2 - NFA to DFA Process - Example 2

🌱 Consider the following NFA and follow the above process to turn it into a DFA for the REGEX statement for (b∣a)βˆ—aβˆ—(b|a)^* a^*

Untitled

  1. We start from the initial state of the DFA.
    • This state contains the NFA’s start state {1}\{1\}, and any state that we can get to via empty transitions {2,3,5,8,9,11}\{2, 3, 5, 8, 9, 11\}
    • Therefore, the empty closure of the start state is {1,2,3,5,8,9,11}\{1, 2, 3, 5, 8, 9, 11\}
      • This means that the start state of the DFA merges these states from the NFA
    • We now determine the transitions out of this state
      • We have a transition out on aa as 9β†’a109\overset{a}{\rightarrow}10
      • We have a transition out on bb as 3β†’b43\overset{b}{\rightarrow}4
      • We have a transition out on cc as 5β†’c65\overset{c}{\rightarrow} 6
    • We now mark the initial state as seen or visited, and move to other states.
  2. We now consider the transition out of the initial state on aa:
    • From NFA state 9, we can transition to state 10 on aa
    • From state 10, we can get to state 11 and back to state 9 on empty transitions
    • Therefore, the empty closure of this state is {10,9,11}\{10,9,11\}
  3. We now consider the transition out of the initial state on bb:
    • From NFA state 4, we can transition to the following states on empty transitions: {4,7,2,3,5,8,9,11}\{4, 7, 2, 3, 5, 8, 9, 11\}
    • From this state, we can transition out on aa, from 9β†’a109\overset a \rightarrow 10 which takes us to a state containing the empty closure of 10 (which is the state above)
    • From this state, we can transition out on bb, from 3β†’b43\overset b \rightarrow 4 which takes us to a state containing the empty closure of 4 (which is this state)
    • From this state, we can transition out on cc, from 5β†’c65\overset c \rightarrow 6 which takes us to a state containing the empty closure of 6 (which is the state below)
  4. We now consider the transition out of the initial state on cc:
    • From NFA state 5, we can transition to state 6 on cc
    • From state 6, we can get to the following states on empty transitions: (this is state 6’s empty closure) {6,7,2,3,5,8,9,11}\{6, 7, 2, 3, 5, 8, 9, 11\}
    • From this state, we can transition out on aa, from 9β†’a109\overset a \rightarrow 10 which takes us to a state containing the empty closure of 10
    • From this state, we can transition out on bb, from 3β†’b43\overset b \rightarrow 4 which takes us to a state containing the empty closure of 4
    • From this state, we can transition out on cc, from 5β†’c65\overset c \rightarrow 6 which takes us to a state containing the empty closure of 6 (which is this state)

Untitled

2.7 - Minimising a DFA

  • The DFA generated in the above example can be simplified to one with only two states.
  • To the right is a simplified version of the DFA with the states re-labelled to A, B, C and D
  • To minimise the DFA, we merge states that have the same transition to equivalent states.
  • For example, the states A, B and C are equivalent as they all:
    • Transition out on aa onto DD
    • Transition out on bb onto BB
    • Transition out on cc onto CC

Untitled

  • Therefore, the DFA can be minimised:

Untitled

2.7.1 - Process for Minimising a DFA

  • Start by partitioning the states into a group of final states, and a group of non-final states.

  • For example, in the example above, all of the states are final states so we just get the one group

    {A,B,C,D}\{A, B, C, D\}

  • For each symbol x,x, we require all sates in a group G1G_1 to transition to the same group G2G_2. For the example a transition on bb takes states A,B,CA, B, C back to the same group {A,B,C}\{A, B, C\} but there is no transition on bb from state DD, and hence we must split the group into groups {A,B,C}\{A, B, C\} and {D}\{D\}

  • Once that is done for every symbol xx, the transitions from all states in each group on xx are to the same group, and hence we have finished.

3.0 - Regular Expressions for Lexical Analyser

  • The lexical analyser generator JFlex makes use of translating regular expressions into DFAs in order to build the scanner
  • The input to the lexical analyser generator is a list of regular expressions, along with an action to be taken when that token is matched.
  • The generated lexical analyser matches:
    • The longest prefix of the input from the current position that matches any of the regular expressions in the list and
    • If the prefix of the input matches more than one regular expression, the action associated with the first matching regular expression in the list is executed.
  • For PL0, the string β€œend” followed by a blank matches both:
    • The regular expression for the keyword β€œend” and
    • The regular expression for an identifier
  • The action selected for β€œend” is that for the keyword β€œend” because the regular expression for the keyword appears before the regular expression for an identifier
  • The string β€œending” followed by a blank matches only the regular expression for an identifier; it will not be split into the keyword β€œend” followed by the identifier β€œing”
  • But the string β€œend ing” will be matched as the keyword β€œend” followed by the identifier β€œing”