Commit c33629b

comp/language/regular: Regular Languages
1 parent 88d1d3e commit c33629b

2 files changed

+216
-0
lines changed

comp/language/regular/index.html

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
1+
<!DOCTYPE html>
2+
<!-- Academia (pandoc HTML5 template)
3+
designer: soimort
4+
last updated: 2016-05-07 -->
5+
<html>
6+
<head>
7+
<meta charset="utf-8">
8+
<meta name="generator" content="pandoc">
9+
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
10+
<meta name="author" content="Mort Yao">
11+
<meta name="dcterms.date" content="2017-04-11">
12+
<title>Regular Languages</title>
13+
<link rel="canonical" href="https://wiki.soimort.org/comp/language/regular">
14+
<style type="text/css">code { white-space: pre; }</style>
15+
<link rel="stylesheet" href="//cdn.soimort.org/normalize/5.0.0/normalize.min.css">
16+
<link rel="stylesheet" href="//cdn.soimort.org/mathsvg/latest/mathsvg.min.css">
17+
<link rel="stylesheet" href="//cdn.soimort.org/fonts/latest/Latin-Modern-Roman.css">
18+
<link rel="stylesheet" href="//cdn.soimort.org/fonts/latest/Latin-Modern-Mono.css">
19+
<link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
20+
<link rel="stylesheet" href="/__/css/style.css">
21+
<link rel="stylesheet" href="/__/css/pygments.css">
22+
<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_CHTML-full" type="text/javascript"></script>
23+
<!--[if lt IE 9]>
24+
<script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
25+
<![endif]-->
26+
<script src="//cdn.soimort.org/jk/20160504/jk.min.js"></script>
27+
<script src="//cdn.soimort.org/mathsvg/latest/mathsvg.min.js"></script>
28+
<script src="/__/js/jk-minibar.js"></script>
29+
<link rel="icon" href="/favicon.png">
30+
<link rel="apple-touch-icon" href="/favicon.png">
31+
</head>
32+
<body>
33+
<main><article>
34+
<header>
35+
<h1 class="title">Regular Languages</h1>
36+
<address class="author">Mort Yao</address>
37+
<!-- h3 class="date">2017-04-11</h3 -->
38+
</header>
39+
<div id="content">
40+
<p><strong>Deterministic finite automaton (DFA).</strong> A <em>deterministic finite automaton</em> <span class="math inline">\(M\)</span> is a 5-tuple <span class="math inline">\((Q, \Sigma, \delta, q_0, F)\)</span>, where</p>
41+
<ol type="1">
42+
<li><span class="math inline">\(Q\)</span> is a finite set called the <em>states</em>,</li>
43+
<li><span class="math inline">\(\Sigma\)</span> is a finite set called the <em>alphabet</em>,</li>
44+
<li><span class="math inline">\(\delta: Q \times \Sigma \to Q\)</span> is the <em>transition function</em>,</li>
45+
<li><span class="math inline">\(q_0 \in Q\)</span> is the <em>start state</em> (also called the <em>initial state</em>),</li>
46+
<li><span class="math inline">\(F \subseteq Q\)</span> is the set of <em>accept states</em> (also called <em>final states</em>).</li>
47+
</ol>
48+
<p>We say that the DFA <span class="math inline">\(M = (Q, \Sigma, \delta, q_0, F)\)</span> <em>accepts</em> a string <span class="math inline">\(w = a_1 \cdots a_n\)</span> (where each <span class="math inline">\(a_i \in \Sigma\)</span>) if and only if there exists a sequence of states <span class="math inline">\(r_0, \dots, r_n\)</span> (where each <span class="math inline">\(r_i \in Q\)</span>) such that:</p>
49+
<ol type="1">
50+
<li><span class="math inline">\(r_0 = q_0\)</span>,</li>
51+
<li>For every <span class="math inline">\(0 \leq i &lt; n\)</span>, <span class="math inline">\(r_{i+1} = \delta(r_i, a_{i+1})\)</span>,</li>
52+
<li><span class="math inline">\(r_n \in F\)</span>.</li>
53+
</ol>
54+
<p>Otherwise, we say that the DFA <span class="math inline">\(M\)</span> <em>rejects</em> the string <span class="math inline">\(w\)</span>.</p>
55+
<p><span class="math inline">\(L\)</span> is the <em>language</em> of DFA <span class="math inline">\(M\)</span>, denoted as <span class="math inline">\(\mathcal{L}(M) = L\)</span>, if and only if <span class="math inline">\(L = \{w\ |\ w \text{ is a string accepted by } M\}\)</span>. We say that the DFA <span class="math inline">\(M\)</span> <em>recognizes</em> the language <span class="math inline">\(L\)</span>.</p>
56+
<p><strong>Regular language (RL).</strong> A language <span class="math inline">\(L\)</span> is called a <em>regular language</em> if there exists a DFA <span class="math inline">\(M\)</span> that recognizes <span class="math inline">\(L\)</span>.</p>
57+
<p><strong>Regular operations.</strong> Given languages <span class="math inline">\(L_1\)</span>, <span class="math inline">\(L_2\)</span> and <span class="math inline">\(L\)</span>, we define the <em>regular operations</em> as follows:</p>
58+
<ul>
59+
<li><em>Union</em>: <span class="math inline">\(L_1 \cup L_2 = \{ w\ |\ w \in L_1 \lor w \in L_2 \}\)</span>.</li>
60+
<li><em>Concatenation</em>: <span class="math inline">\(L_1 \circ L_2 = \{ w_1w_2\ |\ w_1 \in L_1 \land w_2 \in L_2\}\)</span>.</li>
61+
<li><em>Kleene star</em>: <span class="math inline">\(L^* = \{ w_1w_2 \dots w_k\ |\ k \geq 0 \land \forall i \in \{1, \dots, k\} : w_i \in L \}\)</span>.</li>
62+
</ul>
63+
<p><strong>Theorem 1. (Closure properties)</strong> The class of regular languages is closed under regular operations.</p>
64+
<p><strong>Nondeterministic finite automaton (NFA).</strong> A <em>nondeterministic finite automaton</em> is a 5-tuple <span class="math inline">\((Q, \Sigma, \delta, q_0, F)\)</span>, where</p>
65+
<ol type="1">
66+
<li><span class="math inline">\(Q\)</span> is a finite set of states,</li>
67+
<li><span class="math inline">\(\Sigma\)</span> is a finite alphabet,</li>
68+
<li><span class="math inline">\(\delta: Q \times \Sigma_\varepsilon \to \mathcal{P}(Q)\)</span> is the transition function, (where <span class="math inline">\(\Sigma_\varepsilon = \Sigma \cup \{\varepsilon\}\)</span>)</li>
69+
<li><span class="math inline">\(q_0 \in Q\)</span> is the start state,</li>
70+
<li><span class="math inline">\(F \subseteq Q\)</span> is the set of accept states.</li>
71+
</ol>
72+
<p>We say that the NFA <span class="math inline">\(M = (Q, \Sigma, \delta, q_0, F)\)</span> <em>accepts</em> a string <span class="math inline">\(w\)</span> if and only if <span class="math inline">\(w\)</span> may be written as <span class="math inline">\(w = a_1 \cdots a_n\)</span> (where each <span class="math inline">\(a_i \in \Sigma_\varepsilon\)</span>), and there exists a sequence of states <span class="math inline">\(r_0, \dots, r_n\)</span> (where each <span class="math inline">\(r_i \in Q\)</span>) such that:</p>
73+
<ol type="1">
74+
<li><span class="math inline">\(r_0 = q_0\)</span>,</li>
75+
<li>For every <span class="math inline">\(0 \leq i &lt; n\)</span>, <span class="math inline">\(r_{i+1} \in \delta(r_i, a_{i+1})\)</span>,</li>
76+
<li><span class="math inline">\(r_n \in F\)</span>.</li>
77+
</ol>
78+
<p>Otherwise, we say that the NFA <span class="math inline">\(M\)</span> <em>rejects</em> the string <span class="math inline">\(w\)</span>.</p>
79+
<p><span class="math inline">\(L\)</span> is the language of NFA <span class="math inline">\(M\)</span>, denoted as <span class="math inline">\(\mathcal{L}(M) = L\)</span>, if and only if <span class="math inline">\(L = \{w\ |\ w \text{ is a string accepted by } M\}\)</span>. We say that the NFA <span class="math inline">\(M\)</span> <em>recognizes</em> the language <span class="math inline">\(L\)</span>.</p>
80+
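<p>Clearly, every DFA <span class="math inline">\((Q, \Sigma, \delta, q_0, F)\)</span> has an equivalent NFA (one that recognizes exactly the same language): keep <span class="math inline">\(Q\)</span>, <span class="math inline">\(\Sigma\)</span>, <span class="math inline">\(q_0\)</span> and <span class="math inline">\(F\)</span>, and define the NFA transition function <span class="math inline">\(\delta': Q \times \Sigma_\varepsilon \to \mathcal{P}(Q)\)</span> by <span class="math inline">\(\delta'(q, a) = \{\delta(q, a)\}\)</span> for every <span class="math inline">\(a \in \Sigma\)</span> and <span class="math inline">\(\delta'(q, \varepsilon) = \emptyset\)</span>.</p>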
81+
<p><strong>Theorem 2. (Rabin-Scott powerset construction)</strong> Every NFA has an equivalent DFA.</p>
82+
<p><strong>Corollary 3.</strong> A language <span class="math inline">\(L\)</span> is regular if and only if there exists an NFA <span class="math inline">\(M\)</span> that recognizes <span class="math inline">\(L\)</span>.</p>
83+
<p><strong>Regular expression (RE).</strong> A <em>regular expression</em> <span class="math inline">\(R\)</span> is defined as <span class="math display">\[R ::= a\ |\ \varepsilon\ |\ \emptyset\ |\ (R_1 \cup R_2)\ |\ (R_1 \circ R_2)\ |\ (R_1^*)\]</span> where <span class="math inline">\(a \in \Sigma\)</span>, <span class="math inline">\(R_1\)</span> and <span class="math inline">\(R_2\)</span> are regular expressions.</p>
84+
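<p>The language of a regular expression <span class="math inline">\(R\)</span>, denoted as <span class="math inline">\(\mathcal{L}(R)\)</span>, is defined inductively on the structure of <span class="math inline">\(R\)</span>: <span class="math inline">\(\mathcal{L}(a) = \{a\}\)</span>, <span class="math inline">\(\mathcal{L}(\varepsilon) = \{\varepsilon\}\)</span>, <span class="math inline">\(\mathcal{L}(\emptyset) = \emptyset\)</span>, <span class="math inline">\(\mathcal{L}(R_1 \cup R_2) = \mathcal{L}(R_1) \cup \mathcal{L}(R_2)\)</span>, <span class="math inline">\(\mathcal{L}(R_1 \circ R_2) = \mathcal{L}(R_1) \circ \mathcal{L}(R_2)\)</span>, and <span class="math inline">\(\mathcal{L}(R_1^*) = \mathcal{L}(R_1)^*\)</span>. We say that <span class="math inline">\(R\)</span> <em>describes</em> the language <span class="math inline">\(\mathcal{L}(R)\)</span>.</p>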
85+
<p><strong>Theorem 4. (Kleene’s theorem)</strong> A language <span class="math inline">\(L\)</span> is regular if and only if there exists a regular expression <span class="math inline">\(R\)</span> that describes <span class="math inline">\(L\)</span>.</p>
86+
<p><strong>Theorem 5. (Pumping lemma)</strong> If <span class="math inline">\(L\)</span> is a regular language, then there is an integer <span class="math inline">\(p \geq 1\)</span> (called the <em>pumping length</em>) such that every <span class="math inline">\(w \in L\)</span> with <span class="math inline">\(|w| \geq p\)</span> may be written as <span class="math inline">\(w = xyz\)</span> satisfying the following conditions:</p>
87+
<ol type="1">
88+
<li>For every <span class="math inline">\(i \geq 0\)</span>, <span class="math inline">\(xy^iz \in L\)</span>,</li>
89+
<li><span class="math inline">\(|y| &gt; 0\)</span>,</li>
90+
<li><span class="math inline">\(|xy| \leq p\)</span>.</li>
91+
</ol>
92+
<p><strong>Theorem 6. (Myhill-Nerode theorem)</strong> Let <span class="math inline">\(L\)</span> be a language over <span class="math inline">\(\Sigma\)</span>. We say that strings <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span> are <em>indistinguishable</em> by <span class="math inline">\(L\)</span>, denoted as <span class="math inline">\(x \equiv_L y\)</span>, if and only if for every string <span class="math inline">\(z \in \Sigma^*\)</span>, either both <span class="math inline">\(xz \in L\)</span> and <span class="math inline">\(yz \in L\)</span>, or both <span class="math inline">\(xz \notin L\)</span> and <span class="math inline">\(yz \notin L\)</span>. Then,</p>
93+
<ol type="1">
94+
<li><span class="math inline">\(L\)</span> is regular if and only if the equivalence relation <span class="math inline">\(\equiv_L\)</span> has a finite number of equivalence classes;</li>
95+
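<li>If <span class="math inline">\(L\)</span> is regular, there exists a DFA <span class="math inline">\(M\)</span> with <span class="math inline">\(\mathcal{L}(M) = L\)</span> having precisely one state for each equivalence class of <span class="math inline">\(\equiv_L\)</span>, and no DFA recognizing <span class="math inline">\(L\)</span> has fewer states.</li>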
96+
</ol>
97+
</div>
98+
<footer>
99+
<!-- TO BE MODIFIED BY NEED -->
100+
<a title="Keyboard shortcut: q"
101+
href="..">
102+
<i class="fa fa-angle-double-left" aria-hidden="true"></i>
103+
<code>Parent</code>
104+
</a> |
105+
<a class="raw" accesskey="r"
106+
title="Keyboard shortcut: R"
107+
href="https://wiki.soimort.org/comp/language/regular/src.md">
108+
<i class="fa fa-code" aria-hidden="true"></i>
109+
<code>Raw</code>
110+
</a> |
111+
<a class="history" accesskey="h"
112+
title="Keyboard shortcut: H"
113+
href="https://github.com/soimort/wiki/commits/gh-pages/comp/language/regular/src.md">
114+
<i class="fa fa-history" aria-hidden="true"></i>
115+
<code>History</code>
116+
</a> |
117+
<a class="edit" accesskey="e"
118+
title="Keyboard shortcut: E"
119+
href="https://github.com/soimort/wiki/edit/gh-pages/comp/language/regular/src.md">
120+
<i class="fa fa-code-fork" aria-hidden="true"></i>
121+
<code>Edit</code>
122+
</a> |
123+
<a title="Keyboard shortcut: p"
124+
href="javascript:window.print();">
125+
<i class="fa fa-print" aria-hidden="true"></i>
126+
<code>Print</code>
127+
</a> |
128+
<a title="Keyboard shortcut: ."
129+
href="https://wiki.soimort.org/comp/language/regular">
130+
<i class="fa fa-anchor" aria-hidden="true"></i>
131+
<code>Permalink</code>
132+
</a> |
133+
Last updated: <span id="update-time">2017-04-11</span>
134+
</footer>
135+
</article></main>
136+
</body>
137+
</html>

comp/language/regular/src.md

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
1+
% Regular Languages
2+
% Mort Yao
3+
% 2017-04-11
4+
5+
**Deterministic finite automaton (DFA).**
6+
A *deterministic finite automaton* $M$ is a 5-tuple $(Q, \Sigma, \delta, q_0, F)$, where
7+
8+
1. $Q$ is a finite set called the *states*,
9+
2. $\Sigma$ is a finite set called the *alphabet*,
10+
3. $\delta: Q \times \Sigma \to Q$ is the *transition function*,
11+
4. $q_0 \in Q$ is the *start state* (also called the *initial state*),
12+
5. $F \subseteq Q$ is the set of *accept states* (also called *final states*).
13+
14+
We say that the DFA $M = (Q, \Sigma, \delta, q_0, F)$ *accepts* a string $w = a_1 \cdots a_n$ (where each $a_i \in \Sigma$) if and only if there exists a sequence of states $r_0, \dots, r_n$ (where each $r_i \in Q$) such that:
15+
16+
1. $r_0 = q_0$,
17+
2. For every $0 \leq i < n$, $r_{i+1} = \delta(r_i, a_{i+1})$,
18+
3. $r_n \in F$.
19+
20+
Otherwise, we say that the DFA $M$ *rejects* the string $w$.
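
The acceptance condition above can be sketched as a short simulation. This is a minimal illustration, not part of the original notes: the dictionary encoding of $\delta$, the function name `dfa_accepts`, and the example machine `EVEN_ONES` are all assumptions made for the example.

```python
def dfa_accepts(delta, start, accept, word):
    """Return True iff the DFA accepts `word`.

    delta  -- dict mapping (state, symbol) -> state (assumed total)
    start  -- the start state q_0
    accept -- the set of accept states F
    """
    state = start                       # r_0 = q_0
    for symbol in word:
        state = delta[(state, symbol)]  # r_{i+1} = delta(r_i, a_{i+1})
    return state in accept              # r_n in F


# Hypothetical example machine: strings over {0, 1} with an even number of 1s.
EVEN_ONES = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}

print(dfa_accepts(EVEN_ONES, "even", {"even"}, "1001"))  # True
print(dfa_accepts(EVEN_ONES, "even", {"even"}, "1011"))  # False
```

Because $\delta$ is a function, the sequence $r_0, \dots, r_n$ is uniquely determined by $w$, so a single left-to-right pass suffices.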
21+
22+
$L$ is the *language* of DFA $M$, denoted as $\mathcal{L}(M) = L$, if and only if $L = \{w\ |\ w \text{ is a string accepted by } M\}$.
23+
We say that the DFA $M$ *recognizes* the language $L$.
24+
25+
**Regular language (RL).** A language $L$ is called a *regular language* if there exists a DFA $M$ that recognizes $L$.
26+
27+
**Regular operations.** Given languages $L_1$, $L_2$ and $L$, we define the *regular operations* as follows:
28+
29+
* *Union*: $L_1 \cup L_2 = \{ w\ |\ w \in L_1 \lor w \in L_2 \}$.
30+
* *Concatenation*: $L_1 \circ L_2 = \{ w_1w_2\ |\ w_1 \in L_1 \land w_2 \in L_2\}$.
31+
* *Kleene star*: $L^* = \{ w_1w_2 \dots w_k\ |\ k \geq 0 \land \forall i \in \{1, \dots, k\} : w_i \in L \}$.
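
For finite languages, the three operations can be tried out directly on Python sets of strings. The sketch below is only an illustration under that assumption; the helper names are hypothetical, and since $L^*$ is infinite whenever $L$ contains a non-empty string, `star_up_to` enumerates only the members of $L^*$ up to a chosen length.

```python
def union(l1, l2):
    """L1 ∪ L2 for finite languages given as sets of strings."""
    return l1 | l2


def concat(l1, l2):
    """L1 ∘ L2: every string of L1 followed by every string of L2."""
    return {w1 + w2 for w1 in l1 for w2 in l2}


def star_up_to(lang, max_len):
    """The strings of L* whose length is at most max_len."""
    result = {""}    # k = 0 contributes the empty string
    frontier = {""}
    while frontier:
        # Append one more factor from L, keeping only new, short-enough strings.
        frontier = {w + v for w in frontier for v in lang
                    if len(w + v) <= max_len} - result
        result |= frontier
    return result


print(sorted(union({"a"}, {"b"})))             # ['a', 'b']
print(sorted(concat({"a", "ab"}, {"", "b"})))  # ['a', 'ab', 'abb']
print(sorted(star_up_to({"ab"}, 4)))           # ['', 'ab', 'abab']
```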
32+
33+
**Theorem 1. (Closure properties)** The class of regular languages is closed under regular operations.
34+
35+
**Nondeterministic finite automaton (NFA).**
36+
A *nondeterministic finite automaton* is a 5-tuple $(Q, \Sigma, \delta, q_0, F)$, where
37+
38+
1. $Q$ is a finite set of states,
39+
2. $\Sigma$ is a finite alphabet,
40+
3. $\delta: Q \times \Sigma_\varepsilon \to \mathcal{P}(Q)$ is the transition function,
41+
(where $\Sigma_\varepsilon = \Sigma \cup \{\varepsilon\}$)
42+
4. $q_0 \in Q$ is the start state,
43+
5. $F \subseteq Q$ is the set of accept states.
44+
45+
We say that the NFA $M = (Q, \Sigma, \delta, q_0, F)$ *accepts* a string $w$ if and only if $w$ may be written as $w = a_1 \cdots a_n$ (where each $a_i \in \Sigma_\varepsilon$), and there exists a sequence of states $r_0, \dots, r_n$ (where each $r_i \in Q$) such that:
46+
47+
1. $r_0 = q_0$,
48+
2. For every $0 \leq i < n$, $r_{i+1} \in \delta(r_i, a_{i+1})$,
49+
3. $r_n \in F$.
50+
51+
Otherwise, we say that the NFA $M$ *rejects* the string $w$.
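
One way to check NFA acceptance without guessing the state sequence is to track the set of all states reachable so far. The sketch below assumes a dictionary encoding in which $\varepsilon$ is written as the empty string `""` and missing keys stand for $\delta(q, a) = \emptyset$; the function names and the example machine are hypothetical.

```python
def epsilon_closure(delta, states):
    """All states reachable from `states` using only epsilon-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def nfa_accepts(delta, start, accept, word):
    """Return True iff some run of the NFA on `word` ends in an accept state."""
    current = epsilon_closure(delta, {start})
    for symbol in word:
        moved = set()
        for q in current:
            moved |= delta.get((q, symbol), set())
        current = epsilon_closure(delta, moved)
    return bool(current & accept)


# Hypothetical example: the language a* ∪ b*, using an epsilon-branch at the start.
A_OR_B = {
    ("q0", ""): {"qa", "qb"},
    ("qa", "a"): {"qa"},
    ("qb", "b"): {"qb"},
}

print(nfa_accepts(A_OR_B, "q0", {"qa", "qb"}, "aaa"))  # True
print(nfa_accepts(A_OR_B, "q0", {"qa", "qb"}, "ab"))   # False
```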
52+
53+
$L$ is the language of NFA $M$, denoted as $\mathcal{L}(M) = L$, if and only if $L = \{w\ |\ w \text{ is a string accepted by } M\}$.
54+
We say that the NFA $M$ *recognizes* the language $L$.
55+
56+
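Clearly, every DFA $(Q, \Sigma, \delta, q_0, F)$ has an equivalent NFA (one that recognizes exactly the same language): keep $Q$, $\Sigma$, $q_0$ and $F$, and define the NFA transition function $\delta': Q \times \Sigma_\varepsilon \to \mathcal{P}(Q)$ by $\delta'(q, a) = \{\delta(q, a)\}$ for every $a \in \Sigma$ and $\delta'(q, \varepsilon) = \emptyset$.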
57+
58+
**Theorem 2. (Rabin-Scott powerset construction)** Every NFA has an equivalent DFA.
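
The construction behind Theorem 2 can be sketched as follows: each DFA state is a set of NFA states, and a subset is accepting when it contains an NFA accept state. This sketch builds only the subsets reachable from the start state rather than all of $\mathcal{P}(Q)$. The encoding ($\varepsilon$ as `""`, missing keys as $\emptyset$), the function names and the example NFA are assumptions made for illustration.

```python
from collections import deque


def epsilon_closure(delta, states):
    """All states reachable from `states` using only epsilon-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def nfa_to_dfa(delta, start, accept, alphabet):
    """Subset construction restricted to reachable subsets.

    Returns (states, dfa_delta, dfa_start, dfa_accept), where every DFA
    state is a frozenset of NFA states.
    """
    dfa_start = frozenset(epsilon_closure(delta, {start}))
    dfa_delta = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        subset = queue.popleft()
        for symbol in alphabet:
            moved = set()
            for q in subset:
                moved |= delta.get((q, symbol), set())
            target = frozenset(epsilon_closure(delta, moved))
            dfa_delta[(subset, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_accept = {s for s in seen if s & accept}
    return seen, dfa_delta, dfa_start, dfa_accept


# Hypothetical example NFA: strings over {0, 1} that end in "01".
ENDS_01 = {("s", "0"): {"s", "a"}, ("s", "1"): {"s"}, ("a", "1"): {"b"}}

states, d, q0, final = nfa_to_dfa(ENDS_01, "s", {"b"}, ["0", "1"])
print(len(states))  # 3
```

In the worst case the resulting DFA has $2^{|Q|}$ states, which is why restricting to reachable subsets is a common practical choice.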
59+
60+
**Corollary 3.** A language $L$ is regular if and only if there exists an NFA $M$ that recognizes $L$.
61+
62+
**Regular expression (RE).** A *regular expression* $R$ is defined as
63+
$$R ::= a\ |\ \varepsilon\ |\ \emptyset\ |\ (R_1 \cup R_2)\ |\ (R_1 \circ R_2)\ |\ (R_1^*)$$
64+
where $a \in \Sigma$, $R_1$ and $R_2$ are regular expressions.
65+
66+
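The language of a regular expression $R$, denoted as $\mathcal{L}(R)$, is defined inductively on the structure of $R$: $\mathcal{L}(a) = \{a\}$, $\mathcal{L}(\varepsilon) = \{\varepsilon\}$, $\mathcal{L}(\emptyset) = \emptyset$, $\mathcal{L}(R_1 \cup R_2) = \mathcal{L}(R_1) \cup \mathcal{L}(R_2)$, $\mathcal{L}(R_1 \circ R_2) = \mathcal{L}(R_1) \circ \mathcal{L}(R_2)$, and $\mathcal{L}(R_1^*) = \mathcal{L}(R_1)^*$. We say that $R$ *describes* the language $\mathcal{L}(R)$.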
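
Membership in the language of a regular expression can be decided by recursion on the grammar of $R$ given above. The sketch below is a naive, exponential-time illustration, not an efficient matcher; the tuple encoding of expressions and the name `matches` are assumptions made for the example.

```python
def matches(r, w):
    """Return True iff string `w` is in the language of expression `r`.

    Expressions are nested tuples:
      ("sym", a), ("eps",), ("empty",),
      ("union", r1, r2), ("concat", r1, r2), ("star", r1)
    """
    kind = r[0]
    if kind == "sym":
        return w == r[1]
    if kind == "eps":
        return w == ""
    if kind == "empty":
        return False
    if kind == "union":
        return matches(r[1], w) or matches(r[2], w)
    if kind == "concat":
        # Try every way of splitting w into a prefix and a suffix.
        return any(matches(r[1], w[:i]) and matches(r[2], w[i:])
                   for i in range(len(w) + 1))
    if kind == "star":
        if w == "":
            return True
        # Peel off a non-empty first factor so the recursion terminates.
        return any(matches(r[1], w[:i]) and matches(r, w[i:])
                   for i in range(1, len(w) + 1))
    raise ValueError(f"unknown node: {kind!r}")


# (a ∪ b)* ∘ a : strings over {a, b} that end in a.
ENDS_IN_A = ("concat",
             ("star", ("union", ("sym", "a"), ("sym", "b"))),
             ("sym", "a"))

print(matches(ENDS_IN_A, "bba"))  # True
print(matches(ENDS_IN_A, "ab"))   # False
```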
67+
68+
**Theorem 4. (Kleene's theorem)** A language $L$ is regular if and only if there exists a regular expression $R$ that describes $L$.
69+
70+
**Theorem 5. (Pumping lemma)** If $L$ is a regular language, then there is an integer $p \geq 1$ (called the *pumping length*) such that every $w \in L$ with $|w| \geq p$ may be written as $w = xyz$ satisfying the following conditions:
71+
72+
1. For every $i \geq 0$, $xy^iz \in L$,
73+
2. $|y| > 0$,
74+
3. $|xy| \leq p$.
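
The usual proof takes $p = |Q|$ for a DFA recognizing $L$: by the pigeonhole principle, a run on a string of length at least $|Q|$ revisits some state within its first $|Q|$ steps, and the substring read between the two visits is $y$. The sketch below finds such a split for a concrete DFA; the dictionary encoding, the function names and the example machine are assumptions made for illustration.

```python
def dfa_accepts(delta, start, accept, word):
    """Return True iff the DFA accepts `word`."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accept


def pump_split(delta, start, word):
    """Split `word` into (x, y, z) where reading y loops on one state.

    Returns None if no state repeats, which can only happen when
    len(word) is smaller than the number of states.
    """
    state = start
    first_seen = {state: 0}  # state -> number of symbols read when first visited
    for i, symbol in enumerate(word, start=1):
        state = delta[(state, symbol)]
        if state in first_seen:
            j = first_seen[state]
            return word[:j], word[j:i], word[i:]
        first_seen[state] = i
    return None


# Hypothetical example machine: strings over {0, 1} with an even number of 1s.
EVEN_ONES = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}

x, y, z = pump_split(EVEN_ONES, "even", "1001")
print((x, y, z))  # ('1', '0', '01')
print(all(dfa_accepts(EVEN_ONES, "even", {"even"}, x + y * i + z)
          for i in range(5)))  # True
```

Since the split stops at the first repeated state, $|xy|$ never exceeds the number of states, and $y$ is non-empty because the two visits happen at different positions.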
75+
76+
**Theorem 6. (Myhill-Nerode theorem)** Let $L$ be a language over $\Sigma$. We say that strings $x$ and $y$ are *indistinguishable* by $L$, denoted as $x \equiv_L y$, if and only if for every string $z \in \Sigma^*$, either both $xz \in L$ and $yz \in L$, or both $xz \notin L$ and $yz \notin L$. Then,
77+
78+
1. $L$ is regular if and only if the equivalence relation $\equiv_L$ has a finite number of equivalence classes;
79+
2. If $L$ is regular, there exists a DFA $M$ with $\mathcal{L}(M) = L$ having precisely one state for each equivalence class of $\equiv_L$, and no DFA recognizing $L$ has fewer states.
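
When $L$ is given by a DFA in which every state is reachable, the number of equivalence classes of $\equiv_L$ can be computed by merging states that no suffix can tell apart. The sketch below uses Moore-style partition refinement, a technique chosen here for illustration and not taken from these notes; the function name, the encoding and the example machine are likewise assumptions.

```python
def count_classes(states, alphabet, delta, accept):
    """Number of states of the minimal DFA.

    Assumes delta is total and every state is reachable from the start state.
    """
    symbols = sorted(alphabet)
    # Initial partition: accepting versus non-accepting states.
    block = {q: (q in accept) for q in states}
    while True:
        # Two states stay together only if they are in the same block and
        # each symbol leads them into the same block.
        signature = {
            q: (block[q],) + tuple(block[delta[(q, a)]] for a in symbols)
            for q in states
        }
        if len(set(signature.values())) == len(set(block.values())):
            return len(set(block.values()))
        block = signature


# Hypothetical example: a redundant 3-state DFA for "even number of 1s",
# where states A and B both mean "even so far" and C means "odd so far".
REDUNDANT = {
    ("A", "0"): "B", ("A", "1"): "C",
    ("B", "0"): "A", ("B", "1"): "C",
    ("C", "0"): "C", ("C", "1"): "A",
}

print(count_classes({"A", "B", "C"}, {"0", "1"}, REDUNDANT, {"A", "B"}))  # 2
```

Here the strings leading to `A` and to `B` are indistinguishable by $L$, so the relation has two classes and the minimal DFA has two states.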
