Commit d4dd60d

comp/language/context-free: Context-Free Languages
1 parent c33629b commit d4dd60d

2 files changed: +186 -0 lines changed

Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
1+
<!DOCTYPE html>
2+
<!-- Academia (pandoc HTML5 template)
3+
designer: soimort
4+
last updated: 2016-05-07 -->
5+
<html>
6+
<head>
7+
<meta charset="utf-8">
8+
<meta name="generator" content="pandoc">
9+
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
10+
<meta name="author" content="Mort Yao">
11+
<meta name="dcterms.date" content="2017-04-11">
12+
<title>Context-Free Languages</title>
13+
<link rel="canonical" href="https://wiki.soimort.org/comp/language/context-free">
14+
<style type="text/css">code { white-space: pre; }</style>
15+
<link rel="stylesheet" href="//cdn.soimort.org/normalize/5.0.0/normalize.min.css">
16+
<link rel="stylesheet" href="//cdn.soimort.org/mathsvg/latest/mathsvg.min.css">
17+
<link rel="stylesheet" href="//cdn.soimort.org/fonts/latest/Latin-Modern-Roman.css">
18+
<link rel="stylesheet" href="//cdn.soimort.org/fonts/latest/Latin-Modern-Mono.css">
19+
<link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
20+
<link rel="stylesheet" href="/__/css/style.css">
21+
<link rel="stylesheet" href="/__/css/pygments.css">
22+
<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_CHTML-full" type="text/javascript"></script>
23+
<!--[if lt IE 9]>
24+
<script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
25+
<![endif]-->
26+
<script src="//cdn.soimort.org/jk/20160504/jk.min.js"></script>
27+
<script src="//cdn.soimort.org/mathsvg/latest/mathsvg.min.js"></script>
28+
<script src="/__/js/jk-minibar.js"></script>
29+
<link rel="icon" href="/favicon.png">
30+
<link rel="apple-touch-icon" href="/favicon.png">
31+
</head>
32+
<body>
33+
<main><article>
34+
<header>
35+
<h1 class="title">Context-Free Languages</h1>
36+
<address class="author">Mort Yao</address>
37+
<!-- h3 class="date">2017-04-11</h3 -->
38+
</header>
39+
<div id="content">
40+
<p><strong>Context-free grammar (CFG).</strong> A <em>context-free grammar</em> <span class="math inline">\(G\)</span> is a 4-tuple <span class="math inline">\((V, \Sigma, R, S)\)</span>, where</p>
41+
<ol type="1">
42+
<li><span class="math inline">\(V\)</span> is a finite set called the <em>variables</em>,</li>
43+
<li><span class="math inline">\(\Sigma\)</span> is a finite set called the <em>terminals</em> (with <span class="math inline">\(\Sigma \cap V = \emptyset\)</span>),</li>
44+
<li><span class="math inline">\(R \subseteq V \times (V \cup \Sigma)^*\)</span> is a finite set of <em>substitution rules</em>,</li>
45+
<li><span class="math inline">\(S \in V\)</span> is the <em>start variable</em>.</li>
46+
</ol>
47+
<p>Given <span class="math inline">\(u, v, w \in (V \cup \Sigma)^*\)</span> and a substitution rule <span class="math inline">\(A \mapsto w\)</span> (that is, <span class="math inline">\((A, w) \in R\)</span>), we say that <span class="math inline">\(uAv\)</span> <em>yields</em> <span class="math inline">\(uwv\)</span>, denoted as <span class="math inline">\(uAv \Rightarrow uwv\)</span>.</p>
48+
<p>Moreover, given <span class="math inline">\(w_0, w \in (V \cup \Sigma)^*\)</span>, if <span class="math inline">\(w_0 = w\)</span> or if there exists a sequence <span class="math inline">\(w_1, \dots, w_k\)</span> (where <span class="math inline">\(k \geq 0\)</span>, and <span class="math inline">\(\forall 1 \leq i \leq k : w_i \in (V \cup \Sigma)^*\)</span>) such that <span class="math inline">\(w_0 \Rightarrow w_1 \Rightarrow \dots \Rightarrow w_k \Rightarrow w\)</span>, we say that <span class="math inline">\(w_0\)</span> <em>derives</em> <span class="math inline">\(w\)</span>, denoted as <span class="math inline">\(w_0 \Rightarrow^* w\)</span>; the sequence <span class="math inline">\(w_0, w_1, \dots, w_k, w\)</span> is called a <em>derivation</em> of <span class="math inline">\(w\)</span> from <span class="math inline">\(w_0\)</span>. A derivation is a <em>leftmost derivation</em> if at every step the leftmost remaining variable is replaced. Each leftmost derivation of a string corresponds to exactly one <em>parse tree</em> of that string, whose variables it expands in pre-order.</p>
49+
<p><span class="math inline">\(L\)</span> is the <em>language of grammar</em> <span class="math inline">\(G\)</span>, denoted as <span class="math inline">\(\mathcal{L}(G) = L\)</span>, if and only if <span class="math inline">\(L = \{ w \in \Sigma^*\ |\ S \Rightarrow^* w \}\)</span>. We say that the grammar <span class="math inline">\(G\)</span> <em>generates</em> the language <span class="math inline">\(L\)</span>.</p>
50+
<p><strong>Context-free language (CFL).</strong> A language <span class="math inline">\(L\)</span> is called a <em>context-free language</em> if there exists a CFG <span class="math inline">\(G\)</span> that generates <span class="math inline">\(L\)</span>.</p>
51+
<p>A grammar <span class="math inline">\(G\)</span> is said to be <em>ambiguous</em> if there exists a string <span class="math inline">\(w \in \mathcal{L}(G)\)</span> with two or more distinct leftmost derivations (equivalently, two or more parse trees).</p>
52+
<p><strong>Chomsky normal form (CNF).</strong> A CFG is in <em>Chomsky normal form</em> if every substitution rule is of the form <span class="math display">\[\begin{aligned}
53+
A &amp;\mapsto BC \\
54+
A &amp;\mapsto a \\
55+
S &amp;\mapsto \varepsilon
56+
\end{aligned}\]</span> where <span class="math inline">\(A, B, C \in V\)</span>, <span class="math inline">\(B \neq S\)</span>, <span class="math inline">\(C \neq S\)</span>, and <span class="math inline">\(a \in \Sigma\)</span>.</p>
57+
<p><strong>Theorem 1.</strong> Every context-free language is generated by a CFG in Chomsky normal form. (Equivalently: any CFG can be converted into a CFG in Chomsky normal form that generates the same language.)</p>
58+
<p><strong>Theorem 2. (Closure properties)</strong> The class of context-free languages is closed under regular operations (i.e., union, concatenation and Kleene star).</p>
59+
<p><strong>Nondeterministic pushdown automaton (PDA).</strong> A <em>nondeterministic pushdown automaton</em> <span class="math inline">\(M\)</span> is a 6-tuple <span class="math inline">\((Q, \Sigma, \Gamma, \delta, q_0, F)\)</span>, where</p>
60+
<ol type="1">
61+
<li><span class="math inline">\(Q\)</span> is a finite set called the <em>states</em>,</li>
62+
<li><span class="math inline">\(\Sigma\)</span> is a finite set called the <em>input alphabet</em>,</li>
63+
<li><span class="math inline">\(\Gamma\)</span> is a finite set called the <em>stack alphabet</em>,</li>
64+
<li><span class="math inline">\(\delta : Q \times \Sigma_\varepsilon \times \Gamma_\varepsilon \to \mathcal{P}(Q \times \Gamma_\varepsilon)\)</span> is the <em>transition function</em> (where <span class="math inline">\(\Sigma_\varepsilon = \Sigma \cup \{\varepsilon\}\)</span> and <span class="math inline">\(\Gamma_\varepsilon = \Gamma \cup \{\varepsilon\}\)</span>),</li>
65+
<li><span class="math inline">\(q_0 \in Q\)</span> is the <em>start state</em>,</li>
66+
<li><span class="math inline">\(F \subseteq Q\)</span> is the set of <em>accept states</em>.</li>
67+
</ol>
68+
<p>We say that the PDA <span class="math inline">\(M = (Q, \Sigma, \Gamma, \delta, q_0, F)\)</span> <em>accepts</em> a string <span class="math inline">\(w\)</span> if <span class="math inline">\(w\)</span> may be written as <span class="math inline">\(w = a_1 \cdots a_n\)</span> (where each <span class="math inline">\(a_i \in \Sigma_\varepsilon\)</span>), and there exists a sequence of states <span class="math inline">\(r_0, \dots, r_n\)</span> (where each <span class="math inline">\(r_i \in Q\)</span>) and a sequence of strings <span class="math inline">\(s_0, \dots, s_n\)</span> (where each <span class="math inline">\(s_i \in \Gamma^*\)</span>), such that</p>
69+
<ol type="1">
70+
<li><span class="math inline">\(r_0 = q_0\)</span> and <span class="math inline">\(s_0 = \varepsilon\)</span>,</li>
71+
<li>For every <span class="math inline">\(0 \leq i &lt; n\)</span>, <span class="math inline">\((r_{i+1}, \beta) \in \delta(r_i, a_{i+1}, \alpha)\)</span>, where <span class="math inline">\(s_i = \alpha t\)</span> and <span class="math inline">\(s_{i+1} = \beta t\)</span> for some <span class="math inline">\(\alpha, \beta \in \Gamma_\varepsilon\)</span> and <span class="math inline">\(t \in \Gamma^*\)</span> (the top of the stack is the leftmost symbol of <span class="math inline">\(s_i\)</span>),</li>
72+
<li><span class="math inline">\(r_n \in F\)</span>.</li>
73+
</ol>
74+
<p>Otherwise, we say that the PDA <span class="math inline">\(M\)</span> <em>rejects</em> the string <span class="math inline">\(w\)</span>.</p>
75+
<p><span class="math inline">\(L\)</span> is the language of PDA <span class="math inline">\(M\)</span>, denoted as <span class="math inline">\(\mathcal{L}(M) = L\)</span>, if and only if <span class="math inline">\(L = \{w\ |\ w \text{ is a string accepted by } M\}\)</span>. We say that the PDA <span class="math inline">\(M\)</span> <em>recognizes</em> the language <span class="math inline">\(L\)</span>.</p>
76+
<p><strong>Theorem 3.</strong> A language <span class="math inline">\(L\)</span> is context free if and only if there exists a PDA <span class="math inline">\(M\)</span> that recognizes <span class="math inline">\(L\)</span>.</p>
77+
<p><strong>Theorem 4.</strong> Every regular language is context free.</p>
78+
<p><strong>Theorem 5. (Pumping lemma)</strong> If <span class="math inline">\(L\)</span> is a context-free language, then there is a number <span class="math inline">\(p\)</span> (called the <em>pumping length</em>) such that if <span class="math inline">\(w \in L\)</span> and <span class="math inline">\(|w| \geq p\)</span>, then <span class="math inline">\(w\)</span> may be divided into five pieces <span class="math inline">\(w = uvxyz\)</span> satisfying the following conditions:</p>
79+
<ol type="1">
80+
<li>For every <span class="math inline">\(i \geq 0\)</span>, <span class="math inline">\(uv^ixy^iz \in L\)</span>,</li>
81+
<li><span class="math inline">\(|vy| &gt; 0\)</span>,</li>
82+
<li><span class="math inline">\(|vxy| \leq p\)</span>.</li>
83+
</ol>
84+
</div>
85+
<footer>
86+
<!-- TO BE MODIFIED BY NEED -->
87+
<a title="Keyboard shortcut: q"
88+
href="..">
89+
<i class="fa fa-angle-double-left" aria-hidden="true"></i>
90+
<code>Parent</code>
91+
</a> |
92+
<a class="raw" accesskey="r"
93+
title="Keyboard shortcut: R"
94+
href="https://wiki.soimort.org/comp/language/context-free/src.md">
95+
<i class="fa fa-code" aria-hidden="true"></i>
96+
<code>Raw</code>
97+
</a> |
98+
<a class="history" accesskey="h"
99+
title="Keyboard shortcut: H"
100+
href="https://github.com/soimort/wiki/commits/gh-pages/comp/language/context-free/src.md">
101+
<i class="fa fa-history" aria-hidden="true"></i>
102+
<code>History</code>
103+
</a> |
104+
<a class="edit" accesskey="e"
105+
title="Keyboard shortcut: E"
106+
href="https://github.com/soimort/wiki/edit/gh-pages/comp/language/context-free/src.md">
107+
<i class="fa fa-code-fork" aria-hidden="true"></i>
108+
<code>Edit</code>
109+
</a> |
110+
<a title="Keyboard shortcut: p"
111+
href="javascript:window.print();">
112+
<i class="fa fa-print" aria-hidden="true"></i>
113+
<code>Print</code>
114+
</a> |
115+
<a title="Keyboard shortcut: ."
116+
href="https://wiki.soimort.org/comp/language/context-free">
117+
<i class="fa fa-anchor" aria-hidden="true"></i>
118+
<code>Permalink</code>
119+
</a> |
120+
Last updated: <span id="update-time">2017-04-11</span>
121+
</footer>
122+
</article></main>
123+
</body>
124+
</html>

comp/language/context-free/src.md

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
1+
% Context-Free Languages
2+
% Mort Yao
3+
% 2017-04-11
4+
5+
**Context-free grammar (CFG).** A *context-free grammar* $G$ is a 4-tuple $(V, \Sigma, R, S)$, where
6+
7+
1. $V$ is a finite set called the *variables*,
8+
2. $\Sigma$ is a finite set called the *terminals* (with $\Sigma \cap V = \emptyset$),
9+
3. $R \subseteq V \times (V \cup \Sigma)^*$ is a finite set of *substitution rules*,
10+
4. $S \in V$ is the *start variable*.
11+
12+
Given $u, v, w \in (V \cup \Sigma)^*$ and a substitution rule $A \mapsto w$ (that is, $(A, w) \in R$), we say that $uAv$ *yields* $uwv$, denoted as $uAv \Rightarrow uwv$.
13+
14+
Moreover, given $w_0, w \in (V \cup \Sigma)^*$, if $w_0 = w$ or if there exists a sequence $w_1, \dots, w_k$ (where $k \geq 0$, and $\forall 1 \leq i \leq k : w_i \in (V \cup \Sigma)^*$) such that $w_0 \Rightarrow w_1 \Rightarrow \dots \Rightarrow w_k \Rightarrow w$, we say that $w_0$ *derives* $w$, denoted as $w_0 \Rightarrow^* w$; the sequence $w_0, w_1, \dots, w_k, w$ is called a *derivation* of $w$ from $w_0$. A derivation is a *leftmost derivation* if at every step the leftmost remaining variable is replaced. Each leftmost derivation of a string corresponds to exactly one *parse tree* of that string, whose variables it expands in pre-order.
15+
16+
$L$ is the *language of grammar* $G$, denoted as $\mathcal{L}(G) = L$, if and only if $L = \{ w \in \Sigma^*\ |\ S \Rightarrow^* w \}$. We say that the grammar $G$ *generates* the language $L$.
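
The definitions above can be exercised on a small grammar. The following is a minimal sketch, not part of the original notes: the encoding of $R$ as a dictionary, the example grammar $S \mapsto 0S1 \mid \varepsilon$, and the names `leftmost_steps` and `language_up_to` are all assumptions made for illustration. It follows only leftmost steps, which is enough to reach every string of $\mathcal{L}(G)$.

```python
from collections import deque

# Hypothetical encoding of G = (V, Sigma, R, S): R maps each variable to a
# list of right-hand sides, each right-hand side being a tuple of symbols.
V = {"S"}
SIGMA = {"0", "1"}
R = {"S": [("0", "S", "1"), ()]}  # S -> 0S1 | epsilon
START = "S"

def leftmost_steps(form, rules=R, variables=V):
    """All forms that `form` yields when its leftmost variable is replaced."""
    for i, sym in enumerate(form):
        if sym in variables:
            return [form[:i] + rhs + form[i + 1:] for rhs in rules.get(sym, [])]
    return []  # no variable left: `form` is already a terminal string

def language_up_to(max_len, rules=R, variables=V, start=START):
    """Strings w over Sigma with S =>* w and |w| <= max_len, found breadth-first.

    The pruning below (drop forms with more than max_len terminals) is enough
    to terminate for this example grammar; other grammars may need a
    different bound.
    """
    seen, found = {(start,)}, set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        if not any(s in variables for s in form):
            found.add("".join(form))
            continue
        for nxt in leftmost_steps(form, rules, variables):
            terminals = sum(s not in variables for s in nxt)
            if terminals <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(found, key=lambda w: (len(w), w))

print(language_up_to(4))  # ['', '01', '0011']
```

Here `leftmost_steps` plays the role of $\Rightarrow$ restricted to the leftmost variable, and `language_up_to` approximates $\{ w \in \Sigma^* \mid S \Rightarrow^* w \}$ up to a length bound.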
17+
18+
**Context-free language (CFL).** A language $L$ is called a *context-free language* if there exists a CFG $G$ that generates $L$.
19+
20+
A grammar $G$ is said to be *ambiguous* if there exists a string $w \in \mathcal{L}(G)$ with two or more distinct leftmost derivations (equivalently, two or more parse trees).
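
As an illustration of ambiguity, the sketch below counts the parse trees (and hence the leftmost derivations) of a string. The grammar $E \mapsto E + E \mid E * E \mid a$ and the function name `count_parse_trees` are assumptions chosen for the example; the counting scheme also assumes the grammar has no $\varepsilon$-rules and no unit rules, so that every symbol derives at least one terminal.

```python
from functools import lru_cache

# Hypothetical example grammar: E -> E+E | E*E | a
RULES = {"E": (("E", "+", "E"), ("E", "*", "E"), ("a",))}

def count_parse_trees(w, start="E", rules=RULES):
    """Number of parse trees of w, assuming no epsilon-rules or unit rules."""

    @lru_cache(maxsize=None)
    def ways_sym(sym, i, j):
        # Ways in which the single symbol `sym` derives w[i:j].
        if sym not in rules:  # terminal symbol
            return 1 if j == i + 1 and w[i] == sym else 0
        return sum(ways_seq(rhs, i, j) for rhs in rules[sym])

    @lru_cache(maxsize=None)
    def ways_seq(seq, i, j):
        # Ways in which the symbol sequence `seq` derives w[i:j].
        if not seq:
            return 1 if i == j else 0
        if len(seq) == 1:
            return ways_sym(seq[0], i, j)
        # The first symbol takes w[i:k]; each remaining symbol needs >= 1 char.
        return sum(ways_sym(seq[0], i, k) * ways_seq(seq[1:], k, j)
                   for k in range(i + 1, j - len(seq) + 2))

    return ways_sym(start, 0, len(w))

print(count_parse_trees("a+a*a"))  # 2
```

The string `a+a*a` has two parse trees, one grouping it as $(a+a)*a$ and one as $a+(a*a)$, so this example grammar is ambiguous.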
21+
22+
**Chomsky normal form (CNF).** A CFG is in *Chomsky normal form* if every substitution rule is of the form
23+
$$\begin{aligned}
24+
A &\mapsto BC \\
25+
A &\mapsto a \\
26+
S &\mapsto \varepsilon
27+
\end{aligned}$$
28+
where $A, B, C \in V$, $B \neq S$, $C \neq S$, and $a \in \Sigma$.
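
The three rule shapes can be checked mechanically. This is a small sketch under assumed conventions (rules as `(head, body)` pairs with `body` a tuple of symbols; the name `is_cnf` and both example grammars are hypothetical), not a conversion procedure.

```python
def is_cnf(variables, terminals, rules, start):
    """Return True if every rule has the form A -> BC, A -> a, or S -> epsilon."""
    for head, body in rules:
        if head not in variables:
            return False
        if len(body) == 2:
            b, c = body
            ok = b in variables and c in variables and start not in (b, c)
        elif len(body) == 1:
            ok = body[0] in terminals
        else:
            # Only the empty body is left as a legal case, and only for S.
            ok = len(body) == 0 and head == start
        if not ok:
            return False
    return True

# A CNF grammar intended to generate { 0^n 1^n | n >= 0 }.
CNF_RULES = [
    ("S", ("A", "T")), ("S", ("A", "B")), ("S", ()),
    ("T", ("X", "B")),
    ("X", ("A", "T")), ("X", ("A", "B")),
    ("A", ("0",)), ("B", ("1",)),
]
# The more direct grammar S -> 0S1 | epsilon for the same language.
DIRECT_RULES = [("S", ("0", "S", "1")), ("S", ())]

print(is_cnf({"S", "T", "X", "A", "B"}, {"0", "1"}, CNF_RULES, "S"))  # True
print(is_cnf({"S"}, {"0", "1"}, DIRECT_RULES, "S"))                   # False
```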
29+
30+
**Theorem 1.** Every context-free language is generated by a CFG in Chomsky normal form. (Equivalently: any CFG can be converted into a CFG in Chomsky normal form that generates the same language.)
31+
32+
**Theorem 2. (Closure properties)** The class of context-free languages is closed under regular operations (i.e., union, concatenation and Kleene star).
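
One common way to see Theorem 2 is to build a new grammar from the given ones by adding a fresh start variable. The sketch below shows that construction under stated assumptions: grammars are 4-tuples with rules stored as a set of `(head, body)` pairs, the two variable sets are disjoint, and the fresh start symbol `S0` occurs in neither grammar. The function names are hypothetical.

```python
def union(g1, g2, s0="S0"):
    """Grammar for L(g1) | L(g2): add the rules S0 -> S1 and S0 -> S2."""
    (v1, t1, r1, s1), (v2, t2, r2, s2) = g1, g2
    rules = r1 | r2 | {(s0, (s1,)), (s0, (s2,))}
    return (v1 | v2 | {s0}, t1 | t2, rules, s0)

def concat(g1, g2, s0="S0"):
    """Grammar for L(g1) L(g2): add the rule S0 -> S1 S2."""
    (v1, t1, r1, s1), (v2, t2, r2, s2) = g1, g2
    rules = r1 | r2 | {(s0, (s1, s2))}
    return (v1 | v2 | {s0}, t1 | t2, rules, s0)

def star(g, s0="S0"):
    """Grammar for L(g)*: add the rules S0 -> S S0 and S0 -> epsilon."""
    v, t, r, s = g
    rules = r | {(s0, (s, s0)), (s0, ())}
    return (v | {s0}, t, rules, s0)

# Two one-rule grammars: G1 generates {a}, G2 generates {b}.
G1 = (frozenset({"A"}), frozenset({"a"}), frozenset({("A", ("a",))}), "A")
G2 = (frozenset({"B"}), frozenset({"b"}), frozenset({("B", ("b",))}), "B")

for head, body in sorted(union(G1, G2)[2]):
    print(head, "->", " ".join(body) or "epsilon")
```

Each result is again a 4-tuple $(V, \Sigma, R, S)$ with finitely many rules, so the resulting language is context free by definition.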
33+
34+
**Nondeterministic pushdown automaton (PDA).** A *nondeterministic pushdown automaton* $M$ is a 6-tuple $(Q, \Sigma, \Gamma, \delta, q_0, F)$, where
35+
36+
1. $Q$ is a finite set called the *states*,
37+
2. $\Sigma$ is a finite set called the *input alphabet*,
38+
3. $\Gamma$ is a finite set called the *stack alphabet*,
39+
4. $\delta : Q \times \Sigma_\varepsilon \times \Gamma_\varepsilon \to \mathcal{P}(Q \times \Gamma_\varepsilon)$ is the *transition function* (where $\Sigma_\varepsilon = \Sigma \cup \{\varepsilon\}$ and $\Gamma_\varepsilon = \Gamma \cup \{\varepsilon\}$),
40+
5. $q_0 \in Q$ is the *start state*,
41+
6. $F \subseteq Q$ is the set of *accept states*.
42+
43+
We say that the PDA $M = (Q, \Sigma, \Gamma, \delta, q_0, F)$ *accepts* a string $w$ if $w$ may be written as $w = a_1 \cdots a_n$ (where each $a_i \in \Sigma_\varepsilon$), and there exists a sequence of states $r_0, \dots, r_n$ (where each $r_i \in Q$) and a sequence of strings $s_0, \dots, s_n$ (where each $s_i \in \Gamma^*$), such that
44+
45+
1. $r_0 = q_0$ and $s_0 = \varepsilon$,
46+
2. For every $0 \leq i < n$, $(r_{i+1}, \beta) \in \delta(r_i, a_{i+1}, \alpha)$, where $s_i = \alpha t$ and $s_{i+1} = \beta t$ for some $\alpha, \beta \in \Gamma_\varepsilon$ and $t \in \Gamma^*$ (the top of the stack is the leftmost symbol of $s_i$),
47+
3. $r_n \in F$.
48+
49+
Otherwise, we say that the PDA $M$ *rejects* the string $w$.
50+
51+
$L$ is the language of PDA $M$, denoted as $\mathcal{L}(M) = L$, if and only if $L = \{w\ |\ w \text{ is a string accepted by } M\}$.
52+
We say that the PDA $M$ *recognizes* the language $L$.
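
The acceptance condition can be simulated by searching over configurations (state, input position, stack). The following is a sketch under assumptions that are not in the notes: $\varepsilon$ is encoded as the empty string, $\delta$ is a dictionary, symbols are single characters, and the stack height is capped so that the search always terminates. The example machine is the textbook-style PDA for $\{0^n 1^n \mid n \geq 0\}$ with a bottom marker `$`.

```python
from collections import deque

EPS = ""  # stands for epsilon

def accepts(delta, q0, accept, w, max_stack=None):
    """Breadth-first search for an accepting computation of the PDA on w."""
    if max_stack is None:
        max_stack = len(w) + 2  # assumption: enough for the example below
    start = (q0, 0, ())         # stack as a tuple, top of stack at index 0
    seen, queue = {start}, deque([start])
    while queue:
        q, i, stack = queue.popleft()
        if i == len(w) and q in accept:
            return True
        for a in ([w[i], EPS] if i < len(w) else [EPS]):         # read or not
            for alpha in ([stack[0], EPS] if stack else [EPS]):  # pop or not
                rest = stack[1:] if alpha != EPS else stack
                for r, beta in delta.get((q, a, alpha), ()):
                    new_stack = (beta,) + rest if beta != EPS else rest
                    cfg = (r, i + (a != EPS), new_stack)
                    if len(new_stack) <= max_stack and cfg not in seen:
                        seen.add(cfg)
                        queue.append(cfg)
    return False

# Example PDA for { 0^n 1^n | n >= 0 }: keys are (state, input, popped symbol).
DELTA = {
    ("q1", EPS, EPS): {("q2", "$")},  # push the bottom marker
    ("q2", "0", EPS): {("q2", "0")},  # push one 0 per 0 read
    ("q2", "1", "0"): {("q3", EPS)},  # start matching 1s against 0s
    ("q3", "1", "0"): {("q3", EPS)},
    ("q3", EPS, "$"): {("q4", EPS)},  # bottom marker reached: accept
}
ACCEPT = {"q1", "q4"}

print([w for w in ["", "01", "0011", "001", "10"]
       if accepts(DELTA, "q1", ACCEPT, w)])  # ['', '01', '0011']
```

Nondeterminism is handled by exploring every applicable transition; the string is accepted as soon as any branch has consumed the whole input in an accept state.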
53+
54+
**Theorem 3.** A language $L$ is context free if and only if there exists a PDA $M$ that recognizes $L$.
55+
56+
**Theorem 4.** Every regular language is context free.
57+
58+
**Theorem 5. (Pumping lemma)** If $L$ is a context-free language, then there is a number $p$ (called the *pumping length*) such that if $w \in L$ and $|w| \geq p$, then $w$ may be divided into five pieces $w = uvxyz$ satisfying the following conditions:
59+
60+
1. For every $i \geq 0$, $uv^ixy^iz \in L$,
61+
2. $|vy| > 0$,
62+
3. $|vxy| \leq p$.
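
The lemma is typically used in the contrapositive, to argue that a language is not context free. The sketch below brute-forces all decompositions of one string for one candidate pumping length; it is an illustration for specific values, not a proof for every $p$. The languages $\{a^n b^n c^n\}$ and $\{a^n b^n\}$, the function names, and the use of only $i \in \{0, 2\}$ as a stand-in for "every $i \geq 0$" are assumptions of the example.

```python
def in_anbncn(s):
    """Membership in { a^n b^n c^n | n >= 0 }."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n

def in_anbn(s):
    """Membership in { a^n b^n | n >= 0 }."""
    n = len(s) // 2
    return s == "a" * n + "b" * n

def pumpable_splits(w, p, member, exponents=(0, 2)):
    """Splits w = uvxyz with |vy| > 0 and |vxy| <= p that stay in the language
    for every i in `exponents` (a finite stand-in for 'every i >= 0')."""
    n, result = len(w), []
    for a in range(n + 1):                     # u = w[:a]
        for d in range(a, min(a + p, n) + 1):  # vxy = w[a:d], so |vxy| <= p
            for b in range(a, d + 1):          # v = w[a:b]
                for c in range(b, d + 1):      # x = w[b:c], y = w[c:d]
                    u, v, x, y, z = w[:a], w[a:b], w[b:c], w[c:d], w[d:]
                    if len(v) + len(y) == 0:
                        continue
                    if all(member(u + v * i + x + y * i + z) for i in exponents):
                        result.append((u, v, x, y, z))
    return result

print(pumpable_splits("aaabbbccc", 3, in_anbncn))                   # []
print(("a", "a", "", "b", "b") in pumpable_splits("aabb", 2, in_anbn))  # True
```

For $a^3 b^3 c^3$ with $p = 3$ no decomposition survives, because $vxy$ can touch at most two of the three letters; for $a^2 b^2$ with $p = 2$ the split $u = a$, $v = a$, $x = \varepsilon$, $y = b$, $z = b$ pumps as the lemma requires.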
