Each level of the Chomsky hierarchy is strictly more powerful than the one inside it. Today we jump from Type 3 (regular) to Type 2 (context-free). The new superpower: a stack (via pushdown automata) or, equivalently, production rules that can recurse in the middle of a string (such as S → aSb).
Motivation: The Limits of Regular Languages
What DFAs/NFAs CAN'T do
The Pumping Lemma showed us these are NOT regular:
{ a^n b^n | n ≥ 0 } -- equal counts of a's then b's
Balanced parentheses -- (()(())), but not (()(
Palindromes -- strings that read the same forwards and backwards
Nested structures -- HTML tags, math expressions
Why DFAs fail
Finite automata have finite memory (just the current state). They can't "count" unboundedly or "match" things seen earlier.
The common pattern
All these languages need NESTING:
a^n b^n :
aaaa....bbbb
|          |
+--match!--+
Balanced parens:
( ( ) ( ( ) ) )
| |_| | |_| | |
|     |_____| |
|_____________|
Palindromes:
a b c b a
|   |   |
+---+---+
  match!
Analogy
A DFA is like a person counting on their fingers -- they run out. A CFG is like a person with a notepad: they can write down what to remember and check it later.
What is a Grammar?
Intuition: Grammars as Recipes
A grammar is a set of rewriting rules that tell you how to build strings in a language, step by step.
Recipe for a SENTENCE:
SENTENCE --> SUBJECT VERB OBJECT
SUBJECT --> "the dog" | "a cat"
VERB --> "chased" | "ate"
OBJECT --> "the ball" | "a fish"
One derivation:
SENTENCE
=> SUBJECT VERB OBJECT
=> "the dog" VERB OBJECT
=> "the dog" "chased" OBJECT
=> "the dog" "chased" "the ball"
Key Terminology
Variables (non-terminals): Placeholders that get replaced. Written in UPPERCASE. Examples: S, A, B, SENTENCE
Terminals: The actual characters in the final string. Written in lowercase. Examples: a, b, 0, 1, +, (, )
Productions: The rewriting rules. "A --> w" means "A can be replaced by w"
Start symbol: Where every derivation begins (usually S)
Analogy
Think of variables as categories in a recipe book. "DESSERT" isn't something you eat -- it's a category that expands into "chocolate cake" or "apple pie." Terminals are the actual food you eat!
Formal Definition of a CFG
Definition
A context-free grammar is a 4-tuple G = (V, T, P, S) where:
Component   Description
V           A finite set of variables (non-terminals)
T           A finite set of terminals (the alphabet)
P           A finite set of productions of the form A → w, where A ∈ V and w ∈ (V ∪ T)*
S           The start symbol, S ∈ V
Why "Context-Free"?
Every production has a single variable on the left side: A → w. The variable A can be replaced regardless of context (what's around it). In context-sensitive grammars, the surrounding symbols matter.
Example: { a^n b^n | n ≥ 0 }
G = (V, T, P, S)
V = { S }
T = { a, b }
P = { S → aSb,
S → ε }
Start = S
Let's trace how this generates aabb:
S ==> aSb (used S → aSb)
==> aaSbb (used S → aSb)
==> aaεbb (used S → ε)
= aabb
The recursion in S → aSb "wraps" matching a's and b's around each other. That's the power regular languages don't have!
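The trace above can be reproduced mechanically. The following is a minimal Python sketch, not part of the original slides: `derive_anbn` and `in_anbn` are hypothetical helper names, and the grammar S → aSb | ε is hard-coded rather than read from a general grammar object.

```python
def derive_anbn(n: int) -> list[str]:
    """Return the sentential forms in the derivation of a^n b^n from S."""
    current = "S"
    forms = [current]
    for _ in range(n):
        current = current.replace("S", "aSb")  # apply S → aSb
        forms.append(current)
    current = current.replace("S", "")         # apply S → ε
    forms.append(current)
    return forms


def in_anbn(s: str) -> bool:
    """Recognize { a^n b^n | n ≥ 0 } by undoing S → aSb from the outside in."""
    if s == "":
        return True                            # matches S → ε
    if len(s) >= 2 and s[0] == "a" and s[-1] == "b":
        return in_anbn(s[1:-1])                # peel off one a...b layer
    return False


print(" ==> ".join(derive_anbn(2)))  # S ==> aSb ==> aaSbb ==> aabb
print(in_anbn("aabb"), in_anbn("aab"))  # True False
```

Each recursive call in `in_anbn` corresponds to one use of S → aSb, so the call depth plays the role of the unbounded memory a DFA lacks.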
Derivations: Leftmost vs. Rightmost
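A derivation is a sequence of rule applications that transforms S into a string of terminals. A leftmost derivation always rewrites the leftmost variable; a rightmost derivation always rewrites the rightmost one.
Grammar: S → AB, A → aA | a, B → bB | b
Leftmost derivation of aabb: S ==> AB ==> aAB ==> aaB ==> aabB ==> aabb
Rightmost derivation of aabb: S ==> AB ==> AbB ==> Abb ==> aAbb ==> aabb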
Key Idea
For an unambiguous grammar, different derivation orders produce the same parse tree. The derivation order is just the order you visit the tree -- the tree itself is what matters.
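To make the two derivation orders concrete, here is a small Python sketch for the grammar S → AB, A → aA | a, B → bB | b. The helper names `step` and `derive` are illustrative assumptions, and sentential forms are kept as plain strings with one character per symbol.

```python
VARIABLES = {"S", "A", "B"}


def step(form: str, var: str, body: str, leftmost: bool) -> str:
    """Rewrite the leftmost (or rightmost) variable of `form` using var → body."""
    positions = [i for i, ch in enumerate(form) if ch in VARIABLES]
    i = positions[0] if leftmost else positions[-1]
    if form[i] != var:
        raise ValueError(f"expected {var} at position {i} of {form!r}")
    return form[:i] + body + form[i + 1:]


def derive(rules, leftmost: bool) -> list[str]:
    """Apply a list of (variable, body) productions starting from S."""
    forms = ["S"]
    for var, body in rules:
        forms.append(step(forms[-1], var, body, leftmost))
    return forms


# Same five productions, applied in two different orders.
left = derive([("S", "AB"), ("A", "aA"), ("A", "a"), ("B", "bB"), ("B", "b")], True)
right = derive([("S", "AB"), ("B", "bB"), ("B", "b"), ("A", "aA"), ("A", "a")], False)
print(" ==> ".join(left))   # S ==> AB ==> aAB ==> aaB ==> aabB ==> aabb
print(" ==> ".join(right))  # S ==> AB ==> AbB ==> Abb ==> aAbb ==> aabb
```

Both runs use exactly the same set of productions and end in the same string, which is why they correspond to a single parse tree.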
Classic Example: Simple Arithmetic Grammar
Grammar for Arithmetic:
E → E + T | T
T → T * F | F
F → ( E ) | id
E = Expression, T = Term, F = Factor
This grammar encodes precedence:
* binds tighter than + (multiplication first)
Both are left-associative
Parentheses override everything
Analogy
Think of E, T, F as layers of "binding strength." F is the tightest (atoms and parens). T groups multiplications. E groups additions. Like layers of an onion -- inner layers bind first.
Derive: id + id * id
Leftmost derivation:
E
==> E + T (E → E + T)
==> T + T (E → T)
==> F + T (T → F)
==> id + T (F → id)
==> id + T * F (T → T * F)
==> id + F * F (T → F)
==> id + id * F (F → id)
==> id + id * id (F → id)
Notice!
The grammar forces "id * id" to be grouped under T, while "+" connects at the E level. This gives multiplication higher precedence than addition.
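The layering E / T / F maps directly onto a hand-written parser. The sketch below is one possible rendering in Python, under two stated assumptions: integer literals stand in for id, and the left-recursive rules E → E + T and T → T * F are replaced by loops (E → T (+ T)*, T → F (* F)*), since a naive recursive-descent parser would recurse forever on left recursion. All function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    # Integers stand in for `id`; any other character is silently skipped.
    return re.findall(r"\d+|[+*()]", text)


def parse_expression(tokens, pos=0):
    """E → T (+ T)*  -- returns (tree, next position)."""
    tree, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_term(tokens, pos + 1)
        tree = ("+", tree, right)          # grows to the left: left-associative
    return tree, pos


def parse_term(tokens, pos):
    """T → F (* F)*"""
    tree, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "*":
        right, pos = parse_factor(tokens, pos + 1)
        tree = ("*", tree, right)
    return tree, pos


def parse_factor(tokens, pos):
    """F → ( E ) | id"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        tree, pos = parse_expression(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return tree, pos + 1
    if tok.isdigit():
        return int(tok), pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")


def parse(text: str):
    tokens = tokenize(text)
    tree, pos = parse_expression(tokens)
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree


def evaluate(tree) -> int:
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    l, r = evaluate(left), evaluate(right)
    return l + r if op == "+" else l * r


print(parse("2 + 3 * 4"))            # ('+', 2, ('*', 3, 4))
print(evaluate(parse("2 + 3 * 4")))  # 14
```

Because `parse_term` is called from inside `parse_expression`, every product is fully built before the surrounding sum sees it, which is the same precedence argument the grammar makes.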
Parse Trees
A parse tree is a visual representation of a derivation. It shows the structure of how a string is generated.
Rules for Parse Trees
Root = start symbol S
Internal nodes = variables (non-terminals)
Leaves = terminals or ε (read left-to-right, skipping ε = the string)
Each internal node + its children = one production rule
Analogy: Family Tree
A parse tree is like a family tree for strings. The start symbol S is the ancestor. Each production rule is a "parent has these children." The terminals at the bottom are the youngest generation -- the actual string!
Example: S → aSb | ε
Parse tree for aabb:
      S
     /|\
    / | \
   a  S  b
     /|\
    / | \
   a  S  b
      |
      ε
Reading the leaves left to right: a a ε b b = aabb
Key Idea
The tree structure shows the nesting. The outer S wraps a...b around the inner S. This is the recursive structure that DFAs cannot capture.
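A parse tree is easy to model as nested data. The sketch below assumes a simple `(label, children)` tuple representation, chosen here only for illustration, and rebuilds the tree for a^n b^n under S → aSb | ε before reading its leaves back.

```python
def build_tree(n: int):
    """Parse tree for a^n b^n under S → aSb | ε, as (label, children)."""
    if n == 0:
        return ("S", [("ε", [])])                          # S → ε
    return ("S", [("a", []), build_tree(n - 1), ("b", [])])  # S → aSb


def tree_yield(tree) -> str:
    """Concatenate the leaves left to right; ε contributes nothing."""
    label, children = tree
    if not children:
        return "" if label == "ε" else label
    return "".join(tree_yield(child) for child in children)


tree = build_tree(2)
print(tree_yield(tree))  # aabb
```

The nesting of the tuples mirrors the nesting of the string: each level of `build_tree` wraps one a...b pair around the tree below it.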
Parse Tree Builder: id + id * id
Grammar: E → E+T | T, T → T*F | F, F → (E) | id. The parse tree for id + id * id:
        E
      / | \
     E  +  T
     |    /|\
     T   T * F
     |   |   |
     F   F   id
     |   |
     id  id
Why This Tree is Correct
The * operation sits deeper in the tree (under T), so it is evaluated first. The + is at the top (under E), so it is evaluated last. This gives us: id + (id * id).
Ambiguity Explorer: id + id * id
The ambiguous grammar E → E+E | E*E | (E) | id gives TWO parse trees!
Tree 1: (id + id) * id
        E
      / | \
     E  *  E
    /|\    |
   E + E   id
   |   |
   id  id
Tree 2: id + (id * id)
     E
   / | \
  E  +  E
  |    /|\
  id  E * E
      |   |
      id  id
The "Dangling Else" Problem
Another famous ambiguity: if E then if E then S else S -- does "else" belong to the outer or inner "if"? Most languages resolve this by matching "else" to the nearest unmatched "if."
Why Ambiguity Matters
Different parse trees mean different meanings. The tree defines the evaluation order!
Parsing: 2 + 3 * 4
Tree A: (2 + 3) * 4        Tree B: 2 + (3 * 4)
        E                        E
       /|\                      /|\
      E * E                    E + E
     /|\  |                    |  /|\
    E + E 4                    2 E * E
    |   |                        |   |
    2   3                        3   4
  = 5 * 4                    = 2 + 12
  = 20  WRONG!               = 14  CORRECT!
Ambiguity = Multiple Interpretations
Compilers need exactly one parse tree per program
If two trees exist, the compiler might pick the wrong one
Different trees → different compiled code → different results
Key Idea
Ambiguity is a property of the grammar, not the language. A language might have an ambiguous grammar but also an unambiguous one. The fix: rewrite the grammar to enforce the intended structure.
Caution
You cannot "test" for ambiguity in general -- it is undecidable whether an arbitrary CFG is ambiguous!
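Although ambiguity of a whole grammar is undecidable, counting the parse trees of one particular string is straightforward. The following Python sketch does this for the ambiguous grammar E → E+E | E*E | (E) | id only; `count_trees` is a hypothetical helper, and the input is assumed to be an already tokenized list.

```python
from functools import lru_cache


def count_trees(tokens) -> int:
    """Count parse trees for `tokens` under E → E+E | E*E | (E) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Number of distinct parse trees deriving tokens[i:j] from E.
        total = 0
        if j - i == 1 and tokens[i] == "id":
            total += 1                                   # E → id
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)                 # E → (E)
        for k in range(i + 1, j - 1):                    # operator at position k
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)   # E → E+E | E*E
        return total

    return count(0, len(tokens))


print(count_trees("id + id * id".split()))  # 2
```

A result greater than 1 for any single string is enough to show that the grammar is ambiguous; a result of 1 for every string you try proves nothing, which is the practical face of the undecidability result.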
Eliminating Ambiguity: See the Difference
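Both grammars below generate the same arithmetic expressions, but only the second forces a single parse tree for each one (take id = 2, 3, 4 from left to right):
Ambiguous: E → E+E | E*E | (E) | id
Unambiguous: E → E+T | T, T → T*F | F, F → (E) | id
With the ambiguous grammar, id + id * id can be read as (2 + 3) * 4 = 20 or as 2 + (3 * 4) = 14; the unambiguous grammar allows only 2 + (3 * 4) = 14.
Chomsky Normal Form (CNF)
A restricted but equally powerful form of context-free grammars.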
CNF Rules
Every production must be one of exactly two forms:
A → BC (two variables, no terminals)
A → a (exactly one terminal)
Plus optionally: S → ε (only for the start symbol, only if ε is in the language)
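CNF:         NOT CNF:
S → AB       S → AaB   (terminal mixed with variables)
A → BC       A → BCD   (three variables)
A → a        A → B     (unit production)
B → b        A → aB    (terminal mixed with a variable)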
Why CNF Matters
CYK Algorithm: A parsing algorithm that works in O(n³) time -- but it requires the grammar to be in CNF
Proofs: Many theoretical results are easier to prove when the grammar has a restricted form
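Near-binary trees: CNF guarantees every internal node of a parse tree has either exactly two variable children (A → BC) or a single terminal child (A → a), so a string of length n ≥ 1 is always derived in exactly 2n − 1 steps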
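To show why the two-symbol shape matters, here is a compact Python sketch of CYK. The dictionary encoding of the grammar (`unit_rules`, `pair_rules`) and the example CNF grammar for { a^n b^n | n ≥ 1 } (S → AB | AX, X → SB, A → a, B → b) are assumptions made for this illustration, not taken from the slides.

```python
def cyk(word: str, unit_rules, pair_rules, start: str = "S") -> bool:
    """CYK membership test for a grammar in CNF.

    unit_rules: dict terminal -> set of variables A with A → a
    pair_rules: dict (B, C)   -> set of variables A with A → BC
    """
    n = len(word)
    if n == 0:
        return False  # ε is handled separately via the optional rule S → ε
    # table[i][length] = variables that derive word[i : i + length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = set(unit_rules.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):          # size of the left part
                for b in table[i][split]:
                    for c in table[i + split][length - split]:
                        table[i][length] |= pair_rules.get((b, c), set())
    return start in table[0][n]


# Assumed example grammar in CNF for { a^n b^n | n ≥ 1 }.
UNIT = {"a": {"A"}, "b": {"B"}}
PAIR = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}

print(cyk("aabb", UNIT, PAIR))  # True
print(cyk("aab", UNIT, PAIR))   # False
```

The three nested loops over `length`, `i`, and `split` are where the O(n³) bound comes from; the inner work depends only on the grammar size.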
Analogy
CNF is like putting equations in "standard form" in algebra. It doesn't change what the equation describes -- it just reorganizes it into a form that's easier to work with systematically.
Converting to CNF: The 4-Step Recipe
Step 1        Step 2        Step 3        Step 4
Remove        Remove        Break long    Replace mixed
ε-prods       unit prods    productions   terminals
A → ε         A → B         A → BCD       A → aB
  |             |             |             |
  v             v             v             v
Propagate     Substitute    A → BX        A → T_a B
nullable      chains        X → CD        T_a → a
Step 1: Remove ε-Productions
Find all nullable variables (those that can derive ε)
For each production with a nullable var on the right, add versions with and without it
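Delete all A → ε rules. If ε is in the language, keep it through the start symbol only: when S also appears on a right-hand side, add a fresh start symbol S0 with S0 → S | ε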
Step 2: Remove Unit Productions
A unit production is A → B (single variable)
If A → B and B → w (where w is not a single variable), add A → w, then delete A → B
Repeat until no unit productions remain
Step 3: Fix Long Productions
If A → B_1 B_2 ... B_k where k > 2 (at this point each B_i may be a variable or a terminal)
Break into pairs using new variables:
A → B_1 C_1, C_1 → B_2 C_2, ..., C_{k-2} → B_{k-1} B_k
Step 4: Fix Terminal Mixing
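If a production with two symbols on the right contains a terminal, like A → aB
Replace each such terminal a with a new variable T_a
Add T_a → a (productions of the form A → a are already in CNF and stay as they are)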
Order matters!
Do the steps in order (1 → 2 → 3 → 4). Each step can create situations the next step fixes.
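Step 1 is the least obvious of the four, so here is a hedged Python sketch of it. The grammar encoding (a dict from variable to a list of bodies, each body a tuple of symbols, with `()` for ε) and the function names are assumptions; the grammar used is the step-through example S → ASB | ε, A → aAS | a, B → SbS | A | bb.

```python
from itertools import product


def nullable_variables(grammar) -> set:
    """Fixed-point computation of the variables that can derive ε."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            if var in nullable:
                continue
            # A body is nullable if every symbol in it is nullable (() trivially is).
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(var)
                changed = True
    return nullable


def remove_epsilon(grammar) -> dict:
    """For each body, add every version with nullable symbols kept or dropped."""
    nullable = nullable_variables(grammar)
    result = {}
    for var, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            options = [((sym,), ()) if sym in nullable else ((sym,),) for sym in body]
            for choice in product(*options):
                new_body = tuple(sym for part in choice for sym in part)
                if new_body:               # never emit an ε-production
                    new_bodies.add(new_body)
        result[var] = new_bodies
    return result


GRAMMAR = {
    "S": [("A", "S", "B"), ()],
    "A": [("a", "A", "S"), ("a",)],
    "B": [("S", "b", "S"), ("A",), ("b", "b")],
}

print(nullable_variables(GRAMMAR))  # {'S'}
print(sorted(remove_epsilon(GRAMMAR)["S"]))  # [('A', 'B'), ('A', 'S', 'B')]
```

This sketch deliberately stops after Step 1 and does not re-add ε for the start symbol; since S is nullable here, a full conversion would still need the fresh start rule S0 → S | ε.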
CNF Conversion: Step-Through
Walk through the CNF conversion of: S → ASB | ε, A → aAS | a, B → SbS | A | bb
Result
Every production ends up as A → BC or A → a. The grammar is in CNF, ready for the CYK parsing algorithm!
Properties of Context-Free Languages
Closure Properties
Operation                    Closed?
Union                        YES
Concatenation                YES
Kleene Star                  YES
Intersection                 NO
Complement                   NO
Intersection with Regular    YES
How to prove closure under union
Given CFGs G1 (start S1) and G2 (start S2), create a new grammar with start S and rule: S → S1 | S2. Done!
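The union construction is short enough to write out. In this Python sketch the grammar encoding (dict from variable to list of bodies) and the helper names are assumptions; the one detail the slide glosses over is that the two variable sets must be disjoint, which the sketch enforces by renaming with suffixes.

```python
def rename(grammar, suffix: str) -> dict:
    """Append `suffix` to every variable so two grammars cannot clash."""
    def r(sym):
        return sym + suffix if sym in grammar else sym   # terminals unchanged
    return {r(var): [tuple(r(s) for s in body) for body in bodies]
            for var, bodies in grammar.items()}


def union_grammar(g1, start1, g2, start2, new_start="S") -> dict:
    """Build a grammar for L(g1) ∪ L(g2) via new_start → start1 | start2."""
    combined = {new_start: [(start1 + "_1",), (start2 + "_2",)]}
    combined.update(rename(g1, "_1"))
    combined.update(rename(g2, "_2"))
    return combined


G1 = {"S": [("a", "S", "b"), ()]}   # a^n b^n
G2 = {"S": [("b", "S"), ()]}        # b*

for var, bodies in union_grammar(G1, "S", G2, "S").items():
    print(var, "→", " | ".join(" ".join(body) or "ε" for body in bodies))
```

Concatenation (S → S1 S2) and Kleene star (S → S1 S | ε) can be handled by changing only the first line of `union_grammar`.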
NOT Closed Under Intersection
L1 = { a^n b^n c^m | n,m ≥ 0 }
(match a's and b's) -- CFL!
L2 = { a^m b^n c^n | n,m ≥ 0 }
(match b's and c's) -- CFL!
L1 ∩ L2 = { a^n b^n c^n | n ≥ 0 }
This is NOT context-free!
(Provable by CFL pumping lemma)
Consequence
CFLs are closed under union but NOT under intersection. By De Morgan's law, L1 ∩ L2 = complement(complement(L1) ∪ complement(L2)), so closure under complement (together with closure under union) would imply closure under intersection. Since intersection fails, complement must fail too!
Inherently Ambiguous Languages
Some context-free languages are so "tangled" that every possible grammar for them is ambiguous.
Definition
A CFL L is inherently ambiguous if every CFG that generates L is ambiguous. There is no way to "fix" the grammar -- the ambiguity is built into the language itself.
Classic Example
L = { a^i b^j c^k |
i=j OR j=k }
In other words: either the
a's match the b's, OR the
b's match the c's (or both).
Why it's inherently ambiguous
Consider: a^n b^n c^n
This string is in L because:
- i=j=n (a's match b's) YES
- j=k=n (b's match c's) YES
Intuitively, any grammar must handle
the two reasons with separate rules, giving
two parse trees for some strings a^n b^n c^n.
(The rigorous proof is considerably harder.)
Analogy
Imagine a language with two overlapping "reasons" a string can be included. When both reasons apply simultaneously, any grammar must use one path or the other -- giving two different trees. It's like a Venn diagram overlap that can't be un-overlapped.
Important Distinction
An ambiguous grammar might be fixable (rewrite it). An inherently ambiguous language cannot be fixed -- no grammar for it is unambiguous.
CFG vs. Regular: Head-to-Head Comparison
Feature          Regular Languages                 Context-Free Languages
Machine model    DFA / NFA                         Pushdown Automaton (PDA)
Memory           Finite (states only)              Finite states plus an unbounded stack
Described by     Regular expressions               Context-free grammars
Closure          ∪, ∩, *, complement, concat       ∪, *, concat (NOT ∩, complement)
Parsing          O(n) -- linear scan               O(n³) CYK; O(n) for some subclasses (e.g. LL(1), LR(1))
Pumping lemma    x y^k z (one pump)                u v^k x y^k z (two pumps)
Can do           a*b*, (ab)*, keyword matching     a^n b^n, balanced parens, palindromes
Can't do         a^n b^n, matching, unbounded counting    a^n b^n c^n, cross-serial dependencies
Relationship     Every regular language is context-free, but NOT vice versa
Key Idea
Regular languages are a proper subset of CFLs. A DFA is just a PDA that never uses its stack. So anything a DFA can do, a PDA can also do -- plus more.
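The contrast can be seen in a few lines of Python. This is an informal sketch rather than a formal automaton: `accepts_a_star_b_star` tracks only a state, like a DFA, while `balanced` uses a list as the stack a PDA would have. Both function names are made up for this example.

```python
def accepts_a_star_b_star(s: str) -> bool:
    """DFA-style recognizer for a*b*: one state variable, no stack."""
    state = "reading_a"
    for ch in s:
        if state == "reading_a" and ch == "a":
            continue                     # stay in the first state
        if ch == "b":
            state = "reading_b"          # from now on only b's are allowed
        else:
            return False
    return True


def balanced(s: str) -> bool:
    """PDA-style recognizer for balanced parentheses: push on '(', pop on ')'."""
    stack = []
    for ch in s:
        if ch == "(":
            stack.append(ch)
        elif ch == ")":
            if not stack:
                return False             # nothing left to match
            stack.pop()
        else:
            return False
    return not stack                     # accept only if every '(' was matched


print(accepts_a_star_b_star("aabbb"), accepts_a_star_b_star("aba"))  # True False
print(balanced("(()(()))"), balanced("(()("))                        # True False
```

The first function would work unchanged if handed a stack it never touches, which is the sense in which every DFA is a PDA.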
Real-World Applications of CFGs
Compilers & Programming Languages
Source code: if (x > 0) { y = x + 1; }
              STATEMENT
             /    |    \
           IF    COND   BLOCK
           |     / \      |
           if   x > 0   ASSIGN
                        / | \
                       y  =  EXPR
                             / | \
                            x  +  1
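The syntax of nearly every programming language is specified by a CFG (often written in BNF) that defines what well-formed programs look like; rules such as "declare before use" are checked separately. Compilers use parsers (LL, LR, LALR) to build parse trees from source code.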
XML / HTML
Nested tags are inherently context-free:
<div><p>Hello <b>world</b></p></div>
Natural Language Processing
        SENTENCE
        /      \
      NP        VP
      |        /  \
     Det      V    NP
      |       |   /  \
    "the"     |  Det  N
            "ate" |   |
                 "a" "fish"
Linguists use CFGs to model the structure of human language sentences.
Other Applications
JSON / YAML parsing -- nested data formats
Mathematical expressions -- calculators, CAS
DNA/RNA structure -- folding patterns modeled by stochastic CFGs
Protocol specification -- BNF grammars in RFCs
Analogy
CFGs are the blueprints of structured languages. Wherever you see nesting, hierarchy, or recursive structure, there's likely a CFG underneath.
Summary & Cheat Sheet
The Big Ideas
CFG = (V, T, P, S)
=====================
V = variables (non-terminals)
T = terminals (alphabet)
P = productions (rewrite rules)
S = start symbol
"Context-free" means:
Left side of every rule is a
SINGLE variable. No context needed.
Power: Regular ⊂ Context-Free
Machine: DFA ⊂ PDA (+ stack)
What to Remember
CFGs can express nesting and matching
Parse trees show derivation structure
Ambiguity = multiple parse trees for one string
CNF: A → BC | a (useful for CYK parsing)
Closed under ∪, concat, * but NOT ∩ or complement
Common Grammar Patterns
a^n b^n: S → aSb | ε
palindromes: S → aSa | bSb | a | b | ε
balanced parens: S → SS | (S) | ε
arithmetic: E → E+T | T
T → T*F | F
F → (E) | id
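One way to check a grammar pattern is to enumerate everything it generates up to a small length. The Python sketch below does a breadth-first search over leftmost derivations; `generate` and the grammar encoding are assumptions for this example, and the cap on sentential-form length is a heuristic safeguard so the search also terminates on rules like S → SS (it has only been checked here on the a^n b^n and palindrome patterns).

```python
from collections import deque


def generate(grammar, start: str, max_len: int) -> set:
    """All terminal strings of length ≤ max_len found by leftmost derivations."""
    results = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:                      # no variables left: a finished string
            results.add("".join(form))
            continue
        for body in grammar[form[idx]]:      # expand the leftmost variable
            new = form[:idx] + tuple(body) + form[idx + 1:]
            terminals = sum(1 for sym in new if sym not in grammar)
            if terminals > max_len or len(new) > 2 * max_len + 2:
                continue                     # prune: too long to be useful
            if new not in seen:
                seen.add(new)
                queue.append(new)
    return results


ANBN = {"S": [("a", "S", "b"), ()]}
PALINDROMES = {"S": [("a", "S", "a"), ("b", "S", "b"), ("a",), ("b",), ()]}

print(sorted(generate(ANBN, "S", 4), key=len))         # ['', 'ab', 'aabb']
print(sorted(generate(PALINDROMES, "S", 3), key=lambda s: (len(s), s)))
```

Comparing the generated set against a direct definition (for example `s == s[::-1]` for palindromes) is a quick sanity check when writing a new grammar.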
Next: Pushdown Automata (PDA) -- the machine model equivalent to CFGs. Think of it as an NFA with a stack. We will prove that PDAs and CFGs describe exactly the same class of languages.