The Pumping Lemma

Proving Languages Are NOT Regular (or Not Context-Free)

"Can I prove this language IS regular?" --> Build a DFA/NFA/regex for it. "Can I prove this language is NOT regular?" --> Use the PUMPING LEMMA.

CS305 - Formal Language Theory


The Big Picture

The pumping lemma is a tool for proving NEGATIVE results

What it tells you

  • A language is NOT regular
  • A language is NOT context-free

Key Idea

Every regular language has a certain "pumping" property. If a language lacks this property, it cannot be regular.

What it does NOT tell you

  • It does not prove a language IS regular
  • Satisfying the pumping property does not make a language regular

Warning

The pumping lemma is a necessary condition for regularity, NOT a sufficient one. Think of it like a one-way test.

Proof by contradiction structure:

+-------------------------------------------------+
| 1. Assume L is regular (for contradiction)      |
| 2. Then pumping lemma applies to L              |
| 3. Find a string that CANNOT be pumped          |
| 4. Contradiction! L is NOT regular.             |
+-------------------------------------------------+

Intuition: Why Pumping Works

The Pigeonhole Principle meets finite automata

A DFA has a finite number of states. Say it has p states.

If it reads a string of length ≥ p, it makes at least p + 1 state visits (counting the start state), but there are only p distinct states.

By the pigeonhole principle, some state must be visited twice. That means there's a loop!

Analogy

If you have 5 pigeonholes and 6 pigeons, at least one hole has 2 pigeons. If you have p states and p+1 visits, at least one state is visited twice.

[Figure: a DFA path through states q0 ... q5 reading a a b a b b; the repeated state closes a cycle, labeled "This is the pump loop!"]
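The pigeonhole argument above can be sketched in code. This is only an illustration under stated assumptions: the 3-state DFA (accepting strings whose number of a's is divisible by 3) and the helper names `find_loop` and `accepts` are made up for this example and are not part of the lecture.

```python
def find_loop(delta, start, s):
    """Split s into (x, y, z) where y takes the DFA around its first repeated state."""
    first_seen = {start: 0}      # state -> number of symbols read when first reached
    state = start
    for pos, ch in enumerate(s, start=1):
        state = delta[(state, ch)]
        if state in first_seen:  # pigeonhole: this state was visited before
            i = first_seen[state]
            return s[:i], s[i:pos], s[pos:]
        first_seen[state] = pos
    return None                  # no repeat; only possible if len(s) < number of states


def accepts(delta, start, accepting, s):
    """Run the DFA on s and report whether it ends in an accepting state."""
    state = start
    for ch in s:
        state = delta[(state, ch)]
    return state in accepting


# Hypothetical DFA with states 0, 1, 2: 'a' advances the state mod 3, 'b' stays put.
DELTA = {(q, "a"): (q + 1) % 3 for q in range(3)}
DELTA.update({(q, "b"): q for q in range(3)})

x, y, z = find_loop(DELTA, 0, "abaab")
print((x, y, z))
print([accepts(DELTA, 0, {0}, x + y * i + z) for i in range(4)])
```

Because the first repeat must happen within the first p symbols, the split found this way always has |xy| ≤ p, and every pumped string follows the same path plus extra trips around the loop.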

The Pumping Lemma for Regular Languages

The formal statement you need to memorize

Pumping Lemma (Regular Languages)

If L is a regular language, then there exists a number p (the pumping length) such that for every string s ∈ L with |s| ≥ p, s can be written as s = xyz satisfying:

1.   |y| > 0             (y is not empty)
2.   |xy| ≤ p           (loop is in the first p characters)
3.   xy^i z ∈ L       for all i ≥ 0    (pump y any number of times)

The string s broken into x, y, z:

|<------------ s = xyz ------------>|
|<--- x --->|<-- y -->|<--- z ----->|
|           |  LOOP   |             |
|<------- xy -------->|
|      ≤ p chars      |

Pumping:
  i=0:  x z        (delete the loop)
  i=1:  x y z      (original string)
  i=2:  x yy z     (traverse loop twice)
  i=3:  x yyy z    (traverse loop three times)
  ...
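The three conditions translate directly into code. The sketch below is illustrative only; `pump` and `valid_splits` are hypothetical helper names, and the string "aabb" with p = 2 is just a small sample input.

```python
def pump(x, y, z, i):
    """Build x y^i z."""
    return x + y * i + z


def valid_splits(s, p):
    """Yield every (x, y, z) with s == x + y + z, |y| > 0 and |xy| <= p."""
    for end in range(1, min(p, len(s)) + 1):   # end = |xy|
        for start in range(end):               # start = |x|, so |y| = end - start > 0
            yield s[:start], s[start:end], s[end:]


for x, y, z in valid_splits("aabb", 2):
    print(repr(x), repr(y), repr(z), "->", [pump(x, y, z, i) for i in range(3)])
```

For a string of length at least p there are p(p+1)/2 valid splits, which is why exhaustive checking is feasible for small p.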

The Pumping Game: Prove {a^n b^n | n ≥ 0} is NOT Regular

Play the adversarial game -- you are the prover!

1 Adversary picks pumping length p

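2 You pick s ∈ L with |s| ≥ p. Suggestion: s = a^p b^p.

3 Adversary splits s = xyz (|y| > 0, |xy| ≤ p). Because the first p characters of s are all a's, y = a^k for some k ≥ 1.

4 You pick i to pump. With i = 2, xy^2 z = a^(p+k) b^p, which has more a's than b's and so is not in L.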

Critical Point

Your argument must work for ALL possible values of p and ALL valid splits xyz. You only get to choose s and i.


The Proof Game as a Flowchart

Follow this template for every pumping lemma proof

START: "I want to prove L is not regular."
                      |
                      v
+-----------------------------------------------+
| Step 0: ASSUME (for contradiction) L is       |
|         regular. Then the pumping lemma holds.|
+-----------------------------------------------+
                      |
                      v
+-----------------------------------------------+
| Step 1: Let p be the pumping length           |
|         (given by the lemma -- we don't pick  |
|         its value, just call it p).           |
+-----------------------------------------------+
                      |
                      v
+-----------------------------------------------+
| Step 2: CHOOSE a specific string s in L       |
|         with |s| >= p.                        |
|         (This is YOUR strategic choice!)      |
+-----------------------------------------------+
                      |
                      v
+-----------------------------------------------+
| Step 3: CONSIDER ANY split s = xyz            |
|         where |y| > 0 and |xy| <= p.          |
|         (You must handle ALL valid splits!)   |
+-----------------------------------------------+
                      |
                      v
+-----------------------------------------------+
| Step 4: FIND an i >= 0 such that              |
|         xy^i z is NOT in L.                   |
|         (Usually i = 0 or i = 2 works.)       |
+-----------------------------------------------+
                      |
                      v
+-----------------------------------------------+
| Step 5: CONTRADICTION with the pumping lemma. |
|         Therefore L is NOT regular. QED       |
+-----------------------------------------------+

Pro Tip: Choosing s

Pick s so that any way the adversary splits the first p characters forces y to land in an "inconvenient" region. Strings like a^p b^p work well because the first p characters are all a's, so y must be all a's -- pumping it breaks the a/b balance.


Explore: Pumping a^p b^p


Why this always works

Since |xy| ≤ p and the first p characters are all a's, y must consist entirely of a's. Pumping y (with i ≠ 1) changes the number of a's but not b's, so the result cannot be in {a^n b^n}.
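As a sanity check, the game can be played exhaustively for small p. The sketch below is an assumption-laden illustration (the names `in_anbn` and `prover_wins` are invented here); checking finitely many values of p is not a proof, the argument above is.

```python
def in_anbn(w):
    """Membership test for {a^n b^n | n >= 0}."""
    n = len(w) // 2
    return w == "a" * n + "b" * n


def prover_wins(p, i=2):
    """True if pumping with exponent i breaks EVERY valid split of s = a^p b^p."""
    s = "a" * p + "b" * p
    for end in range(1, p + 1):          # end = |xy| <= p
        for start in range(end):         # start = |x|, so y = s[start:end] is non-empty
            x, y, z = s[:start], s[start:end], s[end:]
            if in_anbn(x + y * i + z):
                return False             # the adversary found a split that survives
    return True


print(all(prover_wins(p, i=2) for p in range(1, 30)))
print(all(prover_wins(p, i=0) for p in range(1, 30)))
print(prover_wins(5, i=1))   # i = 1 reproduces s itself, so it can never win
```

Both i = 0 and i = 2 defeat every split, while i = 1 leaves the string unchanged.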


Example 2: {ww | w ∈ {0,1}*}

Prove this language is not regular

0 Assume L = {ww} is regular.

1 Let p be the pumping length.

2 Choose s = 0^p 1 0^p 1.

Here w = 0^p 1, so s = ww ∈ L. And |s| = 2p + 2 ≥ p.

3 Consider any split s = xyz with |y| > 0, |xy| ≤ p.

Since |xy| ≤ p and the first p chars are all 0's, y = 0^k for some k ≥ 1.

4 Choose i = 2. Then:

xy^2 z = 0^(p+k) 1 0^p 1

The pumped string still contains exactly two 1's and still ends in 1. If it were ww, then w would contain exactly one 1 and end in 1, so w = 0^m 1 and ww = 0^m 1 0^m 1 for some m: both runs of 0's would have the same length. But the first run has p + k zeros and the second has only p, with k ≥ 1. So xy^2 z ∉ L.

5 Contradiction! L is not regular. □

s = 0^p 1 0^p 1   (this is ww where w = 0^p 1)

  0 0 0 ... 0 0 1   0 0 0 ... 0 0 1
  |<--- p --->|     |<--- p --->|
  |<---- w ---->|   |<---- w ---->|

Since |xy| <= p:

  |< x >|< y >|<--------- z --------->|
   0..0   0..0  0..0 1 0 0 .. 0 0 1
  |<- all 0's ->|

After pumping (i=2):

  0 0 .. 0 0 0..0 0..0 1   0 0 .. 0 0 1
  |<---- p+k 0's ---->|    |<-- p -->|

For this to be ww, both runs of 0's would need the same length.
The two halves CANNOT match because there are p+k zeros before
the first "1" but only p zeros before the second "1".
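Since y = 0^k no matter where the adversary places it, the pumped string depends only on p and k. A small sketch (with hypothetical helpers `is_ww` and `pumped`) confirms for a range of values that 0^(p+k) 1 0^p 1 is never of the form ww:

```python
def is_ww(s):
    """True if s is some string w written twice."""
    half, odd = divmod(len(s), 2)
    return odd == 0 and s[:half] == s[half:]


def pumped(p, k):
    """x y^2 z for s = 0^p 1 0^p 1 when y = 0^k lies in the first block."""
    return "0" * (p + k) + "1" + "0" * p + "1"


print(is_ww("0" * 4 + "1" + "0" * 4 + "1"))   # the original s for p = 4
print(any(is_ww(pumped(p, k)) for p in range(1, 25) for k in range(1, p + 1)))
```

The first line reports that s itself is in the language; the second reports whether any pumped string is.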

Example 3: {1^(n^2) | n ≥ 0}

Strings of 1s whose length is a perfect square -- proof using number theory

0 Assume L = {1^(n^2)} is regular.

1 Let p be the pumping length.

2 Choose s = 1^(p^2).

s ∈ L since p^2 is a perfect square. |s| = p^2 ≥ p.

3 Consider any split s = xyz with |y| = k where 1 ≤ k ≤ p (since |y| > 0 and |xy| ≤ p).

4 Choose i = 2. Then:

|xy^2 z| = p^2 + k

We need to show p^2 + k is NOT a perfect square.

Since 1 ≤ k ≤ p:

p^2 < p^2 + k ≤ p^2 + p < p^2 + 2p + 1 = (p+1)^2

So p^2 + k is strictly between two consecutive perfect squares. Therefore it is NOT a perfect square!

5 Contradiction! L is not regular. □

The key number theory insight:

Perfect squares: 0, 1, 4, 9, 16, 25, 36, ...
Gaps between consecutive squares GROW:

  n:    0   1   2   3   4   5   6
  n^2:  0   1   4   9   16  25  36
  gap:    1   3   5   7   9   11

Gap between p^2 and (p+1)^2:   (p+1)^2 - p^2 = 2p + 1

When we pump, we add k where 1 <= k <= p:   p^2 + k
Since k <= p < 2p + 1:   p^2 < p^2 + k < (p+1)^2

  |-------|-----------|------------|
 p^2    p^2+1       p^2+p       (p+1)^2
          |<--- k --->|
          falls in the GAP! NOT a perfect square!
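The inequality can be spot-checked numerically. This is a minimal sketch, assuming the helper names `is_square` and `pumped_lengths_avoid_squares` (both invented here); it uses the standard-library `math.isqrt` for an exact integer square root.

```python
from math import isqrt


def is_square(n):
    """Exact perfect-square test using the integer square root."""
    r = isqrt(n)
    return r * r == n


def pumped_lengths_avoid_squares(p):
    """True if p^2 + k is not a perfect square for every k with 1 <= k <= p."""
    return all(not is_square(p * p + k) for k in range(1, p + 1))


print(all(pumped_lengths_avoid_squares(p) for p in range(1, 500)))
print(is_square(5 * 5 + 2 * 5 + 1))   # k = 2p + 1 would reach the next square, but k <= p
```

The second line shows why the bound |xy| ≤ p matters: only an increase of 2p + 1 would land on the next square.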

Analogy

Think of perfect squares as "stepping stones" that get further and further apart. Pumping adds a small amount (at most p), but the gap to the next stone is 2p+1. You land in the water every time!


Common Mistakes

These errors cost points on every exam. Do not make them!

Mistake 1: Picking a specific p

"Let p = 5..." -- NO! You don't get to choose p. The adversary chooses it. Your proof must work for any p.

Mistake 2: Picking a specific split

"Let x = a^2, y = a^3, z = ..." -- NO! The adversary picks the split. You must argue about all valid splits.

Mistake 3: Forgetting |xy| ≤ p

This condition restricts WHERE y can be. It's often the most useful condition! Don't ignore it -- it constrains the adversary's choices.

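Mistake 4: Wrong quantifier order

"There is some split xyz such that for all i..." -- BACKWARDS! That is how the lemma itself reads. In your proof the order flips: the adversary picks the split, so you must handle every split, and for each one you pick an i that breaks it.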

Mistake 5: Using pumping to prove regularity

"The language can be pumped, so it's regular." -- WRONG! The pumping lemma is one-directional. It can only prove non-regularity.

The Correct Quantifier Order

FOR ALL p                  (adversary picks)
  THERE EXISTS s           (you pick)
    FOR ALL xyz splits     (adversary picks)
      THERE EXISTS i       (you pick)
        xy^i z NOT in L

When the Pumping Lemma Fails

A necessary condition is not the same as a sufficient condition

Surprising Fact

There exist languages that are NOT regular but still satisfy the pumping lemma!

Consider the language:

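L = {a^i b^j c^k | i, j, k ≥ 0 and if i = 1 then j = k}

This language is NOT regular: intersecting it with the regular language ab*c* gives {a b^n c^n | n ≥ 0}, which is not regular, and regular languages are closed under intersection. Yet you cannot prove this using the pumping lemma alone -- it satisfies the pumping property!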
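The claim that this language can be pumped is easy to test on short strings. The sketch below is an illustration, not a proof: it assumes pumping length p = 2, only tries exponents i up to a small bound, and only examines strings up to a fixed length. The function names are invented for this example.

```python
import re
from itertools import product

SHAPE = re.compile(r"a*b*c*")


def in_L(w):
    """Membership in {a^i b^j c^k | if i = 1 then j = k}."""
    if not SHAPE.fullmatch(w):
        return False
    return w.count("a") != 1 or w.count("b") == w.count("c")


def can_pump(s, p, max_i=5):
    """True if SOME valid split keeps x y^i z in L for every i in 0..max_i."""
    for end in range(1, min(p, len(s)) + 1):
        for start in range(end):
            x, y, z = s[:start], s[start:end], s[end:]
            if all(in_L(x + y * i + z) for i in range(max_i + 1)):
                return True
    return False


def pumpable_up_to(p, max_len):
    """Check every string of L with p <= length <= max_len."""
    for n in range(p, max_len + 1):
        for letters in product("abc", repeat=n):
            s = "".join(letters)
            if in_L(s) and not can_pump(s, p):
                return False
    return True


print(pumpable_up_to(2, 7))   # p = 2: every string examined can be pumped
print(pumpable_up_to(1, 7))   # p = 1 is too small: "aab" has no surviving split
```

The case p = 1 fails because deleting one a from "aab" gives "ab", which has i = 1 but j ≠ k; with p = 2 the adversary can pump "aa" as a unit instead.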

Analogy

"All dogs are mammals" does NOT mean "all mammals are dogs." Similarly, "all regular languages are pumpable" does NOT mean "all pumpable languages are regular."

The logical relationship:

+--------------------------------------+
| All languages                        |
|                                      |
| +----------------------------------+ |
| | Pumpable languages               | |
| |                                  | |
| | +------------------------------+ | |
| | | Regular languages            | | |
| | | (pumpable AND regular)       | | |
| | +------------------------------+ | |
| |                                  | |
| | * Non-regular but pumpable       | |
| |   languages live HERE            | |
| +----------------------------------+ |
|                                      |
| * Non-pumpable languages             |
|   are definitely NOT regular         |
+--------------------------------------+

Pumping lemma proves:   NOT pumpable --> NOT regular
It CANNOT prove:        Pumpable --> Regular

When the pumping lemma is insufficient, use:

  • Myhill-Nerode theorem (necessary AND sufficient)
  • Closure properties (intersect with a regular language, then pump)

The Pumping Lemma for CFLs

Same idea, but now we pump TWO substrings

Pumping Lemma (Context-Free Languages)

If L is context-free, then there exists p such that for every s ∈ L with |s| ≥ p, s can be written as s = uvxyz satisfying:

1.   |vy| > 0           (v and y are not BOTH empty)
2.   |vxy| ≤ p         (the "middle chunk" is bounded)
3.   uv^i xy^i z ∈ L   for all i ≥ 0    (pump v and y together)

The string s broken into u, v, x, y, z:

|<------------------ s = uvxyz ------------------>|
|<- u ->|<- v ->|<- x ->|<- y ->|<----- z ------>|
|       | PUMP  |       | PUMP  |                |
        |<-------- vxy -------->|
        |      <= p chars       |

Pumping (v and y are pumped TOGETHER, same number of copies):
  i=0:  u x z            (delete both v and y)
  i=1:  u v x y z        (original)
  i=2:  u vv x yy z      (double both)
  i=3:  u vvv x yyy z    (triple both)

Key Difference from Regular Pumping

Regular: pump ONE substring (y). CFL: pump TWO substrings (v and y) in sync. This is because CFGs can generate matching pairs (like matching parentheses), and pumping both sides together preserves the pairing.


Intuition: Why CFL Pumping Works

The parse tree argument -- a repeated variable means a "nestable" pattern

A context-free grammar has a finite number of variables (nonterminals).

If a string s is long enough, its parse tree must be tall. A tall tree means a long root-to-leaf path.

By the pigeonhole principle, some variable A must appear twice on this path.

The subtree rooted at the upper A generates vxy. The subtree rooted at the lower A generates just x.

We can replace the lower A's subtree with the upper A's subtree (or vice versa), giving us the pumping effect!

Parse tree with repeated variable A:

        S
       /|\
      / | \
     u  A  z        <-- UPPER occurrence of A (generates vxy)
       /|\
      / | \
     v  A  y        <-- LOWER occurrence of A (generates x)
        |
        x

String: u v x y z

PUMP UP (replace lower A with upper A's tree):

        S
       /|\
      u A z
       /|\
      v A y         <-- plug in upper A again!
       /|\
      v A y
        |
        x

Result: u v v x y y z = uv^2 xy^2 z

PUMP DOWN (replace upper A with lower A's tree):

        S
       /|\
      u A z
        |
        x

Result: u x z = uv^0 xy^0 z

Example: {a^n b^n c^n | n ≥ 0}

Prove this language is not context-free

0 Assume L = {a^n b^n c^n} is context-free.

1 Let p be the pumping length.

2 Choose s = a^p b^p c^p.

s ∈ L and |s| = 3p ≥ p.

3 Consider any split s = uvxyz with |vy| > 0 and |vxy| ≤ p.

Since |vxy| ≤ p, the substring vxy can span at most two of the three symbol types (a, b, c). It cannot touch all three.

4 Choose i = 2. Because |vy| > 0, pumping raises the count of at least one symbol; because vxy misses at least one of the three symbols, that symbol's count stays at p. The counts of a's, b's, c's are no longer all equal.

So uv^2 xy^2 z ∉ L.

5 Contradiction! L is not context-free. □

s = a^p b^p c^p

  a a...a a   b b...b b   c c...c c
  |<- p ->|   |<- p ->|   |<- p ->|

Since |vxy| <= p, vxy fits in a window of width p.
Where can this window be?

Case 1: vxy is all a's and b's (no c's)
  a a [a..a b..b] b   c c...c c
      |<= p chars>|
  Pumping increases a's or b's (or both), but NOT c's.
  Counts become unequal!

Case 2: vxy is all b's and c's (no a's)
  a a...a a   b [b..b c..c] c
                |<= p chars>|
  Pumping increases b's or c's (or both), but NOT a's.
  Counts become unequal!

Case 3: vxy is all a's (or all b's, or all c's)
  Same argument -- only one count changes.

In ALL cases, pumping breaks the a-count = b-count = c-count requirement!
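The case analysis can be cross-checked by brute force for small p. The following is a sketch under assumptions: the names `in_anbncn`, `five_way_splits` and `prover_wins` are made up, and only a handful of p values are tried, so it illustrates the argument rather than replacing it.

```python
def in_anbncn(w):
    """Membership test for {a^n b^n c^n | n >= 0}."""
    n = len(w) // 3
    return w == "a" * n + "b" * n + "c" * n


def five_way_splits(s, p):
    """Yield every (u, v, x, y, z) with s = uvxyz, |vy| > 0 and |vxy| <= p."""
    n = len(s)
    for a in range(n + 1):                        # u   = s[:a]
        for d in range(a, min(a + p, n) + 1):     # vxy = s[a:d]
            for b in range(a, d + 1):             # v   = s[a:b]
                for c in range(b, d + 1):         # x   = s[b:c], y = s[c:d]
                    if (b - a) + (d - c) > 0:
                        yield s[:a], s[a:b], s[b:c], s[c:d], s[d:]


def prover_wins(p, i=2):
    """True if u v^i x y^i z leaves the language for EVERY valid split of a^p b^p c^p."""
    s = "a" * p + "b" * p + "c" * p
    return all(not in_anbncn(u + v * i + x + y * i + z)
               for u, v, x, y, z in five_way_splits(s, p))


print([prover_wins(p) for p in range(1, 6)])
```

Each entry in the printed list corresponds to one value of p, with every valid five-way split of a^p b^p c^p tried.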

Key Insight

The constraint |vxy| ≤ p is what makes this work. It prevents the "pump zone" from touching all three symbol groups simultaneously.


Example: {ww | w ∈ {0,1}*}

Not just non-regular -- also NOT context-free!

0 Assume L = {ww} is context-free.

1 Let p be the pumping length.

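2 Choose s = 0^p 1^p 0^p 1^p.

Here w = 0^p 1^p, so s = ww ∈ L, and |s| = 4p ≥ p.

3 Consider any split s = uvxyz with |vy| > 0 and |vxy| ≤ p.

Since |vxy| ≤ p, it sits within a window of at most p characters. In the string 0^p 1^p 0^p 1^p, this window touches at most two adjacent blocks of the four.

4 Choose i = 2. If v and y each lie inside a single block, the pumped string still has four blocks, and being ww would force block 1 to match block 3 and block 2 to match block 4. Pumping lengthens one block or two adjacent blocks and leaves the others at length p, so at least one of those matches fails. If v or y straddles a block boundary, the pumped string has six blocks, whereas a string ww that starts with 0 and ends with 1 always has a multiple of four.

The first half and second half can no longer match.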

5 Contradiction! L is not context-free. □

s = 0^p 1^p 0^p 1^p

  0...0   1...1   0...0   1...1
  |blk1|  |blk2|  |blk3|  |blk4|
  |< w = 0^p 1^p >|< w = 0^p 1^p >|

|vxy| <= p, so the window sits in one of these regions:

  Region A: within block 1 (all 0s)
  Region B: straddling blocks 1-2 (0s and 1s)
  Region C: within block 2 (all 1s)
  Region D: straddling blocks 2-3 (1s and 0s)
  Region E: within block 3 (all 0s)
  Region F: straddling blocks 3-4 (0s and 1s)
  Region G: within block 4 (all 1s)

In every case, pumping affects at most 2 adjacent blocks.
The other 2 blocks stay the same.

Example - Region D (v inside 1^p, y inside 0^p):
  Pumping gives: 0^p 1^(p+a) 0^(p+b) 1^p   with a + b > 0
  For ww we would need block 1 = block 3 (p = p+b)
  and block 2 = block 4 (p+a = p), i.e. a = b = 0.
  Doesn't match!
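The block structure is easy to inspect with a run-length view. The snippet below is a small illustration with assumed specifics: p = 3 and one particular Region D split (v is the last 1 of block 2, y is the first 0 of block 3) chosen by hand; `run_lengths`, `is_ww` and `pump5` are hypothetical helper names.

```python
from itertools import groupby


def run_lengths(s):
    """Collapse s into (symbol, run length) pairs, one per block."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def is_ww(s):
    half, odd = divmod(len(s), 2)
    return odd == 0 and s[:half] == s[half:]


def pump5(u, v, x, y, z, i):
    """Build u v^i x y^i z."""
    return u + v * i + x + y * i + z


p = 3
s = "0" * p + "1" * p + "0" * p + "1" * p
# Hand-picked Region D split: |vxy| = 2 <= p, v and y on either side of the midpoint.
u, v, x, y, z = s[:5], s[5:6], "", s[6:7], s[7:]

pumped = pump5(u, v, x, y, z, 2)
print(run_lengths(s), is_ww(s))
print(run_lengths(pumped), is_ww(pumped))
```

After pumping, blocks 2 and 3 have grown while blocks 1 and 4 are unchanged, so neither pair of blocks matches any more.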

Note

Contrast with {ww^R} (even-length palindromes), which IS context-free. ww requires "copying" which CFGs cannot do; ww^R requires "mirroring" which CFGs handle via nesting.


Comparing the Two Pumping Lemmas

Side-by-side: Regular vs. Context-Free

Feature          Regular Languages PL        Context-Free Languages PL
---------------  --------------------------  -------------------------------
Split            s = xyz (3 parts)           s = uvxyz (5 parts)
Pumped parts     y alone                     v and y together (in sync)
Non-empty        |y| > 0                     |vy| > 0
Length bound     |xy| ≤ p                    |vxy| ≤ p
Pumped string    xy^i z ∈ L                  uv^i xy^i z ∈ L
Source of loop   Repeated state in DFA       Repeated variable in parse tree
Proves           Language is NOT regular     Language is NOT context-free
Limitation       Necessary, not sufficient   Necessary, not sufficient

Regular PL:  |-- x --|-- y --|--- z ---|
                       PUMP
Pumped:      |-- x --|yyyyyy|--- z ---|

CFL PL:      |- u -|- v -|- x -|- y -|- z -|
                    PUMP         PUMP
Pumped:      |- u -|vvvvv|- x -|yyyyy|- z -|

How to Decide Which to Use

Trying to prove a language is not regular? Use the regular pumping lemma first (simpler). Trying to prove it's not context-free? The regular lemma cannot help; reach for the CFL pumping lemma. If you already know a language is not regular, the CFL lemma may let you show it is also not context-free.


Beyond Pumping

Other techniques for proving non-regularity and non-context-freeness

Myhill-Nerode Theorem

A language L is regular if and only if its indistinguishability relation has finitely many equivalence classes. Two strings u and v are indistinguishable when, for every suffix z, uz ∈ L exactly when vz ∈ L.

Advantage over Pumping

Myhill-Nerode is necessary AND sufficient. If the pumping lemma can't prove non-regularity, Myhill-Nerode still can.
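For {a^n b^n}, the prefixes a^0, a^1, a^2, ... are pairwise distinguishable: the suffix b^i accepts a^i but rejects a^j for j ≠ i. The sketch below checks this for the first few prefixes; the function names are invented for the illustration, and a finite check only hints at the infinite family of classes.

```python
def in_anbn(w):
    n = len(w) // 2
    return w == "a" * n + "b" * n


def distinguishes(u, v, suffix):
    """True if the suffix puts exactly one of u + suffix, v + suffix in the language."""
    return in_anbn(u + suffix) != in_anbn(v + suffix)


def pairwise_distinguishable(count):
    """Check that a^i and a^j (i < j < count) are separated by the suffix b^i."""
    return all(distinguishes("a" * i, "a" * j, "b" * i)
               for i in range(count)
               for j in range(i + 1, count))


print(pairwise_distinguishable(20))
```

Since the same pattern works for every i and j, there are infinitely many equivalence classes, so no DFA can exist.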

Closure Properties

Regular and context-free languages are closed under certain operations. Strategy:

  • Assume L is regular (or CF)
  • Intersect L with a known regular language
  • Show the result is a known non-regular (or non-CF) language
  • Contradiction with closure!
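The strategy above can be tried on the language L = {a^i b^j c^k | if i = 1 then j = k} from the "When the Pumping Lemma Fails" slide. The sketch enumerates short strings and filters them through the regular language ab*c*; the bound of 5 characters and the helper names are assumptions made for this example.

```python
import re
from itertools import product

SHAPE = re.compile(r"a*b*c*")
R = re.compile(r"ab*c*")          # the regular language we intersect with


def in_L(w):
    """Membership in {a^i b^j c^k | if i = 1 then j = k}."""
    if not SHAPE.fullmatch(w):
        return False
    return w.count("a") != 1 or w.count("b") == w.count("c")


def intersection_up_to(max_len):
    """All strings of L ∩ R with length <= max_len, shortest first."""
    found = []
    for n in range(max_len + 1):
        for letters in product("abc", repeat=n):
            w = "".join(letters)
            if in_L(w) and R.fullmatch(w):
                found.append(w)
    return found


print(intersection_up_to(5))
```

Every string that survives the filter has the shape a b^n c^n, a language the regular pumping lemma handles directly, so L cannot be regular.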

Ogden's Lemma

A strengthened version of the CFL pumping lemma where you can "mark" certain positions and the lemma guarantees the pump includes marked positions.

Example: Closure property proof

Prove L = {0^n 1^n 2^n} is not CF. Alternative to pumping:

1. Assume L is CF.
2. CF languages are closed under intersection with regular languages.
3. Let R = 0* 1* 2* (regular).
4. L intersect R = L itself, so this choice of R gains nothing here; it only shows the pattern.
5. The trick pays off with harder languages where direct pumping is tricky: pick R so that L intersect R is a language already known not to be CF.

Closure properties let you REDUCE a hard problem to an easier one!

Analogy

The pumping lemma is a screwdriver -- great for most screws. Myhill-Nerode is a power drill -- works on everything but takes more setup. Closure properties are like using a friend's tool -- reduce the problem to one they already solved.


Summary & Cheat Sheet

Your quick reference for pumping lemma proofs

Proof Template (Regular)

1. Assume L is regular.
2. Let p = pumping length.
3. Choose s in L, |s| >= p.
   (TIP: make first p chars uniform)
4. Let s = xyz, |y| > 0, |xy| <= p.
5. Show xy^i z not in L for some i.
   (TIP: try i = 0 or i = 2 first)
6. Contradiction. L not regular. QED.

Proof Template (CFL)

1. Assume L is context-free.
2. Let p = pumping length.
3. Choose s in L, |s| >= p.
4. Let s = uvxyz, |vy| > 0, |vxy| <= p.
5. Show uv^i xy^i z not in L for some i.
   (TIP: |vxy| <= p limits the window)
6. Contradiction. L not CF. QED.

Golden Rule of String Choice

Pick s so that the constraint |xy| ≤ p (or |vxy| ≤ p) forces the pump zone into a region that will break the language's defining property when pumped.

Quick Reference Table

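Language          Regular?   CF?
a^n b^n           No         Yes
ww                No         No
ww^R              No         Yes
a^n b^n c^n       No         No
1^(n^2)           No         No
balanced parens   No         Yes

(1^(n^2) uses a one-letter alphabet, where every context-free language is regular, so it cannot be context-free either.)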

Remember!

The Quantifier Chant

For all p, there exists s, for all xyz, there exists i.

Adversary, You, Adversary, You. A-Y-A-Y.

THEY pick p   -->   YOU pick s
THEY split    -->   YOU pick i
If you always win   -->   NOT regular!