CS305 — Formal Language Theory
Regular Expressions, DFAs, and NFAs all describe the exact same class of languages.
Any language you can describe with an RE, you can build a DFA for, and vice versa. The proofs go around the triangle: RE → NFA → DFA → RE.
It's like having three different maps of the same city — street map, satellite view, and transit map. Different formats, same territory.
A regular expression is a compact, algebraic notation for describing a set of strings (a language).
Regex : languages :: arithmetic : numbers
Just as 3 + 5 × 2 compactly describes 13, (0|1)*01 compactly describes "all binary strings ending in 01."
| RE | Language L(R) |
|---|---|
| 0 | { "0" } |
| 0|1 | { "0", "1" } |
| 01 | { "01" } |
| 0* | { ε, "0", "00", ... } |
| (0|1)* | All binary strings |
| (0|1)*01 | Ends in 01 |
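The last row of the table can be checked mechanically. The sketch below assumes Python's `re` module as a stand-in for a theoretical RE matcher; `accepts` is a hypothetical helper name, and `fullmatch` is used so that the whole string, not just a prefix, must belong to the language.

```python
import re

# (0|1)*01 in Python syntax. fullmatch anchors both ends, so the entire
# string must be in the language, matching the theoretical definition.
ENDS_IN_01 = re.compile(r"(0|1)*01")

def accepts(s: str) -> bool:
    """Return True if s is in L((0|1)*01)."""
    return ENDS_IN_01.fullmatch(s) is not None

for s in ["01", "1101", "0", "010", ""]:
    print(repr(s), accepts(s))
```

Only "01" and "1101" are accepted here; the other three strings do not end in 01.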
Every RE is built from just three operations.
Union, Concatenation, and Kleene Star — these three are all you need to build every regular expression. Like LEGO bricks: simple pieces, infinite combinations.
A regular expression over alphabet Σ is defined inductively. The base cases are ∅, ε, and any single symbol a in Σ. If R and S are regular expressions, then so are R|S (union), RS (concatenation), and R* (star).
∅ matches nothing at all (empty language — 0 elements). ε matches the empty string "" (1 element). They are NOT the same!
Just like arithmetic has PEMDAS, regular expressions have a precedence order:
| Priority | Regex | Like Arithmetic |
|---|---|---|
| Highest | * Star | Exponent (^) |
| Middle | . Concat | Multiply (×) |
| Lowest | | Union | Add (+) |
Just as 2+3×4 = 14 (not 20), a|bc = {a, bc} (not {ac, bc}).
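To see the precedence rules in action, one option is to enumerate the strings each expression accepts. This is a small sketch using Python's `re`, whose precedence for star, concatenation, and `|` agrees with the table above; `language` is a hypothetical helper that brute-forces all short strings over a tiny alphabet.

```python
import re
from itertools import product

def language(pattern, alphabet="abc", max_len=3):
    """All strings up to max_len over alphabet that fully match pattern."""
    rx = re.compile(pattern)
    words = ("".join(p) for n in range(max_len + 1)
             for p in product(alphabet, repeat=n))
    return sorted(w for w in words if rx.fullmatch(w))

print(language("a|bc"))             # ['a', 'bc']: concat binds before union
print(language("(a|b)c"))           # ['ac', 'bc']: parentheses force the union first
print(language("ab*"))              # ['a', 'ab', 'abb']: star applies to b only
print(language("(ab)*", max_len=4)) # ['', 'ab', 'abab']: star applies to ab
```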
To read a regular expression, work top-down through its parse tree:
1. Find the outermost (lowest precedence) operation
2. Break into sub-expressions
3. Describe each part in English
4. Combine
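Steps 1 and 2 amount to building a parse tree whose root is the lowest-precedence operation. The following is a minimal recursive-descent sketch, not a reference implementation: `parse` and the tuple node shapes are hypothetical, and it assumes single-character symbols with `|`, `*`, and parentheses as the only operators.

```python
def parse(regex: str):
    """Parse a theoretical RE into nested tuples; the root is the outermost operation."""
    pos = 0

    def peek():
        return regex[pos] if pos < len(regex) else None

    def take():
        nonlocal pos
        ch = regex[pos]
        pos += 1
        return ch

    def union():                      # lowest precedence
        node = concat()
        while peek() == "|":
            take()
            node = ("union", node, concat())
        return node

    def concat():                     # middle precedence
        node = star()
        while peek() is not None and peek() not in "|)":
            node = ("concat", node, star())
        return node

    def star():                       # highest precedence
        node = atom()
        while peek() == "*":
            take()
            node = ("star", node)
        return node

    def atom():
        if peek() is None or peek() in "|)*":
            raise ValueError(f"expected a symbol or '(' at position {pos}")
        ch = take()
        if ch == "(":
            node = union()
            if peek() != ")":
                raise ValueError("missing ')'")
            take()
            return node
        return ("sym", ch)

    tree = union()
    if pos != len(regex):
        raise ValueError(f"unexpected {regex[pos]!r} at position {pos}")
    return tree

print(parse("a|bc"))
print(parse("ab*"))
```

For `a|bc` the root is a union whose right child is the concatenation `bc`, which is exactly the "outermost operation first" reading.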
Given a language description, write the RE. Each example shows the idea, the RE, and a few test strings:
Think: 3 symbols, each is 0 or 1
(0|1)(0|1)(0|1)
Check: 000 ✓ 010 ✓ 111 ✓ 0 ✗ 0011 ✗
Think: something, 010, something
(0|1)*010(0|1)*
Check: 010 ✓ 10101 ✓ 111 ✗
Think: pairs of symbols, repeated
((a|b)(a|b))*
Check: "" ✓ "ab" ✓ "a" ✗ "abba" ✓
Think: after each 1, must see 0 or end
(0|10)*(1|ε)
Check: "" ✓ "101" ✓ "11" ✗ "010" ✓
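The spot checks above can be extended to every short string. The sketch below compares each RE with a plain Python predicate over all strings up to a length bound. The predicates are my reading of the "Think" hints (for instance, "no two consecutive 1s" for the last example), so treat them as assumptions; `agrees` is a hypothetical helper, and ε is written as an empty alternative because Python has no ε literal.

```python
import re
from itertools import product

def agrees(pattern, predicate, alphabet, max_len=8):
    """True if pattern and predicate accept the same strings up to max_len."""
    rx = re.compile(pattern)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if (rx.fullmatch(w) is not None) != predicate(w):
                return False
    return True

checks = [
    ("(0|1)(0|1)(0|1)", lambda w: len(w) == 3,      "01"),  # exactly 3 symbols
    ("(0|1)*010(0|1)*", lambda w: "010" in w,       "01"),  # contains 010
    ("((a|b)(a|b))*",   lambda w: len(w) % 2 == 0,  "ab"),  # even length
    ("(0|10)*(1|)",     lambda w: "11" not in w,    "01"),  # no consecutive 1s
]

for pattern, predicate, alphabet in checks:
    print(pattern, agrees(pattern, predicate, alphabet))
```

A bounded check like this is evidence, not a proof, but it catches most design mistakes quickly.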
Think in building blocks: (1) What must appear? → concat. (2) What can repeat? → star. (3) What are the choices? → union. Combine these three to express any regular language.
We can systematically convert any RE into an equivalent ε-NFA using Thompson's Construction (1968).
Each base case (symbol, ε) is a basic LEGO brick. Union, concat, and star are ways to snap bricks together. Build bottom-up until you have one NFA for the whole RE.
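The brick-snapping idea can be sketched in a few dozen lines. This is one possible rendering of Thompson-style fragments, with assumptions worth flagging: the names `NFA`, `symbol`, `union`, `concat`, `star`, `states`, and `accepts` are all hypothetical, ε-edges are labeled `None`, and concatenation here links the two fragments with an ε-edge rather than merging states, which some presentations do instead.

```python
from itertools import count

_ids = count()  # fresh state numbers

class NFA:
    """An ε-NFA fragment with one start and one accept state.
    edges is a list of (src, label, dst); label None means ε."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges

def symbol(ch):
    s, f = next(_ids), next(_ids)
    return NFA(s, f, [(s, ch, f)])

def union(a, b):
    s, f = next(_ids), next(_ids)
    glue = [(s, None, a.start), (s, None, b.start),
            (a.accept, None, f), (b.accept, None, f)]
    return NFA(s, f, a.edges + b.edges + glue)

def concat(a, b):
    # Assumption: link with an ε-edge instead of merging a.accept and b.start.
    return NFA(a.start, b.accept, a.edges + b.edges + [(a.accept, None, b.start)])

def star(a):
    s, f = next(_ids), next(_ids)
    glue = [(s, None, a.start), (s, None, f),
            (a.accept, None, a.start), (a.accept, None, f)]
    return NFA(s, f, a.edges + glue)

def states(nfa):
    return {nfa.start, nfa.accept} | {q for s, _, d in nfa.edges for q in (s, d)}

def eps_closure(nfa, qs):
    stack, seen = list(qs), set(qs)
    while stack:
        q = stack.pop()
        for s, label, d in nfa.edges:
            if s == q and label is None and d not in seen:
                seen.add(d)
                stack.append(d)
    return seen

def accepts(nfa, w):
    current = eps_closure(nfa, {nfa.start})
    for ch in w:
        moved = {d for s, label, d in nfa.edges if s in current and label == ch}
        current = eps_closure(nfa, moved)
    return nfa.accept in current

# (0|1)*01 built bottom-up from the bricks
ends_in_01 = concat(concat(star(union(symbol("0"), symbol("1"))), symbol("0")), symbol("1"))
print(accepts(ends_in_01, "1101"), accepts(ends_in_01, "010"))
```

The simulation tracks a set of current states, so it runs without first converting the ε-NFA to a DFA.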
Given the RE a(b|c), how many states will Thompson's Construction produce?
Think about it step by step:
a: 2 states
b: 2 states
c: 2 states
b|c: adds 2 new states
a·(b|c): merges, no new states
To convert a DFA back to a regular expression, we eliminate states one by one.
For every pair (qi, qj) through q:
New label = (old qi→qj) | (qi→q)(q→q)*(q→qj)
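The update rule translates almost directly into code. Below is a rough sketch that stores a generalized NFA as a dictionary from `(source, target)` pairs to regex strings; `eliminate` is a hypothetical helper, ε is represented by the empty string, and the example automaton is assumed to be the usual two-state DFA for binary strings ending in 1, wrapped with a fresh start `s` and final state `f`.

```python
import re
from itertools import product

def eliminate(edges, q):
    """Remove state q from a GNFA given as {(src, dst): regex_string}."""
    loop = edges.get((q, q))
    loop_re = f"({loop})*" if loop is not None else ""
    ins = [(s, r) for (s, d), r in edges.items() if d == q and s != q]
    outs = [(d, r) for (s, d), r in edges.items() if s == q and d != q]
    # Keep every edge that does not touch q, then add the bypass labels.
    new = {k: r for k, r in edges.items() if q not in k}
    for s, r_in in ins:
        for d, r_out in outs:
            bypass = f"({r_in}){loop_re}({r_out})"   # (qi→q)(q→q)*(q→qj)
            old = new.get((s, d))
            new[(s, d)] = bypass if old is None else f"{old}|{bypass}"
    return new

gnfa = {
    ("s", "q0"): "",     # ε
    ("q0", "q0"): "0",
    ("q0", "q1"): "1",
    ("q1", "q1"): "1",
    ("q1", "q0"): "0",
    ("q1", "f"): "",     # ε
}

for state in ["q0", "q1"]:
    gnfa = eliminate(gnfa, state)

result = gnfa[("s", "f")]
print(result)

# Sanity check against the intended language on all strings up to length 8.
ok = all(
    (re.fullmatch(result, "".join(w)) is not None) == "".join(w).endswith("1")
    for n in range(9) for w in product("01", repeat=n)
)
print(ok)
```

The resulting expression is heavily parenthesized; simplifying it by hand is a separate step.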
Convert a DFA (binary strings ending in 1) to RE:
A student tried to eliminate state q0 from this DFA but got the wrong RE. Find the mistake:
DFA: s --ε→ q0, q0 --0→ q0, q0 --1→ q1, q1 --1→ q1, q1 --0→ q0, q1 --ε→ f
Student eliminated q0 first and wrote:
s --0*1→ q1 ✓
q1 self-loop: 1 ← BUG?
q1 --ε→ f ✓
What's wrong with the q1 self-loop label?
REs obey useful identities:
| Category | Law | Rule |
|---|---|---|
| Union | Commutative | R|S = S|R |
| Union | Associative | (R|S)|T = R|(S|T) |
| Union | Idempotent | R|R = R |
| Union | Identity | R|∅ = R |
| Concat | Associative | (RS)T = R(ST) |
| Concat | Identity | Rε = εR = R |
| Concat | Annihilator | R∅ = ∅R = ∅ |
| Distrib. | Left | R(S|T) = RS|RT |
| Distrib. | Right | (S|T)R = SR|TR |
| Star | Idempotent | (R*)* = R* |
| Star | Star of ε | ε* = ε |
| Star | Star of ∅ | ∅* = ε |
Concatenation is NOT commutative. ab ≠ ba. Order matters!
These laws let you simplify complex REs. For example: 0*1(1|00*1)* = (0|1)*1 can be verified algebraically.
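One way the algebraic verification might go is sketched below. Besides distributivity and the ε identity, it leans on three standard identities that are assumptions in the sense that they do not appear among the laws listed here: ε|RR* = R*, RR* = R*R, and the denesting rule (R|S)* = (R*S)*R*.

```latex
\begin{align*}
0^*1\,(1 \mid 00^*1)^*
  &= 0^*1\,\bigl((\varepsilon \mid 00^*)\,1\bigr)^*
     && \text{right distributivity, } \varepsilon R = R \\
  &= 0^*1\,(0^*1)^*
     && \varepsilon \mid RR^* = R^* \text{ with } R = 0 \\
  &= (0^*1)^*\,0^*1
     && RR^* = R^*R \text{ with } R = 0^*1 \\
  &= (0 \mid 1)^*\,1
     && (R \mid S)^* = (R^*S)^*R^* \text{ with } R = 0,\ S = 1
\end{align*}
```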
The "regex" in Python/Java is more powerful than theoretical REs!
+, ?, {n,m}, [a-z], \d
\1 (NOT regular!)
| Feature | Theoretical | Still Regular? |
|---|---|---|
| R+ | Write as RR* | Yes |
| R? | Write as R|ε | Yes |
| [a-z] | Write as a|b|...|z | Yes |
| R{3,5} | Expand manually | Yes |
| \1 backref | Cannot express | NO! |
| (?=...) lookahead | No direct operator | Yes (regular languages are closed under intersection) |
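Matching with backreferences is NP-hard in general. Separately, backtracking engines can take exponential time on patterns with nested quantifiers such as (a+)+. This catastrophic backtracking is a real security vulnerability (ReDoS).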
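A short experiment illustrates both halves of the table: sugar that can be rewritten with the three basic operations, and a backreference that cannot. This is a sketch using Python's `re`; the variable name `mirror` is just illustrative.

```python
import re

# Sugar: R+ is RR*, and R? is R|ε (written here with an empty alternative).
for w in ["", "a", "aaa", "b"]:
    assert bool(re.fullmatch(r"a+", w)) == bool(re.fullmatch(r"aa*", w))
    assert bool(re.fullmatch(r"a?", w)) == bool(re.fullmatch(r"(a|)", w))

# Backreference: \1 must repeat exactly the text captured by group 1,
# so this pattern accepts a^n b a^n for n >= 1, a language that is not regular.
mirror = re.compile(r"(a+)b\1")
print(mirror.fullmatch("aaabaaa") is not None)  # True
print(mirror.fullmatch("aaabaa") is not None)   # False
```

Recognizing a^n b a^n requires remembering n, which a finite automaton cannot do for unbounded n.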
Useful building blocks for writing regular expressions:
| Language | RE |
|---|---|
| All strings | (0|1)* |
| Starts with 1 | 1(0|1)* |
| Ends in 00 | (0|1)*00 |
| Contains 101 | (0|1)*101(0|1)* |
| Even length | ((0|1)(0|1))* |
| Only 0s | 0* |
| Exactly 3 chars | (0|1)(0|1)(0|1) |
| Language | RE |
|---|---|
| Starts & ends with a | a(a|b)*a | a |
| ≥2 b's | (a|b)*b(a|b)*b(a|b)* |
| No consecutive a's | (b|ab)*(a|ε) |
| Every a followed by b | (b|ab)* |
Σ* = anything
Σ* w Σ* = contains w
w Σ* = starts with w
Σ* w = ends with w
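These four templates can be written as tiny pattern builders. The sketch below assumes Σ = {0, 1} and that `w` is a plain string of alphabet symbols; the function names are hypothetical.

```python
import re

SIGMA = "(0|1)"            # the alphabet as a union of its symbols
SIGMA_STAR = SIGMA + "*"   # Σ*: any string at all

def contains(w: str) -> str:
    return f"{SIGMA_STAR}{w}{SIGMA_STAR}"

def starts_with(w: str) -> str:
    return f"{w}{SIGMA_STAR}"

def ends_with(w: str) -> str:
    return f"{SIGMA_STAR}{w}"

def matches(pattern: str, s: str) -> bool:
    return re.fullmatch(pattern, s) is not None

print(contains("101"))                    # (0|1)*101(0|1)*
print(matches(ends_with("00"), "0100"))   # True
print(matches(starts_with("1"), "0100"))  # False
```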
Enter a simple RE and test strings against it:
This tester converts the theoretical RE into a JavaScript regex. It supports: 0, 1, | (union), concatenation, * (star), ε (empty string), and parentheses.
This is a theoretical RE tester — only the three basic operations (union, concat, star) plus base cases. No +, ?, [...], or backreferences.
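A comparable tester can be sketched in Python. This is not the page's actual JavaScript code, only an analogous illustration: `compile_theoretical` and `accepts` are hypothetical names, and ε is translated to an empty group because Python's syntax has no ε literal.

```python
import re

ALLOWED = set("01|*()ε")

def compile_theoretical(regex: str) -> re.Pattern:
    """Translate a theoretical RE over {0, 1} into a compiled Python pattern."""
    unknown = set(regex) - ALLOWED
    if unknown:
        # Rejects +, ?, [...], backreferences, and anything else outside the basics.
        raise ValueError(f"unsupported symbols: {sorted(unknown)}")
    # An empty group matches exactly the empty string, which is what ε denotes.
    return re.compile(regex.replace("ε", "()"))

def accepts(regex: str, s: str) -> bool:
    return compile_theoretical(regex).fullmatch(s) is not None

print(accepts("(0|10)*(1|ε)", "0101"))  # True
print(accepts("(0|10)*(1|ε)", "0110"))  # False
```

Whitelisting the alphabet and operators up front keeps the extended features of the host regex engine out of reach.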
Precedence: * > concat > |
Base: ∅ (empty lang), ε (empty string), a (symbol)
Operations: R|S (union), RS (concat), R* (star)
| Law | Name |
|---|---|
| R|S = S|R | union commutes |
| R|R = R | idempotent |
| Rε = R | concat identity |
| R∅ = ∅ | annihilator |
| (R*)* = R* | star idempotent |
| ∅* = ε | star of empty |
ab* ≠ (ab)* (star binds tightest)
Given the RE (0|1)*00, determine which strings are accepted:
This accepts all binary strings ending in 00.
(0|1)* matches any prefix, then 00 requires the string to end with two zeros.
Write the regular expression for each language description. Type your answer and check:
1: 1(0|1)*0
2: a*ba*
3: (1*01*01*)*1* or (1|01*0)*