Regular Expressions

CS305 — Formal Language Theory


The Big Picture: Three Equivalent Views

Regular Expressions, DFAs, and NFAs all describe the exact same class of languages.

Key Idea

Any language you can describe with an RE, you can build a DFA for — and vice versa. The proofs go around the triangle:

RE → ε-NFA: Thompson's Construction
NFA → DFA: Subset Construction
DFA → RE: State Elimination

Analogy

It's like having three different maps of the same city — street map, satellite view, and transit map. Different formats, same territory.

What Is a Regular Expression?

A regular expression is a compact, algebraic notation for describing a set of strings (a language).

Every RE defines a language L(R)
Built from three simple operations
Only finite memory, no unbounded counting: just patterns
Equivalent in power to finite automata

Analogy

Regex : languages :: arithmetic : numbers
Just as 3 + 5 × 2 compactly describes 13, (0|1)*01 compactly describes "all binary strings ending in 01."

Quick Examples

RE	Language L(R)
`0`	{ "0" }
`0\|1`	{ "0", "1" }
`01`	{ "01" }
`0*`	{ ε, "0", "00", ... }
`(0\|1)*`	All binary strings
`(0\|1)*01`	Ends in 01
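The rows of the table above can be checked mechanically. This is a minimal sketch that assumes Python's `re` module as a stand-in for the theoretical notation; `re.fullmatch` requires the whole string to match, which corresponds to membership in L(R). The helper name `in_language` is introduced here for illustration.

```python
import re

def in_language(pattern: str, s: str) -> bool:
    """Return True if the whole string s belongs to L(pattern)."""
    return re.fullmatch(pattern, s) is not None

# Test a few strings against (0|1)*01, "ends in 01"
for s in ["01", "1101", "0", "10", ""]:
    print(repr(s), in_language(r"(0|1)*01", s))
```

Only "01" and "1101" are accepted; the other three strings do not end in 01.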


The Three Basic Operations

Every RE is built from just three operations: union (R|S), concatenation (RS), and Kleene star (R*).

That's It!

Union, Concatenation, and Kleene Star — these three are all you need to build every regular expression. Like LEGO bricks: simple pieces, infinite combinations.

Formal Definition (Recursive)

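A regular expression over alphabet Σ is defined inductively.

Base cases: ∅, ε, and a for each symbol a ∈ Σ
Inductive cases: if R and S are regular expressions, so are R|S, RS, and R*
Languages: L(∅) = { }, L(ε) = { ε }, L(a) = { a }, L(R|S) = L(R) ∪ L(S), L(RS) = { xy : x ∈ L(R), y ∈ L(S) }, and L(R*) is every concatenation of zero or more strings from L(R)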

Watch Out: ∅ vs ε

∅ matches nothing at all (empty language — 0 elements). ε matches the empty string "" (1 element). They are NOT the same!
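The inductive definition, and the difference between ∅ and ε, can be made concrete by computing languages as Python sets. This is a sketch under stated assumptions: the nested-tuple encoding (`"empty"`, `"eps"`, `"sym"`, `"union"`, `"concat"`, `"star"`) and the function `lang` are hypothetical names chosen for illustration, and languages are cut off at `max_len` because R* is usually infinite.

```python
# Hypothetical tuple encoding of the inductive definition:
#   ("empty",) is ∅      ("eps",) is ε      ("sym", "a") is the symbol a
#   ("union", R, S)      ("concat", R, S)   ("star", R)

def lang(r, max_len):
    """Set of all strings in L(r) whose length is at most max_len."""
    kind = r[0]
    if kind == "empty":
        return set()                      # ∅: no strings at all
    if kind == "eps":
        return {""}                       # ε: exactly one string, the empty one
    if kind == "sym":
        return {r[1]} if max_len >= 1 else set()
    if kind == "union":
        return lang(r[1], max_len) | lang(r[2], max_len)
    if kind == "concat":
        return {x + y
                for x in lang(r[1], max_len)
                for y in lang(r[2], max_len)
                if len(x) + len(y) <= max_len}
    if kind == "star":
        base = lang(r[1], max_len)
        result = {""}                     # zero repetitions
        frontier = {""}
        while frontier:                   # keep appending until nothing new fits
            frontier = {x + y for x in frontier for y in base
                        if len(x) + len(y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")

zero, one = ("sym", "0"), ("sym", "1")
ends_in_01 = ("concat", ("star", ("union", zero, one)), ("concat", zero, one))
print(sorted(lang(ends_in_01, 3)))        # → ['001', '01', '101']
print(len(lang(("empty",), 3)), len(lang(("eps",), 3)))   # → 0 1
```

The last line shows the distinction directly: ∅ has zero elements, while ε has one.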

Operator Precedence

Just like arithmetic has PEMDAS, regular expressions have a precedence order:

Priority	Regex	Like Arithmetic
Highest	`*` Star	Exponent (^)
Middle	`.` Concat	Multiply (×)
Lowest	`\|` Union	Add (+)


Arithmetic Parallel

Just as 2+3×4 = 14 (not 20), a|bc = {a, bc} (not {ac, bc}).
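The precedence claims can be confirmed with a few membership checks. This sketch assumes Python's `re` module, which uses the same precedence order (star, then concatenation, then union); `matches` is a helper name introduced here.

```python
import re

def matches(pattern: str, s: str) -> bool:
    """Whole-string match, i.e. membership in L(pattern)."""
    return re.fullmatch(pattern, s) is not None

# a|bc parses as a|(bc), not (a|b)c
assert matches(r"a|bc", "a")
assert matches(r"a|bc", "bc")
assert not matches(r"a|bc", "ac")
assert matches(r"(a|b)c", "ac")      # parentheses force the other grouping

# ab* parses as a(b*), not (ab)*
assert matches(r"ab*", "abbb")
assert not matches(r"ab*", "abab")
assert matches(r"(ab)*", "abab")
print("precedence checks passed")
```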

Interactive RE Parse Tree Explorer

A parse tree shows how the precedence rules group a regex into sub-expressions.

Reading Strategy

1. Find the outermost (lowest precedence) operation
2. Break into sub-expressions
3. Describe each part in English
4. Combine
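Step 1 of the strategy, finding the outermost operation, is what a parser does. The following is a small recursive-descent sketch, not the explorer's actual implementation; the tuple node names and the choice to treat an empty alternative as ε are assumptions made for illustration. Each function handles one precedence level, lowest first.

```python
def parse(src: str):
    """Parse a theoretical RE into a nested-tuple parse tree."""
    pos = 0

    def peek():
        return src[pos] if pos < len(src) else None

    def take():
        nonlocal pos
        ch = src[pos]
        pos += 1
        return ch

    def union():                      # lowest precedence: R|S
        node = concat()
        while peek() == "|":
            take()
            node = ("union", node, concat())
        return node

    def concat():                     # middle precedence: RS
        parts = []
        while peek() is not None and peek() not in "|)":
            parts.append(star())
        if not parts:
            return ("eps",)           # assumption: empty alternative means ε
        node = parts[0]
        for part in parts[1:]:
            node = ("concat", node, part)
        return node

    def star():                       # highest precedence: R*
        node = atom()
        while peek() == "*":
            take()
            node = ("star", node)
        return node

    def atom():
        ch = take()
        if ch == "(":
            node = union()
            if peek() != ")":
                raise ValueError("expected ')'")
            take()
            return node
        if ch == "ε":
            return ("eps",)
        if ch in "*|)":
            raise ValueError(f"unexpected {ch!r} at position {pos - 1}")
        return ("sym", ch)

    tree = union()
    if pos != len(src):
        raise ValueError(f"unexpected {src[pos]!r} at position {pos}")
    return tree

print(parse("a|bc"))
# → ('union', ('sym', 'a'), ('concat', ('sym', 'b'), ('sym', 'c')))
print(parse("ab*"))
# → ('concat', ('sym', 'a'), ('star', ('sym', 'b')))
```

The root of each tree is the outermost operation: union for a|bc, concatenation for ab*.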

Writing Regular Expressions

Given a language description, write the RE. Four worked examples:

"Binary strings of length exactly 3"

Think: 3 symbols, each is 0 or 1

(0|1)(0|1)(0|1)

Check: 000 ✓ 010 ✓ 111 ✓ 0 ✗ 0011 ✗

"Binary strings containing 010"

Think: something, 010, something

(0|1)*010(0|1)*

Check: 010 ✓ 10101 ✓ 111 ✗

"Strings over {a,b} with even length"

Think: pairs of symbols, repeated

((a|b)(a|b))*

Check: "" ✓ "ab" ✓ "a" ✗ "abba" ✓

"Over {0,1} with no consecutive 1s"

Think: after each 1, must see 0 or end

(0|10)*(1|ε)

Check: "" ✓ "101" ✓ "11" ✗ "010" ✓
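Hand-picked checks like the ones above can be extended to every short string. The sketch below compares each RE with a plain-English predicate on all strings up to a length bound; it assumes Python's `re` module, writes ε as an empty alternative `(1|)`, and uses helper names (`strings`, `agrees`) introduced here. Agreement up to a bound is evidence, not a proof.

```python
import re
from itertools import product

def strings(alphabet, max_len):
    """Yield every string over alphabet with length at most max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

def agrees(pattern, alphabet, predicate, max_len=8):
    """True if the RE and the predicate agree on every short string."""
    return all((re.fullmatch(pattern, s) is not None) == predicate(s)
               for s in strings(alphabet, max_len))

checks = [
    (r"(0|1)(0|1)(0|1)", "01", lambda s: len(s) == 3),        # length exactly 3
    (r"(0|1)*010(0|1)*", "01", lambda s: "010" in s),         # contains 010
    (r"((a|b)(a|b))*",   "ab", lambda s: len(s) % 2 == 0),    # even length
    (r"(0|10)*(1|)",     "01", lambda s: "11" not in s),      # no consecutive 1s
]
for pattern, alphabet, predicate in checks:
    print(pattern, agrees(pattern, alphabet, predicate))      # True for all four
```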

Writing Strategy

Think in building blocks: (1) What must appear? → concat. (2) What can repeat? → star. (3) What are the choices? → union. Combine these three to express any regular language.

RE → ε-NFA: Thompson's Construction

We can systematically convert any RE into an equivalent ε-NFA using Thompson's Construction (1968).

The Idea: Build Like LEGO

Each sub-expression gets a small NFA "fragment"
Every fragment has exactly one start and one accept state
Combine fragments using rules for |, ·, and *
The structure mirrors the parse tree!

Analogy

Each base case (symbol, ε) is a basic LEGO brick. Union, concat, and star are ways to snap bricks together. Build bottom-up until you have one NFA for the whole RE.

Thompson's Construction Rules

There is one rule for each base case (symbol, ε) and one for each operation (union, concatenation, star). Each rule produces a fragment with exactly one start state and one accept state.
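The rules can be sketched in code as one case per kind of sub-expression. This is an illustrative version under stated assumptions: the tuple encoding of REs and the names `thompson` and `accepts` are hypothetical, and concatenation here links the two fragments with an ε-edge rather than merging states (presentations differ on that detail). Edges are stored as `(source, label, target)` with `None` as the ε label.

```python
from itertools import count

def thompson(tree):
    """Build an ε-NFA for tree; returns (start, accept, edges)."""
    ids = count()                       # fresh state numbers
    edges = []

    def build(node):
        kind = node[0]
        if kind in ("sym", "eps", "empty"):
            s, f = next(ids), next(ids)
            if kind == "sym":
                edges.append((s, node[1], f))
            elif kind == "eps":
                edges.append((s, None, f))
            return s, f                 # "empty": no edge, accept is unreachable
        if kind == "concat":
            s1, f1 = build(node[1])
            s2, f2 = build(node[2])
            edges.append((f1, None, s2))          # chain the two fragments
            return s1, f2
        if kind == "union":
            s1, f1 = build(node[1])
            s2, f2 = build(node[2])
            s, f = next(ids), next(ids)           # two new states
            edges.extend([(s, None, s1), (s, None, s2),
                          (f1, None, f), (f2, None, f)])
            return s, f
        if kind == "star":
            s1, f1 = build(node[1])
            s, f = next(ids), next(ids)           # two new states
            edges.extend([(s, None, s1), (s, None, f),    # enter or skip
                          (f1, None, s1), (f1, None, f)]) # repeat or leave
            return s, f
        raise ValueError(f"unknown node {kind!r}")

    start, accept = build(tree)
    return start, accept, edges

def accepts(nfa, text):
    """Simulate the ε-NFA by tracking the set of reachable states."""
    start, accept, edges = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for src, label, dst in edges:
                if src == q and label is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for ch in text:
        moved = {dst for src, label, dst in edges
                 if src in current and label == ch}
        current = closure(moved)
    return accept in current

zero, one = ("sym", "0"), ("sym", "1")
tree = ("concat", ("concat", ("star", ("union", zero, one)), zero), one)  # (0|1)*01
nfa = thompson(tree)
print(accepts(nfa, "1101"), accepts(nfa, "10"))   # → True False
```

The recursion in `build` follows the parse tree bottom-up, which is why the NFA's structure mirrors the tree.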

Thompson's Construction: (0|1)*01

Build bottom-up: fragments for 0 and 1, their union 0|1, the star (0|1)*, then concatenation with 0 and with 1.

Challenge: Predict the Thompson NFA

Given the RE a(b|c), how many states will Thompson's Construction produce?

Think about it step by step:

Base case for a: 2 states
Base case for b: 2 states
Base case for c: 2 states
Union b|c: adds 2 new states
Concat a·(b|c): merges, no new states


DFA → RE: State Elimination

To convert a DFA back to a regular expression, we eliminate states one by one.

The Algorithm

Add new unique start s → old start (ε)
Add new unique accept f ← old accepts (ε)
Remove states one by one (not s or f)
When only s and f remain, the edge label is the RE!

Core Rule for Removing State q

For every pair of remaining states (qi, qj) with a path qi → q → qj:
New label = (old qi→qj) | (qi→q)(q→q)*(q→qj)
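The algorithm and the core rule can be sketched as follows. This is an illustrative implementation under stated assumptions: edge labels are kept as regex strings, `None` stands for a missing edge (∅), the empty string stands for ε, the reserved state names `"s"` and `"f"` must not clash with the DFA's own states, and parallel DFA transitions between the same pair of states are assumed to be unioned into one label by the caller. All function names are hypothetical.

```python
import re

def top_level_union(p):
    """True if p contains a '|' outside all parentheses."""
    depth = 0
    for ch in p:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            return True
    return False

def union(a, b):
    if a is None:
        return b                       # ∅|R = R
    if b is None:
        return a                       # R|∅ = R
    return a if a == b else f"{a}|{b}"

def concat(*parts):
    if any(p is None for p in parts):
        return None                    # R∅ = ∅
    return "".join(f"({p})" if top_level_union(p) else p for p in parts)

def star(a):
    if a is None or a == "":
        return ""                      # ∅* = ε* = ε
    return f"{a}*" if len(a) == 1 else f"({a})*"

def eliminate(states, edges, start, accepting):
    """edges maps (src, dst) to a regex label; returns an RE for the DFA."""
    labels = dict(edges)
    labels[("s", start)] = ""          # new start s, ε-edge to old start
    for q in accepting:
        labels[(q, "f")] = ""          # ε-edges from old accepts to new accept f
    remaining = ["s", "f"] + list(states)
    for q in states:                   # remove original states one by one
        remaining.remove(q)
        loop = star(labels.get((q, q)))
        for qi in remaining:
            for qj in remaining:
                through = concat(labels.get((qi, q)), loop, labels.get((q, qj)))
                new = union(labels.get((qi, qj)), through)
                if new is not None:
                    labels[(qi, qj)] = new
        labels = {k: v for k, v in labels.items() if q not in k}
    return labels.get(("s", "f"))

# DFA for binary strings ending in 1: q0 is the start, q1 is accepting
dfa_edges = {("q0", "q0"): "0", ("q0", "q1"): "1",
             ("q1", "q1"): "1", ("q1", "q0"): "0"}
result = eliminate(["q0", "q1"], dfa_edges, "q0", ["q1"])
print(result)                          # → 0*1(1|00*1)*
print(bool(re.fullmatch(result, "0101")), bool(re.fullmatch(result, "10")))  # → True False
```

A different elimination order generally yields a different but equivalent expression.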

State Elimination: Step-Through

Example: convert a DFA for binary strings ending in 1 to an RE.

Challenge: Fix the Bug

A student tried to eliminate state q0 from this DFA but got the wrong label on one edge. Find the mistake:

DFA: s --ε→ q0, q0 --0→ q0, q0 --1→ q1, q1 --1→ q1, q1 --0→ q0, q1 --ε→ f

Student eliminated q0 first and wrote:

s --0*1→ q1 ✓

q1 self-loop: 1 ← BUG?

q1 --ε→ f ✓

What's wrong with the q1 self-loop label?

Algebraic Laws of Regular Expressions

REs obey useful identities, grouped here by operation:

Category	Law	Rule
Union	Commutative	`R\|S = S\|R`
	Associative	`(R\|S)\|T = R\|(S\|T)`
	Idempotent	`R\|R = R`
	Identity	`R\|∅ = R`
Concat	Associative	`(RS)T = R(ST)`
	Identity	`Rε = εR = R`
	Annihilator	`R∅ = ∅R = ∅`
Distrib.	Left	`R(S\|T) = RS\|RT`
Distrib.	Right	`(S\|T)R = SR\|TR`
Star	Idempotent	`(R*)* = R*`
	Star of ε	`ε* = ε`
	Star of ∅	`∅* = ε`

Not Commutative!

Concatenation is NOT commutative. ab ≠ ba. Order matters!

Why These Matter

These laws let you simplify complex REs. For example: 0*1(1|00*1)* = (0|1)*1 can be verified algebraically.
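Before attempting an algebraic proof, an identity can be sanity-checked by comparing both sides on every short string. The sketch below assumes Python's `re` module and a helper name, `same_language`, introduced here; passing up to a length bound supports the identity but does not prove it.

```python
import re
from itertools import product

def same_language(r1, r2, alphabet="01", max_len=10):
    """Compare two REs on all strings up to max_len.

    Returns (True, None) if they agree, else (False, counterexample).
    """
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            s = "".join(symbols)
            in1 = re.fullmatch(r1, s) is not None
            in2 = re.fullmatch(r2, s) is not None
            if in1 != in2:
                return False, s
    return True, None

print(same_language(r"0*1(1|00*1)*", r"(0|1)*1"))   # → (True, None)
print(same_language(r"0(1|00)", r"01|000"))         # left distributive law → (True, None)
print(same_language(r"01", r"10"))                  # concat does not commute → (False, '01')
```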

Theory vs Practice: Regex in Programming

The "regex" in Python/Java is more powerful than theoretical REs!

Theoretical RE (CS305)

Only: union, concat, star
Describes exactly the regular languages
Equivalent to DFA/NFA
Matching runs in O(n) time once compiled to a DFA

Practical Regex (Programming)

Adds: +, ?, {n,m}, [a-z], \d
Backreferences: \1 (NOT regular!)
Lookahead (still regular, but not one of the three operations)
Can cause exponential blowup!

Feature	Theoretical	Still Regular?
`R+`	Write as `RR*`	Yes
`R?`	Write as `R\|ε`	Yes
`[a-z]`	Write as `a\|b\|...\|z`	Yes
`R{3,5}`	Expand manually	Yes
`\1` backref	Cannot express	NO!
`(?=...)`	No direct operator (acts like intersection)	Yes
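The rewrites in the table can be spot-checked, along with the backreference row. This sketch assumes Python's `re` module; `matches` and the sample list are chosen here for illustration, and ε is written as an empty alternative.

```python
import re

def matches(pattern: str, s: str) -> bool:
    return re.fullmatch(pattern, s) is not None

samples = ["", "a", "aa", "aaa", "aaaa", "aaaaaa", "b", "ab"]

# Each piece of sugar next to its expansion into union, concat, and star
equivalents = [
    (r"a+",     r"aa*"),       # R+     = RR*
    (r"a?",     r"(a|)"),      # R?     = R|ε
    (r"[ab]",   r"(a|b)"),     # [a-b]  = a|b
    (r"a{2,3}", r"aa(a|)"),    # R{2,3} = RR(R|ε)
]
for sugar, core in equivalents:
    assert all(matches(sugar, s) == matches(core, s) for s in samples)

# Backreference: \1 must repeat exactly what group 1 matched, so this
# pattern describes { a^n b a^n }, which is not a regular language.
assert matches(r"(a*)b\1", "aabaa")
assert not matches(r"(a*)b\1", "aabaaa")
print("all checks passed")
```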

ReDoS

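Backtracking engines can take exponential time on patterns with nested quantifiers such as (a+)+, even though that pattern is regular. This catastrophic backtracking is a real security vulnerability (ReDoS). Separately, matching with backreferences is NP-hard in general.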

Common RE Patterns

Useful building blocks for writing regular expressions:

Over Σ = {0, 1}

Language	RE
All strings	`(0\|1)*`
Starts with 1	`1(0\|1)*`
Ends in 00	`(0\|1)*00`
Contains 101	`(0\|1)*101(0\|1)*`
Even length	`((0\|1)(0\|1))*`
Only 0s	`0*`
Exactly 3 chars	`(0\|1)(0\|1)(0\|1)`

Over Σ = {a, b}

Language	RE
Starts & ends with a	`a(a\|b)*a \| a`
≥2 b's	`(a\|b)*b(a\|b)*b(a\|b)*`
No consecutive a's	`(b\|ab)*(a\|ε)`
Every a followed by b	`(b\|ab)*`

Building Block Patterns

Σ* = anything
Σ* w Σ* = contains w
w Σ* = starts with w
Σ* w = ends with w
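The four building blocks translate directly into small helper functions. This is a sketch for Σ = {0, 1}, assuming Python's `re` module for testing; the function names are hypothetical, and `w` is assumed to be a plain string of alphabet symbols with no operators in it.

```python
import re

SIGMA_STAR = "(0|1)*"                  # Σ* for Σ = {0, 1}

def contains(w: str) -> str:
    return f"{SIGMA_STAR}{w}{SIGMA_STAR}"

def starts_with(w: str) -> str:
    return f"{w}{SIGMA_STAR}"

def ends_with(w: str) -> str:
    return f"{SIGMA_STAR}{w}"

print(contains("101"))                                     # → (0|1)*101(0|1)*
print(bool(re.fullmatch(contains("101"), "0010100")))      # → True
print(bool(re.fullmatch(ends_with("00"), "0010")))         # → False
```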

Interactive RE Tester


How This Works

This tester converts the theoretical RE into a JavaScript regex. It supports: 0, 1, | (union), concatenation, * (star), ε (empty string), and parentheses.

Reminder

This is a theoretical RE tester — only the three basic operations (union, concat, star) plus base cases. No +, ?, [...], or backreferences.
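A conversion of this kind can be sketched in a few lines. The version below is a Python analogue written for illustration, not the tester's actual JavaScript; the function names and the choice to encode ε as an empty group `()` are assumptions. It rejects any symbol outside the supported set before handing the pattern to the regex engine.

```python
import re

ALLOWED = set("01|*()ε")               # the notation the tester supports

def to_python_regex(theoretical: str) -> str:
    """Translate a theoretical RE over {0,1} into Python regex syntax."""
    unsupported = set(theoretical) - ALLOWED
    if unsupported:
        raise ValueError(f"unsupported symbols: {sorted(unsupported)}")
    # An empty group matches only the empty string, like ε
    return theoretical.replace("ε", "()")

def accepts(theoretical: str, s: str) -> bool:
    return re.fullmatch(to_python_regex(theoretical), s) is not None

print(accepts("(0|10)*(1|ε)", "101"))   # → True
print(accepts("(0|10)*(1|ε)", "11"))    # → False
```

Rejecting unsupported symbols up front keeps practical features such as +, ?, and [...] from slipping through to the underlying engine.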

Challenge: Match Language to RE

For each language description, select the correct regular expression:

1. Binary strings with at least one 0

2. Strings over {a,b} of odd length

3. Binary strings NOT ending in 11

Summary & Cheat Sheet

Precedence: * > concat > |

Base: ∅ (empty lang), ε (empty string), a (symbol)

Operations: R|S (union), RS (concat), R* (star)

Key Identities

`R\|S = S\|R`	union commutes
`R\|R = R`	idempotent
`Rε = R`	concat identity
`R∅ = ∅`	annihilator
`(R*)* = R*`	star idempotent
`∅* = ε`	star of empty

Common Mistakes

∅ ≠ ε (empty lang vs empty string)
ab* ≠ (ab)* (star binds tightest)
R* always includes ε
Concat does NOT commute
Practical regex ≠ theoretical RE

Quiz: Multiple Choice

Q1: What is L(∅*)?

(a) { }   (b) { ε }   (c) { ∅ }

Q2: Which is NOT a valid RE simplification?

(a) R|R = R   (b) RS = SR   (c) R|∅ = R

Q3: Thompson's Construction for a symbol 'a' produces how many states?

(a) 1   (b) 2   (c) 3

Quiz: Trace Exercise

Given the RE (0|1)*00, determine which strings are accepted:


The RE: (0|1)*00

This accepts all binary strings ending in 00.

(0|1)* matches any prefix, then 00 requires the string to end with two zeros.

Quiz: Build the RE

Write the regular expression for each language description. Type your answer and check:

1. Binary strings starting with 1 and ending with 0

2. Strings over {a,b} with exactly one b

3. Binary strings with an even number of 0s
