A basic block is a maximal sequence of statements with one entry, one exit, and no branching except at the end.
Click Step to see how block boundaries are identified:
let example x =
  let a = 1 in
  let b = 2 in
  let c =
    if a > b then
      a + b
    else
      a - b
  in
  print_int c
A new block starts at: (1) the function entry, (2) branch targets (the code a branch jumps to, such as then/else bodies and loop headers), (3) the statement right after a branch or jump.
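The three rules can be sketched in code. This is only an illustration under assumptions: the flat `instr` type, the `leaders` function, and `example_code` are hypothetical names that do not appear in the original material, and jumps are assumed to name their target by array index.

```ocaml
(* Hypothetical flat instruction form; jump targets are array indices. *)
type instr =
  | Assign of string   (* any straight-line statement *)
  | Jump of int        (* unconditional jump *)
  | CondJump of int    (* jump to the index if a test fails, else fall through *)
  | Return

(* Returns the sorted indices at which a new basic block starts. *)
let leaders (code : instr array) : int list =
  let n = Array.length code in
  let is_leader = Array.make n false in
  let mark j = if j >= 0 && j < n then is_leader.(j) <- true in
  mark 0;                                  (* rule 1: function entry *)
  Array.iteri
    (fun i ins ->
      match ins with
      | Jump t | CondJump t ->
        mark t;                            (* rule 2: branch target *)
        mark (i + 1)                       (* rule 3: statement after a branch *)
      | Return -> mark (i + 1)
      | Assign _ -> ())
    code;
  List.filter (fun i -> is_leader.(i)) (List.init n (fun i -> i))

(* A hand-lowered version of the if/else example (an assumed lowering). *)
let example_code =
  [| Assign "a = 1";        (* 0 *)
     Assign "b = 2";        (* 1 *)
     CondJump 5;            (* 2: if not (a > b) goto 5 *)
     Assign "c = a + b";    (* 3 *)
     Jump 6;                (* 4 *)
     Assign "c = a - b";    (* 5 *)
     Assign "print_int c"   (* 6 *) |]

let example_leaders = leaders example_code
```

With this lowering the leaders are indices 0, 3, 5, and 6, which gives four blocks: the entry, the two branch bodies, and the merge point.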
CFG Patterns
Three fundamental patterns — click to explore each:
Nested Structures
Real programs combine patterns. Step through to build a CFG with nested if + for loop:
let complex x y =
  if x > 0 then begin            (* B1 *)
    for i = 0 to y-1 do          (* B2 *)
      if i mod 2 = 0 then        (* B3 *)
        print_int i              (* B4 *)
      (* else: skip *)           (* B5 *)
    done;
    x + y                        (* B6 *)
  end else
    0                            (* B7 *)
  (* merge *)                    (* B8 *)
Predecessors & Successors
Click any node to see its predecessors (blue arrows in) and successors (green arrows out):
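Why it matters: Dataflow analysis propagates information along edges. In a forward analysis, IN[B] is merged from the OUT sets of B's predecessors, and OUT[B] flows on to B's successors; a backward analysis reverses these roles.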
Analogy: Think of a river system. Predecessors are upstream tributaries feeding into a confluence. Successors are downstream branches after a fork.
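Predecessors need not be stored separately, since they can be derived from the successor edges. A minimal sketch, assuming the CFG is given as a hypothetical association list `succs` (the block names and the helper `preds_of` are illustrative, not part of the original material):

```ocaml
(* Successor edges of a small loop CFG: B1 -> B2, B2 -> B3 | B4, B3 -> B2. *)
let succs =
  [ ("B1", ["B2"]);
    ("B2", ["B3"; "B4"]);
    ("B3", ["B2"]);
    ("B4", []) ]

(* A block p is a predecessor of node when node appears among p's successors. *)
let preds_of succs node =
  List.filter_map
    (fun (src, outs) -> if List.mem node outs then Some src else None)
    succs

let preds_b2 = preds_of succs "B2"
```

Here `preds_b2` contains both B1 (loop entry) and B3 (back edge), which is why a loop header is always a merge point.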
Building CFGs: The Algorithm
How to systematically build a CFG from an AST. Step through the algorithm:
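type basic_block = {
  label : string;
  mutable stmts : stmt list;
  mutable succ : basic_block list;
  mutable pred : basic_block list;
}

(* The stmt type and the helpers make_block, add_edge, add_stmt, is_branch
   and handle_branch are assumed to be defined elsewhere. *)
let build_cfg stmts =
  let entry = make_block "ENTRY" in
  let exit = make_block "EXIT" in
  let rec process stmts cur =
    match stmts with
    | [] -> add_edge cur exit            (* rule 3: end of code -> EXIT *)
    | s :: rest ->
      if is_branch s then
        handle_branch s cur rest         (* rule 2: end block, create targets *)
      else begin
        add_stmt cur s;                  (* rule 1: sequential -> same block *)
        process rest cur
      end
  in
  process stmts entry;
  (entry, exit)  (* returning both ends is an assumption; the original stops before this point *)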
The 3 rules: (1) Sequential statements → same block. (2) Branch statement → end block, create targets. (3) Return/end → edge to EXIT.
⚡ Challenge: Identify Basic Blocks
Where do new basic blocks start in this code? Select the correct answer for each line:
let analyze x y =
  let a = x + 1 in
  let b = a * y in
  if a > b then begin
    let c = a - b in
    Printf.printf "%d" c
  end else begin
    let d = b - a in
    Printf.printf "%d" d
  end;
  let result = a + b in
  result
1. How many basic blocks?
2. Which line starts Block 2?
3. Where is the merge point?
Answer: 4 blocks. B1: a=x+1, b=a*y, if a>b (entry + sequential + branch). B2: c=a-b, printf c (true target). B3: d=b-a, printf d (false target). B4: result=a+b, return (merge point).
The Three Pillars of Dataflow Analysis
CFG gives us structure. Now we need a reasoning engine. Click each pillar to explore:
Together they guarantee:
Correctness — results account for all paths
Termination — algorithm always finishes
Uniqueness — one well-defined solution
The framework is general: Swap the lattice and transfer functions to get entirely different analyses — reaching definitions, live variables, available expressions, and more.
Lattice Theory
Click any two nodes to compute their join (∨) and meet (∧):
Partial Order Properties
Reflexive: a ≤ a
Antisymmetric: a ≤ b ∧ b ≤ a ⇒ a = b
Transitive: a ≤ b ∧ b ≤ c ⇒ a ≤ c
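Analogy: Think of water flowing downhill, with a ≤ b meaning that water from a eventually reaches b. The join a ∨ b is the first point where the streams from a and b merge (the least upper bound). The meet a ∧ b is the nearest common source that feeds both (the greatest lower bound).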
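Join and meet are easy to check on a concrete lattice. The sketch below assumes the powerset lattice of definition labels ordered by subset, which is the lattice used by set-based dataflow analyses; the names `leq`, `join`, `meet`, `a`, and `b` are illustrative and the lattice in the interactive diagram may differ.

```ocaml
module S = Set.Make (String)

(* Powerset lattice ordered by inclusion. *)
let leq a b = S.subset a b     (* a <= b *)
let join a b = S.union a b     (* least upper bound *)
let meet a b = S.inter a b     (* greatest lower bound *)

let a = S.of_list ["d1"; "d2"]
let b = S.of_list ["d2"; "d3"]

let j = join a b   (* {d1, d2, d3} *)
let m = meet a b   (* {d2} *)

(* The join is above both arguments and the meet is below both. *)
let bounds_ok = leq a j && leq b j && leq m a && leq m b
```

The partial order properties listed above hold here because subset inclusion is reflexive, antisymmetric, and transitive.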
Transfer Functions & Merge Points
Watch data flow through a block: gen adds facts, kill removes them, merge combines paths.
Transfer function:
OUT[B] = gen[B] ∪ (IN[B] - kill[B])
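The transfer function translates directly into set operations. A minimal sketch, assuming facts are represented as string labels (the function name `transfer` and the sample sets are illustrative):

```ocaml
module S = Set.Make (String)

(* OUT[B] = gen[B] ∪ (IN[B] - kill[B]) *)
let transfer ~gen ~kill in_set = S.union gen (S.diff in_set kill)

(* A block that generates d2 and kills d1. *)
let gen = S.singleton "d2"
let kill = S.singleton "d1"

let out1 = transfer ~gen ~kill (S.of_list ["d1"])         (* d1 is removed, d2 added *)
let out2 = transfer ~gen ~kill (S.of_list ["d1"; "d3"])   (* d3 passes through untouched *)
```

Facts that are neither generated nor killed pass through the block unchanged, which is what makes the function monotone in its input.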
Watch the worklist algorithm iterate until no sets change (fixpoint!):
Iteration table (IN / OUT for each block):
Forward vs Backward × May vs Must
Four combinations define the major dataflow analyses. Click each cell:
Gen & Kill: The Concepts
Every block generates new definitions and kills old ones. Step through to see the rules:
x = a + b (* d1: defines x *)
y = x * 2 (* d2: defines y *)
x = y + 1 (* d3: defines x *)
z = x (* d4: defines z *)
LHS defines (goes into gen): x = expr ← x is defined
Last def wins: If x is defined twice in a block, only the last definition of x survives in gen[B].
Gen/Kill Calculator
Enter assignments to see gen and kill sets update in real-time. Try adding a duplicate variable!
Try this: Add x = a+b, then y = x*2, then x = y+1. Watch d1 get replaced by d3 in gen (both define x — last one wins!).
⚡ Challenge: Compute Gen/Kill
Given this block, what are the final gen and kill sets?
a = 5 (* d1: def of a *)
b = a + 3 (* d2: def of b *)
a = b * 2 (* d3: def of a *)
c = a + b (* d4: def of c *)
Assume other blocks in the program also define a (as d7), b (as d8), and c (as d9).
1. What is gen[B]?
2. Why is d1 NOT in gen[B]?
3. What is kill[B]?
Explanation: gen = {d2, d3, d4}. d1 is NOT in gen because d3 redefines a (last def wins). kill = {d7, d8, d9} — the definitions of a, b, c from OTHER blocks. d1 is NOT in kill because kill only contains definitions from other blocks that this block's definitions overwrite.
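The same computation can be sketched in code. This assumes each definition is a hypothetical (label, variable) pair and follows the convention used in the explanation, where kill holds only definitions from other blocks; the names `block`, `others`, `gen`, and `kill` are illustrative.

```ocaml
module S = Set.Make (String)

(* The block from the challenge, as (label, variable defined) pairs. *)
let block = [ ("d1", "a"); ("d2", "b"); ("d3", "a"); ("d4", "c") ]

(* Definitions assumed to live in other blocks. *)
let others = [ ("d7", "a"); ("d8", "b"); ("d9", "c") ]

(* gen: keep a definition only if no later statement in the block
   defines the same variable (last def wins). *)
let gen block =
  let rec go = function
    | [] -> S.empty
    | (d, v) :: rest ->
      let g = go rest in
      if List.exists (fun (_, v') -> v' = v) rest then g else S.add d g
  in
  go block

(* kill: definitions in other blocks of any variable this block defines. *)
let kill block others =
  let vars = List.map snd block in
  others
  |> List.filter (fun (_, v) -> List.mem v vars)
  |> List.map fst
  |> S.of_list

let gen_b = gen block
let kill_b = kill block others
```

`gen_b` omits d1 because d3 later redefines a, and `kill_b` contains exactly the outside definitions of a, b, and c.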
Live Variables: Backward Analysis
Which variables might be used before redefinition? Information flows backward from uses to defs.
let x = ref 1 in        (* B1: def={x} *)
while !x < 10 do        (* B2: use={x} *)
  x := !x + 1           (* B3: def={x}, use={x} *)
done;
print_int !x            (* B4: use={x} *)
Backward equations:
OUT[B] = ∪ { IN[S] : S ∈ succ(B) }
IN[B] = use[B] ∪ (OUT[B] - def[B])
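Result: x is live on entry to B2, B3, and B4, because each of them reads x before any redefinition. It is not live on entry to B1, which defines x before any use, or after B4, where it is never read again. A variable is dead at a point if, on every path from that point, it is overwritten before being read or is never read at all.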
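The backward equations can be iterated to a fixpoint on the four blocks above. This is a sketch under assumptions: the `block` record, the `liveness` function, and the simple round-robin iteration (visiting blocks in reverse order until nothing changes) are illustrative choices, not the page's own implementation.

```ocaml
module S = Set.Make (String)

type block = { name : string; use : S.t; def : S.t; succ : string list }

let blocks = [
  { name = "B1"; use = S.empty;         def = S.singleton "x"; succ = ["B2"] };
  { name = "B2"; use = S.singleton "x"; def = S.empty;         succ = ["B3"; "B4"] };
  { name = "B3"; use = S.singleton "x"; def = S.singleton "x"; succ = ["B2"] };
  { name = "B4"; use = S.singleton "x"; def = S.empty;         succ = [] };
]

(* Returns a lookup function from block name to its live-in set. *)
let liveness blocks =
  let live_in = Hashtbl.create 8 in
  List.iter (fun b -> Hashtbl.replace live_in b.name S.empty) blocks;
  let changed = ref true in
  while !changed do
    changed := false;
    List.iter
      (fun b ->
        (* OUT[B] = union of IN over successors *)
        let out =
          List.fold_left
            (fun acc s -> S.union acc (Hashtbl.find live_in s))
            S.empty b.succ
        in
        (* IN[B] = use[B] ∪ (OUT[B] - def[B]) *)
        let inn = S.union b.use (S.diff out b.def) in
        if not (S.equal inn (Hashtbl.find live_in b.name)) then begin
          Hashtbl.replace live_in b.name inn;
          changed := true
        end)
      (List.rev blocks)
  done;
  fun name -> Hashtbl.find live_in name

let live = liveness blocks
```

`live "B2"`, `live "B3"`, and `live "B4"` each contain x, while `live "B1"` is empty because B1 defines x without reading it.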
Available Expressions: Must Analysis
An expression is available only if computed on ALL paths. Uses intersection at merge points.
let t = x + y in          (* B1: e_gen={x+y} *)
if cond then
  let c = x + y in        (* B2: x+y still avail *)
else
  let a = 5 in            (* B3: defines a, which is not an operand of x+y, *)
                          (* so x+y is not killed (it would be if B3 assigned x or y) *)
let d = x + y in          (* B4: is x+y available? *)
Must vs May: Available expressions uses intersection (must be on ALL paths). Reaching defs uses union (might be on ANY path). Wrong merge operator = wrong results!
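The difference between the two merge operators shows up as soon as two paths disagree. A small sketch, assuming hypothetical OUT sets for two predecessor blocks (the names `merge_may`, `merge_must`, `path1`, and `path2` are illustrative):

```ocaml
module S = Set.Make (String)

(* May analysis: a fact holds if it holds on ANY incoming path. *)
let merge_may outs = List.fold_left S.union S.empty outs

(* Must analysis: a fact holds only if it holds on ALL incoming paths.
   With no predecessors this sketch returns the empty set. *)
let merge_must = function
  | [] -> S.empty
  | first :: rest -> List.fold_left S.inter first rest

(* Hypothetical facts flowing out of two predecessors. *)
let path1 = S.of_list ["x+y"; "a*b"]
let path2 = S.of_list ["x+y"]

let may_result = merge_may [path1; path2]
let must_result = merge_must [path1; path2]
```

Using union for available expressions would wrongly report `a*b` as available at the merge point even though one path never computes it.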
Three Analyses Compared
All three are instances of one general framework. Click a row to see details:
The General Framework: Framework = (L, ∨, f, d, Δ) — Lattice, merge op, transfer function, direction, initialization. To create a NEW analysis, just fill in these 5 parameters!
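One way to picture the five parameters is as fields of a record. The sketch below is an assumption about how such a framework could be encoded, not the page's implementation: the lattice L is fixed to sets of strings, and the `framework` type, the `gen_kill` helper, and the three instances are hypothetical names.

```ocaml
module S = Set.Make (String)

type direction = Forward | Backward

type framework = {
  direction : direction;                          (* d *)
  merge : S.t -> S.t -> S.t;                      (* merge operator *)
  transfer : gen:S.t -> kill:S.t -> S.t -> S.t;   (* f *)
  init : S.t;                                     (* initial value for non-entry blocks *)
}

(* The gen/kill shape is shared by all three analyses; for live
   variables, gen plays the role of use and kill the role of def. *)
let gen_kill ~gen ~kill x = S.union gen (S.diff x kill)

let reaching_definitions =
  { direction = Forward; merge = S.union; transfer = gen_kill; init = S.empty }

let live_variables =
  { direction = Backward; merge = S.union; transfer = gen_kill; init = S.empty }

(* A must analysis starts from the full set of expressions so that
   intersection can only remove facts. *)
let available_expressions ~universe =
  { direction = Forward; merge = S.inter; transfer = gen_kill; init = universe }
```

Only the direction, the merge operator, and the initial value differ between the instances, which is the sense in which the analyses share one framework.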
Interprocedural Analysis
Real programs have many functions. Toggle between context-insensitive and context-sensitive:
let helper param =
  param + 1

let process x y =
  let r1 = helper (x*2) in
  let r2 = helper y in
  r1 + r2

let main () =
  process 5 10
⚡ Challenge: Pick the Right Analysis
For each scenario, which dataflow analysis answers the question?
Scenario 1:
"Can we avoid recomputing x+y at line 15 since it was computed at line 3?"
Scenario 2:
"Is the assignment x = 5 at line 7 ever used, or is x always overwritten before being read?"
Scenario 3:
"At line 20, could the value of y come from the assignment at line 4 or line 12?"
Scenario 4:
"After the loop ends, which variables still hold values we need? Can we free the registers for the rest?"
Answers:
1. Available Expressions — "is x+y computed on ALL paths?" (must analysis)
2. Live Variables — "is x used before redefined?" (backward may)
3. Reaching Definitions — "which assignments reach this use?" (forward may)
4. Live Variables — "which variables needed after this point?" (register alloc)
🎯 Quiz: Core Concepts
Q1: What defines a basic block?
Q2: Why does fixpoint iteration terminate?
Q3: Available Expressions uses intersection because...
🎯 Quiz: Trace the Fixpoint
Given this CFG with a loop, predict IN[B4] after reaching definitions converges:
B1: x=1 (d1) gen={d1}, kill={d2}
B2: x<10? gen={}, kill={}
B3: x=x+1 (d2) gen={d2}, kill={d1}
B4: print x gen={}, kill={}
What definitions reach the entry of B4?
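Iteration 1:
OUT[B1]={d1}, IN[B2]={d1}, OUT[B2]={d1}, OUT[B3]={d2}
IN[B4]=OUT[B2]={d1} (only B1's output has reached B2 so far)
Iteration 2:
IN[B2]=OUT[B1] ∪ OUT[B3]={d1, d2}, so OUT[B2]={d1, d2}
IN[B4]=OUT[B2]={d1, d2}; a third pass changes nothing, so this is the fixpoint
Answer: IN[B4]={d1, d2}. d1 reaches B4 via the path that skips the loop, d2 via the loop body. Either may have provided x's value.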
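The trace can be reproduced with a small worklist solver. This is a sketch under assumptions: the `block` record, the `cfg` association list, and the `reaching_in` function are illustrative names, and the queue-based scheduling is one of several valid iteration orders.

```ocaml
module S = Set.Make (String)

type block = { gen : S.t; kill : S.t; preds : string list; succs : string list }

(* kill[B1] is taken as {d2} because d2 also defines x; this does not
   affect the result, since nothing ever flows into B1. *)
let cfg = [
  ("B1", { gen = S.singleton "d1"; kill = S.singleton "d2"; preds = []; succs = ["B2"] });
  ("B2", { gen = S.empty; kill = S.empty; preds = ["B1"; "B3"]; succs = ["B3"; "B4"] });
  ("B3", { gen = S.singleton "d2"; kill = S.singleton "d1"; preds = ["B2"]; succs = ["B2"] });
  ("B4", { gen = S.empty; kill = S.empty; preds = ["B2"]; succs = [] });
]

(* Returns a lookup function from block name to its IN set at the fixpoint. *)
let reaching_in cfg =
  let out = Hashtbl.create 8 and inn = Hashtbl.create 8 in
  List.iter
    (fun (n, _) ->
      Hashtbl.replace out n S.empty;
      Hashtbl.replace inn n S.empty)
    cfg;
  let work = Queue.create () in
  List.iter (fun (n, _) -> Queue.add n work) cfg;
  while not (Queue.is_empty work) do
    let n = Queue.pop work in
    let b = List.assoc n cfg in
    (* IN[B] = union of OUT over predecessors *)
    let i =
      List.fold_left (fun acc p -> S.union acc (Hashtbl.find out p)) S.empty b.preds
    in
    (* OUT[B] = gen[B] ∪ (IN[B] - kill[B]) *)
    let o = S.union b.gen (S.diff i b.kill) in
    Hashtbl.replace inn n i;
    if not (S.equal o (Hashtbl.find out n)) then begin
      Hashtbl.replace out n o;
      (* OUT changed, so every successor must be revisited *)
      List.iter (fun s -> Queue.add s work) b.succs
    end
  done;
  fun n -> Hashtbl.find inn n

let in_b4 = S.elements (reaching_in cfg "B4")
```

The loop stops once no OUT set changes, at which point `in_b4` holds both d1 and d2.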
🎯 Quiz: Design an Analysis
You want to find all variables that are definitely initialized before use. Fill in the framework parameters:
1. Direction?
2. Merge operator?
3. Lattice elements?
4. Initialization for non-entry blocks?
Hint: "Definitely initialized" means on ALL paths a variable must have been assigned. Think about what kind of analysis (may/must) matches "definitely" and which direction tracks "before use."