Static Analysis Fundamentals

Module 3 — Program Analysis Bootcamp

CFGs
Control Flow
Lattices
Dataflow Framework
Fixpoint
Analysis Results

Instructor: Weihao  |  Office Hours: By appointment, HH227

Learning Objectives

What You'll Learn

  • Construct control flow graphs from source code
  • Apply the dataflow analysis framework (lattices, transfer functions, fixpoint)
  • Implement reaching definitions, live variables, and available expressions
  • Compare forward vs backward and may vs must analyses
  • Evaluate interprocedural analysis trade-offs

Prerequisites (from Modules 1-2)

AST Knowledge: Structured representation, tree traversals, node types

Mathematical Foundations

  • Set Theory: Union (∪), intersection (∩), difference (\)
  • Graph Theory: Directed graphs, nodes, edges, paths, cycles
  • Functions: Domain, range, composition
The Analysis Pipeline: Source Code → AST (structure) → CFG (execution paths) → Dataflow Analysis (properties) → Results

Why Control Flow Graphs?

ASTs show structure. CFGs show execution paths. How many paths exist in this code?

let example x y =
  let a =
    if x > 0 then 1
    else 2
  in
  let b =
    if y > 0 then a
    else -a
  in
  b
Key insight: 2 branches × 2 branches = 4 distinct paths. CFGs make every path explicit — that's why static analysis needs them.
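To make the count concrete, the sketch below runs the function once per path. The input pairs are illustrative choices, one for each combination of branch outcomes; any values with the same signs would exercise the same paths.

```ocaml
(* The function from the slide, with unchanged behaviour. *)
let example x y =
  let a = if x > 0 then 1 else 2 in
  let b = if y > 0 then a else -a in
  b

(* One representative input per path: (x > 0?, y > 0?). *)
let paths = [ (1, 1); (1, 0); (0, 1); (0, 0) ]

(* Each path yields a different result: [1; -1; 2; -2]. *)
let results = List.map (fun (x, y) -> example x y) paths

let () =
  List.iter2
    (fun (x, y) r -> Printf.printf "x=%d y=%d -> %d\n" x y r)
    paths results
```

Four inputs giving four different results is what "4 distinct paths" means in practice: a test suite needs at least one input per path to observe every behaviour.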

Basic Blocks

A basic block is a maximal sequence of statements with one entry, one exit, and no branching except at the end.

Consider where the block boundaries fall in this example:

let example x =
  let a = 1 in
  let b = 2 in
  let c =
    if a > b then
      a + b
    else
      a - b
  in
  print_int c
A new block starts at: (1) the function entry, (2) every branch target (after if/else/loop), (3) the statement right after a branch or jump.
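The three rules can be sketched as a "leader" finder over a flat instruction list. The `instr` type and the lowered form of the example are assumptions made for illustration; real front ends lower OCaml quite differently.

```ocaml
(* Hypothetical flat instruction set; positions in the array are addresses. *)
type instr =
  | Assign of string   (* straight-line statement *)
  | CondJump of int    (* conditional jump to an index, may fall through *)
  | Jump of int        (* unconditional jump to an index *)

(* Indices at which a new basic block starts. *)
let leaders (code : instr array) : int list =
  let n = Array.length code in
  let is_leader = Array.make n false in
  if n > 0 then is_leader.(0) <- true;               (* rule 1: entry *)
  Array.iteri
    (fun i ins ->
      match ins with
      | CondJump t | Jump t ->
        if t < n then is_leader.(t) <- true;         (* rule 2: branch target *)
        if i + 1 < n then is_leader.(i + 1) <- true  (* rule 3: after a branch *)
      | Assign _ -> ())
    code;
  List.filter (fun i -> is_leader.(i)) (List.init n Fun.id)

(* Assumed lowering of the example function above. *)
let example_code = [|
  Assign "a = 1";        (* 0 *)
  Assign "b = 2";        (* 1 *)
  CondJump 5;            (* 2: if not (a > b) goto 5 *)
  Assign "c = a + b";    (* 3 *)
  Jump 6;                (* 4 *)
  Assign "c = a - b";    (* 5 *)
  Assign "print_int c";  (* 6 *)
|]

let example_leaders = leaders example_code
```

Under this lowering the leaders are indices 0, 3, 5, and 6, giving four basic blocks: the setup and test, the two branch arms, and the merge.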

CFG Patterns

Three fundamental patterns — click to explore each:

Nested Structures

Real programs combine patterns. Step through to build a CFG with nested if + for loop:

let complex x y =
  if x > 0 then begin              (* B1 *)
    for i = 0 to y-1 do            (* B2 *)
      if i mod 2 = 0 then          (* B3 *)
        print_int i                (* B4 *)
      (* else: skip *)             (* B5 *)
    done;
    x + y                          (* B6 *)
  end else
    0                              (* B7 *)
  (* merge *)                      (* B8 *)

Predecessors & Successors

Click any node to see its predecessors (blue arrows in) and successors (green arrows out):

Why it matters: Dataflow analysis propagates information along edges. Predecessors feed data IN, successors receive data OUT.
Analogy: Think of a river system. Predecessors are upstream tributaries feeding into a confluence. Successors are downstream branches after a fork.
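As a small illustration, predecessors can be derived from a successor table by reversing the edges. The diamond-shaped CFG and its block names are hypothetical.

```ocaml
(* Hypothetical diamond CFG: B1 branches to B2 and B3, which merge at B4. *)
let succs =
  [ ("B1", [ "B2"; "B3" ]); ("B2", [ "B4" ]); ("B3", [ "B4" ]); ("B4", []) ]

(* Successors: look the block up in the table. *)
let successors b = try List.assoc b succs with Not_found -> []

(* Predecessors: every block that lists b among its successors. *)
let predecessors b =
  List.filter_map
    (fun (p, ss) -> if List.mem b ss then Some p else None)
    succs
```

Here `predecessors "B4"` evaluates to `["B2"; "B3"]`: B4 is a merge point, so a forward analysis has to combine the facts arriving from both blocks.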

Building CFGs: The Algorithm

How to systematically build a CFG from an AST. Step through the algorithm:

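type basic_block = {
  label : string;
  mutable stmts : stmt list;
  mutable succ : basic_block list;  (* outgoing edges *)
  mutable pred : basic_block list;  (* incoming edges *)
}

(* The stmt type and the helpers make_block, add_edge, add_stmt, is_branch,
   and handle_branch are assumed to be defined elsewhere. *)
let build_cfg stmts =
  let entry = make_block "ENTRY" in
  let exit = make_block "EXIT" in
  let rec process stmts cur =
    match stmts with
    | [] -> add_edge cur exit            (* rule 3: end of code, edge to EXIT *)
    | s :: rest ->
      if is_branch s then
        handle_branch s cur rest         (* rule 2: end block, create targets *)
      else begin
        add_stmt cur s;                  (* rule 1: stay in the same block *)
        process rest cur
      end
  in
  (* Assumed completion: start at ENTRY and return both endpoints. *)
  process stmts entry;
  (entry, exit)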
The 3 rules: (1) Sequential statements → same block. (2) Branch statement → end block, create targets. (3) Return/end → edge to EXIT.
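A self-contained variant of the builder is sketched below. The `stmt` type, the fresh-label counter, and the extra `target` parameter (the block to link to when a statement list runs out) are assumptions added so that nested branches rejoin at their merge block; they are not part of the slide's code.

```ocaml
(* Assumed statement type; the slide leaves stmt unspecified. *)
type stmt =
  | Simple of string                      (* any non-branching statement *)
  | If of string * stmt list * stmt list  (* condition, then, else *)

type basic_block = {
  label : string;
  mutable stmts : stmt list;
  mutable succ : basic_block list;
  mutable pred : basic_block list;
}

let make_block label = { label; stmts = []; succ = []; pred = [] }

let add_edge a b =
  a.succ <- a.succ @ [ b ];
  b.pred <- b.pred @ [ a ]

let add_stmt b s = b.stmts <- b.stmts @ [ s ]

let build_cfg stmts =
  let counter = ref 0 in
  let fresh () = incr counter; make_block (Printf.sprintf "B%d" !counter) in
  let entry = make_block "ENTRY" in
  let exit = make_block "EXIT" in
  let rec process stmts cur target =
    match stmts with
    | [] -> add_edge cur target                 (* rule 3 *)
    | If (cond, then_s, else_s) :: rest ->      (* rule 2 *)
      add_stmt cur (Simple ("if " ^ cond));
      let then_b = fresh () in
      let else_b = fresh () in
      let merge = fresh () in
      add_edge cur then_b;
      add_edge cur else_b;
      process then_s then_b merge;
      process else_s else_b merge;
      process rest merge target
    | s :: rest ->                              (* rule 1 *)
      add_stmt cur s;
      process rest cur target
  in
  let first = fresh () in
  add_edge entry first;
  process stmts first exit;
  (entry, exit)

(* Usage: a straight-line prefix, one if/else, and a merge. *)
let entry, exit_block =
  build_cfg
    [ Simple "a = 1";
      If ("a > b", [ Simple "c = a + b" ], [ Simple "c = a - b" ]);
      Simple "print c" ]
```

Blocks are compared by label rather than with `=`, because the `succ` and `pred` fields make the records cyclic.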

⚡ Challenge: Identify Basic Blocks

Where do new basic blocks start in this code? Select the correct answer for each line:

let analyze x y =
  let a = x + 1 in
  let b = a * y in
  if a > b then (
    let c = a - b in
    Printf.printf "%d" c
  ) else (
    let d = b - a in
    Printf.printf "%d" d);
  let result = a + b in
  result

1. How many basic blocks?

2. Which line starts Block 2?

3. Where is the merge point?

The Three Pillars of Dataflow Analysis

CFG gives us structure. Now we need a reasoning engine. Click each pillar to explore:

Together they guarantee:
  • Correctness — results account for all paths
  • Termination — algorithm always finishes
  • Uniqueness — one well-defined solution
The framework is general: Swap the lattice and transfer functions to get entirely different analyses — reaching definitions, live variables, available expressions, and more.

Lattice Theory

Click any two nodes to compute their join (∨) and meet (∧):

Partial Order Properties

  • Reflexive: a ≤ a
  • Antisymmetric: a ≤ b ∧ b ≤ a ⇒ a = b
  • Transitive: a ≤ b ∧ b ≤ c ⇒ a ≤ c
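Analogy: Think of each ≤ edge as a stream running from the smaller element to the larger one. The join a ∨ b is where the streams from a and b first merge: the least element above both (least upper bound). The meet a ∧ b is the highest source that feeds both: the greatest element below both (greatest lower bound).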
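For a concrete instance, the sketch below assumes the lattice is a powerset of definition labels ordered by subset inclusion, a common choice for reaching definitions. The element names are illustrative.

```ocaml
module S = Set.Make (String)

let leq a b = S.subset a b   (* partial order: a ≤ b iff a ⊆ b *)
let join a b = S.union a b   (* least upper bound *)
let meet a b = S.inter a b   (* greatest lower bound *)

let a = S.of_list [ "d1"; "d2" ]
let b = S.of_list [ "d2"; "d3" ]

(* a and b are incomparable, yet both sit below their join
   and above their meet. *)
let incomparable = (not (leq a b)) && not (leq b a)
let join_is_upper = leq a (join a b) && leq b (join a b)
let meet_is_lower = leq (meet a b) a && leq (meet a b) b
```

In this lattice the join of {d1, d2} and {d2, d3} is {d1, d2, d3} and the meet is {d2}.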

Transfer Functions & Merge Points

Watch data flow through a block: gen adds facts, kill removes them, merge combines paths.

Transfer function:
OUT[B] = gen[B] ∪ (IN[B] - kill[B])

Merge (may analysis):
IN[B] = ∪ { OUT[P] | P ∈ pred(B) }
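Both equations translate almost directly into set operations. This is a minimal sketch using the standard library's `Set`; the function names and the sample sets are illustrative.

```ocaml
module S = Set.Make (String)

(* OUT[B] = gen[B] ∪ (IN[B] - kill[B]) *)
let transfer ~gen ~kill in_set = S.union gen (S.diff in_set kill)

(* IN[B] = union of OUT[P] over all predecessors P (may analysis) *)
let merge_may outs = List.fold_left S.union S.empty outs

(* A block that generates d3 and kills d1, entered with {d1, d2}. *)
let out_b =
  transfer ~gen:(S.singleton "d3") ~kill:(S.singleton "d1")
    (S.of_list [ "d1"; "d2" ])

(* Two predecessors contributing {d1} and {d2}. *)
let in_b = merge_may [ S.singleton "d1"; S.singleton "d2" ]
```

The block's output is {d2, d3}: d1 is removed by kill, d2 passes through, and d3 is added by gen.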

Reaching Definitions: Fixpoint Iteration

Watch the worklist algorithm iterate until no sets change (fixpoint!):

Iteration table (IN / OUT for each block):
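A worklist iteration of this kind can be sketched as follows. The CFG is a hypothetical loop (B1 → B2, B2 → B3 or B4, B3 → B2) where d1 and d2 define `a` and `b` in B1 and d3 redefines `a` in B3; the record layout is an assumption.

```ocaml
module S = Set.Make (String)

type block = {
  name : string; gen : S.t; kill : S.t;
  preds : string list; succs : string list;
}

let blocks = [
  { name = "B1"; gen = S.of_list [ "d1"; "d2" ]; kill = S.singleton "d3";
    preds = []; succs = [ "B2" ] };
  { name = "B2"; gen = S.empty; kill = S.empty;
    preds = [ "B1"; "B3" ]; succs = [ "B3"; "B4" ] };
  { name = "B3"; gen = S.singleton "d3"; kill = S.singleton "d1";
    preds = [ "B2" ]; succs = [ "B2" ] };
  { name = "B4"; gen = S.empty; kill = S.empty;
    preds = [ "B2" ]; succs = [] };
]

let reaching_definitions blocks =
  let out = Hashtbl.create 16 in
  List.iter (fun b -> Hashtbl.replace out b.name S.empty) blocks;
  let find name = List.find (fun b -> b.name = name) blocks in
  (* IN[B] is the union of OUT over predecessors. *)
  let in_of b =
    List.fold_left (fun acc p -> S.union acc (Hashtbl.find out p))
      S.empty b.preds
  in
  let worklist = Queue.create () in
  List.iter (fun b -> Queue.add b.name worklist) blocks;
  while not (Queue.is_empty worklist) do
    let b = find (Queue.pop worklist) in
    let new_out = S.union b.gen (S.diff (in_of b) b.kill) in
    (* Only a changed OUT set forces the successors to be revisited. *)
    if not (S.equal new_out (Hashtbl.find out b.name)) then begin
      Hashtbl.replace out b.name new_out;
      List.iter (fun s -> Queue.add s worklist) b.succs
    end
  done;
  let in_set name = in_of (find name) in
  let out_set name = Hashtbl.find out name in
  (in_set, out_set)

let in_rd, out_rd = reaching_definitions blocks
```

On this CFG the loop body's d3 travels around the back edge, so IN[B2] and IN[B4] both end up as {d1, d2, d3}, while OUT[B3] is {d2, d3} because B3 kills d1. The queue empties once no OUT set changes, which is the fixpoint.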

Forward vs Backward × May vs Must

Four combinations define the major dataflow analyses. Click each cell:

Gen & Kill: The Concepts

Every block generates new definitions and kills old ones. Step through to see the rules:

x = a + b (* d1: defines x *)
y = x * 2 (* d2: defines y *)
x = y + 1 (* d3: defines x *)
z = x (* d4: defines z *)
LHS defines (goes into gen):
x = expr ← x is defined
Last def wins: If x is defined twice in a block, only the last definition of x survives in gen[B].
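The "last def wins" rule can be sketched as a single pass over the block. Representing each definition as a (label, variable) pair is an assumption made for illustration.

```ocaml
(* The four statements above as (definition label, variable defined). *)
let block = [ ("d1", "x"); ("d2", "y"); ("d3", "x"); ("d4", "z") ]

(* gen[B]: keep a definition only if no later statement in the block
   defines the same variable. *)
let gen defs =
  let rec go = function
    | [] -> []
    | (d, v) :: rest ->
      if List.exists (fun (_, v') -> v' = v) rest then go rest
      else d :: go rest
  in
  go defs

let gen_b = gen block
```

For this block `gen_b` is `["d2"; "d3"; "d4"]`: d1 is dropped because d3 redefines x later in the same block.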

Gen/Kill Calculator

Enter assignments to see gen and kill sets update in real-time. Try adding a duplicate variable!



Try this: Add x = a+b, then y = x*2, then x = y+1. Watch d1 get replaced by d3 in gen (both define x — last one wins!).

⚡ Challenge: Compute Gen/Kill

Given this block, what are the final gen and kill sets?

a = 5 (* d1: def of a *)
b = a + 3 (* d2: def of b *)
a = b * 2 (* d3: def of a *)
c = a + b (* d4: def of c *)

Assume other blocks in the program also define a (as d7), b (as d8), and c (as d9).

1. What is gen[B]?

2. Why is d1 NOT in gen[B]?

3. What is kill[B]?

Live Variables: Backward Analysis

Which variables might be used before redefinition? Information flows backward from uses to defs.

let x = ref 1 in           (* B1: def={x} *)
while !x < 10 do           (* B2: use={x} *)
  x := !x + 1              (* B3: def={x}, use={x} *)
done;
print_int !x               (* B4: use={x} *)
Backward equations:
OUT[B] = ∪ { IN[S] | S ∈ succ(B) }
IN[B] = use[B] ∪ (OUT[B] - def[B])
Result: x is live from its definition in B1 through its last use in B4: on entry to B2, B3, and B4, some path reads x before overwriting it. A variable is dead at a point if no path from that point reads it before it is overwritten.
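The backward equations can be run on this loop with a simple round-robin iteration. The block records mirror the use/def annotations in the code above; the record layout and the choice to visit blocks in reverse order are assumptions of this sketch.

```ocaml
module S = Set.Make (String)

type block = { name : string; use : S.t; def : S.t; succs : string list }

let x = S.singleton "x"

(* The loop above: B1 -> B2, B2 -> B3 or B4, B3 -> B2. *)
let blocks = [
  { name = "B1"; use = S.empty; def = x; succs = [ "B2" ] };
  { name = "B2"; use = x; def = S.empty; succs = [ "B3"; "B4" ] };
  { name = "B3"; use = x; def = x; succs = [ "B2" ] };
  { name = "B4"; use = x; def = S.empty; succs = [] };
]

(* live_in b returns IN[b] once the iteration has converged. *)
let live_in =
  let inn = Hashtbl.create 16 in
  List.iter (fun b -> Hashtbl.replace inn b.name S.empty) blocks;
  let changed = ref true in
  while !changed do
    changed := false;
    (* Reverse order, since information flows from uses back to defs. *)
    List.iter
      (fun b ->
        (* OUT[B] = union of IN over successors *)
        let out =
          List.fold_left (fun acc s -> S.union acc (Hashtbl.find inn s))
            S.empty b.succs
        in
        (* IN[B] = use[B] ∪ (OUT[B] - def[B]) *)
        let new_in = S.union b.use (S.diff out b.def) in
        if not (S.equal new_in (Hashtbl.find inn b.name)) then begin
          Hashtbl.replace inn b.name new_in;
          changed := true
        end)
      (List.rev blocks)
  done;
  fun name -> Hashtbl.find inn name
```

Under these equations IN[B2], IN[B3], and IN[B4] are all {x}, while IN[B1] is empty because B1 defines x before anything reads it.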

Available Expressions: Must Analysis

An expression is available only if computed on ALL paths. Uses intersection at merge points.

let t = x + y in               (* B1: e_gen={x+y} *)
let _ =
  if cond then
    let c = x + y in c         (* B2: x+y still avail *)
  else
    let a = 5 in a             (* B3: defines a; would kill x+y only if it redefined x or y *)
in
let d = x + y in               (* B4: is x+y available? *)
d
Must vs May: Available expressions uses intersection (must be on ALL paths). Reaching defs uses union (might be on ANY path). Wrong merge operator = wrong results!
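The effect of the merge operator can be checked on the diamond above. This sketch tracks the single expression `x+y` as a string; the variant in which B3 assigns `x` is hypothetical and is included only to show where union and intersection disagree.

```ocaml
module S = Set.Make (String)

(* OUT[B] = e_gen[B] ∪ (IN[B] - e_kill[B]) *)
let transfer ~e_gen ~e_kill in_set = S.union e_gen (S.diff in_set e_kill)

(* Must analysis: intersect over all predecessors. *)
let merge_must = function
  | [] -> S.empty
  | o :: os -> List.fold_left S.inter o os

let xy = S.singleton "x+y"

let out_b1 = transfer ~e_gen:xy ~e_kill:S.empty S.empty
let out_b2 = transfer ~e_gen:xy ~e_kill:S.empty out_b1       (* recomputes x+y *)
let out_b3 = transfer ~e_gen:S.empty ~e_kill:S.empty out_b1  (* a = 5 kills nothing *)
let in_b4 = merge_must [ out_b2; out_b3 ]

(* Hypothetical variant: B3 assigns x instead, which kills x+y. *)
let out_b3' = transfer ~e_gen:S.empty ~e_kill:xy out_b1
let in_b4' = merge_must [ out_b2; out_b3' ]

(* The wrong operator: union would still report x+y as available. *)
let in_b4_wrong = List.fold_left S.union S.empty [ out_b2; out_b3' ]
```

With the code as shown, `x+y` is available at B4. In the variant, intersection correctly yields the empty set, whereas union would keep `x+y` and license an unsafe reuse of `t`.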

Three Analyses Compared

All three are instances of one general framework. Click a row to see details:

The General Framework: Framework = (L, ∨, f, d, Δ) — Lattice, merge op, transfer function, direction, initialization. To create a NEW analysis, just fill in these 5 parameters!
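One way to picture the five parameters is as fields of a record handed to a single generic solver. Everything below is a sketch under assumptions: the record layout, the round-robin solver, the example CFG with its gen/kill and use/def tables, and the extra `boundary` field (the value at ENTRY or EXIT, which the slide folds into initialization).

```ocaml
module S = Set.Make (String)

type direction = Forward | Backward

type 'a framework = {
  equal : 'a -> 'a -> bool;       (* L: equality on lattice elements *)
  merge : 'a -> 'a -> 'a;         (* merge operator *)
  transfer : string -> 'a -> 'a;  (* f, indexed by block name *)
  direction : direction;          (* d *)
  init : 'a;                      (* initial value for every block *)
  boundary : 'a;                  (* assumed extra: value at ENTRY or EXIT *)
}

(* Returns OUT per block for Forward, IN per block for Backward. *)
let solve fw ~blocks ~succs =
  let inputs b =
    match fw.direction with
    | Backward -> succs b
    | Forward -> List.filter (fun p -> List.mem b (succs p)) blocks
  in
  let result = Hashtbl.create 16 in
  List.iter (fun b -> Hashtbl.replace result b fw.init) blocks;
  let changed = ref true in
  while !changed do
    changed := false;
    List.iter
      (fun b ->
        let incoming =
          match List.map (Hashtbl.find result) (inputs b) with
          | [] -> fw.boundary
          | v :: vs -> List.fold_left fw.merge v vs
        in
        let v = fw.transfer b incoming in
        if not (fw.equal v (Hashtbl.find result b)) then begin
          Hashtbl.replace result b v;
          changed := true
        end)
      blocks
  done;
  fun b -> Hashtbl.find result b

(* Hypothetical loop CFG: B1 -> B2, B2 -> B3 or B4, B3 -> B2. *)
let blocks = [ "B1"; "B2"; "B3"; "B4" ]
let succs = function
  | "B1" -> [ "B2" ] | "B2" -> [ "B3"; "B4" ] | "B3" -> [ "B2" ] | _ -> []

let gen_kill = function
  | "B1" -> (S.of_list [ "d1"; "d2" ], S.singleton "d3")
  | "B3" -> (S.singleton "d3", S.singleton "d1")
  | _ -> (S.empty, S.empty)

let use_def = function
  | "B1" -> (S.empty, S.of_list [ "a"; "b" ])
  | "B3" -> (S.of_list [ "a"; "b" ], S.singleton "a")
  | _ -> (S.singleton "a", S.empty)  (* B2 and B4 only read a *)

let reaching = {
  equal = S.equal; merge = S.union; direction = Forward;
  transfer = (fun b i -> let g, k = gen_kill b in S.union g (S.diff i k));
  init = S.empty; boundary = S.empty;
}

(* Same solver, different parameters: live variables. *)
let liveness = {
  reaching with
  direction = Backward;
  transfer = (fun b o -> let u, d = use_def b in S.union u (S.diff o d));
}

let out_rd = solve reaching ~blocks ~succs
let live_in = solve liveness ~blocks ~succs
```

Only the record changes between the two analyses; the solver is untouched. A must analysis would additionally swap `merge` for intersection and start `init` at the full set.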

Interprocedural Analysis

Real programs have many functions. Toggle between context-insensitive and context-sensitive:

let helper param =
param + 1
let process x y =
let r1 = helper (x*2) in
let r2 = helper y in
r1 + r2
let main () =
process 5 10
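The difference between the two modes can be sketched with a tiny constant-propagation lattice. The `value` type, the abstract `helper_abs`, and the second set of call arguments are assumptions for illustration. Note that in the code above both calls happen to pass 10, so the two modes agree there.

```ocaml
(* Flat constant lattice: Bottom below every Const n, Top above them all. *)
type value = Bottom | Const of int | Top

let join a b =
  match a, b with
  | Bottom, v | v, Bottom -> v
  | Const x, Const y when x = y -> Const x
  | _ -> Top

(* Abstract version of helper: param + 1. *)
let helper_abs = function
  | Const n -> Const (n + 1)
  | v -> v

(* Context-insensitive: merge the arguments of all call sites, analyse once. *)
let context_insensitive args =
  let merged = List.fold_left join Bottom args in
  List.map (fun _ -> helper_abs merged) args

(* Context-sensitive: analyse helper separately for each call site. *)
let context_sensitive args = List.map helper_abs args

let slide_calls = [ Const 10; Const 10 ]  (* helper (5*2) and helper 10 *)
let other_calls = [ Const 10; Const 7 ]   (* hypothetical: process 5 7 *)
```

For `other_calls`, the context-sensitive version keeps `[Const 11; Const 8]`, while the context-insensitive version merges 10 and 7 to `Top` and loses both results. The price of the extra precision is one analysis of `helper` per call site instead of one in total.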

⚡ Challenge: Pick the Right Analysis

For each scenario, which dataflow analysis answers the question?

Scenario 1:

"Can we avoid recomputing x+y at line 15 since it was computed at line 3?"

Scenario 2:

"Is the assignment x = 5 at line 7 ever used, or is x always overwritten before being read?"

Scenario 3:

"At line 20, could the value of y come from the assignment at line 4 or line 12?"

Scenario 4:

"After the loop ends, which variables still hold values we need? Can we free the registers holding the rest?"

🎯 Quiz: Core Concepts

Q1: What defines a basic block?

Q2: Why does fixpoint iteration terminate?

Q3: Available Expressions uses intersection because...

🎯 Quiz: Trace the Fixpoint

Given this CFG with a loop, predict IN[B4] after reaching definitions converges:

B1: x=1 (d1)   gen={d1}, kill={d2}

B2: x<10?   gen={}, kill={}

B3: x=x+1 (d2)   gen={d2}, kill={d1}

B4: print x   gen={}, kill={}

What definitions reach the entry of B4?

🎯 Quiz: Design an Analysis

You want to find all variables that are definitely initialized before use. Fill in the framework parameters:

1. Direction?

2. Merge operator?

3. Lattice elements?

4. Initialization for non-entry blocks?

Hint: "Definitely initialized" means on ALL paths a variable must have been assigned. Think about what kind of analysis (may/must) matches "definitely" and which direction tracks "before use."