Tools Implementation & Integration

Module 6 — Program Analysis Bootcamp

From individual analyses to integrated tools

Security: Taint Analysis (M5) + Safety: Sign Domain (M4) + Code Quality: Dead Code (AST) = Integrated Tool: Pipeline (M6)

Learning Objectives

  1. Define unified finding types that standardize outputs across passes
  2. Implement dead code detection as a purely AST-level analysis
  3. Build analysis passes using record-based, composable architecture
  4. Compose multiple passes and merge results
  5. Design configurable pipelines that select and filter
  6. Generate structured reports in text and JSON

The Full Journey

M1: Foundations (ASTs, CFGs)
M2: AST Construction
M3: Static Analysis Framework
M4: Abstract Interpretation → Safety pass
M5: Security Analysis → Security pass
M6: Compose everything → Integrated tool
Like building a car: M2-M5 built individual components (engine, brakes, steering). M6 assembles them into a working vehicle with a dashboard and controls.

One Program, Three Bug Classes

Each analysis pass finds a different bug class in the same code.

func handler(request):
    user_input = get_param("q")         -- source
    query = "SELECT..." + user_input
    exec_query(query)                   -- sink
    ratio = user_input / 0              -- div-by-0
    return ratio
    dead_var = 42                       -- dead
    log(dead_var)                       -- dead


Unified Finding Types

All passes must speak the same language — a unified finding record.

type severity =
  Critical | High | Medium | Low | Info

type category =
  Security | Safety | CodeQuality | Performance

type finding = {
  id : int;
  category : category;
  severity : severity;
  pass_name : string;
  location : string;
  message : string;
  suggestion : string option;
}

Why unify?

Without:
  • Safety → warning strings
  • Taint → vulnerability records
  • Dead → location lists
  • Cannot sort, filter, or deduplicate!
With:
  • All passes → finding list
  • Sort + filter + dedup = trivial

Severity
  • Critical: Exploitable flaw
  • High: Likely crash
  • Medium: Code quality
  • Low/Info: Suggestions

Category
  • Security: Injection, leaks
  • Safety: Div-by-zero, null
  • CodeQuality: Dead code
  • Performance: Redundant ops

Same idea as SARIF (Static Analysis Results Interchange Format) used by real tools.

Finding Operations

Sort, filter, and deduplicate — because all passes produce the same type.


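(* The <= below implies that sev_to_int gives the most severe level
   (Critical) the smallest integer, so an ascending sort lists the
   most severe findings first. *)
let compare_by_severity f1 f2 =
  compare (sev_to_int f1.severity) (sev_to_int f2.severity)

(* Keep findings at least as severe as threshold. *)
let filter_by_severity threshold findings =
  List.filter (fun f ->
    sev_to_int f.severity <= sev_to_int threshold
  ) findings

(* Keep findings whose category is in cats. *)
let filter_by_category cats findings =
  List.filter (fun f -> List.mem f.category cats) findings

let deduplicate findings =
  (* key = (message, location) *)
  let key f = (f.message, f.location) in
  filter_unique_by key findings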
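
A minimal sketch of these operations applied to a few hand-written findings. The sev_to_int ranking (0 = Critical), the body of filter_unique_by, the mk constructor, and the sample data are assumptions made for illustration; the slides do not show the course's own definitions.

```ocaml
type severity = Critical | High | Medium | Low | Info
type category = Security | Safety | CodeQuality | Performance

type finding = {
  id : int;
  category : category;
  severity : severity;
  pass_name : string;
  location : string;
  message : string;
  suggestion : string option;
}

(* Assumed ranking: 0 is the most severe. *)
let sev_to_int = function
  | Critical -> 0 | High -> 1 | Medium -> 2 | Low -> 3 | Info -> 4

let compare_by_severity f1 f2 =
  compare (sev_to_int f1.severity) (sev_to_int f2.severity)

let filter_by_severity threshold findings =
  List.filter
    (fun f -> sev_to_int f.severity <= sev_to_int threshold)
    findings

(* Hypothetical helper: keep the first element seen for each key. *)
let filter_unique_by key items =
  let seen = Hashtbl.create 16 in
  List.filter
    (fun item ->
      let k = key item in
      if Hashtbl.mem seen k then false
      else (Hashtbl.add seen k (); true))
    items

let deduplicate findings =
  filter_unique_by (fun f -> (f.message, f.location)) findings

(* Hypothetical constructor that keeps the sample data short. *)
let mk id category severity pass_name location message =
  { id; category; severity; pass_name; location; message; suggestion = None }

(* Hand-written sample data; findings 3 and 4 are duplicates. *)
let sample = [
  mk 1 CodeQuality Low "dead_code" "line 3" "unused variable z";
  mk 2 Security Critical "taint" "line 5" "tainted argument reaches exec";
  mk 3 Safety High "safety" "line 4" "division by zero";
  mk 4 Safety High "safety" "line 4" "division by zero";
]

let result =
  sample
  |> deduplicate
  |> filter_by_severity Medium
  |> List.sort compare_by_severity

(* Ids of the findings that survive, most severe first. *)
let ids = List.map (fun f -> f.id) result
```

Under these assumptions the duplicate (id 4) and the Low finding (id 1) are dropped, and the Critical finding sorts ahead of the High one.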

Analysis Pass Architecture

Each analysis is packaged as a record with a uniform interface.

type analysis_pass = {
  name : string;
  category : category;
  run : program -> finding list;
}

Benefits

  • Uniform: every pass is program → finding list
  • Composable: combine via list ops
  • Configurable: enable/disable at runtime
  • Extensible: add passes without changing pipeline

Record vs Functor

Record (our choice)

let pass = {
  name = "safety";
  category = Safety;
  run = analyze_safety;
}

Runtime flexible, first-class values, easy to store in lists

Functor (heavier)

module type PASS = sig
  val name : string
  val run : ...
end

Resolved at compile time; storing different passes in the same list requires packing them as first-class modules

Records give us runtime flexibility with less ceremony than modules and functors, which is critical for configurable pipelines.
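
A small sketch of what "first-class values in a list" buys at runtime: passes can be selected by name or category with ordinary list functions. The program and finding stand-in types, the stub constructor, and both select_* helpers are hypothetical names introduced only for this example.

```ocaml
type category = Security | Safety | CodeQuality | Performance
type program = string list   (* stand-in for the real AST type *)
type finding = string        (* stand-in for the finding record *)

type analysis_pass = {
  name : string;
  category : category;
  run : program -> finding list;
}

(* Hypothetical stub pass: ignores its input and reports one message. *)
let stub name category =
  { name; category; run = (fun _program -> [ name ^ ": nothing to report" ]) }

(* All known passes live in one ordinary list. *)
let registry =
  [ stub "safety" Safety; stub "taint" Security; stub "dead_code" CodeQuality ]

(* Enable passes by name, e.g. from a command-line flag. *)
let select_by_name enabled passes =
  List.filter (fun p -> List.mem p.name enabled) passes

(* Enable passes by category. *)
let select_by_category cats passes =
  List.filter (fun p -> List.mem p.category cats) passes

let chosen_names =
  List.map (fun p -> p.name) (select_by_name [ "taint"; "safety" ] registry)

let quality_names =
  List.map (fun p -> p.name) (select_by_category [ CodeQuality ] registry)
```

Selection keeps the registry's order, so chosen_names lists "safety" before "taint" regardless of the order in which the names were requested.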

Dead Code Detection

A purely AST-level analysis — no abstract domains needed.

func compute(a, b):
    x = a + b
    y = 42          -- unused?
    return x
    z = x * 2       -- after return?
    log(z)          -- after return?

Detection Strategy

Pattern                     Severity
Unreachable after Return    Medium
Unused variables            Low
Unused parameters           Info

Algorithm:
  1. Walk AST → find Return + trailing stmts
  2. Collect assigned vars and used vars
  3. Unused = assigned − used (set difference)
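
The three steps above can be sketched on a toy AST. Everything here is an assumption for illustration: the expr and stmt types, a flat statement list with no nested blocks, and the choice to compute assigned and used sets over the reachable statements only (so that z is reported once, as unreachable). Unused parameters are not covered.

```ocaml
module S = Set.Make (String)

(* Toy AST, assumed for this sketch. *)
type expr =
  | Int of int
  | Var of string
  | BinOp of string * expr * expr
  | Call of string * expr list

type stmt =
  | Assign of string * expr
  | ExprStmt of expr
  | Return of expr

(* Variables read by an expression. *)
let rec vars_of_expr = function
  | Int _ -> S.empty
  | Var x -> S.singleton x
  | BinOp (_, a, b) -> S.union (vars_of_expr a) (vars_of_expr b)
  | Call (_, args) ->
    List.fold_left (fun acc e -> S.union acc (vars_of_expr e)) S.empty args

(* Step 1: split the body at the first Return; the tail is unreachable. *)
let rec split_at_return = function
  | [] -> ([], [])
  | (Return _ as r) :: rest -> ([ r ], rest)
  | s :: rest ->
    let live, dead = split_at_return rest in
    (s :: live, dead)

(* Step 2: collect assigned and used variables. *)
let assigned stmts =
  List.fold_left
    (fun acc s -> match s with Assign (x, _) -> S.add x acc | _ -> acc)
    S.empty stmts

let used stmts =
  List.fold_left
    (fun acc s ->
      match s with
      | Assign (_, e) | ExprStmt e | Return e -> S.union acc (vars_of_expr e))
    S.empty stmts

(* The body of compute(a, b) from the slide. *)
let body = [
  Assign ("x", BinOp ("+", Var "a", Var "b"));
  Assign ("y", Int 42);
  Return (Var "x");
  Assign ("z", BinOp ("*", Var "x", Int 2));
  ExprStmt (Call ("log", [ Var "z" ]));
]

let reachable, unreachable = split_at_return body

(* Step 3: unused = assigned - used (set difference). *)
let unused = S.elements (S.diff (assigned reachable) (used reachable))
```

On this input the two statements after return are flagged as unreachable and y is the only unused variable.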

Wrapping M4 & M5 as Passes

Existing analyses become passes by wrapping them and converting results to finding records.

Safety Pass (Sign Domain → Findings)

let safety_pass = {
  name = "safety";
  category = Safety;
  run = fun program ->
    List.concat_map (fun func ->
      let env = SignDomain.analyze func in
      check_divisions env func
    ) program.functions
}
Detection: divisor = Zero → High, divisor = Top → Medium

Security Pass (Taint Domain → Findings)

let security_pass config = {
  name = "taint";
  category = Security;
  run = fun program ->
    List.concat_map (fun func ->
      let env = TaintDomain.analyze config func in
      check_sinks env config func
    ) program.functions
}
Detection: tainted arg at sink → Critical
The analysis code is unchanged from M4/M5. The pass just wraps it and converts results to unified finding records. This is the power of good abstractions.
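
One way the conversion step inside a wrapper such as check_divisions might look, reduced to its core: map the divisor's abstract sign to a severity using the rule above (Zero → High, Top → Medium). This is a simplified sketch, not the M4 code. The five-point sign type, the trimmed finding record, the division record, the association-list environment, and the choice to treat unknown variables as Top are all assumptions.

```ocaml
(* Assumed five-point sign lattice. *)
type sign = Bot | Neg | Zero | Pos | Top
type severity = Critical | High | Medium | Low | Info

(* Trimmed finding record for this sketch. *)
type finding = { severity : severity; location : string; message : string }

(* Hypothetical summary of one division site: where it is, what divides. *)
type division = { at : string; divisor : string }

(* Rule from the slide; None means nothing to report. *)
let severity_of_divisor = function
  | Zero -> Some High
  | Top -> Some Medium
  | Bot | Neg | Pos -> None

(* env maps variable names to signs; unknown variables default to Top. *)
let divisions_to_findings env divisions =
  List.filter_map
    (fun d ->
      let sign = Option.value (List.assoc_opt d.divisor env) ~default:Top in
      severity_of_divisor sign
      |> Option.map (fun severity ->
             { severity;
               location = d.at;
               message = "possible division by zero: divisor " ^ d.divisor }))
    divisions

(* Hand-written example input. *)
let env = [ ("y", Zero); ("n", Pos); ("k", Top) ]

let sites = [
  { at = "line 4"; divisor = "y" };
  { at = "line 7"; divisor = "n" };
  { at = "line 9"; divisor = "k" };
]

let findings = divisions_to_findings env sites
```

The division by n produces no finding because Pos rules out zero; only the Zero and Top divisors are reported.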

🎯 Challenge A: Classify the Finding

For each scenario, choose the correct severity and category.

Scenario 1

A variable user_input flows directly into exec() without any sanitization.

Scenario 2

Function compute(x) is defined but never called anywhere in the program.

Scenario 3

Division by variable y where sign analysis shows y could be zero.

Scenario 4

Variable count is assigned on line 3 but overwritten on line 4 without being read.

Multi-Pass Composition

Run multiple analysis passes over the same AST and merge their findings into a unified report.

let run_all passes ast =
  let all_findings =
    List.concat_map
      (fun p -> p.run ast)
      passes
  in
  sort_findings all_findings
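
A runnable sketch of run_all with stub passes, to show that the merged result is ordered by severity rather than by pass order. The trimmed record types, the sev_to_int ranking, the sort_findings body, and constant_pass are assumptions; the three findings are hand-written stand-ins, not output from the real analyses.

```ocaml
type severity = Critical | High | Medium | Low | Info
type program = string list   (* stand-in for the real AST type *)

type finding = {
  severity : severity;
  pass_name : string;
  location : string;
  message : string;
}

type analysis_pass = { name : string; run : program -> finding list }

(* Assumed ranking: 0 is the most severe. *)
let sev_to_int = function
  | Critical -> 0 | High -> 1 | Medium -> 2 | Low -> 3 | Info -> 4

(* Assumed implementation: stable sort, most severe first. *)
let sort_findings findings =
  List.stable_sort
    (fun a b -> compare (sev_to_int a.severity) (sev_to_int b.severity))
    findings

let run_all passes ast =
  passes |> List.concat_map (fun p -> p.run ast) |> sort_findings

(* Hypothetical stub: a pass that always reports one fixed finding. *)
let constant_pass name severity location message =
  { name;
    run = (fun _ast -> [ { severity; pass_name = name; location; message } ]) }

let dead_code = constant_pass "dead_code" Low "line 3" "unused variable z"
let safety = constant_pass "safety" High "line 4" "division by zero"
let security =
  constant_pass "taint" Critical "line 5" "tainted argument reaches exec"

let merged = run_all [ dead_code; safety; security ] []

(* Pass names in report order. *)
let order = List.map (fun f -> f.pass_name) merged
```

Even though dead_code runs first, its Low finding ends up last in the merged list.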

Worked Example: Three-Pass Analysis

The dead code, safety, and security passes all analyze the same program, and their findings merge into one unified list.

1 let x = input()
2 let y = 0
3 let z = 42
4 let w = x / y
5 exec(x)
6 print(w)

Configuration Types & Builder

A config record controls which passes run, the minimum severity to report, whether to deduplicate, and the output format. Build configs with the |> pipeline operator.

type config = {
  passes: analysis_pass list;
  min_severity: severity;
  output_format: format;
  dedup: bool;
}

let my_config =
  default_config
  |> add_pass dead_code
  |> add_pass safety
  |> add_pass security
  |> set_min_severity Medium
  |> set_format JSON
  |> enable_dedup
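
A sketch of how the builder functions might be written. Each one takes the config as its last argument and returns an updated copy, which is what lets them chain with |>. The default values, the builder bodies, and the trimmed analysis_pass record are assumptions; the slides show only how the builders are used.

```ocaml
type severity = Critical | High | Medium | Low | Info
type format = Text | JSON | Stats

(* Trimmed for this sketch: the run field is omitted. *)
type analysis_pass = { name : string }

type config = {
  passes : analysis_pass list;
  min_severity : severity;
  output_format : format;
  dedup : bool;
}

(* Assumed defaults: no passes, report everything, plain text, no dedup. *)
let default_config =
  { passes = []; min_severity = Info; output_format = Text; dedup = false }

(* Builders: config comes last so that |> can thread it through. *)
let add_pass p c = { c with passes = c.passes @ [ p ] }
let set_min_severity s c = { c with min_severity = s }
let set_format f c = { c with output_format = f }
let enable_dedup c = { c with dedup = true }

let dead_code = { name = "dead_code" }

(* Example: a local cleanup run that only summarizes dead code. *)
let local_config =
  default_config
  |> add_pass dead_code
  |> set_min_severity Low
  |> set_format Stats
```

Because every builder returns a new record, default_config itself is never modified and can be reused as the starting point for other configs.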
Analogy: Think of config like a coffee order — you pick the beans (passes), grind size (severity filter), and cup type (output format). The barista (pipeline) follows your spec exactly.

Applying Filters

Filters reduce noise by removing low-severity findings, duplicates, and excluded categories.

Key Idea: Filters run after all passes complete. This means you can re-filter the same results without re-running analysis — fast iteration on what matters.

Pipeline Architecture

The full pipeline: Parse → Analyze → Filter → Deduplicate → Sort → Report. Each stage is a pure function — easy to test and compose.

let run_pipeline config source =
  source
  |> parse
  |> run_passes config.passes
  |> filter_by_severity config.min_severity
  |> (if config.dedup then deduplicate else Fun.id)
  |> sort_findings
  |> report config.output_format

Reporting: Text vs JSON

The same findings can be rendered as human-readable text, machine-readable JSON, or a per-severity summary.

let report format findings =
  match format with
  | Text ->
    findings
    |> List.map format_text_line
    |> String.concat "\n"
  | JSON ->
    findings
    |> List.map to_json_obj
    |> wrap_in_array
    |> json_to_string
  | Stats ->
    findings
    |> group_by_severity
    |> format_summary
Key Idea: The report function is the last stage in the pipeline. It doesn't modify findings — it just serializes them. This separation means you can add new formats without touching analysis logic.
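
A self-contained sketch of the Text and JSON branches, using only the standard library. The trimmed finding record, the line layout, the hand-rolled JSON strings, and json_escape are assumptions; the Stats branch is left out to keep the example short. The escaping handles only quotes and backslashes, so a real tool would use a JSON library instead.

```ocaml
type severity = Critical | High | Medium | Low | Info
type format = Text | JSON

(* Trimmed finding record for this sketch. *)
type finding = { severity : severity; location : string; message : string }

let sev_to_string = function
  | Critical -> "CRITICAL" | High -> "HIGH" | Medium -> "MEDIUM"
  | Low -> "LOW" | Info -> "INFO"

(* Minimal escaping: quotes and backslashes only (not control characters). *)
let json_escape s =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '"' -> Buffer.add_string b "\\\""
      | '\\' -> Buffer.add_string b "\\\\"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* Assumed text layout: [SEVERITY] location: message *)
let format_text_line f =
  Printf.sprintf "[%s] %s: %s" (sev_to_string f.severity) f.location f.message

(* One finding as a JSON object, built by hand. *)
let to_json_obj f =
  Printf.sprintf "{\"severity\":\"%s\",\"location\":\"%s\",\"message\":\"%s\"}"
    (sev_to_string f.severity) (json_escape f.location) (json_escape f.message)

(* Serialize only: the findings themselves are never modified. *)
let report format findings =
  match format with
  | Text -> findings |> List.map format_text_line |> String.concat "\n"
  | JSON ->
    "[" ^ (findings |> List.map to_json_obj |> String.concat ",") ^ "]"

(* Hand-written sample finding. *)
let sample =
  [ { severity = High; location = "line 4"; message = "division by zero" } ]

let as_text = report Text sample
let as_json = report JSON sample
```

Both branches read the same finding list, so adding a format means adding a match case here and nothing in the analysis passes.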

🎯 Challenge B: Build a Pipeline Config

A team needs a CI/CD security gate. Build the config that meets their requirements.

Requirements:
  • ✅ Must check for security vulnerabilities
  • ✅ Must check for runtime safety issues
  • ❌ No dead code analysis (too noisy for CI)
  • ✅ Only report Medium severity and above
  • ✅ Remove duplicate findings
  • ✅ Output must be JSON (for CI parsing)
Your Config:
Passes:
Min Severity:
Format:

Real-World Static Analysis Tools

Production tools follow the same pipeline structure we built, with richer rules, IDE integration, and CI/CD hooks.


Putting It All Together

Watch the complete pipeline process a real program — from source code to final report.

Key Takeaways

1. Unified Finding Types
A single finding record with severity + category + location works for all analysis types. Standardization enables composition.
2. Pass Architecture
Each analysis is a self-contained analysis_pass: a record with name, category, and run. Wrap any M3–M5 analysis into this interface.
3. Configuration-Driven
Pipeline behavior is controlled by a config record — which passes, severity filters, output format. No code changes needed to adjust behavior.
4. Composable Pipeline
Parse → Analyze → Filter → Dedup → Sort → Report. Each stage is a pure function connected by |>. Easy to test, extend, reorder.
Common Mistake: Don't skip the filter/dedup stages! Without them, multi-pass analysis produces noisy, duplicate-heavy reports that developers ignore. Signal-to-noise ratio matters.

🎓 Bootcamp Complete!

You've built a complete program analysis toolkit from the ground up.

M1: Foundations ✅ OCaml, ASTs, CFGs
M2: AST Analysis ✅ Visitors, patterns, transforms
M3: Static Analysis ✅ Dataflow, reaching defs, liveness
M4: Abstract Interpretation ✅ Lattices, domains, widening
M5: Security Analysis ✅ Taint tracking, OWASP, sanitizers
M6: Tools Integration ✅ Findings, passes, pipelines
You can now build real static analysis tools!
From parsing source code to generating actionable security reports — the full pipeline is yours.
Final Analogy: You started as someone who could read a blueprint (AST). Now you can run the entire quality inspection factory — multiple inspectors (passes), a unified defect report (findings), configurable assembly line (pipeline), and formatted output for the customer (reporting).

🎯 Challenge C: Debug the Pipeline

Each pipeline has a bug. Identify what's wrong and pick the fix.

Bug 1: Missing Findings
source |> parse
|> run_passes [security]
|> filter_by_severity High
|> report Text

Team says: "We're missing medium-severity SQL injection findings!"

Bug 2: Duplicate Alerts
source |> parse
|> run_passes [safety; security]
|> sort_findings
|> report JSON

Team says: "Same finding appears twice in the report!"

Bug 3: Wrong Order
source |> parse
|> run_passes [dead_code; safety]
|> report Text
|> filter_by_severity Medium

Team says: "Filter doesn't work — all findings still show up!"

Bug 4: No Output
source |> parse
|> run_passes []
|> filter_by_severity Info
|> report JSON

Team says: "Report is always empty — no findings at all!"

Quiz 1: Concept Check

Q1: Finding Record

Which field is NOT part of our unified finding type?

Q2: Pipeline Order

What is the correct pipeline stage order?

Q3: Pass Wrapping

Why wrap M4/M5 analyses as analysis_pass records?

Quiz 2: Pipeline Trace

Given this config and program, predict what the final report contains.

(* Config *)
passes = [dead_code; security]
min_severity = Medium
dedup = true
format = Text

(* Program *)
1 let a = input()
2 let b = 10
3 let c = 20
4 exec(a)
5 query(a)
6 print(b)

Your Prediction:

How many findings in the final report?

Which findings survive? (check all that apply)

Quiz 3: Design Decisions

For each real-world scenario, choose the best design approach.

Scenario 1: New Analysis

Your team writes a new "complexity analysis" that measures function complexity. How should you integrate it?

Scenario 2: CI Performance

CI takes 10 minutes because dead code analysis is slow on large codebases. What's the best fix?

Scenario 3: Output Format

Your tool needs to work with both GitHub code scanning AND a custom dashboard. What output strategy?

Scenario 4: False Positives

Developers complain about too many false positives from the safety pass. Best approach?