BFS - Breadth-First Search

Explore a graph level by level

    [A]---[B]---[E]
     |     |     |
    [C]---[D]---[F]
     |           |
    [G]---------[H]

    Level 0: A
    Level 1: B, C
    Level 2: D, E, G
    Level 3: F, H
CS205 Data Structures Graph Algorithms Shortest Paths

What is BFS?

Exploring a graph one layer at a time

Breadth-First Search starts at a source vertex and explores the graph in expanding rings:

  • First, visit all neighbors of the source (distance 1)
  • Then, visit all neighbors of neighbors (distance 2)
  • Then distance 3, 4, 5 ... until the entire reachable graph is explored
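The ring-by-ring exploration described above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `bfs_levels` and the adjacency lists (read off the eight-vertex graph on the title slide) are assumptions made for the example.

```python
from collections import deque

def bfs_levels(graph, source):
    """Group the vertices reachable from source by their distance from it."""
    dist = {source: 0}
    levels = [[source]]
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:                # first time we reach v
                dist[v] = dist[u] + 1
                if dist[v] == len(levels):   # v opens a new ring
                    levels.append([])
                levels[dist[v]].append(v)
                queue.append(v)
    return levels

# Adjacency lists as read from the title-slide diagram (an assumption).
graph = {
    "A": ["B", "C"],
    "B": ["A", "D", "E"],
    "C": ["A", "D", "G"],
    "D": ["B", "C", "F"],
    "E": ["B", "F"],
    "F": ["D", "E", "H"],
    "G": ["C", "H"],
    "H": ["F", "G"],
}

for k, ring in enumerate(bfs_levels(graph, "A")):
    print(f"Level {k}: {', '.join(ring)}")
```

Every vertex in ring k is appended before any vertex in ring k+1, which is the "distance 1, then distance 2, then distance 3" order from the list above.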

Analogy: Ripples in a Pond

Drop a stone into still water. The ripples expand outward in concentric circles. BFS works exactly the same way -- it radiates outward from the source, reaching everything at distance k before anything at distance k+1.

Source = A

    [A]---[B]---[E]        Level 0: A
     |     |     |
    [C]---[D]---[F]        Level 1: B, C
     |           |
    [G]---------[H]        Level 2: D, E, G
                           Level 3: F, H

    ~~~~    Ripple 0 ~~~~  A
    ~~~~~   Ripple 1 ~~~~  B, C
    ~~~~~~  Ripple 2 ~~~   D, E, G
    ~~~~~~~ Ripple 3 ~~    F, H

Key Idea

BFS guarantees that when you first reach a vertex, you have found a shortest path (in terms of number of edges) from the source to that vertex. There may be several paths of that same length; BFS finds one of them.


BFS Uses a Queue

FIFO ordering is the secret ingredient

Two Key Data Structures

  • Queue (FIFO) -- holds vertices waiting to be explored. First in, first out ensures level-by-level order.
  • Visited set (or boolean array) -- prevents revisiting vertices and infinite loops in cyclic graphs.

Why a Queue?

A queue processes vertices in the order they were discovered. All level-1 vertices are enqueued before any level-2 vertices, so level-1 is fully processed first. This is what makes it "breadth-first."

Warning: Stack != BFS

If you replace the queue with a stack, you get DFS (Depth-First Search), which goes deep before going wide. The data structure choice is the fundamental difference.

Pseudocode Overview

    BFS(graph, source):
        create empty Queue Q
        create empty Set visited

        Q.enqueue(source)
        visited.add(source)

        while Q is not empty:
            u = Q.dequeue()
            process(u)                  // e.g., print u
            for each neighbor v of u:
                if v not in visited:
                    visited.add(v)
                    Q.enqueue(v)
Queue behavior (FIFO):

    enqueue --> [ D | C | B | A ] --> dequeue
                 back        front

    Dequeue returns A first (earliest discovered)
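For readers who want to run the overview, here is one way it might look in Python. It is a sketch under assumptions: `collections.deque` stands in for the queue, `bfs_order` is a name chosen for this example, and the four-vertex graph is made up for illustration.

```python
from collections import deque

def bfs_order(graph, source):
    """Return the vertices in the order BFS dequeues them."""
    visited = {source}            # mark the source before the loop starts
    queue = deque([source])
    order = []
    while queue:
        u = queue.popleft()       # FIFO: earliest discovered vertex first
        order.append(u)           # stands in for process(u)
        for v in graph[u]:
            if v not in visited:
                visited.add(v)    # mark on enqueue, not on dequeue
                queue.append(v)
    return order

# Hypothetical 4-cycle: A-B, A-C, B-D, C-D
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}
print(bfs_order(graph, "A"))
```

`deque.popleft()` and `deque.append()` both run in O(1), which is why a deque is used here rather than `list.pop(0)`, which shifts every remaining element.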

BFS Algorithm -- Detailed

With distance and parent tracking

    BFS(graph, source):
        // --- INITIALIZATION ---
        for each vertex v in graph:
            dist[v]   = INFINITY        // unknown distance
            parent[v] = NULL            // no parent yet
            color[v]  = WHITE           // unvisited

        dist[source]   = 0
        color[source]  = GRAY           // discovered
        parent[source] = NULL

        Q = empty queue
        Q.enqueue(source)

        // --- MAIN LOOP ---
        while Q is not empty:
            u = Q.dequeue()
            for each neighbor v of u:
                if color[v] == WHITE:   // unvisited?
                    color[v]  = GRAY    // mark discovered
                    dist[v]   = dist[u] + 1
                    parent[v] = u
                    Q.enqueue(v)
            color[u] = BLACK            // fully explored

Color Scheme (CLRS convention)

    Color   Meaning
    WHITE   Undiscovered
    GRAY    Discovered; in the queue, or dequeued and still being processed
    BLACK   Fully explored (dequeued and all neighbors examined)

What Each Array Stores

    Array       Purpose
    dist[v]     Shortest distance from source to v
    parent[v]   Predecessor of v on shortest path
    color[v]    Visit status (WHITE/GRAY/BLACK)

Key Idea

We mark a vertex as discovered (GRAY) when we enqueue it, not when we dequeue it. This prevents the same vertex from being enqueued multiple times.
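The detailed algorithm, including the "mark GRAY on enqueue" rule, can be tried out with a sketch like the one below. The `Color` enum, the function name `bfs_clrs`, the extra `enqueues` counter, and the small test graph are all assumptions added for illustration; they are not part of the slide's pseudocode.

```python
from collections import deque
from enum import Enum
import math

class Color(Enum):
    WHITE = 0   # undiscovered
    GRAY = 1    # discovered, not yet fully explored
    BLACK = 2   # fully explored

def bfs_clrs(graph, source):
    dist = {v: math.inf for v in graph}
    parent = {v: None for v in graph}
    color = {v: Color.WHITE for v in graph}
    enqueues = {v: 0 for v in graph}     # illustration only: counts enqueues

    dist[source] = 0
    color[source] = Color.GRAY
    queue = deque([source])
    enqueues[source] += 1

    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if color[v] is Color.WHITE:
                color[v] = Color.GRAY    # marked at enqueue time
                dist[v] = dist[u] + 1
                parent[v] = u
                queue.append(v)
                enqueues[v] += 1
        color[u] = Color.BLACK
    return dist, parent, color, enqueues

# Hypothetical graph: triangle A-B-C with a tail C-D
graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
dist, parent, color, enqueues = bfs_clrs(graph, "A")
print(dist, parent, enqueues)
```

In the triangle, C is a neighbor of both A and B. Because C turns GRAY when A enqueues it, B's later look at C finds it already discovered, so every entry in `enqueues` stays at 1.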


BFS Step-by-Step Example

Source = A on the following graph

The Graph

          [A]----[B]
          / \      \
       [C]  [D]    [E]
        |    |    /  |
       [F]  [G]     [H]

    Adjacency lists (alphabetical):
        A: B, C, D
        B: A, E
        C: A, F
        D: A, G
        E: B, G, H
        F: C
        G: D, E
        H: E

Step 0: Initialize

Enqueue source A, mark visited.

    Queue:   [ A ]
    Visited: { A }
    Dist:    A=0

Step 1: Dequeue A, explore neighbors

Dequeue A. Neighbors: B, C, D
All unvisited --> enqueue all.

    Queue:   [ B, C, D ]
    Visited: { A, B, C, D }
    Dist:    A=0 B=1 C=1 D=1

BFS Step-by-Step (continued)

Processing level 1 vertices

Step 2: Dequeue B, explore neighbors

Dequeue B. Neighbors: A, E
A already visited. E is new --> enqueue E.

    Queue:   [ C, D, E ]
    Visited: { A, B, C, D, E }
    Dist:    A=0 B=1 C=1 D=1 E=2

Step 3: Dequeue C, explore neighbors

Dequeue C. Neighbors: A, F
A already visited. F is new --> enqueue F.

    Queue:   [ D, E, F ]
    Visited: { A, B, C, D, E, F }
    Dist:    A=0 B=1 C=1 D=1 E=2 F=2

Step 4: Dequeue D, explore neighbors

Dequeue D. Neighbors: A, G
A already visited. G is new --> enqueue G.

    Queue:   [ E, F, G ]
    Visited: { A, B, C, D, E, F, G }
    Dist:    A=0 B=1 C=1 D=1 E=2 F=2 G=2

Step 5: Dequeue E, explore neighbors

Dequeue E. Neighbors: B, G, H
B visited. G visited. H is new --> enqueue H.

    Queue:   [ F, G, H ]
    Visited: { A, B, C, D, E, F, G, H }
    Dist:    A=0 B=1 C=1 D=1 E=2 F=2 G=2 H=3

BFS Step-by-Step (final)

Processing level 2 and level 3 vertices

Step 6: Dequeue F

Dequeue F. Neighbors: C
C already visited. Nothing to enqueue.

    Queue: [ G, H ]

Step 7: Dequeue G

Dequeue G. Neighbors: D, E
Both already visited. Nothing to enqueue.

    Queue: [ H ]

Step 8: Dequeue H

Dequeue H. Neighbors: E
E already visited. Nothing to enqueue.

    Queue: [ ] (EMPTY -- BFS complete!)

Observation

The queue emptied naturally. Every reachable vertex was visited exactly once. Total dequeue operations = 8 (one per vertex).
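The hand trace can be replayed mechanically. The sketch below is one possible way to do it; the generator name `bfs_trace` is an assumption, and the adjacency lists are copied from the example graph's alphabetical lists.

```python
from collections import deque

def bfs_trace(graph, source):
    """Yield (dequeued vertex, queue contents after processing it)."""
    visited = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in visited:
                visited.add(v)
                queue.append(v)
        yield u, list(queue)

# Example graph from the walkthrough (alphabetical adjacency lists).
graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "E"],
    "C": ["A", "F"],
    "D": ["A", "G"],
    "E": ["B", "G", "H"],
    "F": ["C"],
    "G": ["D", "E"],
    "H": ["E"],
}

for step, (u, q) in enumerate(bfs_trace(graph, "A"), start=1):
    print(f"Step {step}: dequeue {u}, queue = {q}")
```

The printed queue contents should match Steps 1 through 8 above, ending with an empty queue after H is dequeued.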

Final State

All vertices visited:

          [A]----[B]
          / \      \
       [C]  [D]    [E]
        |    |    /  |
       [F]  [G]     [H]

BFS Tree (tree edges only)

             [A]              dist = 0
           /  |  \
        [B]  [C]  [D]         dist = 1
         |    |    |
        [E]  [F]  [G]         dist = 2
         |
        [H]                   dist = 3

Distance & Parent Table

    Vertex   A   B   C   D   E   F   G   H
    Dist     0   1   1   1   2   2   2   3
    Parent   --  A   A   A   B   C   D   E

The BFS Tree

Tree edges vs. cross edges

What Is the BFS Tree?

During BFS, each vertex (except the source) is discovered from exactly one other vertex. The edge used for that discovery is called a tree edge.

All tree edges together form a spanning tree of the connected component -- the BFS tree.

    Original graph                BFS tree (source=A)

          [A]----[B]                    [A]
          / \      \                  /  |  \
       [C]  [D]    [E]             [B]  [C]  [D]
        |    |    /  |              |    |    |
       [F]  [G]     [H]            [E]  [F]  [G]
                                    |
                                   [H]

Key Idea

The BFS tree encodes shortest paths. The path from any vertex back to the root in the BFS tree is a shortest path in the original graph (other paths of the same length may exist).

Edge Classification

    Edge type    Definition and example
    Tree edge    Used to discover a new vertex.
                 Example: A-B, A-C, A-D, B-E, C-F, D-G, E-H
    Cross edge   Not used for discovery; connects vertices at the same or adjacent levels.
                 Example: E-G

Edges in the original graph (8 total):

    Tree edges (solid):     A--B  A--C  A--D  B--E  C--F  D--G  E--H
    Cross edges (dashed):   E--G  (same level)

When BFS dequeues a non-source vertex, it also sees the edge leading back to its parent (B-A, C-A, D-A, E-B, F-C, G-D, H-E). These are the tree edges viewed from the child's side, not additional edges.

Important Property

In an undirected BFS, cross edges can only connect vertices whose levels differ by at most 1. There are no edges skipping levels.
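The classification and the level property can be checked on the example graph with a short sketch. The function name `classify_edges` and the use of `frozenset` to represent an undirected edge are choices made for this illustration.

```python
from collections import deque

def classify_edges(graph, source):
    """Split the undirected edges reachable from source into tree / non-tree."""
    dist = {source: 0}
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                parent[v] = u
                queue.append(v)

    tree, non_tree = set(), set()
    for u in dist:
        for v in graph[u]:
            edge = frozenset((u, v))          # undirected: {u, v} == {v, u}
            if parent[v] == u or parent[u] == v:
                tree.add(edge)
            else:
                non_tree.add(edge)
    return dist, tree, non_tree

graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "E"],
    "C": ["A", "F"],
    "D": ["A", "G"],
    "E": ["B", "G", "H"],
    "F": ["C"],
    "G": ["D", "E"],
    "H": ["E"],
}
dist, tree, non_tree = classify_edges(graph, "A")
print(len(tree), "tree edges; non-tree:", [sorted(e) for e in non_tree])
```

On this graph the only non-tree edge is E-G, and both endpoints sit at level 2, consistent with the "levels differ by at most 1" property.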


BFS Finds Shortest Paths (Unweighted)

Level = distance from source

The Guarantee

In an unweighted graph (all edges have cost 1), BFS discovers every vertex at the minimum possible distance from the source.

    Source = A

    Distance 0: A
    Distance 1: B, C, D    (direct neighbors)
    Distance 2: E, F, G    (2 hops away)
    Distance 3: H          (3 hops away)

    Shortest path A-->H: A --> B --> E --> H (length 3)

Why Does This Work?

  1. BFS processes vertices in non-decreasing order of distance.
  2. When we discover vertex v from vertex u, we set dist[v] = dist[u] + 1.
  3. Since u was dequeued first, dist[u] is already optimal.
  4. Therefore dist[v] is also optimal.

Proof Sketch (by contradiction)

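Let d*(v) = true shortest distance from the source to v.
Let d(v) = BFS-computed distance.

First, d(v) >= d*(v) always: BFS only sets d(v) = d(u) + 1 along an actual edge (u,v), so d(v) is the length of a real path.

Now assume BFS does NOT give the shortest distance to some reachable vertex, and let v be such a vertex with the smallest d*(v). Then d(v) > d*(v).

Take a shortest path to v and let u' be the vertex just before v on it, so d*(u') = d*(v) - 1. By the choice of v, BFS is correct for u': d(u') = d*(u').

When u' is dequeued, one of two things is true:
  - v is still undiscovered, so BFS sets d(v) = d(u') + 1 = d*(v); or
  - v was already discovered from a vertex dequeued no later than u', whose distance is at most d(u') (fact 1 above), so d(v) <= d(u') + 1 = d*(v).

Either way d(v) <= d*(v), contradicting d(v) > d*(v). Hence d(v) = d*(v) for every reachable v. QED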

Only Unweighted Graphs!

BFS does NOT find shortest paths in weighted graphs. For weighted graphs, use Dijkstra's algorithm (non-negative weights) or Bellman-Ford (negative weights allowed).
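A tiny weighted example shows what goes wrong. The sketch below is illustrative only: the three-vertex graph and its weights are made up, `bfs_hops` ignores weights on purpose, and `dijkstra` is a compact textbook-style version built on Python's `heapq`.

```python
from collections import deque
import heapq

def bfs_hops(graph, source):
    """Fewest edges from source to each vertex; edge weights are ignored."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, _weight in graph[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops

def dijkstra(graph, source):
    """Smallest total weight from source; assumes non-negative weights."""
    cost = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > cost[u]:
            continue                      # stale heap entry, skip it
        for v, w in graph[u]:
            if v not in cost or d + w < cost[v]:
                cost[v] = d + w
                heapq.heappush(heap, (cost[v], v))
    return cost

# Hypothetical weighted triangle: the direct edge A-B is expensive.
graph = {
    "A": [("B", 10), ("C", 1)],
    "B": [("A", 10), ("C", 1)],
    "C": [("A", 1), ("B", 1)],
}
print("BFS hops to B:     ", bfs_hops(graph, "A")["B"])
print("Dijkstra cost to B:", dijkstra(graph, "A")["B"])
```

BFS reaches B in one hop over the weight-10 edge, while the cheapest route A --> C --> B costs 2. Fewest edges and lowest total weight are different questions once weights differ.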


Computing Distances

dist[v] = dist[u] + 1 when discovering v from u

Distance Computation Trace

Graph:

          [A]----[B]
          / \      \
       [C]  [D]    [E]
        |    |    /  |
       [F]  [G]     [H]

    Source A:           dist[A] = 0
    Discover B from A:  dist[B] = 0 + 1 = 1
    Discover C from A:  dist[C] = 0 + 1 = 1
    Discover D from A:  dist[D] = 0 + 1 = 1
    Discover E from B:  dist[E] = 1 + 1 = 2
    Discover F from C:  dist[F] = 1 + 1 = 2
    Discover G from D:  dist[G] = 1 + 1 = 2
    Discover H from E:  dist[H] = 2 + 1 = 3

Key Code Fragment

    // When processing edge (u, v):
    if not visited[v]:
        visited[v] = true
        dist[v] = dist[u] + 1       // THIS LINE
        parent[v] = u
        queue.enqueue(v)

Distance Array Evolution

    After processing   A   B   C   D   E   F   G   H
    Init               0   -   -   -   -   -   -   -
    Dequeue A          0   1   1   1   -   -   -   -
    Dequeue B          0   1   1   1   2   -   -   -
    Dequeue C          0   1   1   1   2   2   -   -
    Dequeue D          0   1   1   1   2   2   2   -
    Dequeue E          0   1   1   1   2   2   2   3

    (- = not yet discovered)

Key Idea

Distances are computed as vertices are discovered (enqueued), not when they are processed (dequeued). Each vertex's distance is set exactly once and never changes.


Reconstructing the Shortest Path

Follow the parent pointers back to the source

Parent Array

    Vertex   A   B   C   D   E   F   G   H
    parent   --  A   A   A   B   C   D   E

Example: Path from A to H

    Start at H.
    parent[H] = E   --> path: H
    parent[E] = B   --> path: E, H
    parent[B] = A   --> path: B, E, H
    parent[A] = --  --> path: A, B, E, H  (DONE)

    Shortest path: A --> B --> E --> H
    Distance: 3

Example: Path from A to G

    Start at G.
    parent[G] = D   --> path: G
    parent[D] = A   --> path: D, G
    parent[A] = --  --> path: A, D, G  (DONE)

    Shortest path: A --> D --> G
    Distance: 2

Path Reconstruction Code

    function shortestPath(source, dest, parent):
        path = empty list
        current = dest

        // Trace back from dest to source
        while current != NULL:
            path.addFirst(current)      // prepend
            current = parent[current]

        // Check if path is valid
        if path[0] != source:
            return "No path exists"
        return path
Visual: following parent pointers

    BFS Tree:                 Path A-->H: go UP from H

          [A]                     [A]
        /  |  \                    |
     [B]  [C]  [D]                [B]
      |    |    |                  |
     [E]  [F]  [G]                [E]
      |                            |
     [H]                          [H]
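A runnable version of the reconstruction might look like the sketch below. The helper names `bfs_parents` and `shortest_path` are assumptions for this example, and `None` is returned in place of the "No path exists" string.

```python
from collections import deque

def bfs_parents(graph, source):
    """Run BFS and return the parent map for every reachable vertex."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def shortest_path(parent, source, dest):
    """Follow parent pointers from dest back to source; None if unreachable."""
    if dest not in parent:
        return None                  # dest was never discovered
    path = []
    current = dest
    while current is not None:
        path.append(current)         # append now, reverse once at the end
        current = parent[current]
    path.reverse()
    return path if path[0] == source else None

graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "E"],
    "C": ["A", "F"],
    "D": ["A", "G"],
    "E": ["B", "G", "H"],
    "F": ["C"],
    "G": ["D", "E"],
    "H": ["E"],
}
parent = bfs_parents(graph, "A")
print(shortest_path(parent, "A", "H"))
print(shortest_path(parent, "A", "G"))
```

Appending and reversing once keeps reconstruction linear in the path length; prepending to a Python list on every step would shift the whole list each time.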

Analogy: Breadcrumb Trail

The parent array is like leaving breadcrumbs. Each vertex remembers who led it there. To find your way back to the source, just follow the breadcrumbs!


BFS on Directed Graphs

Same algorithm, but only follow outgoing edges

Directed Graph Example

Source = A

    A ---> B ---> E
    |      |      |
    v      v      v
    C ---> D ---> F
           |
           v
           G

    Adjacency lists (outgoing only):
        A: B, C
        B: D, E
        C: D
        D: F, G
        E: F
        F: (none)
        G: (none)

Direction Matters!

In a directed graph, edge A-->B does NOT mean you can go from B to A. BFS only follows outgoing edges from the current vertex.

BFS Trace (source = A)

    Step 0: Enqueue A
            Queue: [A]          Visited: {A}

    Step 1: Dequeue A. Out-neighbors: B, C
            Queue: [B, C]       Visited: {A, B, C}
            dist: A=0 B=1 C=1

    Step 2: Dequeue B. Out-neighbors: D, E
            Queue: [C, D, E]    Visited: {A, B, C, D, E}
            dist: D=2 E=2

    Step 3: Dequeue C. Out-neighbors: D
            D already visited. Skip.
            Queue: [D, E]

    Step 4: Dequeue D. Out-neighbors: F, G
            Queue: [E, F, G]    Visited: {A, B, C, D, E, F, G}
            dist: F=3 G=3

    Step 5: Dequeue E. Out-neighbors: F
            F already visited. Skip.
            Queue: [F, G]

    Step 6: Dequeue F. No out-neighbors.
            Queue: [G]

    Step 7: Dequeue G. No out-neighbors.
            Queue: []   DONE!

Result

    Vertex   A   B   C   D   E   F   G
    Dist     0   1   1   2   2   3   3
    Parent   --  A   A   B   B   D   D
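The directed trace can be reproduced with the same loop, as long as the adjacency lists hold outgoing edges only. In this sketch the name `bfs_directed` is an assumption; the `out_edges` dictionary copies the outgoing lists from the example.

```python
from collections import deque

def bfs_directed(out_edges, source):
    """BFS that follows only outgoing edges; returns (dist, parent)."""
    dist = {source: 0}
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in out_edges[u]:          # never follows an edge backwards
            if v not in dist:
                dist[v] = dist[u] + 1
                parent[v] = u
                queue.append(v)
    return dist, parent

out_edges = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["D"],
    "D": ["F", "G"],
    "E": ["F"],
    "F": [],
    "G": [],
}

dist, parent = bfs_directed(out_edges, "A")
print(dist)
print(parent)

# Direction matters: starting from D, nothing "upstream" is reachable.
dist_from_d, _ = bfs_directed(out_edges, "D")
print(sorted(dist_from_d))
```

From A every vertex is reached, but from D only D, F, and G are, because no edge leads from D back toward A, B, C, or E.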

BFS for Connected Components

Finding all pieces of a disconnected graph

Disconnected Graph

    Component 1      Component 2      Comp 3
    -----------      -----------      ------
    [A]---[B]        [E]---[F]         [H]
     |     |          |     |
    [C]---[D]        [G]---[I]

Algorithm

    function findComponents(graph):
        components = 0
        visited = empty set

        for each vertex v in graph:
            if v not in visited:
                components += 1
                BFS(graph, v, visited)
                // BFS marks all reachable
                // vertices as visited

        return components

Each BFS call explores one entire connected component. We count how many times we need to start a new BFS.

Trace

    Pass 1: Start BFS from A
            Visits: A, B, C, D
            visited = {A, B, C, D}
            components = 1

    Skip B (visited), C (visited), D (visited)

    Pass 2: Start BFS from E
            Visits: E, F, G, I
            visited = {A, B, C, D, E, F, G, I}
            components = 2

    Skip F, G (visited)

    Pass 3: Start BFS from H
            Visits: H
            visited = {A, B, C, D, E, F, G, H, I}
            components = 3

    Skip I (visited)

    Result: 3 connected components

Key Idea

A single BFS from one vertex explores only its connected component. To cover the entire graph, loop through all vertices and start a new BFS whenever you find an unvisited vertex.
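A sketch of the component-finding loop in Python follows. The function name `find_components` is an assumption, the adjacency lists are read from the disconnected-graph diagram, and the function returns the member lists rather than just a count so the result is easy to inspect.

```python
from collections import deque

def find_components(graph):
    """Return one list of vertices per connected component."""
    visited = set()
    components = []
    for start in graph:                 # dicts iterate in insertion order
        if start in visited:
            continue                    # already covered by an earlier BFS
        visited.add(start)
        queue = deque([start])
        members = []
        while queue:
            u = queue.popleft()
            members.append(u)
            for v in graph[u]:
                if v not in visited:
                    visited.add(v)
                    queue.append(v)
        components.append(members)
    return components

# Adjacency lists as read from the diagram (an assumption).
graph = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"],
    "E": ["F", "G"], "F": ["E", "I"], "G": ["E", "I"],
    "H": [],
    "I": ["F", "G"],
}

components = find_components(graph)
print(len(components), "components:", components)
```

The single `visited` set is shared by all BFS passes, so the total work over the whole graph stays O(V + E).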

Analogy: Islands

Think of each component as an island. BFS explores one island completely. You need a new "boat trip" (new BFS) to reach each separate island.


Time Complexity: O(V + E)

Every vertex is dequeued once; every edge is examined once (directed) or twice (undirected)

Why O(V + E)?

BFS main loop:

    while Q is not empty:               // runs at most V times
        u = Q.dequeue()                 // O(1) per call
        for each neighbor v of u:       // deg(u) neighbors
            if not visited[v]:          // O(1) check
                ...                     // O(1) work

Total iterations of inner loop:

    sum of deg(u) for all u = 2|E|  (undirected)
                            = |E|   (directed)

Total work: V dequeues + O(E) neighbor checks = O(V + E)

Why V + E, not V * E?

Each vertex is dequeued once (not E times). Each edge is checked once from each endpoint (undirected) or once total (directed). The work is distributed across vertices, not repeated.

Concrete Example

Graph: V = 8, E = 8

          [A]----[B]
          / \      \
       [C]  [D]    [E]
        |    |    /  |
       [F]  [G]     [H]

Work done per vertex:

    Dequeue A: check 3 neighbors (B,C,D)
    Dequeue B: check 2 neighbors (A,E)
    Dequeue C: check 2 neighbors (A,F)
    Dequeue D: check 2 neighbors (A,G)
    Dequeue E: check 3 neighbors (B,G,H)
    Dequeue F: check 1 neighbor  (C)
    Dequeue G: check 2 neighbors (D,E)
    Dequeue H: check 1 neighbor  (E)
    --------
    Total neighbor checks: 16 = 2*8
    (8 undirected edges, each stored in two adjacency lists)

Total: 8 dequeues + 16 checks = O(V+E)
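The counts can be confirmed by instrumenting the loop. The sketch below adds two counters to an ordinary BFS; the function name `bfs_count_work` is an assumption, and the graph is the example's adjacency list, which contains 8 undirected edges (A-B, A-C, A-D, B-E, C-F, D-G, E-G, E-H).

```python
from collections import deque

def bfs_count_work(graph, source):
    """Return (number of dequeues, number of neighbor checks)."""
    visited = {source}
    queue = deque([source])
    dequeues = checks = 0
    while queue:
        u = queue.popleft()
        dequeues += 1
        for v in graph[u]:
            checks += 1                 # one check per adjacency-list entry
            if v not in visited:
                visited.add(v)
                queue.append(v)
    return dequeues, checks

graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "E"],
    "C": ["A", "F"],
    "D": ["A", "G"],
    "E": ["B", "G", "H"],
    "F": ["C"],
    "G": ["D", "E"],
    "H": ["E"],
}
num_edges = sum(len(nbrs) for nbrs in graph.values()) // 2

dequeues, checks = bfs_count_work(graph, "A")
print(f"V = {len(graph)}, E = {num_edges}")
print(f"dequeues = {dequeues}, neighbor checks = {checks}")
```

The number of checks equals the total length of all adjacency lists, which is 2|E| for an undirected graph; here that is 16.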

Comparison

    Representation     BFS Time
    Adjacency List     O(V + E) -- optimal
    Adjacency Matrix   O(V^2) -- must scan each row
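To see where the quadratic cost of the matrix representation comes from, consider the sketch below. The name `bfs_matrix`, the `cells_scanned` counter, and the four-vertex path graph are assumptions for illustration; vertices are numbered 0 to V-1.

```python
from collections import deque

def bfs_matrix(matrix, source):
    """BFS over an adjacency matrix; returns (dist list, cells scanned)."""
    n = len(matrix)
    dist = [None] * n                   # None = not discovered
    dist[source] = 0
    queue = deque([source])
    cells_scanned = 0
    while queue:
        u = queue.popleft()
        for v in range(n):              # whole row is scanned, edge or not
            cells_scanned += 1
            if matrix[u][v] and dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist, cells_scanned

# Hypothetical path graph 0 - 1 - 2 - 3 (3 edges)
matrix = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
dist, cells_scanned = bfs_matrix(matrix, 0)
print(dist, cells_scanned)
```

Even though this graph has only 3 edges, each of the 4 dequeued vertices scans a full row of 4 cells, so 16 = V^2 cells are read.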

Space Complexity: O(V)

Linear in the number of vertices

Space Breakdown

    Data Structure   Space   Purpose
    Queue            O(V)    At most V vertices in queue
    Visited array    O(V)    One boolean per vertex
    Distance array   O(V)    One integer per vertex
    Parent array     O(V)    One pointer per vertex
    Total            O(V)
Memory layout:

    visited: [T][T][T][T][T][T][T][T]
              A  B  C  D  E  F  G  H

    dist:    [0][1][1][1][2][2][2][3]
              A  B  C  D  E  F  G  H

    parent:  [-][A][A][A][B][C][D][E]
              A  B  C  D  E  F  G  H

Queue (max size during BFS): 3 (when B, C, D were all in queue)

Queue Size Over Time

Queue size during our BFS example (after each step of the walkthrough):

    Step   0  1  2  3  4  5  6  7  8
    Size   1  3  3  3  3  3  2  1  0

Max queue size = 3 (first reached after dequeuing A, then held through step 5)

Worst Case Queue Size

In the worst case, the queue can hold O(V) vertices. Imagine a star graph where the center connects to all other V-1 vertices -- after processing the center, all V-1 neighbors are in the queue.

Star graph (worst case for queue):

    [B]  [C]  [D]
      \   |   /
       \  |  /
         [A]
       /  |  \
      /   |   \
    [E]  [F]  [G]

After dequeuing A: Queue = [B, C, D, E, F, G] (size V-1)
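The worst case can be measured directly. In this sketch, `peak_queue_size` and `star_graph` are hypothetical helper names, and vertex 0 plays the role of the center A.

```python
from collections import deque

def peak_queue_size(graph, source):
    """Run BFS and return the largest number of vertices queued at once."""
    visited = {source}
    queue = deque([source])
    peak = 1
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in visited:
                visited.add(v)
                queue.append(v)
        peak = max(peak, len(queue))    # measured after each vertex is processed
    return peak

def star_graph(n_leaves):
    """Center 0 connected to leaves 1..n_leaves."""
    graph = {0: list(range(1, n_leaves + 1))}
    for leaf in range(1, n_leaves + 1):
        graph[leaf] = [0]
    return graph

print(peak_queue_size(star_graph(6), 0))      # V = 7
print(peak_queue_size(star_graph(100), 0))    # V = 101
```

Starting from the center, the peak is V-1 because every leaf is enqueued before any leaf is dequeued. Starting from a leaf gives the same peak one step later, since the center is dequeued second and enqueues the remaining leaves.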

Application: Social Network Distance

"Six degrees of separation"

The Problem

Given a social network graph where vertices are people and edges are friendships, find the shortest friendship chain between two people.

Social Network Graph:

    [Alice]---[Bob]---[Eve]
       |        |       |
    [Carol]  [Dave]  [Frank]
       |                |
    [Grace]----------[Hank]

Q: Shortest path from Alice to Frank?

BFS from Alice:
    Level 0: Alice
    Level 1: Bob, Carol
    Level 2: Dave, Eve, Grace
    Level 3: Frank, Hank

Answer: Alice->Bob->Eve->Frank (3 hops)

Six Degrees of Separation

The famous theory states that any two people on Earth are connected by at most 6 friendship links. BFS is how you'd actually verify this -- start from any person and measure BFS distances to everyone else.

Real-World Scale

Facebook (2016 study):
  - 1.59 billion users (vertices)
  - Average degree: ~338
  - Average distance: 3.57

LinkedIn "degrees of connection":
  - 1st: direct connection
  - 2nd: friend of friend
  - 3rd: 3 hops away

All computed using BFS variants!

Key Idea

BFS naturally computes the "degree of separation" between any two people. This is the foundation for features like LinkedIn's "2nd connection" and Facebook's "mutual friends."
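As a small illustration, the degree of separation between two people in the example network can be computed with a BFS that stops as soon as the target is discovered. The function name `degrees_of_separation` is an assumption, and the friendship lists are read from the diagram above.

```python
from collections import deque

def degrees_of_separation(friends, a, b):
    """Number of friendship hops from a to b, or None if not connected."""
    if a == b:
        return 0
    dist = {a: 0}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        for v in friends[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == b:
                    return dist[v]      # first discovery is already shortest
                queue.append(v)
    return None

# Friendships as read from the diagram (an assumption).
friends = {
    "Alice": ["Bob", "Carol"],
    "Bob":   ["Alice", "Dave", "Eve"],
    "Carol": ["Alice", "Grace"],
    "Dave":  ["Bob"],
    "Eve":   ["Bob", "Frank"],
    "Frank": ["Eve", "Hank"],
    "Grace": ["Carol", "Hank"],
    "Hank":  ["Frank", "Grace"],
}

print(degrees_of_separation(friends, "Alice", "Frank"))
print(degrees_of_separation(friends, "Alice", "Bob"))
```

Stopping early is safe because BFS sets a vertex's distance exactly once, at discovery, and that value is already the shortest.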


Applications: Web Crawling & Maze Solving

BFS is everywhere

Web Crawling

A web crawler uses BFS to systematically discover web pages:

Start: www.example.com

    Level 0: example.com
    Level 1: /about  /blog  /contact
    Level 2: /blog/post1  /blog/post2
    Level 3: /blog/post1/comments
    ...

BFS ensures you discover all pages at distance k before going deeper. This gives a "breadth-first crawl."

Why BFS for Crawling?

BFS finds pages close to the root first, which are usually the most important. DFS might get lost in deep, low-value chains of links.
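The crawl order can be simulated without touching the network. In the sketch below the `LINKS` dictionary is a made-up, in-memory stand-in for "fetch a page and extract its links", and `crawl` with its `max_depth` parameter is a hypothetical helper; a real crawler would also need fetching, parsing, politeness rules, and URL normalization.

```python
from collections import deque

# Hypothetical site structure: page -> links found on that page.
LINKS = {
    "/": ["/about", "/blog", "/contact"],
    "/about": ["/"],
    "/blog": ["/blog/post1", "/blog/post2", "/"],
    "/contact": [],
    "/blog/post1": ["/blog/post1/comments", "/blog"],
    "/blog/post2": ["/blog"],
    "/blog/post1/comments": [],
}

def crawl(links, start, max_depth):
    """Visit pages breadth-first, going at most max_depth links from start."""
    depth = {start: 0}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        if depth[page] == max_depth:
            continue                    # do not follow links any deeper
        for target in links.get(page, []):
            if target not in depth:     # skip pages already discovered
                depth[target] = depth[page] + 1
                queue.append(target)
    return order

print(crawl(LINKS, "/", max_depth=2))
```

With a depth limit of 2, the level-3 comments page is never visited, while every page at levels 0 through 2 is, shallowest first.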

Maze Solving (Shortest Path)

Maze as a grid graph: S = Start, E = End, # = Wall

    +---+---+---+---+---+---+
    | S | . | . | # | . | . |
    +---+---+---+---+---+---+
    | # | # | . | # | . | . |
    +---+---+---+---+---+---+
    | . | . | . | . | . | # |
    +---+---+---+---+---+---+
    | . | # | # | # | . | . |
    +---+---+---+---+---+---+
    | . | . | . | . | . | E |
    +---+---+---+---+---+---+

BFS shortest path: 9 steps

Each cell is a vertex. Edges connect adjacent non-wall cells. BFS finds the shortest path!

Analogy

Imagine flooding the maze with water from the start. The water expands one cell at a time in all directions. The first time water reaches the exit is the shortest path. That is BFS.
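The flooding picture translates to code almost directly. The sketch below encodes the maze from this slide as a list of strings; the names `MAZE`, `find`, and `maze_shortest` are assumptions, and movement is limited to the four orthogonal neighbors.

```python
from collections import deque

MAZE = [
    "S..#..",
    "##.#..",
    ".....#",
    ".###..",
    ".....E",
]

def find(grid, ch):
    """Return (row, col) of the first occurrence of ch."""
    for r, row in enumerate(grid):
        c = row.find(ch)
        if c != -1:
            return r, c
    raise ValueError(f"{ch!r} not found in grid")

def maze_shortest(grid):
    """Fewest steps from S to E through non-wall cells, or None if blocked."""
    start, end = find(grid, "S"), find(grid, "E")
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == end:
            return dist[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None

print(maze_shortest(MAZE))
```

The grid is never converted to an explicit graph; the neighbors of a cell are computed on the fly, which is a common way to run BFS on grids.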


Summary & Cheat Sheet

Everything you need to know about BFS

BFS Pseudocode

    BFS(graph, source):
        for each v in graph.vertices:
            dist[v] = INF
            parent[v] = NULL

        dist[source] = 0
        Q = new Queue()
        Q.enqueue(source)
        visited[source] = true

        while Q is not empty:
            u = Q.dequeue()
            for each neighbor v of u:
                if not visited[v]:
                    visited[v] = true
                    dist[v] = dist[u] + 1
                    parent[v] = u
                    Q.enqueue(v)

Complexity

             Adj List    Adj Matrix
    Time     O(V + E)    O(V^2)
    Space    O(V)        O(V)

When to Use BFS vs DFS

    Use BFS When...               Use DFS When...
    Shortest path (unweighted)    Topological sort
    Level-order traversal         Cycle detection
    Nearest neighbor search       Path existence check
    Connected components          Connected components
    Web crawling (breadth)        Maze generation

Key Takeaways

  • BFS uses a queue (FIFO) to explore level by level
  • It finds shortest paths in unweighted graphs
  • Mark vertices visited when enqueuing, not when dequeuing
  • The BFS tree gives shortest paths via parent pointers
  • Time: O(V + E) -- linear in graph size

Common Pitfalls

  • Forgetting to mark source as visited before the loop
  • Marking visited on dequeue instead of enqueue (causes duplicates)
  • Using BFS on weighted graphs (use Dijkstra instead)