O(1) Average Lookup
CS205 Data Structures
We want O(1) for get, put, and remove. Arrays give O(1) by index — but what if keys are strings?
| Structure | get | put |
|---|---|---|
| Unsorted Array | O(n) | O(1) |
| Sorted Array | O(log n) | O(n) |
| Linked List | O(n) | O(1) |
| BST (balanced) | O(log n) | O(log n) |
| Hash Table | O(1)* | O(1)* |
* average case
Convert any key into an array index using a hash function, then store the value at that index. Direct access — no searching!
Two steps: hash code → compression → array index
You hand your coat (key-value pair) to the attendant. They give you a ticket number (the hash). When you return, they go directly to the right hook — no searching through all coats.
Step 1 — Hash Code: Turn the key into a (large) integer.
Step 2 — Compression: Squeeze that integer into range [0, N-1] using mod N.
h("alice") = 42 now and forever.
A hash function mapping everything to index 0 turns the table into a linked list: O(n) for everything!
index = hashCode % N
Simple, fast. Works best when N is prime.
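A minimal Java sketch of the division method. One detail the formula hides: `hashCode()` may return a negative `int`, and Java's `%` keeps the sign of the dividend, so this sketch uses `Math.floorMod` instead. The class and method names are illustrative only.

```java
// Division-method compression: map any int hash code into [0, n-1].
class DivisionCompression {
    // Math.floorMod returns a value in [0, n-1] for n > 0,
    // whereas hashCode % n is negative when hashCode is negative.
    static int compress(int hashCode, int n) {
        return Math.floorMod(hashCode, n);
    }

    public static void main(String[] args) {
        int n = 11; // a prime table size
        for (String key : new String[] {"alice", "bob", "carol"}) {
            System.out.println(key + " -> bucket " + compress(key.hashCode(), n));
        }
        System.out.println(-7 % n);          // -7: not a valid array index
        System.out.println(compress(-7, n)); // 4
    }
}
```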
index = ((a·h(k) + b) mod p) mod N
Where p is a prime greater than N, and a and b are integers in [0, p-1] with a > 0. The multiply-add step "scrambles" the hash code before compressing, which gives a better distribution.
"abc" and "cba" get different hashes.
String.hashCode() uses x = 31 (the base of its polynomial hash).
If a.equals(b) is true, then a.hashCode() == b.hashCode() must be true. The reverse is not required (collisions are OK).
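A short sketch of a polynomial hash code evaluated with Horner's rule, assuming the base x is passed in as a parameter. With x = 31 it should agree with Java's own `String.hashCode()`, because both let `int` arithmetic wrap on overflow. The class name is illustrative.

```java
// Polynomial hash code: s[0]*x^(n-1) + s[1]*x^(n-2) + ... + s[n-1].
class PolynomialHash {
    // Horner's rule; int overflow wraps around, as in String.hashCode().
    static int hash(String s, int x) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = x * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash("abc", 31)); // 96354
        System.out.println(hash("cba", 31)); // 98274: same letters, different order
        System.out.println(hash("abc", 31) == "abc".hashCode()); // true
    }
}
```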
If N=10 and keys are multiples of 5: {5,10,15,20,25,...} all map to indices {0,5} — only 2 of 10 buckets used! A prime N minimizes such patterns.
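The effect is easy to check by counting which buckets are actually hit. The sketch below feeds the first 20 multiples of 5 through `k mod N` for N = 10 and for the prime N = 11; the helper name `usedBuckets` is made up for this example.

```java
import java.util.TreeSet;

class BucketSpread {
    // Distinct indices hit by the keys step, 2*step, ..., count*step under k mod n.
    static TreeSet<Integer> usedBuckets(int step, int count, int n) {
        TreeSet<Integer> used = new TreeSet<>();
        for (int i = 1; i <= count; i++) {
            used.add((i * step) % n);
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println(usedBuckets(5, 20, 10));        // [0, 5]
        System.out.println(usedBuckets(5, 20, 11).size()); // 11: every bucket is used
    }
}
```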
((a·h(k) + b) mod p) mod N
The multiply-add step "scrambles" hash codes before compression, producing better distribution.
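A hedged sketch of MAD compression in Java. The constants in `main` are arbitrary illustrative choices (p = 2^31 - 1 is a known prime); real implementations usually pick a and b at random once per table. `long` arithmetic keeps a·h(k) + b from overflowing.

```java
// Multiply-Add-Divide (MAD) compression: ((a*h + b) mod p) mod n.
class MadCompression {
    // Assumes p is prime and p > n, 0 < a < p, 0 <= b < p.
    static int compress(int hashCode, long a, long b, long p, int n) {
        long h = Math.floorMod((long) hashCode, p); // fold negatives into [0, p-1]
        return (int) (((a * h + b) % p) % n);
    }

    public static void main(String[] args) {
        long p = 2_147_483_647L;      // 2^31 - 1, prime and far larger than n
        long a = 1_234_567L, b = 89L; // arbitrary example constants
        int n = 10;
        for (int h = 5; h <= 25; h += 5) {
            System.out.println(h + " -> " + compress(h, a, b, p, n));
        }
    }
}
```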
A collision occurs when two different keys map to the same index.
If you have more keys than array slots, at least two keys must share a slot. Even with fewer keys, collisions are very likely (Birthday Paradox: with just 23 people, there is a better than 50% chance that two share a birthday).
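The birthday figure can be reproduced with a few lines of arithmetic. This sketch assumes every key lands in one of n slots uniformly at random, which is an idealization of a good hash function; the class name is illustrative.

```java
// Chance that at least two of k keys collide among n equally likely slots.
class BirthdayCollision {
    static double collisionProbability(int k, int n) {
        double allDistinct = 1.0;
        for (int i = 0; i < k; i++) {
            allDistinct *= (double) (n - i) / n; // key i must avoid the i slots already taken
        }
        return 1.0 - allDistinct;
    }

    public static void main(String[] args) {
        System.out.println(collisionProbability(23, 365));  // roughly 0.507
        System.out.println(collisionProbability(400, 365)); // 1.0: more keys than slots
    }
}
```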
1. Separate Chaining
Store a linked list at each bucket
2. Open Addressing
Find another open slot in the array
Each bucket holds a linked list of all entries that hash to that index.
The array itself never "fills up." Each bucket can hold unlimited entries. The tradeoff: long chains degrade to O(n) search within that chain.
α = n/N can exceed 1.0 with chaining. Rehash when α > 0.75.
N = 7, h(k) = k mod 7. Insert: 10, 22, 31, 4, 15
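The trace above can be replayed with a small sketch. It assumes new entries are appended to the end of each chain (some implementations insert at the head instead) and stores bare integer keys rather than key-value pairs.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainingDemo {
    // Puts each key into the chain at index k mod n and returns the buckets.
    static List<LinkedList<Integer>> build(int n, int... keys) {
        List<LinkedList<Integer>> table = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            table.add(new LinkedList<>());
        }
        for (int k : keys) {
            table.get(Math.floorMod(k, n)).add(k);
        }
        return table;
    }

    public static void main(String[] args) {
        List<LinkedList<Integer>> table = build(7, 10, 22, 31, 4, 15);
        for (int i = 0; i < table.size(); i++) {
            System.out.println(i + ": " + table.get(i));
        }
        // Bucket 1 holds [22, 15], bucket 3 holds [10, 31], bucket 4 holds [4].
    }
}
```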
All entries live directly in the array. If the target slot is taken, try the next slot.
Primary clustering: occupied slots form long contiguous runs. A new key that hashes anywhere in a run must probe to its end, which makes the run even longer.
Simple and cache-friendly, but clustering degrades performance as the table fills. Keep α ≤ 0.5 for best results.
N = 7, h(k) = k mod 7. Insert: 10, 22, 31, 4, 15
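A minimal linear probing insert that replays this trace. It stores bare integer keys, uses `null` for an empty slot, and assumes the table never fills completely (otherwise the loop would not terminate).

```java
import java.util.Arrays;

class LinearProbingDemo {
    // Inserts keys with h(k) = k mod n, moving one slot forward on each collision.
    static Integer[] build(int n, int... keys) {
        Integer[] table = new Integer[n];
        for (int k : keys) {
            int i = Math.floorMod(k, n);
            while (table[i] != null) {
                i = (i + 1) % n; // wrap around at the end of the array
            }
            table[i] = k;
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(build(7, 10, 22, 31, 4, 15)));
        // [null, 22, 15, 10, 31, 4, null]
    }
}
```

Slots 1 through 5 end up as one contiguous run, which is the clustering effect described above.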
Probe at increasing squared offsets to break up clusters.
Quadratic probing jumps farther each time, breaking primary clusters. But keys with the same hash still follow the same probe sequence (secondary clustering).
Quadratic probing may not visit every slot. It is guaranteed to find an empty slot when N is prime and the table is less than half full.
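A sketch of quadratic probing with the probe sequence h, h+1, h+4, h+9, ... (mod N). It reuses the keys from the earlier traces and then adds one extra key, 17, chosen here only to show the jump past a cluster. The method gives up after N probes, since the sequence may never reach a free slot.

```java
import java.util.Arrays;

class QuadraticProbingDemo {
    // Returns the slot used, or -1 if no free slot turned up within n probes.
    static int insert(Integer[] table, int key) {
        int n = table.length;
        int h = Math.floorMod(key, n);
        for (int j = 0; j < n; j++) {
            int i = (h + j * j) % n; // offsets 0, 1, 4, 9, ...
            if (table[i] == null) {
                table[i] = key;
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[7];
        for (int k : new int[] {10, 22, 31, 4, 15}) {
            insert(table, k);
        }
        System.out.println(insert(table, 17)); // 0: probes 3, 4, then (3 + 4) mod 7
        System.out.println(Arrays.toString(table)); // [17, 22, 15, 10, 31, 4, null]
    }
}
```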
A second hash function determines the step size — each key gets a unique probe sequence.
| Method | Primary clustering | Secondary clustering |
|---|---|---|
| Linear | ✗ Yes | ✗ Yes |
| Quadratic | ✓ No | ✗ Yes |
| Double Hash | ✓ No | ✓ No |
Keys 20 & 31 both hash to index 9, but step sizes differ (1 vs 4) — completely different probe paths!
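The slide does not show which hash functions produce these numbers. One assumption that reproduces them is N = 11, h(k) = k mod 11, and a second hash d(k) = 7 - (k mod 7); the sketch below uses exactly that guess to list the first few probes for each key.

```java
import java.util.Arrays;

class DoubleHashingDemo {
    static final int N = 11; // assumed table size
    static final int Q = 7;  // assumed prime smaller than N for the second hash

    // Step size in [1, Q]; never 0, so the probe always moves.
    static int step(int key) {
        return Q - Math.floorMod(key, Q);
    }

    // First `count` slots probed for key: (h(k) + j * d(k)) mod N.
    static int[] probeSequence(int key, int count) {
        int[] seq = new int[count];
        int h = Math.floorMod(key, N);
        for (int j = 0; j < count; j++) {
            seq[j] = (h + j * step(key)) % N;
        }
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probeSequence(20, 4))); // [9, 10, 0, 1]
        System.out.println(Arrays.toString(probeSequence(31, 4))); // [9, 2, 6, 10]
    }
}
```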
You cannot simply empty a slot — it breaks the probe chain for other keys!
A DELETED marker means: "a key was here; keep probing."
Naively emptying a slot creates a "gap" that stops probe chains short, making existing keys unfindable.
Insert 10, 31 (both hash to 3). Delete 10. Search for 31 — the tombstone keeps the chain alive!
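A compact sketch of tombstone deletion with linear probing, assuming N = 7 so that 10 and 31 both hash to 3. The `DELETED` sentinel reuses `Integer.MIN_VALUE`, which only works under the assumption that this value is never stored as a real key; a production table would use a separate marker.

```java
class TombstoneDemo {
    // Sentinel for "a key was here"; assumes MIN_VALUE is never a real key.
    static final Integer DELETED = Integer.MIN_VALUE;
    final Integer[] table;

    TombstoneDemo(int n) {
        table = new Integer[n];
    }

    // Assumes the key is not already present and the table is not full.
    void insert(int key) {
        int n = table.length, i = Math.floorMod(key, n);
        while (table[i] != null && !table[i].equals(DELETED)) {
            i = (i + 1) % n; // empty and tombstoned slots are both reusable
        }
        table[i] = key;
    }

    // Returns the slot holding key, or -1. Only a truly empty slot ends the search.
    int find(int key) {
        int n = table.length, i = Math.floorMod(key, n);
        for (int probes = 0; probes < n && table[i] != null; probes++) {
            if (table[i].equals(key)) {
                return i;
            }
            i = (i + 1) % n; // tombstones fall through to here: keep probing
        }
        return -1;
    }

    void remove(int key) {
        int i = find(key);
        if (i >= 0) {
            table[i] = DELETED; // leave a tombstone instead of null
        }
    }

    public static void main(String[] args) {
        TombstoneDemo t = new TombstoneDemo(7);
        t.insert(10);
        t.insert(31);
        t.remove(10);
        System.out.println(t.find(31)); // 4: the search passes over the tombstone at slot 3
    }
}
```

Had `remove` written `null` into slot 3, `find(31)` would stop there and report -1.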
The load factor α = n/N measures how full the table is, and largely determines how fast it runs.
α = n / N
n = entries stored, N = table size
| α | Successful search (expected probes) | Unsuccessful search (expected probes) |
|---|---|---|
| 0.25 | 1.17 | 1.39 |
| 0.50 | 1.50 | 2.50 |
| 0.75 | 2.50 | 8.50 |
| 0.90 | 5.50 | 50.50 |
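The numbers in this table match Knuth's classic approximations for linear probing: about ½(1 + 1/(1-α)) probes for a successful search and ½(1 + 1/(1-α)²) for an unsuccessful one. The slide does not name its source, so treat that attribution as an inference from the values. A short sketch that recomputes the rows:

```java
class ProbeEstimates {
    // Approximate expected probes for linear probing at load factor a (0 <= a < 1).
    static double successful(double a) {
        return 0.5 * (1 + 1 / (1 - a));
    }

    static double unsuccessful(double a) {
        return 0.5 * (1 + 1 / ((1 - a) * (1 - a)));
    }

    public static void main(String[] args) {
        for (double a : new double[] {0.25, 0.50, 0.75, 0.90}) {
            System.out.printf("%.2f  %.2f  %.2f%n", a, successful(a), unsuccessful(a));
        }
    }
}
```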
When α exceeds the threshold, grow the table and reinsert everything.
Rehashing costs O(n) for that one operation, but happens rarely. Amortized cost per insert remains O(1) — same idea as dynamic array doubling.
You must recompute all indices because N changed. Old indices are no longer valid. You cannot just copy entries to the same slots!
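A sketch of rehashing a linear probing table, assuming integer keys, h(k) = k mod N, and a new size of 17 picked for this example. Every key is reinserted from scratch, and most of them land at a different index than before.

```java
import java.util.Arrays;

class RehashDemo {
    // Builds a table of size newN and reinserts every key using k mod newN.
    static Integer[] rehash(Integer[] old, int newN) {
        Integer[] fresh = new Integer[newN];
        for (Integer boxed : old) {
            if (boxed == null) {
                continue; // skip empty slots
            }
            int key = boxed;
            int i = Math.floorMod(key, newN); // index recomputed with the new N
            while (fresh[i] != null) {
                i = (i + 1) % newN;
            }
            fresh[i] = key;
        }
        return fresh;
    }

    public static void main(String[] args) {
        Integer[] old = {null, 22, 15, 10, 31, 4, null}; // N = 7, built with linear probing
        Integer[] bigger = rehash(old, 17);
        System.out.println(Arrays.toString(bigger));
        // 22 moves from slot 1 to slot 5, and 31 moves from slot 4 to slot 14.
    }
}
```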
| Op (average case) | Chaining | Open Addr. |
|---|---|---|
| put | O(1) | O(1) |
| get | O(1) | O(1) |
| remove | O(1) | O(1) |
| Op (worst case) | Chaining | Open Addr. |
|---|---|---|
| put | O(n) | O(n) |
| get | O(n) | O(n) |
| remove | O(n) | O(n) |
A good hash function assigns students to exam rooms evenly. A bad one puts everyone in Room 1 and leaves Rooms 2–10 empty.
Linear probing, N = 7, h(k) = k mod 7. Trace these operations and fill in the final table:
(n-1) & hash (bitwise AND = fast mod; requires n to be a power of two)
Java 8+ treeifies long chains, guaranteeing O(log n) worst case per bucket, which protects against hash-flooding attacks.
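The mask trick can be checked directly against `Math.floorMod`. This sketch only demonstrates the indexing arithmetic for a power-of-two n; it is not a reproduction of `HashMap`'s internals.

```java
class MaskIndex {
    // Valid only when n is a power of two: n - 1 is then a mask of the low bits.
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        for (int hash : new int[] {42, 96354, -7}) {
            System.out.println(hash + ": mask=" + index(hash, n)
                    + " floorMod=" + Math.floorMod(hash, n));
        }
        // The two columns agree, including for the negative hash.
    }
}
```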
This linear probing search has a bug. Can you spot it?
For each scenario, choose the best collision handling approach.
Limited RAM, cache-friendly access is crucial.
Session cache where entries expire/get removed often.
Keys may cluster; must avoid both primary & secondary clustering.
More entries than buckets is acceptable.
A hash table is a filing cabinet. The hash function labels which drawer to open. When two files share a drawer (collision), stack them (chaining) or find the next empty drawer (open addressing). When it's too full, buy a bigger cabinet and refile everything (rehash).
What is the average time for get() in a well-designed hash table?
A hash table has 12 entries and 16 buckets. What is α?
Why can't you simply empty a slot when deleting in open addressing?
Quadratic probing, N = 11. Table has keys at slots: [1]=22, [3]=36, [4]=47, [5]=58.
Where does insert(69) land? (h(69) = 69 mod 11 = 3)