Iterative DFS: Mark-on-Pop vs. Mark-on-Push — A Comparative Analysis
Introduction
Depth-First Search (DFS) is one of the most fundamental graph traversal algorithms in computer science. While its recursive form is elegant and intuitive, practical applications often demand an iterative version using an explicit stack to avoid stack overflow on large inputs. However, there is a subtle but important design decision in iterative DFS that is rarely discussed in textbooks: when should a node be marked as visited — when it is pushed onto the stack, or when it is popped off?
This essay presents two iterative DFS strategies, analyzes their behavioral and performance differences, and reports benchmark results that reveal a surprising finding: the optimal strategy depends on the cost of the visited-set operations, not on the algorithm's asymptotic complexity.
The Two Strategies
Strategy V1: Mark on Pop
In this approach, a node is pushed onto the stack without being marked as visited. The visited check and marking happen only when the node is popped:
stack.push(start);
while (!stack.empty()) {
int node = stack.pop();
if (visited[node]) continue; // check on pop
visited[node] = true; // mark on pop
for (int neighbor : adj[node]) {
if (!visited[neighbor]) {
stack.push(neighbor); // push without marking
}
}
}
Characteristics:
- A node may be pushed onto the stack multiple times by different neighbors before it is first popped and marked visited.
- When a duplicate is eventually popped, the visited check causes it to be skipped, but the pop itself still costs time.
- The stack can grow larger than the number of unvisited nodes due to duplicates.
Strategy V2: Mark on Push
In this approach, a node is marked as visited immediately before it is pushed onto the stack. Since each node is marked before pushing, no node can be pushed twice, so no duplicate check is needed on pop:
visited[start] = true; // mark before push
stack.push(start);
while (!stack.empty()) {
int node = stack.pop();
// no check needed — each node is pushed exactly once
for (int neighbor : adj[node]) {
if (!visited[neighbor]) {
visited[neighbor] = true; // mark before push
stack.push(neighbor);
}
}
}
Characteristics:
- Each node appears on the stack at most once.
- No wasted pops or redundant neighbor scans.
- The stack size is bounded by the number of unvisited nodes.
- When a node is popped from the stack, it can be processed unconditionally.
Behavioral Differences
Duplicate Pushes in V1
Consider a simple diamond graph:
A
/ \\
B C
\\ /
D
Starting DFS from A (stack is LIFO, so the last pushed node is popped first):
- Push A. Stack: [A]
- Pop A, mark visited. Push B, C. Stack: [B, C]
- Pop C (top), mark visited. C's neighbor D is unvisited, push D. Stack: [B, D]
- Pop D (top), mark visited. D's neighbor B is unvisited, push B. Stack: [B, B]
- Pop B (top), mark visited. All neighbors visited. Stack: [B]
- Pop B (bottom), skip — already visited.
In V1, node B is pushed twice and popped twice. The second pop is wasted work: B is already visited, but we still had to pop it and check. In V2, B would have been marked visited when first pushed (by A), so D would see B as visited and never push it again.
Redundant Neighbor Scanning
There is a worse variant of V1 found in some textbook implementations (notably the GeeksforGeeks iterative DFS example) where the neighbor-scanning loop runs unconditionally even for already-visited nodes:
s = stack.pop();
if (!visited[s]) {
visited[s] = true;
}
// Always scans neighbors, even if s was already visited!
for (auto neighbor : adj[s]) {
if (!visited[neighbor])
stack.push(neighbor);
}
This version not only pops duplicate nodes but also re-scans their entire adjacency list each time, compounding the wasted work.
Impact on Stack Size
In dense graphs, V1 can accumulate significant stack bloat. In the worst case (a complete graph K_n), a single node can be pushed by every other node that discovers it, leading to O(V^2) stack entries instead of O(V). V2 guarantees the stack never exceeds O(V) entries.
For grid-based problems (like Number of Islands on a 1000x1000 grid), this difference is meaningful: a cell with 4 neighbors can be pushed up to 4 times in V1 before any of those copies are popped.
Asymptotic Complexity
Despite the behavioral differences, both strategies have the same asymptotic time complexity: O(V + E).
In V1, each edge causes at most one push (the if (!visited[neighbor]) guard prevents pushing already-visited nodes). The duplicate pushes come from multiple edges targeting the same unvisited node before it is first popped, but this is bounded by the node's in-degree. Summing over all nodes, the total number of pushes is O(E), and each node is processed (visited check + neighbor scan) at most once after its first pop. The extra duplicate pops are bounded by O(E) as well.
In V2, each node is pushed and popped exactly once, and each edge is examined exactly once during neighbor scanning, giving a clean O(V + E).
So while the asymptotic complexity is identical, the constant factors differ, and this is where our benchmarks become interesting.
Benchmark Setup
We benchmarked both strategies on two different problem types:
Test 1: General Graph with Hash-Set Visited
Using our dfs_basic.cpp template with std::unordered_set<int> for the visited set:
- Graph: 100,000 nodes, ~500,000 edges (chain backbone + random edges)
- Visited operations:
unordered_set::find()andunordered_set::insert()— both amortized O(1) but with significant constant overhead (hashing, bucket lookup, memory allocation for new entries) - 5 runs averaged, compiled with
O2
Test 2: Grid Problem with Array Visited
Using our num_islands.cpp template where the grid itself serves as the visited array (grid[i][j] = '0' to mark visited):
- Grid: 1000 x 1000 (1,000,000 cells), 60% land
- Visited operations: direct array read (
grid[i][j] != '1') and write (grid[i][j] = '0') — true O(1) with minimal constant overhead - 10 runs averaged, compiled with
O2
Benchmark Results
Test 1: General Graph (Hash-Set Visited)
V1 (check/mark on pop): 51.5 ms avg
V2 (check/mark on push): 68.1 ms avg
V2 speedup: 0.76x (V1 wins by ~24%)
V1 wins. This was unexpected.
Test 2: Grid Problem (Array Visited)
V1 (check/mark on pop): 29.5 ms avg
V2 (check/mark on push): 26.9 ms avg
V2 speedup: 1.10x (V2 wins by ~10%)
V2 wins. This matched our expectations.
Analysis: Why the Winner Depends on the Visited-Set Cost
The key insight is that V1 and V2 distribute their visited-set operations differently:
| Operation | V1 | V2 |
|---|---|---|
visited.find() per neighbor | 1 (before push) | 1 (before push) |
visited.insert() per node | 1 (on pop) | 1 (on push) |
Extra visited.find() on pop | 1 per pop (incl. dupes) | 0 (no check needed on pop) |
Extra visited.insert() calls | 0 | 0 |
At first glance, V2 should be faster: it avoids duplicate pops and their associated visited checks. Both strategies call insert exactly V times and find at least E times (once per edge during neighbor scanning). The difference is not in the total count of operations but in where they happen in the code.
- V1 calls
insertin the outer loop (once per pop, only for newly visited nodes). The neighbor-scanning inner loop only callsfind. This means the hot inner loop — which iterates over adjacency lists — does lightweight read-only hash lookups without modifying the hash set. The adjacency data stays cache-hot during scanning. - V2 calls
insertin the inner loop (for each newly discovered neighbor). This means the hot neighbor-scanning loop alternates between reading adjacency data and modifying the hash set. Eachinsertmay allocate memory and potentially trigger a rehash, evicting adjacency data from the CPU cache.
The extra work V1 does — popping duplicates and checking find for already-visited nodes — costs only a cheap hash lookup per duplicate (fast path: the bucket exists and matches immediately). This overhead is less than the cache disruption V2 causes by interleaving insert calls with adjacency iteration.
With a hash set, the cache and allocation overhead of insert inside the inner loop dominates, and V1's approach of deferring insert to the outer loop wins.
With an array/grid, both find (array read) and insert (array write) are trivially cheap — a single memory access with no allocation or rehashing. The cache disruption disappears because array writes are in-place. The dominant cost becomes the redundant work of popping duplicates and re-scanning their neighbors, which V2 avoids entirely. Hence V2 wins.
What the Literature Says
This performance trade-off is largely undiscussed in existing literature. Most discussions focus on correctness and memory:
- A notable blog post, "Why what you have been taught about DFS is wrong (at least partially)" (banterly.net, 2020), argues that the common V1 approach is misleadingly taught because it causes stack bloat that doesn't match recursive DFS's memory behavior. The author focuses on memory, not runtime.
- The GeeksforGeeks iterative DFS tutorial uses V1 but doesn't discuss V2 as an alternative.
- A GitHub issue on the
graphologylibrary (#481) flagged the library's DFS as incorrect due to mark-on-push vs. mark-on-pop behavior affecting traversal order. - TechNetExperts provides a comparison focused on correctness: V1 is recommended for pathfinding and backtracking; V2 is recommended for simple reachability and memory efficiency.
None of these sources benchmark the two approaches or identify the visited-set cost as the determining factor.
Practical Recommendations
Based on our analysis and benchmarks:
- For grid/matrix problems (flood fill, island counting, maze solving): Use V2 (mark on push). The visited check is a trivial array access, and V2 avoids all duplicate work. This is the approach used in our
num_islands.cpp. - For general graphs with hash-based visited sets: Use V1 (mark on pop). The hash set's
insertcost makes V2's eager marking more expensive than V1's lazy duplicate handling. Alternatively, if node IDs are contiguous integers, use avector<bool>instead ofunordered_setto get array-like performance, which would likely flip the result in V2's favor. - For backtracking problems (permutations, N-queens): Neither V1 nor V2 applies directly, since backtracking requires temporary marking (mark on enter, unmark on backtrack). Use recursive DFS or an iterative approach with explicit state management.
- When stack memory is a concern: Use V2 (mark on push). It guarantees no duplicates on the stack, bounding stack size to O(V) in all cases.
Conclusion
The choice between mark-on-pop and mark-on-push in iterative DFS is not merely a stylistic preference — it has measurable performance implications that depend on the data structure used for the visited set. Our benchmarks demonstrate a ~24% advantage for mark-on-pop with hash sets and a ~10% advantage for mark-on-push with array-based visited tracking.
This finding highlights a general principle in algorithm engineering: asymptotic complexity tells you the shape of the curve, but constant factors — driven by data structure choice, cache behavior, and branch prediction — determine where you actually land on it. When optimizing DFS for a specific problem, the right strategy depends not just on the graph, but on how you represent the "visited" state.