Data Structures
Python's built-in structures and when to reach for each — what it's for, the operations you'll actually use, and the costs/gotchas specific to it. (The cross-cutting "mutating vs returning-new" gotcha is collected at the end of this page.)
List
Ordered, mutable, duplicates allowed — your default sequence (and Python's stack; see below). Membership x in list is O(n) (linear scan), so switch to a set/dict when you do repeated lookups.
nums = [1, 2, 3] # literal
nums = list("abc") # from iterable → ['a','b','c']
nums.append(4) # add to end O(1)
nums.extend([5, 6]) # add iterable to end
nums.pop() # remove + return last O(1)
nums.pop(0) # remove + return at index O(n) — shifts everything
nums.remove(2) # remove first occurrence by value
nums.sort() # in place; sorted(nums) returns a NEW list
nums.reverse() # in place; reversed(nums) returns an iterator
Set
Unordered, mutable, unique elements. Reach for it for O(1) membership and dedup.
s = {1, 2, 3} # literal
s = set([1, 2, 2, 3]) # from iterable → {1, 2, 3}
s = set() # empty — NOT {} (that's an empty dict!)
s.add(4)
s.remove(2) # KeyError if missing
s.discard(2) # silent if missing
x in s # membership — O(1)
s | t # union
s & t # intersection
s - t # difference
s ^ t # symmetric difference
Dict
Key→value, keys unique, preserves insertion order (Py3.7+). O(1) insert / lookup / membership (x in d checks keys). The workhorse for counting, grouping, memoization, and adjacency lists.
d = {"a": 1, "b": 2} # literal
d = {} # empty
d["a"] # KeyError if missing
d.get("z") # None if missing
d.get("z", 0) # custom default
d["c"] = 3 # insert / overwrite
d.pop("a", None) # remove + return, None default
for k, v in d.items(): ... # idiomatic iteration
d.keys(); d.values() # views (iterate-only — not indexable)
# GOTCHAS
{} # empty DICT, not empty set
{[1, 2, 3]} # TypeError — list isn't hashable (keys & set members must be hashable)
{(1, 2, 3)} # OK — tuples are hashable
Hash lookup is O(1) in collection size, but O(K) in key size. Every hit costs hash(key) (O(K)) plus an == check against the stored key (O(K)) to resolve collisions. For int / short-str keys, K is constant and ignored. For tuple / long-string keys (e.g. a matrix row), the K factor is real and shows up in inner loops. Hashing trades a scan-the-collection cost for a hash-the-key cost — a pure win for small keys, conditional for large ones.
Counter & defaultdict
collections upgrades to a plain dict for the two most common patterns — auto-initializing missing keys, and counting.
from collections import defaultdict, Counter
# defaultdict — missing keys auto-create a default value
dd = defaultdict(int) # missing → 0; then dd[k] += 1 just works
dd = defaultdict(list) # missing → []; then dd[k].append(...) just works
dd = defaultdict(set) # missing → set()
dd = defaultdict(lambda: "?") # custom default
# Grouping pattern (very common):
groups = defaultdict(list)
for word in words:
groups[word[0]].append(word) # group by first char
# Graph adjacency list:
graph = defaultdict(list)
for u, v in edges:
graph[u].append(v); graph[v].append(u)
# Counter — built for counting from any iterable
cnt = Counter("mississippi") # Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})
cnt = Counter([1, 1, 2, 3, 1]) # Counter({1: 3, 2: 1, 3: 1})
cnt['z'] # 0 — missing keys default to 0, no KeyError
cnt.most_common(2) # [('i', 4), ('s', 4)] — top-k by count, sorted desc
cnt.most_common() # full sorted list
cnt.update("hellohello") # ADDS counts (not replaces)
cnt.subtract("ll") # SUBTRACTS counts
# Counter arithmetic — addition/subtraction/min/max between Counters
Counter("aab") + Counter("abc") # {'a': 3, 'b': 2, 'c': 1}
Counter("aab") - Counter("abc") # {'a': 1} — drops zero/negative
Counter("aab") & Counter("abc") # {'a': 1, 'b': 1} — intersection (min each)
Counter("aab") | Counter("abc") # {'a': 2, 'b': 1, 'c': 1} — union (max each)
# Killer use cases:
Counter("listen") == Counter("silent") # True — anagram check in 1 line
[x for x, _ in Counter(nums).most_common(k)] # top-k frequent elements
Stack
Just use a list — there's no dedicated Stack class. LIFO; all O(1) at the end.
stack = []
stack.append(x) # push O(1)
stack.pop() # pop O(1)
stack[-1] # peek O(1)
while stack: ... # truthy when non-empty
Queue / Deque
collections.deque — double-ended, O(1) at both ends. NEVER use a list as a queue (list.pop(0) is O(n)).
from collections import deque
dq = deque()
dq.append(x); dq.appendleft(x) # push right / left O(1)
dq.pop(); dq.popleft() # pop right / left O(1)
dq[0]; dq[-1] # peek front / back O(1)
# Use for queues (BFS), double-ended access, and sliding-window index buffers.
# queue.Queue is thread-safe — never needed in LC.
Linked list
LeetCode provides the ListNode class — you don't define it.
# Boilerplate (given in problem, do NOT redefine):
# class ListNode:
# def __init__(self, val=0, next=None):
# self.val = val
# self.next = next
# Common idioms:
dummy = ListNode(0, head) # dummy head — simplifies edge cases when head may change
slow = fast = head # two-pointer: middle (LC 876), cycle (LC 141, Floyd's)
# Iterative reverse: 3-pointer (prev, curr, nxt); walk forward updating each.
Doubly linked list + sentinels (the one you build — LC 707, and the backbone of LRU): each node carries prev and next; wrap the list in dummy head/tail nodes wired head <-> tail at init. Then every real node has non-None neighbours, so insert/remove are uniform 4-pointer splices with no boundary null-checks, and removing an arbitrary node given its pointer is O(1) — which is why dict{key: node} + DLL gives O(1) LRU. Gotcha: sentinels kill structural edge cases (empty list, head/tail), not semantic ones — the valid index range differs by op (read/delete [0, n-1] must hit a real node; insert [0, n], where n = "before the tail sentinel" = append).
Heap / priority queue
heapq — min-heap only in Python. Use when you repeatedly need "the smallest (or largest)" as elements arrive. O(log n) push/pop, O(1) peek, O(n) to build from a list.
import heapq
h = [] # heap is just a list
heapq.heappush(h, x) # O(log n)
heapq.heappop(h) # extract min, O(log n)
h[0] # peek min, O(1) — no `heappeek` function
# Build from existing list — O(n), faster than n pushes (which would be O(n log n))
h = [5, 2, 8, 1, 9]
heapq.heapify(h) # in-place
# Combined ops (one sift instead of two)
heapq.heappushpop(h, x) # push then pop — if x is smaller than min, returns x unchanged
heapq.heapreplace(h, x) # pop then push — assumes heap non-empty, evicts current top
# Top-k shortcuts (use the size-k pattern internally)
heapq.nlargest(k, iterable) # → list, descending. O(n log k)
heapq.nsmallest(k, iterable) # → list, ascending. O(n log k)
# MAX-HEAP TRICK — negate values on push and pop (no native max-heap class)
heapq.heappush(h, -val)
-heapq.heappop(h) # the max
# PRIORITY-WITH-PAYLOAD — tuples compare lexicographically
heapq.heappush(h, (priority, item)) # ordered by priority; then by item on tie
# Tie-breaker gotcha: if `item` isn't comparable (custom objects), TypeError on tie.
# Fix: insert a monotonic counter as middle tiebreaker → (priority, counter, item)
# CANONICAL PATTERN — "min-heap of size k" for kth largest / top-k largest
# Invariant: heap holds the k largest seen so far; its min (h[0]) is the kth largest seen.
def kth_largest(nums, k):
h = []
for x in nums:
heapq.heappush(h, x)
if len(h) > k:
heapq.heappop(h) # evict smallest of the k candidates
return h[0] # O(n log k) total
# Mirror pattern for kth smallest: max-heap of size k (negate values).
# nsmallest/nlargest implement these patterns internally; write the loop manually
# in interviews to demonstrate pattern fluency.
Union-Find
Disjoint Set Union — by size, with path compression. Effectively O(1) per find/union (O(α(n)) amortized). Use for connected components, cycle detection in an undirected graph, equivalence classes, online connectivity, and Kruskal's MST.
class UnionFind:
def __init__(self, n):
self.parent = list(range(n)) # each element is its own root initially
self.size = [1] * n # only valid at roots
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x]) # path compression
return self.parent[x]
def union(self, x, y):
rx, ry = self.find(x), self.find(y)
if rx == ry:
return False # already in same set
if self.size[rx] < self.size[ry]:
rx, ry = ry, rx # rx = larger root
self.parent[ry] = rx
self.size[rx] += self.size[ry] # size[ry] left stale (non-root, never read)
return True
# Common queries on top:
# - same component? uf.find(x) == uf.find(y)
# - num components? len({uf.find(i) for i in range(n)})
# - cycle in undirected graph? union(u,v) returns False → edge closes a cycle
Mutating vs returning-new
Cross-cutting gotcha for list/dict/set: these methods mutate in place and return None, so never chain or assign their result.
Mutates in place, returns None | Returns a new value |
|---|---|
lst.sort() | sorted(lst) → new list |
lst.reverse() | reversed(lst) → iterator, or lst[::-1] → new list |
lst.append(x), lst.extend(it), lst.insert(i,x) | lst + [x], lst + list(it) |
lst.pop(), lst.remove(x) | (no direct non-mutating equivalent) |
d[k] = v, d.pop(k), d.update(...) | {**d, k: v} (merges into new dict) |
s.add(x), s.discard(x), s.remove(x) | s | {x}, s - {x} |
Trap: x = lst.sort() sets x = None. Same for s.split().reverse() — .reverse() returns None, blowing up the chain.
Strings
- Strings are immutable —
answer += charin a loop is O(n²) (allocate + copy each iteration). - Build strings with a
listplusappend, then''.join(list)at the end for O(n) total. - CPython sometimes optimizes
+=in place for single-reference strings, but it's not in the spec — assume O(n²) when reasoning.
# SPLIT / JOIN (split returns list[str], join returns str)
s.split() # → list[str]; whitespace split, COLLAPSES runs, strips ends
s.split(",") # explicit separator
s.split(",", 1) # maxsplit — limit number of splits
s.rsplit("-", 1) # split from right
s.splitlines() # → list[str]; split on line boundaries
sep.join(iterable) # → str; inverse of split; iterable must yield strings
"-".join("abc") # 'a-b-c' — string is iterable of chars
# GOTCHA: s.split(' ') (explicit space) does NOT collapse runs; keeps '' between spaces
# GOTCHA: split returns a list, but list.reverse()/list.sort() return None — don't chain
# STRIP / REPLACE
s.strip() # both ends
s.lstrip(); s.rstrip() # one-sided
s.strip("#") # custom chars to strip from ends
s.replace("a", "b") # replace all
s.replace("a", "b", 1) # count limit
# SEARCH
s.startswith("pre") # bool
s.endswith("post") # bool
s.find("sub") # leftmost index, or -1 if missing
s.rfind("sub") # rightmost index
s.index("sub") # like find but raises ValueError on miss
s.count("sub") # non-overlapping occurrences
# CASE & CHARACTER CHECKS
s.lower(); s.upper(); s.title()
s.isalpha(); s.isdigit(); s.isalnum(); s.isspace()
# All return False on empty string — `"".isdigit() == False` is a common edge
# F-STRINGS (interpolation)
f"x={x}, y+1={y+1}" # expressions allowed inside braces
f"{x:.2f}" # 2 decimal places
f"{x:>10}" # right-align, width 10
f"{x:0>4}" # zero-padded width 4 → '0007'
Numbers & operators
/returns float in Python 3 even for int operands — use//for integer division.5 / 2 == 2.5,5 // 2 == 2.- Integer ceiling division (positive ints):
(a + b - 1) // bcomputesceil(a / b)without floats. Intuition: addingb - 1pushes any non-zero remainder over the next multiple ofb. Avoids float precision concerns at scale. For mixed signs use-(-a // b)instead. - Euclidean GCD:
gcd(a, b) = gcd(b, a % b), base casegcd(a, 0) = a. Iterative:while b: a, b = b, a % b. O(log min(a,b)). Stdlib:math.gcd. - Tuple swap
a, b = b, a— RHS evaluated fully into a tuple before assignment, no temp needed. - Booleans are
True/False— capitalized first letter only.TRUE,FALSE,true,falseall raiseNameError. (Same forNone, notNULL/null.) - Chained comparisons:
a < b < cis native Python, evaluates each expression once, equivalent toa < b and b < cbut cleaner and faster (no re-evaluation ofb). - Sentinel infinities:
float('inf')andfloat('-inf')for running min/max trackers.math.infis the same value. Beats picking a "big enough" int. - Modular arithmetic distributes over
+and×(standard for "return answer mod 10⁹+7" problems):(a+b) mod n = ((a mod n) + (b mod n)) mod n; same for multiplication. Apply mod at each step to keep ints bounded. Python's%always returns non-negative for positive modulus, so subtraction is also safe. Exponentiation does NOT distribute — usepow(base, exp, mod)for fast modexp. Division requires modular inverse (e.g.,pow(b, mod - 2, mod)for prime modulus by Fermat's little theorem).
OOP / class design
Design problems (caches, tries, iterators, rate limiters) are graded on class structure. Skeleton + idioms here; full grounding in python_concepts.md.
class TrieNode: # internal helper node
def __init__(self):
self.children = {} # {char: TrieNode} — composition: holds more of its kind
self.is_end = False
class Trie: # public container class
def __init__(self): # constructor — runs automatically on Trie()
self.root = TrieNode() # instance attribute (per-object state)
def insert(self, word): # `self` = the instance; NOT passed at call site
... # trie.insert(w) binds trie → self
# IDIOMS / GOTCHAS
# - set ALL state in __init__; never put mutable state in a class attribute (shared across instances!)
# - match the problem's method names EXACTLY (insert/search/startsWith, not add/find)
# - `_helper` leading underscore = internal/private (convention, not enforced)
# - composition (container holds node objects) covers most design problems
# - instance attr (self.x, per object) vs class attr (Class.x, shared) — default to instance
# - inheritance / ABC (from abc import ABC, abstractmethod; super().__init__()) — for Cache-family problems