Data Structures

Python's built-in structures and when to reach for each — what it's for, the operations you'll actually use, and the costs/gotchas specific to it. (The cross-cutting "mutating vs returning-new" gotcha is collected at the end of this page.)

List

Ordered, mutable, duplicates allowed — your default sequence (and Python's stack; see below). Membership x in list is O(n) (linear scan), so switch to a set/dict when you do repeated lookups.

nums = [1, 2, 3]                 # literal
nums = list("abc")               # from iterable → ['a','b','c']
nums.append(4)                   # add to end                 O(1)
nums.extend([5, 6])              # add iterable to end
nums.pop()                       # remove + return last       O(1)
nums.pop(0)                      # remove + return at index   O(n) — shifts everything
nums.remove(2)                   # remove first occurrence by value
nums.sort()                      # in place; sorted(nums) returns a NEW list
nums.reverse()                   # in place; reversed(nums) returns an iterator

Set

Unordered, mutable, unique elements. Reach for it for O(1) membership and dedup.

s = {1, 2, 3}                    # literal
s = set([1, 2, 2, 3])            # from iterable → {1, 2, 3}
s = set()                        # empty — NOT {} (that's an empty dict!)
s.add(4)
s.remove(2)                      # KeyError if missing
s.discard(2)                     # silent if missing
x in s                           # membership — O(1)
s | t                            # union
s & t                            # intersection
s - t                            # difference
s ^ t                            # symmetric difference

Dict

Key→value, keys unique, preserves insertion order (Py3.7+). O(1) insert / lookup / membership (x in d checks keys). The workhorse for counting, grouping, memoization, and adjacency lists.

d = {"a": 1, "b": 2}             # literal
d = {}                           # empty
d["a"]                           # KeyError if missing
d.get("z")                       # None if missing
d.get("z", 0)                    # custom default
d["c"] = 3                       # insert / overwrite
d.pop("a", None)                 # remove + return, None default
for k, v in d.items(): ...       # idiomatic iteration
d.keys(); d.values()             # views (iterate-only — not indexable)

# GOTCHAS
{}                               # empty DICT, not empty set
{[1, 2, 3]}                      # TypeError — list isn't hashable (keys & set members must be hashable)
{(1, 2, 3)}                      # OK — tuples are hashable

Hash lookup is O(1) in collection size, but O(K) in key size. Every hit costs hash(key) (O(K)) plus an == check against the stored key (O(K)) to resolve collisions. For int / short-str keys, K is constant and ignored. For tuple / long-string keys (e.g. a matrix row), the K factor is real and shows up in inner loops. Hashing trades a scan-the-collection cost for a hash-the-key cost — a pure win for small keys, conditional for large ones.

Counter & defaultdict

collections upgrades to a plain dict for the two most common patterns — auto-initializing missing keys, and counting.

from collections import defaultdict, Counter

# defaultdict — missing keys auto-create a default value
dd = defaultdict(int)            # missing → 0; then dd[k] += 1 just works
dd = defaultdict(list)           # missing → []; then dd[k].append(...) just works
dd = defaultdict(set)            # missing → set()
dd = defaultdict(lambda: "?")    # custom default

# Grouping pattern (very common):
groups = defaultdict(list)
for word in words:
    groups[word[0]].append(word)  # group by first char

# Graph adjacency list:
graph = defaultdict(list)
for u, v in edges:
    graph[u].append(v); graph[v].append(u)

# Counter — built for counting from any iterable
cnt = Counter("mississippi")     # Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})
cnt = Counter([1, 1, 2, 3, 1])   # Counter({1: 3, 2: 1, 3: 1})
cnt['z']                         # 0 — missing keys default to 0, no KeyError
cnt.most_common(2)               # [('i', 4), ('s', 4)] — top-k by count, sorted desc
cnt.most_common()                # full sorted list
cnt.update("hellohello")         # ADDS counts (not replaces)
cnt.subtract("ll")               # SUBTRACTS counts

# Counter arithmetic — addition/subtraction/min/max between Counters
Counter("aab") + Counter("abc")  # {'a': 3, 'b': 2, 'c': 1}
Counter("aab") - Counter("abc")  # {'a': 1}        — drops zero/negative
Counter("aab") & Counter("abc")  # {'a': 1, 'b': 1} — intersection (min each)
Counter("aab") | Counter("abc")  # {'a': 2, 'b': 1, 'c': 1} — union (max each)

# Killer use cases:
Counter("listen") == Counter("silent")        # True — anagram check in 1 line
[x for x, _ in Counter(nums).most_common(k)]   # top-k frequent elements

Stack

Just use a list — there's no dedicated Stack class. LIFO; all O(1) at the end.

stack = []
stack.append(x)                  # push   O(1)
stack.pop()                      # pop    O(1)
stack[-1]                        # peek   O(1)
while stack: ...                 # truthy when non-empty

Queue / Deque

collections.deque — double-ended, O(1) at both ends. NEVER use a list as a queue (list.pop(0) is O(n)).

from collections import deque
dq = deque()
dq.append(x); dq.appendleft(x)   # push right / left   O(1)
dq.pop();    dq.popleft()        # pop right / left    O(1)
dq[0]; dq[-1]                    # peek front / back   O(1)
# Use for queues (BFS), double-ended access, and sliding-window index buffers.
# queue.Queue is thread-safe — never needed in LC.

Linked list

LeetCode provides the ListNode class — you don't define it.

# Boilerplate (given in problem, do NOT redefine):
#   class ListNode:
#       def __init__(self, val=0, next=None):
#           self.val = val
#           self.next = next

# Common idioms:
dummy = ListNode(0, head)        # dummy head — simplifies edge cases when head may change
slow = fast = head               # two-pointer: middle (LC 876), cycle (LC 141, Floyd's)
# Iterative reverse: 3-pointer (prev, curr, nxt); walk forward updating each.

Doubly linked list + sentinels (the one you build — LC 707, and the backbone of LRU): each node carries prev and next; wrap the list in dummy head/tail nodes wired head <-> tail at init. Then every real node has non-None neighbours, so insert/remove are uniform 4-pointer splices with no boundary null-checks, and removing an arbitrary node given its pointer is O(1) — which is why dict{key: node} + DLL gives O(1) LRU. Gotcha: sentinels kill structural edge cases (empty list, head/tail), not semantic ones — the valid index range differs by op (read/delete [0, n-1] must hit a real node; insert [0, n], where n = "before the tail sentinel" = append).

Heap / priority queue

heapq — min-heap only in Python. Use when you repeatedly need "the smallest (or largest)" as elements arrive. O(log n) push/pop, O(1) peek, O(n) to build from a list.

import heapq

h = []                              # heap is just a list
heapq.heappush(h, x)                # O(log n)
heapq.heappop(h)                    # extract min, O(log n)
h[0]                                # peek min, O(1) — no `heappeek` function

# Build from existing list — O(n), faster than n pushes (which would be O(n log n))
h = [5, 2, 8, 1, 9]
heapq.heapify(h)                    # in-place

# Combined ops (one sift instead of two)
heapq.heappushpop(h, x)             # push then pop — if x is smaller than min, returns x unchanged
heapq.heapreplace(h, x)             # pop then push — assumes heap non-empty, evicts current top

# Top-k shortcuts (use the size-k pattern internally)
heapq.nlargest(k, iterable)         # → list, descending. O(n log k)
heapq.nsmallest(k, iterable)        # → list, ascending.  O(n log k)

# MAX-HEAP TRICK — negate values on push and pop (no native max-heap class)
heapq.heappush(h, -val)
-heapq.heappop(h)                   # the max

# PRIORITY-WITH-PAYLOAD — tuples compare lexicographically
heapq.heappush(h, (priority, item)) # ordered by priority; then by item on tie
# Tie-breaker gotcha: if `item` isn't comparable (custom objects), TypeError on tie.
# Fix: insert a monotonic counter as middle tiebreaker → (priority, counter, item)

# CANONICAL PATTERN — "min-heap of size k" for kth largest / top-k largest
# Invariant: heap holds the k largest seen so far; its min (h[0]) is the kth largest seen.
def kth_largest(nums, k):
    h = []
    for x in nums:
        heapq.heappush(h, x)
        if len(h) > k:
            heapq.heappop(h)        # evict smallest of the k candidates
    return h[0]                     # O(n log k) total

# Mirror pattern for kth smallest: max-heap of size k (negate values).
# nsmallest/nlargest implement these patterns internally; write the loop manually
# in interviews to demonstrate pattern fluency.

Union-Find

Disjoint Set Union — by size, with path compression. Effectively O(1) per find/union (O(α(n)) amortized). Use for connected components, cycle detection in an undirected graph, equivalence classes, online connectivity, and Kruskal's MST.

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))      # each element is its own root initially
        self.size = [1] * n               # only valid at roots

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])    # path compression
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False                  # already in same set
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx               # rx = larger root
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]    # size[ry] left stale (non-root, never read)
        return True

# Common queries on top:
# - same component? uf.find(x) == uf.find(y)
# - num components? len({uf.find(i) for i in range(n)})
# - cycle in undirected graph? union(u,v) returns False → edge closes a cycle

Mutating vs returning-new

Cross-cutting gotcha for list/dict/set: these methods mutate in place and return None, so never chain or assign their result.

Mutates in place, returns `None`	Returns a new value
`lst.sort()`	`sorted(lst)` → new list
`lst.reverse()`	`reversed(lst)` → iterator, or `lst[::-1]` → new list
`lst.append(x)`, `lst.extend(it)`, `lst.insert(i,x)`	`lst + [x]`, `lst + list(it)`
`lst.pop()`, `lst.remove(x)`	(no direct non-mutating equivalent)
`d[k] = v`, `d.pop(k)`, `d.update(...)`	`{**d, k: v}` (merges into new dict)
`s.add(x)`, `s.discard(x)`, `s.remove(x)`	`s \| {x}`, `s - {x}`

Trap: x = lst.sort() sets x = None. Same for s.split().reverse() — .reverse() returns None, blowing up the chain.

Strings

Strings are immutable — answer += char in a loop is O(n²) (allocate + copy each iteration).
Build strings with a list plus append, then ''.join(list) at the end for O(n) total.
CPython sometimes optimizes += in place for single-reference strings, but it's not in the spec — assume O(n²) when reasoning.

# SPLIT / JOIN  (split returns list[str], join returns str)
s.split()                          # → list[str]; whitespace split, COLLAPSES runs, strips ends
s.split(",")                       # explicit separator
s.split(",", 1)                    # maxsplit — limit number of splits
s.rsplit("-", 1)                   # split from right
s.splitlines()                     # → list[str]; split on line boundaries
sep.join(iterable)                 # → str; inverse of split; iterable must yield strings
"-".join("abc")                    # 'a-b-c' — string is iterable of chars
# GOTCHA: s.split(' ') (explicit space) does NOT collapse runs; keeps '' between spaces
# GOTCHA: split returns a list, but list.reverse()/list.sort() return None — don't chain

# STRIP / REPLACE
s.strip()                          # both ends
s.lstrip(); s.rstrip()             # one-sided
s.strip("#")                       # custom chars to strip from ends
s.replace("a", "b")                # replace all
s.replace("a", "b", 1)             # count limit

# SEARCH
s.startswith("pre")                # bool
s.endswith("post")                 # bool
s.find("sub")                      # leftmost index, or -1 if missing
s.rfind("sub")                     # rightmost index
s.index("sub")                     # like find but raises ValueError on miss
s.count("sub")                     # non-overlapping occurrences

# CASE & CHARACTER CHECKS
s.lower(); s.upper(); s.title()
s.isalpha(); s.isdigit(); s.isalnum(); s.isspace()
# All return False on empty string — `"".isdigit() == False` is a common edge

# F-STRINGS (interpolation)
f"x={x}, y+1={y+1}"                # expressions allowed inside braces
f"{x:.2f}"                         # 2 decimal places
f"{x:>10}"                         # right-align, width 10
f"{x:0>4}"                         # zero-padded width 4 → '0007'

Numbers & operators

/ returns float in Python 3 even for int operands — use // for integer division. 5 / 2 == 2.5, 5 // 2 == 2.
Integer ceiling division (positive ints): (a + b - 1) // b computes ceil(a / b) without floats. Intuition: adding b - 1 pushes any non-zero remainder over the next multiple of b. Avoids float precision concerns at scale. For mixed signs use -(-a // b) instead.
Euclidean GCD: gcd(a, b) = gcd(b, a % b), base case gcd(a, 0) = a. Iterative: while b: a, b = b, a % b. O(log min(a,b)). Stdlib: math.gcd.
Tuple swap a, b = b, a — RHS evaluated fully into a tuple before assignment, no temp needed.
Booleans are True / False — capitalized first letter only. TRUE, FALSE, true, false all raise NameError. (Same for None, not NULL / null.)
Chained comparisons: a < b < c is native Python, evaluates each expression once, equivalent to a < b and b < c but cleaner and faster (no re-evaluation of b).
Sentinel infinities: float('inf') and float('-inf') for running min/max trackers. math.inf is the same value. Beats picking a "big enough" int.
Modular arithmetic distributes over + and × (standard for "return answer mod 10⁹+7" problems): (a+b) mod n = ((a mod n) + (b mod n)) mod n; same for multiplication. Apply mod at each step to keep ints bounded. Python's % always returns non-negative for positive modulus, so subtraction is also safe. Exponentiation does NOT distribute — use pow(base, exp, mod) for fast modexp. Division requires modular inverse (e.g., pow(b, mod - 2, mod) for prime modulus by Fermat's little theorem).

OOP / class design

Design problems (caches, tries, iterators, rate limiters) are graded on class structure. Skeleton + idioms here; full grounding in python_concepts.md.

class TrieNode:                  # internal helper node
    def __init__(self):
        self.children = {}       # {char: TrieNode} — composition: holds more of its kind
        self.is_end = False

class Trie:                      # public container class
    def __init__(self):          # constructor — runs automatically on Trie()
        self.root = TrieNode()   # instance attribute (per-object state)

    def insert(self, word):      # `self` = the instance; NOT passed at call site
        ...                      #  trie.insert(w) binds trie → self

# IDIOMS / GOTCHAS
# - set ALL state in __init__; never put mutable state in a class attribute (shared across instances!)
# - match the problem's method names EXACTLY (insert/search/startsWith, not add/find)
# - `_helper` leading underscore = internal/private (convention, not enforced)
# - composition (container holds node objects) covers most design problems
# - instance attr (self.x, per object) vs class attr (Class.x, shared) — default to instance
# - inheritance / ABC (from abc import ABC, abstractmethod; super().__init__()) — for Cache-family problems