What This Guide Is and What It Isn't
This isn't a list of definitions. It's a field guide for anyone preparing for a mid-to-senior software engineering loop where system design is on the agenda.
Every question here appears in real hiring loops. Not as an abstract thought experiment, but as a reliable signal for senior judgment. These questions expose how you scope problems, estimate scale, reason about trade-offs, and anticipate failure points before anyone asks.
The sample answers are intentionally scoped to the first 8-12 minutes of a real interview. They are not exhaustive. They are the opening that invites a deeper conversation. Strong candidates use answers like this as a launchpad, not a script.
What Interviewers Are Actually Scoring
Most candidates assume system design interviews are about drawing the right boxes. They're not. They're about how you navigate ambiguity, make defensible decisions under time pressure, and demonstrate that you understand what it's like to build and operate systems at scale.
Here's how most hiring teams actually score these rounds:
Navigation and scoping. Did you define the problem before solving it? Candidates who draw architecture before asking a single clarifying question lose ground immediately. This is scored in the first two minutes.
Correctness under concurrency. Payments, bookings, flash sales - do you add idempotency and transactional semantics naturally, or do you skip over the hard parts? This is where senior separates from mid-level.
Systems judgment. Can you articulate bottlenecks and trade-offs without being dragged there by follow-up questions? Proactively surfacing failure modes is one of the clearest senior signals in the format.
Operational realism. Do you think about monitoring, rollout strategy, and backpressure? Mid-level candidates design systems. Senior engineers design systems they'd actually want to be on-call for.
The Answer Framework (Works for Every Question)
Strong system design answers follow a consistent structure. The exact time allocation varies, but the sequence is reliable and it's what separates candidates who look prepared from candidates who just know the content.
1. Restate the prompt (~60 sec). Say back what you heard. Catch misunderstandings before you waste 10 minutes solving the wrong problem.
2. Clarify scope and success metrics (~2 min). What's in? What's out? What does 'working' mean - a latency SLO? A consistency requirement? A scale target?
3. Capacity estimates (~2 min). Traffic, storage, bandwidth. Rough numbers make your architecture choices look intentional rather than arbitrary.
4. Core data model and APIs (~3 min). Schema first, APIs second, both before any boxes. Candidates who jump straight to architecture diagrams often get tripped up by a data model they didn't think through.
5. High-level architecture (~5 min). Read path and write path. Five to seven components maximum. Keep it navigable.
6. Deep dive on the riskiest component (~5–10 min). Hot keys, fanout, idempotency, backpressure - pick the thing most likely to fail in production and go deep.
7. Bottlenecks and trade-offs (~3 min). Name what breaks first and explain why you made the trade-offs you did. Don't wait to be asked.
8. Operational plan (~2 min). Monitoring, rollout strategy, cost awareness. Shows you've thought about the system beyond launch day.
9. Wrap-up (~1 min). What would you tackle next with more time? Shows you know where the rough edges are.
Don't memorize answers; internalize the structure. A well-structured mediocre answer often beats an unstructured brilliant one in a real hiring loop.
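The capacity-estimate step (step 3) is pure arithmetic, and it pays to have the conversion from daily volume to RPS ready. A minimal sketch, with illustrative inputs only - the `peak_multiplier` of 10× is a common rule of thumb, not a universal constant:

```python
# Back-of-envelope capacity estimate, as in step 3 of the framework.
# All inputs are illustrative assumptions, not real product numbers.

def capacity_estimate(daily_requests: int, avg_payload_bytes: int,
                      peak_multiplier: float = 10.0) -> dict:
    """Turn a daily request count into the rough numbers interviewers expect."""
    seconds_per_day = 86_400
    avg_rps = daily_requests / seconds_per_day
    peak_rps = avg_rps * peak_multiplier
    daily_bytes = daily_requests * avg_payload_bytes
    return {
        "avg_rps": round(avg_rps),
        "peak_rps": round(peak_rps),
        "storage_per_day_gb": round(daily_bytes / 1e9, 1),
    }

# 200M requests/day at ~500 bytes each:
print(capacity_estimate(200_000_000, 500))
```

Saying the arithmetic out loud ("200 million a day over 86,400 seconds is about 2.3k RPS, call it 23k at peak") is exactly what makes your architecture choices look intentional.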
30 System Design Interview Questions & Answers (2026)
Q1. Design a URL Shortener
Difficulty: Mid-level
What it tests: Read-heavy key-value mapping. ID generation, storage choices, caching strategy, collision handling.
Approach: Generate a short code (Base62 of a monotonically increasing ID, or random with collision check). Store {code → long_url} in a KV store. Redirect is cache-first with a 302. Custom aliases are a separate code path with a uniqueness check.
Key components: Load balancer/API gateway, URL service, ID generator, KV datastore, cache layer, analytics pipeline (async)
Scalability: Shard by code prefix or hash range. Cache hot codes in memory - a small cache covers the vast majority of traffic. Multi-region read replicas. Log clicks asynchronously.
Bottlenecks: Hot keys (viral links), ID generator as SPOF, write amplification from click analytics, custom alias conflicts.
Data model:
Url(code PK, long_url, created_at, expires_at, owner_id, is_custom)
Click(code, ts, referrer, geo) - append-only
APIs:
POST /urls {long_url, custom?} → {code}
GET /{code} → 302 redirect
DELETE /urls/{code}
GET /urls/{code}/stats
Numbers: 200M redirects/day ≈ 2.3k RPS avg, 10× at peak. 7-char Base62 = ~3.5T combinations. p95 redirect < 50ms.
Hiring signal: You call out custom alias conflicts, TTL/expiry policy, abuse/spam protection, and cache invalidation without being prompted.
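The Base62 encoding mentioned in the approach is worth being able to write cold. A minimal sketch of encoding a monotonically increasing ID into a short code (the alphabet ordering here is one convention among several):

```python
# Base62 encoding of an integer ID, the core of the URL-shortener write path.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    """Encode a monotonically increasing integer ID as a short code."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n > 0:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

def decode_base62(code: str) -> int:
    """Inverse mapping, useful for lookups keyed by numeric ID."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Because the ID is monotonically increasing, codes are collision-free by construction; the trade-off is that the ID generator becomes a coordination point, which is exactly the SPOF flagged under bottlenecks.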
Q2. Design a Social Feed / Newsfeed / Timeline
Difficulty: Senior
What it tests: Fanout trade-offs, feed ranking, celebrity accounts, caching strategy, the canonical systems-thinking test.
Approach: Hybrid feed generation: fanout-on-write for regular users (pre-populate feed cache when someone they follow posts), fanout-on-read for celebrities (pull at query time to avoid writing to millions of feeds). Rank via a scoring service. Paginate with cursor tokens.
Key components: Post service, follow-graph store, fanout workers + queue, feed store, ranking service, cache layer, soft-delete/moderation pipeline
Scalability: Partition by user_id. Run celebrities through a dedicated pipeline. Precompute and cache the first page of every active user's feed.
Bottlenecks: Write fanout storms when a celebrity posts, cache stampede, expensive ranking joins on large follow graphs.
Data model:
Follow(follower_id, followee_id)
Post(post_id, author_id, ts, payload_ref)
FeedItem(user_id, ts, post_id, score)
APIs:
POST /posts
GET /users/{id}/feed?cursor=
POST /follow
DELETE /posts/{id} (soft-delete with tombstone)
Numbers: 50M DAU × 10 feed opens/day = 500M feed reads/day (~5.8k RPS avg). p95 first-page < 200ms.
Hiring signal: You name the fanout trade-off explicitly and propose a hybrid rather than a single universal architecture.
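The hybrid decision itself fits in a few lines. A hedged sketch - the follower-count threshold is an illustrative assumption; real systems tune it empirically and may use engagement rather than raw follower count:

```python
# Hybrid fanout decision. CELEBRITY_THRESHOLD is an assumed cutoff for
# illustration; production systems tune this value empirically.
CELEBRITY_THRESHOLD = 10_000

def fanout_strategy(follower_count: int) -> str:
    """Fanout-on-write for regular users, fanout-on-read for celebrities.

    Write path: below the threshold, push the post into every follower's
    cached feed. Above it, record the post once and merge it into feeds
    at read time, avoiding millions of writes per celebrity post.
    """
    if follower_count < CELEBRITY_THRESHOLD:
        return "fanout_on_write"
    return "fanout_on_read"
```

At read time, the feed service merges the user's precomputed feed cache with a pull of recent posts from any followed celebrity accounts, then ranks the combined set.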
Q3. Design a Real-Time Chat / Messaging System
Difficulty: Senior
What it tests: Realtime transport, durable delivery, ordering guarantees, offline handling - four separate concerns candidates often collapse into one.
Approach: WebSocket gateway for live connections. Write messages durably first, always, then fanout via pub/sub to recipient sessions. Push notifications for offline users. Delivery receipts close the loop without polling.
Key components: WebSocket gateway, chat service, conversation store, message store (append-only, time-ordered), pub/sub layer, presence service, push notification workers
Scalability: Shard by conversation_id. Presence state in in-memory store with TTL. Backpressure on bursts.
Bottlenecks: Gateway connection limits per node, fanout for large group chats, offline catch-up query volume on reconnect.
Data model:
Conversation(id, type, members[])
Message(conv_id, msg_id, sender_id, ts, body_ref, status)
DeviceToken(user_id, platform, token)
APIs:
POST /conversations
POST /messages
GET /conversations/{id}/messages?cursor=
WS /connect
Numbers: 10M DAU × 20 msgs/day = 200M msgs/day (~2.3k/sec). p95 end-to-end delivery < 300ms.
Hiring signal: You bring up idempotency and at-least-once delivery + client-side de-dupe before the interviewer prompts you.
Q4. Design an API Rate Limiter
Difficulty: Mid-level
What it tests: Quota enforcement under concurrency and partial failures.
Approach: Token bucket or sliding window per identity (API key/IP/user). Atomic counters in a fast store. Enforce at the gateway before requests hit downstream services. Policy config must be hot-reloadable.
Cloudflare has published details on how it handles rate limiting at scale; it's worth reading before your interview.
Key components: Identity extractor, policy store, atomic counter store (Redis), decision engine, observability layer
Scalability: Partition counters by {entity, window_start}. Approximate counting for extreme scale. Define fail-open vs fail-closed behavior explicitly.
Bottlenecks: Hot keys from a single high-volume client, clock skew affecting window boundaries, counter store saturation.
Data model:
Policy(entity, limit, window_seconds, burst)
Counter(entity, window_start, count)
APIs:
GET /limits/{key}
POST /admin/policies
Internal: CheckLimit(entity, cost) → { allow: bool, retry_after: int }
Numbers: Example: 120 req/min per user, burst 20. p95 decision < 5ms at gateway. 100k checks/sec at peak.
Hiring signal: You answer "fail open vs fail closed" with a business-aware trade-off, not a reflexive answer.
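The token-bucket variant from the approach is small enough to write live. A single-process sketch - production versions run the same logic as atomic operations against a shared store like Redis, but the refill math is identical:

```python
import time

class TokenBucket:
    """Single-process token bucket sketch. Production implementations run
    this refill-and-spend logic atomically in a shared counter store."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec       # steady-state refill rate
        self.capacity = burst          # maximum burst size
        self.tokens = float(burst)     # start full
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The `cost` parameter is how you express that some endpoints are more expensive than others against the same quota.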
Q5. Design Search Autocomplete / Typeahead
Difficulty: Mid-level
What it tests: Sub-100ms reads, prefix ranking, async update pipelines, perceived latency.
Approach: Precompute a prefix index (Trie or inverted index) offline with incremental streaming updates. Serve top results from in-memory cache. Light personalization as a second-pass re-rank.
Key components: Query API, prefix index store, in-memory cache, query logging pipeline, offline batch index builder, streaming incremental updater
Scalability: Cache hot prefixes in memory. Shard by prefix range. Version index snapshots for zero-downtime atomic swaps.
Bottlenecks: Tail latency from cache misses, index rebuild time during swap, personalization joins on the hot path.
Data model:
Prefix(prefix, candidates[], scores[])
QueryLog(user_id?, prefix, clicked_result, ts)
APIs:
GET /typeahead?q=&limit=&cursor=
POST /events/typeahead_click
Numbers: p95 < 50ms. Each keystroke can generate 5–10 queries per user session. Top prefixes must live in memory, not on disk.
Hiring signal: You proactively add typo tolerance and explain its latency cost and complexity trade-off.
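To make the precomputed-prefix idea concrete, here is a naive in-memory sketch. Real systems build the top-K lists offline and only serve them online; this collapses both steps for illustration, and the 5-character prefix cap is an assumption:

```python
from collections import defaultdict
import heapq

class PrefixIndex:
    """Naive in-memory prefix index. Production systems precompute the
    top-K list per prefix offline and serve it from cache."""

    def __init__(self, max_prefix_len: int = 5):
        self.table = defaultdict(list)   # prefix -> [(score, term)]
        self.max_prefix_len = max_prefix_len

    def add(self, term: str, score: int):
        # Index the term under every prefix up to the cap.
        for i in range(1, min(len(term), self.max_prefix_len) + 1):
            self.table[term[:i]].append((score, term))

    def top_k(self, prefix: str, k: int = 5):
        # Highest-scored candidates for this prefix.
        return [t for _, t in heapq.nlargest(k, self.table.get(prefix, []))]
```

Capping the indexed prefix length bounds index size; queries longer than the cap fall through to a full search path.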
Q6. Design a Ride-Sharing Backend (Uber/Lyft)
Difficulty: Senior
What it tests: Geo indexing, realtime location updates, driver matching, failure-mode reasoning. One of the richest signal sources.
Approach: Separate location ingest from dispatch - different write patterns, different latency requirements. Maintain driver location in a geo-indexed in-memory store with TTL. Trip lifecycle as an explicit state machine.
Key components: Rider/driver app gateways, realtime WebSocket service, location service (geo-index), dispatch/matching engine, trip service, dynamic pricing, payment, notifications
Scalability: Shard by city or geo-cell. Keep location in memory with TTL (stale after 30s = driver invisible). Async ETA computation. Queue dispatch jobs to avoid thundering herds.
Bottlenecks: Hot downtown geo-cells at rush hour, surge pricing computation spikes, dispatch thundering herds on mass cancellations.
Data model:
Driver(id, status, current_cell, last_seen)
Trip(id, rider_id, driver_id, state, pickup, dropoff)
LocationUpdate(driver_id, ts, lat, lon)
APIs:
POST /trips/request
POST /drivers/location (frequent heartbeat)
POST /trips/{id}/accept
GET /trips/{id} (SSE/WebSocket for live updates)
Numbers: 50k concurrent drivers × 1 update/3s = ~17k location updates/sec. p95 dispatch decision < 1s.
Hiring signal: You describe what happens when GPS data goes stale and how the system fails gracefully.
Q7. Design a Distributed Cache
Difficulty: Senior
What it tests: Eviction policies, consistency models, cache failure modes. Tests whether you truly understand caching vs just using it.
Approach: Consistent-hash the keyspace across cache nodes. Replicate for availability. Choose write policy (write-through vs write-back) by durability requirements. Lease-based stampede protection for hot keys.
Key components: Cache cluster, client library with consistent hash ring, replication manager, eviction engine (LRU/LFU), stampede protection (leases), warmup/rehydration, metrics
Scalability: Add nodes with minimal key migration. Multi-tier: L1 in-process + L2 distributed. Spread hot keys across virtual nodes.
Bottlenecks: Cache stampede (dog-pile on cold miss), hot keys exceeding single-node capacity, node churn during membership changes.
Data model:
CacheEntry(key, value, ttl, version)
Lease(key, holder_id, expires_at) - stampede protection
APIs:
GET key / SET key value ttl? / DELETE key
GET_MANY(keys[]) - batch
SET_IF_VERSION(key, value, expected_version) - optimistic
Numbers: p95 cache GET < 2–5ms. 200 bytes/entry × 50M entries ≈ 10GB raw.
Hiring signal: You talk about invalidation strategy and stale data handling before the interviewer brings it up.
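Consistent hashing with virtual nodes - the placement mechanism in the approach - is another thing worth being able to sketch. This is a minimal illustration; the vnode count and MD5 choice are assumptions, not requirements:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hashing with virtual nodes: adding or removing one node
    only remaps the keys adjacent to its vnodes, not the whole keyspace."""

    def __init__(self, nodes, vnodes: int = 100):
        self.ring = []   # sorted list of (hash, node)
        for node in nodes:
            for v in range(vnodes):
                self.ring.append((self._hash(f"{node}#{v}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        # Any well-distributed hash works; MD5 is used here for brevity.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first vnode at or after the key's hash.
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]
```

The virtual nodes also help with the hot-key bottleneck above: spreading each physical node into many ring positions smooths load across the cluster.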
Q8. Design E-Commerce Checkout / Flash Sale
Difficulty: Senior
What it tests: Transaction boundaries, idempotency, handling demand spikes without overselling.
Approach: Split cart, inventory, order, and payment into separate services. Reserve inventory with a short-lived TTL hold before charging. Orchestrate via saga pattern. Async fulfillment but don't block checkout on email sending.
Key components: Checkout API, inventory service, order service, payment service, reservation queue, fraud check service, notification workers
Scalability: Queue order intents to smooth spikes. Throttle per-user. Shard inventory by SKU. Return "sold out" fast, don't queue indefinitely.
Bottlenecks: Hot SKUs during flash sales, payment provider latency, duplicate submissions from double-clicks.
Data model:
Inventory(sku, available, reserved)
Reservation(res_id, sku, qty, expires_at)
Order(order_id, user_id, state, amount)
PaymentAttempt(idempotency_key, status)
APIs:
POST /checkout (with idempotency key in header)
POST /reservations
POST /payments
GET /orders/{id}
Numbers: Flash sale peak: 200k checkout attempts/min (~3.3k/sec). Reserve step p95 < 50ms.
Hiring signal: You say "exactly-once is a myth, we do at-least-once plus idempotency keys." Anyone who's shipped payments says this naturally.
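The "at-least-once plus idempotency keys" pattern reduces to a result store keyed by the client-supplied key. A minimal in-memory sketch - production stores these results durably with a TTL:

```python
class PaymentService:
    """At-least-once requests + idempotency keys = effectively-once charges.
    In-memory sketch; production persists results durably with a TTL."""

    def __init__(self):
        self.results = {}   # idempotency_key -> stored result

    def charge(self, idempotency_key: str, amount: int) -> dict:
        if idempotency_key in self.results:
            # Duplicate submission (retry, double-click, network replay):
            # return the stored result instead of charging again.
            return self.results[idempotency_key]
        result = {"status": "charged", "amount": amount}
        self.results[idempotency_key] = result
        return result
```

The client generates the key once per logical purchase and reuses it on every retry; the server's job is only to remember what it already did.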
Q9. Design a Video Streaming Service (YouTube/Netflix)
Difficulty: Senior
What it tests: Storage + CDN + encoding pipeline + control plane vs data plane separation.
Approach: Upload → transcode into multiple bitrates → store segments in object storage. Playback via CDN with segment-based delivery (HLS/DASH). Metadata and search are separate services. Analytics ingest is async.
Key components: Upload service, transcoder worker fleet, object store, CDN, playback API, catalogue DB, recommendations engine, async analytics ingest
Scalability: Cache hot content at CDN edge. Precompute thumbnails. Keep upload path completely separate from playback path.
Bottlenecks: Transcode backlog during upload spikes, origin egress cost if CDN hit rate drops, CDN cache misses on new releases.
Data model:
Video(video_id, owner_id, status, manifest_url, tags)
Segment(video_id, bitrate, segment_id, url)
ViewEvent(video_id, ts, device, bitrate) - async
APIs:
POST /videos (init upload)
PUT /videos/{id}/parts (multipart)
GET /videos/{id}/play (returns manifest URL)
GET /search?q=
Numbers: 1M uploads/day × 200MB avg × 6 renditions = ~1.2PB/day internal processing.
Hiring signal: You immediately separate control plane (metadata, search) from data plane (segments, CDN) as your first architectural decision.
Q10. Design File Storage & Sync (Cloud Drive)
Difficulty: Senior
What it tests: Metadata vs blob storage, sync semantics, conflict resolution.
Approach: Files as immutable blobs with versioned metadata. Client syncs via delta + cursor (not full re-upload). Sharing via ACL. Content-hash deduplication for storage efficiency.
Key components: Metadata service, upload/download service with chunking, object store, deduplicator, sync engine, change notification service
Scalability: Chunk large files and deduplicate by content hash for major storage savings. Shard metadata by user_id. CDN for downloads.
Bottlenecks: Large folder listings as a latency trap, hot shared folders with many concurrent writers, conflict storms when clients are offline.
Data model:
File(file_id, owner_id, current_version)
FileVersion(file_id, version, blob_ref, hash, size, created_at)
ACL(resource_id, principal_id, role)
ChangeLog(user_id, seq, op)
APIs:
POST /files (init)
PUT /files/{id}/chunks
GET /files/{id}
GET /changes?since= (sync cursor endpoint)
POST /share
Numbers: 10M users × 2GB avg = 20PB raw storage. p95 delta sync < 5s after edit.
Hiring signal: You call out "list folder" as a latency trap and propose pagination + caching immediately.
Q11. Design a Pub/Sub or Message Queue
Difficulty: Senior
What it tests: Partitioning, ordering guarantees, consumer groups, durability trade-offs.
Approach: Append-only log per topic partition. Producers write, brokers persist and replicate. Consumers pull with explicit committed offsets. Consumer groups enable horizontal scaling.
Key components: Broker nodes, coordination/metadata store, topic partitions, retention policy, replication manager, consumer group coordinator
Scalability: Partition by message key. Add brokers horizontally. Consumer group rebalancing on membership change. Tiered storage for long retention.
Bottlenecks: Hot partitions from skewed keys, rebalance churn causing consumer lag spikes, disk IO saturation on hot brokers.
Data model:
TopicPartition(id, leader, replicas[])
Message(offset, key, value, ts)
ConsumerOffset(group, partition, offset)
APIs:
Produce(topic, key, value)
Fetch(topic, partition, offset, max_bytes)
CommitOffset(group, partition, offset)
Numbers: 100k msgs/sec across cluster. Replication factor 3. p99 produce ACK < 50ms.
Hiring signal: You explain "ordered within a partition, not globally" and know when that matters for the use case.
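The three core mechanics - key-based partitioning, pull with offsets, explicit commits - fit in a toy single-broker sketch. This is an illustration of the semantics, not a real broker (no durability, no replication):

```python
from collections import defaultdict

class MiniLog:
    """Toy single-broker log: ordered within a partition, not globally.
    No durability or replication; this only illustrates the semantics."""

    def __init__(self, partitions: int = 4):
        self.partitions = [[] for _ in range(partitions)]
        self.offsets = defaultdict(int)   # (group, partition) -> committed offset

    def produce(self, key: str, value: str) -> tuple:
        p = hash(key) % len(self.partitions)    # same key -> same partition
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1   # (partition, offset)

    def fetch(self, group: str, partition: int, max_records: int = 10):
        # Pull model: consumers read from their last committed offset.
        start = self.offsets[(group, partition)]
        return self.partitions[partition][start:start + max_records]

    def commit(self, group: str, partition: int, offset: int):
        self.offsets[(group, partition)] = offset
```

Because the key determines the partition, all events for one entity (an order, a user) stay ordered relative to each other, which is usually the ordering guarantee the use case actually needs.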
Q12. Design Maps / Routing / a Proximity Service
Difficulty: Senior
What it tests: Geo indexing, heavy read patterns, separating static precomputed assets from realtime overlays.
Approach: Static map tiles via CDN. POI search via geo-index (Geohash or Quadtree). Routing via precomputed graph + shortest-path indices per region. Realtime traffic as a streaming overlay on the static graph.
Key components: Tile service, POI search service, routing engine, traffic ingest pipeline, caching layer, offline precompute jobs
Scalability: Cache hot tiles at edge. Shard geo-index by region. Keep path precomputes region-scoped to bound size and rebuild time.
Bottlenecks: Route computation spikes for cross-region queries, traffic update fanout, hotspot urban areas.
Data model:
POI(id, lat, lon, category, tags)
RoadEdge(edge_id, from, to, length, speed_limit)
Traffic(edge_id, ts, speed)
APIs:
GET /tiles/{z}/{x}/{y}
GET /search?lat=&lon=&q=
GET /route?from=&to=&mode=
POST /traffic_updates
Numbers: Map tile p95 < 50ms (CDN). Route p95 < 300ms for common city distances.
Hiring signal: You cleanly separate static precomputed assets from realtime dynamic data without trying to compute everything live.
Q13. Design a Notification System (Push/SMS/Email)
Difficulty: Mid-level
What it tests: Multi-channel fanout, idempotent retry logic, template rendering, user preference enforcement.
Approach: Event → queue → orchestrator evaluates preferences → channel workers handle delivery per-provider → store delivery state with idempotency keys.
Key components: Event ingest, rules/preference store, template service, delivery queue, per-channel workers, provider adapters (APNs, FCM, Twilio, SES), delivery tracking
Scalability: Partition by user_id. Batch sends where providers support it. Provider fallbacks for reliability. Rate-limit outbound per provider.
Bottlenecks: Provider throttling causing backlog, retry storms during outages, duplicate sends from race conditions.
Data model:
Preference(user_id, channel, opt_in)
Notification(id, user_id, template_id, payload, status)
DeliveryAttempt(id, provider, status, ts)
APIs:
POST /events
POST /notify
GET /notifications/{id}
POST /preferences
Numbers: 10M DAU × 5 notifs/day = 50M sends/day. Enqueue p95 < 20ms.
Hiring signal: You talk about dedupe keys and the impossibility of "exactly once" delivery like someone who's been paged for duplicate push notifications.
Q14. Design a Web Crawler
Difficulty: Senior
What it tests: Distributed scheduling, per-host politeness, URL deduplication at scale, storage pipeline design.
Approach: URL frontier queue drives fetchers. Fetchers respect per-host rate limits + robots.txt. Parse response, extract links. Deduplicate via Bloom filter. Store content + push to indexer. Prioritize by freshness and link authority.
Key components: URL frontier (priority queue per host), fetcher fleet, HTML parser, Bloom filter deduplicator, content store, politeness scheduler, indexing pipeline
Scalability: Partition frontier by hostname. Backpressure on slow/blocked hosts. Scale fetchers horizontally, bottleneck is network I/O.
Bottlenecks: Duplicate URL explosion without dedup, infinite crawl traps, politeness constraints reducing effective crawl rate.
Data model:
UrlFrontier(host, priority_queue)
Seen(url_hash) - Bloom filter
Page(url, ts, content_ref, outlinks[])
APIs: Internal: Fetch(url), Enqueue(urls[]), SubmitPage(page) → indexer
Numbers: 1B pages × 50KB compressed = 50TB raw. Bloom filter handles dedup at this scale; a hash set does not.
Hiring signal: You mention per-domain throttling and Bloom filter deduplication early without prompting.
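A minimal Bloom filter, since it is the dedup structure named above. The sizing here (1M bits, 5 hash functions) is illustrative; real deployments size the filter from the expected URL count and target false-positive rate:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter for URL dedup. False positives are possible
    (a new URL may be wrongly skipped); false negatives never occur.
    Sizing here is illustrative, not tuned."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 5):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        # Derive k independent bit positions by salting one strong hash.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

The asymmetry is the interview point: occasionally skipping a genuinely new URL is acceptable for a crawler; re-fetching billions of seen URLs is not.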

Q15. Design a Search Engine / Search Service
Difficulty: Staff
What it tests: Inverted index construction, query serving, two-tier ranking, the tail-latency trap of multi-shard fanout.
Approach: Crawl/ingest → build inverted index → shard across nodes. Query service: retrieval + ranking + snippets. Two-tier ranking: cheap first-pass retrieval, expensive ML rerank on top set. Cache popular queries.
Key components: Ingest pipeline, index builders, sharded index nodes, query frontend, two-tier ranker, query result cache
Scalability: Shard by term or document range. Replicate all shards. Two-tier ranking keeps per-query cost manageable.
Bottlenecks: Tail latency from multi-shard fanout (you wait for the slowest shard), hot queries bypassing cache, index merge operations taking nodes offline.
Data model:
InvertedIndex(term → postings(doc_id, tf, positions))
DocStore(doc_id, title, url, metadata, embedding)
APIs:
GET /search?q=&page=&cursor=
Internal: QueryShards(term_list), Rank(candidates, features)
Numbers: p95 < 200ms for common queries. Fan out to 50 shards = you live or die on p99 shard latency.
Hiring signal: You say "fanout magnifies tail latency" unprompted. That's the insight that separates someone who's thought about this from someone who's memorized it.
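The fanout/tail-latency effect is a one-line probability argument worth having ready. Assuming each shard independently answers within its latency target 99% of the time:

```python
# Why fanout magnifies tail latency: the query is only as fast as the
# slowest shard, so per-shard p99 compounds across the fanout.
p_shard_fast = 0.99   # each shard meets its latency target 99% of the time
shards = 50

p_all_fast = p_shard_fast ** shards
print(f"P(all {shards} shards fast) = {p_all_fast:.3f}")   # ~0.605
```

In other words, a per-shard p99 becomes roughly a p60 for the full query, which is why hedged requests, replica fallbacks, and partial results exist.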
Q16. Design a Recommendation System
Difficulty: Staff
What it tests: Offline training vs online serving separation, feedback loops, cold-start handling.
Approach: Collect events → feature store → offline model training + embeddings. Online: candidate generation (ANN on embeddings) + lightweight ranking. Cache per-user recs with short TTL. Decouple model updates from serving.
Key components: Event pipeline, feature store, model training infra, embedding store (vector index), online serving layer, A/B experimentation framework, drift monitoring
Scalability: Precompute candidate sets offline. Keep serving layer stateless. Batch model updates. Cold-start fallback: content-based recommendations for new users/items.
Bottlenecks: Feature freshness lag, model drift when retraining is slow, online latency from heavy feature lookups.
Data model:
Event(user_id, item_id, action, ts)
Item(item_id, metadata, embedding)
UserProfile(user_id, embedding, prefs)
RecList(user_id, items[], gen_ts)
APIs:
GET /recommendations?user_id=&context=
POST /events
Numbers: p95 serve < 100ms. 20M DAU × refresh every 10 min = 2M lists/min to generate. Requires batch precompute + caching.
Hiring signal: You mention A/B experimentation, content filtering guardrails, and feedback loop risks as first-class concerns.
Q17. Design a CDN
Difficulty: Senior
What it tests: Edge caching architecture, cache invalidation semantics, protecting origin from being overwhelmed.
Approach: Geo-DNS/Anycast routes to nearest POP. Edge caches with LRU + TTL. Cache miss → regional tier or origin pull. Signed URLs for protected content. Structured invalidation system for deployments.
Key components: Edge POPs, request routing layer, per-POP cache storage, origin fetch service, purge/invalidation system, analytics metering
Scalability: Add POPs geographically. Pre-warm caches before anticipated events. Tiered caching: edge → regional → origin. Coalesce origin requests during cold start.
Bottlenecks: Origin overload during cold starts, cache pollution from rarely accessed content, invalidation storms after bad deploys.
Data model:
CacheKey(url, headers_vary)
CacheEntry(key, body_ref, ttl, etag)
PurgeRequest(key_pattern, ts)
APIs:
PURGE /cdn/* (admin)
GET /content (client)
Internal: FetchFromOrigin
Numbers: Target edge hit ratio > 90% for static assets. p95 edge response < 50ms. Hit ratio drop = origin cost explosion.
Hiring signal: You explain the difference between TTL expiry and active invalidation and the operational cost of "purge everything" during a rollback.
Q18. Design a Payment Gateway / Payment Processing
Difficulty: Staff
What it tests: Reliability, idempotency, immutable audit trails, regulatory constraints.
Approach: Every payment is a state machine. Every write is idempotent. The ledger is append-only: never update existing entries, only append new ones. Downstream effects (receipts, fulfillment) are async and decoupled.
Key components: Payments API, auth, risk/fraud engine, payment orchestration layer, provider adapters, append-only ledger, reconciliation jobs
Scalability: Partition by merchant + time bucket. Queue provider calls. Circuit breakers on external providers. Idempotency layer in front of every state-changing op.
Bottlenecks: External provider latency variance, duplicate charge risk without idempotency keys, reconciliation mismatches at month-end.
Data model:
PaymentIntent(id, amount, currency, status, idempotency_key)
LedgerEntry(tx_id, debit, credit, ts)
ProviderAttempt(intent_id, provider, status)
APIs:
POST /payments/intents
POST /payments/{id}/confirm
GET /payments/{id}
Webhook receiver (async provider callbacks)
Numbers: p95 confirm < 2s with external providers. Design for retry storms during provider incidents. Ledger writes must be multi-AZ durable.
Hiring signal: You say "ledger first, then views." That's the voice of someone who has actually moved money in production.
Q19. Design Collaborative Document Editing (Google Docs-style)
Difficulty: Staff
What it tests: Concurrency control (OT or CRDT), realtime sync, correctness under conflict.
Approach: Clients send operation patches. Server assigns global order and broadcasts. Resolve conflicts via Operational Transformation or CRDT. Persist op log + periodic snapshots. Recovery = snapshot + replay.
Key components: Realtime WebSocket gateway, document service, operation log store, snapshot store, presence service, ACL layer
Scalability: Shard by doc_id. Keep active document state in memory. Snapshot periodically to bound replay time. Backpressure for large rooms.
Bottlenecks: Hot documents with thousands of editors, op-log growing without bound, reconnect storms causing replay stampede.
Data model:
Doc(doc_id, owner_id, acl, latest_version)
Op(doc_id, seq, actor_id, patch, ts)
Snapshot(doc_id, version, blob_ref)
APIs:
GET /docs/{id}
WS /docs/{id}/connect
POST /docs/{id}/ops
POST /docs/{id}/share
Numbers: 200 collaborators × 2 ops/sec = 400 ops/sec per active doc. p95 broadcast < 200ms to feel live.
Hiring signal: You explain how the system recovers after a server crash (snapshot + op log replay) before the interviewer asks about failure modes.
Q20. Design a Distributed Job / Task Scheduler
Difficulty: Senior
What it tests: Leader election, lease management, retry logic, the "exactly-once" impossibility.
Approach: Scheduler assigns jobs with short-lived leases. Workers heartbeat to renew. On timeout, scheduler reassigns. Retries assume idempotent tasks. Every state transition stored durably.
Key components: Scheduler API, leader-elected scheduler process, job queue, worker fleet, durable state store, dead letter queue (DLQ), monitoring
Scalability: Partition queues by job type or tenant. Autoscale workers by queue depth. Separate short-running from long-running jobs. Rate-limit noisy tenants.
Bottlenecks: Thundering herds when cron boundaries align (all jobs fire at :00), stuck jobs blocking queues, poison-pill jobs crashing workers repeatedly.
Data model:
Job(job_id, type, payload_ref, schedule, state)
Lease(job_id, worker_id, expires_at)
Attempt(job_id, attempt_no, status)
APIs:
POST /jobs
POST /jobs/{id}/cancel
Worker: Poll(queue), Ack(job_id)
Numbers: p95 dispatch < 1s. For cron jobs: add ±30s jitter to spread the :00 thundering herd.
Hiring signal: You mention DLQs and poison-pill jobs as a standard operational concern, not an edge case.
Q21. Design a Key-Value Store / Distributed Datastore
Difficulty: Staff
What it tests: Partitioning strategies, replication models, consistency trade-offs, operational realities of distributed storage.
Approach: Hash-partition keys to shards via consistent hashing. Replicate with configurable factor. Quorum reads/writes based on consistency target. Hinted handoff during node failures + anti-entropy repair. LSM tree for write-optimized storage.
Key components: Consistent-hash router, storage nodes, replication manager, compaction engine, membership/health service, client library
Scalability: Add nodes via consistent hashing with minimal key redistribution. Spread hot keys across virtual nodes. Background compaction keeps read performance in check.
Bottlenecks: Hot partitions from skewed key distribution, compaction IO spike impacting reads, rebalance churn during node changes.
Data model:
Key → Value, version, ttl
HintedHandoff(queue)
Membership(node_id, status, token_range[])
APIs:
GET key / PUT key value ttl? / DELETE key
Admin: AddNode, RemoveNode
Numbers: In-memory: p95 < 5ms. Disk-backed: p95 < 20–50ms. Replication factor 3. Plan for 3× storage.
Hiring signal: You articulate your chosen consistency model and explain why - without "it depends" as a complete answer.
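The consistency model you choose has a precise shape: with N replicas, a read quorum of R, and a write quorum of W, R + W > N guarantees every read quorum overlaps the most recent write quorum. A one-liner makes the trade-off concrete:

```python
def quorum_is_strong(n: int, r: int, w: int) -> bool:
    """R + W > N guarantees a read quorum overlaps the latest write quorum."""
    return r + w > n

# Common configurations with replication factor 3:
assert quorum_is_strong(n=3, r=2, w=2)       # strong: overlap guaranteed
assert not quorum_is_strong(n=3, r=1, w=1)   # eventual: fast, may read stale
```

Being able to say "I'm running N=3, R=2, W=2 because overlap matters here, and I'd drop to R=1 for this read path because staleness is acceptable" is the concrete answer that replaces "it depends."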
Q22. Design Authentication & SSO
Difficulty: Senior
What it tests: Security posture, session/token trade-offs, abuse protections, federated identity complexity.
Approach: Central auth service with password + federated IdP (OIDC/SAML). Short-lived access tokens + longer-lived refresh tokens. MFA and risk-based auth policies. Audit every authentication event.
Key components: Identity store, credential verification, token service (JWT), session store, MFA service, audit log, rate limiter on all auth endpoints
Scalability: Cache public keys via JWKS endpoint. Shard user table by user_id. Rate-limit login hard. Geo-aware risk scoring.
Bottlenecks: Credential stuffing attacks driving 10× spikes, token revocation semantics, session store hotspots.
Data model:
User(user_id, email, password_hash, status)
Session(session_id, user_id, expires_at)
Token(jti, user_id, exp)
AuditEvent(actor, event, ts)
APIs:
POST /login
POST /refresh
POST /logout
GET /.well-known/jwks.json
POST /mfa/verify
Numbers: Design for 10× baseline during attacks. p95 auth token check at services < 10ms, must be cache-heavy.
Hiring signal: You walk through token revocation trade-offs (short TTL vs blacklist) with genuine depth.
Q23. Design Ticket Booking / Reservations (Ticketmaster)
Difficulty: Senior
What it tests: High-concurrency correctness, no double-selling. Clean inventory hold model under extreme demand.
Approach: Seat inventory with short-lived holds before purchase. Confirm atomically. Expire holds via TTL. Idempotent payment + booking finalization. Queue and throttle during onsale events.
Key components: Inventory service, hold/lock manager, checkout and payment, burst queue for onsale spikes, notification workers
Scalability: Partition by event_id. Cache seating map as read-only snapshot. Queue writes during onsale peaks, don't let unconstrained writes hammer the inventory DB.
Bottlenecks: Hot events at onsale (hundreds of thousands hit simultaneously), lock contention on popular seats, bot traffic.
Data model:
Event(event_id, venue_id, start_ts)
Seat(seat_id, event_id, status)
Hold(hold_id, seat_ids[], user_id, expires_at)
Booking(booking_id, status)
APIs:
POST /holds
POST /checkout
POST /bookings/{id}/confirm
GET /events/{id}/seats?cursor=
Numbers: Onsale peak: 1M users in 2 minutes ≈ 8.3k req/sec. Hold TTL: 2–5 minutes. Seat status update p95 < 50ms.
Hiring signal: You mention bots and fairness queue mechanisms unprompted.
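The hold-then-confirm flow above can be sketched as a single-process toy; everything here (the dicts, the 180-second TTL, the function names) is illustrative, and a real system would put this state behind a transactional store with atomic compare-and-set rather than in-memory maps. The two properties worth demonstrating are that a held seat can never be double-sold, and that confirmation is idempotent under payment retries.

```python
import time
import uuid

seats = {"A1": "available", "A2": "available"}  # seat_id -> available | held | sold
holds = {}                                       # hold_id -> (seat_id, expires_at)
confirmed = set()                                # finalized hold_ids, for idempotent retries
HOLD_TTL_S = 180                                 # within the 2-5 minute range above

def _expire_stale():
    now = time.time()
    for hold_id, (seat_id, exp) in list(holds.items()):
        if now > exp:
            seats[seat_id] = "available"         # hold lapsed: seat returns to the pool
            del holds[hold_id]

def acquire_hold(seat_id):
    _expire_stale()
    if seats.get(seat_id) != "available":
        return None                              # held or sold: refuse, never double-sell
    hold_id = str(uuid.uuid4())
    seats[seat_id] = "held"
    holds[hold_id] = (seat_id, time.time() + HOLD_TTL_S)
    return hold_id

def confirm(hold_id):
    if hold_id in confirmed:
        return True                              # idempotent: retry of a finished purchase
    if hold_id not in holds:
        return False                             # unknown or already-expired hold
    seat_id, exp = holds.pop(hold_id)
    if time.time() > exp:
        seats[seat_id] = "available"
        return False
    seats[seat_id] = "sold"
    confirmed.add(hold_id)
    return True
```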
Q24. Design Logging / Metrics / Monitoring
Difficulty: Senior
What it tests: Operational maturity. How you think about running systems, not just building them.
Approach: Agents emit logs, metrics, traces → ingest queue → stream processors enrich → time-series DB + log index → query layer + alert manager. Separate hot path (live dashboards) from cold path (historical queries).
Key components: Collectors/agents, ingest gateway, queue, stream processors, time-series DB, log indexing store, query API, alert manager, dashboard layer
Scalability: Sampling for high-volume traces. Enforce cardinality budgets on metric labels. Tiered retention: hot 30d, warm 1y, cold archive.
Bottlenecks: High-cardinality labels exploding storage, bursty log ingestion during incidents, expensive ad-hoc queries during outages.
Data model:
Metric(name, labels{}, ts, value)
Log(service, ts, level, message, trace_id)
Trace(trace_id, spans[])
APIs:
POST /ingest/metrics
POST /ingest/logs
GET /query?expr= (PromQL-like)
POST /alerts/rules
Numbers: 10k hosts × 1k metrics/min = 10M points/min (~166k/sec). Alert latency target < 30–60s for SLO-impacting signals.
Hiring signal: You mention SLOs, cardinality budgets, and why high-cardinality labels are a production hazard, not just an optimization.
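A cardinality budget can be as simple as counting distinct label combinations per metric name and refusing new series past a limit. This is a hypothetical sketch (the `admit` function, the tiny budget of 3, and the in-memory tracking are all for illustration; production systems keep budgets per metric in the thousands and often re-route overflow to a quarantine tier instead of dropping it):

```python
from collections import defaultdict

CARDINALITY_BUDGET = 3          # tiny for illustration; real budgets are far larger

seen_series = defaultdict(set)  # metric name -> label-sets already admitted

def admit(metric, labels):
    """Accept a sample only if its label combination fits the metric's budget."""
    series = tuple(sorted(labels.items()))
    if series in seen_series[metric]:
        return True                               # existing series: always fine
    if len(seen_series[metric]) >= CARDINALITY_BUDGET:
        return False                              # budget exhausted: drop new series
    seen_series[metric].add(series)
    return True
```

The point to make in the interview: an unbounded label like `user_id` creates one series per value, and each series costs index and storage space forever, which is why this check sits at ingest rather than at query time.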
Q25. Design Trending Topics / Top-K / Leaderboards
Difficulty: Senior
What it tests: Streaming aggregation, approximate counting at scale, time-window semantics.
Approach: Event stream → windowed count per key → maintain top-K per window using a min-heap or streaming sketch. Serve from cache. Backfill from raw logs for accuracy when needed.
Key components: Event ingest, stream processor (windowed aggregation), state store, top-K service, cache, serving API
Scalability: Partition by key. Count-Min Sketch or similar for very high cardinality. Separate hot-keys fast path. Pre-aggregate common windows offline.
Bottlenecks: State explosion for high-cardinality dimensions, hot partitions, late-arriving events skewing windows.
Data model:
Event(key, ts)
WindowCount(window_id, key, count)
TopK(window_id, items[], computed_at)
APIs:
POST /events/view
GET /trending?window=1h&k=50
GET /topk?window=5m
Numbers: 1B events/day (~11.6k/sec avg), peak 10×. p95 query < 50ms from cache.
Hiring signal: You clearly state which parts are exact vs approximate and why approximate is sufficient for trending.
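The exact-vs-approximate split can be made concrete with a Count-Min Sketch: fixed memory regardless of key cardinality, and counts that can only over-estimate, never under-estimate. This is a minimal stdlib sketch (the class shape, the blake2b-with-salt row hashing, and the tiny 256x4 table are illustrative choices, not a production configuration):

```python
import hashlib
import heapq

class CountMinSketch:
    """Approximate counter: fixed memory, estimates can only over-count."""
    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        for row in range(self.depth):
            # One independent hash per row, derived by salting with the row index.
            h = hashlib.blake2b(key.encode(), salt=bytes([row])).digest()
            yield row, int.from_bytes(h[:8], "big") % self.width

    def add(self, key):
        for row, col in self._cells(key):
            self.table[row][col] += 1

    def estimate(self, key):
        # Minimum across rows limits collision damage to the unluckiest row.
        return min(self.table[row][col] for row, col in self._cells(key))

def top_k(sketch, candidate_keys, k):
    """Top-K over a tracked candidate set; the approximation lives in the counts."""
    return heapq.nlargest(k, candidate_keys, key=sketch.estimate)
```

This is good enough for trending precisely because over-counting a tail key occasionally does not change which handful of keys dominate the window.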
Q26. Design Video Conferencing (Zoom/Google Meet)
Difficulty: Staff
What it tests: Realtime media constraints, SFU vs MCU vs mesh, graceful degradation under network variability.
Approach: Signaling service negotiates sessions + ICE/SDP. Media flows through an SFU for group calls: the SFU receives every participant's stream and forwards selectively. TURN for NAT traversal. Recording is a completely separate pipeline.
Key components: Signaling service, auth, SFU cluster, TURN/STUN servers, meeting state service, in-meeting chat (WebSocket), recording service
Scalability: Region-based SFU clusters. Simulcast (multiple bitrate tracks). Adaptive subscription based on active speaker. Hard fallback to audio-only under constrained network.
Bottlenecks: SFU CPU and bandwidth limits, packet loss cascading into frozen video, mass-join storms at meeting start.
Data model:
Meeting(meeting_id, host_id, policy)
Participant(meeting_id, user_id, role, joined_at)
Stream(stream_id, user_id, bitrate)
APIs:
POST /meetings
POST /meetings/{id}/join (returns ICE config + SFU endpoint)
WS /signal
POST /meetings/{id}/end
Numbers: 10-person call, ~1.5 Mbps per 720p stream. SFU inbound ≈ 15 Mbps per meeting; outbound fan-out is much larger (each participant can subscribe to the other nine streams) unless simulcast and adaptive subscription cap it.
Hiring signal: You discuss graceful degradation - bandwidth-adaptive quality, simulcast fallback, audio-only mode - confidently, as a first-class design concern.
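The inbound/outbound asymmetry is worth doing as live arithmetic in the interview. A hypothetical helper (the function name and parameters are illustrative) makes the point: the SFU's inbound cost is one uplink per participant, but outbound grows with participants times subscriptions, which is exactly what adaptive subscription exists to cap.

```python
def sfu_bandwidth_mbps(participants, stream_mbps, subscribed=None):
    """Back-of-envelope SFU budget.

    Inbound: each participant sends one uplink stream.
    Outbound: each participant receives the others' streams, optionally
    capped at `subscribed` streams by adaptive subscription.
    """
    others = participants - 1
    per_viewer = others if subscribed is None else min(subscribed, others)
    inbound = participants * stream_mbps
    outbound = participants * per_viewer * stream_mbps
    return inbound, outbound
```

For a 10-person 720p call at 1.5 Mbps per stream this gives 15 Mbps inbound but 135 Mbps outbound if everyone subscribes to everyone, which is why "show only the active speaker plus thumbnails at a lower simulcast layer" is a capacity decision, not a UI nicety.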
Q27. Design an API Gateway
Difficulty: Mid-level
What it tests: Cross-cutting concerns - auth, routing, rate limiting, observability - in shared infrastructure.
Approach: Single entry point that authenticates, rate-limits, routes to upstream services, transforms requests/responses, emits structured telemetry. HA deployment. Config hot-reloadable without restarts.
Key components: Edge load balancer, stateless gateway cluster, auth integration, policy/config store, rate limiter, request tracing with correlation IDs, metrics exporter
Scalability: Stateless gateways scale horizontally. Config hot-reloads without downtime. Multi-region active-active. Per-endpoint fail-open/closed rules.
Bottlenecks: Centralized config bottleneck, TLS termination CPU cost at high RPS, noisy clients starving others.
Data model:
Route(service, path, methods, timeout_ms, retries)
Policy(entity, limits, auth_required)
AuditEvent(actor, route, ts, outcome)
APIs:
- Admin: POST /routes, POST /policies
- Data plane: proxies arbitrary service APIs
- Internal: CheckAuth, CheckLimit
Numbers: Gateway overhead < 5–10ms p95 added latency. Design for 100k RPS with horizontal autoscaling.
Hiring signal: You say "observability" and mean distributed traces with correlation IDs, not just access logs.
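The rate limiter inside a gateway is usually a token bucket: a steady refill rate with room for bursts up to a fixed capacity. A minimal single-node sketch (a real gateway keeps this state per client in a shared store such as Redis; the class below is an in-process illustration):

```python
import time

class TokenBucket:
    """Token-bucket limiter: refill at `rate` tokens/sec, burst up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self, cost=1.0):
        now = time.monotonic()
        # Refill proportionally to elapsed time, never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A per-client bucket is also where the "noisy clients starving others" bottleneck gets solved: one misbehaving client exhausts its own bucket while everyone else's stays full.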
Q28. Design Pastebin / Simple Content Posting Service
Difficulty: Mid-level
What it tests: Clean CRUD, blob storage, abuse controls, and retention policy - without over-engineering.
Approach: Create paste → store content blob in object store + metadata in DB. Optional TTL + cleanup worker. Public pastes are CDN-cacheable. Abuse mitigation: rate limits + optional spam scanning.
Key components: Paste API, metadata DB, object/blob store, CDN (public pastes), optional spam scanner, TTL cleanup worker
Scalability: Shard by paste_id. Cache hot public pastes at CDN edge. Object store for content keeps metadata DB small.
Bottlenecks: Viral pastes before CDN warms up, very large paste content, abuse traffic overwhelming ingest.
Data model:
Paste(paste_id, owner_id?, created_at, expires_at, content_ref, visibility, size_bytes)
AbuseEvent(ip, ts, action)
APIs:
POST /pastes
GET /p/{id}
DELETE /p/{id}
GET /p/{id}/raw
Numbers: 10M pastes/day × 2KB avg = 20GB/day. p95 read < 100ms. Enforce max size limit (e.g., 1MB).
Hiring signal: You add expiry/TTL and a cleanup story without being asked.
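The expiry story is usually two mechanisms working together: lazy expiry on read (so an expired paste is never served, even before cleanup runs) plus a periodic sweep that reclaims storage. A toy in-memory sketch under those assumptions (the `pastes` dict and function names are illustrative; the real metadata lives in the DB and the sweep deletes blobs from the object store):

```python
import time

pastes = {}   # paste_id -> (content_ref, expires_at or None for permanent)

def get_paste(paste_id):
    """Lazy expiry on read: an expired paste is gone before the sweeper runs."""
    record = pastes.get(paste_id)
    if record is None:
        return None
    content_ref, expires_at = record
    if expires_at is not None and time.time() >= expires_at:
        del pastes[paste_id]
        return None
    return content_ref

def sweep():
    """Periodic cleanup worker: reclaims pastes nobody read after expiry."""
    now = time.time()
    expired = [pid for pid, (_, exp) in pastes.items()
               if exp is not None and now >= exp]
    for pid in expired:
        del pastes[pid]           # in reality: also delete the blob + CDN purge
    return len(expired)
```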
Q29. Design a Distributed Lock / Coordination Service
Difficulty: Staff
What it tests: Consensus protocols, lease semantics, coordination failure modes. Chubby and ZooKeeper territory.
Approach: Strongly consistent coordinator using consensus (Raft or Paxos). Clients acquire TTL-based leases with heartbeat renewal. Leader election built on the same primitive. Ephemeral nodes disappear on client disconnect.
Key components: Consensus group (odd number of nodes), elected leader, replicated log, watch/notification service, client library with retry/backoff
Scalability: Keep lock state compact, not a general-purpose data store. Avoid routing high-QPS traffic through it.
Bottlenecks: Leader overload, network partitions triggering false lease expiry, thundering herds on reconnect after partition heals.
Data model:
Lease(lock_id, owner, expires_at)
Event(watcher_id, lock_state_change)
Replicated log entries
APIs:
Acquire(lock_id, ttl)
Renew(lock_id)
Release(lock_id)
Watch(lock_id)
Numbers: p95 < 20ms within region. TTL: typically 5–30 seconds.
Hiring signal: You warn against "coordination everywhere" and explain why distributing coordination indiscriminately kills throughput.
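One failure mode above deserves code: a partition can make a client believe it still holds a lease after the coordinator expired it. The standard mitigation is a fencing token, a strictly increasing number issued with each lease that downstream storage checks before accepting writes. A single-node sketch (the class and the `guarded_write` check are illustrative; in a real system the token check lives in every resource the lock protects):

```python
import time

class LeaseLock:
    """Lease-based lock that issues a strictly increasing fencing token."""
    def __init__(self):
        self.owner = None
        self.expires_at = 0.0
        self.fencing_token = 0

    def acquire(self, client, ttl_s, now=None):
        now = time.time() if now is None else now
        if self.owner is not None and now < self.expires_at:
            return None                      # still held by a live lease
        self.owner, self.expires_at = client, now + ttl_s
        self.fencing_token += 1              # increases across every ownership change
        return self.fencing_token

storage_latest_token = 0

def guarded_write(token):
    """Downstream check: reject writes carrying a stale fencing token."""
    global storage_latest_token
    if token < storage_latest_token:
        return False                         # issued before the current owner's token
    storage_latest_token = token
    return True
```

The key insight to verbalize: the lock service alone cannot make the system safe, because a paused or partitioned client can act after its lease expired; only the resource itself, checking tokens, closes that window.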
Q30. Design Real-Time Analytics / Stream Processing
Difficulty: Staff
What it tests: Connecting ingestion → stream processing → storage → query with correct windowing semantics.
Approach: Collect events → append to message broker → stream processors compute windowed aggregates → write to OLAP/time-series store → serve dashboards with cache. Pre-aggregate common dimensions at minute and hour granularity.
Key components: Client SDK/collector, message broker, stream processing engine (windowed aggregation), state store, OLAP columnar store, query API, dashboard layer
Scalability: Partition by tenant or high-cardinality key. Late event handling via watermarks. Exactly-once-ish via idempotent writes + checkpoints. Pre-aggregate to protect OLAP from unbounded queries.
Bottlenecks: Skewed keys causing hot partitions, state explosion from high-cardinality dimensions, ad-hoc dashboard queries during incidents.
Data model:
Event(user_id, action, ts, attrs{})
Aggregate(metric, window, dims{}, value)
Checkpoint(processor_id, offsets{})
APIs:
POST /events
GET /metrics?from=&to=&dims=
GET /dashboards/{id}
Numbers: 5B events/day (~58k/sec avg), peak 10×. Dashboard SLO: p95 < 1s. Pre-aggregate by minute and hour.
Hiring signal: You talk about late arrivals, watermarks, and the difference between event time and processing time like you've done it, not just read about it.
8 Concepts That Show Up Across Almost Every Question
If you're short on prep time, these eight concepts recur so consistently across the 30 questions that mastering them gives you leverage across the entire question set.
Consistent hashing.
How to partition data across nodes with minimal redistribution when membership changes. Shows up in distributed caches, KV stores, CDNs. Know why it beats modulo hashing.
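A minimal ring with virtual nodes shows both halves of the claim: lookup is a binary search over hash positions, and adding a node remaps only the keys landing in its new arcs. The class below is an illustrative stdlib sketch (names, the 100-vnode default, and the 8-byte hash truncation are arbitrary choices):

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring with virtual nodes for smoother key distribution."""
    def __init__(self, nodes, vnodes=100):
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

    def node_for(self, key):
        # First vnode clockwise from the key's position, wrapping at the end.
        idx = bisect.bisect(self.keys, self._hash(key)) % len(self.keys)
        return self.ring[idx][1]
```

With modulo hashing (`hash(key) % n`), going from 3 to 4 nodes remaps roughly 3/4 of all keys; on the ring, only about 1/4 move (the new node's fair share), which is the whole argument.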
Idempotency keys.
The answer to 'what happens on retry?' in payments, notifications, checkout, and any state-changing operation. Same key twice, one effect. This concept comes up in at least eight of the thirty questions.
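The whole mechanism fits in a lookup-before-execute guard. A hypothetical sketch (the `charge` function and in-memory `processed` dict stand in for a real payment endpoint backed by a durable key store with its own TTL policy):

```python
processed = {}   # idempotency_key -> stored result of the first successful attempt

def charge(idempotency_key, amount):
    """Same key twice, one effect: retries return the original result verbatim."""
    if idempotency_key in processed:
        return processed[idempotency_key]        # replay, no second charge
    # The side effect (charging the card) happens exactly once, here.
    result = {"charged": amount, "txn": f"txn-{len(processed) + 1}"}
    processed[idempotency_key] = result
    return result
```

The subtlety worth mentioning aloud: the stored result must be the full response, not just a "done" flag, so the retrying client receives an answer identical to the first one.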
Fanout strategies.
Fanout-on-write (precompute and push) vs fanout-on-read (compute at query time). Trade write amplification against read latency. The right choice depends entirely on the follow-graph distribution - neither is universally correct.
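The usual resolution is a hybrid keyed on follower count, which can be sketched in a few lines. Everything here is illustrative (the threshold of 2 is absurdly small to keep the example testable; real systems use thresholds in the thousands and store feeds in a cache tier):

```python
FANOUT_WRITE_LIMIT = 2   # tiny for illustration; real thresholds are in the thousands

feeds = {}               # user_id -> precomputed feed (fanout-on-write)
celebrity_posts = {}     # author_id -> posts merged at read time (fanout-on-read)

def publish(author, followers, post):
    if len(followers) <= FANOUT_WRITE_LIMIT:
        for f in followers:                       # push: cheap reads, write amplification
            feeds.setdefault(f, []).append(post)
    else:
        # Hot author: one write here instead of millions of feed inserts.
        celebrity_posts.setdefault(author, []).append(post)

def read_feed(user, followed_celebrities):
    merged = list(feeds.get(user, []))
    for celeb in followed_celebrities:            # merge hot authors at query time
        merged.extend(celebrity_posts.get(celeb, []))
    return merged
```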
Append-only logs.
Messages in a queue, events in analytics, operations in collaborative editing, entries in a payment ledger. Immutable appends are easier to replicate, reason about, and audit than mutable updates.
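The ledger case makes the idea concrete: state is derived by replaying immutable entries, and a correction is another append rather than an edit, so the audit trail never lies. A toy sketch (the list-backed `ledger` and the function names are illustrative; real ledgers snapshot periodically instead of replaying from zero):

```python
ledger = []   # append-only: corrections are new entries, never edits

def append_entry(account, delta):
    ledger.append({"account": account, "delta": delta})

def balance(account):
    """Derived state: replay the immutable log instead of mutating a balance cell."""
    return sum(e["delta"] for e in ledger if e["account"] == account)
```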
Cursor-based pagination.
Offset pagination breaks under concurrent writes. A cursor (opaque token encoding position) is stable, scalable, and what you should reach for in feeds, listings, and change-log APIs.
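Why offset breaks and a cursor doesn't can be shown in one page function: the cursor pins a `(ts, id)` position, so rows inserted before it shift nothing. An illustrative sketch (the in-memory `items` list, the base64-JSON cursor encoding, and the `(ts, id)` sort key are all assumptions standing in for a real indexed query like `WHERE (ts, id) > (?, ?) ORDER BY ts, id LIMIT ?`):

```python
import base64
import json

items = [{"id": i, "ts": 1000 + i} for i in range(10)]   # sorted by (ts, id)

def encode_cursor(last_item):
    """Opaque token encoding the position after the last returned row."""
    payload = {"ts": last_item["ts"], "id": last_item["id"]}
    return base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()

def page(cursor, limit):
    after = (-1, -1)
    if cursor:
        c = json.loads(base64.urlsafe_b64decode(cursor))
        after = (c["ts"], c["id"])
    # Resume strictly after (ts, id): stable under concurrent inserts/deletes
    # earlier in the ordering, where an offset would drift.
    batch = [it for it in items if (it["ts"], it["id"]) > after][:limit]
    next_cursor = encode_cursor(batch[-1]) if batch else None
    return batch, next_cursor
```

The tie-breaking `id` in the sort key matters: with `ts` alone, two rows sharing a timestamp could be skipped or duplicated across pages.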
Reserve + confirm (two-phase operations).
Inventory holds, seat locks, payment intents. Reserve with a short TTL, then confirm. This is how you prevent overselling without distributed transactions. Shows up directly in at least five questions.
Shard by the access pattern.
Shard messages by conversation_id because you read conversations, not user histories. Shard feed items by user_id because you read one user's feed. The access pattern determines the shard key, not the entity.
Observability as a design constraint.
The strongest candidates treat monitoring, tracing, and alerting as architectural concerns, not afterthoughts. Mention key metrics, SLOs, and what you'd alert on for each critical path. It signals you've been on-call.
System Design Glossary Quick Brush-Up
Scan this in two minutes before your interview. These are the terms candidates most often stumble on.
Categories: Fundamentals, Trade-offs, Storage, Caching, Distributed Systems, Traffic & Load, Messaging, Observability, Resilience.
Frequently Asked Questions About System Design Interviews
What are system design interview questions actually testing?
They are not just testing whether you can draw boxes on a whiteboard. Most interviewers are scoring how you scope the problem, estimate scale, reason through trade-offs, handle concurrency, and think about operational realities like monitoring, rollout, and failure modes.
How should I structure a strong system design interview answer?
A strong answer usually follows a repeatable order: restate the prompt, clarify scope and success metrics, estimate traffic and storage, define the data model and APIs, outline the high-level architecture, go deep on the riskiest component, explain trade-offs, then close with bottlenecks and monitoring.
Should I memorize system design interview answers?
No. Memorizing polished answers usually hurts more than it helps. Strong candidates internalize a structure they can reuse under pressure, then adapt it to the exact problem in front of them.
What are the most important concepts to know before a system design interview?
If time is short, focus on concepts that recur across many questions: consistent hashing, idempotency keys, fanout strategies, append-only logs, cursor-based pagination, reserve-and-confirm flows, sharding based on access pattern, and observability.
How long should a sample system design answer be in an interview?
A strong opening answer is often scoped to the first 8 to 12 minutes. That is usually enough time to show structure, judgment, and trade-off awareness before the interviewer starts pushing deeper into one part of the design.
What mistakes cause candidates to lose points in system design rounds?
The most common mistakes are skipping clarifying questions, drawing architecture too early, ignoring capacity estimates, hand-waving over concurrency issues, over-engineering simple systems, and failing to mention what breaks first in production.
What is a good way to practice system design interview questions?
Pick a small set of common questions and practice giving structured 10-minute answers out loud. Time pressure matters. Mock interviews are especially useful because they force you to communicate trade-offs clearly instead of just thinking through them privately.
Which system design interview questions come up most often?
Common recurring questions include designing a URL shortener, social feed, chat system, rate limiter, autocomplete, ride-sharing backend, distributed cache, checkout flow, video streaming service, pub/sub system, search engine, payment system, and ticket booking platform.

