The candidate drew boxes on the whiteboard with practiced confidence. Load balancers, application servers, databases, caches, message queues—all the components you'd expect to see in a distributed systems diagram. He explained the data flow clearly, mentioned relevant technologies, and handled follow-up questions competently. The interviewer marked him as a strong hire.
Three months later, the same engineer designed a system so overengineered for its requirements that it took twice as long to build as necessary and introduced complexity that nobody could maintain. The architecture looked impressive on paper but was completely wrong for the problem at hand.
The system design interview had tested whether the candidate knew components—whether he could draw the boxes and name the technologies. It hadn't tested whether he could make good architectural decisions: choosing the right approach for the constraints, understanding trade-offs, and building something appropriate for the actual requirements rather than a generic "scalable" system.
These are different skills. Knowing that you could use Redis for caching doesn't mean you know when you should use Redis for caching. Knowing how to build a microservices architecture doesn't mean you know when a monolith would be the better choice. The gap between component knowledge and architectural judgment is where mediocre system design interviews fail.
At SmithSpektrum, I've helped over 60 companies design their technical interview processes, and system design is consistently where calibration matters most[^1]. Done well, these interviews are the best signal for senior engineering judgment. Done poorly, they're expensive exercises that test memorization rather than thinking.
What System Design Should Actually Test
System design interviews should test judgment, not vocabulary.
The ability to gather requirements matters first. Real architectural work starts with understanding what you're building and why. Engineers who dive into solutions without asking questions will build the wrong thing. In the interview, watch for whether candidates ask clarifying questions before designing: How many users? What are the access patterns? What's more important—latency or throughput? Consistency or availability? What can we afford to get wrong?
Trade-off reasoning is the core skill. Every architectural choice involves trade-offs. Choosing SQL over NoSQL typically trades schema flexibility for consistency guarantees. Choosing microservices over a monolith trades operational simplicity for deployment independence. Strong candidates articulate these trade-offs explicitly: not just "I chose X" but "I chose X over Y because, given our constraints of Z, the benefits outweigh the costs." Candidates who present architectures without discussing alternatives are showing memorization, not reasoning.
Practical judgment separates senior engineers from juniors. Given the constraints—time, team size, budget, scale requirements—what's the appropriate level of complexity? Overengineering for theoretical scale is as much a failure as underengineering for real requirements. The candidate who proposes a distributed system for a problem that needs a simple database isn't demonstrating sophistication; they're demonstrating poor judgment.
Communication ability matters because architecture is fundamentally about coordination. The engineer who can't explain their thinking clearly won't be able to align a team on technical direction. Watch for how they structure their explanation, whether they check for understanding, and whether they can adjust their communication based on feedback.
Depth in at least one area distinguishes strong candidates from those with only surface knowledge. The best system design candidates can go deep when probed—explaining exactly how a particular component works, what failure modes it has, how they've dealt with similar problems before. Breadth without depth is memorization; depth demonstrates real understanding.
Designing Good Problems
The problem you choose determines what you can assess.
Good problems are open-ended with multiple valid solutions. If there's one right answer, you're testing whether candidates memorized it, not whether they can reason about architecture. "Design a URL shortener" works because you can build it many ways—the trade-offs around ID generation, storage, caching, and analytics create a rich discussion space.
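To make the ID-generation trade-off concrete, here is a minimal Python sketch of two approaches a candidate might propose. The first base62-encodes a sequential counter, which gives short, collision-free codes but needs a coordinated counter and makes codes guessable. The second truncates a hash of the URL, which needs no coordination but forces the caller to detect collisions. The function names and the 7-character length are illustrative assumptions, not a recommended design.

```python
import hashlib
import string

# 62-character alphabet: 0-9, a-z, A-Z
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n: int) -> str:
    """Encode a non-negative integer (e.g. an auto-increment row ID) in base62."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def hash_code(url: str, length: int = 7) -> str:
    """Derive a code from a SHA-256 hash of the URL. No central counter is
    needed, but two URLs can map to the same code, so the caller must check."""
    digest = int.from_bytes(hashlib.sha256(url.encode("utf-8")).digest(), "big")
    return encode_base62(digest)[:length]
```

Either choice opens natural follow-ups: how the counter is shared across servers, or what happens when two URLs collide.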
Whatever problem you pick, score the conversation on the same dimensions. A sample rubric:
| Assessment Area | Weight | Strong Signal | Weak Signal |
|---|---|---|---|
| Requirements clarification | 15% | Asks business context, scale, constraints | Jumps to solution |
| High-level design | 25% | Clear components, data flow, APIs | Vague boxes |
| Deep dive capability | 25% | Can go deep on any component | Surface-level only |
| Trade-off discussion | 20% | Articulates pros/cons, makes decisions | "It depends" without reasoning |
| Scaling & reliability | 15% | Considers failure modes, bottlenecks | Only happy path |
Good problems are realistic in scope for a 45-60 minute conversation. "Design Google Search" is too broad—you can't meaningfully discuss the architecture of a trillion-document search system in an hour. "Design a rate limiter" is appropriately scoped—complex enough for interesting discussion, bounded enough to complete.
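For reference when probing the rate-limiter problem, here is a minimal single-process token-bucket sketch, one common algorithm candidates propose (fixed windows, sliding windows, and leaky buckets are others). The class name, parameters, and injectable clock are illustrative assumptions; a production limiter would also need state shared across servers, which is exactly the kind of extension worth discussing.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket rate limiter: holds up to `capacity` tokens, refilled
    continuously at `refill_rate` tokens per second. Each request spends one."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock  # injectable so behavior can be tested deterministically
        self.tokens = capacity  # start full, allowing an initial burst
        self.last_refill = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Credit tokens for the time elapsed since the last call, capped at capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```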
Good problems don't require specialized domain knowledge. Unless you're hiring for a specific domain, problems should be accessible to any competent engineer. A notification system, a booking system, a feed—these are understandable without specialized expertise.
Good problems have depth opportunities where you can probe further. After the high-level design, you should be able to dive into specific components: "How would this cache work exactly?" "What happens when this service fails?" "Walk me through the data model."
Running the Interview Well
Structure matters. Without structure, system design interviews become unfocused conversations that produce inconsistent signal.
Start with the problem and minimal initial scope. "Design a system that allows users to create shortened URLs and redirect users who visit those URLs." Don't front-load requirements—see if the candidate asks clarifying questions. That's a signal in itself.
The clarifying questions phase should take five to ten minutes. The candidate should be asking about scale (how many URLs created per day? how many redirects?), requirements (do shortened URLs expire? do we need analytics?), and constraints (is there a latency budget? any cost constraints?). If they jump straight into drawing boxes, note that—they may not gather requirements well in real work either.
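The answers to those scale questions feed directly into a back-of-envelope estimate, which is worth asking the candidate to do out loud. A minimal Python sketch; every input below (1 million new URLs per day, a 100:1 read-to-write ratio, 500 bytes per record, five-year retention) is a hypothetical value chosen for illustration, and the results are averages with no peak-traffic factor.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate(urls_per_day: int, read_write_ratio: int,
             bytes_per_record: int, retention_years: int) -> dict:
    """Back-of-envelope averages for a URL shortener (no peak-traffic factor)."""
    write_qps = urls_per_day / SECONDS_PER_DAY
    read_qps = write_qps * read_write_ratio  # redirects per second
    records = urls_per_day * 365 * retention_years
    storage_gb = records * bytes_per_record / 1e9
    return {
        "write_qps": round(write_qps, 1),
        "read_qps": round(read_qps, 1),
        "storage_gb": round(storage_gb, 1),
    }


# Hypothetical inputs for illustration only
print(estimate(urls_per_day=1_000_000, read_write_ratio=100,
               bytes_per_record=500, retention_years=5))
# {'write_qps': 11.6, 'read_qps': 1157.4, 'storage_gb': 912.5}
```

At these hypothetical averages (about 12 writes and 1,200 reads per second, under a terabyte over five years), a candidate should notice the problem is modest. One who jumps to sharding here may be overengineering; one who asks about peak traffic is thinking ahead.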
The high-level design phase should take fifteen to twenty minutes. The candidate should sketch the major components and how they interact. At this stage, you're looking for a coherent architecture that addresses the requirements they gathered, not perfection.
The deep dive phase should take another fifteen to twenty minutes. Choose one or two components and probe deeply. If they proposed a caching layer, ask them to explain exactly how it works—invalidation strategy, eviction policy, what happens on cache miss. If they proposed a database schema, ask about indexes, query patterns, and how it scales. This is where you separate candidates with surface knowledge from those with real understanding.
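For the schema probe, this is the level of detail a strong answer tends to reach: a table, the index behind each query pattern, and the queries themselves. A hypothetical URL-shortener schema using Python's built-in `sqlite3` module; the table, column, and function names are illustrative, and the choice of database is itself part of the discussion.

```python
import sqlite3
from typing import List, Optional

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE urls (
        code       TEXT PRIMARY KEY,  -- short code; the hot redirect lookup
        long_url   TEXT NOT NULL,
        user_id    INTEGER,           -- creator, for a "my links" page
        created_at INTEGER NOT NULL,  -- Unix epoch seconds
        expires_at INTEGER            -- NULL means the link never expires
    );
    -- Composite index serving "list a user's links, newest first"
    CREATE INDEX idx_urls_user_created ON urls (user_id, created_at);
    """
)


def create(code: str, long_url: str, user_id: int, now: int,
           expires_at: Optional[int] = None) -> None:
    with conn:  # commit on success, roll back on error
        conn.execute("INSERT INTO urls VALUES (?, ?, ?, ?, ?)",
                     (code, long_url, user_id, now, expires_at))


def resolve(code: str, now: int) -> Optional[str]:
    """Redirect lookup by primary key; expired links resolve to None."""
    row = conn.execute(
        "SELECT long_url FROM urls WHERE code = ? "
        "AND (expires_at IS NULL OR expires_at > ?)",
        (code, now),
    ).fetchone()
    return row[0] if row else None


def links_for_user(user_id: int, limit: int = 10) -> List[str]:
    rows = conn.execute(
        "SELECT code FROM urls WHERE user_id = ? "
        "ORDER BY created_at DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()
    return [r[0] for r in rows]
```

Good follow-ups from here: which query dominates (the redirect lookup, served by the primary key), how expired rows get cleaned up, and what changes once the table outgrows one machine.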
Extensions and edge cases should take five to ten minutes. What happens when a component fails? What if scale increases 10x? What if requirements change in a particular way? This tests adaptability and reveals whether they've thought through operational realities.
Throughout, your role as interviewer is to be a collaborative partner, not an adversary. Provide information when asked. Offer helpful constraints if they're going down an unproductive path. Ask "why" frequently—the reasoning is more important than the specific choice.
Evaluation That Produces Signal
Evaluation should be structured against explicit criteria, not gut feel.
Requirements gathering should be weighted meaningfully—around 15% of the evaluation. Did they identify key requirements? Did they scope appropriately? Did they recognize ambiguity and seek clarification? A candidate who dives into designing without understanding requirements will build the wrong system in practice.
High-level design should be around 25% of the evaluation. Is the architecture reasonable for the requirements? Does it address the major components needed? Is it appropriately complex—not oversimplified, not overengineered?
Trade-off reasoning should be around 25% of the evaluation. This is the core skill. Can they articulate why they chose their approach? Do they understand alternatives? Can they explain when their approach would be the wrong choice? Candidates who can't discuss trade-offs are showing memorization, not architecture skills.
Technical depth should be around 20% of the evaluation. When you probe, do they have real understanding or just surface knowledge? Can they go deep in at least one area—explaining implementation details, failure modes, and real-world considerations?
Communication should be around 15% of the evaluation. Can they explain complex ideas clearly? Do they structure their thinking logically? Can they adapt based on feedback?
For each dimension, use a consistent rubric. A four-point scale works well: 1 (poor) means significant gaps or red flags, 2 (concerning) means below expectations with missing elements, 3 (good) means meets expectations with reasonable performance, 4 (strong) means exceeds expectations with notable strengths.
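Applying those weights is simple arithmetic, but writing it down keeps interviewers consistent. A minimal Python sketch, assuming the approximate weights from the paragraphs above and the 1-4 scale; the dimension keys and function name are illustrative.

```python
# Approximate weights from the evaluation dimensions above; they sum to 1.0.
WEIGHTS = {
    "requirements": 0.15,
    "high_level_design": 0.25,
    "trade_offs": 0.25,
    "technical_depth": 0.20,
    "communication": 0.15,
}


def weighted_score(scores: dict[str, int]) -> float:
    """Combine per-dimension 1-4 scores into one weighted score on the same 1-4 scale."""
    if set(scores) != set(WEIGHTS):
        raise ValueError(f"expected scores for exactly: {sorted(WEIGHTS)}")
    for dim, s in scores.items():
        if s not in (1, 2, 3, 4):
            raise ValueError(f"{dim}: score must be 1-4, got {s}")
    return round(sum(WEIGHTS[d] * s for d, s in scores.items()), 2)


print(weighted_score({
    "requirements": 3,
    "high_level_design": 3,
    "trade_offs": 4,
    "technical_depth": 2,
    "communication": 3,
}))  # 3.05
```

A single number summarizes; it doesn't decide. A 1 on any dimension signals significant gaps or red flags and deserves discussion in the debrief even when the weighted average looks fine.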
What Strong Looks Like
Strong candidates demonstrate patterns you can recognize with practice.
They start with questions, not answers. Before drawing anything, they want to understand what they're building. "Before I start designing, can I ask a few questions about the requirements?"
They state assumptions explicitly. "I'm going to assume this is read-heavy, probably a 100:1 read-to-write ratio. Does that sound reasonable?" This shows they're thinking about constraints and gives you opportunities to adjust if needed.
They explain trade-offs without being prompted. "I'm choosing a SQL database here over NoSQL because we need strong consistency for the payment data, and the access patterns are well-suited to relational queries. If this were more about flexible schemas or horizontal scaling, I'd weigh the options differently."
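To ground what that transactional guarantee buys for payment data, here is a minimal sketch of an atomic transfer using Python's built-in `sqlite3` module: both balance updates commit together or not at all. The `accounts` table and `transfer` function are hypothetical, and the credit is deliberately applied before the debit so that a rejected overdraft shows the rollback undoing it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    """Move `amount` between accounts atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, on purpose: if the debit fails, this must be undone
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
    except sqlite3.IntegrityError:
        return False  # CHECK rejected an overdraft; the credit was rolled back
    return True


def balance(account: str) -> int:
    return conn.execute("SELECT balance FROM accounts WHERE id = ?",
                        (account,)).fetchone()[0]
```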
They acknowledge limits. "I'm not certain about the exact numbers here, but my intuition is..." Strong candidates don't bluff; they're honest about what they know and don't know.
They iterate when they realize issues. "Actually, given what you said about latency, I'd change this to..." The ability to adjust based on new information is crucial.
They can go deep when probed. "Let me walk through exactly how this cache would work. We'd use a write-through strategy with TTL-based expiration..."
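A minimal sketch of what that answer describes can help you check whether a candidate's explanation holds together. It assumes a plain dict stands in for the database and uses an injectable clock; the class and method names are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class WriteThroughCache:
    """Write-through cache with TTL expiration in front of a backing store.

    Writes go to the store first, then the cache, so a successful write is
    reflected in both. Reads serve unexpired entries and fall back to the
    store on a miss or expiry, repopulating the cache."""

    def __init__(self, store: Dict[str, str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.store = store  # stand-in for the database
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def put(self, key: str, value: str) -> None:
        self.store[key] = value  # write to the source of truth first
        self._entries[key] = (value, self.clock() + self.ttl)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]  # cache hit
        self._entries.pop(key, None)  # discard any expired entry
        value = self.store.get(key)  # cache miss: read from the store
        if value is not None:
            self._entries[key] = (value, self.clock() + self.ttl)
        return value
```

The sketch exposes the trade-offs worth probing: a write that bypasses the cache (another service updating the store directly) stays invisible until the TTL expires, so the TTL bounds staleness; and there is no size limit or eviction policy, which a real design would need.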
Red Flags That Predict Problems
Certain patterns reliably predict poor performance in actual work.
No clarifying questions means they'll build without understanding requirements. If they jump straight into architecture, they may be good at memorized patterns but not at solving real problems.
"The right answer is..." indicates they don't understand trade-offs. There's rarely one right answer in system design—there are choices with different trade-offs for different contexts.
Unnecessary complexity suggests overengineering tendencies. Adding microservices, Kafka, and a distributed cache to a problem that could be solved with a single server and a database is a red flag, not a demonstration of sophistication.
Can't explain why reveals memorization without understanding. If they can describe what but not why, they've learned patterns without understanding when to apply them.
Defensive when probed suggests they may not collaborate well. Senior engineers need to handle challenges to their designs gracefully, whether they come from an interviewer or a teammate.
Can't go deep anywhere means surface knowledge without real expertise. Breadth is good, but if they can't dive into any area, they may struggle with implementation.
Level Calibration
Expectations should scale with level.
Mid-level engineers should be able to design a reasonable system with guidance, understand the basic components, and articulate some trade-offs when prompted. They may need hints to cover all important areas and may miss edge cases.
Senior engineers should design independently with comprehensive coverage, articulate trade-offs unprompted, demonstrate depth in multiple areas, and handle constraints and extensions gracefully. They should need minimal guidance to produce a solid design.
Staff and principal engineers should show nuanced trade-off reasoning that considers organizational and operational context, think about how systems evolve over time, consider cross-system implications, and demonstrate strategic thinking about technology choices. Their designs should clearly reflect significant experience.
Calibrate your evaluations to level. Holding a mid-level candidate to staff expectations will reject nearly everyone; passing a staff candidate who performs at mid-level leads to bad hires.
The candidate who drew all the right boxes but designed poorly in practice? The company changed its interview approach. More focus on why, less on what. More probing of trade-offs, more evaluation of reasoning.
The next strong candidate they hired drew a simpler diagram but could explain every decision. When asked why she chose a particular approach, she explained three alternatives and why each would be worse for the given constraints.
Six months later, her designs were in production, handling exactly the load she'd estimated. She'd built what was needed—not what would look impressive in a diagram, but what would actually work.
References
[^1]: SmithSpektrum interview design consulting, system design analysis, 2020-2026.
[^2]: Alex Xu, "System Design Interview" (Volumes 1 and 2).
[^3]: Google re:Work, "Structured Interview Guidance."
[^4]: Educative.io, "Grokking the System Design Interview," adapted patterns.
Designing system design interviews? Contact SmithSpektrum for interview design and calibration.
Author: Irvan Smith, Founder & Managing Director at SmithSpektrum