SmithSpektrum Blog

The engineering director had been trying to hire a DevOps engineer for six months. She'd reviewed over two hundred resumes. She'd interviewed maybe thirty candidates. None had worked out.

"Everyone either wants to just write Terraform all day or has never actually operated production systems," she said. "Where are the people who can do both?"

She'd discovered what many companies learn the hard way: "DevOps Engineer" means wildly different things to different people. To some candidates, it means CI/CD pipeline configuration. To others, it means Kubernetes administration. To still others, it's essentially sysadmin work with a fancier title. The term has become so overloaded that posting a job for "DevOps Engineer" without careful definition attracts a grab bag of candidates who share a job title but little else.

The solution isn't finding a better title—though title precision helps. The solution is defining exactly what you need, then finding candidates who match that specific profile. At SmithSpektrum, after placing over 150 DevOps, SRE, and Platform Engineers, I've learned that successful hiring in this space starts with ruthless specificity about the role[^1].

The Role Confusion Problem

Let me be direct about the taxonomy, though the boundaries blur in practice.

A DevOps Engineer, in the original sense, focused on bridging development and operations: building CI/CD systems, automating deployment, improving developer experience. The emphasis was on tooling and process improvement.

A Site Reliability Engineer, in Google's model that many companies have adopted, is fundamentally a software engineer who happens to work on reliability: building systems to reduce toil, writing software to improve operations, applying engineering rigor to operational problems. The SRE title carries an expectation of stronger software development skills.

A Platform Engineer builds internal platforms: the self-service infrastructure that allows product teams to deploy, monitor, and operate their applications without waiting for a central team. It's DevOps scaled up—instead of helping teams one by one, you're building systems that help all teams at once.

An Infrastructure Engineer focuses on the underlying cloud infrastructure: networking, compute, storage, security configurations. It's closer to traditional systems work, modernized for cloud.

These distinctions matter because they imply different skills, different backgrounds, and different work. An excellent Platform Engineer might be a mediocre incident responder. A great SRE might hate building CI/CD pipelines. Hiring generic "DevOps" without understanding which flavor you need produces mismatches.

Defining What You Actually Need

Before posting a job, answer some questions honestly.

What problems are you solving? If your pain is slow deployments and manual processes, you need someone who can build automation and improve developer workflows—closer to traditional DevOps. If your pain is production incidents and reliability issues, you need someone who can operate systems under pressure and improve their resilience—closer to SRE. If developers are blocked waiting for infrastructure and you want to enable self-service, you need a Platform Engineer.

DevOps Focus Area	Key Skills	Interview Assessment	Common Title
Infrastructure	Terraform, CloudFormation, Kubernetes	Infra design exercise	Platform Engineer
CI/CD	Jenkins, GitHub Actions, ArgoCD	Pipeline design	Build Engineer
Reliability	Monitoring, incident response, SLOs	On-call scenario	SRE
Security	Secrets management, compliance, scanning	Security review	DevSecOps
Automation	Python/Go scripting, custom tooling	Automation problem	Tools Engineer

What's the balance of building versus operating? Some roles are 80% building new infrastructure and tooling, 20% handling incidents and operational issues. Others are the reverse. The people who want to build all day and the people who thrive on incident response are often different people. Be honest about your ratio.

What's the on-call expectation? This is often the dealbreaker in DevOps hiring. Candidates will ask, and you need a real answer. How often? How severe? What support exists? Some excellent candidates will refuse any on-call; others expect it and want to know it's taken seriously.

What's the tech stack? Not a laundry list, but the reality. If you're an AWS shop running Kubernetes with Terraform, say so. If you're a GCP shop running Cloud Run with Pulumi, say so. Candidates have preferences and expertise; let them self-select.

Skills That Matter

The core technical skills for DevOps-adjacent roles are reasonably consistent, though emphasis varies.

Linux fundamentals are essential and non-negotiable. Process management, file systems, networking, troubleshooting—the ability to get into a system and understand what's happening is foundational. Someone who can only operate through abstractions will hit walls when those abstractions fail.

Networking basics matter more than people expect. DNS, load balancing, TCP/IP, debugging connectivity issues—these come up constantly. The candidate who can't reason about why traffic isn't reaching a service will struggle.

Cloud platform knowledge is expected: AWS, GCP, or Azure, depending on your environment. Not just surface-level familiarity, but understanding of the services, their trade-offs, their failure modes. Multi-cloud experience is increasingly valuable as companies avoid lock-in.

Infrastructure as Code has become table stakes. Terraform is the most common, but the skill is more about the approach—declarative infrastructure, state management, modules and abstraction—than the specific tool. Someone skilled in Terraform can learn Pulumi.

CI/CD systems knowledge includes both understanding the patterns (deployment strategies, pipeline stages, environment management) and experience with specific tools. The patterns transfer; the tools are learnable.

Containers and Kubernetes have gone from nice-to-have to expected for most roles. The depth required varies—operating a large Kubernetes cluster is different from using containers in a managed service—but the concepts are essential.

Monitoring and observability span from basic (what to monitor, how to alert) to advanced (tracing, SLOs, observability platform design). The right depth depends on the role.

Beyond technical skills, DevOps roles require collaboration and communication that many technical roles don't. These engineers work across teams constantly: with developers who need deployment help, with product people who need to understand infrastructure constraints, with leadership who needs to understand costs and reliability trade-offs. Someone who can't communicate across boundaries will struggle regardless of technical skill.

Compensation Reality

DevOps engineers command strong compensation, driven by demand that exceeds supply.

In major US tech hubs, junior DevOps engineers (zero to two years) typically earn base salaries from $110K to $140K, with total compensation reaching $130K to $170K. Mid-level (two to five years) runs $145K to $180K base and $175K to $240K total. Senior (five to eight years) earns $175K to $220K base and $240K to $330K total. Staff level pushes $210K to $260K base with total compensation from $310K to $430K.

Specializations carry premiums. Deep Kubernetes expertise adds 10-15%. Security focus (DevSecOps, the awkward portmanteau) adds 10-20%. ML/AI infrastructure experience adds 15-25% because so few people have it. Multi-cloud architecture expertise adds 10-15%. FinOps and cost optimization are emerging premium areas.

SRE roles typically command 5-15% more than equivalent DevOps roles because of the stronger software engineering expectation. Platform Engineering pays similarly to SRE. The premium reflects the scarcity of people who can both write good code and operate production systems.

When budgeting, don't assume DevOps compensation matches general software engineering. It's often higher, and lowballing offers will cost you candidates.
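To make the budgeting arithmetic concrete, here is a minimal Python sketch that applies a single specialization premium to the base salary bands quoted above. The figures come from this article. The function name `budget_base_range`, the dictionary keys, and the decisions to apply the premium to base salary and to apply only one premium at a time are assumptions made for illustration, since how premiums combine in a real offer will vary.

```python
from typing import Optional, Tuple

# Base salary bands (USD) quoted in this article for major US tech hubs.
BASE_BANDS = {
    "junior": (110_000, 140_000),
    "mid": (145_000, 180_000),
    "senior": (175_000, 220_000),
    "staff": (210_000, 260_000),
}

# Specialization premiums quoted in this article, as (low %, high %).
PREMIUMS_PCT = {
    "kubernetes": (10, 15),
    "security": (10, 20),
    "ml_infrastructure": (15, 25),
    "multi_cloud": (10, 15),
}


def budget_base_range(level: str, specialization: Optional[str] = None) -> Tuple[int, int]:
    """Return a (low, high) base salary range in USD.

    Assumption: the premium is applied to base salary, the low premium to the
    low end of the band and the high premium to the high end. Premiums are
    not stacked.
    """
    low, high = BASE_BANDS[level]
    if specialization is None:
        return low, high
    low_pct, high_pct = PREMIUMS_PCT[specialization]
    # Integer arithmetic avoids floating-point rounding on currency values.
    return low * (100 + low_pct) // 100, high * (100 + high_pct) // 100


if __name__ == "__main__":
    print(budget_base_range("senior", "kubernetes"))
```

Under these assumptions, a senior engineer with deep Kubernetes expertise lands at roughly $192.5K to $253K base, which is a useful sanity check before an offer goes out.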

Sourcing Strategies

DevOps candidates are hard to source: they're in high demand, and most are not actively browsing job boards.

Referrals remain the highest-quality source. If you have any DevOps people already, their networks are gold. If you don't, ask your engineering team broadly—someone knows a good DevOps engineer even if they've never worked with one directly.

DevOps and cloud meetups and conferences put you in front of people who care about this work. Local DevOpsDays events, Kubernetes meetups, and AWS user groups all attract engaged practitioners. Sponsoring is expensive but targeted.

Cloud community events (AWS re:Invent, Google Cloud Next, KubeCon) are large but concentrated. The density of relevant candidates makes them worthwhile if you can justify the investment.

LinkedIn outreach can work but requires care. Generic "DevOps opportunity" messages get deleted. Specific outreach that mentions your actual infrastructure, the interesting problems you're solving, and what makes the role distinctive might get responses. Mention on-call honestly—DevOps engineers will ask anyway.

Reddit communities (r/devops, r/sre, r/kubernetes) and Hacker News reach engaged practitioners but require thoughtful participation, not just job postings.

What attracts DevOps talent varies, but some themes recur. Modern infrastructure: nobody wants to maintain legacy systems without a path forward. Autonomy: trust to make infrastructure decisions rather than implementing others' specifications. Learning opportunities: cloud certifications, conferences, new technologies. Reasonable on-call: not 24/7 hero culture, but shared responsibility with real support. Developer respect: being a partner, not a ticket-taker who implements requests.

Interview Process Design

A strong DevOps interview process assesses both technical depth and the practical judgment that operations work requires.

I recommend a process that includes: recruiter screen for logistics and basic fit, hiring manager conversation for experience and role alignment, technical deep-dive for systems knowledge, practical exercise for hands-on skills, cross-functional conversation for collaboration ability, and culture/leadership conversation for values and career.

The technical deep-dive should cover Linux fundamentals (troubleshooting, performance investigation, process management), networking (DNS, load balancing, connectivity debugging), cloud architecture (service selection, design decisions, trade-offs), infrastructure as code (structure, patterns, state management, best practices), CI/CD (pipeline design, deployment strategies, environment management), and monitoring (what to alert on, how to design observability).

The practical exercise distinguishes candidates who talk well from candidates who do well. Options include debugging a production scenario (give them a broken system and ask them to diagnose it), designing infrastructure (describe requirements and ask them to architect a solution), reviewing Terraform code (show code with issues and ask them to identify problems and improvements), improving a CI pipeline (show a slow or brittle pipeline and ask how they'd improve it), or running an incident simulation (simulate a production issue and observe how they respond and communicate).

What to assess beyond technical knowledge: do they understand why, not just how? Can they reason about systems holistically rather than just individual components? Do they approach troubleshooting systematically? Is their first instinct to automate rather than do things manually? Can they explain complex systems clearly?

Signals That Matter

Certain patterns predict success in DevOps roles.

Strong candidates have operated systems at scale. Not just set them up, but run them, debugged them when things went wrong, learned from failures. They can discuss past incidents thoughtfully—what happened, what they learned, what they'd do differently. They show genuine curiosity about your architecture; they ask questions because they're interested, not just to seem engaged. They balance building and operating; they understand both sides and can talk about the trade-offs between moving fast and maintaining stability.

Weaker candidates have tool expertise without fundamentals. They know Terraform syntax but can't reason about what happens when state gets corrupted. They've never been on-call and may not handle production pressure well. They define their role narrowly—"I just write Terraform" or "I just do Kubernetes"—in ways that won't serve a small team where everyone must flex. They blame developers for operational problems rather than building bridges.

The best signal is how they talk about past incidents. Do they describe what happened with specificity? Do they explain what they learned? Do they take appropriate ownership without excessive blame? Incident discussion reveals character and experience in ways that technical questions alone don't.

Team Structure Questions

As you hire, think about how DevOps engineers fit into your organization.

The embedded model puts DevOps engineers into product teams. They gain context and can be highly responsive, but you may get inconsistency across teams and the DevOps engineers can feel isolated from peers.

The central platform team model has DevOps engineers build shared infrastructure that all teams use. It creates consistency and efficiency but risks becoming a bottleneck that can't keep up with product team needs.

Most companies with more than a handful of engineers end up with hybrids: a central platform team that builds shared infrastructure, with some embedded DevOps support in the teams that need it most.

Team sizing varies significantly based on infrastructure complexity and automation maturity. As a rough guide: at startup scale (under 50 engineers), one DevOps person per 10-15 engineers; at growth scale (50-200 engineers), one per 8-12; at larger scale, one per 6-10. These ratios are very rough—a company with simple infrastructure needs fewer DevOps people; a company with complex, heterogeneous systems needs more.
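As a rough illustration of those ratios, the sketch below turns an engineering headcount into a suggested range of DevOps hires. The function name `devops_headcount_range`, the treatment of exactly 50 and 200 engineers as growth scale, and rounding up to whole people are assumptions made for this example. The ratios themselves are the rough guide given above and should be adjusted for infrastructure complexity.

```python
from typing import Tuple


def devops_headcount_range(engineers: int) -> Tuple[int, int]:
    """Return (minimum, maximum) suggested DevOps headcount.

    Ratios follow the rough guide in this article:
      under 50 engineers: one per 10-15
      50 to 200 engineers: one per 8-12
      above 200 engineers: one per 6-10
    Assumption: partial people are rounded up to a whole hire.
    """
    if engineers <= 0:
        raise ValueError("engineers must be a positive integer")

    if engineers < 50:
        sparse, dense = 15, 10
    elif engineers <= 200:
        sparse, dense = 12, 8
    else:
        sparse, dense = 10, 6

    # -(-a // b) is integer ceiling division.
    minimum = -(-engineers // sparse)
    maximum = -(-engineers // dense)
    return minimum, maximum


if __name__ == "__main__":
    for size in (40, 120, 300):
        print(size, devops_headcount_range(size))
```

For a 120-engineer organization this gives a range of 10 to 15, which is a starting point for discussion, not a target.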

The engineering director who couldn't find the right DevOps engineer eventually stopped looking for "DevOps" generically. She rewrote the job description to specify her actual problems: slow deployments she wanted automated, reliability issues she wanted reduced, developer experience she wanted improved. She specified the infrastructure: AWS, Kubernetes, Terraform. She specified the ratio: 60% building new systems, 40% operating and improving existing ones.

The next candidate she interviewed understood exactly what he was signing up for. He'd done similar work at a similar scale. He had specific ideas about the problems she'd described.

Six months after he started, deployment frequency had tripled and reliability incidents had halved. Not because she found a magical unicorn, but because she'd defined what she actually needed and found someone who matched.

References

[^1]: SmithSpektrum DevOps/SRE placements, 150+ engineers, 2020-2026.
[^2]: DORA, "State of DevOps Report," 2025.
[^3]: Google SRE, "Site Reliability Engineering" book and practices.
[^4]: Platform Engineering community surveys, 2025.


Author: Irvan Smith, Founder & Managing Director at SmithSpektrum