DevOps Engineer

Infrastructure | Remote | Full-time
$150,000 - $210,000 USD | Posted November 20, 2025

About the Role

We are looking for a DevOps Engineer to join our infrastructure team and help scale the platform that runs millions of tests every day for our customers. This role is central to Primates' ability to deliver fast, reliable test execution at scale: you will own the systems that provision ephemeral test environments, manage compute resources across multiple cloud regions, and ensure that our platform maintains the 99.95% uptime SLA that our enterprise customers depend on.

Our infrastructure runs on AWS, orchestrated by Kubernetes (EKS) across three regions (us-east-1, us-west-2, eu-west-1). We use Terraform for infrastructure-as-code, ArgoCD for GitOps-based deployment, and Karpenter for intelligent node autoscaling. The platform's unique challenge is its burst workload pattern: test execution demand can spike by 10x within minutes as customers' CI pipelines trigger simultaneously, and our infrastructure needs to scale up instantly and scale down efficiently to manage costs.

You will work on improving our deployment pipeline (deploys currently take 15 minutes; the goal is under 5 minutes), enhancing our observability stack (Datadog for metrics, Loki for logs, Tempo for traces), and building the automation that keeps our multi-region infrastructure running smoothly. You will also be responsible for cost optimization. Cloud infrastructure is our largest expense category, so finding ways to deliver the same performance at lower cost has a direct impact on the company's bottom line.

Security is a first-class concern in this role. Our customers send their source code and test results through our platform, and we take that trust seriously. You will work with our security team to implement and maintain SOC 2 Type II controls, manage secrets rotation, configure network policies, and ensure that our infrastructure meets the compliance requirements of our enterprise and healthcare customers.

This is a fully remote role open to candidates in North American time zones. You will be part of a 4-person infrastructure team that operates with high autonomy and a strong culture of documentation. We believe that well-documented infrastructure is the foundation of reliability, and we invest significant time in maintaining runbooks, architecture decision records, and operational playbooks.

Responsibilities

  • Design, build, and maintain Kubernetes-based infrastructure across multiple AWS regions using Terraform and ArgoCD
  • Optimize the CI/CD pipeline for Primates' own codebase, reducing deploy times and improving release reliability
  • Implement and manage autoscaling strategies for burst workloads, balancing performance with cost efficiency
  • Build and maintain the observability stack (Datadog, Loki, Tempo) including dashboards, alerts, and SLO tracking
  • Manage secrets, certificates, and access controls across production and staging environments
  • Contribute to SOC 2 Type II compliance by implementing and documenting infrastructure security controls
  • Participate in the on-call rotation for infrastructure incidents, with a focus on rapid resolution and thorough postmortem analysis
  • Author and maintain infrastructure documentation including architecture decision records, runbooks, and operational playbooks

Requirements

Required

  • 5+ years of experience in DevOps, SRE, or infrastructure engineering roles
  • Deep expertise with Kubernetes in production, including cluster management, networking (CNI, service mesh), and troubleshooting
  • Strong proficiency with Terraform or equivalent infrastructure-as-code tools for managing cloud resources at scale
  • Experience with AWS services (EKS, EC2, RDS, S3, IAM, VPC, CloudFront) in multi-region production environments
  • Proficiency in at least one scripting or programming language (Go, Python, or Bash) for automation and tooling
  • Experience implementing CI/CD pipelines using GitHub Actions, GitLab CI, or similar platforms
  • Strong understanding of networking fundamentals including DNS, load balancing, TLS, and firewall configuration

Preferred

  • Experience with GitOps tools (ArgoCD, Flux) for Kubernetes deployment management
  • Familiarity with Karpenter, Cluster Autoscaler, or custom autoscaling solutions for Kubernetes
  • Background in cost optimization for cloud infrastructure (reserved instances, spot instances, right-sizing)
  • Experience with SOC 2 Type II audit preparation and compliance control implementation
  • Familiarity with container security scanning (Trivy, Snyk) and runtime security monitoring (Falco)

Benefits

  • Competitive salary range of $150,000 - $210,000 plus equity (0.05% - 0.15% based on experience)
  • Comprehensive health, dental, and vision insurance with 100% premium coverage for employees
  • 401(k) with 4% company match, vested immediately
  • Unlimited PTO with a minimum of 15 days encouraged, plus 10 company holidays
  • Annual learning and development budget of $3,500 for conferences, certifications, and courses
  • Home office setup stipend of $2,500 for new hires
  • Coworking space reimbursement up to $300/month for remote employees
  • Annual company retreat (3 days) at a destination location for team building
  • 12 weeks paid parental leave for all new parents
  • AWS certification exam fees covered by the company

Interested in this role?

We'd love to hear from you. Submit your application and our recruiting team will be in touch within 5 business days.
