Staff Site Reliability Engineer

Hiring Remotely in U.S.
Remote
165K-230K Annually
Senior level
Information Technology • Security
The Role
The Staff Site Reliability Engineer will lead the architecture and security of the SimSpace cyber range platform, focusing on reliability, automation, and observability across diverse deployment environments while mentoring engineers and driving infrastructure initiatives.

SimSpace serves as an AI Proving Ground where organizations can confidently train, test, and outmaneuver adversaries in any environment. Trusted by allied governments, militaries, enterprises, and research institutions worldwide, SimSpace enables adaptive, AI-ready defenses that stay ahead of evolving threats. Founded in 2015 by experts from U.S. Cyber Command and MIT Lincoln Laboratory, the platform unifies training, testing, and validation in a realistic, live-fire simulation—helping teams evaluate security investments, optimize performance, and compress cyber readiness cycles from months to days.

Why join SimSpace? We are an organization focused on building our culture and mindfully enhancing our atmosphere every day, which is why we have collaborated on an integral value system. Our governing philosophy of being Human Centered is deeply embedded within that value system. We apply this philosophy to every one of our internal team members, external clients, and their customers.

How Do We Work? We believe that people are at the center of everything we do. SimSpace fosters a culture of continuous learning, curiosity, and professional growth. That belief shows up in action: in-house training, internal and external learning platforms, cyber conferences, industry events, and dedicated time for skill development. Our people are empowered to shape their careers - and it shows. Year over year, SimSpace consistently outperforms industry benchmarks in internal mobility, promotions, and total rewards growth.

Who Thrives Here? We are a team of innovators, protectors, and problem-solvers. We believe diversity of thought and experience fuels better solutions, and we’re committed to building teams that reflect the communities we serve. Whether you’re remote or office-based, you’ll collaborate with talented colleagues across departments and time zones, united by the mission to create a safer digital world.

We invite you to apply today!

About the Role We are looking for a Staff Site Reliability Engineer to define the technical vision, lead the architecture, and secure the infrastructure that powers the SimSpace cyber range platform. The ideal candidate is a deeply experienced SRE and exceptional software engineer who thinks strategically about distributed systems, reliability, and operability at a global scale. At the Staff level, you will act as a force multiplier—architecting resilient systems, driving engineering standards, and solving our most complex infrastructure challenges rather than relying on manual processes or localized fixes.

In this position, you'll provide overarching technical leadership across our SRE practice, bridging traditional site reliability, DevOps, and DevSecOps. You'll architect the systems and strategies that allow SimSpace to deliver software seamlessly across our own data centers, to customers who bring their own hardware, and as pre-packaged appliances with bundled hardware and software. As our on-premises product matures and scales, you will design the long-term automation frameworks that make these varied deployments robust, secure, and repeatable.

What will you be doing as a Staff SRE at SimSpace?

  • Technical Strategy & Architecture: Design and architect the overarching infrastructure strategy that enables consistent, repeatable, and secure deployments across SimSpace-hosted data centers, customer-provided hardware, and highly restricted air-gapped environments.

  • Platform Evolution & Configuration Management: Lead the evolution of our CI/CD and Kubernetes platforms. Drive advanced application packaging, templating, and configuration management strategies using Jsonnet and Grafana Tanka (alongside Kustomize). Move beyond maintaining pipelines to architecting multi-cluster, multi-environment deployment frameworks that drastically improve developer velocity.

  • Reliability Leadership: Define, measure, and govern Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets across the engineering organization. Partner with product and engineering leadership to balance feature delivery with platform stability.

  • Advanced Observability: Architect our enterprise observability strategy using the Grafana stack. Design frameworks for proactive monitoring, complex anomaly detection, and distributed tracing that give teams unparalleled visibility into system health, pod scaling, and latency bottlenecks.

  • Security & Compliance Architecture: Drive the infrastructure security posture at an architectural level. Embed advanced container security, zero-trust network segmentation, and automated compliance policies directly into our deployment pipelines and runtime environments.

  • Cross-Functional Enablement: Serve as a strategic partner and consultant to development teams. Advocate for an "SRE culture" by designing self-service tooling, establishing "paved roads" for developers, and reducing operational toil across the entire engineering org.

  • Incident Command: Act as an Incident Commander during complex, high-severity outages. Drive blameless post-mortems and engineer long-term, systemic, and architectural fixes to ensure classes of failures never repeat.

  • Mentorship & Multiplier: Act as a technical mentor to senior and mid-level engineers. Raise the baseline of engineering excellence across the company by coaching, documenting best practices, and leading by example.

Who you are:

  • Experience: 8+ years of experience in Site Reliability, Platform, or DevOps engineering, with a proven track record of operating at a Staff, Principal, or Lead level to drive organization-wide infrastructure initiatives.

  • Expert Software Engineering: You possess deep software engineering skills (beyond scripting) and can architect complex, production-quality systems. You design clean interfaces, build maintainable tooling, and can dictate the technical direction of our internal toolchain. Language agnostic, but highly proficient in at least one modern language (e.g., Go, Python).

  • Advanced Kubernetes & Configuration Mastery: Deep, architectural understanding of Kubernetes in multi-tenant and multi-cluster production environments. You possess expert-level knowledge of Jsonnet and Grafana Tanka for managing complex, scalable Kubernetes configurations and application packaging.

  • GitOps & IaC Expertise: Extensive experience architecting sophisticated CI/CD pipelines and GitOps workflows using GitHub Actions, ArgoCD, and infrastructure-as-code principles at an enterprise scale.

  • Complex Deployments: Systems-level thinking with the ability to design architectures that span self-hosted, on-premises, VMware-based, and air-gapped deployment models.

  • Observability Expert: Deep expertise with observability platforms (Grafana stack preferred) and a proven ability to design alerting and monitoring strategies for complex distributed systems.

  • Security Mindset: Strong background in infrastructure security architecture, including container hardening, network security, vulnerability management, and delivering software to heavily regulated or customer-managed environments.

  • Influential Communicator: Exceptional communication and stakeholder management skills. You have a service-oriented mindset, but you also have the ability to influence cross-functional leadership, negotiate reliability tradeoffs, and align engineering teams behind a unified technical vision.

We’re proud to offer a competitive and comprehensive package designed to support your well-being, growth, and success:

  • Compensation. Base salary range: $165,000 - $230,000, reflecting our confidence in your expertise and impact, with the opportunity for bonuses tied to company performance and individual contributions.

  • Health & Wellness. Comprehensive medical, dental, and vision benefits, plus savings plans—coverage starts on day one!

  • Mental Health Support. Access to company-paid counseling, coaching, and resources for you and your family through Spring Health.

  • Financial Well-Being. Plan for your future with a 401(k) retirement savings plan featuring a company match.

  • Flexible Time Off. Take the time you need with unlimited vacation and dedicated health & wellness days. SimSpace provides flexible solutions to meet the diverse work-life needs of team members.

  • Parental Leave. Paid leave plans to support you and your loved ones during life’s most important moments.

  • Ownership Opportunities. Equity stock options at hire, with annual performance-based grants—become an invested stakeholder in our shared success.

  • Referral Rewards. Earn $1,500–$3,500 for every qualified hire through our employee referral program.

  • Peloton Interactive Wellness Program. Fully and partially subsidized membership plans and equipment discounts to help you reach your personalized fitness goals.

  • Continuous Learning. Access a LinkedIn Learning membership to prioritize your personal and professional development.

  • Social Connections. Monthly reimbursements for meaningful connections with teammates through our SocialSpace Community.

  • Extra Perks. Legal plan coverage, pet insurance, wellness reimbursements, and more to simplify life’s details.

Join SimSpace and enjoy benefits that enhance your career, health, and happiness!

SimSpace is an Equal Opportunity Employer:

In compliance with federal law, all persons hired will be required to verify identity and eligibility to work in the United States and to complete the required employment eligibility verification document form upon hire.

SimSpace is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, pregnancy, genetic information, disability, status as a protected veteran, or any other protected category under applicable federal, state, and local laws. We are committed to providing an inclusive and welcoming environment for all members of our staff, clients, volunteers, subcontractors, and vendors.

Research shows that women and people from underrepresented groups only apply to jobs if they meet all of the qualifications. However, no one ever meets 100% of the qualifications. SimSpace encourages you to break that statistic and to apply. We look forward to your application!

We also consider qualified applicants regardless of criminal histories, in accordance with applicable law. We are committed to providing reasonable accommodations for qualified individuals with disabilities in our job application procedures. If you need assistance or accommodation due to a disability, please contact [email protected].

SimSpace does not accept unsolicited resumes from employment agencies.

Actual compensation for the position is based on a variety of factors, including, but not limited to, affordability, skills, qualifications, and experience, and may vary from the range.

Skills Required

  • 8+ years of experience in Site Reliability, Platform, or DevOps engineering
  • Deep software engineering skills and ability to architect complex systems
  • Expert-level knowledge of Kubernetes
  • Extensive experience architecting CI/CD pipelines
  • Deep expertise with observability platforms
  • Strong background in infrastructure security architecture
  • Exceptional communication and stakeholder management skills

The Company
Boston, MA
161 Employees
Year Founded: 2015

What We Do

Founded in 2015 by experts from the U.S. Cyber Command and MIT’s Lincoln Laboratory, SimSpace combines the highest-fidelity, military-grade cyber ranges and training content with unique user and adversary emulation techniques. By providing team and individual training exercises, attack simulations, mission rehearsals, and product evaluations that leverage its cyber range, the SimSpace Cyber Force Platform delivers quantitative and actionable insights into how an organization can protect critical assets against cyber threats. SimSpace prepares individuals, teams and leaders for continued success against ever-evolving adversaries. No other organization has SimSpace’s depth of experience in creating high-fidelity cyber ranges with unique user and adversary emulation techniques. These techniques are designed to stress people, process and technology across individual and team-level training exercises, attack simulations, mission rehearsals, and product evaluations. SimSpace's mission is to provide an automated, cost-effective evaluation method for calculating cyber risks based on realistic comprehensive assessments of holistic capability to yield more secure networks globally.
