Second Front Systems

Site Reliability Engineer - Observability

Sorry, this job was removed at 08:20 p.m. (CST) on Friday, Oct 10, 2025

Hiring Remotely in USA

Remote

160K-180K Annually

Cloud • Software

The Role

ABOUT THE ROLE

Second Front Systems' (2F) Product team is seeking a highly skilled and motivated Senior Site Reliability Engineer to join our Observability team. We are a small team working to accelerate the deployment of emerging technology into national security use cases. We are seeking technical professionals who want to operate on the front lines of an exciting and disruptive mission.

As a Senior SRE for Second Front Systems, you'll be responsible for deploying, maintaining, and scaling our observability infrastructure across multiple DoD networks. You'll work with Kubernetes-based platforms, BigBang charts from DoD Platform One, and build automation to make our monitoring stack easier to deploy for new customers. You'll be empowered to collaborate with others to implement infrastructure that delivers unique capabilities for our commercial and government customers, including the Department of Defense.

The Observability team is looking for a strong SRE with deep DevSecOps and Kubernetes experience: someone who has deployed and maintained monitoring infrastructure at scale, with an eye for security in highly regulated environments. Experience with DoD software deployments, Platform One, and single-tenant architectures is highly valued.

We are a fast-growing entrepreneurial team working at the convergence of technology and national security. If this type of effort interests you, come join us!

Note: This position requires U.S. citizenship due to government contract requirements.

Candidates must be located in the following geographic areas: DMV (DC/Maryland/Virginia), Raleigh/Durham/Chapel Hill, Denver/Colorado Springs, and Dallas/Fort Worth.

What You’ll Do

Deploy and maintain the observability stack (Grafana, Mimir, Prometheus) across multiple customer clusters and DoD networks
Build Helm chart abstractions and automation to streamline monitoring deployments for new customers
Troubleshoot and debug complex Kubernetes issues, networking problems, and monitoring stack failures
Configure and maintain BigBang charts and DoD Platform One integrations
Design and implement infrastructure automation using tools like Pulumi, ArgoCD, and Flux
Work with Istio service mesh and Keycloak for authentication in secure environments
Monitor and optimize performance of monitoring infrastructure across multiple environments
Collaborate with security teams to ensure compliance with NIST requirements and DoD standards
Participate in on-call rotation and incident response for production environments

Skills You’ll Bring to Our Team

5+ years of Site Reliability Engineering or DevOps experience
Deep experience with Kubernetes administration, troubleshooting, and scaling
Hands-on experience deploying and maintaining observability tools (Prometheus, Grafana, Mimir/Cortex)
Strong understanding of Helm charts, GitOps practices, and CNCF tooling
Experience with service mesh technologies (Istio preferred)
Proven ability to debug complex distributed systems and networking issues
Understanding of authentication systems and security in regulated environments
Ability to work independently and collaborate with team members in a remote environment

Preferred Qualifications

Active security clearance or ability to obtain a Secret-level security clearance
Previous experience with DoD software deployments and Platform One
Experience with BigBang charts and Iron Bank containers
Experience working in national security or highly regulated environments
Familiarity with compliance frameworks (NIST, FedRAMP, etc.)
Experience with infrastructure as code (Pulumi, Terraform)

Technologies We Use

Observability: Grafana stack, Prometheus, custom alerting tools
Kubernetes: Helm, ArgoCD, Flux, Tekton, BigBang charts
Security: Istio, Keycloak, Kyverno
Infrastructure: AWS/GCP/Azure, Pulumi, Git/GitLab
Languages: YAML, Bash, Go

Second Front Systems Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Second Front Systems and has not been reviewed or approved by Second Front Systems.

Healthcare Strength — Health coverage is described as 100% employer-paid for employees and dependents, which is positioned as a standout element of the package. This signals strong protection for families without added premium costs.
Leave & Time Off Breadth — Time off policies include flexible PTO, paid parental leave, and recognition of federal holidays, indicating broad leave options. Employer materials cite 11 federal holidays, with flexibility noted across sources.
Fair & Transparent Compensation — Publicly posted salary bands and role-based ranges point to market-aware, competitive pay for multiple positions. Aggregated compensation snapshots also indicate strong on‑target earnings for sales and solid total compensation in senior technical roles.

The Company

Tysons, VA

140 Employees

Year Founded: 2014

What We Do

At Second Front Systems, we build software that accelerates delivery of emerging commercial technologies to U.S. warfighters. By harnessing insights and methodologies from the private sector and aligning them with government priorities and processes, we enable defense and national security professionals to effectively engage in long-term, continuous competition for access to emerging technologies.