Senior Site Reliability Engineer, Hawaii

2 Locations
In-Office
180K-220K Annually
Senior level
Software • Defense
Building the future of the military staff.
The Role
We are hiring a Senior Site Reliability Engineer to ensure deployment stability and service quality, working in on-premise DoD and AWS environments.
About Onebrief

Onebrief is collaboration and AI-powered workflow software designed specifically for military staffs. By transforming this work, Onebrief makes the staff as a whole superhuman - meaning faster, smarter, and more efficient.

We take ownership, seek excellence, and play to win with the seriousness and camaraderie of an Olympic team. Onebrief operates as an all-remote company, though many of our employees work alongside our customers at military commands around the world.

Founded in 2019 by a group of experienced planners, Onebrief today has a team that spans veterans from all forces and global organizations, and technologists from leading-edge software companies. We’ve raised $123M+ from top-tier investors, including Battery Ventures, General Catalyst, Insight Partners, and Human Capital, and Onebrief is now valued at $1.1B. With this continued growth, Onebrief is able to make an impact where it matters most.

Security Clearance, Location, and Onsite Notice:

This role requires regularly working on-site at customer locations on Oahu, Hawaii, specifically Camp H.M. Smith and Joint Base Pearl Harbor-Hickam.

If you are not currently within commuting distance, you must be willing to relocate (note that Onebrief will provide relocation assistance).

Active Top Secret Clearance required; SCI eligibility is a plus.

About The Role

We are hiring a Site Reliability Engineer (Hawaii) to join our Infrastructure & Security team. You’ll report to our Director of Infrastructure and work closely with fellow SREs, security, and customer success.

You will be the first line of support for our mission critical deployments, and responsible for ensuring best-in-class service quality and issue resolution. You will work in both on-premise DoD environments and AWS cloud environments. Your lessons from the field will shape how our team works, from policy to implementation.

In addition to working on-site with the customer, you will contribute directly to solutions that increase the stability, performance, and security of our deployments, and improve the overall experience of deploying and managing Onebrief on-premise.

About You

You are a force multiplier who views reliability as the most critical feature of any application or platform and believes that "reliability beats novelty." You see infrastructure and operability as a product to be automated, documented, and continuously improved, always leaving systems easier to operate than you found them.

You are equally comfortable leading a post-incident review, designing SLOs in a system design session, or diving into a kubectl shell to triage a complex production issue. You don't just fix problems; you translate constraints and failure modes into clear, automated guardrails and scalable, resilient architecture. For you, robust monitoring, actionable alerting, and insightful runbooks are core parts of the engineering process, not afterthoughts.

You mentor others, fostering a culture of blameless postmortems and proactive reliability. You collaborate naturally with application and platform teams, helping them move quickly but safely by building the tools, processes, and observability that make "fast recovery" a reality.

What You'll Do

You'll own the reliability, scalability, and security of the production application and/or platform. You will do this by:

  • Building a World-Class Observability Platform: Design, implement, and manage our monitoring, logging, and alerting stack (e.g., Prometheus, Loki, Alloy, and Grafana). You won't just track metrics; you'll create the actionable insights and automated alerting that allow teams to identify and resolve issues before they impact users.

  • Defining and Upholding Reliability: Define, measure, and own alerting that feeds into our Service Level Objectives (SLOs) and increases trust internally and externally. You will be the organization's expert on what it means for our systems to be reliable and how to measure it.

  • Leading Incident Response: Act as the incident responder, and potentially incident commander, during critical incidents. You will lead blameless post-mortems / After Action Reviews (AARs) that identify true root causes and drive automated, long-term solutions to prevent recurrence.

  • Automating for Scale and Security: Partner with platform engineers to design, build, and manage secure, resilient Kubernetes clusters and cloud/on-prem environments using Infrastructure-as-Code (Terraform, Ansible). You will embed security and compliance controls (RMF, STIGs) directly into this automation.

  • Eliminating Toil and Scaling the Team: Proactively identify and eliminate operational toil by building automation. You will act as a force multiplier by advising other teams on best practices in air-gapped environments and production readiness.

What We Look For
  • 3 years of experience in Site Reliability Engineering or a related field, with firsthand experience managing mission-critical systems within DoD’s air-gapped environments

  • An active Top Secret security clearance. U.S. citizenship required.

  • Experience automating software delivery and deployment, and providing documentation and self-service tools for engineering teams and customers.

  • A strong understanding of Linux, containerization and orchestration, and virtual machines

  • Experience with centralized logging, metrics, and observability using tools such as Prometheus, Loki, Grafana, ELK stack, or Datadog.

  • Networking fundamentals: core protocols and secure configurations.

  • A deep understanding of incident response processes, with experience conducting thorough root cause analyses and driving continuous improvement

  • Clear, concise writing; strong documentation habits and async communication.

  • Core skills and technologies: VMware, Kubernetes, Docker, Helm, Ansible, Terraform, Linux, AWS, DoD compliance, and monitoring and observability tools.

Bonus points (nice to have)

  • Experience with compliance frameworks (RMF, STIGs/SRGs, ICD 503).

  • Security‑minded design for air-gapped environments.

  • Active Security+ or another DoD 8570.01-approved security credential, or the ability to obtain a valid credential within 3 months of employment.

Top Skills

Ansible
AWS
Docker
DoD Compliance
Helm
Kubernetes
Linux
Terraform
VMware

The Company
Honolulu, HI
220 Employees

What We Do

Before Onebrief, military planning and collaboration were slow, inefficient, and resource-intensive. Staffs built slides without version control while partners collaborated, and could spend weeks or months on a single product or document.

With Onebrief, these workflows are now simple and collaboration between large commands is efficient. Staff optimization is the key to building a more resilient, more effective military. Today Onebrief users report at least 2x time savings - and growing.

Onebrief is first-of-its-kind software for the military. While many others have tried to build a solution for this problem, Onebrief’s “card” structure for reusing data and enabling real-time updates is what makes this possible. Core features and attributes that make this platform powerful include:
- Global Collaboration
- Real-Time Updates
- AI Automation
- Interoperability + Integrations
- Deployable across Secret and Top Secret Networks

Mission Driven

Onebrief is composed of professionals from backgrounds of all kinds - spanning veterans across forces and organizations, and technologists from leading-edge software giants.

Onebrief is more than just a software platform; it's a mission-driven company dedicated to improving the efficiency and effectiveness of military planning. By joining the team, you'll contribute to solutions that directly support national security and the work of service members.

Your work directly addresses critical challenges that military planners and operators face daily. Every line of code and every design decision contributes to real-world outcomes.

The software was designed and built by a team of experienced planners - lending a nuanced perspective on the challenges our partners face. Our team embeds alongside users - from the Pentagon to the Indo-Pacific - to build a platform that meets their unique needs.

Rapid, Strategic Growth

Our users love the platform and growth is accelerating: operational usage most recently grew at a 19,600% annualized rate. Stronger utilization is underway and we’re at an exciting period of advancement.

Because Onebrief is growing rapidly, you'll directly influence its direction and long-term success. Over the past year we’ve seen exciting growth metrics:

First, our headcount has grown 150% YoY to keep pace with our product advancement and customer growth.

Second, our funding has skyrocketed: we most recently raised our Series C, led by top-tier venture investors who have deep expertise in defense tech.

Why Work With Us

Impactful Transformation

At Onebrief, we believe optimizing the military staff is the most impactful thing - on a per-dollar basis - in defense tech right now. This has the potential to save the Department of Defense billions of dollars and save users countless hours. It’s a longstanding problem that we’re uniquely positioned to solve.


Onebrief Offices

Remote Workspace

Employees work remotely.

Typical time on-site: None
United States

