Senior Site Reliability Engineer, Robotics & Cloud Infrastructure

Hiring Remotely in USA
Remote
164K-220K Annually
Senior level
Robotics • Software
About Bedrock Ocean Exploration

Bedrock Ocean builds and operates autonomous underwater vehicles (AUVs) that collect georeferenced ocean-floor data at commercial scale. We deliver bathymetric and imagery data products to customers through our own platform, and we’re scaling toward continuous, around-the-clock data collection campaigns spanning months at a time. Keeping vehicles in the water and data flowing reliably is a core engineering problem, and this role owns the reliability of the systems on both ends of that pipeline.

Headquartered in Richmond, California, Bedrock Ocean Exploration is building autonomous ocean intelligence that will enable the ocean economy to solve the world’s most pressing challenges in maritime security, infrastructure, energy, and climate. Our modular architecture, driven by Siren (autonomous underwater vehicles), Trident (command and control), and Mosaic (subsea data fusion), delivers entirely new intelligence capabilities for government and commercial partners. Missions mobilize from any vessel of opportunity in 24 to 72 hours, and our automated pipeline returns comprehensive insights in hours, not weeks like the incumbents, keeping crews safe on shore while cutting cost and time.

The Role

We’re looking for an SRE who is equally comfortable on the robotics side (compute on the vehicle, topside operator machines, field deployments) and the cloud side (data ingestion, processing pipelines, and our customer-facing platform). You’ll build the automation, observability, and operational guardrails that let a small team run continuous AUV operations without continuous heroics, turning manual recovery steps into self-healing systems and shrinking the set of failures that only one person knows how to fix.

This is a hands-on senior infrastructure role with a strong automation mandate and a shared on-call rotation. You’ll set reliability direction across vehicle-side and cloud-side systems, raise the operational bar for the team, and mentor others toward it. You’ll be a force multiplier for reliability across the company, not a ticket queue.

Reports to: Head of Software.

An East Coast location is required to support coverage of both European and East Coast operations during 12-hour on-call shifts.

Travel to field deployments and Richmond HQ is expected (approximately 5–15%).

What You’ll Do
  • Own reliability across the full path from vehicle to customer: AUV onboard compute (Jetson-class modules, ROS 2), topside/operator systems, cloud data pipelines, and the platform that delivers data products.

  • Build and extend infrastructure automation (provisioning, configuration management, deployment, and self-recovery) so that routine field operations and pipeline runs require minimal manual intervention.

  • Design and improve observability: metrics, logging, tracing, and alerting that give both robotics and data teams early, actionable signal across vehicle fleets and cloud services.

  • Drive down on-call burden by identifying and eliminating single points of failure, writing runbooks, and automating the manual steps that currently require tribal knowledge.

  • Participate in a shared on-call rotation covering both robotics-side and cloud-side incidents in 12-hour shifts spanning European and East Coast business hours; lead and contribute to blameless post-incident reviews.

  • Define and track reliability targets (availability, data yield, recovery time) tied to continuous-operations goals, and partner with robotics and data teams to meet them.

  • Manage cloud infrastructure on AWS (compute, storage, networking, IaC, cost, and security posture) for data processing and platform workloads.

  • Improve fleet- and vehicle-level configuration management, deployment safety, and rollback so changes reach the field reliably and predictably.

What We’re Looking For
  • 5+ years in an SRE, DevOps, or infrastructure engineering role running production systems with real uptime and on-call responsibilities, including senior-level ownership of reliability outcomes.

  • Experience implementing a scalable incident management and operational excellence mechanism that treats operators as customers, building processes and tooling that serve the people running operations day to day, not just the engineering team.

  • Strong automation instincts: comfortable scripting and building tooling in Python and/or Go and Bash, and using infrastructure-as-code (Terraform or equivalent).

  • Hands-on AWS experience across compute, storage, networking, and IAM, plus containerization and orchestration (Docker, Kubernetes or similar).

  • Working knowledge of Linux internals, networking, and observability tooling (Prometheus/Grafana or equivalents).

  • Comfort operating across environments that aren’t just cloud: embedded or edge compute, intermittent connectivity, and physical systems that fail in messy ways.

  • A reliability mindset: you instrument before you guess, you automate the second time you do something manually, and you write things down so the next person or the system can handle it without you.

  • Strong ownership and communication in a small, fast-moving team.

Nice to Have
  • Experience with robotics or embedded systems: ROS / ROS 2, Jetson or similar edge compute, sensor integration.

  • Background supporting field operations, autonomous systems, or hardware-in-the-loop environments.

  • Familiarity with data pipelines and geospatial or large-binary data formats.

  • Experience standing up on-call practices and incident response from an early stage.

  • Some connection to the ocean: professional, academic, or personal. You’re excited to be around people who dive, sail, build, and explore offshore.

  • Active U.S. Secret security clearance or above.

Why This Role Matters

Our biggest operational goal depends on systems that stay up and data that stays valid for long, continuous stretches with a small team and a limited rotation. The reliability and automation you build directly determine whether we can run continuous campaigns at scale. This is high-leverage infrastructure work with a clear, measurable mission.

Not a Fit If…
  • You prefer environments where cloud and hardware never mix.

  • You’d rather build tickets than eliminate them.

  • You’re not comfortable with on-call ownership on a small team.

  • You want to optimize existing systems, not build the reliability practice alongside the product.

Compensation

$164,000–$220,000 base salary annually, depending on location. The upper end of the range reflects compensation in the New York, NY metro. In addition, we offer comprehensive employee benefits and equity.

Work Authorization

Candidates must have legal authorization to work in the United States without visa sponsorship. Bedrock does not sponsor employment visas.

Due to the nature of our government and defense work, candidates must be eligible to obtain a U.S. Secret security clearance if requested. An active Secret or higher clearance is not required to apply, but candidates who hold one are strongly preferred.

Bedrock Ocean Exploration is an equal opportunity employer.


The Company
HQ: Brooklyn, NY
30 Employees
Year Founded: 2019

What We Do

We are a vertically-integrated ocean exploration company developing proprietary robotics and software to quickly and cheaply explore the entirety of the Earth’s oceans on a scale not possible with today’s technology. We offer subsea survey services (Hydro, Geo, UXO, etc.) as well as paired cloud & local software to effectively store, process and make sense of your organization's subsea data.
