Site Reliability Engineer

Posted 2 Days Ago
San Francisco, CA, USA
In-Office
Mid-level
Artificial Intelligence • Big Data • Information Technology • Software • Analytics

Company Background
Specter's mission is to help automate the physical world.

Today, we build video sensors with state-of-the-art AI agents that answer any question, anywhere in their environments. Our systems can automatically detect and reason about any physical activity captured on camera, from security incidents (e.g., perimeter intrusion, theft, license plate recognition), to safety monitoring (e.g., PPE detection, injured people), to operational efficiency (e.g., material tracking, congestion monitoring). We offer both long-range wireless (1 km range) and wired sensor variants to suit any deployment.

Our co-founders, Xerxes and Philip, are passionate about empowering our partners in the fast-approaching world of physical AI and robotics. We are a small, fast-growing team who hail from Anduril, Tesla, Uber, and the U.S. Special Forces.


The Role
We're hiring a Site Reliability Engineer to own the operational health of our connected sensor platform — spanning a live fleet of edge hardware deployed at customer sites and the cloud infrastructure behind it.

This is a high-ownership role at the intersection of ops and platform engineering. You'll drive reliability across our sensor fleet — triaging issues in the field, building the systems that prevent them from recurring, and owning the observability that keeps us ahead of problems as we scale.
Responsibilities:
You set your own priorities across three areas:
Reactive — Triage & Recovery

  • Debug and triage issues across a live fleet of diverse Linux-based sensor nodes and edge appliances deployed at customer sites.

  • SSH into field hardware to diagnose, patch, and recover systems — often with limited remote access and incomplete information.

  • Own site bring-ups end to end; be the person who gets things back online.

Systems Builder — Close the Loop

  • Build and maintain fleet management systems: OTA update pipelines, device health tracking, remote diagnostics, and lifecycle tooling.

  • Identify repeat fires and eliminate them — build tooling, pre-deployment checks, and root cause processes that prevent recurrence.

  • Automate toil relentlessly: if you're doing something twice, you should be scripting it.

  • Collaborate with the embedded systems and platform teams to define reliability and deployment requirements.

Observability Owner — Fleet Visibility

  • Design and implement observability (logging, metrics, alerting) across edge devices and cloud infrastructure (AWS).

  • Surface and close telemetry gaps; build fleet-wide visibility that enables data-driven reliability decisions.

  • Develop runbooks and incident response procedures, and participate in on-call rotations.

Qualifications:

  • Strong Linux systems administration — comfortable working over SSH in production, not just dev environments.

  • Experience with edge or on-prem hardware alongside cloud infrastructure.

  • Solid networking fundamentals: DNS, firewalls, VPNs, subnets, secure remote access.

  • Scripting or programming in Python, Go, or Bash for operational tooling.

  • Familiarity with containerization (Docker; Kubernetes is a plus).

  • Embedded systems experience — reading firmware logs, understanding hardware-software boundaries, and reasoning about what's happening below the OS is a meaningful edge in this role.

  • Deeper cloud experience (AWS infrastructure, IAM, networking, observability tooling) is a strong plus for owning the cloud side of the fleet.

  • Rust or C experience — we have firmware in both; being able to read and reason about low-level code accelerates triage significantly.


The Company
50 Employees
Year Founded: 2020


Similar Jobs

DraftKings

Site Reliability Engineer

Digital Media • Gaming • Information Technology • Software • Sports • Esports • Big Data Analytics
Remote or Hybrid
United States
6400 Employees
200K-250K Annually

Domino Data Lab

Site Reliability Engineer

Artificial Intelligence • Machine Learning
Remote or Hybrid
US
200 Employees
200K-230K Annually

CrowdStrike

Site Reliability Engineer

Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity
Hybrid
Sunnyvale, CA, USA
10000 Employees
140K-215K Annually

Relativity Space

Site Reliability Engineer

Aerospace • Hardware • Robotics • Software • Manufacturing
In-Office
Long Beach, CA, USA
2200 Employees
140K-214K Annually

Similar Companies Hiring

Golden Pet Brands
Digital Media • eCommerce • Information Technology • Marketing Tech • Pet • Retail • Social Media
El Segundo, California
178 Employees
Kepler
Fintech • Software
New York, New York
6 Employees
Onshore
Artificial Intelligence • Fintech • Software • Financial Services
New York, New York
60 Employees
