Senior Site Reliability Engineer (SRE)

Palo Alto, CA, USA
Hybrid
158K-225K Annually
Senior level
Other
The Role
The Senior SRE will deploy and operate commercial SaaS platforms, utilizing advanced skills in cloud infrastructure, automation, and systems engineering while promoting efficiency and reliability.
Summary Generated by Built In
Manufacturing advanced electronics requires understanding millions of signals generated across complex assembly processes. Instrumental builds systems that capture and analyze those signals — images, test results, and process data — enabling engineers to discover failures, identify root causes, and deploy production controls that improve yield and product maturity. Leading companies such as NVIDIA, Cisco, and Meta rely on Instrumental to accelerate new product development and scale manufacturing across global factories. Instrumental has become mission-critical for manufacturers building and scaling the next generation of AI infrastructure hardware.

The Instrumental platform collects, intelligently transforms, and contextually presents manufacturing data to technical end-users, enabling them to optimize their manufacturing process in real time. Our core technology is proprietary ML algorithms, packaged in an accessible, user-centric interface. We believe we must have both the best technology and the best access to that technology to win.

Requirements:
  • 5 or more years of DevOps or SRE experience deploying and operating commercial SaaS platforms on public cloud infrastructure, AWS preferred.
  • Expert knowledge of Linux, shell, containerization, Kubernetes, IaC (Terraform preferred), monitoring, logging, and APM tools.
  • Proven ability to take initiative and drive impactful projects to completion efficiently and independently.
  • Comfort with ambiguity, pace, and frequent pivots inherent in a startup environment, with a track record of creating clarity for teams.
  • Experience introducing and integrating AI tools/processes into development and operation workflows.
  • Demonstrated skill in setting, iterating on, and measuring KPIs to ensure ongoing performance, reliability, and efficiency.
  • Network/application security and compliance experience is a plus.

Who You Are:
  • Dead serious about performance, scalability, and reliability (PSR): You care deeply about how systems behave in the real world and sweat the details around latency, uptime, and scale.
  • Systems engineering & infrastructure expertise: You’ve spent real time building and running distributed systems and know your way around cloud infrastructure, networks, and operating systems.
  • Automation, automation, automation: If something is repetitive or error-prone, your first instinct is to automate it and make it disappear.
  • Operating in ambiguity & high-growth environments: You’re comfortable making good calls without perfect information and adapting as the system and company grow fast.
  • Dependable, trustworthy: People trust you to own problems, show up when things are broken, and follow through.

This position requires access to items and data that are developed under U.S. government contracts and subject to dissemination controls that limit access to U.S. citizens only.

We’re a growing team that works collaboratively, is supportive of each other, and is highly energized by the opportunity for a large impact. We actively work to promote an inclusive environment, valuing passion and the ability to learn. You’re encouraged to apply even if your experience doesn’t precisely match the job description!

The following is a representative annual base salary range for this position within the Bay Area: $158K-$225K. We consider candidates at multiple levels for this role. Job level and salary opportunities are evaluated through our interview process, in which we review the experience, knowledge, skills, and abilities of each applicant.

Instrumental is proud to offer a variety of highly rated benefits, including health, vision, dental, commuter plans, and parental leave.

At Instrumental, protecting company and customer information is a shared responsibility. Employees are expected to comply with company engineering, security, access control, and privacy policies, and promptly report suspected security incidents or policy violations.


The Company
HQ: Palo Alto, CA
85 Employees
Year Founded: 2015

What We Do

Accelerate product maturity, de-risk ramp, and improve yields. Instrumental helps you find and fix issues you didn't even know were there.
