SRE

Reposted 13 Days Ago
San Mateo, CA
In-Office
Senior level
Software
The Role
Design, implement, and maintain scalable backend systems and APIs; build cloud infrastructure (preferably GCP) using Terraform; operate containerized workloads with Kubernetes; ensure reliability, security, and performance; participate in on-call rotations, architecture discussions, and cross-functional delivery.
Summary Generated by Built In

About Luminary Cloud

Luminary Cloud helps engineering companies be more competitive by getting to market faster, creating new, better products, and reducing development risk. We do this with our Physics AI platform, the fastest and easiest way to build and deploy models to understand and instantly predict physical reality with precision. Customers span industries from automotive and aerospace to leading sporting equipment providers, including Otto Aviation, Joby Aviation, Piper Aircraft, and Trek Bikes. Luminary is a Series B company and is headquartered in San Mateo, California.


Role Description


The Luminary Physics AI platform is a SaaS offering that runs on GCP. It uses GPUs for data generation, model training, and model inference, and it supports accelerated engineering design workflows. The product generates and consumes large volumes of data for Physics AI models and is used by some of the most demanding customers in the automotive, aerospace, and defense industries. The elevated security and compliance posture, the need to maintain five-nines SLAs, the automation of most tasks, and the management of large data volumes make this an exciting opportunity for a production Site Reliability Engineer.


The right candidate will apply software engineering principles to operations, focusing on system reliability, performance, and scalability. You will collaborate closely with engineering and product teams to design, deliver, and scale the core systems that power our platform. You will be responsible for suggesting product changes that allow us to support 10k simultaneous users on the platform with effective resource management.


Key Duties, Responsibilities, and Deliverables
Reliability & Operational Excellence
  • Participate in on-call rotations and incident response, implementing effective remediation strategies and leading post-incident reviews to prevent recurrence
  • Define, monitor, and enforce Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to meet internal and external reliability targets
  • Apply software engineering practices to eliminate toil by automating operational tasks, improving overall efficiency, and contributing to the operational reliability of the platform
Infrastructure & Platform Engineering
  • Develop and enhance our cloud infrastructure (GCP preferred) through automation and Infrastructure as Code (Terraform)
  • Develop, oversee, and maintain operational systems (from deployment pipelines to orchestration layers) ensuring application health, reliability, and scalability using containerized solutions like Kubernetes
  • Execute scalability and performance optimization strategies to ensure systems efficiently handle increasing workloads and future growth
Architecture & Systems Design
  • Contribute to the design and implementation of highly available and fault-tolerant systems, leveraging Service-Oriented Architecture (SOA) or microservices principles
  • Participate in architectural discussions that influence the platform’s long-term reliability, performance, and scalability
Security & Documentation
  • Collaborate with security experts to integrate IAM, authentication, authorization, encryption, and related best practices into the infrastructure
  • Create and maintain comprehensive documentation on system architecture, infrastructure, and security practices
Expertise and Qualifications
Required
  • Proven experience designing and implementing scalable SaaS backend systems
  • Strong understanding of cloud infrastructure (GCP preferred), CI/CD pipelines, and core SRE/DevOps concepts.
  • 5+ years of experience building performant, scalable, distributed systems (or equivalent experience).
  • 10+ years of experience required for Senior/Lead candidates.
  • Proficiency in Golang and Python is highly desirable.
  • Familiarity with Kubernetes and container orchestration.
  • Experience with Infrastructure as Code (Terraform) and cloud automation.
  • Strong understanding of operational practices and willingness to participate in on-call rotations.
  • Knowledge of modern security principles and IAM fundamentals.
Preferred Qualifications (Senior Candidates)
  • Demonstrated success scaling infrastructure in a startup environment, including multicloud, hybrid, or on-prem deployments.
  • Proven experience mentoring and guiding engineers, supporting technical growth and career development.
  • Ability to act as a technical architect, making high-impact design decisions for reliable, scalable, and secure platform systems.

Top Skills

CI/CD
Cloud Automation
Container Orchestration
Go
Google Cloud Platform
IAM
Infrastructure as Code
Kubernetes
Microservices
Python
Service-Oriented Architecture
Terraform

The Company
HQ: Redwood City, California
96 Employees
Year Founded: 2019

What We Do

Luminary Cloud is an early-stage tech startup focused on innovations in high-performance computing for enterprise industrial R&D. The company is currently developing its initial product in stealth mode.
