Software Engineer - Cloud Engineering

Reposted 25 Days Ago
Mountain View, CA
Hybrid
145K-215K Annually
Mid level
Artificial Intelligence • Cloud • Machine Learning • Software • Database
The Role
As a Cloud Infrastructure Engineer, you will manage and optimize Kubernetes clusters across multiple cloud platforms while enhancing automation and reliability for AI applications.
About Kumo.ai

Kumo is building a next-generation AI platform that empowers organizations to make predictive decisions faster—without the overhead of traditional ML pipelines. Backed by Sequoia and led by ex-Airbnb, Pinterest, and LinkedIn leaders, we’re scaling rapidly and looking for a cloud infrastructure engineer to build and run the backbone of our AI platform.

Your work will directly power the models and applications our customers rely on every day. If you’re passionate about multi-cloud infrastructure, Kubernetes at scale, and building the infrastructure that powers the next generation of AI applications, we’d love to talk.

Why Kumo.ai?

  • Work alongside world-class engineers & scientists (ex-Airbnb, Pinterest, LinkedIn, Stanford).
  • Be a foundational voice in designing a platform powering enterprise-scale AI.
  • Competitive Series B compensation package (salary + meaningful equity).

The Opportunity

The Cloud Infrastructure team builds and operates our Kubernetes-based, multi-cloud AI platform across AWS, Azure, and GCP.

  • As a Cloud Infrastructure Engineer, you’ll work on scaling, securing, and optimizing the platform that powers massive multi-tenant clusters running Big Data and AI/ML workloads.
  • You’ll collaborate closely with senior engineers, ML scientists, and product teams to deliver automation, improve reliability, and expand our multi-cloud capabilities.
  • This role offers the chance to deepen your Kubernetes and cloud expertise while taking ownership of impactful projects.

What You’ll Do

  • Deploy, operate, and maintain infrastructure across AWS, Azure, and GCP.
  • Build and manage Kubernetes clusters (EKS, AKS, GKE) with a focus on performance, availability, and cost efficiency.
  • Develop and maintain automation using Infrastructure-as-Code tools (Terraform, Pulumi, Crossplane).
  • Implement and enhance GitOps workflows using Argo CD or Flux.
  • Set up and maintain observability systems (Prometheus, Grafana, OpenTelemetry) to monitor workloads and clusters.
  • Collaborate with the team to design, test, and roll out improvements to scaling and reliability.
  • Troubleshoot incidents and participate in on-call rotations to ensure platform uptime.
  • Contribute to security best practices, including RBAC, tenant isolation, and cloud identity management.

What You Bring

  • 3–5 years of experience building or operating cloud-native infrastructure in production.
  • Hands-on experience with at least one major cloud provider (AWS, Azure, or GCP) and (preferably) exposure to multi-cloud environments.
  • Solid understanding of Kubernetes concepts and operational experience with production clusters.
  • Proficiency with Infrastructure-as-Code tools (Terraform, Pulumi, or similar).
  • Experience with GitOps workflows and tools like Argo CD, Flux, or Argo Workflows.
  • Familiarity with monitoring, logging, and tracing for distributed systems.
  • Scripting or programming skills in Python, Go, or Bash.
  • Strong problem-solving skills and a collaborative approach.

Nice to Have

  • Experience with multi-tenant Kubernetes clusters for AI/ML or big data workloads.
  • Knowledge of compliance and security standards (SOC 2, GDPR, ISO 27001).
  • Contributions to open-source cloud-native projects.
  • Familiarity with Kubernetes operators, controllers, or custom resources.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Top Skills

Argo CD
AWS
Azure
Bash
Crossplane
Flux
GCP
Go
Grafana
Kubernetes
OpenTelemetry
Prometheus
Pulumi
Python
Terraform

The Company
HQ: Mountain View, CA
38 Employees
Year Founded: 2021

What We Do

Democratizing AI on the Modern Data Stack!

The team behind PyG (PyG.org) is working on a turn-key solution for AI over large-scale data warehouses. We believe the future of ML is a seamless integration between modern cloud data warehouses and AI algorithms. Our ML infrastructure massively simplifies the training and deployment of ML models on complex data.

With over 40,000 monthly downloads and nearly 13,000 GitHub stars, PyG is the ultimate platform for training and developing Graph Neural Network (GNN) architectures. GNNs, currently one of the hottest areas of machine learning, are a class of deep learning models that generalize Transformer and CNN architectures and let us apply the power of deep learning to complex data. GNNs are unique in the sense that they can be applied to data of different shapes and modalities.
