Software Engineer - Cloud Engineering

Posted 6 Days Ago
Mountain View, CA
Hybrid
Mid level
Artificial Intelligence • Cloud • Machine Learning • Software • Database
The Role
As a Software Engineer on the Cloud Engineering team at Kumo, you'll architect scalable systems for the Kumo AI platform, collaborate with teams to influence ML tech scaling, and enhance CI/CD processes. You will be responsible for developing infrastructure microservices, building tools for deployment, and managing ML Ops, thereby influencing the productivity of engineers and users through big data solutions.
Summary Generated by Built In

The Cloud Infrastructure team at Kumo manages the Kubernetes-based, cloud-native Kumo AI platform. They define service level objectives, ensure capacity, maintain cost visibility, and uphold security compliance for the Multi-Cloud Platform.


As a key team member, you will architect scalable systems for the Kumo platform, making it the top choice for Big Data and AI workloads. Joining early, you'll design the platform to handle large datasets, enhancing productivity for engineers and users. Collaborating with ML scientists, product engineers, and leaders, you'll influence scaling ML tech, develop tools for speed, and craft full-stack experiences. Engineers at Kumo wear many hats, leading the design of core systems from scratch and shaping product direction. You'll dive into foundational work, managing model lifecycles, ML Ops, CI/CD, and deployment strategies.

The Value You'll Add:

  • Build and extend components of the core Kumo cloud infrastructure
  • Define a culture of engineering excellence and operational efficiency, especially as it relates to development and productization
  • Build and automate CI/CD pipelines and release tooling to support continuous delivery and true zero-downtime deployments across different cloud providers using the latest cloud-native technologies
  • Work on advanced tools developed for the world’s leading cloud-native machine learning engine that uses graph deep learning technology
  • Develop the infrastructure microservices for features such as usage tracking, diagnostics, monitoring, and alerting at cloud scale
  • Lead automation efforts to streamline global deployments
  • Build the Kumo ML Ops platform, which will be able to detect data drift, track model versions, report on production model performance, alert the team to any anomalous model behavior, and run programmatic A/B tests on production models

Your Foundation:

  • BS (MS or PhD preferred) in Computer Science or a related field
  • 3+ years of experience writing production code in C++, Python, Go, or similar languages
  • Experience with Infrastructure-as-Code development (e.g., Terraform, CloudFormation, Ansible, Chef, Bash scripting, etc.)
  • Experience with B2B SaaS and with architecting and building large-scale distributed systems
  • Experience with productionizing cloud applications, including Docker and Kubernetes 
  • Experience with CI/CD and advanced packaging, versioning, and deployment strategies 
  • Hands-on experience with Kubernetes (e.g., EKS, GKE, AKS, or open-source distributions) on public clouds (AWS, GCP) at scale

Your Extra Special Sauce:

  • Experience with popular MLOps tooling, such as cloud vendor offerings (GCP Vertex AI, AWS SageMaker, Azure Machine Learning) or open-source tools (MLflow, Kubeflow)
  • Experience with managing popular Data platforms such as AWS EMR, Snowflake, Databricks, etc.
  • Experience with industry-standard security practices, such as security testing, vulnerability assessments, ISO 27001, and governance, risk, and compliance (GRC)
  • Extensive experience with Docker/Containers, Jenkins/Flux/Argo, and Terraform in a Linux environment
  • Experience with monitoring tools such as Prometheus, Grafana, etc.
  • Proficiency in developing customer-facing Web Front Ends or public APIs/SDKs for the application

Benefits:

  • Stock
  • Competitive Salaries
  • Medical Insurance
  • Dental Insurance



We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Top Skills

C++
Go
Python
The Company
HQ: Mountain View, CA
38 Employees
On-site Workplace
Year Founded: 2021

What We Do

Democratizing AI on the Modern Data Stack!

The team behind PyG (PyG.org) is working on a turnkey solution for AI over large-scale data warehouses. We believe the future of ML is a seamless integration between modern cloud data warehouses and AI algorithms. Our ML infrastructure massively simplifies the training and deployment of ML models on complex data.

With over 40,000 monthly downloads and nearly 13,000 GitHub stars, PyG is the ultimate platform for training and development of Graph Neural Network (GNN) architectures. GNNs, currently one of the hottest areas of machine learning, are a class of deep learning models that generalize Transformer and CNN architectures and enable us to apply the power of deep learning to complex data. GNNs are unique in the sense that they can be applied to data of different shapes and modalities.
