Staff Software Engineer, Platform Infrastructure (Foundations)

Reposted 23 Hours Ago
2 Locations
Hybrid
Mid level
Artificial Intelligence • Software
The Role
The Software Engineer will develop and optimize infrastructure for distributed AI workloads, focusing on cloud-native deployments and systems orchestration while enhancing reliability, performance, and scalability.
Summary Generated by Built In

About Anyscale:


At Anyscale, we're on a mission to democratize distributed computing and make it accessible to software developers of all skill levels. We’re commercializing Ray, a popular open-source project that's creating an ecosystem of libraries for scalable machine learning. Companies like OpenAI, Uber, Spotify, Instacart, Cruise, and many more have Ray in their tech stacks to accelerate the progress of AI applications into the real world.


With Anyscale, we’re building the best place to run Ray, so that any developer or data scientist can scale an ML application from their laptop to the cluster without needing to be a distributed systems expert.


We are proud to be backed by Andreessen Horowitz, NEA, and Addition, with $250+ million raised to date.

About the Role

Anyscale is seeking a Staff Software Engineer to lead the technical vision for our Infrastructure team. As a Staff Engineer, you will be responsible for the architectural evolution of our control plane and data plane, ensuring that our "infinite laptop" vision scales to meet the most demanding distributed AI workloads in the world. You will act as a force multiplier, setting the standards for Kubernetes-based cloud-native infrastructure while mentoring engineers and driving cross-functional alignment across the Ray open-source community and our proprietary product teams.

 
 
Key Responsibilities
  • Architectural Leadership: Define and drive the multi-year technical roadmap for services that orchestrate Ray clusters across diverse cloud and on-premises environments.

  • Systemic Optimization: Lead the design and optimization of high-performance control plane components specifically tailored for large-scale, heterogeneous AI/ML workloads.

  • Platform Reliability: Establish the organization-wide standards for the reliability, scalability, and observability of Anyscale-managed infrastructure.

  • Strategic Integration: Direct the long-term strategy for accelerator integration (GPUs, TPUs) and container management to ensure seamless execution of distributed workloads.

  • Technical Governance: Lead complex design and architecture discussions, resolving deep technical debt and ensuring engineering excellence across the organization.

  • Cross-Functional Influence: Partner with ML experts and customer-facing teams to translate market needs into robust infrastructure foundations.

 
 
Qualifications
  • Experience: 5+ years of experience writing high-quality production code and leading complex distributed systems projects.

  • Architectural Depth: Proven track record of designing and maintaining highly available, scalable, and secure cloud-native platforms (AWS, Azure, or GCP).

  • Kubernetes Mastery: Deep expertise in Kubernetes-based deployments and container orchestration at massive scale.

  • System Foundations: Advanced knowledge of the Linux kernel, networking, and low-level operating system foundations.

  • Technical Proficiency: Mastery of Go and Python, with the ability to set coding standards and best practices for the team.

  • Leadership Skills: Demonstrated ability to mentor senior engineers, influence technical direction without direct authority, and navigate complex trade-offs in a fast-paced environment.

Anyscale Inc. is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law. 

Anyscale Inc. is an E-Verify company. You may review the Notice of E-Verify Participation and the Right to Work posters in English and Spanish.

Skills Required

  • Bachelor's degree in Computer Science or equivalent experience
  • 3+ years of experience writing high-quality production code
  • Experience in building and maintaining scalable distributed systems
  • Expertise in cloud-native technologies and Kubernetes
  • Understanding of cloud networking, security, and authentication
  • Familiarity with observability stacks such as Prometheus and Grafana
  • Proficiency in Go and Python
  • Knowledge of the Linux kernel and container foundations

Anyscale Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Anyscale and has not been reviewed or approved by Anyscale.

  • Fair & Transparent Compensation: Pay is considered market-based with target ranges shown in postings and a stated market-based philosophy. Feedback suggests this clarity and consistency aid confidence in pay fairness.
  • Equity Value & Accessibility: Equity is commonly included in offers and is positioned as a meaningful part of total compensation for many roles. Feedback suggests this equity participation enhances perceived overall pay competitiveness.
  • Healthcare Strength: Health, dental, and vision coverage are described as robust with many plan options, alongside mental-health support and fertility benefits. Feedback suggests this strong core healthcare offering increases perceived benefits quality.

The Company
HQ: San Francisco, CA
115 Employees
Year Founded: 2019

What We Do

Distributed computing made simple. Anyscale enables developers of all skill levels to easily build applications that run at any scale, from a laptop to a data center.
