About Anyscale
At Anyscale, we're on a mission to democratize distributed computing and make it accessible to software developers of all skill levels. We’re commercializing Ray, a popular open-source project that's creating an ecosystem of libraries for scalable machine learning. Companies like OpenAI, Uber, Spotify, Instacart, Cruise, and many more, have Ray in their tech stacks to accelerate the progress of AI applications out into the real world.
With Anyscale, we’re building the best place to run Ray, so that any developer or data scientist can scale an ML application from their laptop to the cluster without needing to be a distributed systems expert.
We are proud to be backed by Andreessen Horowitz, NEA, and Addition, with more than $250 million raised to date.
About the role
Ray aims to provide a universal API for building distributed applications. Achieving this goal requires a distributed system with high levels of performance and reliability. We're looking for engineers with systems software experience who are interested in contributing to the Ray backend.
About the Ray Core Team
The Ray Core team develops and maintains the Ray C++ backend (e.g., distributed scheduler, language runtime integration, I/O and memory subsystems). We are responsible for the reliability, scalability, and performance of Ray, as well as for ensuring that Ray provides the right feature set to support higher-level libraries and use cases. The team balances work on new features and distributed libraries, test infrastructure improvements, debugging, and longer-term architectural improvements to Ray.
A snapshot of projects you can work on:
- Optimizing performance of large-scale workloads on Ray
- Stability and stress testing infrastructure
- Improving fault tolerance (HA)
As part of this role, you will:
- Develop high-quality open-source software to simplify distributed programming (Ray)
- Identify, implement, and evaluate architectural improvements to Ray core
- Improve the testing process for Ray to make releases as smooth as possible
- Communicate your work to a broader audience through talks, tutorials, and blog posts
We'd love to hear from you if you have:
- At least 2 years of relevant work experience
- A solid background in algorithms, data structures, and system design
- Experience building scalable and fault-tolerant distributed systems
- Knowledge of distributed model training and inference (e.g., tensor parallelism, pipeline parallelism) (preferred)
- Knowledge of GPU programming (preferred)
Anyscale Inc. is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law.
Anyscale Inc. is an E-Verify company, and you may review the Notice of E-Verify Participation and the Right to Work posters in English and Spanish.
What We Do
Distributed computing made simple
Anyscale enables developers of all skill levels to easily build applications that run at any scale, from a laptop to a data center.