Join a world-class team of scientists, ML researchers, and engineers working together to reshape the future of drug discovery.
Work on cutting-edge ML infrastructure at frontier scale: massive compute, massive data, and massive ambition.
Own impactful work end-to-end — from ideation to architecture to deployment on large-scale infrastructure.
Work in an environment that rewards rigor, speed, and a builder’s mindset.
Achira is building best-in-class foundation models to solve the most challenging problems in simulation for drug discovery and beyond. Atomistic foundation simulation models (FSMs), acting as world models of the physical microcosm, span machine learning interaction potentials (MLIPs), neural network potentials (NNPs), and diverse classes of generative models.
We're seeking a Software Engineer passionate about distributed computing and its applications in machine learning. You'll have the opportunity to architect and build, from the ground up, the infrastructure for our ML data generation pipelines, model training, and fine-tuning workflows across large-scale distributed systems.
Your expertise will ensure our compute clusters are efficient, observable, cost-effective, and reliable, helping us push the boundaries of ML development. If you're excited by distributed systems, performance optimization, and cloud cost efficiency, we'd love to hear from you.
You'll be empowered to live and breathe the orchestration of complex workloads across multiple vendors scattered anywhere on the planet. Achira is a company that lives and breathes computation; frictionless access to it at the lowest cost for our unique workloads is a mission-critical endeavor.
What You’ll Do
Architect & Build: Design, implement, and optimize distributed compute infrastructure for ML data processing, training, and fine-tuning.
Optimize & Monitor: Improve cluster observability, scheduling, and resource utilization (CPU/GPU/TPU).
Compute Efficiency: Research and implement cost-efficient compute solutions (spot instances, auto-scaling, multi-cloud strategies).
Tooling: Develop tools for monitoring, debugging, and performance tuning of large-scale ML workloads.
Collaboration: Partner with ML engineers to accelerate training pipelines and reduce bottlenecks.
Innovation: Stay current with emerging technologies in distributed computing (e.g., Ray, Kubernetes, Spark, Slurm) and apply them strategically.
Who You Are
You are excited about distributed computing frameworks (e.g., Ray, Dask, Celery) and have extensive experience building or working with them.
You have a good grasp of parallel computing, job scheduling, and resource management.
You're comfortable identifying and resolving performance issues in distributed systems (profiling, bottlenecks, network overhead).
You've implemented solutions using cloud compute platforms (AWS, GCP, Azure) and cluster orchestration (Kubernetes, Slurm).
You are familiar with popular ML frameworks (PyTorch, TensorFlow, or JAX) and MLOps best practices such as model deployment and GPU performance monitoring.
Eligibility
In compliance with United States federal law, all persons hired will be required to verify identity and eligibility to work in the United States and to provide required employment eligibility verification documentation upon hire.
What We Do
Achira is building atomistic foundation simulation models to power the future of drug discovery.