Join a world-class team of scientists, ML researchers, and engineers working together to reshape the future of drug discovery.
Work on cutting-edge ML infrastructure at frontier scale: massive compute, massive data, and massive ambition.
Own impactful work end-to-end — from ideation to architecture to deployment on large-scale infrastructure.
Work in an environment that rewards rigor, speed, and a builder’s mindset.
Achira is building best-in-class foundation models to solve the most challenging problems in simulation for drug discovery and beyond. Atomistic foundation simulation models (FSMs), acting as world models of the physical microcosm, span machine learning interaction potentials (MLIPs), neural network potentials (NNPs), and diverse classes of generative models.
We're seeking a Software Engineer passionate about distributed computing and its applications in machine learning. You'll have the opportunity to architect and build, from the ground up, the infrastructure for our ML data generation pipelines, model training, and fine-tuning workflows across large-scale distributed systems.
Your expertise will ensure our compute clusters are efficient, observable, cost-effective, and reliable, helping us push the boundaries of ML development. If you're excited by distributed systems, performance optimization, and cloud cost efficiency, we'd love to hear from you.
You'll be empowered to live and breathe the orchestration of complex workloads across multiple vendors scattered anywhere on the planet. Achira is a company that lives and breathes computation; frictionless access to it at the lowest cost for our unique workloads is a mission-critical endeavor.
What You’ll Do
Architect & Build: Design, implement, and optimize distributed compute infrastructure for ML data processing, training, and fine-tuning.
Optimize & Monitor: Improve cluster observability, scheduling, and resource utilization (CPU/GPU/TPU).
Compute Efficiency: Research and implement cost-efficient compute solutions (spot instances, auto-scaling, multi-cloud strategies).
Tooling: Develop tools for monitoring, debugging, and performance tuning of large-scale ML workloads.
Collaboration: Partner with ML engineers to accelerate training pipelines and reduce bottlenecks.
Innovation: Stay current with emerging technologies in distributed computing (e.g., Ray, Kubernetes, Spark, Slurm) and apply them strategically.
Who You Are
You are excited about distributed computing frameworks (e.g., Ray, Dask, Celery) and have extensive experience building or working with them.
You have a good grasp of parallel computing, job scheduling, and resource management.
You're comfortable identifying and resolving performance issues in distributed systems (profiling, bottlenecks, network overhead).
You've implemented solutions using cloud compute platforms (AWS, GCP, Azure) and cluster orchestration (Kubernetes, Slurm).
You are familiar with popular ML frameworks (PyTorch, TensorFlow, or JAX) and MLOps best practices such as model deployment and GPU performance monitoring.
Eligibility
In compliance with United States federal law, all persons hired will be required to verify identity and eligibility to work in the United States and to provide required employment eligibility verification documentation upon hire.
What We Do
Achira is building atomistic foundation simulation models to power the future of drug discovery.