ML Training Platform Intern (6 months)

Reposted 11 Days Ago
Seattle, WA
In-Office
Internship
Artificial Intelligence • Information Technology • Software
The Role
The ML Training Platform Intern will learn distributed training architectures, implement solutions, assist with customer workshops, and contribute to optimization tools and documentation.

aion is building the next generation of AI cloud platforms, transforming the future of high-performance computing (HPC) through its decentralized AI cloud. Purpose-built for bare-metal performance, aion democratizes access to compute power for AI training, fine-tuning, inference, data labeling, and beyond.

By leveraging underutilized resources such as idle GPUs and data centers, aion provides a scalable, cost-effective, and sustainable solution tailored for developers, researchers, and enterprises.

Led by high-pedigree founders with previous exits, aion is well-funded by major VCs and has strategic global partnerships. Headquartered in the US with a global presence, the company is building its initial core team in London, Seattle, and India.

Who You Are

You're an aspiring ML engineer passionate about distributed training and helping customers succeed with large-scale ML workloads. You love solving complex technical problems, learning from customer challenges, and building solutions that accelerate AI development. You're excited to learn cutting-edge training techniques while working directly with customers to implement distributed training architectures and advanced ML workflows.


Requirements

Key Responsibilities
  • Learn and implement distributed training architectures including data parallelism, model parallelism, and pipeline parallelism under mentorship.
  • Build reference implementations for training workflows including DDP setups, gradient synchronization, and multi-GPU configurations.
  • Develop training optimization tools including efficient data loading pipelines, memory optimization techniques, and performance monitoring.
  • Create customer documentation and tutorials covering distributed training best practices and implementation guides.
  • Assist with customer workshops and training sessions on distributed training methodologies and platform usage.
  • Build debugging and profiling tools for identifying bottlenecks in distributed training workloads.
  • Experiment with emerging techniques including reward model training, Direct Preference Optimization (DPO), and Constitutional AI workflows.
  • Contribute to training framework improvements based on customer feedback and platform optimization opportunities.
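For readers new to the terms above, the gradient synchronization step at the heart of data parallelism can be sketched in plain Python. This is a simplified simulation under stated assumptions, not aion's code and not the PyTorch `DistributedDataParallel` API: each hypothetical worker computes the gradient of a mean-squared-error loss for a one-parameter linear model on its own data shard, and a simulated all-reduce averages those gradients so every replica applies the same update. The function names (`local_gradient`, `all_reduce_mean`, `data_parallel_step`) are illustrative only.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # one (x, y) training pair


def local_gradient(w: float, shard: List[Sample]) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w on one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values: List[float]) -> List[float]:
    """Simulated all-reduce: every worker receives the average of all values."""
    avg = sum(values) / len(values)
    return [avg] * len(values)


def data_parallel_step(weights: List[float], shards: List[List[Sample]], lr: float) -> List[float]:
    """One synchronous data-parallel SGD step across len(shards) hypothetical workers."""
    # 1. Each worker computes a gradient on its local shard only.
    grads = [local_gradient(w, shard) for w, shard in zip(weights, shards)]
    # 2. Gradient synchronization: average the gradients across all workers.
    synced = all_reduce_mean(grads)
    # 3. Each replica applies the same averaged gradient to its own copy of w.
    return [w - lr * g for w, g in zip(weights, synced)]


if __name__ == "__main__":
    # Toy data drawn from y = 3x, split evenly across two hypothetical workers.
    data = [(float(x), 3.0 * x) for x in range(1, 9)]
    shards = [data[0::2], data[1::2]]
    weights = [0.0, 0.0]  # both replicas start from the same initialization
    for _ in range(200):
        weights = data_parallel_step(weights, shards, lr=0.005)
    print(weights)
```

Because both shards hold the same number of samples, the averaged gradient equals the full-batch gradient, so the replicas stay identical and converge toward w = 3. In PyTorch, DDP performs the equivalent averaging with collective all-reduce operations during the backward pass.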
Skills & Experience
  • High-agency individual looking to own customer success and influence training platform architecture.
  • Working knowledge of deep learning fundamentals including neural networks, transformers, and basic training/inference concepts.
  • Working PyTorch experience with some knowledge of distributed training, DDP implementation, and multi-GPU optimization.
  • High-level understanding of distributed training techniques, including data parallelism, model parallelism, and pipeline parallelism.
  • Basic working knowledge of at least one training infrastructure tool, such as Megatron-LM, DeepSpeed, FairScale, or a similar framework.
  • Surface-level understanding of reasoning techniques, including Chain-of-Thought prompting and advanced reasoning workflows.
  • Previous internships or projects in ML infrastructure, contributions using PyTorch or other ML frameworks, competitive programming achievements, research experience in ML systems, or familiarity with agent systems or reasoning techniques.
  • Strong coding and implementation skills in Python and C++ with demonstrated ability to write performant, production-quality code.
  • Experience reading and contributing to large codebases with proof of open-source contributions (GitHub profile required).
  • Proof of technical work through projects like Google Summer of Code, hackathon wins, competitive programming, or significant open-source contributions.

Benefits
  • Join the ground floor of a mission-driven AI startup revolutionizing compute infrastructure.
  • Learn from world-class engineers and gain hands-on experience with cutting-edge inference optimization techniques.
  • Work with a high-caliber, globally distributed team backed by major VCs.
  • Significant learning and growth opportunity in one of the fastest-moving areas of AI infrastructure.
  • Competitive internship compensation with potential for full-time conversion.
  • Fast-paced, flexible work environment with room for ownership and impact.

If you have any questions about the role, please reach out to the hiring manager on LinkedIn or X.

Top Skills

C++
Python
PyTorch

The Company
HQ: San Francisco, California
21 Employees
Year Founded: 2023

What We Do

Everyday AI Platform: aion collapses the entire AI development lifecycle into a single, unified workspace. From data to deployment, everything is at your fingertips. aion simplifies AI infrastructure the way Stripe simplified payments:

Plug-and-Play Multi-Provider Access
Customer Infrastructure Management
Deploy and optimize AI infrastructure via prompts with integrated cost tracking and performance analytics
Partner Sales & Resource Optimization

Track opportunities with confidential pricing, manage real-time inventory allocation, and monitor profitability from aion workloads

Similar Jobs

ServiceNow

Consultant

Artificial Intelligence • Cloud • HR Tech • Information Technology • Productivity • Software • Automation
Remote or Hybrid
Kirkland, WA, USA
147K-242K Annually

ServiceNow

Senior Linux System Admin - Federal

Artificial Intelligence • Cloud • HR Tech • Information Technology • Productivity • Software • Automation
Remote or Hybrid
Kirkland, WA, USA
127K-215K Annually

ServiceNow

Sr Manager Solution Consulting - (Strat Tech)

Artificial Intelligence • Cloud • HR Tech • Information Technology • Productivity • Software • Automation
Remote or Hybrid
Seattle, WA, USA
149K-246K Annually
Hybrid
Lake Stevens, WA, USA
23-31

Similar Companies Hiring

Credal.ai
Software • Security • Productivity • Machine Learning • Artificial Intelligence
Brooklyn, NY
Standard Template Labs
Software • Information Technology • Artificial Intelligence
New York, NY
10 Employees
PRIMA
Travel • Software • Marketing Tech • Hospitality • eCommerce
US
15 Employees
