Weekday, Inc.

PyTorch & MLOps AI Specialist

Posted 3 Days Ago

Hiring Remotely in United States

Remote

$70-$110 Hourly

Junior

Artificial Intelligence • HR Tech • Professional Services • Software

The Role

Contribute to generative AI model training and evaluation by designing and solving ML infrastructure and systems challenges. Build and optimize distributed training, custom GPU kernels, and evaluation frameworks, and provide technical reviews and feedback to improve training data and model capabilities.

Summary Generated by Built In

This role is for one of our clients

Compensation: $70-$110 per hour

Join a leading AI lab's cutting-edge Generative AI team and play a key role in developing next-generation large language models. We are seeking experienced MLOps and ML Systems Engineers with deep expertise in PyTorch and kernel-level programming frameworks such as Triton or Pallas.

In this role, you will contribute to AI model training and evaluation initiatives by designing, solving, and reviewing advanced machine learning infrastructure and systems challenges. Your expertise will help improve the quality of training data used to develop frontier AI systems.

This is a full-time (40 hours/week) engagement supporting high-impact AI research and engineering efforts.

Requirements

Key Responsibilities

Partner with research and engineering teams to identify and address knowledge gaps in MLOps, machine learning infrastructure, and model training systems.
Design challenging, real-world tasks focused on distributed training, ML frameworks, model optimization, and infrastructure engineering.
Develop accurate, well-structured solutions to complex MLOps and ML systems problems.
Evaluate technical tasks and solutions, providing detailed and actionable feedback.
Create evaluation frameworks and scoring rubrics for training pipeline architecture, distributed systems reasoning, performance optimization, and kernel-level programming.
Contribute domain expertise to improve AI model capabilities in machine learning engineering topics.
Collaborate with other subject matter experts to ensure consistency, quality, and technical accuracy across datasets and evaluations.

Required Qualifications

2+ years of professional experience in ML Infrastructure, MLOps, ML Systems Engineering, or a closely related field.
Strong hands-on experience building and operating production-scale machine learning systems.
Advanced proficiency with PyTorch, including model training, optimization, and deployment workflows.
Experience developing, tuning, or optimizing custom GPU kernels using Triton, Pallas, or similar frameworks.
Demonstrated career growth and increasing technical responsibility.
Ability to commit to a full-time, 40-hour-per-week schedule during standard business days.
Excellent written communication skills and the ability to clearly explain complex technical concepts and engineering decisions.

Preferred Qualifications

Experience with large-scale distributed training frameworks and infrastructure.
Knowledge of GPU performance optimization and compiler-level ML tooling.
Familiarity with modern AI training pipelines, model evaluation methodologies, and LLM development workflows.
Experience mentoring engineers or contributing to technical standards and best practices.
Background in cloud-native ML infrastructure and production deployment environments.

Why Join

Work alongside leading AI researchers and engineers on frontier AI systems.
Influence the development and evaluation of next-generation large language models.
Apply your expertise to solve challenging machine learning infrastructure and optimization problems.
Contribute to high-impact projects at the forefront of AI innovation.

Additional Information

Full-time engagement requiring 40 hours per week.
Dedicated commitment is expected during the engagement period.
Responsibilities and project scope may evolve based on research priorities and business needs.

Equal Opportunity Statement

All qualified applicants will be considered without regard to legally protected characteristics. Reasonable accommodations are available upon request.

The Company

Year Founded: 2021

What We Do

Weekday is an AI-powered recruitment platform that helps startups hire top-tier engineering and product talent. By leveraging a massive database of white-collar professionals and advanced outreach tools, the company streamlines the hiring process through automated sourcing, AI-driven resume screening, and white-glove contingency services. Their mission is to modernize recruitment by enabling companies to discover and engage passive candidates efficiently, ensuring high-quality hires for critical roles.