Machine Learning Engineer, Training Infrastructure

Reposted 2 Days Ago
San Francisco, CA
In-Office
150K-250K Annually
Mid level
HR Tech • Information Technology
The Role
Manage and optimize computational infrastructure for training ML models, ensuring scalability for large datasets and performance optimization.
Summary Generated by Built In
Job Title: Machine Learning Engineer, Training Infrastructure
Position Type: Full time
Location: San Francisco, CA, USA
Salary Range: $150,000 - $250,000 (USD)
Job ID#: 158135
Job Description:

We are looking for an ML Engineer with 3+ years of experience in high-performance computing systems to manage and optimize our computational infrastructure for training and deploying our machine learning models. The ideal candidate has diverse experience managing ML workloads at scale and will support our 3DVAE and video diffusion models. We encourage you to apply even if you don't meet every requirement; we value curiosity, creativity, and the drive to solve hard problems.

Responsibilities:
  • Design, implement, and maintain scalable computing solutions for training and deploying ML models, ensuring infrastructure can handle large video datasets.

  • Manage and optimize the performance of our computing clusters or cloud instances, such as AWS or Google Cloud, to support distributed training.

  • Ensure that our infrastructure can handle the resource-intensive tasks associated with training large generative models.

  • Monitor system performance and implement improvements to maximize efficiency and utilization, using tools like Airflow for orchestration.

  • Collaborate across research teams to understand their computational needs and provide appropriate solutions, facilitating seamless model deployment.

Requirements:
  • Bachelor’s degree in Computer Science, Information Technology, or a related field, with a focus on system administration.

  • Experience with cloud computing platforms such as Amazon Web Services, Google Cloud, or Microsoft Azure, essential for managing large-scale ML workloads.

  • This role is vital for ensuring the computational backbone supports the company’s ML efforts, focusing on deployment and scalability.

  • Values sound engineering processes, including version control and CI/CD.

  • Knowledge of containerization and orchestration technologies such as Docker and Kubernetes, required for deployments at scale.

  • Understanding of distributed training techniques and how to scale models across multi-node clusters, in line with video generation needs.

  • Strong problem-solving and communication skills, given the need to collaborate with diverse teams.

About Us:
Founded in 2009, IntelliPro is a global leader in talent acquisition and HR solutions. Our commitment to delivering unparalleled service to clients, fostering employee growth, and building enduring partnerships sets us apart. We continue leading global talent solutions with a dynamic presence in over 160 countries, including the USA, China, Canada, Singapore, Japan, Philippines, UK, India, Netherlands, and the EU.
IntelliPro, a global leader connecting individuals with rewarding employment opportunities, is dedicated to understanding your career aspirations. As an Equal Opportunity Employer, IntelliPro values diversity and does not discriminate based on race, color, religion, sex, sexual orientation, gender identity, national origin, age, genetic information, disability, or any other legally protected group status. Moreover, our Inclusivity Commitment emphasizes embracing candidates of all abilities and ensures that our hiring and interview processes accommodate the needs of all applicants. Learn more about our commitment to diversity and inclusivity at https://intelliprogroup.com/.
Compensation: The pay offered to a successful candidate will be determined by various factors, including education, work experience, location, job responsibilities, certifications, and more. Additionally, IntelliPro provides a comprehensive benefits package, all subject to eligibility.

Top Skills

Airflow
AWS
Docker
GCP
High-Performance Computing
Kubernetes
Machine Learning
Azure

The Company
HQ: Santa Clara, CA
638 Employees
Year Founded: 2009

What We Do

IntelliPro Group Inc. is one of the fastest-growing IT services and HR solutions companies in the Americas & APAC. We provide comprehensive IT services to help clients with IT Strategic Planning, Implementation, Deployment, and IT Support for Artificial Intelligence, Big Data, Cloud Computing, Mobile Application Development, Data Mining and Business Intelligence, Enterprise Data Warehouse, and more.

Besides our established IT services, our business is quickly expanding into one-stop HR Solution Services, including Overseas Branch Setup Consulting, Compensation & Benefits Policy Consulting, Payroll Management Service, Talent Recruiting, and Employer Branding, to meet our clients’ rapid business expansion needs.

We have built our business on our company-wide commitment to continually overdeliver on the high expectations of our clients, employees, and business partners. The secret to our success is that our unified team works harder, faster, smarter, and more collaboratively than anyone else in the talent acquisition business. In addition to the immense talent and proprietary technology, IntelliPro Group is proud to offer continual professional development and extraordinary benefits to both consultants and full-time employees.
