ML Engineer - Inference

Reposted 4 Days Ago
San Jose, CA, USA
In-Office
Mid level
Artificial Intelligence • Software • Generative AI
The Role
As an ML Engineer specializing in inference, you will optimize models for production, implement neural network optimization techniques such as quantization and pruning, and collaborate with researchers.
Summary Generated by Built In
About us:

At Phota Labs, we’re building visual GenAI that helps people capture, express, and relive their memories — in ways that feel effortless, personal, and emotionally resonant. Our core technology enables personalized image generation that faithfully reflects who you are and the moments you experienced. Our first goal is to bring visual GenAI into everyday photography.

We're a small team of researchers, engineers, and designers who have always been at the forefront of how people capture, edit, and share images and videos. We build with our hands and hearts. We believe GenAI is the next shift for photography, and are seeking builders who share this vision — people like us, like you. We're just getting started!


The role:

As our first ML Engineer specializing in inference and optimization, you'll bridge the gap between cutting-edge research models and production systems. Your expertise will transform PyTorch research code into highly optimized, low-latency inference solutions that power our user-facing applications. You'll work closely with our GenAI researchers, vision ML engineers, and backend team to deliver exceptional performance.

What you’ll do:

  • Deploy and integrate researcher-trained model checkpoints into our cloud infrastructure and production pipelines.
  • Conduct thorough performance profiling and benchmarking to identify and eliminate computational bottlenecks.
  • Implement neural network optimization techniques including quantization, pruning, and architectural refinements while preserving model accuracy.
  • Develop efficient training and fine-tuning strategies with optimal precision trade-offs and parallelism.
  • Build and maintain scalable multi-GPU inference solutions with sophisticated model parallelism and serving architectures.
  • Collaborate with the research team to ensure optimizations integrate smoothly with model development workflows.

You may be a strong fit if you:

  • Have experience deploying and optimizing deep learning models for production environments, particularly with multi-GPU inference and large-scale model serving.
  • Are well-versed in cutting-edge techniques for optimizing both inference and training workloads.
  • Possess strong knowledge of efficient attention mechanisms and algorithms.
  • Have hands-on experience implementing model quantization and working with inference frameworks.
  • Can write production-quality code and successfully integrate ML models into robust inference pipelines.
  • Are familiar with various cloud platforms, storage solutions, and modern training frameworks.

Logistics:

  • This role is based in San Jose, where we work in person. We believe the best ideas come from being in the same room.
  • We sponsor visas and are committed to working through the process together with the right candidates. If you're currently outside the US, we'll also help you relocate as part of that process.
  • We offer generous health, dental, and vision coverage, unlimited PTO, paid parental leave, and relocation support as needed.
  • Don't meet every single qualification? That’s okay — we care more about your trajectory than checking every box. If the role excites you and the mission resonates, we'd love to hear from you.

Note: If your application is successful, any offer of employment will be conditional on the results of a background check performed by a third party acting on our behalf.

Skills Required

  • Experience deploying and optimizing deep learning models in production environments
  • Familiarity with multi-GPU inference and large-scale model serving
  • Knowledge of attention mechanisms and optimization algorithms
  • Hands-on experience with model quantization and inference frameworks
  • Ability to write production-quality code
  • Familiarity with cloud platforms and modern training frameworks

The Company
HQ: San Jose, CA
8 Employees

What We Do

Reimagine photography with personalized GenAI.
