Machine Learning Engineer — Multilingual Data

Reposted 20 Days Ago
Hiring Remotely in World Golf Village, FL, USA
In-Office or Remote
Mid level
Artificial Intelligence • Information Technology • Software
The Role
Design and maintain multilingual datasets, develop data pipelines, implement quality filters, and analyze dataset biases while collaborating with researchers.
Summary Generated by Built In

We’re looking for a Machine Learning Engineer to own and scale our multilingual data pipeline, from sourcing and curation to evaluation and continuous improvement. You’ll work closely with researchers and infra engineers to ensure our models perform robustly across languages, scripts, and cultural contexts.

This role sits at the intersection of data, research, and production ML, and it is ideal for someone who cares deeply about data quality, linguistic diversity, and model generalization beyond English.

What You’ll Do
  • Design, build, and maintain large-scale multilingual datasets across high- and low-resource languages

  • Develop data pipelines for collection, cleaning, normalization, deduplication, and labeling

  • Implement quality filters using statistical, heuristic, and model-based methods

  • Work with researchers to define language coverage, benchmarks, and evaluation metrics

  • Analyze dataset bias, coverage gaps, and failure modes across regions and scripts

  • Support training, fine-tuning, and distillation workflows with high-quality multilingual data

  • Continuously iterate on datasets based on model performance and real-world usage

What We’re Looking For
  • 3+ years of experience as an ML Engineer, Applied Scientist, or similar role

  • Strong experience working with multilingual or non-English datasets

  • Solid understanding of NLP fundamentals (tokenization, embeddings, language modeling)

  • Experience building scalable data pipelines (Python, Spark, Ray, or similar)

  • Familiarity with Unicode, scripts, tokenization challenges, and language-specific quirks

  • Comfort collaborating with researchers and translating research needs into production systems

Nice to Have
  • Experience with low-resource languages or multilingual benchmarks (e.g. FLORES, XTREME)

  • Exposure to LLM training, fine-tuning, or distillation

  • Linguistics background or experience working with native language experts

  • Contributions to open-source datasets or ML tooling

  • Experience with data quality evaluation at scale

Why Join
  • Real ownership over a core differentiator of the product

  • Work on models used globally, not just in English-speaking markets

  • Small, high-caliber team with deep ML and systems experience

  • Competitive compensation and meaningful equity at the Series A stage

The Company
HQ: San Francisco, California
20 Employees
Year Founded: 2023

What We Do

We enable serverless inference via our GPU orchestration and model load-balancing system. We unlock fine-tuning by letting organizations size their server fleet to their throughput needs rather than to the number of models in their catalogue. See it in action on our public cloud, which offers inference for 10k+ open-weight models.

Similar Jobs

General Motors

Sales Manager

Automotive • Big Data • Information Technology • Robotics • Software • Transportation • Manufacturing
Remote or Hybrid
United States
165000 Employees

General Motors

Designer

Automotive • Big Data • Information Technology • Robotics • Software • Transportation • Manufacturing
Remote or Hybrid
United States
165000 Employees
135K-208K Annually

General Motors

Commercial Zone Manager North Central Region - GM Fleet

Automotive • Big Data • Information Technology • Robotics • Software • Transportation • Manufacturing
Remote or Hybrid
United States
165000 Employees

Rula

Project Manager

Healthtech • Other • Social Impact • Software • Telehealth
Remote
United States
595 Employees
95K-106K Annually

Similar Companies Hiring

Bellagent
Artificial Intelligence • Machine Learning • Business Intelligence • Generative AI
Chicago, IL
20 Employees
Golden Pet Brands
Digital Media • eCommerce • Information Technology • Marketing Tech • Pet • Retail • Social Media
El Segundo, California
178 Employees
Kepler
Fintech • Software
New York, New York
6 Employees