Associate Data Scientist

Posted 7 Days Ago
Hiring Remotely in India
Remote
Junior
Artificial Intelligence • Cloud • Hardware • Software
The Role
Prepare, clean, and validate structured and unstructured data for LLM-driven systems; build training datasets, support RAG and NL→SQL pipelines, perform data quality checks, and assist with data pipelines, APIs, and model evaluation.
Summary Generated by Built In
Role Overview

We are seeking an Associate Data Scientist to support AI/ML engineering efforts by preparing, validating, and structuring data for LLM-driven systems. This is a hands-on role focused on real-world data processing, pipeline support, and model evaluation.

Key Responsibilities
  • Process and clean structured and unstructured data for AI/ML pipelines.

  • Prepare training-ready datasets for LLM fine-tuning and evaluation workflows.

  • Support RAG and NL→SQL systems through data preparation and validation.

  • Perform data quality checks and ensure completeness and consistency.

  • Assist in building and maintaining data pipelines and APIs (e.g., FastAPI).

  • Collaborate with engineering teams to troubleshoot and optimize data workflows.

Required Skills
  • 1–3 years of experience in data processing or data-focused roles.

  • Strong Python skills with experience in data libraries (Pandas, NumPy, Scikit-learn).

  • Experience supporting LLM workflows (fine-tuning, prompt engineering, evaluation).

  • Familiarity with structured (SQL) and unstructured text data.

  • Understanding of data preparation for AI/ML systems.

Nice to Have
  • Exposure to RAG pipelines, embeddings, or evaluation metrics.

  • Experience with ML frameworks (PyTorch/TensorFlow) and Docker-based workflows.

  • Experience with CI/CD pipelines for ML systems.

  • Familiarity with vector databases (e.g., Chroma) and reranking techniques.

  • Research exposure to Transformer-based architectures.

Top Skills

Python, Pandas, NumPy, Scikit-learn, SQL, FastAPI, LLMs

The Company
HQ: San Francisco, California
28 Employees

What We Do

Billions of unstructured engineering and manufacturing data points turned into actionable insights.
The most accurate AI for unstructured product engineering and manufacturing data, with 5x more relevant output than generalized LLMs.

Apiphany connects and transforms your existing unstructured systems, including RFPs, program planning, design specs, functional requirements, test data, purchasing, finance, field reports, and FMEAs, into real-time intelligence.
