Embedded Data Scientist, Chanakya

Posted Yesterday
2 Locations
In-Office
Mid level
Artificial Intelligence • Software
The Role
Deploy alongside clients to transform heterogeneous multimodal data into semantic structures for AI reasoning. Design ontologies, segmentation, embedding and retrieval strategies, collaborate on ingestion pipelines, evaluate retrieval and reasoning performance, and translate insights into product and engineering signals.
Summary Generated by Built In
About Sarvam

Sarvam is building the bedrock of Sovereign AI for India. The company is developing India's full-stack sovereign AI platform, building across research, models, infrastructure and applications with a singular focus on making AI genuinely work for India. Sarvam works with leading enterprises and public institutions and is backed by Lightspeed, Peak XV, and Khosla Ventures. Sarvam partners with India's leading brands, including Tata Capital, SBI Life, CRED, IDFC, and LIC.

About the Role

Embedded Data Scientists transform complex client data into structures that AI systems can reliably reason over. You are deployed alongside Strategic Deployment Engineers at client sites, working directly with client data environments to understand, structure, and operationalise large-scale datasets.

This means working with heterogeneous, multimodal data — including documents, images, audio, geospatial data, and structured records — and designing the semantic structures that allow AI systems to interpret and reason over that data.

You will define how data is represented inside the AI system: how documents are segmented, how metadata is defined, how entities and relationships are represented, and how different data modalities connect. You will design ontologies, tagging systems, and knowledge graph structures that allow the reasoning engine to operate effectively.

You will often work with classified or operationally sensitive datasets in environments where standard tooling may not exist. You will own the quality of the data layer in your assigned accounts, ensuring the system is built on a foundation that enables reliable reasoning at scale.

What You'll Do
  • Understand the client's data landscape across documents, imagery, audio, geospatial data, and structured records — including data sources, formats, workflows, and domain terminology

  • Design domain ontologies representing entities, relationships, hierarchies, and operational concepts within the client's data environment

  • Define document segmentation and chunking strategies that preserve semantic meaning and support effective retrieval

  • Work with heterogeneous datasets and define how different modalities should be indexed, embedded, and linked

  • Collaborate with Strategic Deployment Engineers to translate semantic structures into operational data ingestion pipelines

  • Evaluate how well the AI system retrieves and reasons over client data, and refine structures to improve performance

  • Collaborate with the models team and other teams to define benchmarks and evaluation criteria that reflect real-world deployment conditions

  • Translate insights from client data environments into structured signals for product and engineering teams

What We're Looking For

Hard Skills & Experience
  • 2–5 years in data science, applied machine learning, or large-scale data analysis roles

  • Strong Python skills including pandas, NumPy, and modern NLP or LLM tooling

  • Solid grounding in ML fundamentals — enough to understand model behaviour, contribute to evaluation design, and collaborate with a models team on training and benchmarking

  • Experience working with large unstructured datasets including documents, transcripts, reports, or operational records

  • Familiarity with LLM-based systems, retrieval pipelines, or vector search systems

  • Experience designing or working with data schemas, metadata frameworks, entity models, or semantic data structures

Signals We Look For
  • You've worked with real-world, messy, unstructured data and built something rigorous from it

  • You are comfortable designing structure where none exists — defining schemas, ontologies, and metadata frameworks from scratch

  • You can translate complex data insights into explanations that engineers and client stakeholders can act on

Who You Are
  • You are comfortable operating with autonomy in client environments; you don't need a data team around you to do rigorous work

  • You move fluently between domain understanding, data modelling, and AI system design

  • You move between the technical and the operational: you understand what the data means in the context of what operators actually do with it

Bonus Points
  • Familiarity with knowledge graphs, ontologies, or semantic data modelling

  • Experience with multimodal datasets (text, imagery, audio, geospatial, or structured data)

  • Experience operating in constrained or air-gapped environments

Why Sarvam?

Sarvam is a fast-moving, high talent-density team building full-stack AI for India, working on problems that push the frontiers of AI with real population-scale impact.

  • Work alongside researchers, engineers, builders, and business leaders who move fast and hold each other to a very high bar

  • High ownership and high impact, from day one

  • Everything we do is AI-first, from the way we build and ship to the way we think about problems

  • You can work on problems that could change how an entire country learns, works, and communicates

If you want to work on problems at the frontier of AI in India, Sarvam is the place to be.

Skills Required

  • 2-5 years in data science, applied machine learning, or large-scale data analysis roles
  • Strong Python skills including pandas and NumPy
  • Experience with modern NLP or LLM tooling
  • Solid grounding in ML fundamentals to understand model behaviour and contribute to evaluation design
  • Experience working with large unstructured datasets (documents, transcripts, reports, operational records)
  • Familiarity with LLM-based systems, retrieval pipelines, or vector search systems
  • Experience designing or working with data schemas, metadata frameworks, entity models, or semantic data structures
  • Familiarity with knowledge graphs, ontologies, or semantic data modelling
  • Experience with multimodal datasets (text, imagery, audio, geospatial, structured)
  • Experience operating in constrained or air-gapped environments

The Company
HQ: Bangalore, Karnataka
50 Employees
Year Founded: 2023

What We Do

We are an AI/ML research and development company on a mission to build reliable, performant, enterprise-grade AI systems at scale for India. We are committed to building the full stack for generative AI for the rich and diverse landscape of India, investing mainly in three areas: 1) Models: developing both efficient large-scale Indic language models and bespoke enterprise models; 2) Platform: building an enterprise-grade platform that empowers organisations to develop and ship creative and performant genAI applications at scale; 3) Ecosystem: contributing to open-source models and datasets, and leading efforts for large-scale data curation in the public-good space.
