Sarvam is building the bedrock of Sovereign AI for India. The company is developing India's full-stack sovereign AI platform, building across research, models, infrastructure and applications with a singular focus on making AI genuinely work for India. Sarvam works with leading enterprises and public institutions and is backed by Lightspeed, Peak XV, and Khosla Ventures. Sarvam partners with India's leading brands, including Tata Capital, SBI Life, CRED, IDFC, and LIC.
About the Role
You will work across the full lifecycle of vision-language model (VLM) development: data, training, evaluation, and production. The team's scope will evolve as the field does; we want researchers who are comfortable with that and can lead.
What You'll Do
Research vision-language architectures: encoders, fusion mechanisms, pretraining objectives, and scaling behaviour
Design training methods (pretraining, SFT, RLHF, DPO) adapted for multilingual VLMs
Investigate data strategies — what mixtures, quality signals, and synthetic data approaches actually move the needle
Build evaluation frameworks and benchmarks, especially for Indic multimodal tasks
Study model failure modes, robustness, and interpretability
Work closely with engineers to ensure ideas are testable at scale — prototype fast, then validate properly
Engage with the broader research community through open-source contributions and collaborations
Deep understanding of vision-language models — training dynamics, architecture tradeoffs, and failure modes
Track record of good research — through publications, technical reports, or impactful shipped work
Rigorous experimental design — able to isolate variables and draw defensible conclusions
Strong PyTorch skills, with the ability to run experiments end to end
Intellectual range — willing to work across data, training, and evaluation problems
PhD/Master's with relevant research experience in ML, Computer Vision, NLP, or related field
Research papers published at A/A* venues
Experience with multilingual or low-resource language modelling
Familiarity with document understanding, OCR, or structured visual prediction
Experience with large-scale data curation and its effect on model quality
Sarvam is a fast-moving, high talent-density team building full-stack AI for India, working on problems that push the frontiers of AI with real population-scale impact.
Work alongside researchers, engineers, builders, and business leaders who move fast and hold each other to a very high bar
High ownership and high impact, from day one
Everything we do is AI-first, from the way we build and ship to the way we think about problems
You can work on problems that could change how an entire country learns, works, and communicates
If you want to work on problems at the frontier of AI in India, Sarvam is the place to be.
Skills Required
- Deep understanding of vision-language models, training dynamics, architecture tradeoffs, and failure modes
- Track record of research via publications, technical reports, or impactful shipped work
- Rigorous experimental design skills to isolate variables and draw defensible conclusions
- Strong PyTorch skills with end-to-end experiment execution
- Ability to work across data, training, and evaluation problems for VLMs and productionize research
- PhD or Master's with relevant ML/CV/NLP research experience
- Publications at top-tier (A/A*) venues
- Experience with multilingual or low-resource language modelling
- Familiarity with document understanding, OCR, or structured visual prediction
- Experience with large-scale data curation and its impact on model quality
What We Do
We are an AI/ML research and development company on a mission to build reliable, performant, enterprise-grade AI systems at scale for India. We are committed to building the full generative AI stack for the rich and diverse landscape of India, investing mainly in:
1) Models: developing both efficient large-scale Indic language models and bespoke enterprise models
2) Platform: building an enterprise-grade platform that empowers organisations to develop and ship creative, performant genAI applications at scale
3) Ecosystem: contributing to open-source models and datasets, and leading large-scale data curation efforts in the public-good space