Senior Machine Learning Engineer

Posted Yesterday
Navi Mumbai, Thane, Maharashtra, IND
Hybrid
Senior level
Artificial Intelligence • Big Data • Enterprise Web • Fintech • Software • Financial Services
Empowering Investor Success
The Role
The Senior Machine Learning Engineer will lead the design, development, and optimization of ML/NLP solutions for data extraction, enhance system reliability, and collaborate with cross-functional teams to improve product impact and performance.
Summary Generated by Built In
As a Senior Machine Learning Engineer (MLE) on the AI & ML (Data Collection) team, you will play a critical role in delivering AI-powered systems that extract meaningful data from PitchBook's wealth of structured and unstructured data, including reports, news, and other textual content. This role requires deep technical expertise in advanced data analytics and machine learning, as well as a hands-on approach to designing, building, and optimizing ML solutions that empower the PitchBook Platform.
You will be deeply involved in the end-to-end development and operationalization of ML models, including their architecture, training, deployment, and ongoing maintenance. Your focus will span natural language processing (NLP), generative AI (GenAI), large language models (LLMs), and scalable data systems. You will be expected to tackle complex technical challenges, contribute to architectural decisions, and collaborate closely with cross-functional stakeholders, including product managers, engineers, and domain experts, to translate requirements into scalable AI/ML solutions. Strong communication, collaboration, and a focus on delivering reliable systems will be key to your success in this role.
Your contributions will help unlock value for PitchBook customers by improving the accuracy, scalability, and efficiency of data extraction, enabling faster and more reliable access to structured information across the platform. This includes developing models that can infer meaning and structure from millions of discrete data sources, and applying ML to enrich our datasets with predictive and generative intelligence. As a senior engineer, you will take ownership of key technical components and ensure that our systems meet the highest standards of performance, reliability, and security.
You will be expected to contribute to the team's technical excellence by providing guidance through code and design reviews, sharing best practices, and supporting peers when needed. While this role does not involve direct people management, you will lead by example through strong ownership and high-quality execution. You will actively contribute to solving complex technical problems and ensure your work aligns with broader product and business objectives.
In addition to driving product impact, you will have opportunities to deepen your expertise and contribute to the broader AI/ML community through experimentation, technical contributions, or knowledge sharing. A strong interest in advancing capabilities in areas such as generative AI, agentic AI, LLMs, and applied NLP will help you continuously improve systems and deliver meaningful impact.
You will be part of a multidisciplinary team of ML engineers and data scientists responsible for building AI & ML solutions and services within robust data collection pipelines that handle large volumes of unstructured data. The team will focus on building scalable and reliable systems to process and categorize data that is essential for downstream data collection.
Primary Job Responsibilities:
  • AI & ML Extraction Contribution: Build and deliver high-impact AI/ML solutions focused on extracting structured data from unstructured sources. Ensure outputs improve data quality, coverage, and reliability across data collection pipelines.
  • Technical Execution: Design, develop, and deploy ML/NLP/LLM-based extraction systems. Contribute to building scalable, efficient, production-grade services with a strong focus on accuracy, latency, cost, and robustness.
  • Extraction System Development: Develop and optimize extraction workflows using techniques such as document parsing, chunking, embeddings, RAG, and LLM-based extraction methods.
  • Evaluation & Quality Improvement: Define and implement evaluation frameworks (precision, recall, F1, field-level accuracy) and continuously improve extraction performance through iterative experimentation.
  • Data Pipeline Contribution: Work on high-throughput data collection pipelines, ensuring seamless integration of extraction components with upstream and downstream systems.
  • MLOps & Reliability: Contribute to model deployment, monitoring, logging, and CI/CD pipelines. Ensure models are observable and reliable in production environments.
  • Collaboration & Stakeholder Alignment: Partner with Product, Data Collection Engineering, and Platform teams to translate requirements into scalable extraction solutions aligned with business goals.
  • Code Quality & Knowledge Sharing: Maintain high standards of code quality, participate in design and code reviews, and share knowledge to improve overall team capability.
  • Innovation & Continuous Improvement: Explore and apply advancements in NLP, LLMs, and extraction techniques to improve system performance, scalability, and cost efficiency.
  • Process & Delivery Efficiency: Contribute to efficient development cycles by following Agile practices and continuously improving workflows and automation.
  • Hiring & Onboarding Support: Support hiring efforts through technical interviews and help onboard new team members via documentation and knowledge sharing.

Skills & Qualifications:
  • Bachelor's or Master's degree in Computer Science, Data Science, Mathematics, or a related technical field
  • 5+ years of experience in machine learning engineering or data science, with a strong focus on unstructured data processing and information extraction
  • Strong hands-on experience in NLP and extraction-focused ML problems, including classifiers, transformer models, and LLM-based extraction techniques
  • Experience building and deploying production-grade ML/LLM systems for tasks such as document parsing, information extraction, and text processing
  • Proficiency in Python and SQL, with experience using standard ML/data libraries such as scikit-learn, pandas, numpy, and deep learning frameworks like PyTorch or TensorFlow
  • Practical experience with embeddings, retrieval techniques (RAG), and LLM-based workflows for extraction use cases
  • Experience with the LangChain ecosystem (LangChain, LangGraph, LangSmith) for building and orchestrating LLM-based extraction workflows is strongly preferred
  • Experience working with large-scale unstructured datasets (documents, PDFs, text pipelines) including preprocessing, chunking, and feature engineering
  • Familiarity with data pipeline and orchestration tools such as Kafka, Airflow, or similar technologies
  • Experience with cloud platforms (AWS or GCP) and building scalable, production-ready systems
  • Working knowledge of containerization (Docker); exposure to Kubernetes is a plus
  • Understanding of MLOps practices including model deployment, monitoring, evaluation, and iterative improvement
  • Strong problem-solving skills with the ability to debug and improve complex ML systems
  • Effective communication and collaboration skills, with experience working cross-functionally with product, engineering, and data teams
  • Experience working in fast-paced, data-driven environments; prior exposure to financial data or similar domains is a plus

Working Conditions
This position works in a standard office setting. Employees in this position use PCs and phones on an ongoing basis throughout the day. Limited corporate travel to remote offices or other business meetings and events may be required.
Morningstar's hybrid work environment gives you the opportunity to collaborate in person each week, as we've found that we're at our best when we're purposely together on a regular basis. In most of our locations, our hybrid work model is four days in-office each week. A range of other benefits is also available to enhance flexibility as needs change. No matter where you are, you'll have tools and resources to engage meaningfully with your global colleagues.
Legal Entity: Morningstar India Private Ltd. (Delhi)

Top Skills

Airflow
AWS
Docker
GCP
Kafka
Kubernetes
LangChain
LangGraph
LangSmith
NumPy
Pandas
Python
PyTorch
Scikit-Learn
SQL
TensorFlow


The Company
HQ: Chicago, IL
11,500 Employees
Year Founded: 1984

What We Do

At Morningstar, we believe in building great products in-house in a highly collaborative, agile environment where we focus on technical excellence, the user experience, and continuous improvement. Our technologists represent a range of skills and experience levels, but they all view their work as a craft and push technology’s boundaries.

Why Work With Us

Imagining big things is in our blood: it's transformed us from a company with just a few employees in 1984 to a leading independent investment research company with a worldwide presence today. In April 2020, we acquired Sustainalytics to drive meaningful long-term outcomes for investors in the ESG space. Join us on this exciting journey!

Morningstar Offices

Hybrid Workspace

Employees engage in a combination of remote and on-site work.

Typical time on-site: 3 days a week
Global Headquarters (HQ)
Mexico City
Santiago Province
LU
NSW
Amsterdam, NL
Bangkok, TH
Cape Town, ZA
Dubai, Dubai
Frankfurt am Main, DE
Frederiksberg, DK
London, GB
Madrid, ES
Milano, IT
Navi Mumbai, Maharashtra
New York, NY
Oakland, MD
Oslo, NO
Paris, FR
São Paulo, São Paulo
PitchBook US Headquarters
Stockholm, SE
Tokyo, JP
Toronto, ON
Toronto, Ontario
Zürich, CH

Similar Jobs

Morningstar

Senior Machine Learning Engineer

Artificial Intelligence • Big Data • Enterprise Web • Fintech • Software • Financial Services
Hybrid
Navi Mumbai, Thane, Maharashtra, IND
11500 Employees
50K-80K Annually

Morningstar

Accountant

Artificial Intelligence • Big Data • Enterprise Web • Fintech • Software • Financial Services
Hybrid
Navi Mumbai, Thane, Maharashtra, IND
11500 Employees

Morningstar

Salesforce Administrator

Artificial Intelligence • Big Data • Enterprise Web • Fintech • Software • Financial Services
Hybrid
Navi Mumbai, Thane, Maharashtra, IND
11500 Employees
