Machine Learning Engineer

Reposted 3 Days Ago
Navi Mumbai, Thane, Maharashtra, IND
Hybrid
Mid level
Artificial Intelligence • Big Data • Enterprise Web • Fintech • Software • Financial Services
Empowering Investor Success
The Role
As a Machine Learning Engineer, you will design and deploy ML-driven data pipelines, focusing on extracting and enriching unstructured data and improving data quality through collaboration with cross-functional teams.
Summary Generated by Built In
As a Machine Learning Engineer (MLE) on the AI & ML (Data Collection & Enrichment) team, you will play a critical role in building intelligent systems that acquire, process, and enrich PitchBook's structured and unstructured data at scale. Your work will directly impact the quality, coverage, and usability of the data that powers downstream analytics, insights, and customer-facing features.
This role requires deep expertise in machine learning, data engineering, and natural language processing (NLP), with a strong emphasis on extracting, structuring, and augmenting data from diverse sources such as reports, filings, news, and web content.
You will design and deploy ML-driven pipelines for entity extraction, entity resolution, classification, and data augmentation, leveraging techniques from NLP, large language models (LLMs), and generative AI. You will be responsible for the full lifecycle of these systems, from data ingestion and model development to deployment, monitoring, and continuous improvement.
Your contributions will ensure that PitchBook maintains high-quality, comprehensive, and timely datasets by transforming raw information into structured, enriched, and reliable data assets.
You will be part of a team of machine learning engineers focused on building scalable systems for data acquisition, extraction, normalization, and enrichment. The team enables high-quality datasets that power critical features across the PitchBook Platform.
You will collaborate closely with data collection teams, platform engineers, and product stakeholders to ensure that data pipelines are robust, efficient, and aligned with business priorities.
Primary Job Responsibilities:
  • Design and build ML-driven data pipelines that ingest and process structured and unstructured data from multiple sources.
  • Develop models for information extraction, named entity recognition (NER), entity resolution, classification, and data normalization.
  • Apply NLP, transformer models, and LLMs to extract and enrich data from documents such as reports, filings, and news articles.
  • Build systems that improve data coverage, accuracy, freshness, and consistency across datasets.
  • Integrate ML models into scalable production systems with strong reliability, latency, and throughput guarantees.
  • Collaborate with data collection and curation teams to incorporate human-in-the-loop feedback and improve model performance.
  • Design evaluation frameworks and metrics for data quality, extraction accuracy, and enrichment effectiveness.
  • Optimize pipelines for large-scale processing using distributed systems and streaming technologies.
  • Contribute to architecture decisions for data infrastructure, ensuring scalability and maintainability.
  • Stay current with advancements in NLP, GenAI, and information extraction, and translate research into production-ready systems.
  • Ensure best practices in monitoring, observability, data governance, and responsible AI usage.
  • Mentor junior engineers and contribute to a culture of technical excellence through reviews and knowledge sharing.

Skills & Qualifications:
  • Bachelor's degree (or higher) in Computer Science, Data Science, Mathematics, or a related field.
  • 2+ years of experience in ML engineering, data engineering, or applied AI roles focused on data extraction, enrichment, or processing pipelines.
  • Strong experience in NLP, including NER, parsing, classification, and transformer-based models.
  • Hands-on experience with LLMs / GenAI for structured data extraction, augmentation, or labeling workflows.
  • Experience building data pipelines and distributed systems (e.g., Kafka, Airflow, Spark, Snowflake) is preferred.
  • Proficiency in Python and SQL, with experience using ML frameworks such as PyTorch, TensorFlow, or scikit-learn.
  • Experience deploying ML systems in production, including monitoring and iteration loops, is preferred.
  • Familiarity with the LangChain ecosystem (LangSmith, LangGraph) or similar orchestration tools is a plus.
  • Experience with entity resolution, knowledge graphs, or data deduplication systems is desirable.
  • Strong problem-solving skills and ability to work on ambiguous data challenges.
  • Experience collaborating cross-functionally with engineering, product, and data teams.
  • Prior exposure to financial datasets or fintech ecosystems is a plus.
  • Research experience or publications in NLP/ML conferences (e.g., ACL, EMNLP, NeurIPS) is a strong plus.

Working Conditions
This position is based in a standard office setting. Employees in this position use a PC and phone throughout the day. Limited corporate travel to remote offices, business meetings, or events may be required.
Morningstar's hybrid work environment gives you the opportunity to collaborate in person each week, as we've found that we're at our best when we're purposely together on a regular basis. In most of our locations, our hybrid work model is four days in-office each week. A range of other benefits is also available to enhance flexibility as needs change. No matter where you are, you'll have the tools and resources to engage meaningfully with your global colleagues.
Legal entity: Morningstar India Private Ltd. (Delhi)

Morningstar Compensation & Benefits Highlights

  • Leave & Time Off Breadth: Time-away options include a paid six‑week sabbatical every four years and flexible time off in North America, with broad usage reported in 2024. Feedback suggests these programs are a distinctive strength by market standards.
  • Parental & Family Support: Starting in 2025, a global minimum provides at least 16 weeks of paid leave for primary caregivers and up to 8 weeks for secondary caregivers, plus at least 6 weeks of paid caregiving leave. This breadth makes family support a clear pillar of the package.
  • Equity Value & Accessibility: An optional Shared Ownership program lets employees direct part of their bonus into RSUs with a 50% company match. This structure adds a notable long‑term ownership component to total rewards.

The Company
HQ: Chicago, IL
11,500 Employees
Year Founded: 1984

What We Do

At Morningstar, we believe in building great products in-house in a highly collaborative, agile environment where we focus on technical excellence, the user experience, and continuous improvement. Our technologists represent a range of skills and experience levels, but they all view their work as a craft and push technology’s boundaries.

Why Work With Us

Imagining big things is in our blood. It has transformed us from a company with just a few employees in 1984 into a leading independent investment research company with a worldwide presence today. In April 2020, we acquired Sustainalytics to drive meaningful long-term outcomes for investors in the ESG space. Join us on this exciting journey!

Morningstar Offices

Hybrid Workspace

Employees engage in a combination of remote and on-site work.

Typical time on-site: 3 days a week
Global Headquarters (HQ)
Mexico City
Santiago Province
LU
NSW
Amsterdam, NL
Bangkok, TH
Cape Town, ZA
Dubai, Dubai
Frankfurt am Main, DE
Frederiksberg, DK
London, GB
Madrid, ES
Milano, IT
Navi Mumbai, Maharashtra
New York, NY
Oakland, MD
Oslo, NO
Paris, FR
São Paulo, São Paulo
PitchBook US Headquarters
Stockholm, SE
Tokyo, JP
Toronto, ON
Toronto, Ontario
Zürich, CH