Minerva Group, LLC

Data Scientist

New York City, NY, USA

In-Office

200K-225K Annually

Mid level

Agency

The Role

Build and deploy production-scale feature engineering pipelines and predictive models over terabytes of consumer data. Own the end-to-end path from raw data to reliable features and models that autonomous agents can consume, improve income/wealth and propensity models, and help architect the lakehouse and data infrastructure that support agentic systems.

Summary Generated by Built In

About Minerva

Minerva builds AI for marketing leaders. Our platform allows marketers to focus on telling their brand's story, delegating operationally intensive work to our AI agents, which handle data management, analytics, campaign generation, measurement, and reporting.

Everything is built on Minerva's proprietary consumer graph, an identity and attribute layer covering 270M+ U.S. consumers across 1,000+ temporal attributes. We have two agentic systems built through an OpenAI research partnership: an Agentic Data Engineer that unifies and standardizes a brand's first-party data in hours, and an Agentic Data Scientist that trains robust targeting models at scale. Together, these systems enhance the quality of first-party data, increase campaign performance, and give marketing teams back their time.

Our clients include leading consumer brands across categories: the NBA, Ramp, Capital One, Hard Rock Stadium Group / Miami Dolphins, Wander, and Trust & Will. We have raised $20M from The General Partnership, 8VC, Lingotto, NBA Investments, Topology Ventures, Future Positive, Background Capital, and others.

About the Role

As a Data Scientist at Minerva, you build the models and features that power our consumer graph and the agents that run on top of it. You sit at the intersection of heavy data engineering and applied modeling: you architect feature engineering pipelines that run over terabytes of data, train and sharpen the models that drive targeting and prediction, and ensure the outputs are robust enough to be consumed autonomously by our Minerva Agents and our world-class modeled attributes (e.g. income/wealth).

This role deploys to production constantly. The models you build are not handed off to someone else to deploy; you own the path from raw data to a feature or model that an agent can call reliably at scale. As we grow, your work becomes the foundation other systems are built on.

What You'll Do

Create new features for models and agents, expanding the predictive surface area of our consumer data lake and building the pipelines that turn raw signal into trusted attributes.
Improve existing models through rigorous feature engineering, including our income/wealth, home buyer, and home seller models.
Play a pivotal role in the buildout of our world-class data lake, shaping how terabytes of consumer data are stored, transformed, and made queryable for both humans and agents.
Build feature engineering pipelines that run efficiently at terabyte scale, with the data engineering rigor to make them reliable in production. This role is a 70/30 split between data science (DS) and data engineering (DE).
Ensure model and feature outputs are reliable enough to be consumed agentically, writing the validations and guardrails that let our agents act on your work without a human in the loop.

Our Data Stack

Dagster for all things orchestration
dbt-core within Dagster as the primary data transformation surface
Spark, Iceberg, Trino, AWS Glue for Lakehouse workloads
Modal for ML eng
Frontier + OSS models & agent SDKs. We are heavy users of OpenAI/Anthropic batch APIs

Qualifications

2-4+ years working as a data scientist, applied ML-focused data engineer, or software engineer in a data-heavy context. Simply put, you live and breathe data.
Highly proficient in Python and SQL.
You are driven by first-principles thinking and are a go-getter. You reason about what datasets and features are necessary to solve a modeling problem, and are scrappy and clever enough to bring that to life.
Strong intuition for data engineering principles, especially around data cleaning/ingestion and data modeling. We prefer these core skills to be second nature, freeing up your thinking for architecting and executing large-scale data initiatives, especially given the advancement of AI coding tools.
Strong engineering background. You are comfortable deploying complicated production pipelines and working within larger production systems, not just in sandboxed or research environments.
Willingness to work in office in NYC (we provide a relocation package).
Flexibility and openness to wearing several hats. We are lean and things are always changing.
Eagerness to learn and grow with the company and your coworkers.

Preferred

Experience building and training predictive models (e.g. lead scoring, LTV, propensity, lookalike modeling).
Experience with orchestration tools (e.g. Dagster, Airflow, Prefect) and SQL transformation tools (e.g. dbt, SQLMesh).
Experience with both transactional databases (e.g. Postgres, MySQL) and analytical databases (e.g. Snowflake, Redshift), with a bias toward the latter.
Familiarity with a cloud resource provider (e.g. AWS, GCP).
Familiarity with backend and ML/AI engineering.
Experience with AI coding tools (e.g. Cursor, Claude Code, OpenCode) as a force multiplier.
Prior work at an early-stage startup.

You don't need to tick every box. If you're strong on the engineering side and hungry to build models that matter, we want to hear from you.

Compensation

Base salary: $200,000 to $225,000, commensurate with experience. Competitive equity and a marquee benefits package.

Skills Required

2-4+ years working as a data scientist, applied ML-focused data engineer, or software engineer in a data-heavy context
Highly proficient in Python
Highly proficient in SQL
Strong intuition for data engineering principles (data cleaning, ingestion, data modeling)
Strong engineering background; comfortable deploying production pipelines and working within larger production systems
Experience building feature engineering pipelines and models that run reliably at terabyte scale
Ability to write validations and guardrails so models/features can be consumed autonomously by agents
Willingness to work in-office in NYC (relocation package provided)
Flexibility and openness to wearing several hats at an early-stage company
Eagerness to learn and grow with the company
Experience building and training predictive models (lead scoring, LTV, propensity, lookalike)
Experience with orchestration tools (Dagster, Airflow, Prefect) and SQL transformation tools (dbt, SQLMesh)
Experience with transactional and analytical databases (Postgres, MySQL, Snowflake, Redshift)
Familiarity with cloud providers (AWS, GCP)
Familiarity with backend and ML/AI engineering and ML deployment tooling (Modal, agent SDKs, OpenAI/Anthropic APIs)
Experience with AI coding tools (Cursor, Claude Code, OpenCode) and prior early-stage startup experience

The Company

HQ: Denver, CO

5 Employees

Year Founded: 2015

What We Do

Our team provides virtual CTOs that specialize in creating both back-end and front-end applications that scale. We have worked on a wide range of enterprise applications, for companies looking to go from one thousand customers to one million or to get their technology to a state where the company can be funded.