This isn’t a typical DS role focused on optimizing a mature funnel. As the first Data Scientist at Guild.ai, you’ll establish the company’s “truth layer”—from product instrumentation and decision metrics to evaluation frameworks for autonomous, event-driven AI systems.
We’re tackling one of the hardest—and most important—problems in software engineering: helping developers understand, evolve, and operate complex systems using autonomous and event-driven AI. Your work will ensure we ship the right things, know whether they’re working, and continuously improve quality, reliability, and user trust.
If you thrive in ambiguity, love turning messy signals into crisp insight, and want to build the measurement culture for a 0→1 product with real technical depth, this role is for you.
What You Will Do
Define What “Good” Means: Partner with founders, engineering, and design to define product KPIs and quality metrics—especially around AI behaviors (helpfulness, correctness, reliability, latency, cost, user trust).
Build the Measurement Foundation: Establish event taxonomy, instrumentation standards, and core datasets. Ensure we can answer product questions quickly and confidently.
Create AI Evaluation & Monitoring Systems: Develop offline/online evaluation approaches for agentic workflows (e.g., golden sets, human review loops, heuristic + model-based scoring, regression detection, error taxonomies).
Run Experiments That Change Decisions: Design and analyze A/B tests and quasi-experiments; bring statistical rigor to iteration speed.
Turn Insight into Action: Produce analyses, narratives, and recommendations that directly shape roadmap tradeoffs and product direction.
Enable Self-Serve Analytics: Build dashboards and lightweight tooling that help the entire team understand usage, performance, and customer outcomes.
Be a Cross-Functional Glue Layer: Work tightly with engineering on logging/telemetry, with PM on prioritization, and with GTM on customer conversations to connect product behavior to real-world impact.
Define Data Science at Guild.ai: Establish best practices for metrics, experimentation, and decision-making frameworks that scale as the team grows.
What You Bring
Strong foundations in statistics, experimentation, and causal reasoning, with a track record of driving product decisions through data
Fluency in SQL and Python, and comfort working across the data stack (from raw events to analysis-ready datasets)
Experience building analytics and measurement systems in a fast-moving environment (startup and/or high-ownership teams)
Ability to translate ambiguous questions into well-scoped analyses and clear recommendations
High judgment and crisp communication—especially when data is incomplete or messy
A founder’s mentality: comfortable building from scratch, prioritizing ruthlessly, and owning outcomes end-to-end
Nice to Have
Experience evaluating or monitoring LLMs / agentic systems (quality measurement, human-in-the-loop evals, regression testing, safety/reliability metrics)
Familiarity with developer tools, infrastructure, observability, or Git-based workflows
Comfort with modern data tooling (warehouses, dbt, orchestration, BI) and event-driven architectures
Experience establishing experimentation and analytics culture at an early-stage startup
What We Offer
Significant equity in an early-stage, venture-backed startup
Comprehensive Health Benefits (Medical, Dental, Vision)
Flexible PTO to ensure you have the time you need to recharge
Thank you for your interest—we can’t wait to meet you.
What We Do
Guild turns agents into shared production infrastructure, with a managed software center for trusted agent capabilities and an agent hub for discovering and sharing agents.

For Enterprises: AI, Trusted in Production. Autonomous software requires the same guardrails as any production system. Guild enforces centralized identity, least-privilege access, and immutable audit logging, so enterprise governance extends to AI agents. Agents can act on code, tickets, and operational workflows without bypassing identity controls or becoming a black box.

For Developers: AI, Built Like Real Software. Guild gives developers the primitives they expect: typed interfaces, versioned releases, safe execution boundaries, and full execution traces, so agents behave like systems, not scripts. The Agent Hub is a public, GitHub-like platform for broad discovery and reuse of agents, letting developers build agents like real software and ship them as products.

One Platform, Any Model. Guild is universal by design: neutral toward models, vendors, and frameworks, it doesn't lock governance into a single stack and works with Anthropic, OpenAI, Google, and open-source models. Companies can run agents via chat, APIs, webhooks, and schedules, and publish trusted capabilities to version, reuse, and improve, so teams don't start from zero. Access can be controlled centrally, and usage tracked by workspace, user, agent, and trigger.