Data Scientist

Sorry, this job was removed at 12:08 a.m. (CST) on Wednesday, Apr 15, 2026
San Francisco, CA, USA
Hybrid
Artificial Intelligence • Software • Automation
The Role
The Opportunity: Build the Data & Evaluation Backbone for AI-Native Developer Workflows

This isn’t a typical DS role focused on optimizing a mature funnel. As the first Data Scientist at Guild.ai, you’ll establish the company’s “truth layer”—from product instrumentation and decision metrics to evaluation frameworks for autonomous, event-driven AI systems.

We’re tackling one of the hardest—and most important—problems in software engineering: helping developers understand, evolve, and operate complex systems using autonomous and event-driven AI. Your work will ensure we ship the right things, know whether they’re working, and continuously improve quality, reliability, and user trust.

If you thrive in ambiguity, love turning messy signals into crisp insight, and want to build the measurement culture for a 0→1 product with real technical depth, this role is for you.

What You Will Do
  • Define What “Good” Means: Partner with founders, engineering, and design to define product KPIs and quality metrics—especially around AI behaviors (helpfulness, correctness, reliability, latency, cost, user trust).

  • Build the Measurement Foundation: Establish event taxonomy, instrumentation standards, and core datasets. Ensure we can answer product questions quickly and confidently.

  • Create AI Evaluation & Monitoring Systems: Develop offline/online evaluation approaches for agentic workflows (e.g., golden sets, human review loops, heuristic + model-based scoring, regression detection, error taxonomies).

  • Run Experiments That Change Decisions: Design and analyze A/B tests and quasi-experiments; bring statistical rigor to fast iteration.

  • Turn Insight into Action: Produce analyses, narratives, and recommendations that directly shape roadmap tradeoffs and product direction.

  • Enable Self-Serve Analytics: Build dashboards and lightweight tooling that help the entire team understand usage, performance, and customer outcomes.

  • Be a Cross-Functional Glue Layer: Work tightly with engineering on logging/telemetry, with PM on prioritization, and with GTM/customer conversations to connect product behavior to real-world impact.

  • Define Data Science at Guild.ai: Establish best practices for metrics, experimentation, and decision-making frameworks that scale as the team grows.

What You Will Bring
  • Strong foundations in statistics, experimentation, and causal reasoning, with a track record of driving product decisions through data

  • Fluency in SQL and Python, and comfort working across the data stack (from raw events to analysis-ready datasets)

  • Experience building analytics and measurement systems in a fast-moving environment (startup and/or high-ownership teams)

  • Ability to translate ambiguous questions into well-scoped analyses and clear recommendations

  • High judgment and crisp communication—especially when data is incomplete or messy

  • A founder’s mentality: comfortable building from scratch, prioritizing ruthlessly, and owning outcomes end-to-end

Bonus Points
  • Experience evaluating or monitoring LLMs / agentic systems (quality measurement, human-in-the-loop evals, regression testing, safety/reliability metrics)

  • Familiarity with developer tools, infrastructure, observability, or Git-based workflows

  • Comfort with modern data tooling (warehouses, dbt, orchestration, BI) and event-driven architectures

  • Experience establishing experimentation and analytics culture at an early-stage startup

Benefits & Perks
  • Significant equity in an early-stage, venture-backed startup

  • Comprehensive health benefits (medical, dental, vision)

  • Flexible PTO to ensure you have the time you need to recharge

Thank you for your interest—we can’t wait to meet you.

The Company
HQ: San Francisco, California
25 Employees
Year Founded: 2025

What We Do

Guild turns agents into shared production infrastructure, with a managed software center for trusted agent capabilities, and an agent hub for discovering and sharing agents.

For Enterprises. AI, Trusted in Production
Autonomous software requires the same guardrails as any production system. Guild enforces centralized identity, least-privilege access, and immutable audit logging so enterprise governance extends to AI agents. Agents can act on code, tickets, and operational workflows without bypassing identity controls or becoming a black box.

For Developers. AI, Built Like Real Software
Guild gives developers the primitives they expect: typed interfaces, versioned releases, safe execution boundaries, and full execution traces, so agents behave like systems, not scripts. The Agent Hub is a public GitHub-like platform for broad discovery and reuse of agents, allowing developers to build agents like real software and ship them as products.

One Platform. Any Model
Universal by design. Guild is neutral toward models, vendors, and frameworks, doesn’t lock governance into a single stack, and works with Anthropic, OpenAI, Google, and open-source models. Companies can run agents via chat, APIs, webhooks, and schedules, as well as publish trusted capabilities to version, reuse, and improve, so teams don’t start from zero. Access can be controlled centrally, and usage tracked by workspace, user, agent, and trigger.
