This isn’t a typical DS role focused on optimizing a mature funnel. As the first Data Scientist at Guild.ai, you’ll establish the company’s “truth layer”—from product instrumentation and decision metrics to evaluation frameworks for autonomous, event-driven AI systems.
We’re tackling one of the hardest—and most important—problems in software engineering: helping developers understand, evolve, and operate complex systems using autonomous and event-driven AI. Your work will ensure we ship the right things, know whether they’re working, and continuously improve quality, reliability, and user trust.
If you thrive in ambiguity, love turning messy signals into crisp insight, and want to build the measurement culture for a 0→1 product with real technical depth, this role is for you.
What You Will Do
Define What “Good” Means: Partner with founders, engineering, and design to define product KPIs and quality metrics—especially around AI behaviors (helpfulness, correctness, reliability, latency, cost, user trust).
Build the Measurement Foundation: Establish event taxonomy, instrumentation standards, and core datasets. Ensure we can answer product questions quickly and confidently.
Create AI Evaluation & Monitoring Systems: Develop offline/online evaluation approaches for agentic workflows (e.g., golden sets, human review loops, heuristic + model-based scoring, regression detection, error taxonomies).
Run Experiments That Change Decisions: Design and analyze A/B tests and quasi-experiments; bring statistical rigor to iteration speed.
Turn Insight into Action: Produce analyses, narratives, and recommendations that directly shape roadmap tradeoffs and product direction.
Enable Self-Serve Analytics: Build dashboards and lightweight tooling that help the entire team understand usage, performance, and customer outcomes.
Be a Cross-Functional Glue Layer: Work tightly with engineering on logging/telemetry, with PM on prioritization, and with GTM on customer conversations to connect product behavior to real-world impact.
Define Data Science at Guild.ai: Establish best practices for metrics, experimentation, and decision-making frameworks that scale as the team grows.
What You Bring
Strong foundations in statistics, experimentation, and causal reasoning, with a track record of driving product decisions through data
Fluency in SQL and Python, and comfort working across the data stack (from raw events to analysis-ready datasets)
Experience building analytics and measurement systems in a fast-moving environment (startup and/or high-ownership teams)
Ability to translate ambiguous questions into well-scoped analyses and clear recommendations
High judgment and crisp communication—especially when data is incomplete or messy
A founder’s mentality: comfortable building from scratch, prioritizing ruthlessly, and owning outcomes end-to-end
Nice to Have
Experience evaluating or monitoring LLMs / agentic systems (quality measurement, human-in-the-loop evals, regression testing, safety/reliability metrics)
Familiarity with developer tools, infrastructure, observability, or Git-based workflows
Comfort with modern data tooling (warehouses, dbt, orchestration, BI) and event-driven architectures
Experience establishing experimentation and analytics culture at an early-stage startup
What We Offer
Significant equity in an early-stage, venture-backed startup
Comprehensive Health Benefits (Medical, Dental, Vision)
Flexible PTO to ensure you have the time you need to recharge
Thank you for your interest—we can’t wait to meet you.
What We Do
Guild turns agents into shared production infrastructure, with a managed software center for trusted agent capabilities and an agent hub for discovering and sharing agents.

For Enterprises: AI, Trusted in Production. Autonomous software requires the same guardrails as any production system. Guild enforces centralized identity, least-privilege access, and immutable audit logging, so enterprise governance extends to AI agents. Agents can act on code, tickets, and operational workflows without bypassing identity controls or becoming a black box.

For Developers: AI, Built Like Real Software. Guild gives developers the primitives they expect: typed interfaces, versioned releases, safe execution boundaries, and full execution traces, so agents behave like systems, not scripts. The Agent Hub is a public, GitHub-like platform for broad discovery and reuse of agents, letting developers build agents like real software and ship them as products.

One Platform, Any Model. Guild is universal by design: neutral toward models, vendors, and frameworks, it doesn't lock governance into a single stack and works with Anthropic, OpenAI, Google, and open-source models. Companies can run agents via chat, APIs, webhooks, and schedules, and publish trusted capabilities to version, reuse, and improve, so teams don't start from zero. Access can be controlled centrally, and usage tracked by workspace, user, agent, and trigger.