User Researcher, AI Evaluations

Posted An Hour Ago
2 Locations
Hybrid
196K-230K Annually
Senior level
Artificial Intelligence • Productivity • Software
Notion is the AI workspace where teams and AI agents get more done together.
The Role
Lead UX research to define and scale evaluation of Notion's AI experiences. Create reusable rubrics and measurement approaches, run longitudinal and feature-specific studies, identify failure modes and recovery behaviors, and operationalize evaluation with product, design, engineering, and data science partners to improve model output quality and end-to-end user experience.
Summary Generated by Built In
Who We Are

Notion is the collaborative AI workspace where teams and agents think together. We're building one place where your knowledge, projects, meetings, and AI tools live side by side, so work is faster, clearer, and less fragmented. Millions of individuals, small teams, and large companies run their work on Notion.

Notinos (our employees) are customer zero in bringing this future of work to life. We care about craft, building things that last, and the belief that great work is still fundamentally human. Our goal isn’t to ship the next feature. Each and every team of Notinos is working to set the standard for how humans work together in the AI era. From building a business’s system of record to making and managing AI agents to automating away the busy work, we care deeply about giving our customers more time for their life’s work.

About the Role:

We’re seeking an experienced UX Researcher to define and scale how we evaluate Notion’s AI-powered experiences—focusing on what “good” looks like not only for model output quality, but for the end-to-end product experience where people discover, set goals, delegate work, review results, and build trust over time with AI.

This role sits at the intersection of research craft and evaluation operations: you’ll run studies that uncover user mental models, expectations, and failure/recovery behaviors, then translate those insights into reusable rubrics, workflows, and measurement approaches that product, design, engineering, and data science can apply consistently.

This role can be based in either San Francisco or New York City. We work from our offices on Mondays, Tuesdays and Thursdays (our Anchor Days) because we do our best thinking and building together in person. We’re looking for someone who’s excited to work alongside the team during those days.

What You'll Achieve:
  • Define what “good” looks like (frameworks & rubrics): Establish clear, reusable evaluation criteria that reflect real user expectations—helpfulness, trust, tone, control, and transparency. You’ll translate qualitative insight into scoring guidance that can be applied consistently across teams and over time.

  • Run recurring evals (longitudinal & feature-specific): Measure experience quality over time against defined rubrics through recurring longitudinal and feature-specific surveys and studies. Lead qualitative studies, side-by-side comparisons, and human-in-the-loop evaluation efforts to deepen understanding of where experiences break down and how they can improve. You’ll help teams spot regressions, benchmark improvements, and understand when expectations shift.

  • Anchor evaluation in real workflows (context > isolated feedback): Ensure evals reflect jobs-to-be-done, user intent, and the full interaction journey (goal setting, delegation, review, iteration), not just decontextualized thumbs up/down. You’ll help teams understand who is evaluating, what they’re trying to do, and why outputs succeed or fail.

  • Identify failure modes & recovery behavior (guardrails): Uncover breakdowns, regressions, and edge cases across the system—from model behavior to UI and integrations—and study how people notice issues, correct them, and continue their work. You’ll turn these insights into actionable guidance for guardrails, fixes, and prioritization.

  • Operationalize evaluation with partners (process & tooling): Collaborate closely with Product, Design, Engineering, and Data Science to align on target use cases and build scalable evaluation loops (human-in-the-loop review, longitudinal studies, and calibration of automated/LLM-judge approaches against human judgment).

Skills You'll Need to Bring:
  • Ability to operationalize insight into measurement: You’re comfortable turning “soft” user expectations (trust, tone, usefulness, clarity) into concrete rubrics, scoring guidelines, and observable metrics.

  • AI fluency and systems thinking: You’re curious and hands-on with AI products, and can reason about how model behavior, uncertainty, and system constraints shape user experience. You also have experience evaluating AI-enabled products (LLMs, agents, generative UI/workflow automation) and working with Data Science/ML partners on measurement strategy and evaluation tooling.

  • Clear communication and impact orientation: You can align diverse partners around shared definitions of quality and create artifacts that enable teams to act consistently. You tailor storytelling to different audiences, connect research to business outcomes, and drive follow-through so insights translate into product change.

  • Strong UX research craft (quant + qual): You can choose the right methods for the question (interviews, benchmarking, surveys, experiments) and synthesize findings into actionable guidance.

  • Pragmatism in fast-moving environments: You can prioritize ruthlessly, work through ambiguity, and balance scrappy iteration with deep dives when needed.

  • Experience: 5+ years doing UX research in industry

Nice to Haves:
  • Familiarity with LLM-as-judge methods, prompt design for evaluators, or “golden dataset” creation

  • Experience using AI research tooling for rapid synthesis and communication (e.g., Dovetail, Listen Labs, Maze, Outset), as well as AI observability tooling such as Braintrust

  • Experience using data querying languages (e.g., SQL), scripting languages (e.g., Python), or statistical/mathematical software (e.g., R, SAS, MATLAB)

  • Master’s or PhD in HCI, Psychology, Behavioral Science, Anthropology, Sociology, or a related field

  • You’re familiar with the work of computing heroes like Douglas Engelbart, Alan Kay, Bret Victor, etc. — and understand why we're big fans.


Notion is committed to providing highly competitive cash compensation, equity, and benefits. The compensation offered for this role will be based on multiple factors such as location, the role’s scope and complexity, and the candidate’s experience and expertise, and may vary from the range provided below. For roles based in San Francisco or New York City, the estimated base salary range for this role is $196,000-$230,000 per year.

By clicking “Submit Application”, I understand and agree that Notion and its affiliates and subsidiaries will collect and process my information in accordance with Notion’s Global Recruiting Privacy Policy and NYLL 144.


A Note on AI

You don’t need deep AI expertise for every role, but we do expect every Notino to be intellectually curious, drawn to tinkering and discovery, and excited to use AI as a real collaborator in their work. For some roles, AI fluency is a core requirement; when that’s the case, we'll say so explicitly in the qualifications. People who thrive here don’t treat AI as a novelty. They use it to think better and to make their work easier for others to build on.

Equal Opportunity & Accommodations

We hire talented people from a wide range of backgrounds. If you’re excited about this role but don’t meet every bullet, we still encourage you to apply. Notion is an equal opportunity employer and does not discriminate on the basis of any legally protected characteristic. Consistent with applicable law, we will consider for employment qualified applicants with arrest and conviction records. Notion provides reasonable accommodations during the application process; if you need one, please let your recruiter know.

Notion is proud to be an equal opportunity employer. We do not discriminate in hiring or any employment decision based on race, color, religion, national origin, age, sex (including pregnancy, childbirth, or related medical conditions), marital status, ancestry, physical or mental disability, genetic information, veteran status, gender identity or expression, sexual orientation, or other applicable legally protected characteristic. Notion considers qualified applicants with criminal histories, consistent with applicable federal, state and local law. Notion is also committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If you need assistance or an accommodation due to a disability, please let your recruiter know.

Skills Required

  • 5+ years doing UX research in industry
  • Ability to operationalize qualitative insight into concrete measurement (rubrics, scoring guidelines, observable metrics)
  • Experience evaluating AI-enabled products (LLMs, agents) and partnering with Data Science/ML on measurement strategy
  • Strong UX research craft across qualitative and quantitative methods (interviews, surveys, benchmarking, experiments)
  • Clear communication, stakeholder alignment, and impact orientation
  • Pragmatism and ability to prioritize in fast-moving, ambiguous environments
  • Work from Notion offices in San Francisco or New York City on Anchor Days (Mondays, Tuesdays, Thursdays)

Nice to Haves

  • Familiarity with LLM-as-judge methods, prompt design for evaluators, or golden dataset creation
  • Experience using AI research tooling (e.g., Dovetail, Listen Labs, Maze, Outset) and AI observability tools (e.g., Braintrust)
  • Experience with data querying languages (e.g., SQL), scripting languages (e.g., Python), or statistical software (e.g., R, SAS, MATLAB)
  • Master's or PhD in HCI, Psychology, Behavioral Science, Anthropology, Sociology, or a related field

Notion Compensation & Benefits Highlights

  • Healthcare Strength: Coverage is described as comprehensive for employees and dependents across medical, dental, and vision, with mental‑health support and EAP included. Some materials indicate fully covered premiums in the U.S., reinforcing strong affordability.
  • Parental & Family Support: Paid parental leave is provided for biological, adoptive, and foster parents, and employer‑sponsored fertility benefits support treatments and family‑forming services. This breadth signals meaningful support for various paths to parenthood.
  • Equity Value & Accessibility: Compensation includes equity, and a recent liquidity event enabled employees to sell a portion of their shares at a stated valuation. These opportunities increase the practicality of realizing value from stock alongside cash pay.

The Company
HQ: San Francisco, CA
1,000 Employees
Year Founded: 2016

What We Do

Notion blends your everyday work tools into one. Product roadmap? Company wiki? Meeting notes? With Notion, they're all in one place, and totally customizable to meet the needs of any workflow. It's the all-in-one workspace for you, your team, and your whole company. Mission: We humans are toolmakers by nature, but most of us can't build or modify the software we use every day — arguably our most powerful tool. Here at Notion, we're on a mission to make it possible for everyone to shape the tools that shape their lives.

Why Work With Us

Here at Notion, our work shapes our culture and our culture inspires our work. We seek to hire creative toolmakers who want to be the best in their craft. If every employee is able to focus on being the best toolmaker in their craft, we'll be able to achieve our mission of enabling the world to better solve its problems.

Notion Offices

Hybrid Workspace

Employees engage in a combination of remote and on-site work.

Employees work in-person at our offices on Mondays, Tuesdays and Thursdays. The other two days are flexible.

Typical time on-site: 3 days a week
  • HQ: San Francisco, CA
  • Dublin, Dublin
  • Hanyang, KR
  • Hyderabad, Hyderabad
  • New York, NY
  • Tokyo, Tokyo
