We are on a mission to reinvent how designers work in the AI era. We’re backed by top investors including First Round, Chemistry, Homebrew, Scribble and senior leaders from OpenAI, Meta, Google, Ramp, Stripe and more. We’re building the next-generation AI design tool for product teams.
About the Role
We're hiring an AI Scientist to decide what AI we need and how to build it: when to fine-tune, distill, route, prompt, or just call an API. You'll own our AI architecture, evaluation, retrieval, and agent design, and run the experiments that tell us what's actually true for our product. Your core job: stop us from spending six months on the wrong thing.
The frontier moves weekly. You'll read the papers, benchmark what matters, and tell us 3 to 6 months early when something changes our roadmap, and when something we're worried about is just hype.
What You'll Do
AI architecture and the reasoning behind it: fine-tune vs. RAG vs. prompt vs. frontier API, SLM vs. large model, in-house vs. vendor inference. Make the calls, document them, revisit as the landscape shifts.
Experiments. Turn "we think X is better" into "we have evidence X is better."
The evaluation framework: benchmarks, execution-based verifiers, LLM-as-judge, behavioral regression, eval data.
Prompt, context, and retrieval layers across every AI feature.
Agent flow and tool design: orchestration, tool taxonomy, contracts.
Model adaptation when experiments call for it: data curation, synthetic data, SFT, LoRA/QLoRA, DPO/RLHF, deployment.
The loyal skeptic role: audit what we ship and flag where we're over- or under-engineering.
8+ years engineering, 3+ deep in LLMs and modern ML.
Track record of structured experiments that drove real architectural decisions.
You've designed eval frameworks for generative models, not just used benchmarks.
Strong data instincts: acquisition, curation, synthetic generation.
Solid grounding in fine-tuning (SFT, LoRA/QLoRA, distillation, preference optimization) and the judgment to know when each is the right call. Production experience is a plus; deep current understanding is the bar.
Built agentic systems with tool calling and designed retrieval pipelines.
Deep familiarity with the current LLM landscape and a track record of calling shifts early.
Published research, open-source, or public writing in LLM/ML.
Multimodal, code-generation, or structured-output experience.
Synthetic data generation at scale.
Shipped AI inside a product, not just research.
Salary: $300,000-$400,000 base
Equity: Meaningful stock options
Health Insurance: Best-in-class coverage for the employee and their entire family
Location: San Francisco HQ
Skills Required
- 8+ years engineering experience
- 3+ years deep in LLMs and modern ML
- Track record of structured experiments for architectural decisions
- Experience designing eval frameworks for generative models
- Strong data instincts in acquisition and curation
- Proficient in fine-tuning methods like SFT, LoRA/QLoRA
- Experience building agentic systems and retrieval pipelines
- Familiarity with the current LLM landscape
What We Do
Noon is an AI-native product design platform that provides a dual-canvas tool for product designers. By integrating design and production-ready code, it eliminates the gap between the two, allowing designers to create, iterate, build, test, and ship products directly from a single canvas. Founded in 2024, the company aims to redefine product design workflows through AI-driven, code-centric solutions that work in seconds rather than minutes.









