Collinear AI

Research Scientist and Research Engineer

Reposted 19 Days Ago

Sunnyvale, CA, USA

In-Office

160K-350K Annually

Mid level

Artificial Intelligence • Machine Learning • Software • Generative AI

The Role

As a Research Scientist and Engineer at Collinear, you will create AI simulation environments, develop evaluation systems, and enhance real-world model performance through rigorous data-driven approaches.

Summary Generated by Built In

About Collinear

At Collinear, we help teams fearlessly ship AI.

Frontier labs and AI-native companies use our SimLab to find capability gaps in their agents and generate high-quality data to close them. We believe that the next generation of AI progress won't come from just bigger models, but from more rigorous, long-horizon simulation and programmatic verification.

SimLab allows researchers to spin up realistic environments, run agents through complex tasks, and surface failure modes under real-world conditions. We then close the loop by generating targeted synthetic data to retrain models, delivering measurable quality lift on the metrics that actually matter.

About the Role

We are looking for Research Scientists and Research Engineers to help us build the data engine for frontier AI. In this role, you will bridge the gap between frontier research and production engineering. You will develop the high-fidelity environments and evaluation stacks that the world’s leading AI labs rely on to stress-test their most advanced agents. Your work will involve iterating on novel RL approaches and translating them into robust, scalable infrastructure that moves the needle on real-world model metrics.

Responsibilities:

Build Agentic Environments: Design and implement the next generation of "SimLabs": ultra-realistic, long-horizon simulation environments where agents learn to navigate ambiguity and maintain context.
Programmatic Verification: Develop rigorous, policy-aware judges and evaluations that measure genuine capability and safety beyond simple benchmarks.
Close the Loop: Design and execute high-quality post-training runs (CPT, SFT, RL) to deliver frontier performance on open-source models using curated, high-signal data.
Rapid Iteration: Debug and iterate across the full ML stack, from infrastructure to model behavior, ensuring our tools remain "command-line first" and developer-friendly.
Collaborate: Work daily with the founders and research staff to shape the roadmap and push the state of the art in AI reliability.

About You

We are looking for individuals who demonstrate a rare combination of technical depth, research intuition, and high agency.

Technical Foundation: A Bachelor’s, Master’s, or PhD in a technical field (CS, Math, Physics, etc.), or a demonstrated "proof of work" through significant open-source contributions or industry experience.
Engineering Rigor: A strong foundation in software engineering with the ability to build robust, scalable infrastructure. You should be comfortable in a Python-friendly, CLI-first development environment.
ML Fluency: A principled understanding of foundation models, including how they are constructed, evaluated, and optimized.
Empirical Mindset: Experience conducting research or technical experiments with a focus on reproducibility and data-driven results.

What will make you stand out

Research Taste: You have a strong intuition for identifying what matters in complex problem spaces. You can balance deep research exploration with the pragmatism needed to ship a product.
Impact-Driven Agency: You care about outcomes, not just activity. You don't wait for a ticket; you identify gaps in the system, build the solution, and ensure it moves real-world metrics for frontier AI labs.
Domain Expertise: Prior experience with Reinforcement Learning (RLHF/RLAIF), simulation systems, or building long-horizon agentic environments.
Proven Track Record: A history of contributing to influential ML research (e.g., publications at NeurIPS, ICLR, ICML) or maintaining high-impact open-source projects.
Post-Training Experience: Experience fine-tuning or evaluating large-scale models to deliver "frontier performance" on open-source benchmarks.

Why Join Collinear

Own the Frontier: Work on the most pressing problem in AI today: making agents reliable enough for production.
High Density of Talent: Join a small, elite team where you will be pushed to do your life's work.
Elite Compensation: We offer competitive salary and equity packages to ensure we attract the best of the best.
Direct Impact: At a seed-backed startup, your work directly shapes the company's trajectory and the future of AI safety.

Collinear is an equal opportunity employer and values diversity. We do not discriminate on the basis of race, color, religion, sex, gender identity, sexual orientation, national origin, age, disability, veteran status, or any other characteristic protected by applicable law.

The base salary range for this role in California is $160,000 to $350,000 per year, depending on experience, skills, and qualifications. This role will also be eligible for equity, benefits, and bonuses.

Collinear provides reasonable accommodations for candidates with disabilities throughout the application and hiring process. If you need an accommodation, please contact us.

Pursuant to applicable local ordinances, we will consider qualified applicants with arrest and conviction records.

Skills Required

Bachelor's, Master's, or PhD in a technical field (CS, Math, Physics, etc.)
Strong foundation in software engineering
Understanding of foundation models
Experience with reproducibility in research
Prior experience with Reinforcement Learning
History of contributing to influential ML research or open-source projects

The Company

What We Do

Collinear AI builds simulation labs where AI agents learn to work in the real world. The labs simulate users, tools, and workflows to improve AI models before deployment, with a focus on AI safety, reliability, and customization for enterprise GenAI.