ReflectionAI

Research Program Manager - Research Infrastructure

Reposted 10 Days Ago

3 Locations

In-Office

Senior level

Software

The Role

The Research Program Manager will oversee cross-functional programs focused on enhancing training infrastructure, ensuring reliability, and facilitating effective communication among multiple teams in a fast-paced environment.

Summary Generated by Built In

Our Mission

Reflection is a research lab making intelligence open and accessible for everyone to use, customize, and build on. We build open models that let anyone control their intelligence and help shape the future of AI. Our mission: make intelligence open and accessible to all.

About the Role

Research Program Managers at Reflection are high-leverage leaders and operators who embed directly with research and infrastructure teams to accelerate the pace of frontier model development. They are not project trackers. They are force multipliers who bring clarity to ambiguity, drive decisions when the path forward is unclear, and ensure that the work happening across multiple teams connects into a coherent whole.

This role focuses on scaling our research infrastructure to support massive, frontier-scale training runs across pre-training, mid-training, and post-training. You will work closely with teams building on training libraries like Megatron, driving the programs that turn raw clusters into reliable, high-performance training environments. Your job is to make sure the infrastructure we build works end-to-end, that teams are unblocked, and that we can scale with confidence as our ambitions grow.

You bring a first-responder mentality. When things go sideways, you don't wait to be asked. You jump in, assess the situation, cut through noise, align the people who need to be aligned, and drive resolution.

What You'll Do

Own cross-functional programs spanning training infrastructure and cluster reliability across pre-training, mid-training, and post-training workstreams.
Drive end-to-end coordination of efforts to scale our training stack, working alongside engineering leads and external partners.
Jump into active incidents and escalations to triage, coordinate response, and drive resolution across teams. Champion a culture of blameless post-mortems and continuous learning, turning every incident into a concrete improvement to our systems and processes.
Partner with infrastructure and research engineering leads to identify bottlenecks, define priorities, and ensure that infrastructure investments are directly tied to research velocity.
Build and maintain visibility into training run health, cluster reliability, and infrastructure performance so that leadership and teams have the context they need to make fast, informed decisions.
Create lightweight, durable processes for cross-team handoffs, config management, checkpoint workflows, and other coordination-heavy touchpoints that currently rely on ad hoc communication.
Translate technical complexity into clear status updates and decision frameworks for engineering leadership and executives.

About You

7+ years of experience in technical program management, research operations, or infrastructure coordination, ideally in ML/AI or large-scale distributed systems environments.
Technical knowledge deep enough to engage with engineers on topics like distributed training frameworks, GPU cluster architecture, scheduler behavior, networking, and storage systems. You don't need to write the code, but you need to understand the systems well enough to “speak the language”: to ask the right questions and identify risks early.
Proven ability to operate effectively in high-ambiguity, fast-moving environments. You create structure where there is none and drive clarity without waiting for permission.
Track record of managing complex, multi-team programs with competing priorities and hard deadlines. You know how to make tradeoffs and you communicate them clearly.
Strong stakeholder management skills across both deeply technical ICs and senior leadership. You build trust by being reliable, direct, and well-informed.
Comfortable operating in crisis mode. You stay calm under pressure, you know how to prioritize when everything is on fire, and you follow through on the other side.
Excited to build from zero to one. We are a small, fast-moving team, and this role will help define how research program management works at Reflection.
Motivated by enabling researchers and engineers to build the world's most capable open-weight AI systems.

What We Offer

We believe that to make intelligence open and accessible to all, you need to start at the foundation. Joining Reflection means building from the ground up as part of a talent-dense team. You will help define our future as a company, and help define the future of open foundational models.

We want you to do the most impactful work of your career with the confidence that you and the people you care about most are supported.

Top-tier compensation: Salary and equity structured to recognize and retain our talent globally.
Stock options: Everyone who joins and contributes to Reflection's success gets to share in the upside through stock options.
Health & wellness: Comprehensive medical, dental, vision, and life insurance, with an annual wellness allowance.
Meals: Lunch and dinner are provided in the office daily.
Life & family: 22 weeks of paid parental leave for all new birthing and non-birthing parents, including adoptive and surrogate journeys.
Vacation days: Unlimited paid time off in the U.S. and 30 days in the U.K.
Sponsorship support: We sponsor visas to help exceptional talent join our team and support long-term immigration pathways where applicable.
Team building: We have regular off-sites, happy hours, and team celebrations.

Export Control Notice: This position may require access to technology or source code subject to the U.S. Export Administration Regulations. Any offer of employment for this role may be conditioned on the Company's ability to provide the candidate with access to such technology or source code in compliance with applicable U.S. export control laws, which may require the Company to seek government authorization.

Skills Required

7+ years of experience in technical program management, research operations, or infrastructure coordination
Deep technical knowledge to engage with engineers on distributed training frameworks and infrastructure systems
Proven ability to operate in high-ambiguity environments and drive clarity
Track record of managing complex, multi-team programs with competing priorities
Strong stakeholder management skills across technical and leadership levels
Comfortable operating in crisis mode and prioritizing under pressure
Motivated to build from scratch in a fast-moving team

The Company

HQ: Brooklyn, New York

38 Employees

What We Do

Reflection was founded by former DeepMind and OpenAI researchers to build superintelligent coding agents. We previously built the most powerful LLM (ChatGPT, Gemini) and agent (AlphaGo, AlphaZero) systems in the world. Reflection’s mission is to build superhuman coding agents. Today’s language models are powerful, but they fall short when it comes to tasks that require acting over many steps. The reason is simple. These models were never trained for autonomy. Our goal is to create the most capable and reliable coding agents in the world. Our product is a Coding Agent API that helps automate rote engineering work.