DeepMind

Research Scientist, Personalized Generative AI

Mountain View, CA

In-Office

166K-291K

Expert/Leader

Artificial Intelligence

The Role

Develop and implement multi-turn reinforcement learning algorithms for personalized language models, collaborate with product teams, and drive cutting-edge research in AI.

Snapshot

Artificial Intelligence could be one of humanity’s most useful inventions. At DeepMind, we’re a team of scientists, engineers, machine learning experts and more, working together to advance the state of the art in artificial intelligence. We use our technologies for widespread public benefit and scientific discovery, and collaborate with others on critical challenges, ensuring safety and ethics are the highest priority.

About Us

The current paradigm for Large Language Models (LLMs) is largely "one-size-fits-all". While powerful, this approach fails to capture the diverse, implicit, and evolving preferences of individual users. A user's true intent is often revealed not in a single prompt, but over the course of a long, interactive conversation. The next frontier in AI is to move beyond static instruction-following and create models that dynamically learn and adapt to each user, personalizing their behavior to maximize helpfulness and satisfaction over the long term.

Our team is focused on this challenge: teaching Gemini to personalize itself through interaction. We frame this problem as a multi-turn, imperfect-information game in which the model must learn to infer a user's latent goals and preferences from conversational cues. Our aim is to leverage advanced Reinforcement Learning techniques to optimize for long-horizon user satisfaction. This involves tackling complex credit assignment problems in stateful, interactive environments.

The techniques you develop will have a direct impact on a wide range of Gemini applications, such as complex, multi-step tool use and agentic workflows.

Key responsibilities:

Design and implement novel multi-turn RL algorithms to train personalized LLMs. This includes exploring advanced methods for credit assignment and exploration/exploitation strategies.
Develop and scale our training infrastructure, building on our existing framework for training against stateful user simulators.
Formalize the problem of personalization by creating new metrics, environments, and evaluation methodologies that capture long-term user satisfaction and preference alignment.
Collaborate closely with product teams to integrate these personalization capabilities into core Gemini products, improving tasks that require sustained interaction and user understanding.
Do cutting-edge research that pushes the boundaries of how agents learn from interactive, human-in-the-loop data.

To make this effort successful, we need a strong Research Scientist who can help us deliver state-of-the-art personalized models. We are looking for a candidate with deep expertise in reinforcement learning and large-scale ML systems. You should be passionate about solving complex, long-horizon problems and excited by the challenge of building truly adaptive and intelligent agents.

About You

In order to set you up for success as a Research Scientist at DeepMind, we look for the following skills and experience:

PhD in Machine Learning, Reinforcement Learning, Natural Language Processing, or a related field.
Strong data analysis and synthetic data generation skills.
Strong development skills in Python and experience with deep learning frameworks like JAX, PyTorch, or TensorFlow.
Experience building and working with large-scale ML training systems.

In addition, the following would be an advantage:

Deep theoretical and practical experience in Reinforcement Learning (e.g., policy gradient methods, value-based methods, model-based RL, credit assignment).
Experience developing and training large generative models (LLMs).
Strong track record of academic publications in top-tier conferences (e.g., NeurIPS, ICML, ICLR, AAAI).
Familiarity with research on game theory, multi-agent systems, or learning from human feedback (RLHF/RLAIF).
Experience building or using user simulators for RL training.

The US base salary range for this full-time position is $166,000 to $291,000 + bonus + equity + benefits. Your recruiter can share more about the specific salary range for your targeted location during the hiring process.

Application Deadline: September 9, 2025

At Google DeepMind, we value diversity of experience, knowledge, backgrounds and perspectives and harness these qualities to create extraordinary impact. We are committed to equal employment opportunity regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, pregnancy, or related condition (including breastfeeding) or any other basis as protected by applicable law. If you have a disability or additional need that requires accommodation, please do not hesitate to let us know.

Top Skills

JAX

Python

PyTorch

TensorFlow

The Company

1,218 Employees

Year Founded: 2010

What We Do

We’re a team of scientists, engineers, machine learning experts and more, working together to advance the state of the art in artificial intelligence. We use our technologies for widespread public benefit and scientific discovery, and collaborate with others on critical challenges, ensuring safety and ethics are the highest priority.

Our long-term aim is to solve intelligence, developing more general and capable problem-solving systems, known as artificial general intelligence (AGI).

Guided by safety and ethics, this invention could help society find answers to some of the world’s most pressing and fundamental scientific challenges.

We have a track record of breakthroughs in fundamental AI research, published in journals like Nature, Science, and more. Our programs have learned to diagnose eye diseases as effectively as the world’s top doctors, to save 30% of the energy used to keep data centres cool, and to predict the complex 3D shapes of proteins - which could one day transform how drugs are invented.