Senior Research Engineer (Code World Models)

11 Locations
In-Office
Senior level
Software
The Role
Lead pre-training and mid-training experiments for code-centric foundation models. Build large-scale data pipelines, handle code corpora and execution-based data, develop repository-level evaluations, and collaborate with researchers and engineers to improve model understanding of software systems.

JetBrains is a global software company that creates intelligent tools for software developers and teams. Since 2000, we have built products that help developers work more productively, write higher-quality code, and stay focused on solving real problems.

The JetBrains Research team is looking for a Senior Research Engineer to work on Code World Models: models that learn how software systems behave, change, execute, and interact with developer tools.

This role is focused on model pre-training and mid-training for code-centric foundation models. You will work on data, training pipelines, evaluation, and experiments that improve how models understand programs, repositories, execution, tests, and software engineering workflows.

In this role, you will:
  • Design and run pre-training, continued pre-training, and mid-training experiments for code models.
  • Build and improve data pipelines for large-scale model training, including filtering, deduplication, mixture design, and dataset quality checks.
  • Work with code corpora, repositories, tests, execution traces, and synthetic data.
  • Develop evaluations for complex repository-level code reasoning tasks.
  • Collaborate with researchers and engineers working on ML for code and AI developer tools.
We’ll be happy to have you on our team if you:
  • Have hands-on experience with model pre-training, continued training, or mid-training.
  • Have strong engineering skills in Python and experience with modern ML frameworks.
  • Understand large-scale ML training workflows, including data processing, distributed training, checkpointing, evaluation, experiment tracking, and debugging.
  • Have experience working with large datasets and care about data quality, contamination, sampling, and reproducibility.
  • Have a background in NLP, ML for software engineering, or a similar domain.
  • Enjoy working on research problems with high uncertainty and turning ideas into working experiments.
It would be a plus if you:
  • Have experience training or adapting models for code generation, code understanding, software agents, program repair, test generation, or repository-level reasoning.
  • Have worked with execution-based data, such as unit tests, traces, logs, compiler feedback, runtime states, or sandboxed code execution.
  • Have experience with large-scale distributed training of models with 70B+ parameters.
  • Understand evaluation challenges for code models, including benchmark contamination, flaky tests, execution-based scoring, and long-horizon task evaluation.
  • Have contributed to ML infrastructure, open-source projects, or research systems.

We are an equal opportunity employer
We know great ideas can come from anyone, anywhere. That’s why we do our best to create an open and inclusive workplace – one that welcomes everyone regardless of their background, identity, religion, age, accessibility needs, or orientation.

We process the data provided in your job application in accordance with the Recruitment Privacy Policy.

JetBrains Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about JetBrains and has not been reviewed or approved by JetBrains.

  • Fair & Transparent Compensation: Pay is considered market-competitive for many product and engineering roles in key locations, with published salary ranges on some postings aiding expectation-setting. Feedback suggests total compensation feels fair relative to local markets even if packages are positioned as “competitive but not top‑of‑big‑tech”.
  • Flexible Benefits: Work setup includes hybrid/remote options and flexible hours across many locations. The ability to work abroad for part of the year adds practical flexibility to where work gets done.
  • Leave & Time Off Breadth: Time off includes additional vacation days beyond local minimums in many countries. U.S. materials also highlight PTO, sick leave, and holidays, underscoring breadth beyond statutory baselines.

The Company
HQ: Praha 4
2,209 Employees
Year Founded: 2000

What We Do

JetBrains creates intelligent software development tools consistently used and trusted by 11.4 million professionals and 88 Fortune Global Top 100 companies. Our lineup of more than 30 products includes IDEs for most programming languages and technologies, such as IntelliJ IDEA and PyCharm, as well as products for team collaboration, like YouTrack and TeamCity. JetBrains is also known for creating the Kotlin programming language, a cross-platform language used by more than 5 million developers worldwide each year and recommended by Google as the preferred language for Android development. The company is headquartered in Prague, Czech Republic, and has offices around the world.

JetBrains IDEs:
  • IntelliJ IDEA (Java and Kotlin developers)
  • PyCharm (Python developers)
  • PhpStorm (PHP developers)
  • GoLand (Go developers)
  • Rider (.NET developers)
  • CLion (C and C++ developers)
  • RustRover (Rust developers)
  • WebStorm (JavaScript & TypeScript developers)
  • RubyMine (Ruby and Rails developers)
  • DataGrip (Tool for multiple databases)
  • ReSharper (Extension for Visual Studio)
  • Fleet (Multilingual IDE and code editor)
  • Aqua (IDE for test automation engineers)

.NET & Visual Studio:
  • Rider (IDE for .NET developers)
  • ReSharper (Extension for Visual Studio)
  • ReSharper C++ (Visual Studio Extension for C++ developers)
  • dotCover (.NET Unit Test Runner and Code Coverage Tool)
  • dotMemory (.NET Memory Profiler)
  • dotTrace (.NET Performance Profiler)
  • dotPeek (.NET decompiler and assembly browser)

Team Tools:
  • TeamCity (Powerful CI out of the box)
  • YouTrack (Project management for all your teams)
  • Space (Intelligent code collaboration platform)
  • Datalore (Collaborative data science platform)
  • Qodana (Code quality platform for teams)

Programming Languages:
  • Kotlin (Programming Language for the JVM and Android)
  • MPS (Create Your Own Domain-Specific Language)

Education:
  • JetBrains Academy (Learn and Teach Computer Science)

Profile by JetBrains s.r.o.
