Senior Research Data Engineer

Posted 15 Days Ago
Berlin
In-Office
Mid level
Artificial Intelligence
The Role
Join the Foundation Model Training team to develop and manage large-scale data pipelines for AI-driven language products, ensuring data quality and processing. Collaborate with research scientists to improve model training outcomes.
Summary Generated by Built In
Meet DeepL

DeepL is a global communications platform powered by Language AI. Since 2017, we’ve been on a mission to break down language barriers. Our human-sounding translations and intelligent writing suggestions are designed with enterprise security in mind. Today, they enable over 100,000 businesses to transform communications, reach new markets, and improve productivity. They also empower millions of individuals worldwide to make sense of the world and express their ideas.

Our goal is to become the global leader in Language AI, building products that drive better communication, foster connections, and make a real-life impact. To achieve this, we need talented individuals like you to join our exciting journey. If you're ready to work with a dynamic team and build your career in the fast-moving AI space, DeepL is your next destination.

What sets us apart

What sets us apart is our blend of modern technology, competitive benefits, and an open, welcoming work culture that enables our people to thrive. When we share what it's like to work at DeepL, the reactions are overwhelmingly positive. This may be because of our products that have helped countless people worldwide or our shared mission to improve communication for individuals and businesses, bringing cultures closer together. What we know for sure is this: being part of DeepL means joining a team dedicated to innovation and employee well-being. Discover what our teams have to say about life at DeepL on LinkedIn, Instagram and our Blog.

Meet the team behind this journey

DeepL is renowned for its translation and language AI products. At the core of these products are custom-built algorithms and models that are trained using data. The quality and volume of data are key factors in our success.

You will join our Foundation Model Training team. As a cross-functional team of research scientists and data engineers specialising in machine learning, we develop foundation models and manage the pre-training corpora and associated data preparation pipelines. We work with unstructured data on a petabyte scale. This is a fast-paced and highly competitive field.

Your responsibilities
  • Work as part of a foundation model training research team consisting of research scientists and research data engineers.

  • Deploy to cloud infrastructure, including AWS, and to on-premises company data centers, where you will own the operation of data processing at massive scale.

  • Build and process large-scale datasets of real-world unstructured text documents and other unstructured data, such as images or audio, together with their attached metadata.

  • Collaborate with stakeholders, research scientists, other research data engineers and data tooling and platform teams.

  • Act as an owner for the quality and availability of our foundation model training data.

  • Maintain high-quality code with documentation and provide a great data product user experience.

  • Participate in our on-call rotation: you’ll help ensure the reliability and availability of our services by joining the team's shared on-call rotation as needed.

Qualities we look for
  • Degree in a scientific or technical field.

  • Work experience in a scaled-up tech company, ideally with a focus on large-scale unstructured data.

  • Extensive experience with data engineering using Python and the Python data ecosystem in cloud deployments.

  • Experience with exploratory data analysis, data cleaning, and data validation, ideally including ML feature engineering for text and other unstructured data.

  • Experience developing, testing, and deploying data pipelines and infrastructure.

  • End-to-end ownership of data solution development, operations, testing and quality assurance, ideally responsibility for data products.

  • Experience with distributed computing and infrastructure-as-code, ideally Kubernetes on AWS.

  • Excellent communication and collaboration skills in an English-speaking organization.

Ideally, you have domain-specific experience in:

  • LLM training data preparation.

  • NLP, text classification, model-based/GPU workflows.

  • Dynamic workflow orchestration frameworks like Argo Workflows.

  • Linguistics expertise or speaking multiple languages.

What we offer
  • Diverse and internationally distributed team: joining our team means becoming part of a large, global community with people of more than 90 nationalities. We're more than just colleagues; we're a group of professionals with a shared mission to connect diverse cultures. Our global presence is growing–we've doubled in size nearly every year, with our employees based in the UK, Germany, the Netherlands, Poland, the US, and Japan, and we continue to expand our network.

  • Open communication, regular feedback: as a language-focused company, we understand the importance of clear, honest communication. We value smooth collaboration and direct, actionable feedback, and we believe that leading with empathy and a growth mindset makes us better together.

  • Hybrid work, flexible hours: we offer a hybrid work schedule, with team members coming into the office twice a week. This allows you to engage directly with your team and experience the unique energy of our workspace, while still enjoying the flexibility and comfort of working from home. We offer flexible working hours and trust in your productivity, while staying in sync with your team’s general locations and time zones to foster effective and seamless collaboration.

  • Regular in-person team events: we bond over vibrant events that are as unique as our team, from local team and business unit gatherings, to new-joiner onboardings, to company-wide events that bring us all together–literally.

  • Monthly full-day hacking sessions: every month, we have Hack Fridays, where you can spend your time diving into a project you're passionate about and get the opportunity to work with other teams–we value your initiatives, impact, and creativity.

  • 30 days of annual leave: we value your peace of mind. With 30 days off (excluding public holidays) and access to mental health resources, we make sure you're as strong mentally as you are professionally.

  • Competitive benefits: just as our team spans the globe, so does our benefits package. We've crafted it to reflect the diversity of our team and tailored it to align with your unique location, to ensure you feel supported every step of the way.

If this role and our mission resonate with you, but you're hesitant because you don't check all the boxes, don't let that hold you back. At DeepL, it's all about the value you bring and the growth we can foster together. Go ahead, apply—let's discover your potential together. We can't wait to meet you!

We are an equal opportunity employer

You are welcome at DeepL for who you are—we appreciate authenticity here. Our product is for everyone, and so is our workplace. The more voices we have represented and amplified in our business, the more we will all succeed, contribute, and think forward! So bring us your personal experience, your perspectives, and your background. It’s in our diversity that we will find the power to break down language barriers in the world.

Top Skills

Argo Workflows
AWS
Kubernetes
Machine Learning
NLP
Python

The Company
HQ: Cologne
1,265 Employees
Year Founded: 2017

What We Do

DeepL is a global communications platform powered by Language AI. Since 2017, we’ve been on a mission to break down language barriers. Our human-sounding translations and intelligent writing suggestions are designed with enterprise security in mind. Today, they enable over 100,000 businesses to transform communications, reach new markets, and improve productivity. They also empower millions of individuals around the world to make sense of the world and express their ideas.

Join us in exploring the possibilities of Language AI!
