Job: Senior Research Data Engineer, Berlin, Germany, IT/Information Technology, DeepL GmbH

Meet DeepL

DeepL is a global communications platform powered by Language AI. Since 2017, we’ve been on a mission to break down language barriers. Our human-sounding translations and intelligent writing suggestions are designed with enterprise security in mind. Today, they enable over 100,000 businesses to transform communications, reach new markets, and improve productivity, and they empower millions of individuals worldwide to make sense of the world and express their ideas.

Our goal is to become the global leader in Language AI, building products that drive better communication, foster connections, and make a real-life impact. To achieve this, we need talented individuals like you to join our exciting journey. If you’re ready to work with a dynamic team and build your career in the fast-moving AI space, DeepL is your next destination.

What sets us apart

What sets us apart is our blend of modern technology, competitive benefits, and an open, welcoming work culture that enables our people to thrive. When we share what it’s like to work at DeepL, the reactions are overwhelmingly positive. This may be due to our products, which have helped countless people worldwide, or to our shared mission to improve communication for individuals and businesses, bringing cultures closer together.

What we know for sure is this: being part of DeepL means joining a team dedicated to innovation and employee well-being. Discover what our teams have to say about life at DeepL on LinkedIn, Instagram and our Blog.

Meet the team behind this journey

DeepL is renowned for its AI products, from language and translation to enterprise agents. At the core of these products are custom-built algorithms and models that are trained using data. The quality and volume of data are key factors in our success.

You will join our Foundation Model Training team. As a cross-functional team of research scientists and data engineers specialising in machine learning, we develop foundation models and manage the pre-training corpora and associated data preparation pipelines. We work with unstructured data on a petabyte scale. This is a fast-paced and highly competitive field where we face challenging problems at the frontier of research and engineering.

Your responsibilities

Work on ambitious frontier research projects as part of a foundation model training research team consisting of research scientists and research data engineers.
Architect, design and build scalable data pipelines from the ground up, e.g. for downloading and preparing multimodal unstructured data for training.
Build on top of a modern tech stack including Kubernetes, Dask, Ray, etc., and make extensive use of actively developed open-source solutions, debugging low-level issues where needed and potentially submitting fixes upstream.
Deploy complex Python data solutions to cloud infrastructure, including AWS, and to company data centers (on-prem), where you will own the operation of data processing at massive scale.
Go beyond “Big Data” and ETL. You will engineer and operate large-scale data solutions for real-world unstructured data, including text, code, image and audio modalities.
Collaborate with stakeholders, research scientists, other research data engineers and data tooling and platform teams.
Raise the standard for excellence and act as owner and champion for the quality and availability of our foundation model training data.
Ensure mission-critical reliability of data pipeline jobs, maintain high quality code with documentation and provide a great data product user experience.

Qualities we look for

Degree in a scientific or technical field.
Previous work experience as Data Engineer or a similar data- and engineering-centric role in a scaled-up tech company with a focus on large-scale unstructured data.
Extensive experience with large-scale data engineering, writing high-quality Python code and leveraging the full Python data ecosystem in cloud deployments.
Exploratory data analysis, data cleaning, data validation, ideally ML feature engineering for text and other unstructured data.
Developing, testing and deploying data pipelines and infrastructure.
End-to-end ownership of data solution development,…