Research Engineer, Data

Posted 14 Days Ago
Singapore, SGP
In-Office
Mid level
Software
The Role
The role focuses on building the data foundation for LLMs: owning the data pipelines and sourcing, cleaning, validating, and optimizing data for AI models.

About Bitdeer:

Bitdeer is a world-leading technology company for Bitcoin mining and AI cloud.
Bitdeer is committed to providing comprehensive Bitcoin mining solutions for its customers. Apart from designing industry-leading ASIC chips and manufacturing mining rigs, the Group handles complex processes involved in computing across the value chain. This includes equipment procurement, transport logistics, datacenter design and construction, equipment management, and network and facility operations. Bitdeer also offers advanced cloud capabilities to customers with a high demand for artificial intelligence.
Headquartered in Singapore, Bitdeer operates globally with a diversified 3 GW energy portfolio, and deploys Bitcoin mining and HPC datacenters in the United States, Bhutan, Norway, Canada, Malaysia, and Ethiopia.

About Bitdeer AI Lab

Bitdeer AI Lab is a frontier AI lab under Bitdeer, a globally leading provider of computing power solutions. Guided by long-termism, we are committed to exploring the frontiers of artificial intelligence with the ambition, courage, and determination to build technologies that can truly change the world.

We believe that transformative breakthroughs in AI require both long-horizon thinking and relentless execution. Our mission is twofold: first, to effectively transform energy into intelligence; second, to push the limits of intelligence by rethinking AI systems and architectures that can learn more efficiently, reason more deeply, and scale more effectively.

Our vision is to create intelligence that learns more like humans do: efficiently, adaptively, and recursively, turning finite parameters and finite compute into unbounded potential. We pursue this work with a deep sense of purpose, believing that the most meaningful advances in AI will not only push the frontier of research, but also reshape the future of the world.

Our lab is equipped with thousands of cutting-edge GPUs dedicated to AI research, and we are committed to continuously investing in and expanding our computational infrastructure to support world-class research and engineering in artificial intelligence.

What you will be responsible for:

We are looking for exceptional talent to help build the data foundation for frontier AI models. This role is centered on data for LLMs. You will own the end-to-end pre-training and post-training data pipeline, including data sourcing (open-source datasets and, where needed, web-scale crawling); large-scale cleaning and quality filtering (deduplication, formatting, sampling, and quality classification); data mixture design to find the optimal recipe across domains and stages; data validation through small-scale proxy runs and downstream evaluation; and iterative data optimization based on model eval signals. You will also drive synthetic data generation and data augmentation to extend coverage into high-value domains. Your work will directly shape the capability ceiling of in-house foundation models developed by Bitdeer AI Lab.

How you will stand out:

  • Strong Python engineering skills, with hands-on experience in Spark, Ray, or similar distributed data processing frameworks for terabyte- to petabyte-scale workloads.
  • Solid experience with large-scale data processing pipelines, including deduplication (e.g., MinHash/LSH), format normalization, sampling, and quality classification using rule-based filters and learned quality classifiers.
  • Experience acquiring training data from open-source corpora, and familiarity with (or willingness to build) large-scale web crawling, content extraction, and license/compliance handling.
  • Experience designing training data mixtures and running data ablations to identify optimal mixing ratios across domains and training stages (pre-training, mid-training, and post-training).
  • Experience validating data quality through small-scale proxy training runs and downstream evaluations, and iterating on data based on eval signals to close the loop between data and model behavior.
  • Experience with synthetic data generation and data augmentation, including prompting and distilling from strong LLMs, rejection sampling, and targeted augmentation for high-value domains (e.g., reasoning, code, math).
  • Strong analytical and debugging skills, with the ability to dig into edge cases in data, identify root causes of quality regressions, and turn findings into principled, reproducible pipeline improvements.

What you will experience working with us:

  • A culture that values authenticity and diversity of thought and background;
  • An inclusive and respectful environment with open workspaces and an exciting start-up spirit;
  • A fast-growing company with the chance to network with industry pioneers and enthusiasts;
  • Ability to contribute directly and make an impact on the future of the digital asset industry;
  • Involvement in new projects, developing processes/systems;
  • Personal accountability, autonomy, fast growth, and learning opportunities;
  • Attractive welfare benefits and developmental opportunities such as training and mentoring.

The Company
214 Employees

What We Do

Bitdeer Technologies Group (Nasdaq: BTDR) is a leader in the blockchain and high-performance computing industry. It is one of the world’s largest holders of proprietary hash rate and suppliers of hash rate. Bitdeer is committed to providing comprehensive computing solutions for its customers. The company was founded by Jihan Wu, an early advocate and pioneer in cryptocurrency who cofounded multiple leading companies serving the blockchain economy. Mr. Wu leads the company as Founder, Chairman, and CEO. Linghui Kong serves as Bitdeer’s CBO and provides leadership through deep industry knowledge and technology expertise. Headquartered in Singapore, Bitdeer has deployed mining datacenters in the United States, Norway, and Bhutan. It offers specialized mining infrastructure, high-quality hash rate sharing products, and reliable hosting services to global users. The company also offers advanced cloud capabilities for customers with high demands for artificial intelligence. Dedication, authenticity, and trustworthiness are foundational to our mission of becoming the world’s most reliable provider of full-spectrum blockchain and high-performance computing solutions. We welcome global talent to join us in shaping the future
