AI Data Engineer

Reposted 6 Days Ago
Boston, MA
In-Office
Mid level
Healthtech • Hospitality • Telehealth
The Role
The Data Engineer will develop and optimize data pipelines for healthcare datasets, ensuring data quality and compliance with healthcare regulations, while collaborating with data scientists and engineers.
Position Summary

The Data Engineer will play a crucial role in developing and refining the data used to train and fine-tune our LLMs and machine learning models. This individual will be responsible for the entire data lifecycle, including gathering, cleaning, structuring, and optimizing large, diverse healthcare datasets. The ideal candidate will have a strong background in data engineering principles, experience with big data technologies, and a keen understanding of the unique challenges and requirements of healthcare data.

You will design, build, and maintain scalable data pipelines that source, preprocess, and deliver high-quality, high-volume datasets to our machine learning engineers. This role requires a deep understanding of data engineering best practices coupled with specific knowledge of the data requirements for LLM training and refinement.

Key Responsibilities
  • Collaborate with data scientists and machine learning engineers to understand data requirements for LLM and machine learning model fine-tuning.
  • Design, build, and maintain scalable data pipelines to ingest, process, and store massive and diverse healthcare datasets.
  • Implement robust data validation and monitoring to ensure the integrity, accuracy, and consistency of all training datasets.
  • Implement robust data cleaning and transformation processes to ensure data quality.
  • Develop and optimize data structures and schemas for efficient access and utilization by LLMs and machine learning models.
  • Work with the team to identify and acquire new data sources, ensuring compliance with relevant healthcare regulations (e.g., HIPAA).
  • Monitor data pipeline performance, troubleshoot issues, and implement optimizations to improve efficiency and reliability.
  • Document data engineering processes, data models, and data dictionaries.
  • Stay up-to-date with the latest advancements in data engineering, big data technologies, and machine learning.

Requirements
Required
  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • Proven experience as a Data Engineer, with a focus on big data technologies.
  • Strong proficiency in programming languages such as Python, Scala, or Java.
  • Extensive experience with data warehousing, ETL processes, and data modeling.
  • Experience with major cloud providers (e.g., AWS, GCP, Azure) and their data storage and processing services.
  • Hands-on experience with big data frameworks like Apache Spark for distributed processing.
  • Excellent problem-solving skills and the ability to work independently and as part of a team.
  • Strong communication and interpersonal skills.
Preferred
  • Master's degree in a related field.
  • Experience with healthcare data and a good understanding of healthcare data standards (e.g., FHIR, HL7).
  • Familiarity with machine learning concepts and LLM fine-tuning processes.
  • Experience with data orchestration tools (e.g., Apache Airflow).

Benefits

Why Join Us?

Joining C the Signs is not just about building AI; it’s about shaping the future of healthcare. If you are a technical leader with an unshakable belief in the power of AI to save lives and the ability to make it happen at scale, this is your opportunity to create a tangible, global impact.

Benefits:

  • Competitive salary and benefits package.
  • Flexible working arrangements (remote or hybrid options available).
  • The opportunity to work on life-changing AI technology that directly impacts patient outcomes.
  • Join a team that combines cutting-edge innovation with a mission to save lives and improve health equity.
  • Continuous learning opportunities with access to the latest tools and advancements in AI and healthcare.

Top Skills

Apache Airflow
Spark
AWS
Azure
GCP
Java
Python
Scala

The Company
HQ: London
39 Employees
Year Founded: 2017

What We Do

C the Signs is a cancer prediction system that can identify patients at risk of cancer at the earliest and most curable stage of the disease. In under 30 seconds, C the Signs can rapidly identify which cancers a patient is at risk of and recommend the most appropriate test or specialist to diagnose their cancer. Using the latest technology, research and evidence, C the Signs enables healthcare providers to give their patients the best chance of surviving cancer.
