Staff AI Data Engineer (Remote)

Posted Yesterday
Hiring Remotely in United States
Remote
Senior level
Healthtech • Other • Social Impact • Software • Telehealth
Our mission is to make mental healthcare work for everyone.
The Role
As a Staff AI Data Engineer, you will build and maintain data pipelines for ML/AI tools, ensuring high-quality datasets and collaborating with data scientists to enhance mental health care.
Summary Generated by Built In

We believe that mental health is just as important as physical health. We recognize that mental health issues can be complex and multifaceted, and we are dedicated to treating the whole person, not just the symptoms.

We aim to create a world where mental health is no longer stigmatized or marginalized, but rather is embraced as an integral part of one's overall well-being. 

We believe that by providing quality care that is both evidence-based and compassionate, we can empower individuals to take charge of their mental health and achieve their full potential. We are passionate about making a positive impact on the lives of those struggling with mental health issues and we strive to be a force for positive change in the field of mental healthcare.

About the Role

We’re shaping the future of mental health care with AI-enabled experiences that enhance, not replace, the human connection at the core of therapy. Our north star is clinically grounded, responsible AI designed to bring greater transparency, personalization, and continuous support across the therapy journey. Our work transforms therapy into an experience that’s more connected and accessible. As we expand our portfolio of AI experiences, we’re scaling our team to drive innovation and set a new standard for mental health care.

As a Staff AI Data Engineer, you will help build and maintain the data pipelines that pull information from our central storage system to train machine learning models and AI tools, enabling a variety of use cases that support our providers and improve patient outcomes. You will be part of a collaborative group that values open discussions and quick adjustments to meet changing needs, working alongside data experts and other specialists to turn raw information into useful resources for our mission. This role sits within our data team, which is part of the overall engineering organization and a close partner to our ML team. Your daily work (designing reliable flows of information, testing for accuracy, and solving unexpected challenges) will directly support innovations that help more individuals get the mental health support they deserve. If you enjoy turning complex data into something that makes a real difference in people's lives, this is your chance to contribute to meaningful advancements in health care.

Required Qualifications

  • 8+ years of Data Pipeline Development – specifically building and maintaining scalable ETL/ELT pipelines for ML/AI training workflows, using tools like AWS Glue, dbt, Dagster, Spark, or Ray for distributed processing of large-scale structured and unstructured data from Data Lakes. Strong proficiency in Spark, Python, and SQL for feature engineering, data transformation, and ensuring high-quality, versioned datasets suitable for model training and inference.

  • 8+ years of Cloud Infrastructure & Data Warehousing experience, including 4+ years focused on AWS. Candidates should be proficient in AWS services such as Redshift, S3, Glue, IAM, EMR, and SageMaker for supporting ML/AI pipelines. Candidates may bring additional experience from other cloud environments (e.g., GCP services like BigQuery, GCS, Dataflow, or AI Platform; Azure services like Synapse Analytics, Blob Storage, Databricks, or Machine Learning Studio) to complement their AWS expertise. Experience optimizing data warehouses (e.g., Redshift, Snowflake, BigQuery) and managing data lakes (e.g., S3, GCS, Azure Blob) for large-scale, versioned ML training datasets, with a focus on partitioning, access controls, and integration with distributed processing frameworks like Spark.

  • Experience implementing scalable data validation, quality checks, and error-handling mechanisms tailored for ML/AI pipelines, including bias detection, anomaly identification, and dataset integrity checks to ensure high-fidelity training data. Familiarity with data governance practices, such as metadata management, lineage tracking for reproducible models, and compliance with regulations like CPRA or HIPAA in Data Lake environments.

  • Experience optimizing data pipelines and queries and managing large datasets for efficiency and scalability. Knowledge of best practices for high-throughput systems.

  • Experience with data security measures (encryption, role-based access control, data masking). Understanding of compliance standards (e.g., HIPAA, SOC 2) and their application in data engineering.

  • Strong ability to work cross-functionally with data analysts, data scientists, and stakeholders. Effective communication skills to explain technical concepts to non-technical audiences. Adaptability to thrive in a fast-paced startup environment.

Preferred Qualifications

While having the preferred qualifications enhances your candidacy, having all of them is not mandatory. We encourage all interested applicants to apply, even those who may not meet every preferred requirement.

  • Hands-on experience with AWS tools like S3, Glue, EMR, SageMaker, and Lambda for building scalable ETL/ELT pipelines optimized for ML/LLM training, including feature engineering, data versioning, and handling large-scale unstructured data.

  • Proven track record in implementing robust data validation, bias detection, and lineage tracking in Data Lakes, with familiarity with compliance standards (e.g., HIPAA for health data) and tools like Delta Lake or Iceberg to ensure high-fidelity, reproducible datasets for model training.

  • Familiarity with infrastructure as code (IaC) tools like Terraform or CloudFormation for managing cloud resources.

  • Experience implementing and maintaining CI/CD pipelines for data workflows.

  • Experience in monitoring and reducing costs for large-scale ML/AI workflows in AWS, using techniques like spot instances for training jobs, auto-scaling EMR clusters, and efficient S3 storage tiers (e.g., Intelligent-Tiering) to minimize expenses while maintaining performance.

  • Strong ability to partner with data scientists and ML engineers to design efficient pipelines, using orchestration tools (e.g., Airflow, Dagster) for incremental loading and cost optimization, while monitoring performance metrics like latency and resource utilization in AWS environments.

We're serious about your well-being! As part of our team, full-time employees receive:

  • 100% remote work environment (US-based only): Working hours to support a healthy work-life balance, ensuring you can meet both professional and personal commitments

  • Attractive pay and benefits: Full transparency of pay ranges regardless of where you live in the United States

  • Comprehensive health benefits: Medical, dental, vision, life, disability, and FSA/HSA

  • 401(k) plan access: Start saving for your future

  • Generous time-off policies: Including 2 company-wide shutdown weeks each year for self-care (for most employees)

  • Paid parental leave: Available for all parents, including birthing, non-birthing, adopting, and fostering

  • Employee Assistance Program (EAP): Support for your mental and physical health

  • New hire home office stipend: Set up your workspace for success

  • Quarterly department stipend: Fund team-building activities or in-person gatherings

  • Wellness events and lunch & learns: Explore a variety of engaging topics

  • Community and employee resource groups: Participate in groups that celebrate employee identity and lived experiences, fostering a sense of community and belonging for all

Our team

We believe that diversity, equity, and inclusion are fundamental to our mission of making mental healthcare work for everyone. We are dedicated to having a culture of inclusion that will support our employees in feeling safe, seen, heard, and valued.

Top Skills

Airflow
AWS EMR
AWS Glue
AWS IAM
AWS Redshift
AWS S3
AWS SageMaker
Azure Synapse Analytics
CloudFormation
Dagster
dbt
Delta Lake
GCP BigQuery
GCP Dataflow
Iceberg
Python
Ray
Spark
SQL
Terraform

The Company
595 Employees
Year Founded: 2019

What We Do

We believe that when access to quality mental healthcare improves, patients, providers, and payers all benefit. And that’s why we’re on a mission to make mental healthcare work for everyone. We remove barriers and strengthen connection points between patients, providers, and payers to improve mental health outcomes.

With Rula, it’s easy for patients to find a high-quality therapist or psychiatric practitioner who meets their unique needs, accepts insurance, and is taking new patients.

For providers, Rula offers the flexibility of private practice, while also filling caseloads and offering the necessary behind-the-scenes support. Rula handles the marketing, credentialing, billing, and admin tasks so providers can focus on what they do best: providing care to those in need.

Rula is a remote-first company with teams who specialize in the areas of Clinical, Partnerships, Operations, Marketing, Engineering, Product, and more. We’re committed to reimagining how mental health is treated.

Why Work With Us

As a remote company, we're intentional about the culture we're building. We write things down, we communicate clearly, we follow up, and we follow through. We have high expectations for our team, and empower individuals with a high degree of trust and autonomy. In turn, we expect that individuals operate with a sense of ownership in everything.

Rula Offices

Remote Workspace

Employees work remotely.

We're a 100% remote company.

Typical time on-site: None
United States
