Senior Applied Research Data Engineer (US)

PointClickCare

Reposted 5 Days Ago

Hiring Remotely in United States

Remote

183K-199K Annually

Senior level

Healthtech • Software

The Role

Build and own a versioned, documented gold data layer from silver Lakehouse sources for AI research. Transform, validate, and curate multi-modal datasets, build reusable Databricks/PySpark pipelines, automate quality and labeling, support researcher workflows, and maintain lineage and dataset snapshots for model development and evaluation.

Summary Generated by Built In

At PointClickCare our mission is simple: to help providers deliver exceptional care. And that starts with our people. As a leading health tech company that’s founder-led and privately held, we empower our employees to push boundaries, innovate, and shape the future of healthcare.

With the largest long-term and post-acute care dataset and a Marketplace of 400+ integrated partners, our platform serves over 30,000 provider organizations, making a real difference in millions of lives. We also reinvest a significant percentage of our revenue back into research and development, ensuring our employees have the resources to innovate and make a lasting impact. Recognized by Forbes as a top private cloud company and honored as one of Canada’s Most Admired Corporate Cultures, we offer flexibility, growth opportunities, and meaningful work.

At PointClickCare, we empower our people to be the architects of a smarter healthcare future, one that is human-first and accelerated by AI to create meaningful and lasting change. Employees harness AI as a catalyst for creativity, productivity, and thoughtful decision-making. Integrating AI tools into our daily workflows enhances collaboration, improves outcomes, and gives every team member the proficiency to maximize their impact. It all starts with our hiring practices, where we uncover AI expertise that complements our mission, and we continue to invest in training and development to nurture innovation throughout the employee journey.

Join us in redefining healthcare — so it doesn’t just survive, it thrives. To learn more about PointClickCare, check out Life at PointClickCare and connect with us on Glassdoor and LinkedIn.

**Travel to Office expectations**

For Remote Roles: If this role is remote, there will be in-office events that will require travel to and from the Mississauga and/or Salt Lake City office. These will include, but are not limited to, onboarding, team events, and semi-annual and annual team meetings.

For Hybrid Roles: If this role is hybrid, there will be an expectation to reside within commutable distance of the office/location specified in the job listing. This will include, but is not limited to, weekly/bi-weekly/monthly events in the office with your specific team. This is a requirement for this role.

About the Role

At PointClickCare, we are building the data foundation that powers the next generation of AI and machine learning products in healthcare. We are seeking a Senior Applied Research Data Engineer who thrives at the intersection of data engineering, applied research, and domain discovery.

This is not a traditional data engineering role focused solely on pipelines and infrastructure. You will work closely with AI researchers, data scientists, clinicians, and product experts to transform complex healthcare data into trusted, reusable, AI-ready research assets. Success in this role requires curiosity, investigative thinking, and the ability to uncover meaning in complex, poorly documented systems.

You will be responsible for learning new domains quickly by reading source code, reverse-engineering SQL and business logic, interviewing subject matter experts, and building durable semantic data products that support experimentation, model development, evaluation, and production AI systems.

The ideal candidate enjoys solving data mysteries, creating order from ambiguity, and building datasets that researchers trust. You understand that the quality, semantics, lineage, and documentation of a dataset are often more important than the model itself.

In this role, you will:

Build and own reusable gold-layer data products that power AI, machine learning, and generative AI research.
Transform structured, semi-structured, and unstructured healthcare data into trusted, model-ready datasets.
Investigate and document complex business logic by analyzing source systems, stored procedures, application code, and stakeholder workflows.
Partner directly with researchers to design datasets for experimentation, evaluation, and model training.
Create semantic data definitions, lineage documentation, provenance records, and data quality frameworks that enable reproducible research.
Develop point-in-time-correct datasets, feature sets, and evaluation corpora for classical ML and generative AI workloads.
Support advanced AI data preparation techniques including programmatic labeling, weak supervision, synthetic data generation, and research dataset curation.
Serve as a bridge between domain experts, researchers, and engineering teams, turning tacit knowledge into durable data assets.

What makes someone successful in this role:

You enjoy learning new domains and solving ambiguous data problems.
You are comfortable working with incomplete documentation and legacy systems.
You naturally ask "What does this data really mean?" before asking "How do I process it?"
You can translate conversations with clinicians, product experts, and researchers into robust data products.
You create documentation, data definitions, and semantic models that other teams depend on.
You care deeply about data quality, reproducibility, provenance, and research integrity.

Required Skills and Experience

5+ years building production data systems, with at least 2 supporting ML or AI workloads.

Track record of learning complex new data domains quickly, through reading source code, interviewing experts, and building durable artifacts others rely on.

Advanced Python, SQL, and PySpark/Databricks for working with large, messy data. Expert SQL specifically: comfortable reading complex stored procedures and reverse-engineering business logic from queries.

Databricks ecosystem depth: Delta Lake, Unity Catalog, Spark/PySpark tuning, MLflow.

AI domain literacy: working understanding of embeddings, tokenization, feature engineering, point-in-time correctness, train/validation/test splits, data drift, and the differences between what classical ML and generative models need from data.

Data wrangling across modalities: transforming unstructured content (text, PDFs, transcripts, logs) and structured tabular data into clean, model-ready forms.

AI-friendly data formats (Parquet, Hugging Face datasets) and storage layout decisions (partitioning, sharding, caching) that keep researcher workflows responsive in Azure, AWS, or other working environments.

Data quality, filtering, and synthesis pipelines: support for programmatic labeling and weak supervision (e.g. Snorkel or equivalent), near-duplicate detection (MinHash/LSH), content and quality filters, LLM-API-driven synthetic data generation.

Pipeline orchestration (e.g. Airflow, Databricks Workflows, Dagster, or Prefect) and dataset versioning, including Unity Catalog and feature-store support.

Experience handling regulated or sensitive data under controlled access (HIPAA or equivalent). Familiarity with general de-identification concepts.

Git-based version control and CI/CD for data and code.

Strong written documentation. Skill in eliciting requirements and tacit knowledge from technical and non-technical experts.

Bachelor’s degree in computer science, data science, engineering, statistics, or related field. Equivalent practical experience considered.

Preferred:

Hands-on EHR data experience, ideally in skilled nursing, long-term care, post-acute care, or senior living.

Working knowledge of clinical terminologies (ICD-10, SNOMED CT, LOINC) and data standards (HL7v2, FHIR, CCDA).

dbt for transformation and testing.

Familiarity with training-side ML frameworks (e.g. PyTorch) sufficient to debug data-side bottlenecks; experience supporting LLM or foundation-model training or fine-tuning data pipelines.

Clinical NLP, OCR, document parsing, or ASR / transcript pipeline experience.

Data lineage and catalog tools.

Prior experience embedded inside an AI or ML research team.

Master’s degree in a relevant quantitative or computer science field.

What Success Looks Like

AI researchers can start new projects without spending the opening weeks reconstructing what PointClickCare entities mean or rebuilding the same transformations. The gold datasets they need exist, are versioned, are documented, and accelerate work across EDA, experiments, model development, and evaluation. As coverage expands across data types, modalities, and product surfaces, the function grows with it.

PointClickCare Benefits & Perks:

Benefits starting from Day 1!

Retirement Plan Matching

Flexible Paid Time Off

Wellness Support Programs and Resources

Parental & Caregiver Leaves

Fertility & Adoption Support

Continuous Development Support Program

Employee Assistance Program

Allyship and Inclusion Communities

Employee Recognition … and more!

It is the policy of PointClickCare to ensure equal employment opportunity without discrimination or harassment on the basis of race, religion, national origin, status, age, sex, sexual orientation, gender identity or expression, marital or domestic/civil partnership status, disability, veteran status, genetic information, or any other basis protected by law. PointClickCare welcomes and encourages applications from people with disabilities. Accommodations are available upon request for candidates taking part in all aspects of the selection process. Please contact [email protected] should you require any accommodations. As part of our commitment to a streamlined and equitable hiring experience, PointClickCare uses AI tools to assist with candidate screening and assessment.

When you apply for a position, your information is processed and stored with Lever, in accordance with Lever’s Privacy Policy. We use this information to evaluate your candidacy for the posted position. We also store this information, and may use it in relation to future positions to which you apply, or which we believe may be relevant to you given your background. When we have no ongoing legitimate business need to process your information, we will either delete or anonymize it. If you have any questions about how PointClickCare uses or processes your information, or if you would like to ask to access, correct, or delete your information, please contact PointClickCare’s human resources team: [email protected]

PointClickCare is committed to Information Security. By applying to this position, if hired, you commit to following our information security policies and procedures and making every effort to secure confidential and/or sensitive information.

Skills Required

5+ years building production data systems, with at least 2 supporting ML or AI workloads
Proven ability to learn complex data domains via source code review and expert interviews
Advanced Python
Advanced SQL (expert-level, comfortable reverse-engineering stored procedures and business logic)
PySpark and Databricks experience, including Spark/PySpark tuning
Databricks ecosystem: Delta Lake and Unity Catalog
MLflow
AI domain literacy (embeddings, tokenization, feature engineering, point-in-time correctness, data drift)
Data wrangling across modalities (text, PDFs, transcripts, logs, tabular)
Familiarity with AI-friendly data formats and storage layout decisions (Parquet, Hugging Face datasets) and cloud environments (Azure, AWS)
Data quality, filtering, and synthesis pipelines (programmatic labeling/weak supervision such as Snorkel; near-duplicate detection like MinHash/LSH; LLM-API-driven synthetic generation)
Pipeline orchestration experience (Airflow, Databricks Workflows, Dagster, or Prefect)
Dataset versioning, lineage and feature-store support (including Unity Catalog)
Experience handling regulated or sensitive data under controlled access (HIPAA or equivalent) and familiarity with de-identification concepts
Git-based version control and CI/CD for data and code
Strong written documentation and ability to elicit tacit knowledge from technical and non-technical experts
Bachelor's degree in computer science, data science, engineering, statistics, or related field (or equivalent practical experience)
Hands-on EHR data experience in skilled nursing/long-term/post-acute care (preferred)
Working knowledge of clinical terminologies and standards (ICD-10, SNOMED CT, LOINC, HL7v2, FHIR, CCDA) (preferred)
dbt for transformation and testing (preferred)
Familiarity with training-side ML frameworks (e.g., PyTorch) to debug data-side bottlenecks; experience supporting LLM/foundation-model training or fine-tuning pipelines (preferred)
Clinical NLP, OCR, document parsing, or ASR/transcript pipeline experience (preferred)
Experience with data lineage and catalog tools (preferred)
Prior experience embedded inside an AI or ML research team (preferred)
Master's degree in a relevant quantitative or computer science field (preferred)

PointClickCare Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about PointClickCare and has not been reviewed or approved by PointClickCare.

Healthcare Strength — Health and dental coverage appear robust, with wellness and assistance programs reinforcing core medical benefits. Coverage quality stands out relative to other benefit elements.
Leave & Time Off Breadth — PTO and paid holidays are characterized as generous, and flexible work-from-home options are widely available. Occasional extras like summer half‑day Fridays further expand time-off flexibility.
Flexible Benefits — A customizable mix is evident through remote/hybrid arrangements, day-one eligibility, and a lifestyle or personal spending account. Benefits such as wellness credits and support resources can be tailored to individual needs.

The Company

HQ: Toronto

1,557 Employees

Year Founded: 2000

What We Do

PointClickCare is the market leader driving the transformation of healthcare for vulnerable and complex populations through a broad, connected care network powered by deep insights, with a commitment to value, outcomes, and innovation. We connect post-acute and acute care settings, people, and systems like no other company. Our steadfast commitment to our culture and to providing growth opportunities to our employees is evidenced by recent recognition of PointClickCare as one of Canada’s best-managed companies and most admired corporate cultures.