Senior Data Scientist

Posted 11 Days Ago
San Francisco, CA, USA
Hybrid
180K-230K Annually
Senior level
Biotech
The Role
The Senior Data Scientist will develop, validate, and operationalize AI models for diagnosing complex diseases, enhance data analysis processes, and establish MLOps infrastructure while leading data-driven initiatives.
Summary Generated by Built In
About Probably Genetic

Probably Genetic is changing the lives of patients living with severe, complex diseases. Our data platform is used by drug developers and patient advocacy groups to develop and launch treatments for these patients. Our technology discovers undiagnosed patients online, analyzes their disease state using machine learning and at-home testing, and enables compliant communication with patients. In doing so, we help patients access diagnoses, clinical trials, and treatments as early as possible.

We are a tight-knit group of hard-working, ambitious problem solvers united by a mission greater than ourselves. We do well by doing right by patients. We are developing some of the most cutting-edge solutions in healthcare, and our roadmap is packed with innovations in bioinformatics, AI, and drug development. We have built a lean, all-star team to help us bring our vision to life, and we want you to be a part of it.

Probably Genetic has raised multiple rounds of funding from Silicon Valley’s best investors, including Threshold, Khosla, and Y Combinator, and offers competitive salaries, comprehensive benefits, and meaningful early-stage equity.

About the role

We are looking for a Senior Data Scientist who will own some of the most consequential diagnostic AI in rare disease: building, validating, and operationalizing the models that help us find and diagnose patients who have never had a name for their disease, powering the analytical rigor behind our testing programs, and shaping how we use data to make smarter product decisions.

What you will do
  • Own the end-to-end development, validation, and operationalization of PG's predictive diagnostic AI models (from feature engineering through production deployment) that power program eligibility decisions and clinical decisions for patients

  • Run prospective testing experiments: apply diagnostic models to undiagnosed patients, coordinate testing, and track outcomes to continuously improve model performance

  • Build and maintain PG's synthetic patient data pipeline, a critical deliverable for our research programs and a key input to our own model development lifecycle

  • Optimize our patient intake experience using NLP and multimodal data analysis to determine which questions to ask, in what order, to maximize data quality and conversion

  • Own API usage and cost optimization across PG's AI stack, including prompt engineering, model evaluation, and ongoing performance monitoring

  • Conduct ad hoc strategic analyses that inform product prioritization, support causality assessment, and generate customer-facing program insights

  • Establish MLOps infrastructure: model monitoring, drift detection, API observability, and lightweight but durable operational processes

  • Have the freedom to conduct blue-sky research initiatives aimed at creating value from our data

  • Work with Data Engineering to build a robust, scalable data foundation that supports all of the above

Who you are

We are looking for a few specific things that will help you succeed in this role:

  • 7+ years of experience in data science, machine learning engineering, or a closely related field

  • Strong Python proficiency and fluency across the core data science stack: pandas, NumPy, scikit-learn, PySpark, and SQL

  • Demonstrated end-to-end ML experience: you have taken models from problem definition through feature engineering, validation, deployment, and monitoring in a production environment

  • Experience with NLP techniques and applying language models to real-world problems

  • Comfort with prompt engineering and evaluating external AI API performance (e.g., OpenAI)

  • A track record of operating with high ownership in lean, fast-moving environments where you have had to build structure as much as execute within it

  • Strong analytical communication skills — you can translate complex model outputs and data findings into clear, actionable narratives for technical and non-technical audiences alike

Some things that are not required, but that you will learn on the job:

  • Experience with Databricks or similar lakehouse/ML platform environments

  • Familiarity with synthetic data generation techniques

  • Domain knowledge in healthcare, rare disease, genomics, or clinical research

  • Experience with MLOps tooling and building observability infrastructure from scratch

  • Exposure to biopharma or insurance analytics use cases

As with all new hires at Probably Genetic, you will also need to be:

  • A good person. We work with some of the most marginalized populations on the planet and empathy is key

  • Patient-focused and motivated to have a lasting, positive impact on humanity

  • Comfortable in a fast-paced, often ambiguous environment with rapid change

  • Action-oriented and excited to build a company from the ground up


The salary range for this role is $180,000-$230,000 annually. Actual compensation offered will depend on several factors including but not limited to: work experience, education, skill level, and/or other business and organizational needs.

What we offer at Probably Genetic:
  • An engaging and supportive team all on a mission to improve lives

  • Fair and equitable compensation with competitive early-stage equity grants

  • Generous flexible time-off policy that we actually use

  • Parental leave benefits (12 weeks for both birthing and non-birthing parents)

  • Hybrid, flexible work with high-trust and autonomy

  • A bright, inviting, pet-friendly office in Downtown SF near transit

  • A “work from anywhere” policy, up to 4 weeks a year

  • Regular team retreats in exciting destinations

  • Health Benefits including medical, dental, vision, therapy, FSA, and 401k

  • And so much more!

Probably Genetic is committed to fostering a welcoming and inclusive work environment for people of all genders, sexualities, ethnicities, socioeconomic backgrounds, and life experiences. We urge candidates of all backgrounds to apply. If you require specific accommodations as you interview or consider working with us, please let us know.


The Company
HQ: San Francisco, California
29 Employees

What We Do

Probably Genetic helps undiagnosed rare disease patients find answers to their symptoms in a matter of weeks. There are over 400 million people worldwide who have a rare disease, more than cancer and HIV patients combined. Half of those patients are currently undiagnosed and half of them are children, and it takes 5-7 years on average for these patients to get a diagnosis. Our online Symptom Checker identifies rare disease patients using state-of-the-art machine learning models, gets them tested through our direct-to-consumer genetic testing service, and helps connect these patients to potentially life-saving treatments and advocacy communities. We partner with drug developers to offer sponsored testing programs that allow patients to access genetic testing for little to no cost.
