Biostatistics Data Engineer
Schedule: Mon - Fri, traditional MST hours
Territory: Remote in US
Invitae is dedicated to bringing comprehensive genetic information into mainstream medicine to improve healthcare for billions of people! Our team is driven to make a difference for the patients we serve. We are leading the transformation of the genetics industry by making clinical-grade genetic information affordable and accessible to guide health decisions across all stages of life.
This position is located within the biostatistics group. This group performs statistical analyses to support submissions of new NGS cancer assays to regulatory bodies for approval. This data engineering position will support the biostatistics team by helping to organize and automate data flowing from research and development teams to biostatisticians, and will collaborate with engineers/scientists in other groups and with IT to develop and support these workflows.
What you’ll do:
- Convert existing statistical R or Python code into standardized packages and maintain those packages
- Automate biostatistics code and processes via wrapper scripts or other technologies such as Snakemake
- Test and maintain RStudio and other biostatistics-related enterprise-wide tools
- Oversee and streamline internal biostatistics code repositories/AWS buckets in collaboration with AWS administrators, biostatisticians, data analysts, software engineers, and other stakeholders
- Perform basic statistical analyses, data exploration/aggregation, and data visualization
- Perform data curation, aggregation, and importing into S3 or an RDBMS as needed
- Act as a liaison between the biostatistics group and system administrators, especially as related to AWS technologies and services
- Act as a liaison between the biostatistics group and database administrators and database designers, including collaborating on the design of efficient biostatistics-specific SQL queries, materialized views, and data tables
What you bring:
- BS with 5 years of experience, or MS with 2 years of experience, in one of the following fields: Computer Science, Bioinformatics, Statistics, Math, or Physics (or equivalent)
- Proficiency with all of the following is required:
- R
- Python
- Bash/Shell scripts
- SQL, including programmatically accessing RDBMS and file stores such as Postgres, Snowflake, and AWS S3 buckets, either directly or via APIs
- Familiarity with at least one of the following is required:
- AWS services such as EC2, S3, Athena, Glue, Lambda, and/or SageMaker, and RDBMS tools/systems such as Postgres and Snowflake, and BI tools such as Tableau
- Familiarity with Docker or similar technologies
- Familiarity with ETL and data warehousing principles as they pertain to analytics and/or BI reporting
Preferred (not required) skills:
- Familiarity with life sciences, especially genomics/biomarkers, and/or bioinformatics as applied in NGS
Please apply even if you don’t meet all of the “What you bring” requirements noted. It’s rare that someone checks every single item, and that’s ok. We encourage you to apply anyway.
Join us!
At Invitae, we value diversity and provide equal employment opportunities (EEO) to all employees and applicants without regard to race, color, religion, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We will consider for employment qualified applicants with criminal histories in a manner consistent with the requirements of the San Francisco Fair Chance Ordinance.