Data Engineer - Foundational Microscopy Data

Posted 5 Days Ago
Campus, IL
In-Office
86K-183K Annually
Mid level
Biotech
The Role
Build scalable pipelines and infrastructure to collect, validate, curate, and publish large multimodal 3D/4D microscopy datasets; collaborate with labs and engineers, document provenance, apply statistical analysis and visualization, and support ML research and open-data access.
Summary Generated by Built In
Primary Work Address: 19700 Helix Drive, Ashburn, VA, 20147

TLDR: Build the data backbone for the next era of AI-powered spatial biology.

Please include a cover letter with your application detailing your qualifications and experience for this position. Describe a deep learning project you have executed. Projects in computer vision for microscopy image analysis are especially relevant. Include a link to a code repository if possible. If you contributed to a joint project, please describe your specific contributions. Briefly discuss the project's results, limitations, and challenges you encountered. Finally, include links to your GitHub profile, personal website, or similar, and to any relevant projects, at the bottom of your cover letter.

About the Role:

AI@HHMI: HHMI is investing $500 million over the next 10 years to support AI-driven projects and to embed AI systems throughout every stage of the scientific process in labs across HHMI. The Foundational Microscopy Image Analysis (MIA) project sits at the heart of AI@HHMI. Our ambition is big: to create one of the world’s most comprehensive, multimodal 3D/4D microscopy datasets and use it to power a vision foundation model capable of accelerating discovery across the life sciences.

We're seeking a skilled Data Engineer to drive scientific innovation through robust data infrastructure. You'll build a large-scale foundational microscopy image dataset and develop scalable data processing pipelines. This includes collaborating with internal and external partners on data sharing and writing production-quality Python code to parse, validate, and transform microscopy image data from published research papers, public databases, and internal repositories.

This role requires technical excellence in data engineering and the ability to communicate clearly and proactively with collaborators who contribute multimodal microscopy data to the project. Your work will directly support computational research initiatives, including machine learning and AI applications.

Working closely with multidisciplinary teams of computational and experimental scientists, you'll help define and implement best practices in data engineering—ensuring data quality, accessibility, and reproducibility. You'll maintain detailed documentation, potentially mentor junior engineers, and automate workflows to streamline the path from raw data to scientific insight.

What we provide:

  • A competitive compensation package, with comprehensive health and welfare benefits.

  • A supportive team environment that promotes collaboration and knowledge sharing.

  • The opportunity to engage with world-class researchers, software engineers and AI/ML experts, contribute to impactful science, and be part of a dynamic community committed to advancing humanity’s understanding of fundamental scientific questions.

  • Amenities that enhance work-life balance, such as on-site childcare, free gyms, available on-campus housing, social and dining spaces, and convenient shuttle bus service to Janelia from the Washington, D.C., metro area.

  • Opportunity to partner with frontier AI labs on scientific applications of AI (see https://www.anthropic.com/news/anthropic-partners-with-allen-institute-and-howard-hughes-medical-institute).

What you’ll do:

  • Use AI coding agents to develop ad-hoc APIs to mine diverse microscopy datasets from public and internal sources.

  • Work with internal and external experimental labs to collect large multi-modal microscopy image datasets.

  • Collect and curate multi-modal foundational datasets for 3D and 4D microscopy data and other modalities.

  • Continuously assess the quality and assure the correctness of the aggregated data.

  • Collaborate closely with experimental scientists and shared resources teams to develop efficient annotation and metadata workflows.

  • Design and implement scalable, robust data pipelines for microscopy data using workflow managers that perform data validation and quality control at every pipeline stage through tests and clear data visualization.

  • Stay up to date with scientific literature to understand data context and processing requirements.

  • Document data provenance and transformation steps comprehensively.

  • Apply statistical tools and programming languages (e.g., Python, R) to analyze large datasets, develop custom functions, and extract actionable insights through effective visualization.

  • Establish and maintain data standards, formats, workflows, and documentation to ensure data quality, accessibility, and reproducibility across projects.

  • Make the foundational microscopy dataset accessible to collaborators and the public as open-data and open-source services, and act as a point of contact for engineers and researchers who would like to use the dataset.

  • Collaborate with interdisciplinary teams, potentially mentor junior engineers, and direct or assist in directing the work of others to meet project goals while advising stakeholders on data strategies and best practices.

What you bring:

  • Bachelor’s degree in Computer Science, Data Science, Statistics, Applied Mathematics, or a related field with 3+ years of experience applying and customizing data mining and data analysis methods and techniques. An equivalent combination of education and relevant experience will be considered.

  • Experience with data formats such as Zarr, Parquet, and HDF5 and with efficient I/O (e.g., WebDataset).

  • Experience with volumetric 3D/4D microscopy data analysis tools.

  • Experience with high-performance computing environments (cloud-based and Slurm/LSF clusters).

  • Clear, proactive, and efficient communication style to manage multiple needs and stakeholders involved in the creation of our foundational microscopy dataset.

  • Excellent technical documentation and communication skills.

  • Expertise in utilizing data visualization libraries and software (e.g., Matplotlib, R, Jupyter notebooks).

  • Detail-oriented, creative, and organized team player with strong communication skills and a collaborative mindset.

  • Able to effectively manage time, prioritize tasks, and clearly convey complex data concepts to technical and non-technical audiences.

Physical Requirements:

Remaining in a normal seated or standing position for extended periods of time; reaching and grasping by extending hand(s) or arm(s); dexterity to manipulate objects with fingers, for example using a keyboard; communication skills using the spoken word; ability to see and hear within normal parameters; ability to move about workspace. The position requires mobility, including the ability to move materials weighing up to several pounds (such as a laptop computer or tablet).

Persons with disabilities may be able to perform the essential duties of this position with reasonable accommodation. Requests for reasonable accommodation will be evaluated on an individual basis.

Please Note:

This job description sets forth the job’s principal duties, responsibilities, and requirements; it should not be construed as an exhaustive statement, however.  Unless they begin with the word “may,” the Essential Duties and Responsibilities described above are “essential functions” of the job, as defined by the Americans with Disabilities Act.

Compensation Range

Data Engineer I: $86,181.60 (minimum) - $107,727.00 (midpoint) - $140,045.10 (maximum)

Data Engineer II: $98,039.20 (minimum) - $122,549.00 (midpoint) - $159,313.70 (maximum)

Data Engineer III: $112,629.60 (minimum) - $140,787.00 (midpoint) - $183,023.10 (maximum)

Pay Type: Salary

HHMI’s salary structure is developed based on relevant job market data. HHMI considers a candidate's education, previous experiences, knowledge, skills and abilities, as well as internal consistency when making job offers. Typically, a new hire for this position in this location is compensated between the minimum and the midpoint of the salary range.

Compensation and Benefits

Our employees are compensated from a total rewards perspective in many ways for their contributions to our mission, including competitive pay, exceptional health benefits, retirement plans, time off, and a range of recognition and wellness programs. Visit our Benefits at HHMI site to learn more. 

HHMI is an Equal Opportunity Employer

We use E-Verify to confirm the identity and employment eligibility of all new hires.

Top Skills

AI Coding Agents
Cloud HPC
HDF5
Jupyter
LSF
Matplotlib
Parquet
Python
R
Slurm
WebDataset
Workflow Managers
Zarr

The Company
HQ: Chevy Chase, MD
1,800 Employees
Year Founded: 1953

What We Do

For 60 years, HHMI has been moving science forward. We’re an independent, ever-evolving philanthropy that supports basic biomedical scientists and science educators with the potential for transformative impact. We invest in people, not projects. We encourage collaborative and results-driven working styles and offer an adaptable environment where employees can function at their highest level. As HHMI scientists continue to push boundaries in laboratories and classrooms, you can be sure that your contributions while working at HHMI are making a difference.

To move science forward, we need experts in areas such as communications, finance, human resources, information technology, investments, and law as well as scientists.

Visit our website at http://www.hhmi.org
