Data Engineer - Foundational Microscopy Data
Listed on 2026-02-21
IT/Tech
Data Scientist, AI Engineer, Machine Learning / ML Engineer, Data Engineer
TLDR:
Build the data backbone for the next era of AI-powered spatial biology.
Please include a cover letter with your application detailing your qualifications and experience for this position. Describe a deep learning project you have executed. Projects in computer vision for microscopy image analysis are especially relevant. Include a link to a code repository if possible. If you contributed to a joint project, please describe your specific contributions. Briefly discuss the project's results, limitations, and challenges you encountered.
Finally, include a link to your GitHub profile, personal website, or similar, and/or any relevant projects at the bottom of your cover letter.
AI@HHMI: HHMI is investing $500 million over the next 10 years to support AI-driven projects and to embed AI systems throughout every stage of the scientific process in labs across HHMI. The Foundational Microscopy Image Analysis (MIA) project sits at the heart of this investment, and its ambition is big: to create one of the world’s most comprehensive, multimodal 3D/4D microscopy datasets and use it to power a vision foundation model capable of accelerating discovery across the life sciences.
We're seeking a skilled Data Engineer to drive scientific innovation through robust data infrastructure. You'll build a large-scale foundational microscopy image dataset and develop scalable data processing pipelines. This includes collaborating with internal and external partners on data sharing and writing production-quality Python code to parse, validate, and transform microscopy image data from published research papers, public databases, and internal repositories.
This role requires technical excellence in data engineering and the ability to communicate clearly and proactively with collaborators who contribute multimodal microscopy data to the project. Your work will directly support computational research initiatives, including machine learning and AI applications.
Working closely with multidisciplinary teams of computational and experimental scientists, you'll help define and implement best practices in data engineering—ensuring data quality, accessibility, and reproducibility. You'll maintain detailed documentation, potentially mentor junior engineers, and automate workflows to streamline the path from raw data to scientific insight.
What we provide:
- A competitive compensation package, with comprehensive health and welfare benefits.
- A supportive team environment that promotes collaboration and knowledge sharing.
- The opportunity to engage with world‑class researchers, software engineers and AI/ML experts, contribute to impactful science, and be part of a dynamic community committed to advancing humanity’s understanding of fundamental scientific questions.
- Amenities that enhance work‑life balance such as on‑site childcare, free gyms, available on‑campus housing, social and dining spaces, and convenient shuttle bus service to Janelia from the Washington D.C. metro area.
- Opportunity to partner with frontier AI labs on scientific applications of AI.
What you'll do:
- Use AI coding agents to develop ad‑hoc APIs to mine diverse microscopy datasets from public and internal sources.
- Work with internal and external experimental labs to collect large multi‑modal microscopy image datasets.
- Collect and curate multi‑modal foundational datasets for 3D and 4D microscopy data and other modalities.
- Continuously assess quality and assure correctness of the aggregated data.
- Collaborate closely with experimental scientists and shared resources teams to develop efficient annotation and metadata workflows.
- Design and implement scalable, robust data pipelines for microscopy data using workflow managers, performing data validation and quality control at every pipeline stage through tests and clear data visualization.
- Stay up to date with scientific literature to understand data context and processing requirements.
- Document data provenance and transformation steps comprehensively.
- Apply statistical tools and programming languages (e.g., Python, R) to analyze large datasets, develop custom functions, and extract actionable insights through effective…
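To make the "validation at every pipeline stage" responsibility concrete, here is a minimal Python sketch of that pattern. All names (`ImageRecord`, `validate`, `normalize`, `pipeline`) are illustrative assumptions, not part of the project's actual codebase; the idea is simply that each stage checks its inputs and outputs so errors surface where they are introduced.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ImageRecord:
    """Hypothetical stand-in for a microscopy image plus minimal metadata."""
    data: np.ndarray          # e.g., a 3D (z, y, x) intensity stack
    voxel_size_um: tuple      # physical voxel size per axis, in micrometers
    modality: str             # e.g., "confocal", "EM"

def validate(record: ImageRecord) -> ImageRecord:
    """Fail fast on malformed inputs before any transformation runs."""
    assert record.data.ndim == 3, "expected a 3D (z, y, x) stack"
    assert len(record.voxel_size_um) == 3, "need one voxel size per axis"
    assert np.isfinite(record.data).all(), "NaN/Inf voxels found"
    return record

def normalize(record: ImageRecord) -> ImageRecord:
    """Rescale intensities to [0, 1] so downstream stages see uniform ranges."""
    lo, hi = float(record.data.min()), float(record.data.max())
    if hi > lo:
        scaled = (record.data - lo) / (hi - lo)
    else:
        scaled = np.zeros_like(record.data, dtype=float)  # constant image
    return ImageRecord(scaled, record.voxel_size_um, record.modality)

def pipeline(record: ImageRecord) -> ImageRecord:
    # Re-validate after each transformation so a bug in one stage is
    # caught at that stage rather than far downstream.
    return validate(normalize(validate(record)))

rec = ImageRecord(np.random.rand(4, 8, 8).astype(np.float32),
                  (0.5, 0.1, 0.1), "confocal")
out = pipeline(rec)
assert 0.0 <= out.data.min() and out.data.max() <= 1.0
```

In a real deployment these checks would live inside a workflow manager (e.g., Snakemake or Nextflow) as per-stage rules, but the fail-fast structure is the same.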