Software Engineer, Science Job, Redwood City area, California USA, IT/Tech

Position: Staff Software Engineer, Science

Biohub is leading the new era of AI-powered biology to cure or prevent disease through its 501(c)(3) medical research organization, with the support of the Chan Zuckerberg Initiative.

The Team

Biohub supports the science and technology that will make it possible to help scientists cure, prevent, or manage all diseases by the end of this century. While this may seem like an audacious goal, in the last 100 years, biomedical science has made tremendous strides in understanding biological systems, advancing human health, and treating disease.

Achieving our mission will only be possible if scientists are able to better understand human biology. To that end, we have identified four grand challenges that will unlock the mysteries of the cell and how cells interact within systems — paving the way for new discoveries that will change medicine in the decades that follow:

Building an AI-based virtual cell model to predict and understand cellular behavior
Developing novel imaging technologies to map, measure and model complex biological systems
Creating new tools for sensing and directly measuring inflammation within tissues in real time to better understand inflammation, a key driver of many diseases
Harnessing the immune system for early detection, prevention, and treatment of disease

The Opportunity

The Data Pipelines team processes scientific datasets specifically designed to enable biological modeling and support AI research. It is responsible for data ETL, validation, testing, and storage, and partners with the data management team on retrieval. We handle over 89 million unique cells' worth of single-cell transcriptomic data and over 15 thousand cryoET tomograms in imaging datasets as large as 20TB and counting, and we will be expanding to support larger scale and additional imaging, sequencing, and literature modalities. Our resources provide access to open source data that is structured and used by tens of thousands of scientists each month to quickly query and form hypotheses on how genetic variants in cells impact disease risk, define drug toxicities, and eventually discover better therapies.

As a software engineer on the Data Engineering team, you will contribute to the architecture and help implement the data needs described above for our platforms, CELLxGENE Discover and CryoET, as well as the new platform we are building with a focus on data for AI and the virtual cell. The goal is to enable scientists to further interrogate our very large and growing corpus of data without needing to download the data or have any computational expertise.

You will work on a collaborative, multidisciplinary team to develop solutions that help our scientist users speed up their workflows and accelerate the pace of scientific discovery.

No prior biology experience is needed for this role. You will have the opportunity to pair with Computational Biologists to develop solutions for our users and be able to learn about biology from experts on our team.

Our tech stack:
Python, Terraform, AWS infrastructure, Argo CD and Workflows, TileDB.

What You’ll Do

Own, maintain and continuously improve upon the data pipeline architecture.
Design, build, and maintain robust, scalable data pipelines for ingesting, processing, and storing large volumes of structured and unstructured data.
Develop and optimize ETL processes, ensuring data quality, validation, and consistency across diverse sources.
Implement and manage data storage solutions, including data warehouses, data lakes, and distributed databases, ensuring they are secure and performant enough to handle massive volumes of single-cell transcriptomics data and imaging data.
Monitor and troubleshoot data pipelines, build proactive exception handling, and ensure high reliability and uptime of production systems.
Document processes, maintain data models, and support data governance, lineage, and compliance initiatives.
Utilize modern tools and technologies, such as Argo Workflows, Kubernetes, AWS, Docker, and CI/CD pipelines.
Actively contribute to team problem-solving, project planning, and process improvements with a mindset for innovation and social impact.
Create user-friendly…




