Bioinformatics Software Engineer
Listed on 2026-02-28
IT/Tech
Data Scientist, Data Engineer, Cloud Computing, Systems Engineer
Company Description
By working at Harvard University, you join a vibrant community that advances Harvard's world-changing mission in meaningful ways, inspires innovation and collaboration, and builds skills and expertise. We are dedicated to creating a diverse and welcoming environment where everyone can thrive.
Why join Harvard Medical School?
Harvard Medical School's mission is to nurture a diverse, inclusive community dedicated to alleviating suffering and improving health and well-being for all through excellence in teaching and learning, discovery and scholarship, and service and leadership.
You’ll be at the heart of biomedical discovery, education, and innovation, working alongside world-renowned faculty and a community dedicated to improving human health. This is more than a job - it’s an opportunity to shape the future of medicine.
Job Description
Job Summary:
Participate in the design of software that supports and enriches research productivity and reliability; implement software solutions. Develop software and data services with researchers to ensure that modern standards of reproducible code are kept.
Job-Specific Responsibilities:
We are looking for a highly skilled Bioinformatics Software Engineer who specializes in designing, developing, deploying, and maintaining scalable bioinformatics pipelines on cloud-based infrastructure. The candidate will be responsible for the code base supporting the large-scale genomic processing and analysis pipelines at the SMaHT Data Analysis Center, which manages multi-omic data (e.g., Illumina/PacBio/ONT Whole Genome Sequencing (WGS), RNA-Seq). The ideal candidate will have a deep understanding of next-generation sequencing (NGS) data analysis, workflow automation, cloud computing, and cloud software engineering best practices.
This role will support research and production environments where reproducibility, scalability, and performance are critical.
- Design, implement, and maintain bioinformatics pipelines for high‑throughput sequencing data (e.g., alignment, QC, variant calling from WGS and RNA‑seq) similar to those in existing repositories.
- Build reproducible, well‑tested, and automated workflows using workflow management systems (particularly CWL).
- Architect and manage AWS‑based compute infrastructure to support pipeline execution, including automated deployment, scaling, and monitoring.
- Containerize workflows using Docker or similar tools for managed execution and portability.
- Integrate CI/CD tooling to automate testing, deployment, and version control to ensure data integrity and correct execution of the pipeline.
- Develop utility tools for metadata management, file integrity checks or conversion (e.g., VCF, BAM to CRAM), and integration with the SMaHT Data Portal.
- Collaborate cross‑functionally with research scientists, engineers, and IT teams to refine requirements and deliver high‑quality solutions.
- Document code, workflows, and infrastructure configurations clearly.
Basic Qualifications:
- Minimum of five years’ post‑secondary education or relevant work experience.
Additional Qualifications and Skills:
- PhD in computational biology/bioinformatics/statistics/CS or another quantitative field is strongly preferred.
- Superb programming skills (especially in Python and shell scripting) and communication skills are strongly preferred.
- Extensive experience with analysis of high‑throughput sequencing data and knowledge of bioinformatics tools for sequence alignment, variant calling, sequence data QC, etc.
- Proficiency in Docker for creating a reproducible execution environment and Workflow Description Language for orchestrating complex tasks.
- Strong understanding of AWS services (EC2, S3) or similar cloud platforms for compute and storage.
- Version Control & CI/CD: Git, automated testing, deployment workflows.
- Experience with Linux systems, HPC, and distributed computing environments.
- Knowledge of optimizing pipelines for large‑scale genomic projects.
Appointment End Date:
This is a one‑year term position from the date of hire, with the possibility of extension, contingent upon work performance and continued funding to…