DevOps Software Developer; KBase Team
Listed on 2026-01-03
IT/Tech
Cloud Computing, Systems Engineer
Berkeley Lab’s (LBNL) Environmental Genomics and Systems Biology (EGSB) Division is looking for a DevOps Software Engineer to join the US Department of Energy’s (DOE) Systems Biology Knowledgebase (KBase) team!
In this exciting role, you will contribute directly to an open-source platform that is transforming how biologists and data scientists collaborate, share data, and accelerate discovery.
KBase integrates massive biological datasets and powerful computational tools into a unified, extensible system that supports transparent, reproducible science. This position will contribute to the core infrastructure of the system, supporting the operation and evolution of an advanced platform that integrates cloud-native software, on-premise hardware, and high-performance computing hosted in National Lab data centers. This role is responsible for proactively identifying and resolving complex issues to ensure the platform's stability, performance, and scalability.
This position has an anticipated start date of February 2, 2026.
We’re here for the same mission: to bring science solutions to the world. Join our team and YOU will play a supporting role in our goal to address global challenges! Have a high level of impact and work for an organization associated with 17 Nobel Prizes! We invest in our employees by offering a total rewards package you can count on:
- Exceptional health and retirement benefits, including pension or 401K-style plans
- A culture where you’ll belong - we are invested in our teams!
- In addition to accruing vacation and sick time, we also have an annual Winter Holiday Shutdown
- Parental bonding leave (for both mothers and fathers)
In this role, you will:
- Develop and implement automation to deploy, configure, and support on-premise compute resources and services (e.g., databases, microservices, LLMs, monitoring systems, object storage such as MinIO, and High-Performance Computing (HPC)).
- Design, implement, and support robust monitoring, alerting, and logging solutions for infrastructure and platform services.
- Ensure the security, reliability, and performance of KBase's on-premise hardware and software stack by documenting, hardening, and continuously improving its security posture in adherence with National Lab and DOE security standards.
- Develop and maintain comprehensive documentation for infrastructure designs, configurations, and operational procedures.
- Implement DevSecOps pipelines, best practices, and security scanning (SCA/SAST) for infrastructure and software components.
Required qualifications:
- A Bachelor’s Degree (or equivalent knowledge/training) in Computer Science, Engineering, or a related field and a minimum of 5 years of relevant experience as a Software Infrastructure Engineer, DevOps Engineer, Site Reliability Engineer (SRE), or similar role, or an equivalent combination of education and experience.
- Experience with infrastructure as code (IaC) tools (e.g., Terraform, Ansible), containerization technologies (e.g., Docker), and container orchestration platforms (e.g., Kubernetes).
- Experience with containerization (Docker) and Kubernetes orchestration, including Helm, operators, and resource management for data-intensive workloads.
- Experience with version control systems (e.g., Git), CI/CD pipelines, monitoring, and observability tools (e.g., Prometheus, Grafana, ELK stack or similar).
- Experience with the deployment and management of relational and/or NoSQL databases.
- Expert-level knowledge of Linux operating systems and system administration, and proficiency in scripting languages (e.g., Python, Bash, Go).
- Proficiency in Python, with the ability to write modular, production‑ready software and integrate it into cloud‑native workflows.
- Demonstrated understanding of core DevOps and software engineering principles for on-premise distributed systems, microservices, and HPC architectures.
- Familiarity with object storage systems such as MinIO or AWS S3 and understanding of data lifecycle management in distributed storage.
- Familiarity with Apache Spark (PySpark, Spark SQL, or Structured Streaming) and distributed data processing frameworks.
- Excellent oral and written communication skills including…