HPC Sr. Scientific Software Engineer; IT@JH Research Computing

Job in Baltimore, Maryland, 21276, USA
Listing for: Johns Hopkins University
Full Time position
Listed on 2026-02-07
Job specializations:
  • IT/Tech
    AI Engineer, Cloud Computing
Job Description & How to Apply Below
Position: HPC Sr. Scientific Software Engineer (IT@JH Research Computing)

Specific Duties & Responsibilities

  • Software Deployment and Design
    • Develop and refine deployment strategies for scientific software on HPC and AI systems.
    • Design computational workflows, select optimal software configurations, and use tools like Ansible for automation.
    • Assist teams in implementing, tuning, and optimizing AI models and gateway applications (e.g., XDMoD, Coldfront, Open OnDemand, CryoSPARC Live, SBGrid, AI Agents).
  • Performance Optimization
    • Analyze and optimize the performance of AI models and HPC applications, focusing on GPU-enabled computing.
    • Implement parallel processing, distributed computing, and resource management techniques for efficient job execution.
  • Integration and Optimization
    • Develop, debug, and maintain software tools, libraries, and frameworks supporting HPC and AI workloads.
    • Collaborate with the system team and software vendors (e.g., NVIDIA, Intel, MathWorks) to optimize systems for maximum performance.
    • Utilize CUDA, cuDNN, TensorRT, and Intel compilers to enhance system performance.
  • HPC Scientific Software Support
    • Manage and support scientific software deployment across HPC, cloud-based, and colocation facilities.
    • Oversee installation, configuration, and maintenance of HPC packages with tools like CMake, Make, EasyBuild, Spack, and Lua module files.
  • Collaboration and Mentorship
    • Work closely with cross-functional teams, including researchers, data scientists, and software developers, to address complex HPC/AI challenges.
    • Mentor junior engineers and foster a culture of continuous learning.
  • Technical Support, Training Workshops, and Troubleshooting
    • Resolve complex technical issues and perform root cause analysis for HPC/AI software challenges.
    • Implement effective solutions to prevent recurrence and improve system reliability.
    • Provide training workshops for researchers and students, focusing on troubleshooting, optimizing workflows, and effectively using HPC systems.
  • Learning and Development
    • Stay current with advances in HPC and AI technologies and methodologies.
    • Incorporate new research findings into existing systems to improve performance and capabilities.
  • Container Orchestration
    • Develop and manage container orchestration strategies to ensure scalability, reliability, and security of applications.
    • Oversee the container lifecycle from creation and deployment to scaling and removal.
  • Documentation and Compliance
    • Create comprehensive documentation for system designs, performance metrics, and project status.
    • Ensure compliance with security and regulatory standards for all HPC and AI systems.
  • In Addition to the Duties Described Above
    • Design, deploy, and maintain large-scale Linux HPC clusters with CPU/GPU resources, high-speed networks, and distributed storage.
    • Develop and maintain automation frameworks for provisioning, monitoring, and software lifecycle management.
    • Implement and optimize job scheduling, container orchestration, and workflow automation tools to support diverse research workloads.
    • Collaborate with faculty and research teams to parallelize, containerize, and scale computational workflows for multi-GPU and distributed environments.
    • Benchmark and tune application performance across architectures, documenting findings and sharing best practices.
    • Integrate and support AI/ML frameworks, scientific libraries, and workflow engines (Snakemake, Nextflow, Dask, Ray).
    • Ensure system and application reliability through proactive monitoring (Prometheus, Grafana, ELK) and incident response participation.
    • Support reproducibility and FAIR data principles through version-controlled, containerized environments.
    • Contribute to documentation, training materials, and technical guidance to enhance user experience and self-service capabilities.
    • Participate in evaluation and adoption of new technologies to advance performance, efficiency, and sustainability in research computing.
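The parallel-processing and job-execution duties above can be illustrated with a minimal sketch using only the Python standard library. The workload here (summing squares over chunks of data) is a hypothetical stand-in for a real scientific kernel; the function names and chunking scheme are illustrative, not taken from the posting.

```python
# Minimal sketch of CPU-parallel execution with Python's standard
# library. "kernel" is a hypothetical compute step standing in for a
# real scientific workload.
from multiprocessing import Pool

def kernel(chunk):
    """Hypothetical compute kernel applied to one chunk of work."""
    return sum(x * x for x in chunk)

def run_parallel(data, workers=4, chunks=8):
    """Split data into chunks, map them across a worker pool, reduce."""
    size = max(1, len(data) // chunks)
    parts = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(processes=workers) as pool:
        return sum(pool.map(kernel, parts))

if __name__ == "__main__":
    # Parallel result matches the serial sum of squares.
    print(run_parallel(list(range(1000))))
```

In production HPC settings this pattern is usually delegated to a scheduler (e.g., Slurm array jobs) or a distributed framework such as Dask or Ray, as named in the duties above; the sketch only shows the chunk-map-reduce shape those tools generalize.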
Minimum Qualifications
  • PhD in a quantitative discipline.
  • Five years of experience in HPC user support, software deployment, and performance optimization within an academic or research environment.
  • Additional education may substitute for required experience and additional related experience may substitute for required education beyond a…