
Research Scientist, HPC Workflows

Job in Oak Ridge, Anderson County, Tennessee, 37830, USA
Listing for: Oak Ridge National Laboratory
Full Time position
Listed on 2026-03-12
Job specializations:
  • IT/Tech
    Cybersecurity, Cloud Computing
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD Yearly
Job Description & How to Apply Below

Requisition Id: 16060

Overview

Oak Ridge National Laboratory (ORNL), home to some of the world’s most powerful supercomputers, is seeking a Research Scientist in HPC Workflows to design, orchestrate, and maintain computational workflows that enable reproducible, scalable science on leadership-class systems. You will collaborate with researchers across diverse domains to translate scientific objectives into robust pipelines, automate job orchestration and data movement, and optimize end-to-end workflow performance on large-scale Linux-based HPC environments and hybrid cloud/HPC platforms.

Job Duties and Responsibilities May Include
  • Support research on HPC workflows that advances the mission of the National Center for Computational Sciences.
  • Workflow Design and Orchestration:
    Architect, implement, and maintain HPC workflows and pipelines that leverage job schedulers (e.g., SLURM, PBS) with job dependencies, arrays, and resource‑aware templates. Establish reproducible execution patterns, including environment setup, module management, data staging, and cleanup.
  • Scripting and Tooling:
    Develop command‑line tools and automation in Python, Bash, and/or C/C++ to encapsulate workflow steps, manage configuration files (e.g., YAML/JSON), and implement robust logging, error handling, and checkpoint/retry strategies.
  • Operational Reliability and Optimization:
    Diagnose job failures, mitigate bottlenecks, and improve throughput, latency, and resource utilization. Use scheduler and Linux tools (e.g., sacct, squeue, coreutils, ssh, tmux, top, iostat) to monitor, analyze, and tune workflows.
  • Automation and Version Control:
    Implement CI/CD practices for workflow deployment, create templates and reusable libraries, and manage changes with Git. Automate environment provisioning and repeatable execution across systems and users.
  • Collaboration and User Enablement:
    Consult with researchers to understand requirements, translate them into executable workflows, and provide documentation, training, and examples. Partner with operations teams to align workflows with policies and best practices.
  • Observability and Reporting:
    Build simple status dashboards or reports for workflow health and progress. Aggregate job metrics, queue statistics, and resource usage to inform planning and continuous improvement.
  • Security and Compliance:
    Apply basic cyber‑security principles (e.g., SSH key hygiene, least privilege, firewall rules) to workflow design and operations. Handle credentials and secrets responsibly.
  • Documentation and Support:
    Author clear, user‑focused documentation and contribute to playbooks, runbooks, and knowledge bases. Participate in an on‑call rotation for critical workflows as needed.
  • Cloud and Hybrid HPC Integration:
    Design and operate workflows on public cloud platforms (AWS, Azure, or GCP) and in hybrid on‑prem/cloud environments. Leverage cloud object storage (e.g., Amazon S3) for data staging and artifacts; implement parallel, secure data movement and lifecycle policies.
Basic Qualifications
  • Ph.D. in Computer Science, Computer Engineering, Computational Engineering, or a closely related field.
  • At least 2 years of experience working with Linux‑based systems, including familiarity with core utilities and common system services (e.g., coreutils, ssh, tmux).
  • At least 1 year of programming experience in one or more of Python, C/C++, or Bash.
  • Strong verbal and written communication skills, with the ability to collaborate across technical and scientific teams.
Preferred Qualifications
  • Demonstrated experience in leading scientific research and publishing in high‑impact venues.
  • Experience with HPC job schedulers such as SLURM or PBS.
  • Familiarity with basic cyber‑security principles (e.g., firewalls, network segmentation, secure configuration).
  • Basic web development skills (e.g., HTML, CSS) for lightweight dashboards or documentation.
Security, Credentialing, and Eligibility Requirements

For employment at Oak Ridge National Laboratory (ORNL), a Real ID form of identification will be required. Additionally, ORNL is subject to Department of Energy (DOE) access restrictions. All employees must also be able to obtain and maintain a federal…
