HPC Production Engineer
Listed on 2026-01-12
Senior Technology & Quant Finance Associate at NJF Global Holdings
We’re partnered with a leading global trading firm looking for a highly specialized HPC Production Engineer to support their large-scale, mission-critical compute and storage environments. This role is ideal for engineers who thrive on deep technical challenges, working with high-performance systems that power complex workloads.
This is not a generalist IT role—we’re looking for someone who can dig into HPC internals, troubleshoot complex production issues, and optimize systems across compute, storage, and network layers.
Responsibilities
- Design, implement, and maintain large-scale HPC compute and storage infrastructure.
- Monitor and optimize system performance, diagnose complex production incidents, and perform root cause analysis.
- Build and maintain tooling for software deployment, OS upgrades, and cluster provisioning.
- Implement and maintain performance and fault monitoring systems.
- Collaborate with researchers and engineers to analyze workloads and optimize HPC performance.
- Provide operational support on a rotating on-call schedule.
Requirements
- 5+ years of hands‑on HPC production experience, including parallel file systems (Lustre, GPFS, or similar) and batch schedulers (Slurm, Grid Engine, or equivalent).
- Strong Linux systems administration skills, including kernel, memory, I/O, and process‑level debugging.
- Programming/scripting proficiency in Go, Python, C/C++, or similar.
- Experience designing, building, and operating complex distributed systems.
- Familiarity with configuration management and automation tools (Ansible, SaltStack, Puppet, etc.).
- Comfortable working across compute, storage, and network layers.
- Hands‑on, collaborative, and capable of operating high‑performance infrastructure at scale.
This is an excellent opportunity for HPC engineers who enjoy working at the forefront of large‑scale computing environments, solving challenging technical problems, and supporting mission‑critical systems.
Seniority level: Mid‑Senior level
Employment type: Full‑time