Senior HPC Engineer/Administrator; IMC
Listed on 2026-01-23
IT/Tech
Systems Engineer, IT Support, Systems Administrator, Cybersecurity
Serving Maryland and the Greater Washington D.C. area, Sage Cor Solutions (Sage Cor) is a growing company that brings complete engineering services and true full-lifecycle systems engineering services to areas requiring nationally recognized expertise in high-performance computing, large-scale data analytics, and cutting-edge information technologies.
Active TS/SCI clearance with polygraph required.
Systems Administrator responsibilities:
Provide system administration and technical support of traditional and High-Performance Computing (HPC) systems in a research-driven environment.
- Configure and manage Linux and Windows operating systems; install and load operating system software; troubleshoot, configure, and maintain the integrity of network components; and implement operating system enhancements to improve security, reliability, and performance.
- Administer, monitor, and maintain HPC systems, including compute nodes, storage, networking, and software stacks.
- Provide support to IT systems including day-to-day operations, monitoring and problem resolution for all client/server/storage/network devices, mobile devices, etc.
- Implement and maintain automation tools for system provisioning, configuration management, and monitoring.
- Provide support for implementation, troubleshooting and maintenance of IT systems.
- Manage the daily activities of configuration and operation of IT systems.
- Provide assistance to users in accessing and using IT systems.
- Optimize system operations and resource utilization, and perform system capacity analysis and planning.
- Apply in-depth experience to troubleshooting IT systems.
- Analyze and resolve complex problems associated with server hardware, applications and software integration.
- Contribute to performance benchmarking, system tuning, and capacity planning.
- Support researchers by providing technical expertise and resolving IT-related roadblocks or issues.
- Document system administration procedures and contribute to knowledge-sharing initiatives.
Qualifications:
- Experience administering Linux-based servers and HPC clusters, including job schedulers (e.g., Slurm, LSF, PBS).
- Experience configuring and managing Virtual Private Network (VPN) clients and servers.
- Scripting/programming skills (C and Python).
- Knowledge of system automation tools (e.g., Ansible).
- Knowledge of system provisioning tools (e.g., Warewolf).
- Knowledge of distributed storage systems (e.g., Lustre, BeeGFS).
- Knowledge of containerization (e.g., Docker, Apptainer).
- Knowledge of installing, maintaining and using infrastructure and performance monitoring and optimization tools (e.g., Grafana, Prometheus).
- Knowledge of setting up and executing benchmarks in an HPC environment and analyzing their results systematically.
- Active Top Secret/SCI clearance with polygraph.
- Preferably meets DoD 8140.01 or DoD 8570.01-M training and certification requirements.
Consistent with federal and state law where Sage Cor conducts business, Sage Cor Solutions provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or veteran status, or any other protected class.