System Administrator – HPC/EDA Infrastructure
We are seeking a highly skilled System Administrator to design, administer, and optimize large-scale Red Hat Enterprise Linux (RHEL)-based High-Performance Computing (HPC) and Electronic Design Automation (EDA) infrastructure. The ideal candidate will have strong experience in Linux system administration, workload orchestration, virtualization, storage integration, performance tuning, and infrastructure automation within a regulated enterprise data center environment. This role supports engineering workloads and compute-intensive environments used for semiconductor design, simulation, and verification.
Key Responsibilities
Administer and maintain large-scale RHEL-based HPC environments
Perform system tuning for high CPU, memory, and I/O workloads
Manage OS patch lifecycle, upgrades, and security compliance
Implement system hardening aligned with CIS security baselines
Configure and manage IBM LSF or Slurm workload schedulers
Monitor cluster performance and optimize job scheduling efficiency
Implement resource allocation policies and quota management
Manage VMware clusters, including HA, DRS, and resource pools
Support CPU- and GPU-based VDI environments for engineering workloads
Maintain golden images and provisioning workflows for VDI users
EDA Infrastructure
Deploy and maintain EDA tool environments, including Cadence, Synopsys, and Siemens
Perform tool performance tuning and patch compatibility validation
Manage license servers (FlexLM / RLM) including triad redundancy configurations
Storage & Data Management
Configure and maintain NFSv4 and parallel file systems
Ensure proper file permissions, locking, and secure access to design data
Support high-throughput storage infrastructure for compute workloads
Automation & DevOps
Develop automation workflows using Ansible, shell scripting, or Python
Automate system provisioning, patching, and monitoring tasks
Support emerging container technologies (Docker / Podman) for EDA workloads
Monitor infrastructure health and perform capacity planning
Analyze system logs and resolve performance or reliability issues
Backup, Recovery & Resilience
Manage backup policies and validate recovery procedures
Conduct disaster recovery (DR) and high availability (HA) testing
User Environment & Access Management
Provision user environments and manage the access lifecycle
Support engineers using shared compute infrastructure
Maintain documentation and follow enterprise change management processes
Qualifications
Required Qualifications
5+ years of Linux system administration experience in enterprise environments
Strong expertise in Red Hat Enterprise Linux (RHEL)
Experience with HPC cluster management or large compute farms
Hands-on experience with LSF, Slurm, or similar job schedulers
Experience with VMware virtualization
Knowledge of NFS storage environments
Scripting experience (Bash, Python, or similar)
Strong troubleshooting and performance optimization skills
Required Certifications
RHCSA or RHCE
Preferred Qualifications
Experience supporting EDA environments
MCSE or equivalent Microsoft infrastructure experience
AWS certification
VMware certification (VCP or higher)
Experience with GPU computing or VDI platforms
Familiarity with container environments (Docker / Podman)
Soft Skills
Strong analytical and problem-solving skills
Excellent documentation and communication skills
Ability to work in cross-functional engineering environments
Experience working in regulated or compliance-driven environments