×
Register Here to Apply for Jobs or Post Jobs. X

AI Infrastructure Engineer

Job in Morrisville, Wake County, North Carolina, 27560, USA
Listing for: Adecco
Full Time position
Listed on 2026-02-28
Job specializations:
  • IT/Tech
    AI Engineer, Systems Engineer
Salary/Wage Range or Industry Benchmark: 80000 - 100000 USD Yearly USD 80000.00 100000.00 YEAR
Job Description & How to Apply Below
Position: AI Infrastructure Engineer (788235)

Overview

Are you an experienced GPU systems engineer passionate about building and maintaining high‑performance AI infrastructure? Join our Global Fortune 500 Tech Client
, a world‑renowned global technology leader at their Morrisville, NC campus and play a critical role in supporting advanced machine learning and large‑scale AI workloads.

We’re seeking a skilled AI Infrastructure Engineer who thrives at the intersection of hardware, Linux systems, and ML operations. In this role, you’ll ensure GPU servers and AI clusters run at peak performance—supporting cutting‑edge AI research and enterprise‑scale LLM training environments.

Responsibilities
  • Monitor, maintain, and optimize high‑performance GPU servers/workstations.
  • Diagnose and resolve hardware issues (GPU faults, cooling/power issues, component failures).
  • Coordinate hardware repairs, upgrades, and replacements to maintain system uptime.
  • Software & Driver Administration:
    Install, configure, and update Linux OS (Ubuntu/CentOS), NVIDIA CUDA drivers, and supporting software; ensure compatibility between hardware, drivers, and ML frameworks.
  • Performance Benchmarking & Optimization:
    Execute and analyze MLPerf or similar benchmarking suites; identify bottlenecks and tune configurations for ML training performance; continuously monitor system performance and stability; investigate kernel errors, networking issues, and resource contention within training clusters; implement corrective actions to prevent downtime and performance degradation; manage logging, backups, firmware updates, system patching, and cluster health.
Required Qualifications
  • 3+ years managing GPU‑accelerated servers or HPC environments.
  • Hands‑on experience with NVIDIA GPU hardware (A100, H100, etc.) and CUDA toolkit/drivers
    .
  • Familiarity with ML frameworks:
    Tensor Flow
    , Py Torch , or Hugging Face
    .
  • Skilled in diagnostic tools: nvidia-smi, dmesg, top/htop, Prometheus/Grafana.
  • Solid understanding of AI infrastructure, containerization, and distributed training concepts.
  • Excellent problem‑solving abilities and proactive ownership of system reliability.
Preferred Skills
  • Experience with cluster orchestration:
    Slurm
    , Kubernetes
    , Ray
    .
  • Knowledge of server hardware diagnostics (IPMI, BIOS configs, RAID arrays).
  • Background in MLOps or Dev Ops for AI environments.
  • Certifications such as RHCE
    , NVIDIA credentials, or similar.
  • Ability to work independently in a fast‑paced, highly technical environment.
Why This Opportunity Stands Out
  • Direct contribution to large‑scale AI and LLM infrastructure.
  • Work on a state‑of‑the‑art enterprise campus with cutting‑edge GPU hardware.
  • Join an engineering team driving next‑generation machine learning innovation.

We’re very excited about this incredible opportunity to lead in the AI Workspace with a global leader in computing.

APPLY NOW! HERE!

#J-18808-Ljbffr
To View & Apply for jobs on this site that accept applications from your location or country, tap the button below to make a Search.
(If this job is in fact in your jurisdiction, then you may be using a Proxy or VPN to access this site, and to progress further, you should change your connectivity to another mobile device or PC).
 
 
 
Search for further Jobs Here:
(Try combinations for better Results! Or enter less keywords for broader Results)
Location
Increase/decrease your Search Radius (miles)

Job Posting Language
Employment Category
Education (minimum level)
Filters
Education Level
Experience Level (years)
Posted in last:
Salary