AI Infrastructure Engineer
Listed on 2026-01-23
Engineering: AI Engineer, Systems Engineer
IT/Tech: AI Engineer, Systems Engineer
Overview
The world is transforming - and so is Intel. Intel is a company of bold and curious inventors and problem solvers who create some of the most astounding technology advancements and experiences in the world. With a legacy of relentless innovation and a commitment to bring smart, connected devices to every person on Earth, our diverse and brilliant teams are continually searching for tomorrow's technology and revel in the challenge that changing the world for the better brings.
We work every single day to design and manufacture silicon products that empower people's digital lives. Come join us and do something wonderful.
We are seeking a highly experienced Senior AI/ML Infrastructure Engineer to join our cloud systems team. This role focuses on designing, implementing, and optimizing AI accelerator systems and cloud infrastructure for large-scale machine learning workloads. The ideal candidate will have extensive experience with AI hardware platforms, system-level debugging, and cross-functional collaboration in enterprise environments.
Key Responsibilities:
AI/ML System Engineering:
- Design and optimize AI accelerator systems (Gaudi, GPU clusters) for production ML workloads.
- Debug complex PCIe, memory subsystem, and interconnect issues in AI clusters.
- Validate and integrate cutting-edge GPUs and AI accelerator platforms.
System Integration and Validation:
- Lead platform bring-up and validation for next-generation AI hardware.
- Develop comprehensive test plans for AI systems.
- Collaborate with OEM vendors on BMC firmware integration and system stability.
- Perform full-stack debugging across hardware, firmware, and software layers.
Infrastructure and Tooling:
- Develop automated testing frameworks and monitoring solutions.
- Create diagnostic tools and APIs for system health monitoring.
Leadership and Collaboration:
- Mentor junior engineers and data center technicians.
- Lead cross-functional teams through complex technical challenges.
- Coordinate with hardware, firmware, and software teams on platform readiness.
- Drive technical decisions and architectural improvements.
You must possess the minimum qualifications below to be initially considered for this position. Preferred qualifications are in addition to the minimum requirements and are considered a plus factor in identifying top candidates. The experience listed below may be obtained through a combination of your degree, research, and/or relevant previous job or internship experience.
Minimum Qualifications:
- Bachelor's degree with 6+ years of experience, Master's degree with 4+ years, or PhD with 2+ years, in Computer Science, Electrical Engineering, or a related field.
- 5+ years of experience in system engineering, platform validation, or related roles.
- 2+ years of experience successfully bringing up and debugging high-performance AI clusters.
- 2+ years of experience resolving complex system-level issues in production AI/ML environments.
- 2+ years of experience in AI cluster design, validation, and production deployment.
- 2+ years of experience with full-stack debugging, from the hardware layer through the application layer.
Preferred Qualifications:
- Experience with Intel platforms (Xeon, Gaudi) or similar GPU or AI accelerators.
- Familiarity with cloud deployment and containerization.
- Programming: Expert-level Python.
- AI/ML Frameworks: Experience with vLLM, PyTorch, TensorFlow, OpenMPI.
- System Tools: Linux/Unix administration, Docker, shell scripting.
- Hardware: Deep understanding of PCIe, memory subsystems, AI accelerators.
- Protocols: Redfish, IPMI, BMC management.
- Computer architecture and microprocessor design.
- AI/ML workload optimization and deployment.
- System-level debugging and validation methodologies.
- Enterprise platform security and manageability.
Work Model for this Role:
This role will require an on-site presence. Job posting details (such as work model, location or time type) are subject to change.
Posting Statement:
All qualified applicants will receive consideration for employment without regard to race, color, religion, religious creed, sex, national origin, ancestry, age, physical or mental disability, medical condition, genetic information, military and veteran status, marital status, pregnancy, gender, gender expression, gender identity, sexual orientation, or any other characteristic protected by local law, regulation, or ordinance.
Benefits:
We offer a total compensation package that ranks among the best in the industry. It consists of competitive pay, stock bonuses, and benefit programs which include health, retirement, and vacation. The company-wide benefits information can be found in the Intel benefits documentation.