System Level Debug Engineer – Data Center GPU
Listed on 2026-03-01
IT/Tech
Systems Engineer, IT Support
WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next‑generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture.
We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.
AMD’s Data Center GPU organization is transforming the industry with our AI-based graphics processors. Our primary objective is to design exceptional products that drive the evolution of computing experiences, serving as the cornerstone for enterprise data centers, artificial intelligence (AI), HPC, and embedded systems. If this resonates with you, come and join our Data Center GPU organization, where we are building amazing AI-powered products with amazing people.
THE ROLE: AMD is looking for a lead systems engineer to provide thought leadership and subject matter expertise to our growing team. As a key contributor, you will bring a strong technical background and contribute to all aspects of the software development process. We offer competitive benefits packages and an award-winning culture. Join us!
The Datacenter Graphics and Accelerated Computing (DCGPU) organization is looking for an experienced system-level debug engineer. The engineer will be part of a team that brings up and validates the platform, ensuring it is fully validated across electrical, power, networking, and SoC domains. The engineer will be required to lead and document the plan for validating the system itself, as well as document the unique steps required to enable it.
The engineer will need to drive any issues encountered to root-cause closure and communicate with the different functional and IP layers to resolve them.
You are a highly motivated, hands-on leader with a strong development background, a problem-solving mentality, excellent communication skills, and the ability to prioritize tasks, along with a willingness to learn and adapt. You have excellent teamwork skills and are capable of leading a highly technical team.
Experience debugging complex HW/FW issues is a must. You should understand the flow of a GPU through the different layers of a system and be able to validate the components connecting to the GPU SoC (PCIe, VRs, RMs, retimers, HBM, internal networking). Strong communication is essential when working with the different owners of the functional code stack, as is the ability to drive issues to resolution via phone calls, chat messages, and e-mails.
Hands-on experience with hardware in a data center environment is required.
- Debug/triage engineering experience and an understanding of industry tools for root-causing complex issues
- Understanding of GPU/System level HW and SW flow
- Ability to probe parts of a board, check electrical signals and power currents, and validate a system
- Provide leadership in driving issues to root cause
- Communicate and document the flows and methods for bring-up, boot-up, system initialization, and debug
- Lead technical presentations demonstrating a solid understanding of applications, data, infrastructure, and architecture, as well as application systems design
- Collaborate with application and infrastructure architects, and be responsible for defining, designing, and delivering technical architectures and patterns, including their technical quality, risks, fitness for purpose, and operability
- Be a leader and mentor to the operations team; be hands-on and lead by example
- Troubleshoot and solve technical issues hands-on; own the problem and drive it to resolution
- Proactively support a team culture that fosters knowledge sharing, excellence, and collaboration
- Significant experience in SoC and/or System debug of complex issues
- Develop / Document debug capabilities on a given SOC and…