System Level Debug Engineer – Data Center GPU
Listed on 2026-02-14
IT/Tech
Systems Engineer, IT Support
WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next‑generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture.
We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond.
AMD’s Data Center GPU organization is transforming the industry with AI‑based graphics processors. Our primary objective is to design exceptional products that drive the evolution of computing experiences, serving as the cornerstone for enterprise data centers, AI, HPC and embedded systems. If this resonates with you, come join our Data Center GPU organization, where we are building amazing AI‑powered products with amazing people.
The Role
AMD is looking for a lead systems engineer to provide thought leadership and subject matter expertise to our growing team. As a key contributor, you will bring a strong technical background and contribute to all aspects of the software development process. The Datacenter Graphics and Accelerated Computing (DCGPU) organization is looking for an experienced system‑level debug engineer. The individual will be part of a team that brings up the platform and ensures it is fully validated, including electrical, power, networking and SOC.
The individual will lead and document the plan for validating the system itself and document unique steps to enable it, driving root‑cause closure of any issues encountered and communicating with different functional and IP layers for resolution.
You are a highly motivated hands‑on leader with a strong development background, a problem‑solving mentality, excellent communication skills, the ability to prioritize tasks and a willingness to learn and adapt. You have excellent teamwork skills and are capable of leading a highly technical team. Experience debugging complex HW/FW issues is a must; you understand the flow of a GPU through the different layers of a system and can validate items connecting to the GPU SOC (PCIe, VRs, RMs, retimers, HBM, internal networking).
Communication is essential: you will work with the different owners of the functional code stack and drive issues via phone calls, chat messages and e‑mails. Hands‑on experience with hardware in a data‑center environment is required.
- Debug and triage issues, and understand industry tools for root‑causing complex issues
- Understand GPU/System‑level HW and SW flow
- Probe parts of a board, check electrical signals and power currents, and validate a system
- Provide leadership in driving issues to root cause
- Communicate and document flows and methods of bring‑up, boot‑up, system initialization and debug
- Lead technical presentations demonstrating a good understanding of application, data and infrastructure architecture and of application systems design
- Collaborate with application and infrastructure architects and be responsible for defining, designing and delivering technical architectures and patterns, and for the technical quality, risks and operability of solutions
- Be a leader and mentor to the operations team; be hands‑on and lead by example
- Troubleshoot and solve technical issues hands‑on; own the problem and drive for resolution
- Proactively support a team culture that fosters knowledge sharing, excellence, and collaboration
- Significant experience in SoC and/or system debug of complex issues
- Develop and document debug capabilities on a given SOC and system
- Go‑to person for debugging production‑level platform validation
- Collaborate with internal teams to root‑cause issues and find optimum resolutions
- Hands‑on experience using industry debug tools, scopes and board‑level power analysis
- Proven experience with C/C++
- Demonstrable experience…