At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture.
We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond.
Together, we advance your career.
THE ROLE:
The AI/ML Frameworks team is hiring a Software Development Engineer to build and maintain scalable DevOps infrastructure that accelerates AMD’s AI software development. You will lead CI/CD pipeline design, manage Kubernetes deployments, and automate systems with Ansible and Python. This role supports GPU environments, integrates monitoring tools, and enables rapid, reliable software delivery across teams.
THE PERSON:
The ideal candidate is a skilled engineer with a strong background in DevOps, site reliability, or infrastructure engineering. They are proficient in Kubernetes, CI/CD tools, scripting (Python/Bash), and infrastructure automation frameworks such as Ansible. Experience working with GPU compute environments and integrating automated test workflows is highly valued. This person thrives in collaborative, fast-paced environments and can drive technical execution with minimal oversight.
They bring a problem-solving mindset, strong communication skills, and a passion for building reliable, scalable systems.
KEY RESPONSIBILITIES:
CI/CD Pipeline Development:
Design, implement, and manage efficient continuous integration and delivery pipelines using Buildkite, GitHub Actions, and Jenkins to enable rapid and reliable software deployment.
Kubernetes Infrastructure Management:
Deploy and maintain robust Kubernetes-based environments across both on-premises and cloud platforms to support scalable service orchestration.
Infrastructure Automation:
Automate provisioning, configuration, and management of infrastructure using Ansible, Python, and Bash to improve system consistency and reduce manual intervention.
Service Deployment with Helm:
Administer application and service deployment in Kubernetes using Helm charts for consistent and repeatable release processes.
GPU Server Support:
Configure, manage, and maintain GPU-based compute environments including lifecycle automation and hardware-level test integration.
Database and Observability Integration:
Interact with MySQL databases to support dynamic data updates and integrate data sources into Grafana dashboards for monitoring and insights.
Cross-Functional Collaboration:
Work closely with development teams, SREs, and project stakeholders to ensure system-level alignment and high-impact delivery.
Quality Assurance Enablement:
Integrate automated testing frameworks into CI pipelines to ensure code quality, stability, and performance across development cycles.
PREFERRED EXPERIENCE:
DevOps Tools & Automation:
Proficient with Buildkite, GitHub Actions, Jenkins, Ansible, and scripting languages like Python and Bash for streamlining DevOps workflows.
Containerization & Orchestration:
Strong experience with Docker, Kubernetes, and Helm for deploying and managing scalable, containerized applications.
Infrastructure as Code (IaC):
Hands-on experience automating infrastructure provisioning and configuration to ensure reproducibility and scalability across environments.
GPU-Based Compute Environments:
Familiarity with GPU server lifecycle management and integration of GPU resources into CI test workflows for performance-critical applications.
Monitoring & Observability:
Experience using tools like Checkmk, Prometheus, and Grafana to monitor infrastructure health and application performance.
Version Control & Collaboration:
Advanced knowledge of Git-based version control, including branching strategies and CI/CD integration for collaborative…