Software Development Engineer – Software DevOps & Continuous Integration Team

Job in Markham, Ontario, Canada
Listing for: AMD
Full Time position
Listed on 2026-02-28
Job specializations:
  • IT/Tech
    Cloud Computing, Data Engineer, AI Engineer, Systems Engineer
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 CAD yearly
Job Description & How to Apply Below

Overview

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems.

Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary.

When you join AMD, you’ll discover the real differentiator is our culture: we push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative and inclusive of diverse perspectives.

The Role

The AI/ML Frameworks team is hiring an MTS Software Development Engineer to build and maintain scalable DevOps infrastructure that accelerates AMD’s AI software development.

Key responsibilities include designing and owning CI/CD pipelines, managing Kubernetes‑based GPU environments, and automating systems using Python, Go, and Ansible.

Key Responsibilities
  • Build System Expertise & Issue Triaging:
    Develop deep expertise in build tools and flows (CMake, Bazel, Make, compiler toolchains). Triage complex build failures by understanding the full build pipeline, from source to binary. Identify root causes across infrastructure, toolchain, and code-level issues.
  • Team Training & Knowledge Sharing:
    Train and mentor team members on build systems, CI/CD workflows, and debugging techniques. Create documentation, runbooks, and training sessions to enable independent triage.
  • ML Framework Integration & Code Contribution:
    Understand the architecture and codebase of ML frameworks (PyTorch, TensorFlow, ROCm stack). Review, debug, and contribute code changes to resolve build issues, improve CI reliability, and support new features.
  • Tooling & Automation Development:
    Design and develop internal tools, automation scripts, and services primarily in Python and Go. Write well‑tested, production‑grade code to solve infrastructure and workflow challenges.
  • CI/CD Pipeline Development:
    Design, implement, and manage efficient continuous integration and delivery pipelines using Buildkite, GitHub Actions, and Jenkins to enable rapid and reliable software deployment for ML workloads.
  • Kubernetes Infrastructure Management:
    Deploy and maintain robust Kubernetes‑based environments across on‑premise and cloud platforms to support scalable service orchestration.
  • Infrastructure Automation:
    Automate provisioning, configuration, and management of infrastructure using Ansible, Python, and Bash to improve system consistency and reduce manual intervention.
  • Service Deployment with Helm:
    Administer application and service deployment in Kubernetes using Helm charts for consistent and repeatable release processes.
  • GPU Server Support:
    Configure, manage, and maintain GPU‑based compute environments including lifecycle automation and hardware‑level test integration for ML training and inference workloads.
  • Database and Observability Integration:
    Interact with MySQL databases to support dynamic data updates and integrate data sources into Grafana dashboards for monitoring and insights.
  • Cross‑Functional Collaboration:
    Work closely with ML framework developers, SREs, and project stakeholders to ensure system‑level alignment and high‑impact delivery.
  • Quality Assurance Enablement:
    Integrate automated testing frameworks into CI pipelines to ensure code quality, stability, and performance across development cycles.
Preferred Experience
  • Build Systems & Toolchains:
    Strong understanding of CMake, Bazel, Make, and compiler toolchains (GCC, Clang, LLVM). Ability to debug complex build failures and optimize build performance.
  • Programming Languages:
    Strong proficiency in Python and Go for building tools, services, and automation. Ability to read and modify C++ code is a plus.
  • ML Framework Familiarity:
    Understanding of ML framework architecture (PyTorch, TensorFlow, JAX, or similar). Ability to navigate large codebases and contribute fixes or improvements.
  • Mentorship & Training:
    Ability to document complex systems, train team members, and break down technical concepts.
  • DevOps Tools & Automation:
    Proficient with Buildkite, GitHub Actions, Jenkins, Ansible, and scripting.
  • Containerization & Orchestration:
    Ex…