Principal Software Engineer - AI Platform Job, Toronto area, Ontario, Canada, IT/Tech

Overview
Your work days are brighter here. We’re obsessed with making hard work pay off, for our people, our customers, and the world around us. As a Fortune 500 company and a leading AI platform for managing people, money, and agents, we’re shaping the future of work so teams can reach their potential and focus on what matters most. The minute you join, you’ll feel it.

Not just in the products we build, but in how we show up for each other. Our culture is rooted in integrity, empathy, and shared enthusiasm. We’re in this together, tackling big challenges with bold ideas and genuine care. We look for curious minds and courageous collaborators who bring sun-drenched optimism and drive. Whether you're building smarter solutions, supporting customers, or creating a space where everyone belongs, you’ll do meaningful work with Workmates who’ve got your back.

In return, we’ll give you the trust to take risks, the tools to grow, the skills to develop and the support of a company invested in you for the long haul. So, if you want to inspire a brighter work day for everyone, including yourself, you’ve found a match in Workday, and we hope to be a match for you too.
About the Team
The Workday AI Infrastructure and Operations team is seeking an energetic and determined Software Engineer to design, implement, and deliver highly scalable features for our AI Platform. As a member of this fast-paced group, you will have a unique and rewarding opportunity to shape and contribute to the microservices that power Workday AI features in production. You will partner with Data Scientists, AI Engineers, and other Software Engineers to create the technology that brings these features to life.
About the Role
In this Principal-level role, you will lead high-impact infrastructure initiatives. You will work with public cloud infrastructure (IaaS providers such as AWS and GCP) and apply capacity management principles to build systems that allow developers to streamline their interactions with the AI platform.

Key Responsibilities
Cluster Consolidation: Architect and implement the consolidation of applications from multiple EKS clusters to optimize resource density and operational efficiency.
Global CD Architecture: Design a shared continuous deployment and configuration management system capable of operating across two distinct cluster types in multiple global regions.
ArgoCD Rollout: Lead the strategic rollout of ArgoCD across a massive fleet of clusters, ensuring compliance with strict security standards, including those of FedRAMP environments.
Infrastructure as Code: Own and develop features end to end, including infrastructure as code, with a specific focus on increasing the use of Terraform for automation.
Container Orchestration: Deploy and orchestrate containers in production environments, leveraging Kubernetes and Service Mesh to enhance developer scalability.
Distributed Systems: Oversee the implementation and operation of distributed systems, including the conception, specification, design, and documentation involved in creating and maintaining frameworks.

Collaboration: Actively engage with Tech Leads and AI Engineers across teams to elaborate on requirements and drive technical solutions.
About You
We need creative and dedicated Software Engineers, like you, who really want to move the needle. By nature, you are inquisitive and ready to question the status quo. You have a passion for exploring and implementing innovative techniques and approaches to solve sophisticated and ambitious problems. Most importantly, you are an outstanding collaborator and teammate who brings out the very best in everyone.
Basic Qualifications
DevOps & Infrastructure Experience: 10+ years of total software engineering or DevOps experience, with at least 8 years focused specifically on infrastructure automation, Site Reliability Engineering (SRE), or release engineering in a Linux environment.
Kubernetes at Scale: 5+ years of hands-on experience managing Kubernetes in production. Must have experience managing multi-cluster environments (fleets of 10+ clusters or 500+ nodes).
Infrastructure as Code (IaC) Mastery: 5+ years of experience using Terraform to manage…

