With expertise across a broad range of sectors and services, our consultants serve clients worldwide. Our expertise delivers results. Our optimism transforms outcomes.
Job Description
Sia Partners is looking for a Platform Engineering Manager to support the design and delivery of next-generation AI and Generative AI platforms within Sia's AI Factory. This role is pivotal in bridging high-level product vision with robust, cloud-native engineering execution.
As a Platform Engineering Manager, you will be responsible for building and evolving internal development and MLOps platforms that improve automation, scalability, reliability, and developer productivity. You will operate as a player-coach, combining hands-on technical leadership with people management and strategic ownership.
This is a product-focused role, working closely with product managers, data scientists, ML engineers, application engineers, and security teams to deliver platform capabilities that directly support AI, data, and software workloads.
Key Responsibilities
Leadership of Platform / DevOps / SRE engineering teams, ensuring delivery excellence and a strong engineering culture
Ownership of internal platform products, working closely with product, application, and data engineering teams to deliver scalable, reliable, and secure platform capabilities.
Support to Data Scientists, ML Engineers, Data Engineers, and Software Engineers by providing reliable, secure, scalable, and easy-to-use platform services
Definition and execution of the platform engineering strategy and roadmap, aligned with business and delivery objectives
Development and operation of internal developer platforms enabling automation, self-service, and scalability
Cloud services: architecture and operations across AWS, Azure, and GCP, including compute, storage, networking, access management, cost monitoring, and cost optimization
Infrastructure as Code: design, standardization, and governance using Terraform
Containers: containerization and orchestration of applications using Docker and Kubernetes, including Kubernetes platform ownership
CI/CD: definition and standardization of continuous integration and deployment pipelines
Observability & reliability: monitoring, logging, alerting, and application of SRE principles to ensure availability, performance, and resilience
Contribution to technological, architectural, and governance decisions to address the challenges of scaling AI and data platforms
Collaboration with product, application, data, and security teams to gather requirements and deliver platform capabilities
Planning and management of platform initiatives, including timelines, resourcing, and budget oversight
Mentoring engineers and fostering knowledge sharing and continuous improvement.
Qualifications
Experience:
8+ years of experience in the data/software space, with at least 3 years in a formal people management or technical leadership role leading Data Science or ML teams.
Strong hands-on experience with cloud platforms (AWS, Azure, and/or GCP)
Strong expertise in Infrastructure as Code, preferably Terraform
Solid experience with Docker and Kubernetes
Experience designing and operating CI/CD pipelines
Proficiency in Python scripting and strong Linux fundamentals
Good understanding of SRE principles, reliability, scalability, and security best practices
AI-Native Engineering Leadership:
Experience managing teams that use Cursor, GitHub Copilot, or Claude Code as a core part of their daily workflow.
Strong communication, leadership, and stakeholder management skills
Fluency in English, written and spoken.
Additional Information
What Success Looks Like
Platform Ownership:
Treats platform capabilities as long-lived products, not one-off infra projects.
Enablement Mindset:
Measures success by how effectively product teams ship using the platform.
Strong Technical Judgment:
Understands platform trade-offs across scalability, reliability, security, and cost.
Production Readiness:
Ensures platforms are production-grade (deployment, monitoring, rollback, lifecycle).
Reliability & Operations:
Owns uptime, incident response, post-mortems, and continuous improvement.
Developer Experience:
Reduces friction in CI/CD, environments, tooling, and self-service workflows.
Cost & Efficiency Awareness:
Balances performance with cloud and compute cost optimization.
Roadmap & Prioritization:
Aligns platform roadmap with product needs and AI Factory growth.
Team Leadership:
Builds and…