Senior Director - Cloud Platform Engineering
Listed on 2026-01-20
IT/Tech
Systems Engineer, Cloud Computing
Life at MX
We are driven by our moral imperative to advance mankind, and it all starts with our people, product, and purpose. We always carry a deep sense of drive and passion with us. If you thrive in a challenging work environment, surrounded by incredible team members who will help you grow, MX is the right place for you. Come build with us and be part of an award‑winning company that’s helping create meaningful and lasting change in the financial industry.
About the Role
We are looking for a US‑based Senior Director to serve as the strategic, operational, execution, and escalation owner for all site, infrastructure, and cloud platform services.
This role is personally accountable for production reliability and stability, including owning US time‑zone incidents and Sev 0/1 events, leading cutovers, and directly representing site, infrastructure, and platforms to executive leadership during high‑impact events. The expectation is that this leader stands at the front of the line during critical incidents and events such as migration and stabilization, makes real‑time decisions, and clearly articulates risk, impact, and trade‑offs to executives under pressure.
This front‑line ownership is intentional but transitional. A core measure of success for this role is building the systems, operating model, delegation structure, and a strong leadership bench such that sustained, high‑quality operations do not depend on the continuous personal presence of a single leader. The leader is expected to design for leverage: establishing clear ownership, developing managers/leaders, and embedding practices that scale reliability beyond individual heroics.
In parallel, this leader is expected to lead the full lifecycle of our infrastructure transformation, from data center exit and AWS migration through steady‑state cloud operations and platform maturity. Success is measured not just by completing the migration, but by leaving behind a durable operating model with clear delegation, on‑call ownership, and predictable executive engagement. The ideal candidate will have personally led large‑scale data center exits and cloud migrations, not just advised on or governed them.
What Success Looks Like in 12–24 Months
- 100% exit from on‑premise data centers, with all targeted workloads successfully migrated to AWS and on‑prem dependencies fully decommissioned.
- A clear, stable post‑migration operating model in place, with unambiguous ownership across teams.
- Consistent 99.99%+ availability for platform and infrastructure services, with active error budget management guiding operational and delivery decisions.
- Fewer Sev 0 and Sev 1 incidents, with a measurable reduction in customer‑impacting events and improved predictability of recovery.
- Improved incident KPIs, including faster MTTR and reduced incident recurrence.
- Declining operational toil through automation, standardization, and self‑service platform capabilities.
- Mature incident management practices, including blameless postmortems and systemic remediation of root causes.
- A strong leadership bench providing resilient production coverage, confident incident leadership, and effective delegation.
- Improved cost efficiency and visibility across cloud infrastructure post‑migration through FinOps practices, capacity right‑sizing, and platform standardization.
- Personally own and execute the end‑to‑end data center exit and AWS migration, from discovery and planning through cutover, stabilization, and full decommissioning.
- Define migration waves, readiness gates, and cutover plans with explicit transition into steady‑state ownership, avoiding temporary or parallel operating models.
- Own architectural decisions across AWS networking, compute, storage, security, and observability, ensuring designs are operable, supportable, and resilient post‑migration.
- Establish and own the post‑migration operating model for cloud infrastructure and platforms, explicitly tied to outcomes:
- Clearly defined SLIs, SLOs, and error budgets for all Tier‑1 and Tier‑2 services
- Accountable owners for SLO attainment across SRE, platform, and product teams
- On‑call and escalation models…