Senior Platform Engineer Job, Dallas area, Texas, USA, IT/Tech

Requisition

Salary Range:

- Please note that the Salary Range shown is a guideline only. Salary offered may vary based on factors, including, but not limited to, the successful candidate's relevant knowledge, skills, and experience.

Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture.

Global Banking and Markets

Global Banking and Markets (GBM) is a leading Canadian Capital Markets and Investment Banking business with a growing platform in the US and Latin America, operating globally for over 100 years. Scotiabank's strong U.S. presence provides our clients an important bridge to this key global market for trade and investment flows across the Americas and the world.

Global Banking & Markets provides a full range of investment banking, credit and risk management products and services relevant to the financing and strategic development needs of our clients. Our products include debt and equity financing, mergers & acquisitions, corporate banking, institutional equity sales, trading and research, fixed income products, derivatives, energy, foreign exchange and precious & base metals. We also cross-sell the full range of wholesale products and services offered by the Scotiabank Group.

Be part of an innovative, Global Capital Markets and Investment Banking business with a unique geographic footprint that puts capital to work for our clients across industries! We work together to drive ambition for every future!

Purpose:

The Senior Platform Engineer will be responsible for building, tuning, and managing infrastructure, as well as DevOps, platform site reliability, monitoring, troubleshooting, enhancing, and enabling new features on the Data & AI platform(s) in line with the bank's Data & AI strategy. This involves working with cross-functional teams such as IAM, Network, Cloud Ops, Security, and client partners on integration, process automation, platform enhancement, and delivery of new projects.

What You'll Do

* Guidance and Direction:
Provide clear direction to the team, set goals, and keep the team accountable for their deliverables. Align team goals with the overall direction of the Azure & Databricks Platform roadmap and enterprise standards.

* Technical Oversight:
Own the technical direction across Azure and Databricks:
Azure networking and security architecture (VNets, Private Endpoints, NSGs, route tables, Azure Firewall), Azure Identity & Access Management (RBAC, PIM), and Databricks platform governance (Unity Catalog, workspace configuration, cluster policies). Ensure best practices for reliability, cost, and security are consistently applied.

* Quality Assurance:
Ensure high-quality support delivery for platform users and adhere to platform SLAs/SLOs and service objectives.

* Process Improvements:
Continually improve platform processes and SOPs for efficiency and automation. Design and develop reusable Terraform modules for Azure native resources and Databricks (clusters, SQL warehouses, Unity Catalog objects), enabling consistent, scalable, and automated deployments via Terraform Cloud/Enterprise and CI/CD.

* Customer Relations:
Build strong relationships with data engineers, analysts, and platform users. Communicate proactively with stakeholders and cross-functional teams (Platform, Security, Cloud Ops, Networking, Data Governance) to align priorities, manage expectations, and drive adoption of platform standards.

* Advanced Monitoring and Troubleshooting:
Troubleshoot and resolve performance issues across Databricks jobs, clusters, SQL warehouses, and Azure dependencies. Implement Azure Monitor and Log Analytics-based observability with custom dashboards for cluster/job health, driver/executor metrics, and cost insights. Establish proactive alerting and early issue detection via logs/metrics for Databricks and Azure services.

* Site Reliability:
Analyze, triage, and resolve platform issues promptly to achieve SLOs and platform reliability objectives. Drive error-budget-aware practices, post-incident reviews, and resilience engineering (e.g., autoscaling, retry/backoff strategies, policy guardrails).

* Incident Management:
Provide support during major incidents, including after-hours support. Lead incident response, communications to users and stakeholders, and root cause analysis with clear action items and follow-through.

* Observability Tools Development:
Design, build, and deploy logging/monitoring solutions for early detection and actionable insights. Standardize ingestion to Log Analytics from Databricks (audit logs, cluster events, job runs) and key Azure resources; build dashboards and alert rules to reduce MTTR.

* Release Control Management:
Maintain and enhance the Infrastructure & Platform release pipeline using Terraform, Terraform Cloud, Azure DevOps and/or GitHub Actions, with source control in GitHub/Bitbucket and artifact promotion via ACR/Artifacts. Enforce approvals, change windows, and automated checks to ensure safe, repeatable releases.

* Client Pipeline Management:
Implement…

