Head of Observability and Monitoring

Job in Dallas, Dallas County, Texas, 75215, USA
Listing for: Truist
Full Time position
Listed on 2026-01-18
Job specializations:
  • IT/Tech
    Cybersecurity, IT Support
Job Description & How to Apply Below

ESSENTIAL DUTIES AND RESPONSIBILITIES

Technical Leadership & Expertise

Develop and execute a comprehensive observability strategy, integrating logging, metrics, and distributed tracing across the Bank’s technology stack.

Lead the design and deployment of monitoring platforms, ensuring real-time visibility into system performance, availability, and security threats.

Own the end-to-end observability architecture, including tool selection, automation, and integration with cloud, on-prem, and hybrid environments.

Drive the adoption of AI/ML-powered monitoring to enhance anomaly detection, predictive analytics, and automated incident response.

Ensure robust service level indicators (SLIs), service level objectives (SLOs), and error budgets are established and tracked for critical services.

Strategic Planning & Governance

Define and implement observability governance frameworks, ensuring compliance with regulatory requirements (e.g., FFIEC, OCC, Basel III, GDPR).

Develop strategies to support real-time monitoring, root cause analysis, and proactive remediation to minimize downtime and business impact.

Partner with engineering, security, business unit, risk, and compliance teams to align observability initiatives with operational stability and performance targets, as well as continuity and disaster recovery plans.

Champion operational resilience by ensuring monitoring covers end-to-end customer journeys, critical business services, and third-party dependencies.

Establish and maintain a centralized observability platform, standardizing logging and metrics collection across microservices, APIs, databases, and infrastructure.

Collaboration & Stakeholder Management

Work closely with platform teams to embed observability best practices into CI/CD pipelines and software development life cycles.

Partner with Cybersecurity to integrate security monitoring, anomaly detection, and threat intelligence into observability solutions.

Engage with business and operations teams to ensure monitoring capabilities support customer experience, regulatory reporting, and incident management.

Serve as the Bank’s SME on observability, engaging with industry forums, vendors, and regulatory bodies to stay ahead of trends and compliance needs.

Technical Skills

Proven expertise in modern observability stacks, including Splunk, Dynatrace, AppDynamics, ThousandEyes, ServiceNow AIOps, or Datadog.

Deep understanding of cloud-native monitoring across AWS, Azure, and Google Cloud, including serverless, Kubernetes, and container-based architectures.

Strong hands-on experience with log aggregation, distributed tracing (Jaeger, Zipkin), and application performance monitoring (APM).

Knowledge of AI-driven monitoring, automated remediation, and self-healing infrastructure.

Familiarity with SIEM tools and security monitoring, ensuring alignment with SOC and threat detection capabilities.

Experience in API monitoring, network telemetry, and database performance tuning.

Leadership & Strategic Experience

10+ years of experience in observability, monitoring, or infrastructure resilience roles within regulated financial services or banking environments.

Proven track record of designing and implementing enterprise-scale observability platforms in a complex, multi-cloud environment.

Experience leading cross-functional teams to drive cultural adoption of observability and monitoring best practices.

Strong knowledge of regulatory and compliance requirements related to operational resilience, incident management, and monitoring.

Soft Skills & Collaboration

Ability to translate complex technical monitoring data into actionable insights for senior executives and non-technical stakeholders.

Strong problem-solving skills with a proactive and forward-thinking approach to technology and resilience.

Excellent communication and leadership abilities, fostering collaboration across engineering, risk, and business teams.

Compliance and Regulatory Knowledge

In-depth understanding of compliance in regulated industries (e.g., financial services, healthcare).

Experience working with audit and risk management processes.

Stakeholder Engagement & Communication

Facilitate collaboration between application,…
