Lead Production Support Analyst
Listed on 2026-03-01
IT/Tech
IT Support
Job Family
IT Operations
About Us
At Transamerica, hard work, innovative thinking, and personal accountability are qualities we honor and reward. We understand the potential of leveraging the talents of a diverse workforce. We embrace an environment where employees enjoy a balance between their careers, families, communities, and personal interests. We believe everyone deserves to live their best life.
Job Description Summary
The Lead Production Support & Operations role is responsible for end-to-end production support management for a defined line of business (Individual Solutions and WFG), ensuring availability, stability, performance, and operational excellence for business‑critical applications and services. This Lead oversees a vendor/contractor production team, drives incident/problem/change rigor, and delivers measurable improvements through automation, monitoring enhancements, and operational standardization. The role is preferably hands‑on (or strongly technically fluent), with the ability to guide triage, diagnose complex issues across application/infrastructure/database layers, and partner effectively with engineering, infrastructure, security, and business stakeholders.
Responsibilities
Operational & Production Support Leadership
- Lead day‑to‑day production support operations for Individual Solutions & WFG applications/services, ensuring high availability, performance, and stability.
- Act as the accountable owner for the production support operating model, including L1/L2/L3 routing, on‑call rotations, escalation paths, and SLAs/SLOs.
- Oversee and coach a vendor/contractor support team, ensuring quality execution, clear accountability, and consistent outcomes across shifts/time zones.
- Own application onboarding into production support: ensure runbooks, SOPs, architecture diagrams, support metrics, monitoring/alerting, access, and DR/backup readiness are complete and current.
- Establish operational readiness standards across logging, monitoring, access controls, backup, disaster recovery, and maintenance windows.
- Manage vendor performance (tickets, SLAs, MTTR, quality of RCAs, repeat incidents, documentation hygiene) and drive continuous service improvement.
- Run recurring vendor governance: operational reviews, KPI scorecards, backlog prioritization, and corrective action plans.
- Coordinate with third‑party providers for escalations, service requests, planned maintenance, patching, and production changes.
- Serve as the primary escalation point for high‑severity incidents; lead war rooms/bridge calls and drive timely resolution with strong communication.
- Ensure Root Cause Analysis (RCA) and Post‑Incident Reviews (PIRs) are completed with actionable remediation, prevention plans, and measurable follow‑through.
- Drive problem management: identify patterns and recurring issues using incident history, logs, and metrics; reduce repeat incidents through permanent fixes.
- Oversee change/release execution to minimize production risk: pre‑change validation, approvals, rollback plans, post‑release monitoring, and “go/no‑go” decision support.
- Ensure adherence to ITSM processes and audit‑ready evidence for incident/change/problem workflows.
- Improve detection and response through dashboards, health checks, distributed tracing/APM, synthetic monitoring, and log correlation.
- Tune alerting to improve the signal‑to‑noise ratio; implement event correlation to prevent alert storms.
- Partner with engineering and platform teams to define and track error budgets (where applicable) and reliability improvements.
- Proactively identify opportunities for automation (self‑healing, auto‑remediation, runbook automation, standardized scripts) that reduce toil and improve MTTR.
- Drive operational standardization: repeatable onboarding, consistent runbooks, automated checks, and common monitoring patterns.
- Lead initiatives focused on reducing incident volume, shortening recovery times, improving release quality, and removing manual steps from common procedures.