Senior Site Reliability Engineer - Cloud and Data Center Services SRE
Listed on 2026-01-16
IT/Tech
Cloud Computing, Systems Engineer
Job Description:
At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. We do this by driving Responsible Growth and delivering for our clients, teammates, communities and shareholders every day.
Being a Great Place to Work is core to how we drive Responsible Growth. This includes our commitment to being an inclusive workplace, attracting and developing exceptional talent, supporting our teammates’ physical, emotional, and financial wellness, recognizing and rewarding performance, and how we make an impact in the communities we serve.
Bank of America is committed to an in‑office culture with specific requirements for office‑based attendance, while allowing an appropriate level of flexibility for our teammates and businesses based on role‑specific considerations.
At Bank of America, you can build a successful career with opportunities to learn, grow, and make an impact. Join us!
Cloud and Data Center Services (CDS)
The Cloud and Data Center Services (CDS) team in the CTO organization offers Private and Public Cloud platforms for Bank of America's developers to drive faster time‑to‑market, enable innovation with private and public cloud capabilities, and reduce complexity with built‑in integrations. We believe in a high‑quality engineering culture that deploys our platforms with a customer‑first mindset, designs for large enterprise scale and resilience, and accelerates market innovation into the technical platforms we deliver.
As part of this team, you will have a large impact on the evolution of next generation Cloud services for Bank of America and explore an extensive list of new technologies that will drive innovation across our company.
We are seeking an experienced Senior Site Reliability Engineer (SRE) with deep expertise in automation, Infrastructure as Code, external Cloud Service Providers (CSPs) such as AWS, Azure and GCP, OpenShift/AKS container solutions, and image management platforms to support and administer products within our Foundational Services organization.
Our Foundational Services Site Reliability Engineers (SREs) ensure that our platform meets the reliability and uptime requirements of our demanding enterprise customers. This is achieved through best engineering practices, resilient design, and a well‑defined, effective global on‑call rotation that runs 24x7.
The role provides the opportunity to work with a wide range of technologies and offers a unique perspective on how various services (on‑prem/off‑prem) interact with each other. You will work with colleagues who are as smart, hardworking, and driven as you. You will get the opportunity to work in a team that keeps growing and innovating, and that gives you room to be proactive and creative.
Are you ready for the next step in your career? Then we’d love to hear from you!
Position Summary:
- Responsible for the reliability and support of Foundational Services platforms and tools for both on‑premises and external clouds (Azure / AWS / GCP)
- Design and build the solutions for non‑functional requirements of the platforms including monitoring and resiliency
- Proactively monitor and troubleshoot environment performance issues, connectivity issues, security issues, etc.
- Perform deep dives into systemic and latent reliability issues, incident management, problem management
- Identify, analyze, and resolve infrastructure vulnerabilities and application deployment issues.
- Perform blameless root cause analysis (RCA) and partner with product engineering and operations teams across the organization to establish sustainable fixes
- Own application onboarding and provide troubleshooting support throughout the lifecycle of the tools and platforms
- Identify and drive opportunities to improve automation, reduce toil, and improve operational excellence
- Partner with risk and compliance teams to bring visibility, implement the right controls, and remediate vulnerabilities
- Be a key stakeholder in the design of cloud services and collaborate with architecture, engineering, operations and product teams
- Participate in 24x7 on‑call coverage providing L3 platform support, including maintaining the schedule for other personnel
R…