Senior DevOps Engineer
Listed on 2026-01-12
IT/Tech
Systems Engineer, Cloud Computing, SRE/Site Reliability, IT Support
Your opportunity
At Schwab, you are empowered to make an impact on your career. Here, innovative thought meets creative problem solving, helping us “challenge the status quo” and transform the finance industry together.
We believe in the importance of in-office collaboration and fully intend for the selected candidate for this role to work on site in the specified location(s).
The Client Experience Technology (CET) team is looking for a talented Senior DevOps Engineer to join a small team that supports developers and works collaboratively with multiple agile teams to build, test, and deploy the new Schwab Investing Technology suite of product offerings.
This role carries a high level of responsibility and accountability, backed by a support structure for professional development as we continue to build and grow our CET platforms. This is an outstanding opportunity to join a team and company where your talents will have a direct impact on company direction, our customers, and our industry.
Our technology stack is built on Java, .NET, and Python, running in AWS and Pivotal Cloud Foundry. You will be involved in all aspects of the SDLC, from designing CI/CD pipelines and supporting on-prem and cloud environments to working on various POCs. You will gain a deep understanding of the inner workings of the applications to identify needed monitors and create suitable runbooks.
Your role will require a wealth of knowledge and field-proven experience to deliver on the success of the initiatives.
You are an experienced DevOps/Platform engineer who owns the end-to-end CI/CD experience, production readiness, and safe, repeatable releases for containerized workloads on AWS. You move seamlessly between hands-on diagnostics (e.g., ECS tasks), infrastructure automation (Terraform for app and data services like RDS), and cross-team orchestration with SRE, Release Management, and feature teams.
What you’ll do
- Own CI/CD pipelines from build through promotion and deployment for containerized services; define guardrails, quality gates, and rollout/rollback patterns aligned with SRE and Release Management practices.
- Design, build, and operate AWS infrastructure with Terraform (networking, compute, containers, and data services, including RDS) using module standards, workspaces/environments, and automated promotions.
- Diagnose container runtime issues (e.g., task health, service scaling, deployments) and partner with teams during image promotion windows.
- Embed reliability practices: runbooks, production checks, operational readiness, and joint incident/retro participation with SRE.
- Champion observability and change safety: metrics, logs, alerts, and progressive delivery strategies (feature flags, config changes, DB change playbooks).
- Respond to Alerts and Escalations: Actively monitor and respond to system alerts and escalations to ensure the stability and reliability of our services. This includes diagnosing and troubleshooting issues in real time to minimize downtime and impact on users.
- System Recovery Events: Lead and coordinate system recovery efforts during incidents. This involves executing recovery procedures, collaborating with cross-functional teams to restore services, and conducting post-incident reviews to identify root causes and implement preventive measures.
Required Qualifications:
- 5+ years in DevOps/Platform roles on Linux with AWS depth (networking/VPC, IAM, ECR, ECS, RDS)
- Expert with containers (Docker) and 12‑factor services; hands‑on with ECS (Fargate or EC2) and image promotion workflows
- CI/CD mastery with GitHub Actions and/or Bamboo (pipeline design, reusable templates, environment promotion, deployment strategies)
- Terraform at scale (modules, policies/guardrails, plan/apply automation, drift detection) for app and data stacks (incl. RDS)
- Strong networking fundamentals; scripting in Bash and Python
- Excellent cross‑functional communication with SRE and Release Management to drive readiness and approvals
- Experience in production change management, including change approval workflows, risk assessment, deployment coordination, and post-deployment monitoring
- Experience with Kubernetes/ECS or PCF/Tanzu to support…