
Principal Site Reliability Engineer (SRE)

Job in Charlotte, Mecklenburg County, North Carolina, 28245, USA
Listing for: Ally Financial
Full Time position
Listed on 2026-03-03
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 USD yearly
Job Description & How to Apply Below
Position: Principal Site Reliability Engineer (SRE)

General information
  • Career area: Technology
  • Work Location(s): 601 S. Tryon Street, NC
  • Remote? No
  • # 21735
  • Posted Date: 02-26-26
  • Working time: Full time

Ally and Your Career

Ally Financial only succeeds when its people do - and that’s more than some cliché people put on job postings. We live this stuff! We see our people as, well, people - with interests, families, friends, dreams, and causes that are all important to them.

Our focus is on the health and safety of our teammates as well as work-life balance and diversity and inclusion. From generous benefits to a variety of employee resource groups, we strive to build paths that encourage employees to stretch themselves professionally. We want to help you grow, develop, and learn new things. You’re constantly evolving, so shouldn’t your opportunities be, too?

Work Schedule

Ally designates roles as (1) fully on-site, (2) hybrid, or (3) fully remote. Hybrid roles are generally expected to be in the office a certain number of days per week as indicated by your manager. Your hiring manager will discuss this role's specific work requirements with you during the hiring process. All work requirements are subject to change at any time based on leader discretion and/or business need.

Opportunity

At Ally, you get a startup feel, but experience the benefits of a company that’s worked out the kinks and is fulfilling its purpose. We’re always evolving and see that as a good thing. From owning our work to seeing its impact in the real world, our team is relentless in finding new ways technology can help make experiences better and help people.

We are problem solvers, we value diverse thinking, we support one another, and we challenge ourselves to think bigger in the journey to deliver customer‑obsessed tech solutions.

At this time, Ally will not sponsor a new applicant for employment authorization for this position.

The Work Itself
  • Design and implement highly available, scalable infrastructure systems that support mission‑critical production services, including automated deployment pipelines, observability platforms, and disaster recovery
  • Lead incident response and postmortem processes, diving deep into complex distributed systems failures to identify root causes and drive systemic reliability improvements across engineering teams
  • Develop and maintain service level objectives (SLOs) and error budgets, using data‑driven approaches to balance feature velocity with system reliability and guide organizational decision‑making
  • Build tooling and automation to eliminate toil, improve operational efficiency, and enable engineering teams to safely deploy and operate services with minimal manual intervention
The Skills You Bring

Minimum Qualifications
  • 7+ years of relevant experience
  • Bachelor's degree in relevant field(s) of study or equivalent
Preferred Qualifications
  • 5+ years of experience in site reliability engineering, systems engineering, or DevOps roles with a proven track record of maintaining large‑scale production systems
  • Deep expertise in AWS cloud, including infrastructure as code tools like Terraform, CloudFormation, or Pulumi
  • Experience defining and measuring SLIs, SLOs, and error budgets, and using them to drive reliability improvements and inform product decisions
  • Proficiency in AI development
  • Strong programming skills in languages such as Python, Go, or Node with the ability to write production‑quality code for automation, tooling, and system integration
  • Extensive experience with container orchestration (ECS or similar) and microservices architectures in production environments
  • Proficiency with observability and monitoring tools such as Dynatrace, Prometheus, Grafana, Datadog, New Relic, or similar, and experience building comprehensive monitoring and alerting systems
  • Solid understanding of networking concepts, load balancing, CDNs, DNS, and distributed systems principles including consensus algorithms and failure modes
  • Hands‑on experience with CI/CD pipelines and GitOps workflows using tools like Jenkins, GitHub Actions, ArgoCD, or CircleCI
  • Strong incident management and troubleshooting skills with the ability to quickly diagnose and resolve complex production issues…