Site Reliability Engineer
Listed on 2026-03-10
IT/Tech
Cloud Computing, SRE/Site Reliability
Note:
Candidates must be authorized to work in the United States at the Atlanta, GA location on a full-time basis without the need for current or future visa sponsorship. Unfortunately, we are unable to sponsor visas at this time.
We’re looking for a Senior Site Reliability Engineer who is passionate about development, automation, cloud infrastructure, and improving reliability at scale. This role will support 15+ development teams by designing and maintaining AWS infrastructure and deployment pipelines.
If you enjoy solving complex reliability challenges and building scalable cloud platforms, this role is for you.
Site Reliability Engineer, Atlanta, GA (ONLY LOCALS)
Job Description
This is an opening for a Senior Site Reliability Engineer (SRE) on the Manheim Logistics SRE team. The team has standardized on a Docker-based infrastructure solution and is adding functionality to support new development team requests and architectural patterns (such as Lambda, Step Functions, Fargate, etc.). The SRE team has a strong focus on IaC with Terraform and on best practices such as least-privilege access and proactive monitoring and alerting.
This role will work directly with a release train and help with IaC and SRE activities such as improving monitoring/alerting, defining an error budget, and assisting with DevSecOps.
- Take complex problems and come up with technically sound solutions
- Experience working with and defining SLOs, error budgets, etc.
- Have innate curiosity about how things work
- Design and assist in the authoring of software tools that reliably manage application delivery & performance
- Design and assist in the setup and maintenance of application monitoring and alerting
- Engage with engineering teams to ensure best practices are implemented
- Improve predictability and reliability of software releases, workflows, and operating software
- Reduce mean time to recovery (MTTR) by helping to troubleshoot, monitor, alert, and automate recovery
- Bachelor’s degree in Computer Science or related field and at least 8 years of working experience
- Expertise in software development, with architecture/solutioning experience
- Strong background with Terraform
- Experience with AWS technologies, especially ECS and Lambda
- Experience with monitoring/observability tools such as New Relic, Splunk, PagerDuty
- Experience with agile development, continuous integration, and automated testing
- Solid written communication, problem solving, and process management skills
- Broad AWS platform skills including Cognito, WAF, ElastiCache (Redis), Elasticsearch, SNS, SQS, S3, Systems Manager
- Experience automating Terraform at scale
- Experience with database server infrastructure (RDS, MySQL, Postgres, etc.)
- GitHub Actions
- Experience with GitHub, Docker, and Linux administration