Site Reliability Engineer Job, Atlanta area, Georgia, USA, IT/Tech

Title:

Site Reliability Engineer

Atlanta, GA

Duration: 12 months

Site Reliability Engineer (SRE) with AWS Cloud and Application Monitoring Experience

We are seeking a skilled Site Reliability Engineer (SRE) with expertise in AWS cloud infrastructure and strong application monitoring experience.

As an integral part of our team, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based systems and applications.

Responsibilities:

* Implement and improve monitoring, alerting, and logging solutions to detect and respond to incidents.

* Collaborate closely with the development team to deploy applications and services and ensure they meet reliability and performance standards.

* Automate deployment, configuration management, and troubleshooting processes to streamline operations.

* Participate in the on-call rotation: triage production incidents, lead root cause analyses (RCAs), and implement preventive actions.

* Conduct capacity planning and performance analysis to handle growing user traffic and data volume effectively.

* Establish and enforce best practices for security, monitoring, and disaster recovery.

* Continuously evaluate and implement new technologies to optimize infrastructure efficiency and reliability.

Requirements:

* Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent work experience.

* Proven experience as a Site Reliability Engineer or similar role, with a strong focus on AWS cloud infrastructure.

* Deep understanding of AWS services (Lambda, S3, SQS, IAM, Route 53, etc.) and proficiency in infrastructure as code (e.g., Terraform, CloudFormation).

* Hands-on experience with monitoring tools such as CloudWatch, Sumo Logic, Dynatrace, Grafana, or similar for application performance monitoring and alerting.

* Proficiency in scripting and automation (e.g., Python, Bash) to build and maintain deployment pipelines and infrastructure.

* Strong analytical and troubleshooting skills to diagnose and resolve complex infrastructure, application, and data issues.

* Experience with containerization (Docker, Kubernetes) and serverless architecture (AWS Lambda).

* Familiarity with CI/CD pipelines and version control systems (Git) for continuous integration and deployment.

* Excellent communication skills and ability to collaborate effectively with cross-functional teams.

AWS certification is a plus.

