Senior Site Reliability Engineer
Listed on 2026-01-15
IT/Tech
Cloud Computing, SRE/Site Reliability, Systems Engineer, IT Support
This range is provided by Optomi. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more.
Base pay range: $60.00/hr - $72.00/hr
Site Reliability Engineer – Applications & Domains (Sr. / Lead)
Optomi is seeking a Site Reliability Engineer (SRE) to join a newly formed SRE organization supporting application domains. This role will focus on improving application reliability, performance, availability, and observability across critical platforms. The ideal candidate brings strong automation skills, cloud-native experience, and the ability to partner closely with development, infrastructure, and operations teams to embed SRE best practices across the application lifecycle.
This opportunity requires four days onsite per week in Plano, TX.
- Building a new SRE function from the ground up within a large enterprise environment!
- Driving application reliability and performance across mission-critical domains!
- Partnering closely with application development, platform, and operations teams!
- Designing and implementing automation to reduce toil and improve system health!
- Working with modern CI/CD pipelines and cloud-native infrastructure!
- Leading major incident response and driving long-term reliability improvements!
- Influencing architecture decisions with a focus on scalability and resilience!
- 6-7 years of experience for Senior SRE (JL17) or 8-10 years for Lead SRE (JL18).
- Strong experience, regardless of job title, in Site Reliability Engineering, DevOps, or Production Engineering.
- Hands‑on experience with CI/CD pipelines using GitHub and Harness.
- Strong experience building infrastructure as code using Terraform (AWS preferred).
- Proficiency in Python scripting for automation, tooling, and operational runbooks.
- Experience managing container orchestration platforms such as Kubernetes / EKS.
- Deep understanding of cloud platforms (AWS preferred; GCP or Azure acceptable).
- Experience designing and improving observability using tools such as Dynatrace and CloudWatch.
- Strong knowledge of SRE concepts including SLOs, SLAs, error budgets, and reliability metrics.
- Excellent communication skills with the ability to collaborate across technical and business teams.
- Design and build automation to streamline operations and reduce manual effort.
- Partner with observability engineers to provide actionable insights into system health and performance.
- Ensure scalable, repeatable application deployments using CI/CD and infrastructure as code.
- Develop automation and operational tooling primarily using Python.
- Manage and support containerized application environments and cloud-native services.
- Define, implement, and track SLOs, SLAs, and error budgets aligned to business objectives.
- Design and implement monitoring and observability enhancements using Dynatrace and CloudWatch.
- Lead major incident response efforts and coordinate resolution with key stakeholders.
- Conduct blameless post‑incident reviews and drive corrective and preventative actions.
- Collaborate cross‑functionally to embed SRE principles into application design and delivery.
- Participate in architecture reviews with a focus on reliability, scalability, and resilience.
- AWS certifications (DevOps Engineer, Solutions Architect, or similar).
- Experience with GitOps practices, secrets management, and infrastructure monitoring best practices.
- Experience building self‑healing systems and automated remediation workflows.
Mid‑Senior level
Employment type: Full‑time
Job function: IT Services and IT Consulting