Lead Site Reliability Engineer
Remote / Online - Candidates ideally in
Mission, Johnson County, Kansas, 66201, USA
Listed on 2026-03-05
Listing for:
Convergys
Full Time, Remote/Work from Home position
Job specializations:
- IT/Tech: Systems Engineer, Cloud Computing, SRE/Site Reliability, IT Support
IND Work-at-Home
Time type: Full time
Posted on: Posted Yesterday
Job requisition: R1700168
Job Title:
Lead Site Reliability Engineer
Job Description
As a Lead Site Reliability Engineer, you’ll play a strategic role in shaping and scaling our DevSecOps ecosystem. You’ll lead the design and implementation of automated CI/CD pipelines, enforce enterprise-grade security and compliance standards, and drive reliability across the entire software delivery lifecycle.
Partnering closely with development and operations teams, you’ll define best practices, optimize deployment workflows, and ensure our applications are resilient, observable, and continuously improving. Your expertise will be key to accelerating innovation while maintaining the highest levels of quality and performance.
Additionally, you will be expected to use AI extensively and to lead the group in adopting it within the SRE role and domain. The ideal candidate will have a "builder" mindset and strong software engineering skills, and can be a "force multiplier": you will generate automation and platform code daily, always looking to improve and build upon what can be imagined, leveraging the latest tools to deliver faster, more efficient, more effective, and more autonomous solutions.
About the Role
As a Lead Site Reliability Engineer, you will own the reliability and availability of our production systems. You will champion SRE principles across engineering teams: defining SLOs, managing error budgets, and leading a culture of blameless incident response. This is a hands-on leadership role where you will partner closely with product and engineering teams to balance the pace of innovation with the stability our customers depend on.
Key Responsibilities
Reliability Ownership
· Define, implement, and own Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets across critical services.
· Use error budget policies to drive data-informed conversations between engineering and product on release velocity vs. reliability trade-offs.
· Conduct capacity planning and proactive risk assessments to prevent incidents before they occur.
Incident Management
· Lead incident response as incident commander — coordinating teams, driving resolution, and maintaining clear stakeholder communication during outages.
· Facilitate thorough, blameless postmortems and ensure action items are tracked, prioritized, and resolved.
· Develop and continuously improve runbooks, escalation paths, and on-call practices to reduce MTTD and MTTR.
Observability & Monitoring
· Design and maintain observability strategies using modern tooling (Prometheus, Grafana, OpenTelemetry, ELK) to ensure full visibility into system health.
· Define intelligent alerting that is actionable and minimizes alert fatigue.
· Drive adoption of distributed tracing and structured logging across services.
Toil Reduction & Automation
· Identify and measure toil across the engineering organization and lead initiatives to eliminate it through automation.
· Build internal tooling and self-service capabilities that improve developer productivity and system reliability.
Infrastructure & Platform Reliability
· Collaborate with platform and infrastructure teams on cloud-native patterns for fault tolerance, auto-scaling, and disaster recovery.
· Provide SRE input into CI/CD pipelines and deployment strategies (e.g., canary releases, blue/green deployments) to minimize production risk.
· Manage infrastructure using IaC practices (Terraform or equivalent) with a focus on reliability and consistency.
Leadership & Culture
· Mentor and grow junior SREs, fostering a culture of ownership, curiosity, and continuous improvement.
· Act as an SRE advocate across engineering — embedding reliability thinking into the software development lifecycle.
· Partner with key stakeholders to align SRE strategy with broader organizational goals.
· Conduct regular 1:1s with direct reports and participate in team rituals.
AI Expectations
As with all engineers at our organization, this role requires an AI-native mindset. Specifically, you will be…