Location: Hyderabad
Work Mode: Hybrid (3 Days Office)
Experience: 12–18 Years
Notice Period: Immediate – 30 Days
Role Overview
We are looking for a seasoned Reliability Engineering Lead to drive reliability strategy, incident excellence, automation maturity, and observability across enterprise digital platforms. This role blends deep technical expertise with governance leadership and is ideal for someone who can translate reliability engineering into measurable business outcomes such as revenue impact, operational efficiency, and user safety.
You will act as the process owner for reliability frameworks, ensuring systems remain resilient, compliant, scalable, and optimized while enabling engineering velocity.
Key Responsibilities
1. Service Reliability & SLO Framework
Define and implement SLIs/SLOs aligned with business impact and operational requirements.
Drive SLO-based decision making for releases, prioritization, and incident response.
Establish error budget frameworks balancing feature velocity and system reliability.
Build reliability governance aligned with regulatory frameworks (GxP, SOX, etc.).
Translate technical metrics into business-level insights and executive reporting.
2. Incident Management & Learning Culture
Lead structured incident command processes for critical outages.
Facilitate blameless postmortems to improve systems and foster psychological safety.
Build and maintain incident learning repositories for organizational knowledge sharing.
Implement proactive monitoring systems to detect issues before user impact.
3. Automation & Toil Reduction
Keep operational toil below 50% of team workload through automation initiatives.
Identify and eliminate repetitive tasks using cost-benefit prioritization.
Deliver quarterly engineering improvements that enhance performance and reliability.
Develop self-service documentation, runbooks, and automation tooling.
4. Platform Engineering & AI Reliability
Design reliability frameworks for AI/ML workloads and data pipelines.
Partner with platform teams to embed reliability into internal developer platforms (IDPs).
Support enterprise-scale agentic systems with reliability and compliance alignment.
Improve CI/CD reliability and infrastructure-as-code practices.
5. Observability & Performance Engineering
Implement full-stack observability across metrics, logs, traces, and business KPIs.
Conduct performance engineering, capacity planning, and bottleneck analysis.
Deploy intelligent monitoring systems with predictive alerting and root cause insights.
Enable cross-system monitoring across cloud, on-prem, and legacy environments.
6. Security & Compliance Alignment
Integrate reliability practices with DevSecOps and compliance frameworks.
Automate compliance checks, audit trails, and reporting.
Perform reliability impact assessments for regulated systems.
Design and validate disaster recovery strategies aligned with business and regulatory requirements.
Mandatory Qualifications
12–18 years of experience in SRE, platform engineering, or reliability engineering.
Proven experience designing enterprise-scale reliability frameworks.
Strong expertise in:
SLO/SLI design
Observability platforms
Incident management
Automation strategies
Hands-on knowledge of distributed systems, cloud platforms, and infrastructure reliability.
Experience working within regulated environments or compliance-driven systems.
Strong stakeholder communication and leadership capabilities.
Why This Role
Strategic leadership opportunity with organization-wide impact.
Ownership of reliability strategy for mission-critical platforms.
High visibility with senior leadership and cross-functional teams.
Ability to influence platform architecture, delivery velocity, and engineering culture.