Senior Site Reliability Engineer Job, Ahmedabad area, Uttar Pradesh, India, IT/Tech

The Senior Site Reliability Engineer is responsible for the availability, performance, serviceability, and recoverability of production systems supporting flight operations, maintenance, and compliance workflows.
This role owns production reliability outcomes as systems scale, migrate, and evolve within regulated aviation environments.
Job Title: Senior Site Reliability Engineer
Experience Required: 5+ Years
Location: Pune/Ahmedabad
Educational Qualification: Bachelor's degree in Computer Science, Software Engineering, or a related field

Roles and Responsibilities
Reliability Ownership and Service Health
Own availability, latency, throughput, and durability for production systems
Define and maintain service level indicators and service level objectives
Manage error budgets to guide engineering and operational decisions
Ensure reliability targets are met consistently

Production Architecture and Resilience
Design and operate highly available multi-availability-zone and multi-region architectures
Ensure controlled and observable failure behavior
Define redundancy, graceful degradation, and automated recovery strategies
Validate failover and recovery through testing

Incident Response and Operational Maturity
Lead response to production incidents
Own root cause analysis focused on systemic contributors
Drive remediation actions to completion
Reduce incident frequency, severity, and blast radius over time

Observability and Operational Insight
Design centralized logging, metrics, alerting, and dashboards
Define observability standards tied to customer impact
Ensure alerts are actionable and low-noise
Use operational data for capacity planning and scaling decisions

Automation and Toil Reduction
Identify and eliminate manual or repetitive operational tasks
Build automation to reduce operational risk
Standardize operational workflows
Treat simplicity as a reliability requirement

Data and Database Reliability
Own production database reliability
Design replication, backup, restore, and failover strategies
Validate recovery procedures regularly
Lead migrations to managed cloud databases such as AWS RDS or Aurora

Technical Qualifications:

Cloud and Infrastructure
Hands-on experience operating production systems on AWS or Azure
Strong understanding of networking, IAM, load balancing, and managed services
Ability to balance cost, reliability, and operational complexity

Distributed Systems
Experience operating distributed systems in production
Strong understanding of partial failure and recovery patterns
Ability to diagnose cross-stack production issues

Observability and Operations
Experience with centralized logging, metrics, and alerting
Ability to design alerts based on service impact
Experience driving improvement from operational data

Programming and Automation
Strong scripting skills using Python, Node.js, or shell
Ability to write production-grade operational tooling
Comfort modifying application code to improve reliability

Databases
Experience operating relational databases in production
Experience with replication, backup, restore, and failover
Experience migrating legacy databases to managed services preferred

Preferred Experience
Experience in regulated or safety-critical industries such as aviation
Familiarity with compliance, auditability, and traceability requirements
Experience supporting systems with direct operational impact

