
Site Reliability Engineer

Job in 500001, Hyderabad, Telangana, India
Listing for: Sonata Software
Full Time position
Listed on 2026-02-05
Job specializations:
  • IT/Tech
    Cloud Computing, SRE/Site Reliability, Data Engineer, AWS
Job Description & How to Apply Below
Role: Site Reliability Engineer
Location: Hyderabad
Notice Period: Immediate to 20 Days
Employment Type: Full Time
Experience
7–12 years in site reliability, cloud-based data infrastructure, data pipeline observability, automation, and high-availability engineering within EdTech platforms (2U)
Primary Skills (Must-Have)
AWS, CI/CD, Jenkins, IaC (Infrastructure as Code), Terraform, Kubernetes
Secondary Skills (Good-to-Have)
AWS systems; Dataiku data; platform updates and patching
Tools & Platforms
Data Warehousing & Processing: Snowflake, Redshift, Apache Airflow, dbt
CI/CD & Deployment: Jenkins, GitHub Actions, AWS CodePipeline, Terraform
Cloud & Event Processing: AWS Lambda, API Gateway, SNS/SQS, Kafka, Step Functions
Monitoring & Logging: Datadog, AWS CloudWatch, Prometheus, Splunk
Incident Management: PagerDuty, Opsgenie, AWS Health Dashboard
Collaboration & Code Review: GitHub, Jira, Confluence

Key Responsibilities
Data Pipeline Reliability & Observability:
- Maintain and optimize highly available, fault-tolerant infrastructure for data pipelines, ETL jobs, and real-time data processing
- Implement end-to-end monitoring of Airflow DAGs, Snowflake queries, and AWS-based data workflows
- Automate data pipeline health checks, error handling, and auto-remediation strategies

Infrastructure & Cloud Automation:
- Deploy and manage AWS-based data infrastructure using Terraform and CloudFormation
- Optimize Kubernetes (EKS) clusters for processing large-scale datasets and real-time analytics
- Ensure high availability and cost-efficient scaling for Redshift, Snowflake, and data storage solutions

Performance, Monitoring & Incident Response:
- Implement real-time monitoring, logging, and alerting using Datadog, AWS CloudWatch, and Prometheus
- Define and track SLOs, SLIs, and error budgets to improve data reliability and uptime
- Conduct Root Cause Analysis (RCA), security audits, and post-mortems for incidents

Security & Compliance:
- Ensure GDPR, CCPA, and SOC 2 compliance for data storage, access controls, and retention policies
- Implement AWS security best practices (IAM, KMS, Shield, WAF) to secure data access and encryption
- Secure API gateways, authentication mechanisms, and data lake permissions to prevent unauthorized access

Collaboration & Leadership:
- Work closely with data engineers, analytics teams, and DevOps engineers to enhance data platform reliability
- Participate in incident response drills, disaster recovery planning, and security compliance reviews
- Advocate for best practices in automation, cost optimization, and cloud-native data solutions