Senior Site Reliability Engineer

Job in Bengaluru, 560001, Bangalore, Karnataka, India
Listing for: Pocket FM
Full Time position
Listed on 2026-02-04
Job specializations:
  • IT/Tech
    SRE/Site Reliability, Cloud Computing, Systems Engineer, IT Support
Location: Bengaluru

Senior Site Reliability Engineer (SRE)
Company: Pocket FM
About the Role
Pocket FM is a global audio entertainment platform serving millions of listeners across multiple geographies. We are looking for an experienced Senior Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of our large-scale audio streaming platform, which is built on a Kubernetes-first, cloud-native architecture.
In this role, you will own platform stability, improve operational excellence, and work closely with engineering teams to deliver a seamless listening experience to users worldwide.

Key Responsibilities
Reliability & Engineering Excellence
Own and improve the reliability, availability, and performance of globally distributed, Kubernetes-based production systems.
Define and continuously improve SLIs, SLOs, and SLAs using metrics derived from Prometheus and Grafana.
Drive reliability best practices across the entire software development lifecycle.

Kubernetes & Platform Operations
Operate and scale production-grade Kubernetes clusters (EKS/GKE) running critical audio streaming and backend services.
Troubleshoot complex production issues across pods, nodes, networking, storage, and the Kubernetes control plane.
Implement autoscaling, rollout strategies, and resilience patterns for containerized workloads.

CI/CD & GitOps
Own and improve CI/CD pipelines using GitHub Actions and Jenkins to ensure safe, reliable, and repeatable deployments.
Implement and operate GitOps workflows using Argo CD for Kubernetes application and configuration management.
Enforce deployment best practices including canary, blue-green, and rollback strategies.

Observability & Monitoring
Build and maintain a strong observability stack using Prometheus (metrics), Grafana (visualization), and Loki (logs).
Design effective alerting strategies that reduce noise and improve signal quality.
Use observability insights to drive performance tuning, capacity planning, and reliability improvements.

Incident Management & Operational Excellence
Lead and participate in incident response for platform, Kubernetes, and database-related issues.
Perform post-incident reviews (PIRs) with clear root cause analysis and preventive actions.
Improve on-call readiness, runbooks, and operational maturity for 24x7 global systems.

Databases & State Management
Support and improve the reliability of MySQL in production, including monitoring, backups, failover, and performance tuning.
Collaborate with backend teams on schema changes, query performance, and scaling strategies.

Infrastructure & Automation
Design and manage cloud infrastructure integrated with Kubernetes using Infrastructure-as-Code (Terraform).
Automate operational tasks using Python and/or Go to reduce toil and improve system resilience.
Drive cost and capacity optimization across cloud and Kubernetes environments.

Collaboration & Innovation
Work closely with backend, mobile, data, product, and QA teams to embed reliability principles early.
Contribute to Pocket FM’s engineering roadmap with a focus on scale, resilience, and operational efficiency.
Apply modern SRE and cloud-native best practices pragmatically in production.

Required Skills & Experience
Experience
3+ years of experience in Site Reliability Engineering or platform engineering roles.
Proven experience operating large-scale, Kubernetes-based, consumer-facing systems.

Technical Expertise (Must-Have)
Strong hands-on expertise with Kubernetes in production environments.
Experience with Prometheus, Grafana, and Loki for monitoring, alerting, and logging.
Strong experience with CI/CD systems such as GitHub Actions and Jenkins.
Hands-on experience with GitOps workflows using Argo CD.
Solid experience managing and supporting MySQL in production.
Strong experience with AWS and/or GCP.
Proficiency in Python and/or Go.
Strong Infrastructure-as-Code experience using Terraform.
Solid understanding of Linux, networking, and cloud security fundamentals.

Preferred Qualifications
Kubernetes certifications (CKA / CKAD / CKS).
Cloud certifications (AWS / GCP).
Experience supporting platforms with millions of users across multiple regions.
Familiarity with structured incident management practices.

Why Pocket FM?
Pocket FM is a global product with a rapidly growing international user base, offering the opportunity to work deeply across Kubernetes, observability, and GitOps while solving complex reliability challenges at scale.

Position Requirements
10+ years of work experience