
Site Reliability Engineer

Job in Bengaluru, 560001, Bangalore, Karnataka, India
Listing for: Tricog Health
Full Time position
Listed on 2026-02-04
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer, SRE/Site Reliability, IT Support
Job Description
Location: Bengaluru

RESPONSIBILITIES
Operate and optimize Kubernetes-based infrastructure, using Helm and Kustomize for deployment and configuration management.
Build and maintain CI/CD pipelines for infrastructure and application deployments.
Manage and monitor cloud infrastructure on AWS (EKS, EC2, S3, IAM, VPC, etc.) as well as on-premises infrastructure.
Ensure observability through logging, monitoring, and alerting systems (e.g., Prometheus, Grafana, CloudWatch, Datadog).
Implement and enforce security best practices across infrastructure components.
Participate in on-call rotations, incident response, and root cause analysis.
Support scaling of systems to meet demand while maintaining reliability.
Collaborate with engineering and security teams on architecture and deployment strategies.
Ensure the implementation of security standards and compliance requirements across all operational aspects of the cloud platforms.

MUST HAVE SKILLS
3 to 6+ years of hands-on experience in SRE roles
2 to 4+ years of managing production Kubernetes environments
Currently operating production EKS clusters (hands-on, not observational)
Deep expertise in Kubernetes (EKS or self-managed) and Helm
Strong understanding of networking fundamentals: TCP/IP, DNS, VPNs, firewalls, load balancing
Practical experience with AWS services: EKS, EC2, IAM, S3, CloudWatch, VPC
Solid exposure to containerization (Docker) and CI/CD pipelines (e.g., Bitbucket Pipelines, GitHub Actions, Argo CD, Flux CD)
Proven experience handling production systems, on-call rotations, and real-time incident response
Proficiency in at least one programming language (Python or Go preferred)
Clear understanding of the Software Development Life Cycle (SDLC)
Strong automation mindset with a bias toward eliminating manual toil
Ability to build and maintain Grafana dashboards using PromQL (or equivalent)
Strong grasp of SRE principles: SLIs, SLOs, error budgets, and incident and post-incident management

NICE TO HAVE
Experience in regulated industries (healthcare, fintech).
Experience with incident management and disaster recovery.

QUALIFICATIONS/EXPERIENCE
Minimum of 3 years of experience, including 2+ years of SRE experience.
BTech/BE/BS or MTech/MCA/ME/MS
2+ years of work experience with Amazon Web Services (AWS)
2+ years of work experience with Kubernetes
2+ years of work experience with Site Reliability Engineering
Working in a hybrid setting

WHAT DAY TO DAY LOOKS LIKE
Monitoring Service-Level Indicators (SLIs)
Setting Service-Level Objectives (SLOs) and Service-Level Agreements (SLAs)
Responding to Incidents
Writing Postmortems
Automating System Tasks
Cross-Department Collaboration
Building Software for DevOps, SRE, and Support Teams
Fixing Support Escalation Issues
Optimizing On-Call Rotations and Processes
Documenting "Tribal" Knowledge
Conducting Post-Incident Reviews