Site Reliability Engineer Job, Delhi area, Delhi, India, IT/Tech

Job Title:

SRE / DevOps Engineer

Location:

Remote (India)

Experience:

5+ Years

Role Overview:
We are looking for an experienced Site Reliability / DevOps Engineer to design, build, and operate highly reliable, scalable, and secure cloud infrastructure on Google Cloud Platform (GCP). You will work closely with development, platform, and security teams to ensure high availability, performance, and continuous delivery of cloud-native microservices.
This role demands deep expertise in Kubernetes (GKE), Terraform, CI/CD automation, and observability, along with strong troubleshooting and communication skills.

Key Responsibilities:

Design, implement, and manage highly available GCP infrastructure using Terraform (IaC).
Build and operate Kubernetes (GKE) clusters, including deployments, ingress, autoscaling, and Helm-based releases.
Develop and maintain CI/CD pipelines using GitHub Actions and Google Cloud Build.
Implement SRE best practices: SLIs, SLOs, SLAs, error budgets, and incident response.
Containerize and deploy microservices using Docker and Kubernetes.
Implement monitoring, logging, and observability using Cloud Monitoring, Cloud Logging, Prometheus, and Grafana.
Troubleshoot production issues, perform root cause analysis, and drive permanent fixes.
Manage networking and security, including VPCs, load balancers, DNS, SSL/TLS, firewalls, IAM, and VPNs.
Collaborate with engineering teams to improve system reliability, performance, and scalability.
Automate operational tasks using Python, Bash, or Go.
Participate in on-call rotations and incident management processes.

Required Skills & Qualifications:

5+ years of experience as an SRE / DevOps / Cloud Engineer.
Strong hands-on experience with Google Cloud Platform (GCP):
Compute Engine, GKE, Cloud Functions
Cloud Storage, VPC, IAM
Cloud Logging & Cloud Monitoring
Expert-level Kubernetes experience (preferably GKE):
Deployments, Services, Ingress
Autoscaling (HPA)
Helm charts
Strong experience with Terraform for Infrastructure as Code.
Proven experience building CI/CD pipelines using GitHub Actions and Cloud Build.
Strong understanding of Docker, containers, microservices, and service mesh concepts.
Experience with observability tools:
Stackdriver (Cloud Ops), Prometheus, Grafana
Solid understanding of networking & cloud security:
Load balancers, DNS, SSL/TLS
VPNs, firewalls, IAM best practices
Hands-on scripting experience in Python, Bash, or Go.
Excellent problem-solving, debugging, and communication skills.

Nice to Have:

Experience with service mesh (Istio, Linkerd).
Experience with SRE metrics and reliability engineering practices.
Knowledge of cost optimization (FinOps) on GCP.
Experience working in remote, globally distributed teams.