Senior DevOps Engineer / Platform Reliability Lead Job, Kolkata area, West Bengal, India, IT/Tech

Position: Senior DevOps Engineer / Platform Reliability Lead

Experience: 10-12+ years
Location: Kolkata

Role Overview
We are seeking a Senior DevOps Engineer / Platform Reliability Lead who can take an end-to-end view of our systems, identify improvement areas across architecture, infrastructure, deployment pipelines, and reliability, and guide the platform toward higher scalability, stability, and operational maturity.
This role requires strong systems thinking, sound architectural judgment, and the ability to clearly call out risks and improvements.

Key Responsibilities
Review the complete backend ecosystem (Node.js, Golang services, cloud infrastructure, CI/CD).
Identify architectural, scalability, reliability, and security gaps following the in-house migration.
Recommend and prioritise short-term fixes and long-term platform improvements.
Own containerized infrastructure using Docker and Kubernetes in production.
Design and maintain robust CI/CD pipelines with safe deployment and rollback strategies.
Implement and improve monitoring, logging, alerting, and incident response practices.
Define and track meaningful SLIs, SLOs, and error budgets.
Prepare systems for OTT traffic spikes during releases and live events.
Improve caching, queuing, and backend performance in collaboration with backend teams.
Drive secure access, secrets management, and cloud cost optimisation.
Act as a technical partner to backend, product, and leadership teams.

Required Technical Skills
Cloud & Infrastructure
Strong experience with AWS (EC2, EKS/ECS, S3, RDS/DynamoDB, IAM)
Docker and Kubernetes (production environments)
Infrastructure as Code – Terraform (preferred)
CI/CD & Operations
GitHub Actions / GitLab CI / Jenkins
Blue-green / canary deployments and rollback strategies
Backend Awareness
Node.js (Express / NestJS level understanding)
Golang (microservices, concurrency, profiling basics)
Observability
Prometheus, Grafana
Centralised logging (ELK / OpenSearch / Loki)
Distributed tracing (Jaeger / OpenTelemetry)
Data, Cache & Messaging
Redis (cache and/or queues)
Kafka / SQS / RabbitMQ (deep experience with at least one)
MongoDB (understanding of NoSQL databases; bonus if experienced with Atlas offerings)
Security & Reliability
Secrets management (Vault / AWS Secrets Manager)
IAM and least-privilege access design
Production incident handling experience

Personality & Mindset
Strong ownership and accountability for platform reliability.
Comfortable identifying what is wrong and explaining how to fix it.
Calm and structured during incidents and high-pressure situations.
Clear communication with engineers and non-technical stakeholders.
Systems thinker who understands end-to-end impact, not just isolated components.
Pragmatic, data-driven, and collaborative.

Reach out to: sushim / shirin

