
Senior DevOps Engineer / Platform Reliability Lead

Job in Kolkata, West Bengal, India
Listing for: hoichoi
Full-time position
Listed on 2026-02-04
Job specializations:
  • IT/Tech
    SRE/Site Reliability, Systems Engineer
Position: Senior DevOps Engineer / Platform Reliability Lead

Experience: 10-12+ years
Location: Kolkata

Role Overview
We are seeking a Senior DevOps Engineer / Platform Reliability Lead who can take an end-to-end view of our systems, identify improvement areas across architecture, infrastructure, deployment pipelines, and reliability, and guide the platform toward greater scalability, stability, and operational maturity.
This role requires strong system thinking, sound architectural judgment, and the ability to clearly call out risks and improvements.

Key Responsibilities
Review the complete backend ecosystem (Node.js, Golang services, cloud infrastructure, CI/CD).
Identify architectural, scalability, reliability, and security gaps following the in-house migration.
Recommend and prioritise short-term fixes and long-term platform improvements.
Own containerized infrastructure using Docker and Kubernetes in production.
Design and maintain robust CI/CD pipelines with safe deployment and rollback strategies.
Implement and improve monitoring, logging, alerting, and incident response practices.
Define and track meaningful SLIs, SLOs, and error budgets.
Prepare systems for OTT traffic spikes during releases and live events.
Improve caching, queuing, and backend performance in collaboration with backend teams.
Drive secure access, secrets management, and cloud cost optimisation.
Act as a technical partner to backend, product, and leadership teams.

Required Technical Skills
Cloud & Infrastructure
Strong experience with AWS (EC2, EKS/ECS, S3, RDS/DynamoDB, IAM)
Docker and Kubernetes (production environments)
Infrastructure as Code: Terraform (preferred)
CI/CD & Operations
GitHub Actions / GitLab CI / Jenkins
Blue-green / canary deployments and rollback strategies
Backend Awareness
Node.js (Express/NestJS-level understanding)
Golang (microservices, concurrency, profiling basics)
Observability
Prometheus, Grafana
Centralised logging (ELK / OpenSearch / Loki)
Distributed tracing (Jaeger / OpenTelemetry)
Data, Cache & Messaging
Redis (cache and/or queues)
Kafka / SQS / RabbitMQ (deep experience with at least one)
MongoDB (understanding of NoSQL databases; experience with MongoDB Atlas offerings is a bonus)
Security & Reliability
Secrets management (Vault / AWS Secrets Manager)
IAM and least-privilege access design
Production incident handling experience

Personality & Mindset
Strong ownership and accountability for platform reliability.
Comfortable identifying what is wrong and explaining how to fix it.
Calm and structured during incidents and high-pressure situations.
Clear communication with engineers and non-technical stakeholders.
Systems thinker who understands end-to-end impact, not just isolated components.
Pragmatic, data-driven, and collaborative.

Reach out to: Sushim / Shirin
Position Requirements
10+ years of work experience