Sr DevOps Engineer
Job in
500016, Prakāshamnagar, Telangana, India
Listed on 2026-02-03
Listing for:
Confidential
Full Time position
Job specializations:
IT/Tech
Cloud Computing, SRE/Site Reliability
Job Description & How to Apply Below
Senior DevOps & Site Reliability Engineer (DevOps + SRE)
About the Role
We are seeking a highly experienced Senior DevOps & Site Reliability Engineer to support and scale our cloud-native, containerized IoT platform built on AWS. You will work closely with the Technical Manager to automate infrastructure, build CI/CD pipelines, manage large-scale deployments, and ensure the platform's reliability, security, and performance.
This role requires deep hands-on expertise in AWS, Docker/Kubernetes, serverless workflows, infrastructure automation, scripting (Python), and IoT-scale distributed systems reliability.
Key Responsibilities
DevOps Responsibilities
· Design, implement, and maintain CI/CD pipelines using GitHub Actions, AWS CodePipeline, or GitLab CI.
· Develop and automate deployment workflows following DevOps strategy and best practices.
· Manage Docker containerization, including multi-stage builds, optimization, and image security.
· Orchestrate containers using Kubernetes (EKS) or AWS ECS (Fargate/EC2).
· Manage and optimize ECR for image storage and versioning.
· Implement Infrastructure-as-Code using AWS CDK, Terraform, or CloudFormation.
· Build automated workflows for backend, microservices, and IoT services deployment.
· Support serverless architectures using AWS Lambda, Step Functions, EventBridge, etc.
· Implement secure secrets management using AWS IAM, KMS, and Secrets Manager.
· Handle configuration, environment management, and zero-downtime deployment strategies.
Site Reliability Engineering (SRE) Responsibilities
· Build and maintain monitoring, logging, and tracing pipelines using CloudWatch, Grafana, Prometheus, X-Ray, and OpenTelemetry.
· Define and implement SLIs, SLOs, error budgets, and reliability dashboards.
· Ensure high availability, resilience, and performance of all production systems.
· Conduct incident management, root cause analysis, and post-incident reviews.
· Optimize cost, compute utilization, autoscaling policies, and failover strategies.
· Implement cloud reliability patterns: circuit breakers, retries, throttling, and canary and blue-green deployments.
· Manage production readiness, release safety, and operational excellence.
Required Skills & Qualifications
· 7+ years of experience in DevOps, SRE, or Cloud Infrastructure roles.
· Deep hands-on experience with:
o Docker containerization & orchestration
o Kubernetes (EKS) and/or AWS ECS
o AWS ECR (image lifecycle management)
o AWS IoT Core, Lambda, API Gateway, VPC, S3, IAM, CloudWatch
· Strong scripting experience; Python expertise preferred (Bash is a plus).
· Proficiency with GitHub for code management, automation, and CI/CD workflows.
· Strong background in Infrastructure-as-Code: AWS CDK, Terraform, or CloudFormation.
· Experience with reliability engineering frameworks, large-scale distributed systems, and HA/DR design.
· Knowledge of serverless computing and event-driven architectures.
· Strong understanding of cloud security, identity management, and compliance.