Sr DevOps Engineer Job, Prakāshamnagar, Telangana, India (IT/Tech)

Location: Prakāshamnagar

Senior DevOps & Site Reliability Engineer (DevOps + SRE)
About the Role
We are seeking a highly experienced Senior DevOps & Site Reliability Engineer to support and scale our cloud-native, containerized IoT platform built on AWS. You will work closely with the Technical Manager to automate infrastructure, build CI/CD pipelines, manage large-scale deployments, and ensure the platform's reliability, security, and performance.
This role requires deep hands-on expertise in AWS, Docker/Kubernetes, serverless workflows, infrastructure automation, scripting (Python), and IoT-scale distributed systems reliability.

Key Responsibilities
DevOps Responsibilities

· Design, implement, and maintain CI/CD pipelines using GitHub Actions, AWS CodePipeline, or GitLab CI.

· Develop and automate deployment workflows following DevOps strategy and best practices.

· Manage Docker containerization, including multi-stage builds, optimization, and image security.

· Orchestrate containers using Kubernetes (EKS) or AWS ECS (Fargate/EC2).

· Manage and optimize ECR for image storage and versioning.

· Implement Infrastructure-as-Code using AWS CDK, Terraform, or CloudFormation.

· Build automated deployment workflows for backend, microservices, and IoT services.

· Support serverless architectures using AWS Lambda, Step Functions, EventBridge, etc.

· Implement secure secrets management using AWS IAM, KMS, and Secrets Manager.

· Handle configuration, environment management, and zero-downtime deployment strategies.
Site Reliability Engineering (SRE) Responsibilities

· Build and maintain monitoring, logging, and tracing pipelines using CloudWatch, Grafana, Prometheus, X-Ray, and OpenTelemetry.

· Define and implement SLIs, SLOs, error budgets, and reliability dashboards.

· Ensure high availability, resilience, and performance of all systems in production.

· Conduct incident management, root cause analysis, and post-incident reviews.

· Optimize cost, compute utilization, autoscaling policies, and failover strategies.

· Implement cloud reliability patterns: circuit breakers, retries, throttling, and canary and blue-green deployments.

· Manage production readiness, release safety, and operational excellence.

Required Skills & Qualifications

· 7+ years of experience in DevOps, SRE, or Cloud Infrastructure roles.

· Deep hands-on experience with:
o Docker containerization & orchestration
o Kubernetes (EKS) and/or AWS ECS
o AWS ECR (image lifecycle management)
o AWS IoT Core, Lambda, API Gateway, VPC, S3, IAM, CloudWatch

· Strong scripting experience; Python expertise preferred (Bash is a plus).

· Proficiency with GitHub for code management, automation, and CI/CD workflows.

· Strong background in Infrastructure-as-Code: AWS CDK, Terraform, or CloudFormation.

· Experience with reliability engineering frameworks, large-scale distributed systems, and HA/DR design.

· Knowledge of serverless computing and event-driven architectures.

· Strong understanding of cloud security, identity management, and compliance.

