Senior DevOps Engineer Job, Chicago area, Illinois, USA, IT/Tech

We are seeking a highly experienced Senior DevOps Engineer (Production Support) with deep expertise in AWS, Kubernetes, CI/CD, and cloud-native platforms. This role will focus on operating, stabilizing, and continuously improving production environments, ensuring high availability, performance, and scalability of mission-critical applications.

The ideal candidate is a hands-on DevOps/SRE professional who thrives in fast-paced production environments and can automate, troubleshoot, and optimize distributed systems at scale.

You will work extensively with AWS, Kubernetes (Rancher), Jenkins, GitHub, Terraform, Kafka, Harness, and Python while partnering with engineering, platform, and product teams.

Key Responsibilities

Production Operations & Reliability

Provide L2/L3 production support for cloud-native applications running on AWS and Kubernetes.
Own incident triage, root cause analysis (RCA), and resolution for high-severity production issues.
Participate in on-call rotations and drive post-incident improvements.
Improve system reliability, resilience, and observability using SRE best practices.
Design and operate scalable AWS environments using:
EC2, EKS, VPC, ALB/NLB
S3, RDS, DynamoDB
IAM, CloudWatch, EventBridge
Optimize cloud cost, performance, and security posture.
Manage and operate Kubernetes clusters (Rancher-managed or EKS).
Troubleshoot:
Pod failures
Resource constraints
Improve:
Autoscaling strategies
Deployment reliability
Design and maintain CI/CD pipelines using:
Jenkins
GitHub Actions
Harness (preferred)
Implement:
Blue/green and canary deployments
GitOps workflows
Automated rollbacks

Infrastructure as Code & Automation

Build and maintain infrastructure using:
IaC modules
Platform templates
Deployment accelerators
Automate provisioning, scaling, and recovery workflows.

Kafka & Streaming Platforms

Design and manage Kafka infrastructure including:
Producers/consumers
Ensure:
High availability
Throughput optimization
Secure connectivity
Integrate Kafka with AWS and Kubernetes ecosystems.

Observability & Platform Health

Implement monitoring and alerting using:
CloudWatch / Splunk Observability
Define:
SLIs/SLOs
Alerting thresholds
Runbooks
Proactively identify bottlenecks and prevent outages.

Security & Compliance

Secrets management
IAM least privilege
Container scanning
Supply chain security
Ensure infrastructure adheres to security and compliance standards.
Partner with development teams to:
Reduce operational toil
Increase automation coverage
Drive:
Developer experience improvements
Operational excellence initiatives

Qualifications

Experience

4-10 years in DevOps / SRE / Production Support roles
Strong experience managing production-grade cloud environments
Proven track record in live incident management

Technical Skills

Must Have

Splunk
Terraform
Jenkins / GitHub
Kafka
Python or Shell scripting

Good to Have

Harness CI/CD
Observability tools (New Relic, Datadog, Prometheus)

Soft Skills

Strong troubleshooting and debugging mindset
Ability to work in high-pressure production environments
Ownership-driven and automation-first approach

Mandatory

Overall DevOps, AWS, Kubernetes/Helm, Terraform/Ansible, Jenkins/Harness, Python/Groovy scripting, Linux, Splunk, Production Support
