
SRE DevOps Engineer

Job in Overland Park, Johnson County, Kansas, 66213, USA
Listing for: Highbrow LLC
Full Time position
Listed on 2026-02-28
Job specializations:
  • IT/Tech
    Systems Engineer, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: 80,000 to 100,000 USD per year
Job Description

SRE DevOps Engineer

Location:

Overland Park, KS / Atlanta, GA / Frisco, TX (Onsite)

Qualifications
  • 4–9 years in SRE/DevOps/Systems Engineering as a Senior or Principal Engineer.
  • Strong hands‑on experience with Kubernetes, container orchestration, and API management.
  • Working knowledge of WAFs, network security, and database technologies (SQL/NoSQL).
  • Proficient in automation and scripting (Python, Go, Ansible, Terraform, etc.).
  • Strong observability/monitoring experience.
  • Experience with CI/CD pipelines, GitOps, and infrastructure as code.
  • Solid problem‑solving and collaboration skills.
Job Responsibilities
  • Resolve escalated incidents across Kubernetes, API Proxy, WAF, DBs, and infra platforms.
  • Design and improve runbooks, automating manual steps wherever possible.
  • Lead and contribute to building self‑healing systems and self‑service tooling for users.
  • Analyze incident trends and propose improvements in monitoring, capacity, and reliability.
  • Collaborate with engineering teams on deployment, upgrades, and performance optimization.
  • Conduct post‑mortems, document root cause analyses (RCAs), and ensure lessons learned are captured.
  • Mentor and coach L1 engineers.
Skills

Mandatory Skills (Must‑Have)

1. Advanced Incident Troubleshooting & Resolution

Expectation:
Diagnose and resolve escalated incidents that L1 cannot handle, often across multiple layers (infrastructure, application, network).

Example:
For an API outage, determine whether the root cause lies in Kubernetes pod networking, an API gateway misconfiguration, or backend DB latency, and apply the fix.

2. Kubernetes & Container Orchestration Expertise

Expectation:
Comfortable with deployments, scaling, networking, and debugging cluster‑level issues.

Example:
Troubleshoot why pods are pending by checking node capacity, taints/tolerations, and cluster autoscaler logs.
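As an illustration, a minimal sketch of the first step of that check: listing each Pending pod with the scheduler's explanation. It assumes the standard Pod status fields returned by `kubectl get pods -o json` (`status.phase` and a `PodScheduled` condition whose message reads like "0/3 nodes are available: ..."). The function name is hypothetical.

```python
def pending_reasons(pod_list: dict) -> dict[str, str]:
    """Map each Pending pod name to the scheduler's message, if one is reported.

    `pod_list` is the parsed JSON of `kubectl get pods -o json` (a v1 PodList).
    """
    reasons = {}
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") != "Pending":
            continue
        name = pod["metadata"]["name"]
        message = "no scheduling condition reported yet"
        for cond in status.get("conditions", []):
            # The scheduler marks unschedulable pods with PodScheduled=False,
            # usually with reason "Unschedulable" and a node-by-node summary.
            if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
                message = cond.get("message") or cond.get("reason", message)
        reasons[name] = message
    return reasons
```

The message typically points at the next thing to inspect. "Insufficient cpu" suggests node capacity or autoscaler limits, and "untolerated taint" suggests taints/tolerations. The input can come from `json.loads` on the output of `kubectl get pods -n <namespace> -o json`.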

3. Automation & Scripting (Python, Go, Bash, Ansible, Terraform)

Expectation:
Write scripts and automation to reduce manual toil, enhance monitoring, and improve incident resolution speed.

Example:
Develop a Python script to automatically collect pod and system logs when a service crashes.
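A rough sketch of such a script follows. It assumes `kubectl` is on the PATH and already authenticated. The bundle layout and function names are hypothetical. The kubectl runner is injectable so the collection logic can be exercised without a cluster. `--previous` fetches logs from the crashed (prior) container instance, which is usually the most useful file after a crash.

```python
import subprocess
from pathlib import Path
from typing import Callable

Runner = Callable[[list[str]], str]


def kubectl(args: list[str]) -> str:
    """Run kubectl with the given arguments and return stdout (raises on failure)."""
    result = subprocess.run(["kubectl", *args], capture_output=True, text=True, check=True)
    return result.stdout


def collect_crash_bundle(pod: str, namespace: str, out_dir: Path,
                         run: Runner = kubectl) -> list[Path]:
    """Save describe output, current/previous logs, and events for one pod."""
    commands = {
        "describe.txt": ["describe", "pod", pod, "-n", namespace],
        "logs-current.txt": ["logs", pod, "-n", namespace, "--all-containers"],
        "logs-previous.txt": ["logs", pod, "-n", namespace, "--all-containers", "--previous"],
        "events.txt": ["get", "events", "-n", namespace,
                       "--field-selector", f"involvedObject.name={pod}"],
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, args in commands.items():
        try:
            text = run(args)
        except subprocess.CalledProcessError as exc:
            # Keep going: a missing previous container should not abort the bundle.
            text = f"command failed: kubectl {' '.join(args)}\n{exc.stderr or ''}"
        path = out_dir / filename
        path.write_text(text)
        written.append(path)
    return written
```

In practice this would be triggered by an alert or a CrashLoopBackOff watcher. The trigger itself is out of scope for this sketch.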

4. Observability & Monitoring Tooling

Expectation:
Deep understanding of monitoring, alerting, tracing, and logging systems.

Example:
Build Prometheus alert rules to detect DB query spikes; configure Grafana dashboards for API latency.

5. CI/CD & Infrastructure as Code (IaC)

Expectation:
Familiarity with GitOps workflows, CI/CD pipelines, and infrastructure provisioning.

Example:
Enhance a Jenkins pipeline to run automated smoke tests before promoting Kubernetes deployments.
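One way such a smoke-test gate could look is sketched below. It is a small Python check that a pipeline stage runs against the freshly deployed service before promotion. The endpoint paths and function names are hypothetical. The design relies on a Jenkins `sh` step failing the stage when the command exits non-zero.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:  # 4xx/5xx responses raise HTTPError
        return exc.code
    except OSError:  # URLError, timeouts, refused connections
        return 0


def smoke_test(base_url: str, paths: Iterable[str],
               fetch: Callable[[str], int] = http_status) -> list[str]:
    """Return a description of every path that did not answer with 2xx."""
    failures = []
    for path in paths:
        url = base_url.rstrip("/") + path
        status = fetch(url)
        if not 200 <= status < 300:
            failures.append(f"{url} -> {status}")
    return failures


def main(base_url: str, paths: Iterable[str],
         fetch: Callable[[str], int] = http_status) -> int:
    """Exit code for the pipeline: 0 to promote, 1 to stop."""
    failures = smoke_test(base_url, paths, fetch)
    for failure in failures:
        print(f"SMOKE TEST FAILED: {failure}")
    return 1 if failures else 0
```

A script wrapping this would end with `sys.exit(main(...))`, so a failing endpoint blocks promotion.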

6. Database Troubleshooting (SQL & NoSQL)

Expectation:
Identify performance bottlenecks, connection issues, and basic tuning opportunities.

Example:
Run queries to detect slow‑running SQL statements causing latency in an application.
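On PostgreSQL this is often done server-side through the `pg_stat_statements` extension. So that it runs standalone, the sketch below swaps in a different technique: client-side timing of each statement against an in-memory `sqlite3` database. Statements slower than a chosen threshold are recorded. The class name and threshold are hypothetical.

```python
import sqlite3
import time
from dataclasses import dataclass


@dataclass
class SlowQuery:
    sql: str
    seconds: float


class TimedConnection:
    """Wrap a sqlite3 connection and record statements slower than a threshold."""

    def __init__(self, conn: sqlite3.Connection, threshold_s: float) -> None:
        self.conn = conn
        self.threshold_s = threshold_s
        self.slow: list[SlowQuery] = []

    def execute(self, sql: str, params: tuple = ()) -> list[tuple]:
        start = time.perf_counter()
        rows = self.conn.execute(sql, params).fetchall()  # include fetch time
        elapsed = time.perf_counter() - start
        if elapsed >= self.threshold_s:
            self.slow.append(SlowQuery(sql, elapsed))
        return rows

    def report(self, top: int = 5) -> list[SlowQuery]:
        """Return the slowest recorded statements, slowest first."""
        return sorted(self.slow, key=lambda q: q.seconds, reverse=True)[:top]
```

Client-side timing includes network and fetch time. That helps when tracing application latency, but it cannot separate lock waits from execution the way server-side statistics can.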

7. Incident Management & RCA

Expectation:
Act as incident commander for escalated issues, lead bridge calls, and produce Root Cause Analyses.

Example:
After a WAF misconfiguration causes downtime, lead the investigation, document the timeline, and propose preventive actions.
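A sketch of how the timeline and two common RCA metrics could be recorded follows: time to detect (start to detection) and time to mitigate (detection to mitigation). The field names are hypothetical, and all timestamps are assumed to be naive UTC.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime    # when impact began (e.g. the bad WAF rule went live)
    detected: datetime   # when an alert fired or a user reported it
    mitigated: datetime  # when impact ended
    events: list[tuple[datetime, str]] = field(default_factory=list)

    def log(self, when: datetime, note: str) -> None:
        self.events.append((when, note))

    def time_to_detect(self) -> timedelta:
        return self.detected - self.started

    def time_to_mitigate(self) -> timedelta:
        return self.mitigated - self.detected

    def timeline(self) -> str:
        """Chronological timeline for the RCA document (timestamps assumed UTC)."""
        return "\n".join(f"{t:%Y-%m-%d %H:%M} UTC  {note}" for t, note in sorted(self.events))
```

Events can be logged out of order during a bridge call. `timeline()` sorts them when the RCA is written.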

8. Mentorship & Runbook Improvement

Expectation:
Coach L1 engineers, refine runbooks, and introduce new automated workflows.

Example:
Update a runbook to replace manual steps with automated Kubernetes log collection.

Preferred Skills (Nice-to-Have)

1. Cloud Platform Engineering (AWS, Azure, GCP)

Expectation:
Hands‑on skills in provisioning, scaling, and securing cloud workloads.

Example:
Diagnose why an AWS ALB is misrouting traffic after a deployment.

2. Security & WAF Management

Expectation:
Understand WAF rules, common attacks (SQL injection, XSS), and how to apply fixes.

Example:
Investigate false positives in WAF logs and adjust rule sets with security teams.
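A minimal sketch of the first pass of such an investigation counts which rules block which URIs, so that a rule firing heavily on a legitimate path stands out. It assumes newline-delimited JSON records using AWS WAF-style field names (`action`, `terminatingRuleId`, `httpRequest.uri`). Other WAFs will need different field names.

```python
import json
from collections import Counter
from typing import Iterable


def top_blocking_rules(lines: Iterable[str], n: int = 5) -> list[tuple[tuple[str, str], int]]:
    """Count BLOCK actions per (rule, URI) pair and return the n most frequent."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("action") != "BLOCK":
            continue
        rule = record.get("terminatingRuleId", "unknown")
        uri = record.get("httpRequest", {}).get("uri", "?")
        counts[(rule, uri)] += 1
    return counts.most_common(n)
```

A SQL-injection rule repeatedly blocking a search endpoint, for example, is a candidate false positive to review with the security team before narrowing the rule.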

3. Capacity & Performance Engineering

Expectation:
Anticipate scaling needs, tune resource utilization, and propose optimizations.

Example:
Identify that a Kubernetes deployment is CPU‑throttled and adjust HPA (Horizontal Pod Autoscaler) configs.
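The arithmetic behind this example can be sketched as follows. The throttling ratio is read from a cgroup `cpu.stat` file (`nr_throttled / nr_periods`). The replica estimate follows the documented HPA rule `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, which skips scaling when the ratio is within a tolerance (0.1 by default). The sketch ignores min/max replicas and stabilization windows. The function names are hypothetical.

```python
import math


def throttle_ratio(cpu_stat: str) -> float:
    """Fraction of CFS periods in which the container was throttled."""
    stats = {}
    for line in cpu_stat.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, tolerance: float = 0.1) -> int:
    """Core HPA scaling rule (no min/max bounds or stabilization applied)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no change
    return math.ceil(current_replicas * ratio)
```

A high throttle ratio with modest average utilization often points at CPU limits that are too tight rather than too few replicas. HPA utilization targets are measured against CPU requests, not limits.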

4. Automation Platform Integration (AIOps, ChatOps)

Expectation:
Integrate AI/ML‑powered tools for anomaly detection and auto‑remediation.

Example:
Implement a ChatOps bot that runs predefined Kubernetes troubleshooting commands in Slack.
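The safety-critical part of such a bot is mapping chat text onto a fixed allowlist of commands. A sketch under stated assumptions follows. The `!k8s` prefix and the command set are hypothetical, and the Slack wiring is omitted. Namespaces are checked against the Kubernetes DNS-1123 label rules.

```python
import re
import shlex

# Hypothetical allowlist: only read-only kubectl commands the bot may run.
ALLOWED = {
    "pods": lambda ns: ["kubectl", "get", "pods", "-n", ns, "-o", "wide"],
    "events": lambda ns: ["kubectl", "get", "events", "-n", ns, "--sort-by=.lastTimestamp"],
    "top": lambda ns: ["kubectl", "top", "pods", "-n", ns],
}

# Namespace names are DNS-1123 labels: lowercase alphanumerics and '-', max 63 chars.
NAMESPACE_RE = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def parse_command(message: str, prefix: str = "!k8s") -> list[str]:
    """Turn a chat message like '!k8s pods payments' into a kubectl argv list."""
    tokens = shlex.split(message)
    if len(tokens) != 3 or tokens[0] != prefix:
        raise ValueError(f"usage: {prefix} <{'|'.join(ALLOWED)}> <namespace>")
    action, namespace = tokens[1], tokens[2]
    if action not in ALLOWED:
        raise ValueError(f"unsupported action: {action}")
    if len(namespace) > 63 or not NAMESPACE_RE.fullmatch(namespace):
        raise ValueError(f"invalid namespace: {namespace}")
    return ALLOWED[action](namespace)
```

Returning an argument list, rather than a shell string, lets the bot pass it straight to `subprocess.run` without `shell=True`. As a result, chat input is never interpreted by a shell.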

5. Cross‑Platform Expertise (Hybrid Infra)

Expectation:
Experience supporting both on‑prem and cloud environments seamlessly.

Example:
Compare latency patterns between on‑prem DBs and cloud‑hosted APIs to identify bottlenecks.
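A comparison like this is usually done on percentiles rather than averages, because tail latency is where bottlenecks show. Below is a minimal sketch using the standard library's `statistics.quantiles`. The sample data in the usage note is illustrative only.

```python
from statistics import quantiles


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """p50/p95/p99 of a latency sample (needs at least two samples)."""
    cuts = quantiles(samples_ms, n=100, method="inclusive")  # 99 cut points
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def compare(on_prem_ms: list[float], cloud_ms: list[float]) -> dict[str, float]:
    """Cloud minus on-prem latency at each percentile (positive = cloud slower)."""
    a, b = latency_summary(on_prem_ms), latency_summary(cloud_ms)
    return {k: b[k] - a[k] for k in a}
```

For example, a difference that is roughly constant across percentiles suggests a fixed network hop between environments, while a gap that widens only at p99 points more toward contention or queuing.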
