
Senior Engineer

Job in Atlanta, Fulton County, Georgia, 30383, USA
Listing for: Gokool Digital
Full Time position
Listed on 2026-03-05
Job specializations:
  • IT/Tech
    Systems Engineer, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: 100000 - 125000 USD Yearly
Job Description & How to Apply Below

Location:
Overland Park, KS / Atlanta, GA / Frisco, TX (Onsite)

Requirements

Qualifications:

  • 4–9 years in SRE/DevOps/Systems Engineering as Senior or Principal Engineer
  • Strong hands‑on experience with Kubernetes, container orchestration, and API management.
  • Working knowledge of WAFs, networking security, and database technologies (SQL/NoSQL).
  • Proficient in automation and scripting (Python, Go, Ansible, Terraform, etc.)
  • Strong observability/monitoring experience.
  • Experience with CI/CD pipelines, GitOps, and infrastructure as code.
  • Solid problem‑solving and collaboration skills.

Job responsibilities:

  • Resolve escalated incidents across Kubernetes, API Proxy, WAF, DBs, and infra platforms.
  • Design and improve runbooks, automating manual steps wherever possible.
  • Lead and contribute to building self‑healing systems and self‑service tooling for users.
  • Analyze incident trends, propose improvements in monitoring, capacity, and reliability.
  • Collaborate with engineering teams on deployment, upgrades, and performance optimization.
  • Conduct postmortems, document RCA, and ensure learning is captured.
  • Mentor and coach L1 engineers.

Skills

Mandatory Skills (Must-Have)

  • Advanced Incident Troubleshooting & Resolution

Expectation:
Diagnose and resolve escalated incidents that L1 cannot handle, often across multiple layers (infrastructure, application, network).

Example:
For an API outage, identify whether the root cause is in Kubernetes pod networking, API gateway misconfiguration, or back-end DB latency, and apply fixes.

Expectation:
Comfortable with deployments, scaling, networking, and debugging cluster-level issues.

Example:
Troubleshoot why pods are pending by checking node capacity, taints/tolerations, and Cluster Autoscaler logs.

  • Automation & Scripting (Python, Go, Bash, Ansible, Terraform)

Expectation:
Write scripts and automation to reduce manual toil, enhance monitoring, and improve incident resolution speed.

Example:
Develop a Python script to automatically collect pod and system logs when a service crashes.

  • Observability & Monitoring Tooling

Expectation:
Deep understanding of monitoring, alerting, tracing, and logging systems.

Example:
Build Prometheus alert rules to detect DB query spikes; configure Grafana dashboards for API latency.

  • CI/CD & Infrastructure as Code (IaC)

Expectation:
Familiarity with GitOps workflows, CI/CD pipelines, and infrastructure provisioning.

Example:
Enhance a Jenkins pipeline to add automated smoke tests before promoting Kubernetes deployments.

  • Database Troubleshooting (SQL & NoSQL)

Expectation:
Identify performance bottlenecks, connection issues, and basic tuning opportunities.

Example:
Run queries to detect slow-running SQL statements causing latency in an application.

Expectation:
Act as incident commander for escalated issues, lead bridge calls, and produce Root Cause Analyses.

Example:
After a WAF misconfiguration causes downtime, lead the investigation, document the timeline, and propose preventive actions.

Expectation:
Coach L1 engineers, refine runbooks, and introduce new automated workflows.

Example:
Update a runbook to add automated Kubernetes log collection instead of manual steps.

Preferred Skills (Nice-to-Have)

Expectation:
Hands-on skills in provisioning, scaling, and securing cloud workloads.

Example:
Diagnose why an AWS ALB is misrouting traffic after a deployment.

  • Security & WAF Management

Expectation:
Understand WAF rules, common attacks (SQL injection, XSS), and how to apply fixes.

Example:
Investigate false positives in WAF logs and adjust rule sets with security teams.

  • Capacity & Performance Engineering

Expectation:
Anticipate scaling needs, tune resource utilization, and propose optimizations.

Example:
Identify that a Kubernetes deployment is CPU‑throttled and adjust HPA (Horizontal Pod Autoscaler) configs.

  • Automation Platform Integration (AIOps, ChatOps)

Expectation:
Integrate AI/ML-powered tools for anomaly detection and auto‑remediation.

Example:
Implement a ChatOps bot that runs predefined Kubernetes troubleshooting commands in Slack.

Expectation:
Experience supporting both on‑prem and cloud environments seamlessly.

Example:
Compare latency patterns between on‑prem DBs and cloud‑hosted APIs to identify bottlenecks.

Qualifications:

  • 7+ years in SRE/DevOps/Systems Engineering as Senior or Principal Engineer
  • Strong hands‑on experience with Kubernetes, container orchestration, and API management.
  • Working knowledge of WAFs, networking security, and database technologies (SQL/NoSQL).
  • Proficient in automation and scripting (Python, Go, Ansible, Terraform, etc.)
  • Strong observability/monitoring experience.
  • Experience with CI/CD pipelines, GitOps, and infrastructure as code.
  • Solid problem‑solving and collaboration skills.
Position Requirements
10+ Years work experience