Senior Kafka Platform Engineer
Job in Chicago, Cook County, Illinois, 60290, USA
Listed on 2026-03-01
Listing for: Selby Jennings
Full Time position
Job specializations:
- IT/Tech
- Cloud Computing, Systems Engineer, SRE/Site Reliability, Data Engineer
Overview
We're looking for a Senior Kafka Platform Engineer to design, automate, and scale a mission-critical event-streaming platform. In this role, you'll own the core Kafka environment—from brokers and storage through security, automation, and observability—while driving modern, Kubernetes-based deployment patterns. You'll build self-service tooling, define reliability standards, and collaborate closely with engineering teams to ensure robust, performant, and secure streaming capabilities. The ideal candidate brings deep Kafka expertise, strong automation skills, and a cloud-native engineering mindset.
Key Responsibilities
- Kafka Platform Ownership: Architect, deploy, and operate production-grade Kafka clusters (self-managed or cloud-hosted), overseeing upgrades, scaling strategies, capacity modeling, and multi-AZ/region resiliency.
- Kubernetes & Automation: Run Kafka on Kubernetes using Operators, Helm, and GitOps; build automation frameworks and guardrails using IaC to support repeatable, compliant, zero-downtime deployments.
- Ecosystem Services: Manage and optimize Kafka Connect, Schema Registry, and replication technologies (MirrorMaker 2, Cluster Linking); define connector standards and enable self-service provisioning.
- Reliability Engineering: Establish SLOs, own incident response, maintain runbooks, conduct postmortems, and develop automated remediation and resilience patterns.
- Observability: Build and maintain monitoring for metrics, logs, traces, consumer lag, partition health, and capacity insights using tools such as Prometheus, Grafana, Burrow, Cruise Control, or OpenTelemetry.
- Security & Compliance: Implement encryption, authentication, authorization, secrets management, network policies, and audit controls for secure data-in-motion.
- Streaming Best Practices: Guide application teams on topic strategy, partitioning, retention and compaction tuning, idempotency, ordering guarantees, schema evolution, DLQs, and exactly-once semantics.
- Cross-Functional Collaboration: Partner with application, data, platform, and SRE teams to provide tooling, documentation, enablement, and architectural guidance.
- Technical Leadership: Mentor engineers, help shape platform strategy, and contribute to long-term standards and roadmap decisions.
- Kafka Expertise: Extensive hands-on experience operating Kafka in production environments at scale, including brokers, controllers, replication, ISR dynamics, rebalancing, storage tiers, and failure recovery.
- Kubernetes Skills: Strong background operating stateful systems on Kubernetes using Operators, Helm, CRDs, and cloud-native patterns.
- Automation: Proficiency with IaC tools (e.g., Terraform), GitOps workflows (Argo CD or Flux), and CI/CD tooling for full lifecycle automation.
- Programming: Strong scripting and development experience in Python, Go, or Java, plus solid Bash and Linux fundamentals (networking, file systems, JVM tuning).
- Observability & Tuning: Expertise in Kafka performance troubleshooting, capacity planning, monitoring stacks, and alerting workflows.
- Security: Hands-on experience with TLS/mTLS, SASL/OAuth, ACL/RBAC, and secret-management solutions such as Vault.
- Ecosystem Components: Experience with Kafka Connect, Schema Registry, MirrorMaker 2/Cluster Linking; familiarity with Cruise Control.
- Cloud: Knowledge of AWS, Azure, or GCP networking, IAM, and managed streaming services such as Confluent Cloud or AWS MSK.
- Operational Excellence: Demonstrated ability to write runbooks, lead incidents, and drive platform improvements.
- Experience with stream-processing frameworks (Kafka Streams, Flink, Spark Structured Streaming).
- Background running Strimzi or Confluent for Kubernetes in production.
- Knowledge of CDC technologies and connector operations at scale (e.g., Debezium).
- Experience designing multi-region architectures, cluster-linking strategies, and disaster-recovery processes.
- Chicago, IL
- New York, NY
Position Requirements
10+ years of work experience