Site Reliability Engineer Job, Germany (IT/Tech)

Location: Germany

Site Reliability Engineer (24x7 Operational Support) (w/m/d)

Changing lives as the DACH region’s most-trusted tech talent partner, with nearly a decade’s experience in technical recruitment.

We’re a fast-growing German AI business building products that help organisations make smarter decisions with data.

Our team is ambitious, highly technical, and genuinely collaborative. With an average age of around 30, the culture is modern, direct, and delivery-focused, with plenty of room to influence how things are built.

We’re looking for a Site Reliability Engineer to bridge software engineering and DevOps.

This role focuses on reliability, performance, and observability: helping us engineer scalable, efficient systems and ensuring our AI-powered services remain fast, stable, and measurable across environments.

What you’ll be doing

Drive observability across the entire stack (OpenTelemetry, monitoring, tracing)
Conduct load testing, profiling, and performance engineering
Optimise application performance in collaboration with software engineering teams
Review, improve, and automate DevOps processes to reduce friction and risk
Improve system reliability through strong engineering principles and best practice
Develop automation, tooling, and platform enhancements that increase operational maturity
Help shape standards for metrics, alerting, and incident response across teams

Must have skills

Kubernetes or OpenShift
Programming experience in Python or TypeScript
Docker
CI systems (Jenkins, Ansible, GitHub Actions, or similar)
CD systems (ArgoCD or similar)
Observability stack experience (Grafana, OpenTelemetry)
Observability suites such as Tempo, Dynatrace, or Instana

Nice to have skills

S3 or compatible object storage
Understanding of large language models (LLMs) and AI workloads
Strong software engineering background
Experience with DevOps practices and operational environments
Passion for observability, performance engineering, testing, and system reliability
Ability to understand, analyse, and visualise complex systems
A practical mindset that prioritises automation and continuous improvement

Education

Degree in Computer Science, Information Technology, Software Engineering, Systems Engineering, or a related technical field (or equivalent practical experience)

What we offer

Fully remote role within a German AI business with strong momentum and clear product vision
Average team age of around 30 with a modern, collaborative engineering culture
High-impact role that sits close to both product engineering and platform operations
Work with modern tooling (OpenTelemetry, Kubernetes, ArgoCD, Grafana) and real scale challenges
Opportunity to influence standards, reliability strategy, and performance across the organisation

Interview process (example)

Technical interview (SRE, observability, performance focus)
Practical discussion or case (real-world scenarios based on incidents and optimisation)
Final meeting with engineering leadership

If you enjoy making systems faster, more stable, and more observable, and you want to work on AI services that customers rely on daily, we’d love to hear from you.

Apply now or reach out for a confidential discussion.

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

Information Technology

Industries

Technology, Information and Media and Software Development
