Site Reliability Engineer
Listed on 2026-01-12
IT/Tech
Systems Engineer, Cloud Computing, IT Support, SRE/Site Reliability
Site Reliability Engineer (24x7 Operational Support) (w/m/d)
Changing Lives as the DACH region’s most trusted tech talent partner. Nearly a decade’s experience in technical recruitment.
We’re a fast-growing German AI business building products that help organisations make smarter decisions with data.
Our team is ambitious, highly technical, and genuinely collaborative. With an average age of around 30, the culture is modern, direct, and delivery-focused, with plenty of room to influence how things are built.
We’re looking for a Site Reliability Engineer to bridge software engineering and DevOps.
The role focuses on reliability, performance, and observability: helping us engineer scalable, efficient systems and ensuring our AI-powered services remain fast, stable, and measurable across environments.
What you’ll be doing
- Drive observability across the entire stack (OpenTelemetry, monitoring, tracing)
- Conduct load testing, profiling, and performance engineering
- Optimise application performance in collaboration with software engineering teams
- Review, improve, and automate DevOps processes to reduce friction and risk
- Improve system reliability through strong engineering principles and best practice
- Develop automation, tooling, and platform enhancements that increase operational maturity
- Help shape standards for metrics, alerting, and incident response across teams
- Kubernetes or OpenShift
- Programming experience in Python or TypeScript
- Docker
- CI systems (Jenkins, Ansible, GitHub Actions, or similar)
- CD systems (ArgoCD or similar)
- Observability stack experience (Grafana, OpenTelemetry)
- Observability suites such as Tempo, Dynatrace, or Instana
- S3 or compatible object storage
- Understanding of large language models (LLMs) and AI workloads
- Strong software engineering background
- Experience with DevOps practices and operational environments
- Passion for observability, performance engineering, testing, and system reliability
- Ability to understand, analyse, and visualise complex systems
- A practical mindset that prioritises automation and continuous improvement
- Degree in Computer Science, Information Technology, Software Engineering, Systems Engineering, or a related technical field (or equivalent practical experience)
- Fully remote role within a German AI business with strong momentum and clear product vision
- Average team age around 30 with a modern, collaborative engineering culture
- High-impact role that sits close to both product engineering and platform operations
- Work with modern tooling (OpenTelemetry, Kubernetes, ArgoCD, Grafana) and real challenges of scale
- Opportunity to influence standards, reliability strategy, and performance across the organisation
- Technical interview (SRE, observability, performance focus)
- Practical discussion or case (real-world scenarios, incident- and optimisation-based)
- Final meeting with engineering leadership
If you enjoy making systems faster, more stable, and more observable, and you want to work on AI services that customers rely on daily, we’d love to hear from you.
Apply now or reach out for a confidential discussion.
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology
Industries: Technology, Information and Media; Software Development
Germany