SRE Job, Belfast area, Northern Ireland, UK, IT/Tech

Site Reliability Engineer

We're working with a global technology consultancy that designs, builds, and supports modern software platforms for enterprise customers worldwide. They partner closely with clients to deliver reliable, scalable, cloud-native solutions.

The Role

As an SRE, you'll play a key role in ensuring the availability, performance, and scalability of production systems, supporting customers across the EMEA region. You'll also help build, mature, and enhance the SRE function. This is a hands‑on, technical role focused on reliability, automation, and operational excellence across a distributed, cloud-based platform.

Key Responsibilities

Platform Reliability:
Deploy, operate, and improve Kubernetes clusters across multiple cloud environments.
Service Performance:
Design and implement processes to enhance system reliability, availability, and scalability.
CI/CD Enablement:
Build and optimise CI/CD pipelines to support safe, repeatable deployments.
Observability & Incidents:
Own monitoring, alerting, and incident response to minimise downtime and speed recovery.
Root Cause Analysis:
Lead post‑incident reviews and implement long‑term preventative improvements.
Automation:
Reduce operational toil through automation and performance optimisation.
On‑Call:
Participate in weekday coverage and a once‑monthly weekend rota.

Collaboration & Stakeholder Engagement

Work closely with engineering, infrastructure, and product teams to embed SRE best practices.
Advocate for reliability, resilience, and operational excellence across teams.
Collaborate with a globally distributed engineering function.
Engage directly with customers to resolve incidents and improve user experience.

Skills & Experience

Proven experience (5+ years) as an SRE or in a similar role, supporting complex distributed systems.
Strong Kubernetes experience (AKS, EKS, GKE, or similar).
Hands‑on with observability tools such as Prometheus, Grafana, Kibana, Vector, or Superset.
Experience with at least one major cloud platform: AWS, Azure, GCP, or Linode.
SQL database experience (PostgreSQL beneficial but not essential).
Proficiency in Python, Go, or Rust.
Strong Linux expertise, including performance tuning and troubleshooting.
Excellent communication skills, with the ability to work effectively with engineers and customers.

Please apply now if you meet the above criteria, or contact Andrew Harrison directly.


