
AWS Site Reliability Engineer (Data Platform)

Job in Greater London, London, Greater London, W1B, England, UK
Listing for: SKILLFINDER INTERNATIONAL
Contract position
Listed on 2026-02-28
Job specializations:
  • IT/Tech
    Cloud Computing, SRE/Site Reliability, Systems Engineer, AWS
Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 GBP yearly
Job Description & How to Apply Below
Position: AWS Site Reliability Engineer (Data Platform)
Location: Greater London

Role Summary

AWS Site Reliability Engineer (Data Platform)

Fully onsite in London or Glasgow

12-month contract, inside IR35

We are seeking an AWS Site Reliability Engineer (SRE) to support, scale, and improve a cloud-native data platform built on AWS, Snowflake, and Databricks. This role focuses on enhancing platform reliability through automation, disaster recovery testing, resiliency engineering, observability best practices, and proactive SLO/SLI/SLA management.

Responsibilities
  • Design, build, and maintain automation for infrastructure provisioning, platform operations, and incident response using Infrastructure as Code (IaC) and CI/CD.
  • Lead resiliency and disaster recovery initiatives, including scheduled DR drills, fault injection, and validation of recovery processes across AWS and data platform components.
  • Define, implement, and manage SLIs, SLOs, and SLAs for critical data pipelines and platform services; leverage error budgets to guide reliability-focused improvements.
  • Build and operate end-to-end observability solutions (metrics, logs, traces, alerts) for AWS services, Snowflake, and Databricks workloads.
  • Partner with data engineering and platform teams to embed reliability-by-design into architectural decisions and delivery practices.
  • Perform root cause analysis (RCA) and drive continuous improvement to reduce operational toil and enhance platform availability and performance.
  • Own and drive resolution of platform-related incidents and service requests, ensuring efficient operational support while identifying and automating recurring issues.
Required Skills & Experience
  • Strong practical understanding of SRE principles, including SLO/SLI/SLA design and error budget management.
  • Solid hands-on experience with AWS services (e.g., EC2, S3, IAM, VPC, CloudWatch) in production environments.
  • Experience with observability tooling, monitoring, and alerting best practices.
  • Proficiency in automation and IaC using tools such as Terraform, CloudFormation, or CDK.
  • Scripting experience with Python and Bash.
  • Exposure to modern data platforms such as Snowflake and/or Databricks.
Nice to Have
  • Experience running DR tests, chaos engineering activities, or resiliency testing in cloud environments.
  • Familiarity with CI/CD pipelines and GitOps workflows.
  • Background supporting large-scale data or analytics platforms.
Technology Skill Level
  • Amazon Web Services (AWS): Intermediate (P2)