
Senior Site Reliability Engineer, Observability

Job in Toronto, Ontario, C6A, Canada
Listing for: Chainlink Labs
Full Time position
Listed on 2026-01-11
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, SRE/Site Reliability, Blockchain / Web3
Salary/Wage Range or Industry Benchmark: 125,000 - 150,000 CAD yearly
Job Description & How to Apply Below


Chainlink is the industry-standard oracle platform bringing the capital markets onchain and powering the majority of decentralized finance (DeFi). The Chainlink stack provides the essential data, interoperability, compliance, and privacy standards needed to power advanced blockchain use cases for institutional tokenized assets, lending, payments, stablecoins, and more. Since inventing decentralized oracle networks, Chainlink has enabled tens of trillions in transaction value and now secures the vast majority of DeFi.

Many of the world’s largest financial services institutions have adopted Chainlink’s standards and infrastructure, including Swift, Euroclear, Mastercard, Fidelity International, UBS, S&P Dow Jones Indices, FTSE Russell, WisdomTree, and ANZ, as well as top protocols such as Aave, Lido, and GMX.

The Observability Team enables Chainlink development and empowers engineers to continue building and supporting crucial products and services that have a profound impact on the blockchain industry. Reliability is vital to the success of our company. As a Senior SRE, you will help us accelerate and enable other engineering teams by increasing self-service and decreasing cognitive load. This role is ideal for someone who has a strong DevOps mentality, is passionate about building and maintaining a mature GitOps environment, and has experience focusing on observability.

Your Impact
  • Build and orchestrate a modern, OpenTelemetry (OTel)-based observability platform.
  • Support multiple telemetry types, such as metrics, logs, and traces.
  • Define and support modern observability governance and address observability problems at scale.
  • Ensure reliability, security, and performance exceed our defined SLAs.
  • Work with engineers from across the company to troubleshoot issues, deploy new products and services, and increase velocity while decreasing cognitive load.
  • Lead the design and deployment of monitoring and observability services that detect issues and alert the team when action is needed.
  • Ingest, aggregate, transform, and utilize data from a multitude of sources in our real-time data pipeline.
  • Oversee the availability, performance, and supportability of our observability infrastructure.
  • Create processes around alert response operations and support the team to ensure the reliable delivery of oracle data.
  • Make recommendations to ensure sufficient metrics are collected to create alerts for every new feature release.
  • Champion reliability and security by taking the time to do your work right the first time.
Requirements
  • 7+ years of relevant professional experience. You have probably worked on a DevOps, infrastructure, SRE, and/or platform team before.
  • Ability to develop software beyond the scope of typical infrastructure requirements and configurations.
  • Experience programming in C, C++, Java, Python, Go, Perl, or Ruby.
  • Expert knowledge of all aspects of designing, developing, and managing large real-time systems.
  • Experience with monitoring and logging. You know how to export metrics using Prometheus, have built a Grafana dashboard or two, and have experience with a centralized logging solution such as the ELK Stack, Splunk, or the Grafana stack.
  • Experience with distributed systems and container orchestration. You have maintained or even built Kubernetes clusters before and feel comfortable deploying entirely new services on them.
  • Strong communication skills. You can give and receive constructive feedback, and you do not shy away from planning meetings and code reviews.
Desired Qualifications
  • Excitement for blockchain, Web 3.0, and similar decentralized technologies.
  • Experience running any infrastructure in the blockchain/Web3 space.
  • Ability to scale systems sustainably through mechanisms like automation, and to evolve systems by pushing for changes that improve reliability and velocity.
  • Experience working remotely in a distributed team.
  • A strong desire to grow and challenge yourself. We expect you to constantly find ways to improve and automate services to reduce toil.

Tools and Services

  • AWS
  • Terraform / Terragrunt
  • Kubernetes, Calico, and ArgoCD
  • Prometheus and Grafana
  • GitHub Actions
  • Packer

We expect you to be comfortable with most of those tools…

Position Requirements
10+ years of work experience