Senior Site Reliability Expert (Retail)
Remote / Online - Candidates ideally in Montreal, Quebec, Canada
Listing for:
Lightspeed
Full Time, Remote/Work from Home position
Listed on 2026-01-13
Job specializations:
- IT/Tech
- Cloud Computing, Systems Engineer, SRE/Site Reliability
Job Description & How to Apply Below
Position: Senior Site Reliability Expert (Retail)
Location: Montreal
Are you actively seeking a new opportunity, or simply exploring the market? Either way, you might have just found the right place!
We’re looking for a Senior SRE to join our Lightspeed Retail group in North America, a team responsible for the infrastructure and developer experience behind multiple POS systems. The team is at the helm of providing a stable, reliable and efficient system to our retailers.
Our team is also dedicated to designing, building, and operating the infrastructure that powers Lightspeed Retail. This platform supports the entire software delivery lifecycle, from CI/CD pipelines to highly available and scalable production environments.
NOTE:
As a global company with employees and clients outside of Quebec, fluency in English as a working language is required for this position.
What you’ll be responsible for:
As a member of the Site Reliability Expert team:
- Being an active member of the Retail Platform team, where you will be responsible for the observability, scalability and reliability of the Retail Platform
- Designing and implementing Kubernetes clusters for various use cases, ensuring scalability, reliability, and security
- Configuring and managing Kubernetes clusters, including nodes, networking, and storage
- Performing updates to multi-platform Kubernetes clusters in critical production environments
- Acting as both a subject matter expert and an incident lead during the incident response process
- Initiating and contributing to continuous improvement of our software delivery processes and practices in a multi-location, multidisciplinary team to empower and accelerate product development
- Obsessing over reliability and helping teams deliver reliable software
- Adhering to and advocating for best practices, including Infrastructure as Code, monitoring, high availability, disaster recovery, security, and DevOps methodologies
- Providing timely assistance and remediation solutions during critical situations and production incidents to help resolve service problems (you will be on call for periods of time)
What you’ll be bringing to the team:
- A passion for scalability, reliability and observability and a desire to share that passion with others in a positive, solutions-oriented way
- Comfort leading projects that require coordination and collaboration with other development teams to reach a common goal
- A desire to quickly grow your ability to champion process changes in the pursuit of the SRE mandate
- A proven track record of driving optimization of cloud services, including, but not limited to, data pipelines, storage, databases, caching layers, cores, memory, etc.
- An understanding of different types of SLAs/SLOs and different types of resource contracts, such as reserved instances and savings plans
- An analytical mindset: live by the metrics, deeply understand data and use it to drive technical decisions
- A good understanding of Agile development and continuous delivery best practices, software engineering tools, processes, methods and testing
- Primary ownership of customer-facing, zero-downtime production environments using the following toolsets:
  - Major cloud platforms (Amazon Web Services, Google Cloud Platform, Azure)
  - CI/CD pipelines (CircleCI, Jenkins, GitHub, ArgoCD, Helm)
  - Containers (Docker, Kubernetes, EKS, AKS, GKE & Linux Systems)
  - Infrastructure as Code (Terraform)
  - Programming or scripting languages (Bash, Python, Ruby, Java, Golang, etc.)
Who you are:
- You are a problem solver who does not shy away from tackling complexity and thinking critically
- You have a strong will to learn, grow and get out of your comfort zone
- You have great energy and passion for technology
- You can express yourself flawlessly in English
- You have strong interpersonal skills
- You are a team player and a bar raiser
What’s in it for you:
- Join a growing team and help us move to the next level
- Amazing benefits & perks, including equity for all Lightspeeders
- Constant development of both your skill set and business acumen with limitless growth opportunities
- Lots of autonomy, flexible work culture
- Innovation time to explore and learn at work
- Shaping the company by joining cultural & technical committees
- Tons of growth opportunities into technical or people management roles
- Opportunity to join a fast-paced, high-growth company
- Opportunity to learn, expand your skill set, forge wonderful relationships and make your mark within the diverse and inclusive Lightspeed family, a true Canadian tech success story
And enjoy a range of benefits that will keep you happy, healthy and (not) hungry:
- Lightspeed equity scheme (we are all owners)
- Flexible paid time off and remote work policies
- Health insurance
- Contributions to your pension plan (RRSP)
- Health and wellness benefit of $500 per year
- Paid leave and assistance for new parents
- Mental health online platform and counseling & coaching services
- Training opportunities to grow your skills and career
- Volunteer day
- Fully stocked kitchen (hot and cold beverages, meals served)
- Happy hours to build your relationships with colleagues after work
Position Requirements
10+ years of work experience