Company Description
Nexthink is the leader in digital employee experience management software. The company provides IT leaders with unprecedented insight, allowing them to see, diagnose and fix issues at scale, wherever employees are and whatever application or network they use, before employees notice the issue. As the first solution to enable IT to progress from reactive problem-solving to proactive optimisation, Nexthink empowers its more than 1,200 customers to deliver better digital experiences to over 15 million employees.
Dual-headquartered in Lausanne, Switzerland and Boston, Massachusetts, Nexthink has 9 offices worldwide.
#LI-Hybrid
Job Description
At Nexthink, we empower our customers with industry-leading solutions to enable continuous improvement of employee experience. We deliver unmatched visibility across all environments, so IT teams can consistently see, diagnose, and fix digital workplace issues. As a SaaS provider, our commitment is to deliver a seamless, resilient, and scalable platform around the clock.
We are looking for an experienced, proactive and innovative professional who is keen to join as a Senior Site Reliability Engineer! The mission of Nexthink's SRE team is to strengthen our infrastructure and enhance our ability to deploy, monitor, and scale systems effectively and reliably. The team works closely with over 50 Product Engineering teams that develop our products and services, as well as with the Technical Platform Engineering, Security and Architecture teams, to understand reliability requirements, design and implement solutions, and promote their adoption.
Join our vibrant team of diverse and experienced engineers where cutting-edge technology meets innovation. Be a part of Nexthink's Digital Employee Experience technological revolution, ensuring our global customers enjoy a seamless user experience. Apply now and become a key player in our dynamic SRE organisation.
As a Senior Site Reliability Engineer, you will:
Implement and manage cloud-native systems (AWS) using best-in-class tools and automation.
Operate and enhance Kubernetes clusters, deployment pipelines, and service meshes to support rapid delivery cycles.
Design, build, and maintain the infrastructure powering our multi-tenant SaaS platform with reliability, security, and scalability in mind.
Define and maintain SLOs, SLAs, and error budgets, and proactively address availability and performance issues.
Develop infrastructure-as-code (Terraform or similar) for repeatable and auditable provisioning.
Build internal platform tools and automation to support provisioning, monitoring, and operational efficiency.
Monitor infrastructure and applications, ensuring high-quality user experiences.
Participate in a shared on-call rotation, responding to incidents, troubleshooting outages, and driving timely resolution and communication.
Act as Incident Commander while on call, coordinating cross-team responses effectively to maintain SLAs.
Drive and refine incident response processes, reducing Mean Time to Detect (MTTD) and Mean Time to Recovery (MTTR).
Diagnose and resolve complex issues independently, minimising the need for external escalation.
Work closely with software engineers to embed observability, fault tolerance, and reliability principles into service design.
Automate runbooks, health checks, and alerting to support reliable operations with minimal manual intervention.
Support automated testing, canary deployments, and rollback strategies to ensure safe, fast, and reliable releases.
Contribute to security best practices, compliance automation, and cost optimisation.
Qualifications
Bachelor’s degree in Computer Science, or equivalent practical experience.
5+ years of experience as a Site Reliability Engineer or Platform Engineer with strong knowledge of software development best practices.
Strong hands-on experience with public cloud services (AWS, GCP, Azure) and with supporting SaaS products.
Strong programming or scripting skills (e.g., Python, Go, Bash), and experience with infrastructure-as-code (e.g., Terraform).
Proficiency with Kubernetes, container-based deployment (e.g., Docker) and…
Position Requirements
10+ years of work experience