Lead Site Reliability Engineer Remote

Remote / Online - Candidates ideally in California, Moniteau County, Missouri, 65018, USA
Listing for: Intellum, Inc.
Remote/Work from Home position
Listed on 2026-02-28
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer
Salary/Wage Range or Industry Benchmark: 100000 - 125000 USD Yearly
Job Description & How to Apply Below
Position: Lead Site Reliability Engineer (Remote, United States)
Location: California

Overview

Intellum is the leader in corporate education technology and powers the largest, most successful customer, partner, and employee learning programs in the world. Large brands and fast-moving companies rely on Intellum to engage and educate the audiences they touch. We are a remote-first company with team members worldwide, valuing curiosity, creativity, perseverance, and kindness. Our culture supports personal development budgets and an annual all-company retreat focused on human connections.

We are in growth mode and pursue a smart growth approach to scale the company.

Our stack
  • Core: Applications written in Ruby on Rails and Node.js, PostgreSQL, MongoDB, Redis, Memcached, Sidekiq, Active Job, Elasticsearch, WebSockets
  • Infrastructure: 100% Linux-based cloud infrastructure (AWS, Google Cloud, MongoDB Atlas) and services (ECS/EC2/Kubernetes, ElastiCache, Memorystore, RDS, Cloud SQL, BigQuery, etc.)
  • Infrastructure as Code (IaC): GitHub, Terragrunt, Terraform, Ansible
  • CI/CD: Spinnaker, Jenkins
  • Observability & Alerting: New Relic, AWS CloudWatch, Google Cloud Stackdriver, Squadcast
  • Agile/Scrum practices using Jira
Responsibilities
  • SRE Leadership & Strategy: Set clear goals for the SRE team and partner with Engineering leadership to align platform initiatives with business objectives.
  • Reliability & Observability (SLA/SLO): Lead the definition and enforcement of SLAs, SLIs, and SLOs. Architect observability frameworks to translate telemetry data into actionable roadmaps that reduce toil and enhance resilience.
  • Core Engineering & Performance: Take ownership of critical code components (i.e., Queues, Enrollments) and lead efforts to identify bottlenecks, optimize performance, and improve code quality across the engineering department.
  • Security by Design: Champion infrastructure security. Partner with InfoSec to define hardening standards, manage perimeter defense (WAF/DDoS), and automate vulnerability remediation within the CI/CD pipeline.
  • Incident Command: Participate in the 24x7 on-call rotation and lead post-incident reviews (RCAs), ensuring action items are implemented to improve MTTR and prevent recurrence.
  • Mentorship: Empower developers with better tooling and guidance on performant coding practices, fostering a culture of collaboration, reliability, and "you build it, you run it" ownership.
Required Skills

Experience & Engineering

  • 10+ years of engineering experience, with 5+ years specifically developing Ruby on Rails applications.
  • Expertise in Cloud Computing (AWS/GCP) and Infrastructure as Code (Terraform/Ansible).
  • Strong proficiency with SQL databases (PostgreSQL) and the ability to quickly navigate and optimize complex, unfamiliar codebases.

SRE & Operations

  • Deep Observability:
    Proven experience designing monitoring solutions (Datadog, New Relic, Prometheus) based on the "Golden Signals".
  • SLO Governance:
    Demonstrated ability to define SLIs/SLOs from scratch, negotiate Error Budgets, and use data to balance feature velocity with reliability.
  • Security Focus:
    Experience securing cloud environments and container platforms (Kubernetes), including hands-on management of WAF rules and edge security.
  • Incident Management:
    Experience leading post-incident reviews (RCAs) and implementing action items that directly improve MTTR (Mean Time to Recovery) and MTTD (Mean Time to Detection).
Leadership
  • Proven experience leading technical teams, mentoring engineers, and working in a team-oriented, collaborative environment with strong communication skills.
  • Documentation & Training:
    Skilled in documenting solutions and training operational teams on how to effectively support and maintain systems.
  • Proactive Problem-Solving:
    Demonstrated ability to communicate clearly, seek help proactively, and take ownership of tasks, leading them to completion.
Benefits
  • Medical - 100% of employee premiums for selected individual plans
  • Dental - 100% of employee premiums covered
  • Vision - 100% of employee premiums covered
  • 401(k) plus matching (US Based Only)
  • Unlimited PTO
  • Calm subscription
  • Annual Company Retreat
Equal Opportunity

Intellum is an equal-opportunity employer. We are committed to building an inclusive team that celebrates diversity in people, perspectives, and backgrounds regardless of race, color, national origin, gender, sexual orientation, age, religion, disability, citizenship, veteran status, or any other protected status. For questions about pay ranges or the role, contact
