Senior Software Engineer, DevOps
San Jose, Santa Clara County, California, 95199, USA
Listed on 2026-01-12
IT/Tech
Cloud Computing, Data Engineer
Teamwork makes the stream work.
Roku is changing how the world watches TV
Roku is the #1 TV streaming platform in the U.S., Canada, and Mexico, and we've set our sights on powering every television in the world. Roku pioneered streaming to the TV. Our mission is to be the TV streaming platform that connects the entire TV ecosystem. We connect consumers to the content they love, enable content publishers to build and monetize large audiences, and provide advertisers unique capabilities to engage consumers.
From your first day at Roku, you'll make a valuable - and valued - contribution. We're a fast-growing public company where no one is a bystander. We offer you the opportunity to delight millions of TV streamers around the world while gaining meaningful experience across a variety of disciplines.
About the Team
Roku runs one of the largest data lakes in the world. We store over 70 petabytes of data, run more than 10 million queries per month, and scan over 100 petabytes of data per month. The Big Data team is the one responsible for building, running, and supporting the platform that makes this possible. We provide all the necessary tooling to acquire, generate, process, monitor, validate, and access data in the lake for both streaming and batch data.
We are also responsible for generating the foundational data. The systems we provide include Scribe, Kafka, Hive, Presto, Spark, Flink, Pinot, and others. The team is actively involved in Open Source, and we are planning to increase our engagement over time.
We are seeking a skilled engineer with exceptional DevOps skills to join our team. Responsibilities include automating and scaling Big Data and Analytics technology stacks on Cloud infrastructure, building CI/CD pipelines, setting up monitoring and alerting for production infrastructure, and keeping our technology stacks up to date.
For California Only - The estimated annual salary for this position is between $186,000 and $340,000. Compensation packages are based on factors unique to each candidate, including but not limited to skill set, certifications, and specific geographical location. This role is eligible for health insurance, equity awards, life insurance, disability benefits, parental leave, wellness benefits, and paid time off.
What you'll be doing:
- Develop best practices around cloud infrastructure provisioning and disaster recovery, and guide developers on their adoption
- Scale Big Data and distributed systems
- Collaborate on system architecture with developers for optimal scaling, resource utilization, fault tolerance, reliability, and availability
- Conduct low-level systems debugging, performance measurement & optimization on large production clusters and low-latency services
- Create scripts and automation that can react quickly to infrastructure issues and take corrective actions
- Participate in architecture discussions, influence product roadmap, and take ownership and responsibility over new projects
- Collaborate and communicate with a geographically distributed team
Qualifications:
- Bachelor’s degree, or equivalent work experience
- 8+ years of experience in DevOps or Site Reliability Engineering
- Experience with Cloud infrastructure such as Amazon AWS, Google Cloud Platform (GCP), Microsoft Azure, or other Public Cloud platforms. GCP is preferred.
- Experience with at least 3 of the following technologies/tools: Big Data / Hadoop, Kafka, Spark, Airflow, Presto, Druid, OpenSearch, HAProxy, or Hive
- Experience with Kubernetes and Docker
- Experience with Terraform
- Strong background in Linux/Unix
- Experience with system engineering around edge cases, failure modes, and disaster recovery
- Experience with shell scripting, or equivalent programming skills in Python
- Experience working with monitoring and alerting tools such as Grafana or PagerDuty, and being part of on-call rotations
- Experience with Chef, Puppet, or Ansible
- Experience with Networking, Network Security, and Data Security
- AI literacy and curiosity. You either
1) have tried Gen AI in your previous work or outside of work, or
2) are curious about Gen AI and have explored it.
Accommodations
Roku welcomes applicants of all backgrounds and provides reasonable accommodations and adjustments in accordance with applicable law. If you require reasonable accommodation at any point in the hiring process, please direct your inquiries to
Our Hybrid Work Approach
Roku fosters an inclusive and collaborative environment where teams work in the office Monday through Thursday. Fridays are flexible for remote work, except for employees whose roles are required to be in the office five days a week or employees who are in offices with a five-day in-office policy.
Benefits
Roku is committed to offering a diverse range of benefits as part of our compensation package to support our employees and their families. Our comprehensive benefits include global access to mental health and financial wellness support and resources. Local benefits include statutory and…