Site Reliability Engineer
Listed on 2026-03-12
IT/Tech
Systems Engineer, Cloud Computing, IT Support, Cybersecurity
Why Work at Lenovo
We are Lenovo. We do what we say. We own what we do. We WOW our customers.
Lenovo is a US $69 billion revenue global technology powerhouse, ranked #196 in the Fortune Global 500, and serving millions of customers every day in 180 markets. Focused on a bold vision to deliver Smarter Technology for All, Lenovo has built on its success as the world's largest PC company with a full-stack portfolio of AI-enabled, AI-ready, and AI-optimized devices (PCs, workstations, smartphones, tablets), infrastructure (server, storage, edge, high performance computing and software defined infrastructure), software, solutions, and services.
Lenovo's continued investment in world-changing innovation is building a more equitable, trustworthy, and smarter future for everyone, everywhere. Lenovo is listed on the Hong Kong stock exchange under Lenovo Group Limited (HKSE: 992) (ADR: LNVGY).
To find out more, read about the latest news via our Story Hub.
Description and Requirements
About Our Team
Lenovo is building Quantum, a next-generation hybrid AI platform that spans Windows, Android, and cloud. As part of this initiative, we are growing the reliability engineering organization that powers Qira, Lenovo's cross-device Personal AI.
We are hiring Site Reliability Engineers (SREs) to strengthen the reliability, observability, and operational excellence of Qira's AI systems across device, edge, and cloud. Depending on your strengths, you may be aligned to areas such as Observability, Operations, or Service Reliability.
Qira works with the speed and creativity of a startup inside Lenovo. You'll help build foundational systems with clarity, ownership, and modern engineering practices.
Location
On-site in Chicago, IL. Hybrid (3 days on-site, 2 days remote).
What You Might Work On
Reliability & Systems Engineering
Support the reliability, availability, and performance of distributed systems across cloud, edge, and device environments.
Help define, measure, and monitor SLIs and SLOs for core Qira services.
Identify reliability risks and collaborate with senior engineers on mitigation plans.
Operational Excellence
Participate in on-call rotations and assist with incident response and post-incident reviews.
Contribute improvements to runbooks, automation, and tooling that reduce alert noise and operational toil.
Help enhance detection, alerting, and response workflows.
Observability & Insight
Implement and improve telemetry using OpenTelemetry, Grafana, and related tools.
Build dashboards and tools that improve visibility into system health and AI service behavior.
Ensure observability data is complete, accurate, and actionable.
Deployments & Change Safety
Support safe, reliable deployment workflows including canaries, staged rollouts, and automated rollbacks.
Assist in improving CI/CD systems and deployment tooling.
Work closely with senior SREs, DevOps engineers, AI/ML teams, and platform engineers.
Contribute to reliability reviews, operational readiness checks, and cross‑team projects.
Advocate for modern SRE and DevOps practices within the organization.
Basic Qualifications
4+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or production systems operations.
Bachelor's Degree in Computer Science, Engineering, or related technical field (or equivalent practical experience).
Foundational experience supporting distributed systems in production.
Ability to write scripts or tools in Python, Go, Bash, or similar languages.
Solid understanding of Linux systems, networking basics, and system performance fundamentals.
Experience with cloud platforms (Azure preferred, AWS or GCP acceptable).
Familiarity with monitoring/observability (metrics, logs, tracing).
Experience with containers and Kubernetes.
Preferred Qualifications
Experience with OpenTelemetry instrumentation and telemetry pipelines.
Hands‑on experience with Grafana, Prometheus, Loki, or Tempo.
Exposure to AI/ML systems, inference services, or data‑intensive workloads.
Experience…