
Site Reliability Engineer

Job in San Diego, San Diego County, California, 92189, USA
Listing for: SHEIN Technology LLC
Full Time position
Listed on 2026-03-01
Job specializations:
  • IT/Tech
    Systems Engineer, SRE/Site Reliability, Cloud Computing, IT Support
Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 USD yearly
Job Description & How to Apply Below
Position: Staff Site Reliability Engineer

About SHEIN

SHEIN is a global online fashion and lifestyle retailer, offering SHEIN branded apparel and products from a global network of vendors, all at affordable prices. Headquartered in Singapore, with more than 15,000 employees operating from offices around the world, SHEIN is committed to making the beauty of fashion accessible to all, promoting its industry-leading, on-demand production methodology, for a smarter, future-ready industry.

Position Summary

We are seeking a Staff Site Reliability Engineer (Official Title: Staff Site Reliability Engineer I) with deep experience operating and evolving large-scale, mission-critical systems where availability and reliability are non-negotiable. At SHEIN, Site Reliability Engineers are hybrid software and systems engineers responsible for keeping production services always on while enabling the platform to scale rapidly and safely. In this role, you will own and support complex services and infrastructure, ensuring they consistently meet reliability and performance expectations.

At the Staff level, you will also provide technical leadership, influencing platform architecture, reliability strategy, and operational standards across the organization. The SRE team owns and maintains critical open-source and in-house technologies that underpin the platform and serves as a core contributor to major engineering initiatives. We are accountable for driving platform operability forward by reducing incident frequency, minimizing MTTR, and improving system resilience, efficiency, and resource utilization.

You will work closely with global, cross-functional teams to design, build, and evolve observability and operational tooling (metrics, logs, traces, alerting, and automation) that provides deep visibility into system behavior. Through hands-on engineering and operational excellence, you will proactively identify risks and failure modes, help prevent incidents before they occur, and lead fast, effective responses when they do. To succeed in this role, you will combine strong software engineering skills, solid expertise in Linux, networking, and distributed systems, and a passion for solving problems of scale, complexity, and reliability.

Your work will directly contribute to delivering a stable, scalable, and high-performing experience for customers worldwide.

Job Responsibilities
  • Keep SHEIN’s mission-critical production systems running 24/7/365, participating in on-call rotations and acting decisively during incidents.
  • Triage and resolve production incidents, driving root cause analysis and contributing to continuous improvements that reduce MTTR and prevent recurrence.
  • Monitor and manage capacity planning and resource utilization, partnering with cross-functional teams to ensure systems scale safely while remaining cost-effective.
  • Own and operate core open-source infrastructure such as APISIX, NGINX, Kubernetes, Kafka, Elasticsearch, Redis, Consul, etcd, and ZooKeeper, along with other large-scale distributed systems.
  • Design, build, and maintain observability solutions (metrics, logs, traces, alerting) to improve system visibility, reliability, and resiliency.
  • Automate operational workflows and eliminate manual toil through scripting, tooling, and process improvements.
  • Develop and maintain technical documentation, including runbooks, architecture diagrams, operational procedures, and on-call playbooks.
  • Work closely with global engineering teams to improve infrastructure reliability and performance through better system design and operational discipline.
  • Mentor senior and mid-level SREs, raising the overall technical bar and operational maturity of the team.
  • Lead efforts to modernize the platform in alignment with industry best practices and evolving technology standards.
Job Requirements
  • Bachelor’s degree in Computer Science, Information Systems, or a related technical discipline, or equivalent practical experience.
  • 6+ years of experience owning and operating large-scale, high-traffic, 24/7 production systems, ideally in cloud or cloud-native environments.
  • Solid foundations in Linux, networking, and distributed systems, with the ability to debug complex production…