
Cloud Infrastructure – Site Reliability Engineer (SRE)

Job in Sunnyvale, Santa Clara County, California, 94087, USA
Listing for: Alibaba Cloud
Full Time position
Listed on 2026-01-12
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer
Salary/Wage Range or Industry Benchmark: 104,400 - 171,000 USD yearly


The Alibaba Cloud Native Message Middleware Team is responsible for messaging products, including RocketMQ. We are committed to creating a more stable, user‑friendly, streaming, and large‑scale messaging platform for the future.

Responsibilities
  • Oversee stability maintenance, performance tuning, and high‑availability architecture design for cloud middleware, including messaging middleware (Kafka/RocketMQ).
  • Manage the containerized middleware lifecycle on Kubernetes clusters: implement deployments, auto‑scaling, version upgrades, and resource optimization in K8s environments.
  • Lead the troubleshooting of middleware‑related incidents (e.g., message backlog, service registration failures) through log analysis, distributed tracing, and monitoring systems.
  • Develop diagnostic tools using Java/Go to resolve production issues, performance bottlenecks, and compatibility challenges.
  • Build Python/Go/Shell automation tools to standardize middleware deployment, monitoring, and disaster recovery workflows.
  • Implement chaos engineering experiments, capacity planning strategies, and failover mechanisms to enhance system resilience.
Qualifications
  • Strong scripting skills in Shell/Python and experience with Infrastructure as Code (IaC) tools (Terraform preferred).
  • Over 2 years of experience in distributed systems reliability engineering, familiar with high‑availability architecture design, and proficient in at least one of Python, Go, or Java.
  • Experience with messaging middleware cluster management, message reliability assurance, and performance optimization for Kafka/RocketMQ.
  • Hands‑on experience deploying middleware on Kubernetes (Helm/Operator preferred).
  • Ability to convert operations experience into automated solutions and familiarity with various message middleware, e.g., Kafka and RocketMQ.
Preferred Qualifications
  • Familiar with core SRE practices (incident review, error budgeting, chaos engineering) and experienced in building automated risk control systems.

The pay range for this position at commencement of employment is expected to be between $104,400 and $171,000/year. However, base pay offered may vary depending on multiple individualized factors, including market location, job‑related knowledge, skills, and experience.

If hired, employee will be in an “at‑will” position and the Company reserves the right to modify base salary (as well as any other discretionary payment or compensation program) at any time, including for reasons related to individual performance, Company or individual department/team performance, and market factors.

Job Details
  • Seniority level: Entry level
  • Employment type: Full‑time
  • Job function: Software Development