Sr Engineer, Site Reliability T500-22222

Job in 500001, Hyderabad, Telangana, India
Listing for: TMUS Global Solutions
Full Time position
Listed on 2026-02-04
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, IT Support, SRE/Site Reliability
Job Description & How to Apply Below
Position: Sr Engineer, Site Reliability [T500-22222]
About T-Mobile:
T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience.

About TMUS Global Solutions:
TMUS Global Solutions is a world-class technology powerhouse accelerating the company’s global digital transformation. With a culture built on growth, inclusivity, and global collaboration, the teams here drive innovation at scale, powered by bold thinking.
TMUS India Private Limited operates as TMUS Global Solutions.

Job Overview:
At T-Mobile, we don’t just build technology; we empower people. We believe in investing in YOU: your growth, your impact, and your future. We’re unstoppable when individuals like you come together to solve bold challenges, inspire innovation, and build platforms that serve millions.
As a Senior Site Reliability Engineer (SRE), you will help ensure the availability, performance, and stability of platforms powering T-Mobile’s finance, credit, collections, document management, and supply chain systems. You will collaborate with application developers, DevOps, and cloud teams to build reliable, observable, and automated systems. This role is ideal for engineers passionate about operational excellence, learning distributed systems, and scaling production environments using code and data.

Key Responsibilities:

Reliability Engineering & Operations:
Contribute to the availability and performance of large-scale, customer-facing systems through monitoring, alerting, and incident response.
Assist in designing and implementing resiliency strategies, including health checks, failovers, circuit breakers, and retries.
Participate in on-call rotations, help triage incidents, and assist in root cause analysis and post-incident reviews.

Automation & CI/CD Support:
Develop scripts, tools, and automation to reduce manual toil and improve operational efficiency.
Support infrastructure deployment and service rollout via CI/CD pipelines and Infrastructure-as-Code workflows (e.g., Terraform, Helm).
Work with developers to improve service deployment, configuration management, and rollback strategies.

Observability & Metrics:
Help build and maintain dashboards, alerts, and logs that provide visibility into system health and application behavior.
Use tools such as Prometheus, Grafana, Splunk, or OpenTelemetry to monitor services and infrastructure.
Analyze system performance data to guide optimizations and proactively detect issues.

Cross-Team Collaboration:
Work with DevOps, SREs, and software engineers to ensure that services are built for reliability and observability.
Contribute to documentation, runbooks, playbooks, and operational readiness reviews.
Support development teams in designing systems that meet SLOs and operational standards.

Qualifications:

Bachelor’s degree in Computer Science, Engineering, or a related technical field.
8+ years of experience in infrastructure, operations, DevOps, or SRE roles.
Proficiency in scripting or programming languages such as Java, Python, Go, and Bash.
Strong familiarity with Linux systems, container orchestration (Kubernetes), and cloud platforms (Azure preferred; GCP also relevant).
Hands-on experience with monitoring and observability tools such as Grafana, Splunk, and OpenTelemetry.
Expertise in Kubernetes and container orchestration, including Docker templates, Helm charts, and GitLab templates.
Knowledge of authentication, authorization, encryption, SSL/TLS, SSH/SFTP, PKI, X.509 certificates, and PGP.
Solid understanding of incident management tools such as ServiceNow.

Preferred Skills:

Exposure to incident management frameworks, including alerting, escalation, and postmortem practices.
Understanding of SRE principles: SLOs, SLIs, and error budgets.
Familiarity with tools like HAProxy, Envoy Proxy, Kafka, RabbitMQ, or other core infrastructure components.

Experience with performance tuning of Kubernetes runtime components.

Experience with CI/CD…