Associate Manager - Reliability Operations
Job in
500016, Prakāshamnagar, Telangana, India
Listed on 2026-02-05
Listing for: Confidential
Full Time position
Job specializations:
- IT/Tech
SRE/Site Reliability, IT Support, Cloud Computing, Systems Engineer
Job Description & How to Apply Below
About us
Build the future of banking.
Zeta is a next-generation banking technology company providing cloud-native, fully stackable processing and core banking platforms for issuers. With a focus on scalability, compliance, and innovation, Zeta empowers financial institutions to modernize their technology infrastructure and deliver secure, seamless digital banking experiences.
Our impact runs at real-world scale. Today, over 25 million cards are live on Zeta-powered platforms across 7 countries, supported by a passionate team of 1,700+ Zetanauts across India, the US, EMEA, and Asia. Backed by SoftBank Vision Fund, Mastercard, and other reputed strategic investors, we reached a valuation of $2 billion in 2025.
Our focus is on establishing product lines that deliver key outcomes by addressing real customer pain points, modernizing legacy systems, and strengthening core fundamentals. As a result, our systems and platforms support a wide range of banking and payments capabilities, including:
1. Tachyon, our cloud-native banking stack built for population-scale systems.
2. Cipher, our unified authentication platform for secure, high-volume banking environments.
3. Digital Credit as a Service, enabling banks to launch credit lines on UPI.
4. Elena, our intelligent and conversational AI platform for banking.
5. Pixel, India's first digital-native credit card, launched in partnership with HDFC Bank, for whom we also revamped their PayZapp mobile app: winner of the Celent Model Bank Award for Payments Innovation 2024.
6. Sparrow, the leading card experience for non-prime cardholders in the US.
…and more across cards, payments, lending, and core banking.
We are an engineering-first organization that values ownership, bias for action, and long-term thinking. Together, we solve some of the hardest problems in banking tech. Our culture is built around trust, collaboration, and creating the conditions for you to drive impact proportionate to your potential. Reinforcing our commitment to creating an inclusive and supportive workplace, we have been consistently recognized as a Great Place to Work.
If you want to build cutting-edge banking tech that enables banks to serve millions reliably, securely, and at population scale, Zeta is your playground.
The Role:
The Associate Manager - Reliability Operations leads a team to rigorously uphold service level objectives (SLOs) through expert alert management, SOP-compliant ticket escalations, and coordinated support for SRE-signed deployments across multiple sites.
This role drives operational accountability, fosters seamless SRE partnerships, and ensures production stability in a high-stakes 24x7 SaaS environment.
Responsibilities
Drives SLO adherence by implementing advanced metric monitoring, enforcing error budgets, and spearheading proactive initiatives to prevent breaches and elevate system reliability.
Ensures all alerts receive immediate acknowledgment, with tickets escalated to SRE teams for any issues lacking defined SOPs, systematically reducing escalations, downtime, and MTTR.
Coordinates standard deployments across sites following SRE sign-off, overseeing logistics, real-time rollout health monitoring, and rigorous post-deployment SLO validation.
Collaborates strategically with SRE teams on deployment planning, comprehensive risk assessments, troubleshooting, and post-release optimizations for flawless execution and rapid recovery.
Oversees and refines team processes for alert triage, SOP documentation/updates, and knowledge sharing, integrating automation to minimize manual toil and enhance operational resilience.
Mentors staff on SLO-driven decision-making, conducts in-depth audits of alert/ticket workflows, analyses trends in operational data, and delivers actionable reliability KPI reports to stakeholders.
Skills
Proven track record in 24x7 SaaS/cloud support operations, handling high-pressure incidents and customer-impacting events.
Strong proficiency in monitoring/incident tools (Prometheus,…
Position Requirements
10+ years of work experience