Site Reliability Engineer - Data Availability
Georgia, USA
Listed on 2026-03-02
IT/Tech
SRE/Site Reliability, Cloud Computing, IT Support, Systems Engineer
Singapore | working from home up to 40% | Reference 7751
We are looking for a skilled and reliability-driven Site Reliability Engineer (SRE) to strengthen our engineering team. In this hybrid role, you will combine hands‑on 2nd level support responsibilities with monitoring, automation, and reliability engineering. You will play a key role in ensuring the stability, observability, and continuous improvement of our production systems supporting real‑time financial data processing.
What You Will Do
Operational Support & Incident Management
- Provide 2nd level support for production systems and critical business applications.
- Investigate, troubleshoot, and resolve incidents and performance issues.
- Perform root cause analysis (RCA) and document findings in a structured manner.
- Collaborate closely with development teams to ensure sustainable issue resolution.
- Contribute to post‑incident reviews and continuous improvement initiatives.
- Design, implement, and maintain monitoring dashboards.
- Improve alert quality and reduce noise through effective threshold and metric design.
- Analyze logs, metrics, and system behavior to proactively detect anomalies.
- Automate operational processes using Ansible and scripting.
- Contribute to CI/CD and deployment reliability improvements.
- Continuously optimize system reliability, availability, and operational efficiency.
- Proven experience in Site Reliability Engineering, DevOps, or 2nd level production support.
- Strong analytical and troubleshooting skills in complex distributed environments.
- Structured, solution‑oriented approach with a strong ownership mindset.
- Effective communication skills and ability to work with cross‑functional teams.
- Motivation to reduce manual effort through automation and process improvements.
- Hands‑on experience with Elastic Stack and Grafana for monitoring and logging.
- Experience with Ansible for configuration management and automation.
- Experience with Git/GitLab.
- Familiarity with scripting languages.
- Good understanding of networking fundamentals (TCP/IP, DNS, HTTP).
- Experience with Linux systems and shell scripting.
- Strong verbal and written English.
If you have any questions, check out our FAQ page or call Anthony Mills at .
For this vacancy we only accept direct applications.
Diversity is important to us. We therefore welcome applications from people of every background.
What We Offer
Flexible Work Models
We trust our employees and offer a work environment that is well‑balanced, productive and fosters success.
Personal Development
You will benefit from a culture of continuous learning and feedback. Your personal growth is supported through an extensive learning offering.
Agile Working Methods
Whether through scrum or design thinking, we solve exciting tasks together in teams.