Site Reliability Engineer
Listed on 2026-01-12
IT/Tech
Cloud Computing, Systems Engineer, SRE/Site Reliability
At Scalene Works People Solutions, we're more than recruiters; we're career architects dedicated to connecting exceptional talent with top-tier opportunities. Backed by industry experts, we prioritize relationships, offer global opportunities, and champion your success every step of the way.
Ready to shape your future? Explore opportunities with us today.
We are looking for a Site Reliability Engineer (SRE) for our well‑known client.
Job Title: Site Reliability Engineer (SRE)
Experience: 5 to 8 years
Job Description: We are seeking a Site Reliability Engineer (SRE) to design, build, and maintain highly available, resilient, and scalable systems. You will collaborate closely with engineering, product, and operations teams to ensure our Java/Spring Boot applications run smoothly 24/7 in a cloud environment. Additionally, you will drive the adoption of analytics and data‑driven insights to optimize system performance and extract value from operational data.
Key Responsibilities:
- Reliability & Scalability: Design, implement, and maintain systems that are robust, scalable, and highly available, supporting millions of daily transactions.
- Cloud Migration: Lead and support migration of applications and infrastructure to public cloud platforms, ensuring best practices in security, reliability, and cost management.
- Automation & Infrastructure as Code: Develop and maintain automation scripts and infrastructure using Kubernetes and Terraform.
- Monitoring & Incident Response: Build and enhance monitoring, alerting, and observability solutions. Respond to incidents, perform root cause analysis, and drive continuous improvement.
- Collaboration: Partner with software engineers, product managers, and business stakeholders to deliver solutions that meet business needs and operational requirements.
- Analytics & Data Insights: Leverage cloud‑based analytics tools to monitor system health, optimize performance, and extract actionable insights.
- Continuous Improvement: Identify and implement opportunities to improve reliability, efficiency, and scalability of the platform.
Qualifications:
- Proven experience as a Site Reliability Engineer, DevOps Engineer, or similar role supporting large‑scale, mission‑critical systems.
- Strong hands‑on experience with Kubernetes and Terraform.
- Experience deploying and operating applications in public cloud environments (AWS, Azure, GCP).
- Solid understanding of Java and Spring Boot applications.
- Experience with monitoring, logging, and observability tools (Prometheus, Grafana, ELK, Splunk).
- Strong troubleshooting and problem‑solving skills.
- Excellent communication and collaboration skills.
- Experience in financial services or payments/transaction processing environments.
- Familiarity with cloud‑based analytics platforms and data engineering concepts.
- Experience with CI/CD pipelines and automation tools (Jenkins, GitHub Actions).
- Knowledge of security best practices in cloud environments.