The ideal candidate will have a strong background in software engineering, system administration, containerization, and cloud technologies.
REQUIREMENTS:
Should have 5 to 8 years of experience.
Well-versed in scripting/programming languages (Python, Bash, PowerShell, etc.) to automate manual work, particularly within cloud environments.
Well-versed in observability tools (Grafana, Splunk, Dynatrace) for monitoring, alerting, and logging, to identify and address potential issues, especially in cloud infrastructure.
Working experience with automation tools (Jenkins, GitLab, and Ansible/Chef for configuration management) and processes to streamline the deployment, monitoring, and management of systems and applications in the cloud.
Hands-on experience with containerization and orchestration technologies such as Docker, Kubernetes, or similar, particularly in cloud-native environments.
Solid understanding of SLI, SLO, SLA, and error budget concepts and how to implement them.
Willingness to provide on-call support and participate in incident management and response activities as needed.
Expertise in troubleshooting production issues and bugs.
Good knowledge of Unix systems, networking, web technologies, and databases.
Incident management experience for production workloads, coupled with effective communication skills.
Working knowledge of at least one cloud platform (AWS or GCP).
What you'll do:
Lead reliability engineering projects and drive them to closure.
Ensure system stability and high availability by proactively monitoring performance and troubleshooting issues.
Design, build, and maintain efficient, reliable, and scalable cloud-based infrastructure and services.
Automate processes to reduce toil, and find opportunities to improve the observability and availability of the platform.
Implement and manage observability tools for comprehensive monitoring, alerting, and logging.
Own the end-to-end availability and performance of various services and tools.
Practice sustainable incident response and blameless postmortems.
Provide on-call support for incident management and participate actively in response activities.