Site Reliability Engineer II Job, Irving area, Texas, USA, IT/Tech

Site Reliability Engineer (Irving, TX)

Join a world‑class team of skilled engineers who build creative digital solutions to support our colleagues and clients. We make a broad organizational impact by delivering cutting‑edge technology solutions that power Gartner. Gartner IT values its culture of nonstop innovation, an outcome‑driven approach to success, and the notion that great ideas can come from anyone on the team.

What we’re looking for:

Gartner is looking for a Site Reliability Engineer to join our collaborative, Agile team. This position will improve Gartner’s customer experience and increase the value of our products by increasing the reliability and performance of our client‑facing application and service offerings.

Why you’ll want to come to work:

Measure performance against SLOs in partnership with stakeholders, and ensure systems continue to meet SLOs over time.
Work to improve performance, scalability, and stability of applications.
Participate in operational support and on‑call rotation shifts for supported systems and products.
Respond to production incidents, help triage application and system issues, and identify root causes or remediations to restore services quickly.
Conduct blameless post‑mortems to troubleshoot priority incidents.
Use automation to reduce the probability and/or impact of problem recurrence.
Identify and evaluate alerting posture.
Create dashboards and reports to communicate key metrics.
Implement and manage DevOps capabilities using continuous integration/continuous delivery toolsets and automation.
Collaborate and share lessons learned regarding performance and reliability issues with all stakeholders, including developers, other SREs, operations teams, and project management teams.
Participate in continuous improvement in software quality and infrastructure reliability and resilience.
Build and maintain documentation for all assigned projects.
Build and maintain performance testing frameworks, tools, and methodologies.
Automate manual operational work (toil) using pipelines, new software, or other appropriate mechanisms.
Conduct analytics on previous incidents to understand root causes and better predict and prevent future issues; keep a proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
Work with stakeholders such as development teams or product owners to define service level objectives (SLOs) for application and system operations.
Collaborate with development teams to promote the concept of reliability engineering during all phases of the SDLC to detect and correct performance issues and meet availability goals.

What you’ll bring to the team:

5+ years of information technology experience, with 3+ years working on a DevOps/SRE team or similar.
Experience with incident and response management.
Experience with AWS cloud services such as EC2, EKS, API Gateway, and Lambda, or similar cloud technologies and services.
Experience with back‑end technologies such as J2EE, JDBC, Tomcat, .NET Core/C#, Spring, Hibernate, etc.
Experience with building tools to automate production support activities that enable efficiency and productivity of support teams.
Prior experience working as a Cloud DevOps Engineer, Build & Release Engineer, or System Administrator is preferred.
Prior experience with Docker container orchestration using Kubernetes, including creating pods, ConfigMaps, and deployments through Jenkins.
Working knowledge of client‑side technologies such as Node.js, JavaScript, React, and jQuery.
Experience with troubleshooting, root‑cause analysis, application design, and implementing components.
Working experience with monitoring tools like Splunk and APM tools such as Dynatrace, Datadog, New Relic, AppDynamics, etc.
Working knowledge of production support processes such as incident/change/problem management, call triaging and escalation procedures.
Exposure to Akamai, Cloudflare, or CloudFront as a CDN.
Strong operating systems (UNIX/Linux) background.
Preferred:
Exposure to performance engineering concepts.
Desired:
Exposure to chaos testing or chaos engineering.
Experience in collaborating with Dev/DBA/Architecture teams…