Senior Site Reliability Engineer
Job in
Indianapolis, Marion County, Indiana, 46218, USA
Listed on 2026-03-03
Listing for:
Salesforce.com, Inc.
Full Time position
Job specializations:
- IT/Tech
Systems Engineer, IT Support, Cloud Computing, Cybersecurity
Job Description & How to Apply Below
Job Category
Software Engineering
Job Details
About Salesforce
Salesforce is the #1 AI CRM, where humans with agents drive customer success together. Here, ambition meets action. Tech meets trust. And innovation isn't a buzzword - it's a way of life. The world of work as we know it is changing and we're looking for Trailblazers who are passionate about bettering business and the world through AI, driving innovation, and keeping Salesforce's core values at the heart of it all.
Ready to level-up your career at the company leading workforce transformation in the agentic era? You're in the right place! Agentforce is the future of AI, and you are the future of Salesforce.
Job Title:
Senior Site Reliability Engineer
About the Role:
The Site Reliability Engineering team is part of the Digital Enterprise Technology Platform Engineering organization, responsible for architecting, scaling, and maintaining the IT monitoring and observability ecosystem. You will ensure the reliability of Enterprise IT services by driving proactive telemetry strategies and deep system visibility.
We're looking for a self-starter with the ability to take ownership of tasks, work under pressure, and balance multiple assignments simultaneously while maintaining a positive outlook. You'll lead the evolution of observability frameworks, contribute ideas, and provide feedback on complex monitoring architectures while providing expertise for IT projects and enhancements across various IT organizations.
Responsibilities:
* Manage, assess, plan, and support core observability platform operations and strategy.
* Lead process changes and implementations related to the monitoring and logging stack (e.g., Splunk, Grafana, New Relic).
* Provide escalation support for configuration and platform issues, participating in on-call schedules to resolve major incidents using deep-dive observability data.
* Collaborate with key stakeholders (Service Managers, Product Managers, Application Architects, Business Support, and Operations) to gather and develop complex monitoring and alerting requirements.
* Develop AI, automation, and integrations to deliver predictive monitoring and automated anomaly detection.
* Work with third-party vendors and partners to address platform-related enhancements and evaluate next-gen observability tooling.
* Support and manage the introduction of new monitoring tools and orchestrate migrations to modern OpenTelemetry-based standards.
* Present reports on Service Level Indicators (SLIs), Service Level Objectives (SLOs), and correlation metrics to the Enterprise Operations team periodically.
* Work under Agile scrum methodology and provide technical mentorship on observability best practices to junior team members.
* Create standard operating procedures for monitoring-as-code and share them with the team for effective execution.
Minimum Qualifications:
* Bachelor's degree in Computer Science or related technical field, or equivalent experience in technical leadership
* 5-8 years of experience designing and implementing distributed systems to handle large-scale telemetry and log data
* 5-8 years of experience building and scaling high-volume observability pipelines.
* Proven mastery of full-stack observability suites (Splunk, ThousandEyes, or similar).
* Direct experience implementing OpenTelemetry (OTel) standards.
* Strong background in "Monitoring as Code" using Terraform or similar automation tools.
* Demonstrable ability in Bash/PowerShell, Python, and JavaScript (Node.js), especially program comprehension
* Understanding of REST-based API design principles and best practices
* Experience with server administration (Linux and Windows)
* Knowledge of monitoring tools like Zabbix, Splunk, Grafana, New Relic, or ThousandEyes
* Experience with AWS public cloud and VMware vSphere
* Knowledge of configuration management and orchestration tools like Puppet, Ansible, or Terraform
* Experience with Docker and containerized applications
* Strong troubleshooting and debugging skills (reading log files, analyzing…
Position Requirements
10+ years of work experience