
Senior Site Reliability Engineer

Job in New York, New York County, New York, 10261, USA
Listing for: Salesforce
Full Time position
Listed on 2026-03-06
Job specializations:
  • IT/Tech
    Systems Engineer, IT Support
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD yearly
Job Description
Location: New York

To get the best candidate experience, please consider applying for a maximum of 3 roles within 12 months to ensure you are not duplicating efforts.

Job Category

Software Engineering

Job Details

About Salesforce

Salesforce is the #1 AI CRM, where humans with agents drive customer success together. Here, ambition meets action. Tech meets trust. And innovation isn’t a buzzword — it’s a way of life. The world of work as we know it is changing and we're looking for Trailblazers who are passionate about bettering business and the world through AI, driving innovation, and keeping Salesforce's core values at the heart of it all.

Ready to level up your career at the company leading workforce transformation in the agentic era? You’re in the right place! Agentforce is the future of AI, and you are the future of Salesforce.

Job Title

Senior Site Reliability Engineer

About the Role

The Site Reliability Engineering team is part of the Digital Enterprise Technology Platform Engineering organization and is responsible for architecting, scaling, and maintaining the IT monitoring and observability ecosystem. You will ensure the reliability of Enterprise IT services by driving proactive telemetry strategies and deep system visibility.

We're looking for a self-starter who can take ownership of tasks, work under pressure, and balance multiple assignments while maintaining a positive outlook. You'll lead the evolution of observability frameworks, contribute ideas, and give feedback on complex monitoring architectures, and you'll provide expertise for IT projects and enhancements across various IT organizations.

Responsibilities
  • Manage, assess, plan, and support core observability platform operations and strategy.
  • Lead process changes and implementations related to the monitoring and logging stack (e.g., Splunk, Grafana, New Relic).
  • Provide escalation support for configuration and platform issues, participating in on-call schedules to resolve major incidents using deep-dive observability data.
  • Collaborate with key stakeholders (Service Managers, Product Managers, Application Architects, Business Support, and Operations) to gather and develop complex monitoring and alerting requirements.
  • Develop AI solutions, automation, and integrations to deliver predictive monitoring and automated anomaly detection.
  • Work with third-party vendors and partners to address platform-related enhancements and evaluate next-gen observability tooling.
  • Support and manage the introduction of new monitoring tools and orchestrate migrations to modern OpenTelemetry-based standards.
  • Periodically present reports on Service Level Indicators (SLIs), Service Level Objectives (SLOs), and correlation metrics to the Enterprise Operations team.
  • Work within an Agile Scrum methodology and provide technical mentorship on observability best practices to junior team members.
  • Create standard operating procedures for monitoring-as-code and share them with the team for effective execution.
Minimum Qualifications
  • Bachelor's degree in Computer Science or a related technical field, or equivalent experience in technical leadership
  • 7-10 years of experience designing and implementing distributed systems that handle large-scale telemetry and log data
  • 7-10 years of experience building and scaling high-volume observability pipelines
  • Proven mastery of full-stack observability suites (Splunk, ThousandEyes, or similar)
  • Direct experience implementing OpenTelemetry (OTel) standards
  • Strong background in "Monitoring as Code" using Terraform or similar automation tools
  • Demonstrable proficiency in Bash/PowerShell, Python, and JavaScript (Node.js), especially program comprehension
  • Understanding of REST-based API design principles and best practices
  • Experience with server administration (Linux and Windows)
  • Knowledge of monitoring tools like Zabbix, Splunk, Grafana, New Relic, or ThousandEyes
  • Experience with AWS public cloud and VMware vSphere
  • Knowledge of configuration management and orchestration tools like Puppet, Ansible, or Terraform
  • Experience with Docker and containerized applications
  • Strong troubleshooting and debugging skills (reading log files, analyzing memory leaks)
  • Strong analytical skills and…
Position Requirements
10+ years of work experience