Software Engineer - Principal Member of Technical Staff; PMTS - Availability
Listed on 2026-03-01
IT/Tech
Systems Engineer
To get the best candidate experience, please consider applying for a maximum of 3 roles within 12 months to ensure you are not duplicating efforts.
Job Category: Software Engineering
Job Details
About Salesforce
Salesforce is the #1 AI CRM, where humans with agents drive customer success together. Here, ambition meets action. Tech meets trust. And innovation isn't a buzzword - it's a way of life. The world of work as we know it is changing and we're looking for Trailblazers who are passionate about bettering business and the world through AI, driving innovation, and keeping Salesforce's core values at the heart of it all.
Ready to level-up your career at the company leading workforce transformation in the agentic era? You're in the right place! Agentforce is the future of AI, and you are the future of Salesforce.
Role Description:
The Availability Standards team is part of the overall Salesforce technology organization. We manage the high-level frameworks used to measure platform uptime and performance, bridging the gap between centralized reporting and the individual engineering teams that own specific services. We follow a consultative engineering approach where our experts partner with service owners to build a deep understanding of service health, telemetry, and automated testing.
This level of expertise allows our team to advocate for the customer and influence the product roadmap by ensuring that every service team has the visibility they need to maintain world-class availability.
The Engineering Availability Standards position is a critical role designed for a seasoned engineering veteran who has experience managing, leading, or coordinating with high-scale cloud services. Your mission is to transform how we calculate, visualize, and act upon platform health data. You will serve as the technical bridge between our global availability standards and the distributed engineering teams that power our infrastructure.
You will be responsible for shifting our monitoring strategy from simple reporting to active, high-fidelity signals that engineering teams use for real-time alerting and incident response. This role requires the ability to influence technical roadmaps across different product families and to automate the integration of reliability testing and observability into standard software development life cycles.
Job Responsibilities:
- Utilize software engineering skills and production experience to provide input into long-range platform requirements and operational guidelines, with a focus on making health data actionable for service owners.
- Analyze and understand how service teams manage their telemetry, and help drive continuous improvement of health signals based on the knowledge of specific service architectures.
- Partner with internal engineering teams to integrate global availability standards into their existing monitoring pipelines, dashboards, and automated alerting flows.
- Identify and mitigate friction in the onboarding process by leveraging existing automated test suites to create high-quality, streamlined reliability signals with minimal manual effort.
- Serve as a technical subject matter expert to ensure that centralized infrastructure services (logging, monitoring, and data platforms) are optimized to support the needs of individual service owners.
- Quarterback the integration of failure signals into standard engineering workflows, ensuring that detected issues result in automated work items and proactive investigations.
- Deliver presentations highlighting availability metrics, reliability trends, and success stories to diverse engineering and leadership audiences.
Requirements:
- A related technical degree is required.
- 5+ years of proven experience in production environments (this could include previous experience as a software engineer, systems engineer, service owner, or lead developer).
- Fluency in Java or a similar object-oriented language (Python, C++, etc.) to provide input on platform requirements and automation.
- Deep understanding of telemetry systems and experience building or managing production monitoring and alerting frameworks.
- Experience using Linux environments and the…