Principal Engineer, Federal Cloud Platform
Listed on 2026-02-28
Software Development
Saviynt is a leader in identity security, delivering an AI-powered platform that governs and secures access to applications, data, and business processes for some of the world’s largest enterprises and government institutions. Built for the AI era, Saviynt enables organizations to move faster—securely and compliantly.
Why This Role Matters
Saviynt’s platform is mission-critical for our customers. As we scale globally, reliability, availability, and performance are not optional; they are core product features.
As a Principal Engineer, you will define and drive the reliability strategy for our SaaS platform. This is a high-impact, hands‑on engineering role with broad influence across infrastructure, platform, and application teams. You will shape how Saviynt designs, operates, and measures reliability at scale.
This role is ideal for engineers who want to work on hard reliability problems, influence architecture across teams, and leave a lasting mark on Saviynt's growing Federal SaaS platform.
What You’ll Do
- Design, build, and maintain shared infrastructure services and platforms that our product and application teams depend on.
- Own vulnerability management and hold teams accountable for meeting customer‑facing Service Level Agreements (SLAs).
- Design Continuous Delivery (CD) processes for government deployments that will eventually be used commercially.
- Develop robust, internal‑facing tools and automation for infrastructure provisioning and management, primarily in Go (Golang) or Python.
- Architect and optimize foundational solutions within Cloud environments (AWS, Azure, etc.), focusing on reusable patterns and modules for other teams.
- Design and implement shared Event‑Driven Architecture components and messaging platforms using technologies like Kafka or Google Pub/Sub that product teams can easily utilize.
- Design and build resilient Distributed Systems components that serve as building blocks for other applications, focusing on reliability, fault tolerance, and performance.
- Manage and optimize shared infrastructure across Multi‑Region Cloud Environments, ensuring that platform services are globally available and performant for all consumers.
- Establish and enhance centralized Observability and Monitoring platforms and tools that provide self‑service insights for consuming teams.
- Define and implement clear, well‑documented RESTful API designs for the infrastructure services you build, ensuring ease of integration for internal clients.
- Implement and manage Service Mesh (e.g., Envoy, Istio) capabilities, providing traffic management, security, and policy enforcement as a shared platform for services.
- Design, implement, and optimize highly available Relational Database services or shared data platforms for broad organizational use.
- Collaborate closely with product development teams to understand their infrastructure needs and pain points, providing technical guidance and support.
- Participate in on‑call rotations to support the critical shared infrastructure you build.
What You Bring
- 9+ years of experience in an Infrastructure Development, Platform Engineering, or Site Reliability Engineering role, with a strong focus on building tools and services for other engineers.
- Deep expertise with Kubernetes in production environments, particularly in providing it as a platform (single‑tenant and multi‑tenant deployment architectures).
- Strong programming skills in Go (Golang) and Python, with experience building robust, maintainable backend services and automation.
- Extensive hands‑on experience with at least one major Cloud Provider (AWS, GCP, or Azure); multi‑cloud experience is a strong plus, especially in building abstractions over them.
- Proven experience designing and implementing Event‑Driven Architecture and message queuing systems (e.g., Kafka, RabbitMQ, NATS) as shared services.
- Solid understanding of and practical experience with CI/CD pipeline tools (especially GitLab CI), including establishing automated delivery processes for other teams.
- Demonstrable experience designing and operating Distributed Systems, with an understanding of patterns for creating reliable, shared components.
- Familiarity with Multi‑Region Cloud…