Principal Infrastructure Architect – Cloud & SaaS Platforms
San Jose, Santa Clara County, California, 95199, USA
Listed on 2026-02-28
IT/Tech
Systems Engineer, Cloud Computing
Job Title
Principal Infrastructure Architect – Cloud & SaaS Platforms
We are a global leader in online protection, dedicated to making the digital world a safer place. We are seeking a highly experienced and hands-on Principal Infrastructure Architect with a deep background in large-scale multi-cloud environments (AWS, GCP, Azure) and modern SaaS delivery. This is a unique opportunity to lead the architectural evolution of our platform, driving a critical migration from legacy EC2 topologies to cloud-native EKS/Kubernetes clusters, and designing the backbone for our next-generation AI and real-time data services.
We highly value experience gained at FAANG or other leading Big Tech companies.
Role Location
This is a hybrid position based at either our San Jose or Newport Beach, CA office. You will be required to be on-site 2 to 3 days per week. When you are not working on-site, you will be working from your home office. We are only considering candidates within commutable distance of either the San Jose or Newport Beach, CA office and are not offering relocation assistance at this time.
About The Role
- Cloud Native Strategy & Migration: Lead the architectural design and execution of migrating legacy EC2-based workloads to Amazon EKS and Kubernetes. Define standards for multi-region availability, auto-scaling, and spot instance orchestration.
- Advanced Traffic Management: Architect and deploy high-performance API Gateways and specialized LLM Gateways to manage traffic for Generative AI workloads. Implement Service Mesh (e.g., Istio, Linkerd) for advanced traffic splitting, mTLS, and observability.
- Real-Time Data Infrastructure: Design robust infrastructure for diverse storage engines, including AWS-native databases (DynamoDB, Aurora), OLAP systems, and real-time databases like Aerospike and Druid to support sub-millisecond latency requirements.
- Event-Driven Backbone: Architect scalable Pub/Sub messaging systems (Kafka, SNS/SQS, Pulsar) to decouple microservices and enable event-driven architectures at internet scale.
- Comprehensive Observability: Define and implement a unified observability strategy based on OpenTelemetry (OTLP) standards. Integrate platforms like Grafana, Datadog, and Graylog to provide single-pane-of-glass visibility into logs, metrics, and traces.
- Identity & Security Engineering: Modernize Authentication and Authorization systems (OIDC, OAuth2, SPIFFE/SPIRE). Deploy and manage centralized Secret Stores (HashiCorp Vault, AWS Secrets Manager), security gateways, and automated certificate management systems.
- Infrastructure as Code (IaC): Champion a GitOps culture by treating infrastructure as software. Enforce best practices using Terraform, Crossplane, or Pulumi, ensuring all environments are reproducible and audit-compliant.
- Technical Leadership: Mentor senior infrastructure engineers, drive Well-Architected reviews, and collaborate with software teams to ensure infrastructure supports rapid product iteration.
- 10+ years of professional experience in infrastructure engineering and architecture, with a proven track record of managing large-scale cloud deployments in AWS (primary), GCP, or Azure.
- Kubernetes Expert: Deep, hands-on mastery of Kubernetes and EKS, including experience with custom controllers, operators, and migrating stateful/stateless workloads from VMs to containers.
- Database Reliability: Strong experience architecting infrastructure for high-scale data systems, specifically real-time stores (Aerospike, Redis) and analytics engines (Druid, ClickHouse).
- Security & Compliance: Extensive experience designing secure infrastructure, including implementation of Zero Trust networks, WAFs, and secret management systems. Familiarity with compliance standards (SOC2, PCI-DSS) is a plus.
- Observability Stack: Proven ability to build observability pipelines from scratch using OpenTelemetry collectors and backend visualization tools (Prometheus/Grafana/Datadog).
- Automation & Scripting: Expert-level proficiency in Go, Python, or Bash. You automate toil relentlessly and have deep experience with CI/CD pipelines (GitLab CI, GitHub Actions, ArgoCD).
- Networking Fundamentals: D…