Principal Infrastructure Architect – Cloud & SaaS Platforms
San Jose, Santa Clara County, California, 95199, USA
Listed on 2026-02-28
IT/Tech
Systems Engineer, Cloud Computing
Role Overview
We are a global leader in online protection, dedicated to making the digital world a safer place. We are seeking a highly experienced and hands‑on Principal Infrastructure Architect with a deep background in large‑scale multi‑cloud environments (AWS, GCP, Azure) and modern SaaS delivery. This is a unique opportunity to lead the architectural evolution of our platform, driving a critical migration from legacy EC2 topologies to cloud‑native EKS/Kubernetes clusters, and designing the backbone for our next‑generation AI and real‑time data services.
We highly value experience gained at FAANG or other leading Big Tech companies.
This is a hybrid position based at either our San Jose or Newport Beach, CA office. You will be required to be on‑site 2 to 3 days per week and will work from your home office on the remaining days. We are only considering candidates within commuting distance of either office and are not offering relocation assistance at this time.
About the Role
Cloud Native Strategy & Migration: Lead the architectural design and execution of migrating legacy EC2‑based workloads to Amazon EKS and Kubernetes. Define standards for multi‑region availability, auto‑scaling, and spot instance orchestration.
Advanced Traffic Management: Architect and deploy high‑performance API Gateways and specialized LLM Gateways to manage traffic for Generative AI workloads. Implement Service Mesh (e.g., Istio, Linkerd) for advanced traffic splitting, mTLS, and observability.
Real‑Time Data Infrastructure: Design robust infrastructure for diverse storage engines, including AWS‑native databases (DynamoDB, Aurora), OLAP systems, and real‑time databases like Aerospike and Druid to support sub‑millisecond latency requirements.
Event‑Driven Backbone: Architect scalable Pub/Sub messaging systems (Kafka, SNS/SQS, Pulsar) to decouple microservices and enable event‑driven architectures at internet scale.
Comprehensive Observability: Define and implement a unified observability strategy based on OpenTelemetry (OTLP) standards. Integrate platforms like Grafana, Datadog, and Graylog to provide "single pane of glass" visibility into logs, metrics, and traces.
Identity & Security Engineering: Modernize Authentication and Authorization systems (OIDC, OAuth2, SPIFFE/SPIRE). Deploy and manage centralized Secret Stores (HashiCorp Vault, AWS Secrets Manager), security gateways, and automated certificate management systems.
Infrastructure as Code (IaC): Champion a "GitOps" culture by treating infrastructure as software. Enforce best practices using Terraform, Crossplane, or Pulumi, ensuring all environments are reproducible and audit‑compliant.
Technical Leadership: Mentor senior infrastructure engineers, drive "Well‑Architected" reviews, and collaborate with software teams to ensure infrastructure supports rapid product iteration.
10+ years of professional experience in infrastructure engineering and architecture, with a proven track record of managing large‑scale cloud deployments in AWS (primary), GCP, or Azure.
Kubernetes Expert: Deep, hands‑on mastery of Kubernetes and EKS, including experience with custom controllers, operators, and migrating stateful/stateless workloads from VMs to containers.
Database Reliability: Strong experience architecting infrastructure for high‑scale data systems, specifically real‑time stores (Aerospike, Redis) and analytics engines (Druid, ClickHouse).
Security & Compliance: Extensive experience designing secure infrastructure, including implementation of Zero Trust networks, WAFs, and secret management systems. Familiarity with compliance standards (SOC2, PCI‑DSS) is a plus.
Observability Stack: Proven ability to build observability pipelines from scratch using OpenTelemetry collectors and backend visualization tools (Prometheus/Grafana/Datadog).
Automation & Scripting: Expert‑level proficiency in Go, Python, or Bash. You automate toil relentlessly and have deep experience with CI/CD pipelines (GitLab CI, GitHub Actions, ArgoCD).
Networking Fundamentals: Deep…