Infrastructure Architect
Listed on 2026-02-28
IT/Tech
Systems Engineer, Cloud Computing
About the Role
Grade Level (for internal use): 12
We are seeking a highly skilled Infrastructure Architect to design, build, and maintain our next‑generation hybrid cloud platform. In this role, you will be the technical lead responsible for ensuring our systems are resilient, scalable, and secure. You will bridge the gap between traditional on‑premises data centers and modern hyperscale cloud environments, leveraging Kubernetes and Infrastructure as Code (IaC) to drive automation and efficiency.
If you are passionate about building highly available systems that never sleep and love solving complex problems at the intersection of hardware and software, we want to hear from you.
Key Responsibilities
Architecture & Design
- Hybrid Cloud Strategy: Design and implement a robust hybrid cloud architecture that seamlessly integrates on‑premises infrastructure with major hyperscalers (AWS, Azure, or GCP).
- High Availability (HA): Architect systems for 99.99% uptime, utilizing multi‑region failover strategies, load balancing, and disaster recovery planning.
- Container Orchestration: Lead the design and management of enterprise‑grade Kubernetes clusters (K8s) across both on‑prem and cloud environments.
- Infrastructure as Code (IaC): Define all infrastructure using code. Build and maintain extensive Terraform and Chef / Ansible repositories to automate provisioning, configuration, and drift detection.
- Immutable OS & Minimal Distributions: Architect solutions using container‑optimized, immutable Linux distributions (e.g., Talos Linux, Flatcar, Bottlerocket). Move away from traditional SSH‑based management to API‑driven OS configuration to reduce overhead and ensure consistency.
- Database Reliability: Oversee the architectural health of various database platforms (MySQL, MongoDB, etc.), ensuring proper clustering, sharding, and backup strategies are in place.
- Immutable Security Strategy: Move beyond traditional in‑place patching. Design automated workflows for node rotation and recycling to apply OS updates and security fixes, ensuring production environments always run on the latest, verified images.
- Supply Chain Security: Implement rigorous image scanning and signing (e.g., Cosign, Notary) within the CI/CD pipeline. Enforce Software Bill of Materials (SBOM) analysis to detect vulnerabilities in base images and dependencies before they reach the cluster.
- Runtime Defense: Architect runtime security solutions using eBPF‑based tools (e.g., Falco, Tetragon) to monitor for anomalous behavior and unauthorized system calls in a shell‑less environment.
- Policy as Code: Enforce cluster security and compliance using policy engines like OPA Gatekeeper or Kyverno to prevent misconfigurations and ensure strict pod security standards.
- Observability: Implement comprehensive monitoring and distributed tracing (Prometheus, Grafana, Jaeger) to ensure deep visibility into ephemeral pods and distinct infrastructure layers.
- Experience: 7+ years in Systems Engineering or DevOps, with at least 3 years in an Architecture role.
- Cloud Proficiency: Deep expertise in at least one major public cloud provider (AWS, Azure, GCP) and experience connecting cloud environments to on‑prem data centers.
- Kubernetes Expertise: Proven experience managing Kubernetes in production (CKA certification is a plus).
- IaC Mastery: Advanced proficiency with Terraform and configuration management tools.
- Container‑Optimized OS Experience: Deep understanding of immutable infrastructure concepts and experience managing minimal Linux distributions designed exclusively for Kubernetes (no SSH, read‑only file systems, etc.).
- Database Knowledge: Experience architecting high‑availability solutions for both SQL and NoSQL databases.
- Experience with Service Mesh technologies (Istio, Linkerd).
- Background in regulated industries (Finance, Healthcare) dealing with compliance standards (SOC 2, HIPAA, PCI‑DSS).
- Programming experience in Python or Go for scripting and tooling.
We require all candidates who reach the final stage of our interview process to attend at least one in‑person interview, which is ordinarily at your…