Principal AI Engineer
Listed on 2026-03-04
Software Development
Software Engineer
Our client, a large, complex enterprise organization, is looking for a Principal AI Engineer to serve as the lead architect and hands‑on builder of a unified AI Platform as a Service (PaaS): a secure, multi‑tenant foundation that helps internal teams build and operate semantic discovery, conversational experiences, and autonomous agent workflows at scale.
Summary
The Principal AI Engineer will design and build an enterprise‑grade “AI operating layer” that turns modern foundation model capabilities into a governed, reusable platform used across multiple business domains. This role balances approximately 40% hands‑on development with 60% platform strategy; the engineer will personally build core orchestration services, standardized capability interfaces, and trust/safety guardrails. The platform will enable teams to deploy specialized agents that can collaborate via defined protocols, securely access grounded knowledge sources, and execute autonomous tasks within a controlled, high‑availability runtime.
Remote (U.S.) / Hybrid options may be available based on client needs.
Compensation Range
$145,000 - $250,000 per year plus RSUs & Bonus
Responsibilities
- Architect and deliver a self‑service AI platform that provides reusable patterns, reference implementations, and standardized building blocks for internal engineering teams.
- Define and communicate a multi‑year platform roadmap, ensuring technical priorities map to enterprise outcomes and adoption goals.
- Design and implement stateful orchestration (state graphs/state machines) to handle planning edge cases, recovery, and self‑correction in autonomous workflows.
- Build and operate secure remote tool gateways (e.g., MCP‑style servers) and implement controlled function‑calling interfaces for connecting agents to sensitive enterprise systems.
- Establish interoperability standards for agent‑to‑agent collaboration, enabling autonomous discovery and reliable task handoffs across independently built agent solutions.
- Design an agent identity and authorization layer that supports fine‑grained permissions, auditable actions, and strong accountability for autonomous behaviors.
- Implement a unified knowledge layer using semantic retrieval and multimodal grounding to support accurate, “source‑aligned” responses and decisions.
- Build long‑term context persistence (“memory”) using retrieval and graph‑based storage to preserve institutional knowledge and improve continuity over time.
- Create a trust and evaluation layer with automated testing pipelines to measure quality, safety, cost, latency, and reliability of agent behavior across tenants.
- Own runtime lifecycle management for agent sessions, ensuring high availability, persistence, scalability, and controlled rollout patterns.
- Lead deep code reviews focused on agent‑specific failure modes (runaway loops, tool misuse, state growth, unreliable calling patterns) and implement mitigations.
- Optimize inference performance and spend through techniques such as prompt caching, model routing, and workload‑aware runtime strategies.
- Act as a technical multiplier by mentoring senior/staff engineers on advanced agentic patterns, evaluation methods, and production hardening.
- Partner closely with Cloud and Infrastructure teams to influence enabling services and platform primitives needed for enterprise AI delivery.
- Raise the bar on engineering quality via documentation, profiling, reliability improvements, and ongoing performance tuning.
Qualifications
- 10+ years of software engineering experience, including 4+ years operating at a Principal/Architect level.
- 2+ years architecting and shipping LLM‑based systems, with demonstrated experience taking agentic solutions into production at scale.
- 5+ years working in agile delivery environments.
- Google Cloud Professional Cloud Architect certification.
- Proven ability to lead technical work streams and translate business needs into durable platform architectures.
- Strong expertise in asynchronous orchestration (e.g., Python) plus proficiency in a statically typed language (Java, Go, or Rust) for high‑concurrency platform services.
- Hands‑on experience with stateful graph orchestration patterns and frameworks (e.g.,…