Principal Solution Architect - AI Infrastructure
Listed on 2026-01-12
IT/Tech
AI Engineer, Systems Engineer, Cloud Computing, Data Engineer
Overview
Collaborative. Respectful. A place to dream and do. These are just a few words that describe what life is like at one of the world’s most admired brands. Toyota is growing and leading the future of mobility through innovative, high-quality solutions designed to enhance lives and delight those we serve. We’re looking for talented team members who want to Dream. Do. Grow. with us.
An important part of the Toyota family is Toyota Financial Services (TFS), the finance and insurance brand for Toyota and Lexus in North America. While TFS is a separate business entity, it is an essential part of this world-changing company—delivering on Toyota's vision to move people beyond what's possible. At TFS, you will help create best-in-class customer experience in an innovative, collaborative environment.
Please note: Toyota does not offer sponsorship of job applicants for employment-based visas or any other work authorization for this position at this time.
Who we’re looking for
Toyota Financial Services is seeking a Principal Solution Architect - AI Infrastructure to design the systems and platforms that enable enterprise-scale MLOps and LLMOps. This is a senior‑level individual contributor role within the Architecture organization, focused on building the foundational infrastructure for deploying, managing, and scaling AI and GenAI workloads in production.
You’ll work across cloud infrastructure, platform engineering, data, and cybersecurity teams to architect robust, secure, and performant environments that power model training, inference, orchestration, and retrieval‑augmented generation (RAG) systems.
This role is ideal for architects who deeply understand infrastructure, cloud‑native systems, and the unique demands of production‑scale AI workloads.
What you’ll be doing
- Architect and evolve cloud‑native infrastructure to support AI/ML and LLM workloads in production
- Build platform capabilities for MLOps and LLMOps—including model training, versioning, deployment, monitoring, and rollback
- Enable GPU‑accelerated compute environments optimized for model performance, scalability, and cost‑efficiency
- Integrate and standardize infrastructure for vector databases, model registries, and orchestration frameworks
- Establish reusable patterns for model serving, inference scaling, prompt management, and latency optimization
- Design secure, multi‑tenant environments that enforce access controls, auditability, and usage limits for AI models
- Partner with engineering, platform, and data teams to ensure seamless data flow, observability, and operational resiliency
- Contribute architecture documentation, governance models, and standards to support AI infrastructure delivery across teams
- Stay informed on emerging technologies in GenAI, distributed systems, and infrastructure trends
What you bring
- 10+ years of experience in infrastructure or cloud architecture roles
- 3+ years building infrastructure to support AI/ML workloads (training, tuning, inference)
- Deep expertise with AWS and infrastructure‑as‑code tools (Terraform, CDK, CloudFormation)
- Hands‑on experience with Kubernetes (EKS or equivalent), containerization, and CI/CD pipelines
- Strong knowledge of GPU infrastructure, serverless compute, and scalable microservice patterns
- Familiarity with model hosting, inference scaling, and observability (Datadog, CloudWatch, Prometheus)
- Strong communication and documentation skills to align technical design across domains
- Experience with LLMOps tooling and GenAI infrastructure (e.g., LangChain, RAG pipelines, embedding stores)
- Worked with vector databases (e.g., Pinecone, FAISS, Weaviate), model registries, and orchestration tools (e.g., MLflow, Airflow, Ray)
- Knowledge of prompt management, token usage optimization, and model…