
Principal Solution Architect - AI Infrastructure

Job in Plano, Collin County, Texas, 75086, USA
Listing for: Toyota North America
Full Time position
Listed on 2026-01-12
Job specializations:
  • IT/Tech: AI Engineer, Systems Engineer, Cloud Computing, Data Engineer
Salary/Wage Range or Industry Benchmark: USD 80,000 - 100,000 per year

Overview


Who we are

Collaborative. Respectful. A place to dream and do. These are just a few words that describe what life is like at Toyota. As one of the world’s most admired brands, Toyota is growing and leading the future of mobility through innovative, high-quality solutions designed to enhance lives and delight those we serve. We’re looking for talented team members who want to Dream. Do. Grow. with us.

An important part of the Toyota family is Toyota Financial Services (TFS), the finance and insurance brand for Toyota and Lexus in North America. While TFS is a separate business entity, it is an essential part of this world-changing company—delivering on Toyota's vision to move people beyond what's possible. At TFS, you will help create best-in-class customer experience in an innovative, collaborative environment.

To save time applying, Toyota does not offer sponsorship of job applicants for employment-based visas or any other work authorization for this position at this time.

Who we’re looking for

Toyota Financial Services is seeking a Principal Solution Architect - AI Infrastructure to design the systems and platforms that enable enterprise-scale MLOps and LLMOps. This is a senior‑level individual contributor role within the Architecture organization, focused on building the foundational infrastructure for deploying, managing, and scaling AI and GenAI workloads in production.

You’ll work across cloud infrastructure, platform engineering, data, and cybersecurity teams to architect robust, secure, and performant environments that power model training, inference, orchestration, and retrieval‑augmented generation (RAG) systems.
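
For context on the retrieval‑augmented generation systems referenced above, the short sketch below shows the core RAG lookup in miniature: embed a query, find the most similar stored document vectors, and assemble a prompt for an LLM endpoint. It is an illustrative toy only, not Toyota's implementation; the document list, the embed stand-in, and the in-memory store are all hypothetical, and a production platform of the kind this role owns would rely on managed vector databases and hosted model endpoints instead.

    # Illustrative sketch only: a toy RAG lookup. All names below are hypothetical.
    import numpy as np

    documents = ["Lease-end FAQ", "Payment deferral policy", "Roadside assistance terms"]
    doc_vectors = np.random.rand(len(documents), 384)   # stand-in for real embeddings

    def embed(text: str) -> np.ndarray:
        """Stand-in embedder; a production system would call a hosted model endpoint."""
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        return rng.random(384)

    def retrieve(query: str, k: int = 2) -> list[str]:
        """Return the k documents whose stored vectors are most similar to the query."""
        q = embed(query)
        sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
        return [documents[i] for i in np.argsort(-sims)[:k]]

    if __name__ == "__main__":
        context = retrieve("How do I defer a payment?")
        prompt = f"Answer using only this context: {context}"
        print(prompt)   # this prompt would then go to an LLM inference endpoint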

This role is ideal for architects who deeply understand infrastructure, cloud‑native systems, and the unique demands of production‑scale AI workloads.

What you’ll be doing
  • Architect and evolve cloud‑native infrastructure to support AI/ML and LLM workloads in production
  • Build platform capabilities for MLOps and LLMOps—including model training, versioning, deployment, monitoring, and rollback
  • Enable GPU‑accelerated compute environments optimized for model performance, scalability, and cost‑efficiency
  • Integrate and standardize infrastructure for vector databases, model registries, and orchestration frameworks
  • Establish reusable patterns for model serving, inference scaling, prompt management, and latency optimization
  • Design secure, multi‑tenant environments that enforce access controls, auditability, and usage limits for AI models
  • Partner with engineering, platform, and data teams to ensure seamless data flow, observability, and operational resiliency
  • Contribute architecture documentation, governance models, and standards to support AI infrastructure delivery across teams
  • Stay informed on emerging technologies in GenAI, distributed systems, and infrastructure trends
What you bring
  • 10+ years of experience in infrastructure or cloud architecture roles
  • 3+ years building infrastructure to support AI/ML workloads (training, tuning, inference)
  • Deep expertise with AWS and infrastructure‑as‑code tools (Terraform, CDK, CloudFormation)
  • Hands‑on experience with Kubernetes (EKS or equivalent), containerization, and CI/CD pipelines
  • Strong knowledge of GPU infrastructure, serverless compute, and scalable microservice patterns
  • Familiarity with model hosting, inference scaling, and observability (Datadog, CloudWatch, Prometheus)
  • Strong communication and documentation skills to align technical design across domains
Added bonus if you have
  • Experience with LLMOps tooling and GenAI infrastructure (e.g., LangChain, RAG pipelines, embedding stores)
  • Experience with vector databases (e.g., Pinecone, FAISS, Weaviate), model registries, and orchestration tools (e.g., MLflow, Airflow, Ray)
  • Knowledge of prompt management, token usage optimization, and model…