GPU & LLM Infrastructure Product Manager
Listed on 2026-02-28
IT/Tech
AI Engineer
Hi,
Our client is looking for a GPU & LLM Infrastructure Product Manager in Minneapolis, MN / Charlotte, NC / Irving, TX (Hybrid). The detailed requirements are below. Please share your updated resume if you are interested.
Role: GPU & LLM Infrastructure Product Manager
Location: Minneapolis, MN / Charlotte, NC / Irving, TX (Hybrid)
Type: Contract

About This Role: Enterprise AI Platform - GPU & LLM Infrastructure Product Manager
You will define and lead the product strategy for an enterprise-scale LLM/SLM inference GPU platform. In this role, you will partner closely with GPU hardware and platform engineering teams to translate customer needs and business objectives into a clear, prioritized roadmap with measurable outcomes.
You will own capabilities across high-performance model inferencing, GPU orchestration, and platform services, including vLLM, NVIDIA Run:ai, and Red Hat OpenShift AI. The role also encompasses API productization, observability and evaluation, reliability and SLOs, and compliant end-to-end lifecycle management to enable secure, scalable, and enterprise-ready AI solutions.
- Lead a team to identify, strategize, and execute highly complex Artificial Intelligence initiatives that span a line of business
- Recommend business strategy and deliver Artificial Intelligence enabling solutions to solve business challenges
- Define and prioritize use cases, obtain the required resources, and ensure the solutions deliver the intended benefits
- Leverage Artificial Intelligence expertise to evaluate technological readiness and resources required to execute the proposed solutions
- Make decisions to drive the implementation of Artificial Intelligence initiatives and programs while serving multiple stakeholders
- Resolve issues which may arise during development or implementation
- Collaborate and consult with peers, colleagues, and managers to resolve issues and achieve goals
- 5+ years of Artificial Intelligence Solutions experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
- 2+ years of hands-on experience with cloud platforms such as GCP or Azure, and container orchestration technologies including Docker and Kubernetes/OpenShift
- 2+ years of experience working on platform or ML/AI infrastructure products within regulated environments
- 2+ years of proven success owning an API or platform with accountability for SLAs/SLOs, including versioning and deprecation strategies, change management, and reliability outcomes
- Strong communication skills, with the ability to influence senior stakeholders and clearly explain complex technical concepts to diverse audiences
- Working knowledge of LLM/SLM inference stacks, including vLLM, Triton, and TensorRT-LLM, as well as batching strategies, KV cache management, quantization techniques (e.g., FP8, INT4), and evaluation frameworks, sufficient to make informed product trade-offs with engineering teams
- Familiarity with GPU and platform fundamentals, such as modern GPU architectures (e.g., H100/H200), MIG and NCCL, GPU orchestration tools (NVIDIA Run:ai), and Kubernetes/OpenShift AI administration and admission control patterns
- Experience building developer-centric platforms, including APIs, SDKs, and structured release and governance processes
- Hands-on experience with observability and evaluation for GenAI systems, including dashboards, tracing, alerting, and safety and quality metrics
- Demonstrated strength in stakeholder management, partnering effectively across Risk, Security, Architecture, and line-of-business application teams