LLM Systems Engineer
New York, New York County, New York, 10261, USA
Listed on 2026-01-04
IT/Tech
Systems Engineer, AI Engineer
Location: New York
Location: United States (West Coast preferred, remote considered)
About the Company
We are a rapidly growing AI company delivering large language models. Our mission is to ensure models not only perform well in research but also serve real-world applications reliably and efficiently. We are looking for engineers who enjoy solving high-scale inference and systems challenges.
Role Overview
We are seeking a Senior / Staff LLM Systems Engineer to lead the development, optimization, and deployment of large language model inference pipelines. This role focuses on high-throughput, low-latency serving and production reliability, bridging ML research and platform engineering.
This is not a training-focused role; the emphasis is on serving models at scale, optimizing systems, and enabling production ML reliability.
Key Responsibilities
- Design, implement, and optimize inference pipelines for large language models
- Improve throughput and latency of model serving in production environments
- Collaborate closely with infrastructure, platform, and ML research teams to ensure smooth deployment
- Build monitoring, observability, and alerting systems for inference performance and reliability
- Identify and solve scaling challenges across GPUs, TPUs, or distributed environments
- Evaluate and adopt new technologies, frameworks, and architectures to improve inference efficiency
- Mentor other engineers and contribute to technical strategy for production ML systems
Qualifications
- 5+ years of software engineering experience, including hands-on ML systems experience
- Strong background in distributed systems, performance tuning, and low-latency architectures
- Experience with model serving frameworks (e.g., Triton, vLLM, Ray, TorchServe)
- Familiarity with GPU/TPU infrastructure, multi-node deployment, and system-level optimization
- Understanding of ML workloads and trade-offs between accuracy, latency, and cost
- Proven ability to deliver production-grade ML systems at scale
- Excellent collaboration and problem-solving skills
What We Offer
- Work on cutting-edge LLM inference systems at scale
- Solve technically challenging, high-impact engineering problems
- Collaborate with top ML researchers and platform engineers
- Competitive compensation and flexible work arrangements
Darwin Recruitment is acting as an Employment Agency in relation to this vacancy.
Reece Waldon