LLM Systems Engineer

Remote / Online - Candidates ideally in
New York, New York County, New York, 10261, USA
Listing for: Darwin Recruitment
Remote/Work from Home position
Listed on 2026-01-04
Job specializations:
  • IT/Tech
    Systems Engineer, AI Engineer
Salary/Wage Range or Industry Benchmark: 120000 - 160000 USD Yearly
Job Description & How to Apply Below
Position: Staff LLM Systems Engineer
Location: New York

Location: United States (West Coast preferred, remote considered)
About the Company

We are a rapidly growing AI company delivering large language models. Our mission is to ensure models not only perform well in research but also serve real-world applications reliably and efficiently. We are looking for engineers who enjoy solving high-scale inference and systems challenges.

Role Overview

We are seeking a Senior / Staff LLM Systems Engineer to lead the development, optimization, and deployment of large language model inference pipelines. This role focuses on high-throughput, low-latency serving and production reliability, bridging ML research and platform engineering.

This is not a training-focused role – the emphasis is on serving models at scale, optimizing systems, and enabling production ML reliability.

Responsibilities
  • Design, implement, and optimize inference pipelines for large language models
  • Improve throughput and latency of model serving in production environments
  • Collaborate closely with infrastructure, platform, and ML research teams to ensure smooth deployment
  • Build monitoring, observability, and alerting systems for inference performance and reliability
  • Identify and solve scaling challenges across GPUs, TPUs, or distributed environments
  • Evaluate and adopt new technologies, frameworks, and architectures to improve inference efficiency
  • Mentor other engineers and contribute to technical strategy for production ML systems
Qualifications
  • 5+ years of software engineering experience, including hands-on ML systems experience
  • Strong background in distributed systems, performance tuning, and low-latency architectures
  • Experience with model serving frameworks (e.g., Triton, vLLM, Ray, TorchServe)
  • Familiarity with GPU/TPU infrastructure, multi-node deployment, and system-level optimization
  • Understanding of ML workloads and trade-offs between accuracy, latency, and cost
  • Proven ability to deliver production-grade ML systems at scale
  • Excellent collaboration and problem-solving skills
Why You’ll Enjoy This Role
  • Work on cutting-edge LLM inference systems at scale
  • Solve technically challenging, high-impact engineering problems
  • Collaborate with top ML researchers and platform engineers
  • Competitive compensation and flexible work arrangements

Darwin Recruitment is acting as an Employment Agency in relation to this vacancy.

Reece Waldon
