
Backend Engineer – Inference Optimization

Job in Seattle, King County, Washington, 98127, USA
Listing for: Vercept
Full Time position
Listed on 2026-01-13
Job specializations:
  • IT/Tech
    AI Engineer, Machine Learning/ ML Engineer, Data Engineer, Systems Engineer
Salary/Wage Range: $150,000 – $250,000 USD per year
Job Description & How to Apply Below

About Us

We’re a high-energy, impact-driven team with a long track record of academic excellence. Our team includes researchers whose work has shaped the field—earning best paper awards at top AI conferences and ranking among the most cited scientists in the history of science. We’ve produced fundamental, transformative research that has redefined the field, and now we’re here to change the world—one breakthrough at a time.

What We’re Looking For & Why Join Us

We’re looking for a Backend Engineer – Inference Optimization who thrives on solving some of the hardest systems problems in AI. You’ll focus on pushing the limits of foundation model inference performance, working at the intersection of cutting-edge ML and high-performance systems engineering. This is your opportunity to set new benchmarks for latency, throughput, and efficiency at scale.

What is this role?

As a Backend Engineer, you’ll own the design and optimization of inference pipelines for large-scale models. You’ll work closely with researchers and infrastructure engineers to identify bottlenecks, implement advanced techniques like quantization and KV caching, and deploy high-performance serving systems in production. Your work will directly determine how fast and cost-effectively users can access next-generation AI.
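To illustrate the kind of optimization this role involves, here is a minimal sketch of symmetric per-tensor int8 weight quantization—one of the techniques named above. This is a generic illustration, not Vercept's implementation; the function names and the NumPy-based approach are assumptions for the example.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    scale = np.abs(weights).max() / 127.0  # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Rounding error per element is bounded by half a quantization step.
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6
```

In production serving, the same idea is applied per-channel or per-group and combined with int8 matmul kernels, which is where the real latency and memory wins come from.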

What do we expect?
  • Deep experience optimizing model inference pipelines, including model quantization and KV caching.
  • Proficiency in backend systems and high-performance programming (Python, C++, or Rust).
  • Familiarity with distributed serving, GPU acceleration, and large-scale systems.
  • Ability to debug complex performance issues across model, runtime, and hardware layers.
  • Comfort working in fast-moving environments with ambitious technical goals.
Nice to have:
  • Hands-on experience with vLLM or similar inference frameworks.
  • Background in GPU kernel optimization (CUDA, Triton, ROCm).
  • Experience scaling inference across multi-node or heterogeneous clusters.
  • Prior work in model compilation (e.g., TensorRT, TVM, ONNX Runtime).
  • Hands-on experience with model quantization.
Compensation & Benefits

$150K – $250K + Equity

We offer health benefits, a 401(k) plan, and meaningful equity—because we believe top talent should be supported, secure, and fully invested in the future we’re building together.

Location

Our company is in-office at our Seattle HQ.
