Senior Researcher: AI Computing Systems

Job in Zürich, 8058, Kanton Zürich, Switzerland
Listing for: Huawei Switzerland
Full Time position
Listed on 2026-02-18
Job specializations:
  • Software Development
    AI Engineer, Software Engineer, Cloud Engineer - Software, Machine Learning / ML Engineer
Salary/Wage Range or Industry Benchmark: CHF 80,000 - 100,000 per year
Job Description & How to Apply Below
Location: Zürich

Huawei envisions a world where technology connects people, empowers industries, and unlocks human potential. Guided by its mission to enrich lives through communication and intelligent innovation, Huawei stands at the forefront of global digital transformation. As a leader in Information and Communications Technology (ICT), the company pioneers breakthroughs in artificial intelligence, cloud computing, and smart devices - building the intelligent foundation of a fully connected world.

Through its Carrier, Enterprise, and Consumer business groups, Huawei delivers resilient digital infrastructure, advanced cloud and AI platforms, and transformative devices that enable progress at every level. Supporting 45 of the world’s top 50 telecom operators and serving one-third of the global population across more than 170 countries, Huawei is shaping a future where connectivity becomes a powerful catalyst for opportunity and sustainable growth.

This spirit of bold innovation is embodied by Huawei Technologies Switzerland AG. From its research hubs in Zurich and Lausanne, pioneering teams push the boundaries of High-Performance Computing, Computer Architecture, Computer Vision, Robotics, Artificial Intelligence, Neuromorphic Computing, Wireless Technologies, and Networking - architecting the intelligent systems that will define tomorrow’s digital era.

We are looking for a strong researcher with hands‑on LLM + RAG experience who can help build and optimize techniques such as KV‑cache precomputation, KV reuse/blending (e.g., CacheBlend‑style), and sparse attention / selective recompute. You will work close to the metal (attention kernels and profiling) and at the system level (vLLM/LMCache‑style stacks), turning research ideas into robust, high‑performance code.

Responsibilities:
  • Design and implement RAG acceleration techniques that reduce time‑to‑first‑token (TTFT) and improve throughput (e.g., document KV precomputation, reuse, caching policies).

  • Develop KV‑cache reuse / blending pipelines and integrate them into inference stacks (batching, paging, eviction, correctness/quality trade‑offs).

  • Implement and optimize sparse attention / selective attention paths, including mask construction and block‑granularity strategies.

  • Work with PyTorch and modern attention backends/kernels (e.g., FlashAttention / FlashInfer‑like kernels), profiling and optimizing performance.

  • Stay up to date with the latest research and open‑source progress in LLM inference, KV caching, and RAG systems, and translate it into practical improvements.
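To make the first two bullets concrete, here is a minimal sketch of document‑KV precomputation and reuse, using NumPy as a stand‑in for PyTorch tensors; the function names (`precompute_kv`, `attend`) and the toy single‑head attention are illustrative, not part of any specific inference stack:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # toy head dimension; real models use e.g. 64-128 per head
Wq, Wk, Wv = (rng.standard_normal((D, D)) for _ in range(3))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def precompute_kv(doc_states):
    # One-time "prefill" over a retrieved document: cache its K/V so
    # later requests hitting the same document skip this work entirely,
    # which is what cuts time-to-first-token.
    return doc_states @ Wk, doc_states @ Wv

def attend(query_states, kv_cache):
    # Decode-side single-head attention consuming a precomputed (K, V).
    K, V = kv_cache
    Q = query_states @ Wq
    return softmax(Q @ K.T / np.sqrt(D)) @ V

doc = rng.standard_normal((128, D))   # stand-in for document token states
cache = precompute_kv(doc)            # computed once, stored keyed by doc id

out_a = attend(rng.standard_normal((4, D)), cache)  # two independent
out_b = attend(rng.standard_normal((4, D)), cache)  # requests reuse the cache
print(out_a.shape, out_b.shape)
```

In a production stack the cached (K, V) would live in paged GPU/CPU memory under an eviction policy, and reuse across differing prefixes is where blending‑style corrections come in.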

Qualifications:
  • PhD in Computer Science, Electrical Engineering, or a related field.

  • Strong software engineering skills in Python, with substantial PyTorch experience (model internals, attention/KV cache concepts, performance‑aware coding).

  • Solid understanding of transformer inference fundamentals: prefill vs decode, KV cache layout, masking, batching, latency/throughput trade‑offs.

  • Experience benchmarking and profiling LLM workloads, and diagnosing performance bottlenecks.

  • Strong communication skills and comfort collaborating across research and engineering.
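As a concrete illustration of the masking and block‑granularity points above, a toy block‑sparse causal mask builder; the function name and keep‑set representation are hypothetical, chosen only to show the mechanics:

```python
import numpy as np

def block_sparse_mask(n_tokens, block, keep_blocks):
    # Token-level boolean mask from a block-granularity keep-set:
    # query block i may attend key block j iff (i, j) is in keep_blocks.
    mask = np.zeros((n_tokens, n_tokens), dtype=bool)
    for qi, kj in keep_blocks:
        mask[qi * block:(qi + 1) * block, kj * block:(kj + 1) * block] = True
    # Intersect with a causal mask so autoregressive ordering is preserved.
    causal = np.tril(np.ones((n_tokens, n_tokens), dtype=bool))
    return mask & causal

# "Local diagonal + always attend to the first block" pattern over
# 8 tokens with block size 2 (i.e. 4 blocks).
keep = {(i, i) for i in range(4)} | {(i, 0) for i in range(4)}
m = block_sparse_mask(8, 2, keep)
print(int(m.sum()))  # allowed (query, key) pairs vs. 36 for dense causal
```

Real kernels never materialize the dense mask; they skip whole key blocks at the kernel level, but the allowed‑pair set is the same.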

Preferred Qualifications (Nice to Have):
  • Experience with vLLM and/or LMCache (integration, debugging, extending attention/KV‑cache logic).

  • Familiarity with attention kernel stacks and customization (FlashAttention / FlashInfer, Triton, CUDA extensions, custom ops).

  • Practical experience building RAG pipelines (retrieval, chunking, indexing, reranking) and understanding how retrieval interacts with inference latency.

  • Contributions to open‑source projects or publications/technical reports in AI systems, LLM inference, caching, or storage‑aware ML systems.

  • Systems background (Linux, performance engineering, storage/IO, memory hierarchy) and comfort working close to hardware.
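To ground the RAG‑pipeline bullet, a self‑contained toy of the chunking and retrieval steps; the hashing "embedding" is a deliberately crude stand‑in for a real encoder, and every name here (`chunk`, `embed`, `retrieve`) is illustrative:

```python
import numpy as np

def chunk(text, size=5, overlap=2):
    # Split whitespace tokens into overlapping fixed-size chunks.
    toks = text.split()
    step = size - overlap
    return [" ".join(toks[i:i + size])
            for i in range(0, max(len(toks) - overlap, 1), step)]

def embed(s, dim=64):
    # Deterministic bag-of-words hashing vector; a stand-in for a real
    # sentence encoder, good enough to demonstrate similarity ranking.
    v = np.zeros(dim)
    for t in s.lower().split():
        v[sum(ord(ch) for ch in t) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve(query, chunks, k=2):
    # Rank chunks by cosine similarity to the query and keep the top-k.
    q = embed(query)
    return sorted(chunks, key=lambda c: -float(embed(c) @ q))[:k]

doc = ("the kv cache stores keys and values for every generated token "
       "so decode steps avoid recomputing attention over the prefix")
chunks = chunk(doc)
top = retrieve("kv cache decode", chunks)
print(top)
```

The retrieval latency of this stage sits on the critical path before prefill, which is why chunking granularity interacts directly with KV precomputation and TTFT.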

Why join us:
  • Collaborate with world‑class scientists and engineers in an open, curiosity‑driven environment.

  • Access to state‑of‑the‑art technology and tools.

  • Opportunities for professional growth and development.

  • Competitive salary and a high quality of life in Zurich, at the heart of Europe.

  • Last but certainly not least: be part of innovative projects that make a difference.

Position Requirements
10+ years of work experience