Systems Research Engineer
Edinburgh, Scotland, UK (EH1)
Listed on 2026-01-14
Listing for: Huawei Technologies Research & Development (UK) Ltd
Full Time position
Job specializations:
- IT/Tech: AI Engineer, Systems Engineer, Machine Learning/ML Engineer
Job Description
Job Vision
In an era where LLMs are rebuilding the foundational software stack, Huawei’s CloudMatrix super-node clusters and AI-native infrastructure are reshaping how large-scale models are trained, served, and deployed. The Edinburgh Research Centre plays a key role in this transformation, driving new AI infrastructure and agentic serving architectures and helping define Huawei’s next-generation large‑scale data centre and AI infrastructure systems. Positioned at the intersection of advanced systems research and industrial‑scale engineering, our team turns innovative system designs into deployable, real‑world technologies.
Key Responsibilities
- Distributed Systems Research & Development: Architect, implement, and evaluate distributed system components for emerging AI and data‑centric workloads. Drive modular design and scalability across CPU, GPU, and NPU clusters, building highly efficient serving and scheduling systems.
- Performance Optimization & Profiling: Conduct in‑depth profiling and performance tuning of large‑scale inference and data pipelines, focusing on KV cache management, heterogeneous memory scheduling, and high‑throughput inference serving using frameworks like vLLM, Ray Serve, and modern PyTorch distributed systems.
- Scalable Model Serving Infrastructure: Develop and evaluate frameworks that enable efficient multi‑tenant, low‑latency, and fault‑tolerant AI serving across distributed environments. Research and prototype new techniques for cache sharing, data locality, and resource orchestration and scheduling within AI clusters.
- Research & Publications: Translate innovative research ideas into publishable contributions at leading venues (e.g., OSDI, NSDI, EuroSys, SoCC, MLSys, NeurIPS, ICML, ICLR) while driving internal adoption of novel methods and architectures.
- Cross‑Team Collaboration: Communicate technical insights, research progress, and evaluation outcomes effectively to multidisciplinary stakeholders and global Huawei research teams.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related field.
- Strong knowledge of distributed systems, operating systems, machine learning systems architecture, inference serving, and AI infrastructure.
- Hands‑on experience with LLM serving frameworks (e.g., vLLM, Ray Serve, TensorRT‑LLM, TGI) and distributed KV cache optimization.
- Proficiency in C/C++, with additional experience in Python for research prototyping.
- Solid grounding in systems research methodology, distributed algorithms, and profiling tools.
- Team‑oriented mindset with effective technical communication skills.
Preferred Qualifications
- PhD in systems, distributed computing, or large‑scale AI infrastructure.
- Publications in top‑tier systems or ML conferences (NSDI, OSDI, EuroSys, SoCC, MLSys, NeurIPS, ICML, ICLR).
- Understanding of load balancing, state management, fault tolerance, and resource scheduling in large‑scale AI inference clusters.
- Prior experience designing, deploying, and profiling high‑performance cloud or AI infrastructure systems.