Member of Technical Staff — Kernel/Compiler/Communication
Listed on 2026-02-28
About the Role
Radix Ark is seeking a
Member of Technical Staff — Kernel / Compiler / Communication
to push the limits of performance for frontier AI systems.
You will work at the lowest layers of the stack — kernels, runtimes, compilers, and communication libraries — to unlock maximum efficiency from modern accelerators and interconnects.
This role is critical to scaling training and inference across thousands of GPUs, where microseconds and memory bandwidth matter. Your work will directly shape the performance envelope of next-generation AI systems.
This is a deeply technical role for engineers who enjoy working close to hardware and solving performance problems that most engineers never encounter.
Requirements
5+ years of experience in systems, compiler, or performance engineering
Strong expertise in CUDA or accelerator programming
Deep understanding of GPU architecture and memory hierarchy
Experience writing or optimizing high-performance kernels
Strong background in compilers, runtimes, or code generation
Experience with distributed communication libraries (NCCL, MPI, RCCL, etc.)
Solid knowledge of networking and interconnect technologies
Proficiency in C++ and Python
Strong debugging and profiling skills at the system level
Strong Plus
Experience with Triton, TVM, XLA, or MLIR
Experience building compiler passes or IR transformations
Familiarity with NVLink, InfiniBand, or RDMA
Experience optimizing collective communication at scale
Background in HPC or performance‑critical systems
Open-source contributions to kernel, compiler, or ML systems projects
Experience scaling workloads to 1000+ GPUs
Experience with mixed‑precision or quantized kernels
Responsibilities
Design and implement high-performance kernels for AI workloads
Optimize compiler and runtime stacks for ML systems
Improve communication efficiency across large GPU clusters
Reduce latency and increase throughput for distributed workloads
Profile and eliminate system bottlenecks across the stack
Collaborate with training and inference teams on performance optimization
Develop tooling for profiling and performance analysis
Contribute to long‑term architecture for performance‑critical systems
Push the limits of hardware–software co‑design
About Radix Ark
Radix Ark is an infrastructure-first AI company built by engineers who have shipped production AI systems, created SGLang (20K+ GitHub stars, the fastest open LLM serving engine), and developed Miles, our large-scale RL framework.
We build world‑class systems for AI training and inference and collaborate with frontier AI labs and cloud providers.
Our team has optimized kernels serving billions of tokens daily and designed distributed systems coordinating 10,000+ GPUs.
Join us to build the performance foundation of next‑generation AI.
Compensation
We offer competitive compensation with meaningful equity, comprehensive benefits, and flexible work arrangements. Compensation depends on location, experience, and level.
Radix Ark is an Equal Opportunity Employer and welcomes candidates from all backgrounds.