Member of Technical Staff — Kernel/Compiler/Communication
Listed on 2026-02-28
About the Role
Radix Ark is seeking a
Member of Technical Staff — Kernel / Compiler / Communication
to push the limits of performance for frontier AI systems.
You will work at the lowest layers of the stack — kernels, runtimes, compilers, and communication libraries — to unlock maximum efficiency from modern accelerators and interconnects.
This role is critical to scaling training and inference across thousands of GPUs, where microseconds and memory bandwidth matter. Your work will directly shape the performance envelope of next-generation AI systems.
This is a deeply technical role for engineers who enjoy working close to hardware and solving performance problems that most engineers never encounter.
Requirements
5+ years of experience in systems, compiler, or performance engineering
Strong expertise in CUDA or accelerator programming
Deep understanding of GPU architecture and memory hierarchy
Experience writing or optimizing high-performance kernels
Strong background in compilers, runtimes, or code generation
Experience with distributed communication libraries (NCCL, MPI, RCCL, etc.)
Solid knowledge of networking and interconnect technologies
Proficiency in C++ and Python
Strong debugging and profiling skills at the system level
Strong Plus
Experience with Triton, TVM, XLA, or MLIR
Experience building compiler passes or IR transformations
Familiarity with NVLink, InfiniBand, or RDMA
Experience optimizing collective communication at scale
Background in HPC or performance‑critical systems
Open-source contributions to kernel, compiler, or ML systems projects
Experience scaling workloads to 1000+ GPUs
Experience with mixed‑precision or quantized kernels
Responsibilities
Design and implement high-performance kernels for AI workloads
Optimize compiler and runtime stacks for ML systems
Improve communication efficiency across large GPU clusters
Reduce latency and increase throughput for distributed workloads
Profile and eliminate system bottlenecks across the stack
Collaborate with training and inference teams on performance optimization
Develop tooling for profiling and performance analysis
Contribute to long‑term architecture for performance‑critical systems
Push the limits of hardware–software co‑design
About Radix Ark
Radix Ark is an infrastructure-first AI company built by engineers who have shipped production AI systems, created SGLang (20K+ GitHub stars, the fastest open LLM serving engine), and developed Miles, our large-scale RL framework.
We build world‑class systems for AI training and inference and collaborate with frontier AI labs and cloud providers.
Our team has optimized kernels serving billions of tokens daily and designed distributed systems coordinating 10,000+ GPUs.
Join us to build the performance foundation of next‑generation AI.
Compensation
We offer competitive compensation with meaningful equity, comprehensive benefits, and flexible work arrangements. Compensation depends on location, experience, and level.
Radix Ark is an Equal Opportunity Employer and welcomes candidates from all backgrounds.