GPU Software Engineer
Job in Milpitas, Santa Clara County, California, 95035, USA
Listed on 2026-02-28
Listing for: KLA-Belgium
Full Time position
Job specializations:
- Software Development
- Software Engineer, AI Engineer
Job Description & How to Apply Below
KLA is a global leader in diversified electronics for the semiconductor manufacturing ecosystem. Virtually every electronic device in the world is produced using our technologies. No laptop, smartphone, wearable device, voice-controlled gadget, flexible screen, VR device or smart car would have made it into your hands without us. KLA invents systems and solutions for the manufacturing of wafers and reticles, integrated circuits, packaging, printed circuit boards and flat panel displays.
The innovative ideas and devices that are advancing humanity all begin with inspiration, research and development. KLA places a greater-than-average focus on innovation, investing 15% of sales back into R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work together with the world’s leading technology providers to accelerate the delivery of tomorrow’s electronic devices. Life here is exciting, and our teams thrive on tackling really hard problems.
There is never a dull moment with us.
Group/Division
Enabling the movement toward advanced chip design, KLA's Measurement, Analytics and Control group (MACH) is looking for the best and brightest research scientists, software engineers, application development engineers and senior product technology process engineers to join our team. The MACH team's mission is to collaborate with our customers to innovate technologies and solutions that detect and control highly complex process variations at their source, rather than compensating for them at later stages of the manufacturing process.
With over 40 years of semiconductor process control experience, KLA is the company chip makers around the globe rely on to ensure that their fabs ramp next-generation devices to volume production quickly and cost-effectively. Our MACH team develops leading-edge solutions for patterning process analytics and control technologies, providing customers with critical insight at the feature, field and cross-wafer levels. Our teams also develop advanced modeling and simulation, data analytics and process control modeling technologies.
As a member of the MACH team, you’ll join the most sophisticated and successful process-control company in the semiconductor industry, working across functions to solve the most complex technical problems of the digital age.
Job Description/Preferred Qualifications
- Implement and optimize CUDA kernels for image operations: convolution/filters, morphological ops, warping/resampling, color space conversions (RGB/YUV/HSV), denoising/deblurring, HDR/tone mapping, polygon manipulation, feature extraction, and classification.
- Use GPU memory hierarchies effectively (global/shared/constant/texture), coalesce memory, apply shared memory tiling, and minimize divergence/branching.
- Profile and tune with Nsight Compute/Systems, CUDA-MEMCHECK, and cuda-gdb; instrument pipelines with metrics (FPS, latency, bandwidth, occupancy).
- Collaborate with product and algorithm teams; contribute to CI/CD (Azure DevOps, CMake, GitHub Actions/GitLab CI) and documentation.
- Integrate accelerated primitives (NVIDIA NPP, cuFFT, cuBLAS) and OpenCV CUDA modules; build clean C++ APIs with Python bindings (pybind11) when needed.
- Implement a distributed multi-process architecture using CUDA MPS (Multi-Process Service) for high-throughput, concurrent workloads. You’ll own performance-critical pipelines, profile on NVIDIA GPUs, and ship production-quality C++ that meets strict latency and throughput targets.
Minimum Qualifications
Master's Level Degree and 2+ years of related work experience; or Bachelor's Level Degree and 4+ years of related work experience.
- BS/MS in CS, EE, or a related field.
- 3–6 years of professional experience, including 2+ years focused on CUDA-based image processing.
- Strong C++17/20 fundamentals; solid understanding of parallel algorithms and data layouts (pitch-linear, planar, interleaved).
- Practical experience with Nsight profiling, occupancy analysis, and kernel optimization (tiling, warp-level intrinsics, streams).
- Experience with OpenCV (including CUDA paths), and at least one of: NPP, cuFFT,…