AI Inference Engineer - Speech
Listed on 2026-01-16
Engineering · AI Engineer · IT/Tech · AI Engineer, Machine Learning / ML Engineer
What you can expect
We are looking for an AI Inference Engineer with a solid background in speech recognition and model inference. In this role, you will develop state-of-the-art automatic speech recognition systems and ship them to various Zoom products. You will work on the most cutting-edge speech modeling and inference technologies with world-class speech scientists. This role will include collaboration with cross-functional teams, including product, science, engineering, and infrastructure teams, to deliver high-impact projects from the ground up.
Zoom’s AI Speech Team is developing speech recognition technologies to improve Zoom’s conversational AI experience. This work impacts various products, such as Zoom AI Companion, Zoom Meetings, Workplace, Zoom Contact Center, Zoom Phone, Zoom Revenue Accelerator, etc. Our team’s mission is to equip the powerful AI brain with human‑level listening and understanding for voice input.
As an AI Inference Engineer, you will develop novel speech model inference solutions on modern AI inference hardware, such as GPUs, TPUs, and AI-specific chips. Our goal is to deliver the most unique AI-powered collaboration platform to users across the globe.
Responsibilities
- Develop state-of-the-art speech services for Zoom products. Devise novel techniques where off-the-shelf solutions are not available.
- Optimize ASR inference systems for production deployment, including inference latency, throughput, memory footprint, and resource utilization.
- Optimize model inference performance by diving deep into the lower stack of inference frameworks, focusing on hardware‑specific optimizations for Nvidia GPUs.
- Propose new model structures by jointly optimizing model accuracy and inference speed.
- Design and develop ASR systems with low latency and high accuracy, while ensuring scalability of GPU infrastructure and improving throughput of ASR service.
- Profile and debug ASR runtime performance bottlenecks across different deployment hardware and environments (a minimal profiling sketch follows this list).
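As a rough illustration of the profiling work described above, the sketch below times a single GPU inference step with PyTorch's built-in profiler and ranks operators by GPU time. The model, input shapes, and batch size are hypothetical stand-ins, not anything specific to Zoom's systems.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Stand-in for an ASR acoustic encoder; layers and shapes are illustrative only.
model = torch.nn.Sequential(
    torch.nn.Conv1d(80, 256, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv1d(256, 256, kernel_size=3, padding=1),
).cuda().eval()

features = torch.randn(16, 80, 400, device="cuda")  # (batch, mel bins, frames)

with torch.no_grad():
    # Warm up so one-time initialization does not dominate the trace.
    for _ in range(3):
        model(features)

    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
                 record_shapes=True) as prof:
        model(features)

# Rank operators by GPU time to see where the latency budget actually goes.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```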
What we're looking for
- Possess a Master's in Computer Science, Electrical Engineering, or a related field, with 3+ years of experience in speech recognition, speech-LLMs, or AI model inference.
- Display knowledge of deep learning and hands-on programming skills in Python, shell scripting, and C/C++; familiarity with ML frameworks such as PyTorch and TensorFlow.
- Demonstrate deep understanding of transformer encoder-decoder frameworks for speech recognition, including attention mechanisms, beam search, and sequence-to-sequence modeling for end-to-end ASR systems.
- Understand recent advancements in speech foundation models and speech‑LLMs that integrate acoustic and linguistic representations, enabling unified modeling for speech understanding and transcription tasks.
- Have experience optimizing deep learning model inference on NVIDIA GPUs, including profiling and accelerating AI models using CUDA, TensorRT, and mixed-precision computation to achieve low-latency, high-throughput performance.
- Have experience developing and tuning custom CUDA kernels, leveraging CUDA Graphs for efficient execution scheduling, and minimizing kernel launch overhead to maximize GPU utilization (see the sketch after this list).
- Be proficient in end-to-end performance analysis, memory optimization, and deployment of large-scale ML models on GPU clusters, with experience in stream management, asynchronous execution, and integrating frameworks such as PyTorch and TensorFlow for real-time inference.
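As a rough illustration of the CUDA Graphs and mixed-precision techniques named above, the sketch below captures a fixed-shape forward pass into a CUDA Graph in PyTorch so it can be replayed with minimal kernel launch overhead. The model, shapes, and helper names (build_graphed_step, run) are hypothetical, chosen only for the example.

```python
import torch

@torch.no_grad()
def build_graphed_step(model, example_input, warmup_iters=3):
    """Capture a fixed-shape forward pass into a CUDA Graph using static buffers."""
    static_input = example_input.clone()

    # Warm up on a side stream so one-time autotuning happens before capture.
    side_stream = torch.cuda.Stream()
    side_stream.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side_stream):
        for _ in range(warmup_iters):
            with torch.autocast("cuda", dtype=torch.float16):
                model(static_input)
    torch.cuda.current_stream().wait_stream(side_stream)

    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        with torch.autocast("cuda", dtype=torch.float16):
            static_output = model(static_input)

    def run(new_input):
        # Replay the captured kernels on new data by copying into the static buffer.
        static_input.copy_(new_input)
        graph.replay()
        return static_output.clone()

    return run


if __name__ == "__main__":
    # Stand-in for an acoustic encoder; sizes are illustrative only.
    model = torch.nn.Sequential(
        torch.nn.Linear(80, 512), torch.nn.ReLU(), torch.nn.Linear(512, 512)
    ).cuda().eval()

    example = torch.randn(8, 200, 80, device="cuda")  # (batch, frames, features)
    graphed_forward = build_graphed_step(model, example)
    print(graphed_forward(torch.randn(8, 200, 80, device="cuda")).shape)
```

Because a captured graph replays exactly the recorded kernels, this pattern assumes fixed input shapes; variable-length audio would need bucketing or padding to a small set of captured shapes.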
Minimum: $
Maximum: $
In addition to the base salary and/or OTE listed, Zoom has a Total Direct Compensation philosophy that takes into consideration base salary, bonus and equity value.
Note:
Starting pay will be based on a number of factors and commensurate with qualifications & experience.
We also have a location-based compensation structure; there may be a different range for candidates in other locations.
At Zoom, we offer a window of at least 5 days for you to apply because we believe in giving you every opportunity. Below is the potential closing date, just in case you want to mark it on your calendar. We look forward to receiving your…