Tech Lead, AML Orchestration San Jose Regular

Job in San Jose, Santa Clara County, California, 95199, USA
Listing for: ByteDance
Full Time position
Listed on 2025-12-03
Job specializations:
  • IT/Tech
    Cloud Computing, Machine Learning/ML Engineer
Salary/Wage Range or Industry Benchmark: USD 257,400 per year

About the Team

The Applied Machine Learning (AML) team builds the next-generation machine learning algorithms and platforms that power ByteDance’s recommendation systems, ads ranking, and search ranking. We drive significant impact on ByteDance’s core businesses, focusing on scalable infrastructure, efficient orchestration, and world-class ML systems.

Role Overview

We are seeking a Tech Lead, AML Orchestration to own and advance ByteDance’s distributed orchestration platforms. This leader will oversee a team of Machine Learning Engineers specializing in orchestration and scheduling, guiding the technical strategy for resource efficiency, distributed training, and online inference systems. The role requires deep expertise in large-scale distributed systems, orchestration frameworks, and cross-team collaboration.

Responsibilities
  • Lead, mentor, and grow a team of orchestration-focused ML engineers; set technical vision and ensure engineering excellence.
  • Design and optimize distributed orchestration and scheduling strategies across large-scale Kubernetes/Godel environments, ensuring efficiency, reliability, and scalability.
  • Drive initiatives for autoscaling, resource multiplexing, and preemption across heterogeneous workloads and clusters, including multi-datacenter and multi-cloud setups.
  • Partner with framework, platform and research teams to build next-generation distributed training and serving systems for ultra-large, high-dimensional recommendation models.
  • Architect robust and elastic online orchestration frameworks for large-scale inference, supporting evolving recommendation and ads models.
  • Stay ahead of trends in orchestration, scheduling, and distributed computing, incorporating best practices and emerging technologies.
Qualifications
Minimum Qualifications:
  • Bachelor’s degree or higher in Computer Science, Engineering, or a related field.
  • 5+ years of experience in large-scale distributed systems, with at least 5 years in a technical leadership role.
  • Proficiency in one or more modern programming languages (Golang, Python, C++, or similar).
  • Deep understanding of orchestration frameworks (e.g., Kubernetes, Yarn) and distributed systems design principles.
  • Proven experience optimizing system performance, resource utilization, and scheduling strategies.
  • Strong analytical thinking, problem-solving, and communication skills.
Preferred Qualifications:
  • Experience with orchestration or ML frameworks such as Ray, TFX, VeRL, vLLM, or equivalent.
  • Familiarity with distributed computing systems (Spark, Flink) and ML pipelines.
  • Contributions to open-source scheduling or ML infrastructure projects.
  • Hands-on experience with multi-tenant environments and cloud-native architectures.
  • Experience collaborating with and leading global, cross-functional teams across different time zones.
Job Information

The base salary range for this position in the selected city is $257,400 - $616,000 annually.

Compensation

Compensation may vary outside of this range depending on a number of factors, including a candidate’s qualifications, skills, competencies and experience, and location. Base pay is one part of the Total Package that is provided to compensate and recognize employees for their work, and this role may be eligible for additional discretionary bonuses/incentives, and restricted stock units.

Benefits

Benefits may vary depending on the nature of employment and the country work location. Employees have day one access to medical, dental, and vision insurance, a 401(k) savings plan with company match, paid parental leave, short-term and long-term disability coverage, life insurance, wellbeing benefits, among others. Employees also receive 10 paid holidays per year, 10 paid sick days per year and 17 days of Paid Personal Time (prorated upon hire with increasing accruals by tenure).

About Us

Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Lemon8, CapCut and Pico as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.

Why…