Sr. AI Network Architect, Infrastructure
Listed on 2026-02-28
IT/Tech
Systems Engineer, Cloud Computing, AI Engineer, Network Engineer
Location: Seattle
Team: Technology
Employment Type: Regular
Job Code: A34003A
Responsibilities
- Lead requirement analysis and capacity planning for AI training and inference networks, with a strong understanding of workload characteristics and scalability needs.
- Analyze industry trends and emerging technologies in AI networking, and translate them into practical architectural decisions.
- Architect end‑to‑end AI network solutions optimized for performance, scalability, reliability, and cost.
- Own network architecture validation, benchmarking, and performance tuning for large‑scale AI clusters.
- Partner closely with infrastructure, platform, and operations teams to ensure smooth deployment and production rollout of new network architectures.
- Provide technical leadership and mentorship to engineers, and contribute to architecture reviews and long‑term infrastructure roadmaps.
Minimum Qualifications
- Bachelor's degree or higher in computer science, electronic engineering, network engineering, or a related field.
- Deep expertise in high‑performance networking technologies, including RDMA, PFC, ECN, DLB, GLB, QoS, and AR, with hands‑on experience optimizing large‑scale systems.
- Strong foundation in core networking protocols and architectures such as TCP/IP, BGP, SRv6, VXLAN, and ACLs.
- Solid understanding of AI training and inference workloads, including distributed training patterns and performance bottlenecks.
- Strong knowledge of collective communication principles and their impact on large‑scale GPU/accelerator clusters.
- Hands‑on experience with performance testing and benchmarking tools such as perftest and NCCL tests, and the ability to interpret results to drive architectural decisions.
- Proven experience designing and deploying AI‑focused network architectures in production environments.
Preferred Qualifications
- Experience operating and scaling large AI or HPC clusters in production.
- Experience working with GPU/accelerator fabrics and data center‑scale networking.
- Strong cross‑functional collaboration skills and the ability to influence technical direction across teams.
- The base salary range for this position in the selected city is $198,360 - $416,100 annually.
- Compensation may vary outside of this range depending on a number of factors, including a candidate’s qualifications, skills, competencies and experience, and location. Base pay is one part of the Total Package that is provided to compensate and recognize employees for their work, and this role may be eligible for additional discretionary bonuses/incentives, and restricted stock units.
- Benefits may vary depending on the nature of employment and the country work location. Employees have day one access to medical, dental, and vision insurance, a 401(k) savings plan with company match, paid parental leave, short-term and long-term disability coverage, life insurance, wellbeing benefits, among others. Employees also receive 10 paid holidays per year, 10 paid sick days per year and 17 days of Paid Personal Time (prorated upon hire with increasing accruals by tenure).
- The Company reserves the right to modify or change these benefits programs at any time, with or without notice.
Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state, and local laws including the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act. Our company believes that criminal history may have a direct, adverse and negative relationship on the following job duties, potentially resulting in the withdrawal of the conditional offer of employment:
Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Lemon8,…