LLM AIOps Development Engineer - Data Center Networking
Listed on 2026-02-28
IT/Tech
Data Engineer, Data Scientist, Systems Engineer
Responsibilities
- As a core member of the team, collaborate closely with NetOps, SRE, and platform engineering to tackle the complexities of one of the world’s largest data center networks.
- Design and implement a closed-loop AIOps platform for the network, covering:
- Build a Panoramic Network Observability Platform:
Develop a streaming telemetry data pipeline for both physical and virtual networks, integrating multi-source data from gNMI, NETCONF, IPFIX/NetFlow, and SNMP to provide a high-quality, real-time data foundation for AIOps.
- Develop an Intelligent Diagnostics and Root Cause Analysis System:
Apply machine learning and deep learning algorithms to perform anomaly detection, correlation analysis, and intelligent noise reduction on network metrics, logs, and events. Pinpoint root causes across the stack, from optical transceivers and switch hardware to protocol adjacencies and application traffic.
- Explore Innovative Applications of LLMs and Agents:
Examples include an Intelligent Operations Assistant (a conversational chatbot with Retrieval-Augmented Generation that queries knowledge bases and monitoring data to provide troubleshooting guidance and network status reports) and Automated Remediation/Smart Runbooks (operational Agents that can invoke network change tools and generate, recommend, or execute remediation plans).
- Establish Capacity and Risk Prediction Capabilities:
Forecast network capacity bottlenecks, high-risk links, and sub-healthy devices based on historical data and growth models for proactive scaling and preventative maintenance.
- Forge a Rock-Solid Engineering System:
Adhere to engineering best practices to design and develop a highly available and scalable AIOps platform, ensuring stability and performance from data collection and model training to online inference and closed-loop actions.
Minimum Qualifications
- Solid fundamentals in Computer Science and Networking, with a deep understanding of data center network architectures (e.g., Spine-Leaf Fabric) and key protocols (EVPN/VXLAN, BGP/OSPF). In-depth knowledge of the Linux network stack is essential.
- Excellent Software Engineering Skills:
Proficiency in Golang or Python with strong coding and system design abilities; familiarity with microservices, containerization (Docker/Kubernetes), and CI/CD.
- Rich Platform Development Experience:
Practical experience in one or more areas such as real-time data pipelines and analytics systems, big data processing (Kafka, Flink, ClickHouse/TSDB), and observability technologies (Prometheus/OpenTelemetry, graph databases like Neo4j).
- AIOps/ML/LLM Practices:
Interest or hands-on experience in Large Models and Agent technologies (e.g., RAG, tool use, safety evaluation) and applying them to operations.
Preferred Qualifications
- Experience operating or developing for hyperscale (100,000+ servers) data center networks.
- Leadership or significant contributions to LLM/Agent-based intelligent operations projects with measurable impact.
- Active contributions to open-source communities (e.g., SONiC, P4/PINS, eBPF, Prometheus, OpenTelemetry).
- Experience in high-performance networking (RDMA/RoCE), SmartNICs, and DPDK/eBPF.
- Experience building network configuration/control systems (e.g., SONiC, gNMI, NETCONF).
Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of products including TikTok, Lemon8, CapCut, Pico, and platforms for China such as Toutiao, Douyin, and Xigua, ByteDance connects people to content.
Why Join ByteDance
ByteDance strives to inspire creativity and enrich life through diverse teams, curiosity, humility, and impact. We emphasize an "Always Day 1" mindset to achieve meaningful breakthroughs for our people, our company, and our users.
Diversity & Inclusion
We are committed to an inclusive space where employees are valued for their skills and perspectives. Our platform and workplace reflect the communities we reach.
Reasonable Accommodation
ByteDance provides reasonable accommodations in our recruitment processes for candidates with disabilities or other protected reasons. If you need assistance, please reach out to us at
Job Information
For Pay Transparency:
The base salary range in the selected city is $177,688 - $341,734 annually. Compensation may vary based on qualifications, skills, competencies, experience, and location. Base pay is part of the Total Package, which may include discretionary bonuses, incentives, and stock units. Benefits include medical/dental/vision insurance, 401(k) with company match, parental leave, disability coverage, life insurance, wellbeing benefits, and paid time off.
The Company reserves the right to modify benefits at any time.