
Remote Hyperscale Datacenter Network Lead

Remote / Online - Candidates ideally in
New York, New York County, New York, 10261, USA
Listing for: Anistar Technologies
Remote/Work from Home position
Listed on 2026-01-14
Job specializations:
  • IT/Tech
    Systems Engineer, Network Engineer, Cloud Computing, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: 200000 - 300000 USD Yearly
Job Description & How to Apply Below
Location: New York

Job Title:
Lead Hyperscale Datacenter Network Engineer

Location:
Remote

The base salary range for this position is $200,000 – $300,000, depending on experience.
Must have hyperscale experience.

Company Summary:
We are seeking a Lead Network Engineer to lead our Network Operations & Reliability pillar. In this role you will lead the Operations & Reliability team, building our network operations function from the ground up while staying hands-on with incident response, reliability engineering, and operational tooling. We are looking for someone who is hungry for the autonomy to build a team and the processes that keep our AI datacenter fabrics running with exceptional reliability at scale.

Focus

  • Operations Architecture:

    Define and build the operational model for network reliability. Establish incident response workflows, escalation procedures, runbook frameworks, and operational handoff criteria. Design the systems and processes that enable 24/7 operations across distributed datacenter regions.
  • Incident Response & Reliability:

    Own Tier 2+ incident management for network infrastructure. Lead response to critical incidents, perform root cause analysis, drive permanent fixes, and build the reliability engineering practices that prevent recurrence. Partner with NOC on Tier 1 triage and escalation workflows.
  • Observability & Monitoring:

    Build comprehensive observability for network infrastructure including monitoring stack integration, alerting frameworks, telemetry collection, and performance analytics. Ensure operators have visibility into fabric health, traffic patterns, and failure conditions across all network layers.
  • Runbook Development:

    Author and maintain operational runbooks for common failure scenarios, maintenance procedures, and troubleshooting workflows. Build the knowledge base that enables NOC (Tier 1) and regional operations engineers to respond effectively to incidents.
  • Automation & Tooling:

    Drive operational automation initiatives including auto-remediation, failure classification, and runtime tooling. Partner with Network Automation Engineers on design-time automation while owning runtime operational tooling that improves MTTR and operational efficiency.
  • Cross-Functional Partnership:

    Collaborate with Deployment teams on production handover criteria, Engineering Core on design feedback from operational experience, Hardware teams on break-fix coordination, and NOC on escalation procedures. Build strong relationships that enable seamless coordination during incidents.

About You

  • Proven Operations Leadership:

    7+ years in network engineering with significant focus on network operations, reliability engineering, or NOC/SOC leadership. You've built operational processes from scratch or significantly scaled existing operations. You understand what it takes to maintain high availability at scale.
  • Deep Technical Operations Expertise:

    Strong hands-on experience operating large-scale datacenter networks including EVPN/VXLAN, BGP, Clos architectures, and high-radix switching. You've responded to production incidents, debugged complex network failures, and driven root cause analysis to permanent fixes.
  • Reliability Engineering Mindset:

    You think in terms of MTTR, MTTD, and failure domains. You've built monitoring and alerting systems, developed runbooks, and implemented automation that improves operational efficiency. You understand the balance between manual intervention and automated remediation.
  • Incident Command Experience:

    You've led response to critical incidents involving multiple teams and stakeholders. You remain calm under pressure, communicate clearly during outages, and drive incidents to resolution while coordinating complex troubleshooting across teams.

Nice to Haves

  • AI/HPC Fabric Operations:

    Experience operating AI/ML or HPC fabrics with RDMA (RoCEv2), lossless Ethernet (PFC, ECN), or high-performance networking. You understand the operational precision required when network performance directly impacts workload completion.
  • Hyperscale Operations Background:

    Experience in network operations at hyperscale companies (Meta, Google, Microsoft, AWS) or large cloud providers. You've seen mature operational practices at scale and can adapt those lessons to a fast-growing startup.
  • NOC/SOC Leadership:

    Experience building or leading Network Operations Centers, including Tier 1/Tier 2/Tier 3 escalation models, shift scheduling, and on-call rotation management. You understand how to structure operations teams for 24/7 coverage.
  • Observability Stack Expertise:

    Deep familiarity with network monitoring and observability platforms (Prometheus, Grafana, ELK, Datadog, or similar). Experience designing telemetry collection, building dashboards, and tuning alerting to reduce noise.
  • Automation & Scripting:

    Comfortable with scripting languages (Python, Go) and automation frameworks (Ansible, Terraform). You can build operational tooling yourself or partner effectively with automation engineers to deliver runtime automation.
  • SRE Principles:

    Exposure to Site Reliability…