Manager, Software Engineering, Compute Infrastructure

• Full-time

• Workplace Type:
Hybrid

LinkedIn is the world’s largest professional network, built to create economic opportunity for every member of the global workforce. Our products help people make powerful connections, discover exciting opportunities, build necessary skills, and gain valuable insights every day. We’re also committed to providing transformational opportunities for our own employees by investing in their growth. We aspire to create a culture that’s built on trust, care, inclusion, and fun – where everyone can succeed.

Join us to transform the way the world works.

We are the Host Health and Remediation team within Compute Infrastructure, focused on advancing the reliability and operability of LinkedIn’s compute infrastructure. Our mission is to provide a unified, reliable, and transparent host health signal and to remediate unhealthy hosts across LinkedIn’s entire server fleet.

The team offers the opportunity to tackle large-scale, technically challenging problems, work on cutting-edge infrastructure systems, and directly contribute to LinkedIn’s fleet reliability through automation, observability, and data-driven insights. The impact of this work is felt across the entire company.

This role will be based in Mountain View, Bellevue, or San Francisco.

At LinkedIn, our approach to flexible work is centered on trust and optimized for culture, connection, clarity, and the evolving needs of our business. The work location of this role is hybrid, meaning it will be performed both from home and from a LinkedIn office on select days, as determined by the business needs of the team.

Join LinkedIn’s Host Health and Remediation team to shape the next generation of our compute infrastructure! In this role, you’ll lead efforts on large-scale systems, automating host health monitoring and remediation to boost reliability and minimize downtime across the entire server fleet. This high-visibility, company-wide initiative offers the opportunity to design scalable solutions, influence key infrastructure decisions, and collaborate across multiple teams.

This manager will play a pivotal role in ensuring the reliability, observability, and operability of LinkedIn’s entire server fleet. By leading the Host Health and Remediation team, this role directly impacts the availability and performance of LinkedIn’s compute infrastructure, enabling all services and products across the company to operate reliably. This position combines technical leadership, strategic planning, and team development, making it critical for advancing LinkedIn’s long-term infrastructure goals.

If you have experience managing teams that build large-scale compute infrastructure systems and want to make a lasting impact on LinkedIn’s future, we’d love to connect!

Responsibilities

• Talent Management:
Recruit, coach, mentor, and grow high-performing engineers; foster a culture of accountability, innovation, and continuous learning.

• Technical Leadership:
Guide architectural decisions, review designs, and ensure scalable, reliable infrastructure solutions.

• Strategic Planning:
Define and drive the technical roadmap for host health monitoring and remediation; prioritize initiatives with maximum business impact.

• Cross-Team Collaboration:
Partner with other infrastructure, operations, and platform teams to define best practices, share insights, and influence company-wide reliability improvements.

• Operational Excellence:
Ensure systems are observable, automated, and proactively maintained to minimize downtime and maximize fleet health.

Basic Qualifications

• BA/BS Degree in Computer Science or related technical discipline, or equivalent practical experience.

• 1+ year(s) of management experience or 1+ year(s) of staff-level engineering experience with management training.

• 5+ years of industry experience in software design, development, and large-scale software engineering.

• Strong understanding of large-scale systems, reliability engineering, monitoring, and automation.

• Proven experience managing engineering teams in distributed systems or compute infrastructure.

• Experience programming…

