Datacentre Operations Engineer
Listed on 2026-01-12
IT/Tech
Systems Engineer, IT Support, Hardware Engineer, Cloud Computing
Company Overview:
Ori Industries is at the forefront of AI infrastructure, revolutionising the connection between software and hardware for the AI era. Our mission is to empower AI teams with scalable, secure, and efficient infrastructure solutions that support seamless model training, deployment, and scaling.
Job Summary: We’re looking for a qualified, experienced Data Centre/Hardware Engineer to run our multi-million dollar HPC infrastructure based in Dallas-Fort Worth, US. You’ll be well versed in managing and optimising data centres, dealing promptly with hardware failures, optimising environmental performance, and deploying new hardware and services 24x7x365. You’ll be hands-on with high-performing HPC compute and will operate with the utmost diligence, professionalism and focus to ensure the equipment underpinning our services operates at peak performance.
What You’ll Do:
- Troubleshooting and Support: Quickly diagnose and resolve hardware and network issues to maximise uptime.
- Respond to critical hardware alerts via our monitoring and observability platform.
- Contribute to ongoing service improvement to strengthen our monitoring capability.
- RMA and Support: Manage vendor relationships, handling RMAs and support requests within Ori’s Service Level Objectives (SLOs) to meet customer contract SLAs.
- Data Center Management: Guide data center acquisition, setup, and ongoing maintenance, fostering compliance and leveraging strong vendor partnerships.
- Fully own acquisition of hardware assets from the point of purchase and delivery, through lifecycle management and disposal, all while owning asset management within Ori’s CMDB system.
- Hardware Installation and Maintenance: Deploy and maintain HPC and AI hardware for uninterrupted operations, including performing low-level system maintenance such as hardware troubleshooting, firmware updates, and replacement of components as needed.
- Datacenter Environment Technologies: Oversee cooling, power distribution, and other critical data center technologies to maintain high operational standards.
- Capacity Planning and Resource Allocation: Support strategic planning to align infrastructure capabilities with current and projected demands.
- Develop and maintain data centre/hardware management SOPs, ensuring continual alignment with Ori’s governance and compliance requirements.
- Apply ITSM frameworks: Incident, Major Incident, Change Management, and service improvement.
- Operate and support services 24x7x365 for production environments, including on‑call rotation.
- Contribute to incident postmortems and root cause analysis, document learnings, and automate remediations
- Mentor junior engineers and act as an operational requirements consultant to other departments
- Communicate technical decisions clearly to non-technical stakeholders and customers
- Uphold a culture of: do, document, automate
- Willing to cross-train and upskill in Infrastructure/Platform SRE practices.
- Willing to travel across North America to support future data centre onboarding and deployments.
- Degree in Computer Science, or 10 years of industry experience.
- 3+ years of experience in data center operations, HPC, or related roles.
- Proven track record working with HPC Nvidia GPU or equivalent systems, high-performance storage, and networking.
- Expertise in hardware installation, network configuration, and low-level system maintenance, including hardware troubleshooting and firmware management.
- Knowledge of data center environment technologies, including cooling and power distribution.
- Experience in data center design, greenfield deployments, and operations.
- Strong understanding of hardware and spares management, with the ability to handle RMAs and support cases within defined SLOs to meet SLA requirements.
- Solid understanding of HPC and AI workloads.
- Strong problem-solving abilities and the resilience to thrive in a fast-paced environment.
- Excellent communication skills and ability to collaborate with cross-functional, internationally dispersed teams.
- Strong grasp of ITSM and service operation best practices
- Excellent communication and mentorship skills
- Comfortable interfacing with internal stakeholders and external…