AI Infrastructure Consultant Job, Riyadh area, Riyadh Region, Saudi Arabia, IT/Tech

Job Title: AI Infrastructure Consultant

Job Type: Permanent

Job Location: Riyadh, Saudi Arabia

Job Summary:

We are seeking a seasoned AI Infrastructure Consultant to lead the design, implementation, and optimization of our high-performance computing environment. This role is critical for bridging the gap between raw hardware capabilities (GPUs) and scalable AI/ML model deployment. You will be responsible for ensuring our infrastructure is robust, cost-effective, and capable of supporting complex machine learning workloads at scale.

Roles and Responsibilities:

Architecture & Design

Assess AI/ML workload requirements to design end-to-end compute, storage, and networking architectures.
Architect specialized GPU clusters (NVIDIA A100/H100 or similar) tailored for training and inference.
Define high-speed networking requirements (e.g., InfiniBand, RoCE) and low-latency storage solutions for massive datasets.

Containerization & Orchestration

Implement and manage Docker containerization for consistent model environments.
Deploy and scale AI workloads using Kubernetes (or managed services like EKS/GKE/AKS), ensuring high availability and seamless resource scheduling.

MLOps & CI/CD Integration

Build and maintain robust CI/CD pipelines specifically for AI models, automating the journey from code to production.
Integrate automated testing, versioning for models/data, and deployment strategies (Canary, Blue-Green).

Monitoring & Cost Optimization

Establish comprehensive monitoring frameworks to track infrastructure utilization and GPU health.
Analyze performance bottlenecks and implement strategies to optimize cost-performance, ensuring maximum ROI on expensive compute resources.

Required Qualifications & Skills:

Total Experience: 10+ years in IT Infrastructure, Systems Engineering, or DevOps.
AI Specialization: 2–3 years of hands‑on experience specifically in AI/ML infrastructure.
GPU Expertise: Proven track record in GPU setup, CUDA configurations, and managing hardware acceleration for deep learning.
Orchestration: Expert‑level knowledge of Kubernetes and the CNCF ecosystem.
Cloud & Hybrid: Proficiency in major cloud providers (AWS/Azure/GCP) and on‑premise data center environments.
Soft Skills: Strong consultancy mindset with the ability to translate complex technical requirements into actionable architectural roadmaps.
