
Lead Software Architect – AI Infrastructure & Cluster

Job in Singapore, Singapore
Listing for: CREW by HRNET
Full Time position
Listed on 2026-03-13
Job specializations:
  • IT/Tech
    Systems Engineer, Cybersecurity, IT Support
Salary/Wage Range or Industry Benchmark: SGD 12,000 monthly
Job Description & How to Apply Below
Position: Lead Software Architect – AI Infrastructure & Cluster Performance

Location: Singapore (travel-ready for regional AI cluster deployments)
Competitive Salary: From SGD 12,000
Work Hours: 9 AM – 6 PM, Monday – Friday

Location: Ubi Road (East)

Key Responsibilities

Leadership Responsibilities
  • Lead multi-disciplinary engineering teams in AI cluster performance and deployment projects.
  • Define technical roadmaps, standards, and best practices for large-scale AI infrastructure.
  • Mentor and upskill engineers in high-performance AI frameworks, cluster optimization, and security hardening.
  • Manage stakeholder communications, performance reporting, and deployment planning.
  • Drive decision-making for hardware-software trade-offs, contingencies, and multi-region deployments.
  • Foster a culture of reliability, proactive monitoring, and continuous performance improvement.
  • Collaborate with cross-functional teams to enforce governance, Zero-Trust access, and infrastructure hardening.
Technical Responsibilities

Cluster & Hardware Optimization:

  • Conduct cluster-level audits for software consistency across 3,456 GPUs and 48 racks.
  • Fine-tune BIOS, firmware, kernel, and network parameters (NVLink, InfiniBand, PCIe Gen5/6) for maximum throughput.
  • Validate collective communications using NVIDIA NCCL and SHARP for zero-bottleneck AI training.

AI Frameworks & Orchestration:

  • Deploy, integrate, and optimize PyTorch, JAX, Slurm, and Kubernetes with GPUDirect Storage.
  • Implement real-time telemetry for GPU/NPU health, power, and thermal metrics.

Security & Compliance:

  • Implement Hardware Root-of-Trust, Secure Boot, and Zero-Trust IAM policies.
  • Enforce inline encryption (AES-256-GCM) for AI fabrics and secure sensitive training data.
  • Conduct vulnerability scanning and penetration testing, and ensure compliance with ISO 27001, SOC 2, and NIST AI RMF.
  • Implement secrets management for API keys, SSH keys, and SSL certificates.

Networking & Performance Tuning:

  • Optimize ultra-low-latency networks (400G/800G fabrics) and eliminate congestion using SHARPv4 and RoCEv2.
  • Audit network paths, validate topology alignment with SDN, and monitor fabric performance proactively.
  • Configure and validate GPUDirect RDMA/Storage for direct GPU-to-storage data movement.

Send your full name, contact number, and resume to:
📱
📲 Email: athirah.rosli

Please include your availability, notice period and expected salary in your application.

* Only shortlisted candidates will be contacted.

CREW by HRNet | HRnet Ventures Pte Ltd (24C2435)

Athirah Bte Rosli (R2197227)
