Forward Deployed Engineers - Decentralized - Computing Leader
Listed on 2026-02-28
IT/Tech
Cloud Computing, AI Engineer, Systems Engineer
Location: New York
Senior Forward Deployed Engineer – AI Infrastructure & Systems
About The Role
As a Senior Forward Deployed Engineer, you’ll serve as the technical bridge between advanced AI infrastructure and the organizations that depend on it. This is a hands‑on, high‑impact role for engineers who thrive in dynamic, complex environments and are eager to help customers bring massive‑scale AI workloads to life. Your mission is to work side‑by‑side with customers to design, deploy, and optimize large‑scale GPU clusters and the infrastructure that powers today’s most demanding machine learning and AI systems.
You’ll blend deep technical expertise with strong communication skills to guide customers through every stage of their AI infrastructure journey—from first deployment to long‑term optimization.
- Design, deploy, and manage clusters exceeding 1,000 GPUs using custom-built automation playbooks and infrastructure‑as‑code tools.
- Diagnose and enhance the performance of compute, storage, and networking systems, collaborating closely with providers to deliver peak efficiency.
- Orchestrate large‑scale data migrations across cloud and on‑prem environments, handling petabytes of data with precision and speed.
- Troubleshoot complex issues across the stack—whether that’s debugging hardware anomalies or optimizing distributed data loaders across multi‑region buckets.
- Develop robust internal tooling to streamline deployments, strengthen reliability, and empower automation where it truly makes an impact.
- Provide direct technical support during customer operations and participate in a rotating on‑call schedule for critical environments.
- A customer‑first mindset and the ability to translate complex problems into practical, actionable solutions.
- Comfort navigating ambiguity and building order from chaos in fast‑moving, high‑stakes technical environments.
- A bias toward action—balancing hands‑on problem solving with a disciplined approach to automation and scalability.
- Clear, concise communication skills and a collaborative, low‑ego attitude that strengthens every interaction.
- 2+ years of experience in Software Engineering, Site Reliability Engineering, DevOps, Systems Administration, or High‑Performance Computing.
- Proficiency in deploying and managing Kubernetes and/or SLURM clusters.
- Hands‑on experience coding in Go, Python, and Bash.
- Strong familiarity with Ansible, Terraform, and other automation or Infrastructure‑as‑Code tools.
- Solid foundation in Computer Science, Engineering, or a related technical field.
- Exceptional verbal and written communication skills in English.
- Building and operating AI workloads at 1,000+ GPU scale.
- Developing and maintaining large‑scale, multi‑tenant Kubernetes‑based services.
- Deploying and managing datacenter hardware or bare‑metal environments via MAAS, NetBox, or equivalent tools.
- Managing InfiniBand or RoCE network deployments supporting multi‑tenant architectures.
- Designing and operating petabyte‑scale all‑flash or distributed storage systems (e.g., DDN, VAST, Weka, Ceph, or Lustre).
You’re a builder, a problem‑solver, and a trusted technical partner. You take pride in crafting reliable systems that scale elegantly under pressure. You’re as comfortable writing infrastructure code as you are explaining architectural tradeoffs to customers. Most importantly, you thrive in environments where deep technical curiosity meets real‑world impact.
About Andiamo
Talent Partners for the AI Revolution. As a globally recognized staffing and consulting firm, we specialize in placing the top 2% of technology and go‑to‑market professionals with the world’s largest and most well‑known companies. For over 20 years, we've maintained the status of tier‑one vendor for firms such as Palantir, Amazon, Fluidstack, Bloomberg, Relativity Space, Firefly, Mastercard, Visa, Two Sigma, Citadel, as well as other major financial services firms, elite hedge funds, Google‑backed tech start‑ups, and major software firms.
Our talent solutions include Permanent Placement, Contract Staffing, Executive Search, and Dedicated Recruiting Services (RPO). Find out more at