Gen AI Engineer
Listed on 2026-02-28
Software Development
AI Engineer, Machine Learning/ML Engineer, Data Engineer
100% Remote
Job Title: AI/ML Engineer – Large Language Model Pretraining (100B+ Parameters)
Log-line:
Gen AI Engineer creating and developing LLMs from "Test to Production".
Location - West Coast 100% Remote
As a Gen AI Engineer, you will lead the pre-training of massive LLMs (100B+ parameters), requiring deep expertise in distributed training, large-scale optimization, and model architecture. This is a rare opportunity to work with petabyte-scale datasets and cutting-edge compute clusters in a high-impact environment.
Key Responsibilities
- Architect and implement large-scale training pipelines for LLMs with 100B+ parameters.
- Optimize distributed training performance across thousands of GPUs/TPUs.
- Collaborate with research scientists to translate experimental results into production-grade training runs.
- Manage and preprocess petabyte-scale datasets for pretraining.
- Implement state-of-the-art techniques in scaling laws, model parallelism, and memory optimization.
- Conduct rigorous benchmarking, profiling, and performance tuning.
- Contribute to Client research in LLM architecture, training stability, and efficiency.
Qualifications
- Advanced degree (PhD or Master’s) in Computer Science, Machine Learning, or a related field from a top-20 global university in CS.
- 3+ years of hands‑on experience with large-scale deep learning model training.
- Proven experience in pretraining models exceeding 10B parameters, preferably 100B+.
- Deep expertise in distributed training frameworks (DeepSpeed, Megatron‑LM, PyTorch FSDP, Mesh TensorFlow, JAX/TPU).
- Proficiency with parallelism strategies (data, tensor, pipeline) and mixed precision training.
- Experience with large-scale cloud or HPC environments (AWS, Azure, GCP, Slurm, Kubernetes, Ray).
- Strong skills in Python, CUDA, and performance optimization.
- Strong publication record in top-tier ML/AI venues (NeurIPS, ICML, ICLR, ACL, etc.) preferred.
- Experience with LLM fine‑tuning (RLHF, LoRA, PEFT).
- Familiarity with tokenizer development and multilingual pretraining.
- Knowledge of scaling laws and model evaluation frameworks for massive LLMs.
- Hands‑on work with petabyte‑scale distributed storage systems.
E-Verify: United States Employment Opportunities Only
E-Verify is an internet‑based system, operated by the Department of Homeland Security and the Social Security Administration, that allows employers to confirm an individual’s eligibility to work in the United States. Under the E‑Verify rules, effective September 8, 2009, federal agencies subject to the Federal Acquisition Regulation are required to modify, and include in new contracts, a provision that requires federal contractors and subcontractors to use E‑Verify.
ITCO Solutions is required to adhere to these requirements.