Machine Learning Engineer – Fine-Tuning and On-device AI Job, Palo Alto area, California, USA, IT/Tech

Position: Machine Learning Engineer – Fine-Tuning and On-device AI

HP IQ is HP’s new AI innovation lab. Combining startup agility with HP’s global scale, we’re building intelligent technologies that redefine how the world works, creates, and collaborates.

We’re assembling a diverse, world-class team—engineers, designers, researchers, and product minds—focused on creating an intelligent ecosystem across HP’s portfolio. Together, we’re developing intuitive, adaptive solutions that spark creativity, boost productivity, and make collaboration seamless.

We create breakthrough solutions that make complex tasks feel effortless, teamwork more natural, and ideas more impactful—always with a human-centric mindset.

By embedding AI advancements into every HP product and service, we’re expanding what’s possible for individuals, organisations, and the future of work.

Join us as we reinvent work, so people everywhere can do their best work.

About the Role

We are seeking a Machine Learning Engineer to lead the fine-tuning, optimization, and deployment of AI models for diverse tasks, with a strong emphasis on on-device inference. You will work on cutting‑edge applications such as orchestration, planning, multi‑agent coordination, and other intelligent decision‑making systems.

You will be responsible for adapting foundation models (LLMs, multimodal models) to specialized domains, making them fast, accurate, and efficient for resource‑constrained environments—while ensuring robustness and safety.

What You Might Do

Fine‑tune large language models, multimodal models, and task‑specific models for orchestration, planning, and any other workflows as defined.
Design and run experiments to improve task accuracy, robustness, and generalization.
Explore and apply methods like full fine‑tuning, LoRA, QLoRA and other types of parameter‑efficient fine‑tuning.
Employ advanced techniques such as QAT, DPO, and GRPO to further improve model quality.
On-Device Optimization
- Prune, quantize and compress models (e.g., INT8, INT4, mixed‑precision) for CPU, GPU, NPU and edge accelerators.
- Optimize models for low‑latency inference using frameworks like OpenVINO, ONNX Runtime, QNN, etc.
- Build robust data pipelines for domain‑specific datasets, including synthetic data generation and annotation.
- Define evaluation metrics. Perform evaluations and analyze results.
- Establish best practices for versioning, reproducibility, and continuous improvement of model performance.
AI Orchestration & Planning
- Develop and refine models to support multi‑step reasoning, tool orchestration, and decision planning.
- Work with stakeholders on orchestrator architecture.
- Collaborate with product and research teams to design intelligent, context‑aware assistant capabilities.

Essential Qualifications

5+ years of experience in applied machine learning, including at least 3 years in LLM fine‑tuning.
Proficiency in Python and the ML framework ecosystem (Hugging Face, PyTorch).
Strong understanding of transformer architectures, attention mechanisms, and PEFT techniques.
Experience with on‑device inference optimization (OpenVINO, ONNX, QNN).
Familiarity with orchestration/planning architectures and techniques for AI assistants.
Track record of delivering production‑ready ML solutions in latency‑sensitive environments.

Preferred Qualifications

Experience with multi‑agent systems or AI assistant orchestration.
Familiarity with advanced inference optimization techniques such as KV cache paging and FlashAttention.
Knowledge of common inference engines, including but not limited to llama.cpp and vLLM.

Salary Range: $120,000 - $215,000

Compensation & Benefits (Full-Time Employees)

The salary range for this role is listed above. Final salary offered is based upon multiple factors including individual job‑related qualifications, education, experience, knowledge and skills.

Health insurance
Vision insurance
Long-term/short-term disability insurance
Employee assistance program
Flexible spending account
Life insurance
Generous time-off policies, including:
- 4-12 weeks fully paid parental leave based on tenure
- 11 paid holidays
- Additional flexible paid vacation and sick leave (US benefits overview)
