Our client is a leading software solutions organisation that empowers businesses with innovative technology by providing SaaS solutions tailored to their needs.
They are currently hiring an
AI Engineer to own the end-to-end lifecycle of AI features, from data ingestion and RAG setup to fine-tuning, evaluation, deployment, and continuous improvement, so they can ship reliable, cost-effective AI products.
Location:
On-site (Dubai, UAE)
Type:
Full-time
Responsibilities:
- Design and implement RAG pipelines (chunking, embeddings, vector stores, retrieval strategies) using tools like Ollama, LangChain, LlamaIndex, or equivalent.
- Stand up local and cloud LLM orchestration (prompt routing, tool use, function calling, guards) with strong observability.
- Run fine-tuning / LoRA / adapters; build feedback loops that capture model outputs and user feedback for subsequent retraining.
- Create robust prompt engineering patterns (templates, guards, evals, versioning) and latency/cost controls.
- Build evaluation suites (RAGAS, custom golden sets, offline + online A/B tests) and quality dashboards.
- Productionize models with MLOps best practices (CI/CD, model registries, feature stores, experiment tracking).
- Ensure privacy, safety, and compliance (PII handling, red-teaming, prompt injection defenses, content filters).
- Collaborate with PM/Design/Eng to scope features and deliver increments quickly.
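The retrieval half of the RAG work described above can be sketched in a few lines. This is an illustrative toy only: chunking is a fixed word window, and a bag-of-words count vector with cosine similarity stands in for a real embedding model and vector store; all names and sample documents are invented for the example.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 40) -> list[str]:
    """Split text into fixed-size word windows (a simple chunking strategy)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts stand in for a real model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, store: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the top-k chunks most similar to the query."""
    qv = embed(query)
    ranked = sorted(store, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Build the "vector store" once; query it per request.
docs = ["Invoices are generated nightly by the billing service.",
        "The mobile app caches dashboards for offline use."]
store = [(c, embed(c)) for d in docs for c in chunk(d)]
print(retrieve("when are invoices generated?", store, k=1))
```

A production pipeline swaps in a real embedding model, a vector database, and smarter chunking, but the chunk/embed/retrieve shape stays the same.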
Requirements:
- 3+ years shipping ML/AI systems, including at least one production RAG deployment.
- Hands-on with LangChain/Ollama (or similar), vector DBs (Pinecone, Weaviate, Milvus, pgvector), and embedding models.
- Experience with fine‑tuning (HF Transformers, PEFT/LoRA) and dataset curation/cleaning.
- Strong Python skills; solid grasp of APIs, microservices, and async patterns.
- Familiar with LLM evals, metrics (precision@k, faithfulness, groundedness), and cost/perf tuning.
- Cloud/containerization: Docker, any of AWS/GCP/Azure, basic GPU/accelerator know-how.
- Llama 3/4, Mistral, OpenAI/Anthropic APIs; Guardrails/Gandalf; Weights & Biases or ML.
- Feature stores, Kafka, Airflow; security hardening and secret management.
- Basic front‑end to prototype admin/eval tools (React/Next.js).
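One of the retrieval metrics named above, precision@k, is simple to state concretely: of the top-k retrieved chunk IDs, what fraction appears in the golden relevant set? A minimal sketch (the chunk IDs and relevant set below are made up for illustration; eval suites such as RAGAS compute this alongside faithfulness and groundedness):

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved IDs that are in the relevant set."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc_id in top_k if doc_id in relevant) / len(top_k)

# 2 of the top 3 retrieved chunks are in the golden relevant set.
print(precision_at_k(["c1", "c7", "c3", "c9"], {"c1", "c3", "c5"}, k=3))  # ≈ 0.667
```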
Success metrics:
- RAG answer quality (e.g., >85% groundedness on eval set) and unit cost over time.
- P50 latency within target; model incident rate (hallucinations, jailbreaks) tracked month over month.
- Time‑to‑ship for new datasets/features.
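Tracking a target like ">85% groundedness on eval set" amounts to scoring each golden-set answer and reporting the share that clears a threshold. A hedged sketch, with a deliberately naive stand-in scorer (token overlap between answer and retrieved context); the golden pairs and the 0.8 per-answer threshold are invented for the example, and a real suite would use RAGAS or an LLM judge instead:

```python
def token_overlap(answer: str, context: str) -> float:
    """Stand-in groundedness score: fraction of answer tokens found in the context."""
    ans = answer.lower().split()
    ctx = set(context.lower().split())
    return sum(1 for t in ans if t in ctx) / len(ans) if ans else 0.0

def groundedness_rate(golden_set: list[tuple[str, str]], threshold: float = 0.8) -> float:
    """Fraction of (answer, context) pairs whose score clears the threshold."""
    grounded = [token_overlap(a, c) >= threshold for a, c in golden_set]
    return sum(grounded) / len(grounded)

golden = [
    ("billing runs nightly", "the billing job runs nightly after close"),
    ("refunds are instant", "refunds are processed within five business days"),
]
rate = groundedness_rate(golden)
print(f"grounded: {rate:.0%}, target: >85%")
```

Running the same harness offline on every candidate model or prompt version is what makes the quality dashboards and A/B tests mentioned above comparable over time.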