
SRE/LLM Ops Engineer (BE)

Job in Town of Belgium, Belgium, Ozaukee County, Wisconsin, 53004, USA
Listing for: CluePoints
Full Time position
Listed on 2026-01-15
Job specializations:
  • IT/Tech
    AI Engineer
Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 USD per year
Job Description & How to Apply Below
Position: SRE/LLM Ops Engineer (BE)
Location: Town of Belgium

Company Description

At CluePoints, we’re redefining how clinical trials are run. As the premier provider of Risk-Based Quality Management (RBQM) and Data Quality Oversight software, we harness advanced statistics, artificial intelligence, and machine learning to ensure the quality, accuracy, and integrity of clinical trial data, helping life sciences organisations bring safer, more effective treatments to patients faster.

Job Description

We’re proud to be an ambitious, fast-growing technology scale-up with a dynamic and diverse international team representing more than 40 nationalities. Collaboration, flexibility, and continuous learning are part of our DNA.

At CluePoints, you’ll find a culture where you can grow, make an impact, and have fun along the way. Guided by our values of Care, Passion, and Smart Disruption, we’re united by a shared mission: to create smarter ways to run efficient clinical trials and deliver AI-powered insights that improve human outcomes worldwide.

The Role

The SRE, LLMOps (AI Platform) role ensures that our LLM-powered services are reliable, observable, and safe in production on Azure and Kubernetes. You’ll blend classic SRE disciplines with LLM-specific operations: robust evaluation pipelines, prompt/version governance, model/vendor failover, guardrails, and cost/performance monitoring. You know how to build automation with LangChain/LangGraph, operate API-based LLMs in production, and manage the inherent non-determinism of models through rigorous testing and observability.
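To make "model/vendor failover" concrete, here is a minimal Python sketch of ordered fallback across providers. It is an illustration only, not CluePoints' implementation: the provider names and the stub functions `flaky_primary` and `stable_secondary` are hypothetical stand-ins for real OpenAI/Azure OpenAI/Anthropic client calls.

```python
from typing import Callable, Sequence

# A provider is a (name, call) pair; `call` takes a prompt and returns completion text.
# Real code would wrap an SDK client call here; these are hypothetical stand-ins.
Provider = tuple[str, Callable[[str], str]]


class AllProvidersFailed(Exception):
    """Raised when every configured provider raised an error."""


def complete_with_failover(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in order and return (provider_name, text) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # production code should catch only retryable errors
            errors.append(f"{name}: {exc!r}")
    raise AllProvidersFailed("; ".join(errors))


# Usage with stub providers: the primary times out, so the secondary answers.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary timed out")


def stable_secondary(prompt: str) -> str:
    return f"echo: {prompt}"


provider, text = complete_with_failover(
    "hello", [("primary", flaky_primary), ("secondary", stable_secondary)]
)
print(provider, text)  # secondary echo: hello
```

In a real service the loop would also record each failure in the tracing backend and respect per-provider timeouts and rate limits, so that failovers show up in dashboards rather than happening silently.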

Job Requirements

What You’ll Bring
  • Experience: 5+ years in SRE/DevOps/Platform Engineering with 1–2+ years operating LLM- or ML-backed applications in production (API-based or hosted models).
  • LLMOps: hands-on with LangChain/LangGraph building end-to-end chains/agents and RAG flows; comfort with vector stores (e.g., Azure AI Search, Pinecone), prompt/version control, and dataset tooling.
  • Observability: proficiency instrumenting LLM traces and app telemetry, alert tuning, and root-cause analysis; familiarity with LangSmith and/or Arize Phoenix (token/cost tracking, latency, failure modes).
  • Cloud & Platform: strong Azure and Kubernetes (AKS) background; GitOps (Flux/Argo CD), Helm/Kustomize; CI/CD (GitHub Actions/GitLab/Jenkins); IaC (Terraform); secrets, networking, and security baselines.
  • Languages & tooling: Python (preferred) and one of TypeScript/Go; REST/GraphQL; OpenAI/Azure OpenAI/Anthropic APIs; experience with Redis caches, message queues, and feature flags.
Job Responsibilities
  • Instrument deep observability: implement tracing for LLM chains/agents (inputs, outputs, token usage, latency, model/version), correlate with app metrics/logs, and set actionable alerts; leverage LangSmith/Arize Phoenix (or similar) and OpenTelemetry where appropriate.
  • Safety & guardrails: integrate content safety, PII redaction, jailbreak/prompt-injection defenses, and policy-based rails; document exceptions and reviewer workflows. Prefer native platform features (e.g., Azure AI Content Safety) or programmable rails (e.g., NVIDIA NeMo Guardrails).
  • Cost & capacity management: monitor token and request costs, throughput, and rate limits; implement caching, request shaping, and multi-tier model selection to balance quality, latency, and spend.
  • Build evaluation & testing pipelines: create golden datasets and automated evals (offline + CI/CD + canary) to catch regressions from code, prompt, data, or model changes; use LangSmith/OpenAI Evals (or equivalents) to track quality trends over time.
  • Platform operations on Azure/Kubernetes: ensure secure, compliant, and cost-efficient operation; maintain IaC, secrets, networking, scaling, and DR/BCP; partner with Security and QA on regulated SaaS controls.
  • Cross-functional enablement: work with product/dev teams to set acceptance criteria for AI features, add runtime feature flags/kill-switches, and embed evals/telemetry from day one.
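The golden-dataset evaluation gate described in the responsibilities above can be sketched in a few lines of Python. This is a minimal, tool-agnostic illustration: `GoldenExample`, the exact-match scorer and the 0.02 tolerance are assumptions chosen for clarity, not LangSmith or OpenAI Evals APIs, and real suites often use richer scorers such as rubrics or model-graded checks.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenExample:
    """One curated input with its reference answer."""
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> bool:
    """Simplest possible scorer: case- and whitespace-insensitive equality."""
    return output.strip().lower() == expected.strip().lower()


def pass_rate(generate: Callable[[str], str], dataset: list[GoldenExample]) -> float:
    """Fraction of golden examples whose generated output passes the scorer."""
    passed = sum(exact_match(generate(ex.prompt), ex.expected) for ex in dataset)
    return passed / len(dataset)


def regression_gate(candidate: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Allow a change only if its pass rate drops by no more than `tolerance`."""
    return candidate >= baseline - tolerance


golden = [
    GoldenExample("What is 2 + 2?", "4"),
    GoldenExample("What is the capital of Belgium?", "Brussels"),
]

# Stub "models" standing in for a prompt/model version under test.
good_answers = {"What is 2 + 2?": "4", "What is the capital of Belgium?": " brussels"}
candidate_rate = pass_rate(lambda p: good_answers[p], golden)
broken_rate = pass_rate(lambda p: "I don't know", golden)

print(candidate_rate, regression_gate(candidate_rate, baseline=0.95))  # 1.0 True
print(broken_rate, regression_gate(broken_rate, baseline=0.95))  # 0.0 False
```

Because model outputs are non-deterministic, one option is to run each golden example several times and gate on the averaged pass rate against a stored baseline; the gate function itself stays the same.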
Job Benefits

What We Offer – Belgium
  • Health Insurance through Alan (100% hospitalisation cover, 80% ambulatory and dental)
  • Mobility…