Senior MLOps/LLMOps Engineer

Job in Town of Poland, Jamestown, Chautauqua County, New York, 14701, USA
Listing for: Exadel open positions
Full Time position
Listed on 2026-01-12
Job specializations:
  • IT/Tech
    Cloud Computing, Data Engineer
Location: Town of Poland

We’re an AI-first global tech company with 25+ years of engineering leadership, 2,000+ team members, and 500+ active projects powering Fortune 500 clients, including HBO, Microsoft, Google, and Starbucks.

From AI platforms to digital transformation, we partner with enterprise leaders to build what’s next.

What powers it all? Our people are ambitious, collaborative, and constantly evolving.

About the Client

The customer is one of the largest online gambling companies in the world, with over 26 million clients across all markets. The company was founded in 1997 and listed on Nasdaq Stockholm in 2004. They are committed to offering their clients the best possible deal and user experience, while ensuring a safe and fair gambling environment.

What You’ll Do

Platform & Deployment

  • Manage and evolve ML/LLM infrastructure on Kubernetes/EKS (CPU/GPU) for multi-tenant workloads across AWS/Azure, ensuring region‑aware scheduling, cross‑region access, and artifact management
  • Provision cloud environments, maintain deployment workflows, and build GitOps‑native pipelines (GitLab CI, Jenkins, ArgoCD, Helm, FluxCD) for fast, safe rollouts

LLM Operations & Optimization

  • Deploy, scale, and optimize LLMs (GPT, Claude, etc.) with attention to prompt engineering, performance, and cost
  • Operate Argo Workflows for data prep, model training, and batch compute, and track model performance and drift via AI observability frameworks

CI/CD & Infrastructure as Code

  • Design robust CI/CD pipelines across dev, staging, and production. Implement IaC with Terraform, CloudFormation, and Helm
  • Manage container orchestration, secrets, and secure deployments

Observability & Reliability

  • Set up monitoring with Prometheus/Grafana, Splunk, CloudWatch, and ELK
  • Implement alerting strategies, troubleshoot production issues, and ensure high availability

Data Platform & Reproducibility

  • Build and maintain data pipelines and platforms (Apache Iceberg) for reproducible ML experiments, lineage tracking, and automated governance
  • Collaborate with data engineers for seamless integration with model training workflows

Developer Experience & Enablement

  • Create APIs, CLIs, and UIs for self‑serve infrastructure. Provide documentation, templates, and best practices
  • Treat the ML platform as a product, gathering feedback and improving usability

Architecture, Security & Governance

  • Define scalable, secure, and compliant platform architecture. Implement FinOps practices, cost monitoring, and multi‑tenant optimization
  • Drive CI/CD culture and continuous improvement across teams

What You Bring

  • 8+ years in DevOps, Platform Engineering, or SRE, including 2+ years in MLOps/LLMOps
  • Hands‑on experience with AWS (Bedrock, S3, EC2, EKS, RDS/PostgreSQL, ECR, IAM, Lambda, Step Functions, CloudWatch) and Kubernetes workloads, including GPU, autoscaling, and multi‑tenant configurations
  • Skilled in container orchestration, secrets management, and GitOps deployments (Jenkins, ArgoCD, FluxCD)
  • Experience deploying and scaling LLMs (GPT, Claude-family), with prompt engineering and performance optimization
  • Strong Python skills (FastAPI, Django, Pydantic, boto3, Pandas, NumPy) and solid ML framework knowledge (scikit‑learn, TensorFlow, PyTorch)
  • Proficient in building reproducible data pipelines, IaC (Terraform, CloudFormation, Helm), CI/CD pipelines, and observability (Prometheus/Grafana, Splunk, Datadog, OpenTelemetry)
  • Strong networking, security, and Linux fundamentals. Excellent communicator, self‑motivated, and focused on improving developer experience

Nice to have

  • Experience with distributed compute frameworks such as Dask, Spark, or Ray
  • Familiarity with NVIDIA Triton, TorchServe, or other inference servers
  • Experience with ML experiment tracking platforms like Weights & Biases, MLflow, or Kubeflow
  • FinOps best practices and cost attribution strategies for multi‑tenant ML infrastructure
  • Exposure to multi‑region and multi‑cloud designs, including dataset replication strategies, compute placement, and latency optimization
  • Experience with lakeFS, Apache Iceberg, or Delta Lake for data versioning and lakehouse architectures
  • Knowledge of data transformation tools such as dbt
  • Experience with data pipeline…
Position Requirements
10+ Years work experience