Sr. AI Platform Engineer
Job in New York, New York County, New York, 10261, USA
Listed on 2026-03-01
Listing for: TechWize
Full Time position
Job specializations:
- Software Development: AI Engineer, Cloud Engineer - Software, Machine Learning / ML Engineer
Job Description
Location
501, Fifth Avenue, Suite 805 New York, NY 10017
Minimum Qualifications
- 8+ years of experience as a Platform Engineer (Site Reliability / DevOps), with at least 3 years in AI/ML platform development (MLOps).
- Deep expertise in Python, with strong design and debugging skills.
- Ability to work independently and lead complex projects with excellent problem‑solving, analytical, and communication skills.
- Proficiency with cloud platforms such as AWS, GCP, or Azure; familiarity with MLOps/AI DevOps tools like MLflow or Kubeflow; and proficiency in CI/CD and infrastructure as code (Terraform / CloudFormation).
- Hands‑on expertise with CI/CD pipelines, model observability, and incident response for AI/ML services.
- Experience implementing and optimizing platforms supporting large language model (LLM) pipelines with frameworks such as LangChain, LlamaIndex, Hugging Face Transformers, or similar.
- Hands‑on knowledge of setting up and scaling vector database platforms such as Qdrant (or other vector DBs like Pinecone or Weaviate) for semantic search and embeddings management.
- Exposure to MLOps tools such as Ray.io, Anyscale, or other distributed orchestration & inference frameworks.
- Experience with developing and deploying containerized applications using Docker and Kubernetes, including Helm charts and automated scaling.
- Understanding of LLMOps patterns — model registry, prompt versioning, and feedback loops.
Responsibilities
- Platform Design and Architecture: build and operate a highly available, scalable, modular AI platform using technologies such as Qdrant, Anyscale, and Ray to support LLM orchestration, vector search, and multi‑agent frameworks.
- Core Infrastructure Development: build essential APIs and infrastructure to power conversational applications, AI agents, and analytics tools.
- LLM Operational Solutions: implement workflows for large language models, including inference pipelines, fine‑tuning, caching, and evaluation for open‑weight and hosted models.
- Deployment & Performance Optimization: deploy AI services on AWS with Kubernetes (EKS), Lambda, and ECS, ensuring scalability and resilience while optimizing vector databases and model runtimes for cost and performance.
- Collaboration, Governance, & Mentorship: partner with engineering and research teams to deliver production‑grade, self‑healing, performance‑optimized services for AI/RAG pipelines; establish governance and security standards; and mentor junior engineers in AI infrastructure best practices and reviews.