Software Engineer - AI/ML Infra
Job in
Chevy Chase, Montgomery County, Maryland, 20815, USA
Listed on 2025-12-01
Listing for:
GEICO
Full Time position
Job specializations:
IT/Tech
AI Engineer, Machine Learning/ML Engineer, Cloud Computing, Data Engineer
Job Description & How to Apply Below
At GEICO, we offer a rewarding career where your ambitions are met with endless possibilities.
Every day we honor our iconic brand by offering quality coverage to millions of customers and being there when they need us most. We thrive through relentless innovation to exceed our customers’ expectations while making a real impact for our company through our shared purpose.
When you join our company, we want you to feel valued, supported and proud to work here. That’s why we offer The GEICO Pledge: Great Company, Great Culture, Great Rewards and Great Careers.
The GEICO AI Platform and Infrastructure team is seeking an exceptional Senior ML Platform Engineer to build and scale our machine learning infrastructure with a focus on Large Language Models (LLMs) and AI applications. This role combines deep technical expertise in cloud platforms, container orchestration, and ML operations with strong leadership and mentoring capabilities. You will be responsible for designing, implementing, and maintaining scalable, reliable systems that enable our data science and engineering teams to deploy and operate LLMs efficiently. The candidate must have excellent verbal and written communication skills and a proven ability to work both independently and in a team environment.
KEY RESPONSIBILITIES
ML Platform & Infrastructure
* Design and implement scalable infrastructure for training, fine-tuning, and serving open source LLMs (Llama, Mistral, Gemma, etc.)
* Architect and manage Kubernetes clusters for ML workloads, including GPU scheduling, autoscaling, and resource optimization
* Design, implement, and maintain feature stores for ML model training and inference pipelines
* Build and optimize LLM inference systems using frameworks like vLLM, TensorRT-LLM, and custom serving solutions
* Ensure 99.9%+ uptime for ML platforms through robust monitoring, alerting, and incident response procedures
* Design and implement ML platforms using DataRobot, Azure Machine Learning, Azure Kubernetes Service (AKS), and Azure Container Instances
* Develop and maintain infrastructure using Terraform, ARM templates, and Azure DevOps
* Implement cost-effective solutions for GPU compute, storage, and networking across Azure regions
* Ensure ML platforms meet enterprise security standards and regulatory compliance requirements
* Evaluate and potentially implement hybrid cloud solutions, with AWS/GCP as a backup or for specialized use cases
DevOps & Platform Engineering
* Design and maintain robust CI/CD pipelines for ML model deployment using Azure DevOps, GitHub Actions, and MLOps tools
* Implement automated model training, validation, deployment, and monitoring workflows
* Set up comprehensive observability using Prometheus, Grafana, Azure Monitor, and custom dashboards
* Continuously optimize platform performance, reducing latency and improving throughput for ML workloads
* Design and implement backup, recovery, and business continuity plans for ML platforms
Technical Leadership & Mentoring
* Mentor junior engineers and data scientists on platform best practices, infrastructure design, and ML operations
* Lead comprehensive code reviews focusing on scalability, reliability, security, and maintainability
* Design and deliver technical onboarding programs for new team members joining the ML platform team
* Establish and champion engineering standards for ML infrastructure, deployment practices, and operational procedures
* Create technical documentation and runbooks, and deliver internal training sessions on platform capabilities
Cross-Functional Collaboration
* Work closely with data scientists to understand requirements and optimize workflows for model development and deployment
* Collaborate with product engineering teams to integrate ML capabilities into customer-facing applications
* Support research teams with infrastructure for experimenting with cutting-edge LLM techniques and architectures
* Present technical solutions and platform roadmaps to leadership and cross-functional stakeholders
REQUIRED QUALIFICATIONS
Experience & Education
* Bachelor’s degree in Computer Science, Engineering, or a related technical field (or…