
ML Ops Engineer

Job in Atlanta, Fulton County, Georgia, 30383, USA
Listing for: Southern Company
Full Time position
Listed on 2026-01-15
Job specializations:
  • IT/Tech
    AI Engineer, Machine Learning/ML Engineer
Salary/Wage Range or Industry Benchmark: 120,000 - 160,000 USD yearly

Job Description:

Position Overview

The ML Ops Engineer will design and operate the production backbone for Southern Company’s AI Hub, ensuring AI and machine learning systems are deployed, monitored, and governed. This role drives the enterprise-wide MLOps framework (establishing standards, lifecycle governance, and observability) while delivering secure, resilient production services and reusable AI products that accelerate innovation across operating companies. Success requires balancing rapid iteration with the reliability, safety, and compliance expected of a critical infrastructure enterprise.

Key Responsibilities

Operationalize AI and agentic systems. Build and maintain CI/CD pipelines for models, prompts, tools, and multi-agent workflows, enabling consistent promotion from experimentation to production.

Implement AI observability and reliability. Establish monitoring for agent behavior, model performance, drift, cost, and safety outcomes using logs, traces, metrics, and evaluators.

Enforce governance through automation. Embed guardrails, approvals, and policy-as-code into deployment pipelines, enabling compliant AI delivery without manual bottlenecks.

Manage model and agent lifecycle. Own versioning, rollout strategies (canary, shadow, rollback), and decommissioning for models, agents, and supporting tools.

Ensure platform resilience and scalability. Design runtime patterns that meet availability, latency, and fail-safe requirements, including degraded-mode and read-only behaviors for sensitive use cases.

Support multi-vendor and multi-cloud execution. Enable portable deployments across hyperscalers and model providers, minimizing lock-in while maintaining consistent operational controls.

Partner with engineering and data teams. Work closely with AI Architects, data engineers, and product squads to resolve production issues and continuously improve developer experience.

Qualifications

Educational Background:
Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or related field.

Experience:

Proven experience (5+ years) in cloud engineering or DevOps, with 2+ years in MLOps, AI infrastructure, data engineering, ML engineering, or a similar role.

Domain Expertise

Experience operating machine learning and AI systems in regulated or mission-critical environments.

Strong understanding of ML lifecycle management, including experimentation, validation, deployment, monitoring, and retirement.

Familiarity with agentic AI runtime patterns, including orchestration, tool execution, and human-in-the-loop controls.

Knowledge of enterprise AI governance, observability, and maturity models.

Individual Skills

Operational mindset with strong ownership and bias toward reliability and automation.

Ability to troubleshoot complex, distributed AI systems under production constraints.

Clear communicator who can translate operational risks into actionable improvements.

Continuous improvement orientation, balancing speed, safety, and cost.

Technical Expertise

Hands-on expertise with CI/CD and MLOps tooling (e.g., GitHub Actions, Azure DevOps, Terraform).

Experience deploying and operating LLMs, agents, and inference services using containers and orchestration platforms (e.g., Kubernetes).

Proficiency in observability stacks for AI systems (logging, tracing, metrics, evaluation pipelines).

Strong grounding in cloud security and identity, including secrets management, network isolation, and least-privilege access.

Experience with enterprise model registries, feature stores, vector databases, and automated testing for AI workflows.

Deep expertise in Python. Experience with machine learning frameworks and libraries such as PyTorch or scikit-learn.

Experience with ML lifecycle tools like MLflow.

Cloud Platforms:
Experience with cloud computing services (Azure and GCP preferred) and their machine learning tools.

Preferred Qualifications

Certifications:

Relevant certifications in AI, ML, or data engineering.

Industry Experience:

Experience in the energy sector is a plus.

Experience in a multi-cloud environment is a plus.

Experience designing reusable AI products, agents, and…
