Digital - Principal SRE; AI Engineer
Columbus, Franklin County, Ohio, 43224, USA
Listed on 2026-03-08
IT/Tech
Cloud Computing, AI Engineer, SRE/Site Reliability
Description
The Digital - Principal SRE (AI Engineer) role blends expertise in artificial intelligence, machine learning, and reliability engineering. This professional is responsible for designing, deploying, and maintaining AI-driven solutions while ensuring the reliability, scalability, and performance of digital platforms and services. The ideal candidate will work closely with Digital SRE engineers, data scientists, DevOps, and operations teams to deliver robust, efficient, and automated systems that support business goals.
Job Description Summary:
The IS Technical Specialist provides technical and consultative support on the most complex technical matters. This role typically reports to the Head of Digital SRE and may involve on-call responsibilities. The position provides opportunities to work on cutting‑edge AI solutions, collaborate with cross-segment teams, and drive reliability for mission‑critical digital services.
Duties and Responsibilities:
- Design, develop, and implement AI-driven systems and automation tools to enhance the reliability and efficiency of digital platforms.
- Monitor the health, availability, and performance of AI-enabled applications and infrastructure using SRE best practices.
- Collaborate with cross-functional teams to integrate machine learning models into production environments, ensuring seamless deployment and operation.
- Establish and enforce service-level objectives (SLOs), error budgets, and incident response procedures for AI-driven services.
- Identify, troubleshoot, and resolve complex incidents related to AI systems, leveraging observability and monitoring tools.
- Drive continuous improvement by analyzing post-incident reviews, automating manual tasks, and optimizing system performance.
- Stay up to date with advancements in AI, SRE, and cloud technologies, recommending innovative solutions to enhance digital reliability.
- Document processes and runbooks for operational transparency and knowledge sharing.
- AI Platform Integration: Develop abstraction layers across AI providers (Google, OpenAI, etc.) to enable seamless integration and enablement.
- Conduct design workshops, POCs, and code‑with sessions to shape data-driven agent workflows with stakeholders, fostering trust and adoption.
- Measure & Improve: Define and use key metrics, test harnesses, and evaluation plans to measure agent accuracy, latency, safety, and cost effectiveness.
- Knowledge Sharing: Craft reusable patterns, documentation, and best practices to influence internal assets and client roadmaps.
- Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related field.
- 5+ years of experience in AI/ML engineering, SRE, DevOps, or related roles.
- 5+ years of programming experience in Python, Java, or similar languages, including developing and deploying machine learning models.
- 5+ years of hands‑on experience with cloud platforms (e.g., AWS, GCP) and containerization technologies (Docker, Kubernetes).
- Familiarity with observability tools (Prometheus, Grafana, ELK stack) and incident management platforms such as ServiceNow.
- Solid understanding of SRE principles: monitoring, alerting, SLOs, error budgets, and automation.
- 5+ years of experience with infrastructure‑as‑code (Terraform, Ansible) and CI/CD pipelines.
- Excellent problem‑solving skills, attention to detail, and ability to work in a fast‑paced, collaborative environment.
- Strong communication and documentation abilities.
- Experience operationalizing large language models (LLMs) or generative AI systems in production settings.
- Background in MLOps, data engineering, and/or cloud‑native AI deployment.
- Knowledge of security best practices for AI and cloud infrastructure.
- Contributions to open source AI/SRE projects or relevant technical communities.
Exempt Status: (Yes = not eligible for overtime pay) (No = eligible for overtime pay)
Workplace Type: Office
Our Approach to Office Workplace Type
Certain positions outside our branch network may be eligible for a flexible work arrangement. We’re combining the best of both worlds: in‑office and work from home. Our approach enables our teams to deepen connections, maintain a strong community, and do their best work. Remote roles will also have the opportunity to come together in our offices for moments that matter. Specific work arrangements will be provided by the hiring team.
Huntington is an Equal Opportunity Employer.
Tobacco-Free Hiring Practice:
Visit Huntington's Career Web Site for more details.