
AI-Ops Engineer

Job in Palo Alto, Santa Clara County, California, 94306, USA
Listing for: LeadStack Inc.
Full Time position
Listed on 2026-01-10
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer, AI Engineer, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: 50 - 60 USD Hourly
Job Description & How to Apply Below

This range is provided by LeadStack Inc. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more.

Base pay range

$50.00/hr - $60.00/hr

Position Overview
  • The AI-Ops Engineer is a key technical contributor responsible for evolving traditional DevOps into AI-Ops. This role leverages AI and machine learning to automate and enhance IT operations, including performance monitoring, anomaly detection, root‑cause analysis, and automated remediation.
  • Working at the intersection of cloud infrastructure, AI‑driven automation, and operational excellence, the engineer embeds intelligence into infrastructure, deployment, and monitoring to ensure high availability, predictive issue resolution, and operational efficiency across CGOE's global online programs.
Key Responsibilities
AI-Driven Operations & Automation
  • Implement AIOps solutions that use ML algorithms to automate performance monitoring, workload scheduling, and infrastructure management.
  • Build anomaly detection systems that identify infrastructure issues before they impact users.
  • Develop automated root‑cause analysis capabilities using ML to correlate events and filter noise from critical alerts.
  • Create predictive maintenance workflows that analyze historical patterns to proactively mitigate issues.
  • Design and implement automated remediation scripts that respond to incidents without human intervention.
Observability & Intelligent Monitoring
  • Architect comprehensive observability platforms that aggregate data from disparate sources into unified dashboards.
  • Implement intelligent alerting systems using NLP and ML to reduce alert fatigue and surface actionable insights.
  • Build real‑time analytics dashboards for coordinated diagnosis across teams.
  • Deploy application performance monitoring (APM) solutions integrated with AI‑driven analytics.
  • Ensure end‑to‑end visibility across cloud infrastructure, applications, and AI/ML workloads.
  • Design, build, and maintain scalable, secure AWS infrastructure using Infrastructure as Code (CloudFormation, Terraform, or CDK).
  • Implement and manage containerized environments using Docker, AWS ECS, Fargate, and Kubernetes (EKS).
  • Build CI/CD pipelines for continuous delivery, integrating AI‑powered code quality and deployment optimization.
  • Manage cloud automation and optimization to improve cost‑efficiency and resource utilization.
  • Ensure compliance with Stanford and regulatory standards (FERPA, GDPR) for secure data handling and governance.
  • Partner with cross‑functional teams to implement domain‑agnostic AIOps solutions across the organization.
  • Use Git‑based version control and code review best practices as part of a collaborative, agile workflow.
  • Document operational procedures, runbooks, and AIOps workflows for team knowledge sharing.
  • Continuously evaluate and adopt emerging AIOps tools, AWS services, and AI‑driven automation technologies.
  • Contribute to building an AI‑first operational culture that prioritizes automation and predictive capabilities.
Education & Certifications
  • Bachelor’s degree in Computer Science, DevOps, Cloud Engineering, or a related field (Master's preferred).
  • AWS certification preferred (Solutions Architect, SysOps Administrator, or DevOps Engineer); Professional‑level certification a plus.
Experience
  • 3+ years of experience in DevOps, SRE, or Cloud Engineering roles.
  • 2+ years of hands‑on experience with AWS infrastructure (EC2, ECS, Lambda, S3, IAM, VPC).
  • Experience implementing monitoring, observability, and alerting solutions at scale.
  • Familiarity with ML/AI concepts and their application to operational automation.
Technical Skills
  • Languages: Python (required); Bash, Go, or TypeScript preferred.
  • AIOps & Monitoring: CloudWatch, X‑Ray, Prometheus, Grafana, Datadog, or Splunk with ML capabilities.
  • Infrastructure as Code: AWS CloudFormation, Terraform, or AWS CDK.
  • CI/CD Tools: GitHub Actions, AWS CodePipeline, Jenkins, or GitLab CI.
  • Data & Analytics: Experience with log aggregation, metrics analysis, and event correlation platforms.
Seniority Level

Associate

Employment Type

Contract

Job Function

Other

Industries

Higher Education
