AI-Ops Engineer
Job in Fremont, Alameda County, California, 94537, USA
Listed on 2026-03-04
Listing for:
CYNET SYSTEMS
Full Time position
Job specializations:
- IT/Tech: Cloud Computing, Data Engineer
Job Description & How to Apply Below
Job Description:
Pay Range: $55/hr - $60/hr
Experience Requirements:
- 5+ years in IT Operations, Data Engineering, or related fields.
- Experience in Azure Data Services, ETL/ELT processes, and ITIL-based operations.
- 2+ years in AIOps implementation, monitoring, and automation.
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Skills:
- Client, PowerShell, or Python; alerts & logs monitoring; Confluence and SharePoint
- Basic Understanding of Azure Data Services (ADF, Synapse, Databricks).
- Experience in monitoring alerts from data pipelines (Azure Data Factory, Synapse, ADLS, MS Fabric, etc.).
- Familiarity with ETL/ELT concepts, data validation, and pipeline orchestration.
- Experience in identifying failures in ETL jobs, scheduled loads, and streaming data services.
- Hands-on experience with IT monitoring tools (e.g. Client, Azure Monitor, Dynatrace, or similar tools).
- Skilled in creating and updating runbooks and SOPs.
- Familiarity with data refresh cycles, batch vs. streaming differences.
- Familiarity with ITIL processes for incident, problem, and change management.
- Strong attention to detail, ability to follow SOPs, and effective communication for incident updates.
- Solid understanding of containerized services (Docker/Kubernetes) and DevOps pipelines (Azure DevOps, GitHub Actions), always with an eye on data layer integration.
- Proficiency in Jira, Confluence and SharePoint for status updates and documentation.
- Understanding of scripting (PowerShell, Python, or Shell) for basic automation tasks.
- Ability to interpret logs and detect anomalies proactively.
- Analytical thinking for quick problem identification and escalation.
- Exposure to CI/CD for data workflows, real-time streaming (Event Hub, Kafka).
- Understanding of data governance and compliance basics.
- Experience with anomaly detection, time-series forecasting, and log analysis.
Responsibilities:
- Monitor and support data pipelines on Azure Data Factory, Databricks, and Synapse.
- Perform incident management, root-cause analysis for L1 issues, and escalate as needed.
- Surface issues clearly and escalate to the appropriate SME teams so they can be fixed at the root, avoiding repetitive short-term fixes.
- Identify whether issues are at pipeline level, data source level, or infrastructure level and route accordingly.
- Document incident resolution patterns for reuse.
- Acknowledge incidents promptly and route them to the correct team.
- Execute daily health checks, maintain logs, and update system status in collaboration tools.
- Work strictly as per SOPs documented by the team.
- Maintain and update SOPs, runbooks, and compliance documentation.
- Update system health status every 2 hours during the shift in Confluence or SharePoint.
- Update incident status every 4 hours for P1/P2 tickets.
- Complete service tasks on time as per SLA to release queues quickly.
- Ensure compliance with enterprise data security, governance, and regulatory requirements.
- Collaborate with data engineers, analysts, DevOps/SRE teams, and business teams to ensure reliability and security.
- Implement best practices in ML operations and productionization.
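The "ability to interpret logs and detect anomalies proactively" called for above can be illustrated with a minimal sketch. This is not a tool from the listing; the window size, threshold, and hourly error counts are illustrative assumptions, and a real deployment would read counts from Azure Monitor or a log store rather than a hard-coded list.

```python
# Minimal sketch (assumed example, not from the listing): flag anomalous
# error counts in pipeline logs using a trailing mean/std-dev threshold.
from statistics import mean, stdev

def flag_anomalies(counts, window=6, threshold=3.0):
    """Return indices where a count deviates from the trailing `window`
    of observations by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(counts[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies

# Hypothetical hourly error counts from a pipeline log; hour 7 spikes.
hourly_errors = [2, 3, 1, 2, 3, 2, 2, 40, 3, 2]
print(flag_anomalies(hourly_errors))  # [7]
```

A rolling z-score like this is deliberately simple; it catches sudden spikes but not gradual drift, which is where the time-series forecasting experience mentioned above would come in.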
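The periodic health-check and status-update duties above could be scripted along these lines. The pipeline names, status values, and summary format here are illustrative assumptions; in practice the run records would come from the Azure Data Factory or Synapse APIs, and the summary would be posted to Confluence or SharePoint.

```python
# Minimal sketch (assumed example, not from the listing): summarize pipeline
# run records into a status line for a shift health-check update, and collect
# the failed pipelines that need escalation.
from collections import Counter

def health_summary(runs):
    """runs: list of (pipeline_name, status) tuples, with status one of
    "Succeeded", "Failed", or "InProgress" (an assumed status vocabulary).
    Returns (one-line summary, list of failed pipeline names)."""
    counts = Counter(status for _, status in runs)
    failed = [name for name, status in runs if status == "Failed"]
    line = (f"OK: {counts.get('Succeeded', 0)}, "
            f"Failed: {counts.get('Failed', 0)}, "
            f"Running: {counts.get('InProgress', 0)}")
    return line, failed

# Hypothetical run records for one health-check cycle.
runs = [("ingest_sales", "Succeeded"),
        ("load_dim_customer", "Failed"),
        ("stream_events", "InProgress")]
line, failed = health_summary(runs)
print(line)    # OK: 1, Failed: 1, Running: 1
print(failed)  # ['load_dim_customer']
```

Keeping the summary to a single line makes it easy to paste into the 2-hour Confluence/SharePoint status updates the role requires, while the failed list feeds the escalation step.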