Job Description
Join a dynamic team shaping the tech backbone of our operations, where your expertise fuels seamless system functionality and innovation.
As a Technology Support II team member in Asset & Wealth Management, you will play a vital role in ensuring the operational stability, availability, and performance of our production application flows. Your efforts in troubleshooting, maintaining, identifying, escalating, and resolving production service interruptions for all internally and externally developed systems support a seamless user experience and a culture of continuous improvement.
Job Responsibilities
Develops automation scripts and tools in Python to streamline operations and incident response.
Implements and maintains observability solutions (e.g., Prometheus, Grafana, CloudWatch) to monitor system health, collect metrics, and enable proactive incident detection.
Defines, measures, and reports on Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for critical services.
Supports the deployment, monitoring, and reliability of Large Language Model (LLM) applications and systems utilizing Model Context Protocol (MCP).
Ensures high availability and reliability of applications running on modern infrastructure, including AWS, Kubernetes, related cloud-native platforms, and batch processing environments.
Deploys, monitors, and troubleshoots workloads on AWS, leveraging cloud-native services.
Manages, monitors, and optimizes batch jobs using schedulers like Autosys and Control-M.
Writes and optimizes SQL queries for data extraction, transformation, and reporting.
Participates in on-call rotations, responds to production incidents, and drives root cause analysis and postmortems.
Works closely with data science, engineering, and operations teams to support AI/ML model deployment, LLM workflows, and batch processes.
Identifies reliability gaps and drives initiatives to improve system resilience, scalability, and efficiency.
Required Qualifications, Capabilities, and Skills
Formal training or certification in software engineering concepts and 2+ years of applied experience.
Proven experience as an SRE, DevOps Engineer, or in a similar role supporting AI/ML, LLM, and batch processing environments.
Exposure to Large Language Models (LLMs) and Model Context Protocol (MCP).
Proficiency in Python for automation and scripting.
Strong knowledge of AWS cloud services and infrastructure.
Experience with SQL and relational databases.
Hands-on experience with job schedulers (Autosys, Control-M, or similar).
Familiarity with observability and telemetry tools (e.g., Prometheus, Grafana, CloudWatch, Datadog).
Understanding of SLI/SLO concepts and their application in production environments.
Solid troubleshooting and incident management skills.
Preferred Qualifications, Capabilities, and Skills
Experience supporting AI/ML and LLM model deployment and monitoring.
Exposure to containerization and orchestration (Docker, Kubernetes).
Knowledge of CI/CD pipelines and infrastructure as code (Terraform, CloudFormation).
Experience with other cloud platforms (Azure, GCP) is a plus.