Technical Lead; DevOps & Infrastructure Focus - Vice President
Job in Mississauga, Ontario, Canada
Listed on 2026-01-11
Listing for: Citibank (Switzerland) AG
Full Time position
Job specializations:
- IT/Tech: Systems Engineer, Cloud Computing
Job Description & How to Apply Below
* **Design & Implementation:** Lead the design, implementation, and ongoing management of secure, scalable, and resilient infrastructure components.
* **Secret & Certificate Management:** Administer and maintain secret and certificate management solutions using HashiCorp Vault, including policy definition and integration.
* **Database Management:** Perform hands-on administration and optimization of database systems (PostgreSQL, Oracle, MongoDB), including performance tuning, backup, and recovery strategies.
* **Workflow Orchestration:** Deploy, monitor, and troubleshoot data orchestration workflows using Apache Airflow, and develop/optimize DAGs.
* **Messaging Systems:** Implement and manage messaging queues such as Kafka and IBM MQ, including cluster setup and configuration.
* **API Integrations:** Develop, maintain, and troubleshoot RESTful API and SOAP integrations critical for system connectivity.
* **Build Automation:** Implement and optimize build and deployment processes using Gradle.
* **Container Orchestration:** Design, implement, and manage container orchestration platforms with Kubernetes and Helm, including integration with CyberArk and HashiCorp for secrets management. Create, debug, and troubleshoot Kubernetes PODs, Jobs, and Deployments using YAML.
* **Storage Management:** Configure and manage persistent storage solutions including PVC, SONiC NAS, and S3, with an awareness of storage requirements for AI/ML workloads.
* **Networking & Load Balancing:** Set up and maintain load balancing solutions (e.g., Nginx, HAProxy, AWS ELB/ALB, Kubernetes Ingress controllers) for high availability and performance.
* **Monitoring & Logging:** Implement, configure, and utilize comprehensive monitoring and logging solutions (Prometheus, Grafana, ELK Stack) to ensure system health and proactively identify issues, including those relevant to AI/ML applications.
* **Automation & Scripting:** Develop robust automation scripts and tools using Python, Bash, Go, or similar languages to streamline operations and enhance efficiency.
* **Incident Response:** Participate actively in on-call rotations, responding to and resolving critical incidents with hands-on troubleshooting.
* **Documentation:** Create and maintain technical documentation, architecture diagrams, and runbooks for infrastructure components and processes.
* **Agile Facilitation:** Facilitate all Scrum ceremonies (Sprint Planning, Daily Scrum, Sprint Review, Sprint Retrospective) for the DevOps/Infrastructure engineering team.
* **Technical Coaching:** Coach the team on advanced engineering practices, self-organization, cross-functionality, and continuous improvement in the context of infrastructure development, including support for AI/ML initiatives.
* **Impediment Resolution:** Proactively identify and resolve technical impediments and process bottlenecks within the team and across organizational boundaries, paying special attention to unique challenges posed by AI/ML infrastructure.
* **Backlog Refinement:** Collaborate closely with stakeholders (e.g., product owners, technical leads) to ensure a well-defined and prioritized backlog for infrastructure work, technical debt, operational improvements, and AI/ML platform needs.
* **Process Improvement:** Drive continuous improvement in the team's agile and DevOps practices, helping them adapt and optimize their workflow for maximum efficiency and quality.
* **Team Shielding:** Protect the team from external distractions, allowing focused time for hands-on engineering work.
* **Secret & Certificate Management:** Proven hands-on experience with HashiCorp Vault (installation, configuration, policy management, integrations).
* **Database Administration:** Strong hands-on experience with at least two of PostgreSQL, Oracle, or MongoDB (installation, tuning, replication, backup/restore).
* **Workflow Orchestration:** Hands-on experience deploying, managing, and developing DAGs for Apache Airflow.
* **Messaging Systems:** Solid hands-on experience with Kafka and/or IBM MQ (cluster setup, topic management, producer/consumer configuration).
* **Container Orchestration:** In-depth hands-on experience with Kubernetes and Helm, including YAML configuration, troubleshooting PODs/Jobs/Deployments, and integrations with secrets management (CyberArk, HashiCorp).
* **Storage Management:** Practical experience with Kubernetes PVCs, Persistent Volumes, S3, and/or enterprise NAS solutions (e.g., SONiC NAS).
* **Monitoring & Logging:** Strong hands-on experience with Prometheus, Grafana, and the ELK Stack (setup, dashboard creation, query optimization, alert configuration).
* **Scripting & Automation:** High proficiency in Python, Bash, or Go for automation, tooling development, and system administration.
* **Cloud Platforms:** Extensive hands-on experience with at least one major…