Senior Cloud Operations Engineer
Listed on 2026-01-12
IT/Tech
Systems Engineer - Engineering
Senior Cloud Operations Engineer Level 3
Department:
Reports To: Director of Engineering
FLSA Status: Exempt
Position Summary
The Senior Cloud Operations Engineer is responsible for the stability, performance, and operational excellence of our SaaS environments. This role will lead day-to-day cloud operations, drive automation, reduce operational risk, and partner closely with engineering, security, and product teams to ensure our platform is reliable, secure, and cost-efficient. The ideal candidate combines strong technical depth with disciplined operational practices, excellent judgment during incidents, and a bias toward automation.
Essential Duties and Responsibilities
- Cloud Operations & Reliability
- Maintain and improve production, staging, and development environments deployed in Kubernetes on AWS.
- Implement and manage monitoring, alerting, logging, and observability frameworks.
- Lead incident response, postmortems, and continuous improvement initiatives.
- Own backup, restoration, disaster recovery, and business continuity practices.
- Perform capacity planning and performance tuning.
- Build and maintain Infrastructure-as-Code (Terraform, Pulumi, or equivalent).
- Automate provisioning, configuration, and environment lifecycle management.
- Standardize repetitive tasks and eliminate operational toil.
- Manage secrets, configuration, and versioning across environments.
- Security & Compliance (In Partnership with Security)
- Enforce least-privilege IAM and cloud guardrails.
- Support vulnerability management, patching workflows, and dependency hygiene.
- Contribute to compliance readiness (SOC 2, ISO 27001, HIPAA, etc.).
- Ensure logging, retention, and audit practices are consistently applied.
- Track usage and optimize cloud spend across services and environments.
- Implement tagging, budgets, alerts, and showback/visibility mechanisms.
- Recommend architecture or tooling changes to control cost without sacrificing performance.
- Collaboration & Leadership
- Partner with developers to improve reliability and deployability of services.
- Mentor engineers on cloud operations best practices.
- Contribute to runbooks, documentation, and onboarding materials.
- Advocate for operational excellence, change management, and risk reduction.
- Adaptability - Demonstrates persistence and overcomes obstacles. Measures self against a standard of excellence. Recognizes and acts on opportunities. Sets and achieves challenging goals. Takes calculated risks to accomplish goals.
- Communications - Exhibits good listening and comprehension. Expresses ideas and thoughts in written form. Expresses ideas and thoughts verbally. Keeps others adequately informed. Selects and uses appropriate communication methods.
- Continuous Learning - Assesses own strengths and weaknesses. Pursues training and development opportunities. Seeks feedback to improve performance. Shares expertise with others. Strives to continuously build knowledge and skills.
- Design - Applies design principles. Demonstrates attention to detail. Generates creative solutions. Translates concepts and information into images and diagrams. Uses feedback to modify designs.
- Initiative - Asks for help when needed. Looks for and takes advantage of opportunities. Seeks increased responsibilities. Takes independent actions and calculated risks. Undertakes self-development activities. Volunteers readily.
- Innovation - Develops innovative approaches and ideas. Displays original thinking and creativity. Generates suggestions for improving work. Meets challenges with resourcefulness.
- Problem Solving - Develops alternative solutions. Gathers and analyzes information skillfully. Identifies problems in a timely manner. Resolves problems in early stages. Works well in group problem-solving situations.
- Team Leadership - Acknowledges team accomplishments. Defines team roles and responsibilities. Ensures progress toward goals. Fosters team cooperation. Supports group problem solving.
To perform this job successfully, an individual must be able to perform each essential duty satisfactorily. The requirements listed below are representative of the knowledge, skill, and/or ability required. Reasonable…