Principal DevOps Engineer Job, Redwood City area, California, USA, IT/Tech

Principal DevOps Engineer (Multi-Cloud)

We are looking for a Principal DevOps Engineer to be a technical leader and the main driver of our multi-cloud operations strategy. This isn't just about maintaining systems; it's about shaping the future of our global infrastructure. You will architect the scalable, secure, and highly available foundation that powers our platform across AWS, Azure, GCP, and private clouds. You will have the autonomy to innovate, a mandate for continuous improvement, and the opportunity to mentor a team of engineers.

If you are passionate about owning the uptime of complex distributed systems, we want to talk to you.

You will report to the Sr. Staff Product Operations Engineer, Cloud Operations.

Technology You'll Use

AWS, Azure, GCP

Your Role and Responsibilities: Here's What You'll Do

Architect & Strategize:
Lead the design of our next-generation deployment architecture for a microservices-based platform. Drive technological choices for team tooling and infrastructure, ensuring long-term scalability and reliability.
AIOps:
Implement AIOps frameworks to improve the efficiency of operational tasks and enhance system self-healing capabilities.
Develop CI/CD Pipelines:
Design, manage, and improve our CI/CD pipelines using tools like Jenkins, Git, and GitHub to enable rapid, reliable, and automated software delivery.
Ensure Uptime:
Take ultimate ownership of our production environment's stability. Lead end-to-end incident management, from escalation to Root Cause Analysis (RCA). Manage patching, upgrades, and disaster recovery processes. This role includes participation in a 24x7 on-call rotation to support critical uptime.
Automate & Operate:
Engineer and own a world-class observability stack (e.g., Prometheus, Grafana, CloudWatch, ELK). Develop automation scripts and frameworks to streamline operational tasks and enhance system self-healing capabilities.
Mentor & Lead:
Act as a technical leader and mentor for the team. Share your expertise, establish best practices, and improve the technical capabilities of the entire team.

About You

You have a deep passion for solving complex infrastructure and scalability challenges in a distributed systems environment.
You have a demonstrated track record of owning the uptime and reliability of critical production systems.
You are an adept cross-cultural collaborator, comfortable working in a distributed, multicultural team environment (France/India).
You are disciplined and accountable, and you work effectively in a remote or hybrid work model.

What We'd Like to See

AIOps & Automation:
Experience using observability data to drive AIOps programs. Familiarity with applying statistical analysis or machine learning models for predictive monitoring, anomaly detection, and automated root cause analysis.
Infrastructure as Code (IaC):
Mastery of tools like Terraform or CloudFormation. Experience with configuration management tools like Ansible, Chef, or Puppet.
Scripting & Automation:
Expert-level proficiency in at least one scripting language (Python, Bash, MongoDB queries) with a portfolio of successful automation projects.
CI/CD:
Deep experience building CI/CD pipelines and deployment tools (Jenkins, Git, GitHub).
Observability:
Hands-on experience building monitoring and logging for distributed systems (Prometheus, Grafana, CloudWatch).
Containerization:
Understanding of, and practical experience with, Docker and Kubernetes (or other orchestrators).
Networking & OS:
Understanding of Unix/Linux fundamentals and advanced TCP/IP networking concepts (DNS, load balancers, firewalls, VPC/VNet).

Role Essentials

Bachelor of Science (BSc) degree in Engineering, Computer Science, or a related technical field.
8+ years of progressive experience in DevOps, SRE, or Cloud Platform Engineering, with at least 3 years in a senior role managing large-scale production environments.
Deep, hands-on expertise in at least one major public cloud (AWS, Azure, or GCP) and production experience with at least one other. Experience with Oracle Cloud Infrastructure (OCI).
Experience supporting microservices-based architectures in a production environment.
