Senior SRE; Storage Platforms
Listed on 2026-03-12
IT/Tech
IT Support, Systems Engineer
Senior SRE (Storage Platforms)
Location: 100% Remote
Position Type: 3-Month Contract-to-Hire (C2H)
Hourly / Salary: $90-$120+ W2 (based on experience)
Job Summary
Vaco is currently seeking a Senior SRE (Storage Platforms) for a 3-month contract-to-hire (C2H) opportunity that is 100% remote. The Senior SRE will design, implement, and support Software-Defined Storage (SDS) and Kubernetes platforms in a private cloud environment, with a focus on scalability, resilience, automation, and performance using IaC and GitOps practices. This is a deeply technical role requiring expert-level understanding of SDS and Kubernetes and extensive working knowledge of Linux.
The Senior SRE will also collaborate with platform and SRE teams to maintain secure, performant, multitenant-isolated services that serve high-throughput, mission-critical applications.
- Storage Architecture - Design / Implement / Operate Large-Scale SDS Architectures Across Private / Public Cloud Regions within ITIL Methodology
- Enterprise Storage Platforms - Deploy / Support Enterprise Storage (Pure Storage / HPE / NetApp) / SDS Solutions (Ceph / Longhorn)
- Self-Service Enablement - Build Self-Service Storage Workflows for Kubernetes CSI / OpenStack Consumers (VM / Bare Metal)
- IaC Development - Ansible / Terraform / Helm / Git with Python / Bash Automation
- CI/CD for Infrastructure - Implement CI/CD Pipelines Supporting Infrastructure Updates / Patching / Upgrades / Testing / Rollback
- Observability / Auto-Remediation - Build Monitoring / Alerting / Auto-Remediation using GitOps / Tools (Prometheus / Loki / Grafana)
- HA / DR - Architect / Maintain High Availability / Disaster Recovery / Scale-Out Storage Infrastructure
- Design Documentation - Develop / Review High-/Low-Level Design Documents for Storage Infrastructure
- Advanced Troubleshooting (deep expertise) - Troubleshooting across Storage / Kubernetes / Hypervisors / Networking / Linux Systems
- Operations / Incident Response - Participate in On-Call Rotations / Incident Response / Root Cause Analysis
- Global Collaboration - Collaborate Globally on Change Management / Documentation / Operational Best Practices
- Senior SRE (Storage Platforms) (hands-on) - Senior / Lead Engineer Partnering with Architecture / Product Development Teams to Design / Build / Own Internal Developer Platforms / Reliability Frameworks / Shared Services / CI/CD Tooling | Drive Secure-by-Design / DevSecOps Principles Across SDLC | Actively Contribute to Production-Quality Coding
- Reliability Engineering Enablement - Embed Reliability / Performance / Security Guardrails into Developer Workflow by Creating Reusable Libraries / SDKs / Templates / Static Analysis Rules / GitHub Copilot Context Files | Shift-Left Testing / Observability / Security Controls into Pull Requests / CI Pipelines
- SLI / SLO Implementation - Define / Instrument / Enforce SLIs / SLOs via Application-Level Telemetry (Metrics / Logs / Traces) | Build Instrumentation Code / Dashboards / Alerting Policies | Integration with Observability Platforms to Enable Error Budget Tracking
- Troubleshooting / Production Debugging - Perform Deep Code-Level Debugging in Distributed / Cloud-Native Environments | Analyze Logs / Stack Traces / Thread Dumps / Performance Profiles | Resolve Complex Reliability / Concurrency / Memory / Latency Issues
- Incident Leadership / Root Cause Analysis - Lead Blameless Post-Mortems | Own Corrective / Preventive Actions at the Code / Architectural Levels | Implement Long-Term Fixes via Refactoring / Resilience Patterns / Automated Safeguards
- Automation / Developer Productivity - Design / Develop Production-Grade Automation (Microservices / CLI Tools / CI/CD Pipelines / IaC Modules) | Eliminate Manual Toil / Improve Deployment Safety / Increase System Resilience With Minimal Human Intervention
- On-Call Model / Rotation - Participate in a Lightweight / Developer-Centric On-Call Rotation focused Exclusively on Internally Owned Platforms / Frameworks | Triage Tooling Issues / Reliability Gaps without Traditional Customer-Facing Pager Responsibilities
- Engineering Standards / Architecture Governance - Establish Coding Standards / Secure Coding Guidelines / Architectural Patterns (Resiliency / Scalability / Observability) | Define Best Practices for Testing / Versioning / Dependency Management
- Mentorship / Technical Leadership - Mentor Junior / Mid-Level Engineers | Conduct Deep Technical Code Reviews (Performance / Security / Maintainability) | Drive Continuous Improvement in Engineering Quality / Reliability Practices
- Certifications - CNCF Certified Kubernetes Administrator (CKA) / Certified Kubernetes Security Specialist (CKS) / Red Hat Ceph Storage Administrator (EX125) / ITIL Foundation / Advanced Certifications Supporting ITSM Best Practices, etc.
- OpenStack Storage Administration / OpenStack Cinder Multi-Backend Administration
- Backup Platforms / Enterprise Backup…