Senior SRE (Storage Platforms) Job | Addison area, Texas, USA | IT/Tech

Position: Senior SRE (Storage Platforms)

Location: 100% Remote

Position Type: 3-Month Contract-to-Hire (C2H)

Hourly / Salary: $90-$120+ W2 (based on experience)

Job Summary

Vaco is currently seeking a Senior SRE (Storage Platforms) for a 3-month contract-to-hire (C2H) opportunity that is 100% remote. The Senior SRE will design, implement, and support Software Defined Storage (SDS) and Kubernetes platforms in a private cloud environment, with a focus on scalability, resilience, automation, and performance using Infrastructure as Code (IaC) and GitOps practices. This is a deeply technical role requiring expert-level understanding of SDS and Kubernetes and extensive working knowledge of the Linux OS.

The Senior SRE will collaborate with platform and SRE teams to maintain secure, performant, multitenant-isolated services that support high-throughput, mission-critical applications.

Responsibilities

Storage Architecture - Design / Implement / Operate Large-Scale SDS Architectures Across Private / Public Cloud Regions Following the ITIL Framework
Enterprise Storage Platforms - Deploy / Support Enterprise Storage (Pure Storage / HPE / NetApp) / SDS Solutions (Ceph / Longhorn)
Self-Service Enablement - Build Self-Service Storage Workflows for Kubernetes CSI / OpenStack Consumers (VM / Bare Metal)
IaC Development - Ansible / Terraform / Helm / Git with Python / Bash Automation
CI/CD for Infrastructure - Implement CI/CD Pipelines Supporting Infrastructure Updates / Patching / Upgrades / Testing / Rollback
Observability / Auto-Remediation - Build Monitoring / Alerting / Auto-Remediation using GitOps / Tools (Prometheus / Loki / Grafana)
HA / DR - Architect / Maintain High Availability / Disaster Recovery / Scale-Out Storage Infrastructure
Design Documentation - Develop / Review High-/Low-Level Design Documents for Storage Infrastructure
Advanced Troubleshooting - Apply Deep Expertise to Troubleshooting Across Storage / Kubernetes / Hypervisors / Networking / Linux Systems
Operations / Incident Response - Participate in On-Call Rotations / Incident Response / Root Cause Analysis
Global Collaboration - Collaborate Globally on Change Management / Documentation / Operational Best Practices

Job Requirements

Senior SRE (Storage Platforms) (hands-on) - Senior / Lead Engineer Partnering with Architecture / Product Development Teams to Design / Build / Own Internal Developer Platforms / Reliability Frameworks / Shared Services / CI/CD Tooling | Drive Secure-by-Design / DevSecOps Principles Across the SDLC | Actively Contribute Production-Quality Code
Reliability Engineering Enablement - Embed Reliability / Performance / Security Guardrails into Developer Workflows by Creating Reusable Libraries / SDKs / Templates / Static Analysis Rules / GitHub Copilot Context Files | Shift-Left Testing / Observability / Security Controls into Pull Requests / CI Pipelines
SLI / SLO Implementation - Define / Instrument / Enforce SLIs / SLOs via Application-Level Telemetry (Metrics / Logs / Traces) | Build Instrumentation Code / Dashboards / Alerting Policies | Integrate with Observability Platforms to Enable Error Budget Tracking
Troubleshooting / Production Debugging - Perform Deep Code-Level Debugging in Distributed / Cloud-Native Environments | Analyze Logs / Stack Traces / Thread Dumps / Performance Profiles | Resolve Complex Reliability / Concurrency / Memory / Latency Issues
Incident Leadership / Root Cause Analysis - Lead Blameless Post-Mortems | Own Corrective / Preventive Actions at the Code / Architectural Levels | Implement Long-Term Fixes via Refactoring / Resilience Patterns / Automated Safeguards
Automation / Developer Productivity - Design / Develop Production-Grade Automation (Microservices / CLI Tools / CI/CD Pipelines / IaC Modules) | Eliminate Manual Toil / Improve Deployment Safety / Increase System Resilience With Minimal Human Intervention
On-Call Model / Rotation - Participate in a Lightweight / Developer-Centric On-Call Rotation Focused Exclusively on Internally Owned Platforms / Frameworks | Triage Tooling Issues / Reliability Gaps Without Traditional Customer-Facing Pager Responsibilities
Engineering Standards / Architecture Governance - Establish Coding Standards / Secure Coding Guidelines / Architectural Patterns (Resiliency / Scalability / Observability) | Define Best Practices for Testing / Versioning / Dependency Management
Mentorship / Technical Leadership - Mentor Junior / Mid-Level Engineers | Conduct Deep Technical Code Reviews (Performance / Security / Maintainability) | Drive Continuous Improvement in Engineering Quality / Reliability Practices

Preferred (not required)

Certifications - CNCF Certified Kubernetes Administrator (CKA) / Certified Kubernetes Security Specialist (CKS) / Red Hat Ceph Storage Administrator (EX125) / ITIL Foundation / Advanced Certifications Supporting ITSM Best Practices, etc.
OpenStack Storage Administration / OpenStack Cinder Multi-Backend Administration
Backup Platforms / Enterprise Backup…