Site Reliability Engineer (SRE)
Listed on 2026-01-27
IT/Tech
Cloud Computing, Systems Engineer, IT Support, Cybersecurity
Job Category: Engineer
Job Type: Onsite
Job Location: Washington, District of Columbia
Compensation: Depends on Experience
Contract Type: W2 contract only
Kindly note that applications on a C2C basis will not be considered for this role.
Job Description
Randstad is seeking a Site Reliability Engineer for a high-impact role with a premier client based in Washington, DC. In this position, you will bridge the gap between development and operations by applying a software engineering mindset to system administration and infrastructure. You will be responsible for ensuring the scalability, performance, and high availability of cloud-based services across AWS and Azure environments. By leveraging Infrastructure-as-Code, advanced observability with Dynatrace, and SRE principles like error budgets and SLOs, you will drive operational excellence and lead incident response efforts for mission-critical applications.
- Deployment & Automation: Architect and manage CI/CD pipelines (GitHub Actions, AWS CodePipeline) and automate global infrastructure using Terraform, CloudFormation, or CDK.
- Performance & Capacity: Drive cost-optimization initiatives, manage auto-scaling thresholds, and execute resiliency/performance testing to ensure system durability.
- Incident Management: Act as a primary on-call responder using ITIL frameworks and ServiceNow; develop Root Cause Analysis (RCA) documentation and maintain knowledge bases.
- Observability & Monitoring: Implement distributed tracing and optimize monitoring via Dynatrace and Kibana to create advanced dashboards and anomaly detection.
- Reliability Engineering: Define and monitor SLIs and SLOs while managing error budgets to balance feature velocity with system stability.
- Security & Compliance: Oversee service accounts, manage digital certificates, and execute rapid remediation for security incidents.
- Education: Bachelor’s degree in Computer Science, Engineering, or a related technical field.
- Experience: 2 to 4 years of professional experience in SRE, DevOps, or Infrastructure roles.
- Cloud Proficiency: Practical, hands-on experience with both AWS and Azure platforms.
- Technical Skills: Mid-level proficiency in Python (or similar scripting languages) and configuration management tools like Ansible.
- Containerization: Solid understanding of Docker and orchestration via Kubernetes or ECS.
- Infrastructure Fundamentals: Strong knowledge of Linux systems, networking protocols, and both relational and NoSQL database architectures.
- Soft Skills: Excellent written and verbal communication skills with the ability to manage competing priorities independently.
- Flexibility: Ability to participate in a production on-call rotation, including work outside standard business hours.