Site Reliability Engineer
Job in Washington, District of Columbia, 20022, USA
Listed on 2026-01-27
Listing for: JSM Consulting Inc.
Full Time position
Job specializations:
- IT/Tech
- Systems Engineer, Cloud Computing, IT Support, Cybersecurity
Job Description & How to Apply Below
Seeking a Site Reliability Engineer for a high-impact role with a premier client based in Washington, DC. In this position, you will bridge the gap between development and operations by applying a software engineering mindset to system administration and infrastructure. You will be responsible for ensuring the scalability, performance, and high availability of cloud-based services across AWS and Azure environments. By leveraging Infrastructure-as-Code, advanced observability with Dynatrace, and SRE principles like error budgets and SLOs, you will drive operational excellence and lead incident response efforts for mission-critical applications.
- Deployment & Automation: Architect and manage CI/CD pipelines (GitHub Actions, AWS CodePipeline) and automate global infrastructure using Terraform, CloudFormation, or CDK.
- Performance & Capacity: Drive cost-optimization initiatives, manage auto-scaling thresholds, and execute resiliency/performance testing to ensure system durability.
- Incident Management: Act as a primary on-call responder using ITIL frameworks and ServiceNow; develop Root Cause Analysis (RCA) documentation and maintain knowledge bases.
- Observability & Monitoring: Implement distributed tracing and optimize monitoring via Dynatrace and Kibana to create advanced dashboards and anomaly detection.
- Reliability Engineering: Define and monitor SLIs and SLOs while managing error budgets to balance feature velocity with system stability.
- Security & Compliance: Oversee service accounts, manage digital certificates, and execute rapid remediation for security incidents.
- Education: Bachelor's degree in Computer Science, Engineering, or a related technical field.
- Experience: 2 to 4 years of professional experience in SRE, DevOps, or Infrastructure roles.
- Cloud Proficiency: Practical, hands-on experience with both AWS and Azure platforms.
- Technical Skills: Mid-level proficiency in Python (or similar scripting languages) and configuration management tools like Ansible.
- Containerization: Solid understanding of Docker and orchestration via Kubernetes or ECS.
- Infrastructure Fundamentals: Strong knowledge of Linux systems, networking protocols, and both relational and NoSQL database architectures.
- Soft Skills: Excellent written and verbal communication skills with the ability to manage competing priorities independently.
- Flexibility: Ability to participate in a production on-call rotation, including work outside standard business hours.