
IT Infrastructure Support Site Reliability Engineer II

Job in Atlanta, Fulton County, Georgia, 30383, USA
Listing for: Astreya Inc.
Full Time position
Listed on 2026-03-04
Job specializations:
  • IT/Tech
    Systems Engineer, IT Support
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD Yearly
Locations: Atlanta, GA
Time type: Full time
Posted on: Posted Today
Job requisition: R0014666
**About the Job**
We are seeking an experienced Site Reliability Engineer to join our IT Infrastructure Support team, responsible for ensuring the reliability, scalability, and performance of critical physical security infrastructure and supporting systems. In this role, you will combine software engineering expertise with operations knowledge to build and maintain automation tools, monitoring systems, and processes that support enterprise-grade server, network, and security device management.

You will work closely with cross-functional teams to define and enforce service level objectives, reduce operational toil through automation, and drive continuous improvement in system resilience. This position requires 24x5 availability with on-call rotation to ensure uninterrupted support for mission-critical infrastructure.
**Key Responsibilities**
  • Partner with leadership to establish, monitor, and enforce Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for infrastructure tooling, including configuration compliance rates, patch success rates, and deployment latency metrics.
  • Provide Level 3 expertise for tooling-specific incidents, focusing on automating incident remediation workflows and reducing Mean Time To Repair (MTTR) through intelligent automation and runbook development.
  • Identify and automate repetitive manual tasks across managed infrastructure, targeting measurable reductions in operational overhead (e.g., 50% reduction in manual server build time) through scripting and workflow automation.
  • Conduct thorough root cause analysis and lead blameless postmortems for all major service-impacting incidents, driving systemic improvements in tooling reliability and infrastructure resilience.
  • Engineer and maintain automated processes and scripts to populate, update, and synchronize asset management platforms (e.g., NetBox), configuration management databases, and monitoring systems for internal and external stakeholders.
  • Design, develop, and deploy full-stack applications, custom plugins, and automation scripts to extend functionality of management and monitoring systems, enabling direct device interaction for configuration management.
  • Develop and maintain fully automated Infrastructure-as-Code configurations for Windows and Linux server roles using tools such as Ansible, Terraform, or Puppet, including drift detection and auto-remediation capabilities.
  • Build end-to-end automation pipelines for vulnerability patching, security baseline enforcement (CIS benchmarks), and continuous compliance auditing against internal and regulatory standards for physical security devices.
  • Develop API-driven tools for network configuration management, automated firmware updates, pre/post-change validation, and real-time network health monitoring across the device fleet.
  • Deploy and standardize monitoring agents, centralized log collection systems, and custom dashboards with alerts based on critical SLIs (latency, error rate, saturation, traffic) for servers and edge devices.
  • Build automation scripts for intelligent ticket handling, problem validation, and escalation workflows within enterprise ticketing systems, ensuring 2-hour initial response SLAs are consistently met.
  • Participate in 24x5 on-call rotation to provide timely support for infrastructure systems, security devices, and related tooling, ensuring service continuity and rapid incident response.
**Required Skills**
  • 6+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering
  • Strong proficiency in Python, Bash, and PowerShell for automation scripting, with experience in Go for building high-performance backend services and APIs.
  • Hands-on experience with Infrastructure-as-Code tools (Terraform, Ansible, Chef, or Puppet) and configuration management practices, including drift detection, version control, and automated remediation.
  • Advanced knowledge of Linux and Windows…