Information Technology - Site Reliability Engineer
Listed on 2026-01-11
IT/Tech
Cloud Computing, Systems Engineer
Function: Technology Engineering and Service Operations
Responsible for provisioning technology infrastructure and enabling services for the enterprise. Ensures that the design, build, and operation of our technology platforms serve both external and internal customers efficiently while appropriately managing the associated risks.
Configures software to automate consumable services for infrastructure and applications using state‑of‑the‑art DevOps principles. Empowers development teams by introducing, developing, and/or maintaining efficient tools and processes. Ensures continuous, high‑velocity delivery and automated deployment through software provisioning, configuration management, source‑code management, and team collaboration applications.
Key Responsibilities
- Work in a DevOps environment, building and running large‑scale, massively distributed, fault‑tolerant systems.
- Collaborate closely with development and operations teams to build highly available, cost‑effective systems with extremely high uptime metrics.
- Work with the cloud operations team to resolve trouble tickets, develop and run scripts, and troubleshoot incidents.
- Create new tools and scripts to auto‑remediate incidents and to establish end‑to‑end monitoring and alerting across all critical systems.
- Build infrastructure‑as‑code (IaC) patterns that meet security and engineering standards using technologies such as Terraform, cloud CLI scripting, and cloud SDK programming.
- Participate in a 24/7 incident response team, following a follow‑the‑sun operating model for incident and problem management.
- DevOps: Apply engineering skills to improve the resilience of products and services; design, code, verify, test, document, and modify programs and scripts.
- Systems Thinking: Leverage best practices and an understanding of technology trends to integrate and maintain highly available systems.
- Operational Excellence: Prioritize tasks, monitor key metrics, and identify process improvements for smoother, faster operations.
- Troubleshooting: Use methodical approaches to define, investigate, and resolve system and process issues.
- Technical Communication: Explain technical information and its impact to stakeholders, with strong written and verbal communication skills.
• BS degree in Computer Science or related technical field (or equivalent experience).
• 4–7 years of experience in software engineering, systems administration, database administration, and networking.
• 2+ years of experience developing and/or administering software in public cloud.
• Experience monitoring infrastructure and application uptime/availability to meet functional and performance objectives.
• Proficiency in languages such as Python, Bash, Java, Go, and/or JavaScript (including Node.js).
• Demonstrable cross‑functional knowledge of systems, storage, networking, security, and databases.
• System administration skills, including automation and orchestration of Linux/Windows using Terraform and containers (Docker, Kubernetes, etc.).
• Proficiency with continuous integration and continuous delivery tooling and practices.
• Cloud certification strongly preferred.
EEO Employer Statement
Apex Systems is an equal opportunity employer. We do not discriminate or allow discrimination on the basis of race, color, religion, creed, sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), age, sexual orientation, gender identity, national origin, ancestry, citizenship, genetic information, registered domestic partner status, marital status, disability, status as a crime victim, protected veteran status, political affiliation, union membership, or any other characteristic protected by law.
Apex will consider qualified applicants with criminal histories in a manner consistent with the requirements of applicable law.