Site Reliability Engineer
Listed on 2026-01-12
IT/Tech
Systems Engineer, Cloud Computing
Join to apply for the Site Reliability Engineer role at Skyhigh Security.
Skyhigh Security is a dynamic, fast‑paced cloud company that is a leader in the security industry. Our mission is to protect the world’s data, and because of this, we live and breathe security. We value learning at our core, underpinned by openness and transparency.
Role Overview
The Site Reliability Engineer at Skyhigh Security will be responsible for monitoring, maintaining, and troubleshooting operational issues of a high‑availability production environment. The SRE will act as a bridge between Operations, Engineering, and Product Management teams, representing the customer point of view to continue driving enhancements to our products and uptime. SREs are responsible for managing and improving the operational aspects of systems, such as monitoring, alerting, incident response, and vendor interactions.
Responsibilities
- Perform Incident Management and Change Management to maintain continuous availability of all Cloud Infrastructure services.
- Ensure all SRE and operating procedures are maintained and executed.
- Maintain a 24x7 production environment with a high level of service availability, perform quality reviews, and manage operational issues.
- Perform root cause analysis for major incidents and drive the process by involving required stakeholders.
- Perform problem management by analyzing metrics, alarms, and dashboards to troubleshoot problem areas, report issues, and assist in performance tuning and fault finding.
- Implement proactive monitoring, alerting, trend analysis, and self‑healing solutions.
- Explore new technologies, features, and tools to improve the platform, and automate operational tasks using Bash, Python, or another programming language.
- Manage and maintain Runbooks and Standard Operating Procedures.
- Manage, coordinate, and document all types of maintenance activities and outages.
- Perform patching and upgrades for vulnerability management.
- Work closely with other teams to develop new ideas into internal tools.
- Understand the existing architecture and work with various Engineering teams to develop and execute strategies to provide a high‑quality production service.
- Able to work a flexible schedule in a 24x7 environment with rotational shifts.
- Bachelor’s degree in computer science, electrical engineering, or a related area, with 7+ years of SRE experience in a large enterprise organization.
- System admin experience on Linux environments.
- Experience with end‑to‑end monitoring setup for infra and applications.
- Experience with Prometheus, Grafana, ELK, OpenSearch, CloudWatch, PagerDuty, and other monitoring tools.
- Solid experience with Cloud Technologies such as AWS and OCI.
- Good experience with containerized workload tools like Kubernetes.
- Network knowledge (TCP/IP, UDP, DNS, load balancing) and prior network administration experience are required.
- Experience with BGP, NAT, TCP/IP, iBGP, proxies, cross‑connects.
- Experience with L2/L3 switching, knowledge of Juniper and Cisco routing devices.
- Experience understanding and managing web servers (Apache, Tomcat, Nginx).
- Ability to script/program with one or more high‑level languages, such as Python, Go, etc.
- Experience with configuration management tools like Salt, Puppet, Ansible, or similar.
- Experience with source control tools such as GitHub and SVN.
- Experience with deployment tools such as Jenkins and Harness.
- Experience with SQL and NoSQL databases like Redis, Crate, and Elasticsearch.
- Experience in performing and writing Root Cause Analysis documents.
- Strong communication and analytical/problem‑solving skills.
- Systematic approach to driving problems to resolution.
- Experience with or knowledge of GCP or Azure is good to have.
- Experience in the security domain will be an added advantage.
- Experience with open‑source technologies like Kafka, Hadoop, HBase, Zookeeper, Oozie will be an added advantage.
- Retirement Plans
- Medical, Dental and Vision Coverage
- Paid Time Off
- Paid Parental Leave
- Support for Community Involvement
We’re serious about our commitment to a workplace where everyone can thrive and contribute to our industry‑leading products and customer support. We prohibit discrimination and harassment based on race, color, religion, gender, national origin, age, disability, veteran status, marital status, pregnancy, gender expression or identity, sexual orientation, or any other legally protected status.
Seniority Level: Mid‑Senior level
Employment Type: Full‑time
Job Function: Engineering and Information Technology
Industries: Software Development