Site Reliability Engineer
Listed on 2026-01-12
IT/Tech
Systems Engineer, Cloud Computing, IT Support
If you are motivated and believe in the credit union philosophy of "People Helping People," join our team!
Position Overview
The Site Reliability Engineer role provides full-spectrum support for all assigned applications. The role ensures stability through continuous monitoring and optimization. The role develops and maintains delivery pipelines with efficiency and risk mitigation strategies. The role also contributes to continual process improvement. This role mentors more junior SREs.
Essential Responsibilities
- 40% - Support, monitor, troubleshoot, and collaborate on optimizing systems for reliability and efficient performance on a 24/7, 365-days-a-year model. Partner with development and other teams to improve services through rigorous testing and release procedures. Participate in system design and platform management reviews. Provide primary support of Dev, QA, UAT, Production, and DR for a subset of given applications. Monitor logs, triage alerts and issues, and escalate to appropriate teams. Follow ITSM processes for incident response, change management, and problem investigation.
- 20% - Look for ways to automate infrastructure and operations tasks, creating sustainable systems and services. Continuously review processes and procedures for improvement opportunities.
- 10% - Ensure systems adhere to relevant security protocols and regulations.
- 20% - Ensure certificates are renewed and maintained within expiration windows.
- 10% - Prepare disaster recovery plans.
- Knowledge of distributed storage technologies like NFS, HDFS, CephFS, and Amazon S3, as well as dynamic resource management frameworks like Apache Mesos, Kubernetes, or YARN.
- Knowledge of system administration, cloud services (e.g., AWS, GCP), and infrastructure automation tools (e.g., Terraform, Ansible).
- Knowledge of networking, security, and database management.
- Proactive approach to triaging and troubleshooting problems, performance bottlenecks, and areas for improvement in a distributed environment.
- Professional experience with support and troubleshooting of VPNs, firewalls, networking, cloud infrastructure concepts, SSL, and mTLS.
- Strong knowledge and experience in troubleshooting platforms, applications, or infrastructure with high availability, disaster recovery, load balancing, and clustering concepts.
- Professional experience working within ITSM framework processes (Change and Incident).
- Experience with maintaining support documentation for platforms and/or supported applications.
- Must be passionate about contributing to an organization focused on continuously improving member experiences.
- 2 years of support, delivery, and/or continuous improvement experience
- Experience with continuous integration and deployment (CI/CD) practices, monitoring, and incident response.
- Working knowledge of Windows and Linux distributed environments
- Previous success in technical engineering
- Coding experience beyond simple scripts
- Professional experience with Jira, Confluence, Lucidchart/Visio, and BMC Helix
- Bachelor’s degree, associate’s degree, or 2-5 years of support, delivery, and/or continuous improvement experience
Physical Requirements
- Sitting for prolonged periods
- Using a computer for prolonged periods
- Using a telephone for prolonged periods
SECU provides equal employment opportunity to all qualified persons regardless of race, color, religion, age, sex, sexual orientation, gender identity, national origin, genetic information, disability, veteran status, or other classification protected by law.
Disclaimer
State Employees' Credit Union reserves the right to fill this role at a higher/lower level based on business need.