Storage Site Reliability Engineer
Listed on 2026-03-07
IT/Tech
Systems Engineer, Cloud Computing
Introduction
At IBM Infrastructure & Technology, we design and operate the systems that keep the world running. From high-resiliency mainframes and hybrid cloud platforms to networking, automation, and site reliability, our teams ensure the performance, security, and scalability that clients and industries depend on every day. Working in Infrastructure & Technology means tackling complex challenges with curiosity and collaboration. You'll work with diverse technologies and colleagues worldwide to deliver resilient, future‑ready solutions that power innovation.
With continuous learning, career growth, and a supportive culture, IBM provides the opportunities to build expertise and shape the infrastructure that drives progress.
As a Site Reliability Engineering Professional, you will specialize in reliability and resiliency with a mix of knowledge and skills in software and systems. You will be responsible for analyzing business needs, problem determination, advising, designing, building, testing, deploying, and maintaining well‑engineered information systems and ecosystems. Your primary responsibilities will include:
- Analyze Business Needs: Analyze business requirements to identify areas for improvement and provide recommendations for enhancing system reliability and resiliency.
- Problem Determination: Identify and troubleshoot issues affecting system performance and reliability, applying technical expertise to resolve complex problems.
- Design and Build: Design, build, and test system changes and updates to ensure seamless deployment and minimal disruption to services.
- Deploy and Maintain: Deploy and maintain well‑engineered information systems and ecosystems, ensuring high levels of reliability, scalability, and performance.
- Advise and Test: Provide expert advice on system design and architecture, and test systems to ensure they meet business requirements and reliability standards.
- Software and Systems Knowledge: Exposure to software and systems engineering principles, with an understanding of how to design, build, and maintain reliable and resilient systems.
- Problem Analysis and Resolution: Experience working with problem determination methodologies, analyzing complex issues, and applying technical expertise to resolve problems affecting system performance and reliability.
- System Design and Architecture: Exposure to system design and architecture principles, with an understanding of how to design and build well‑engineered information systems and ecosystems.
- Testing and Deployment: Experience working with testing and deployment methodologies, ensuring seamless deployment and minimal disruption to services.
- Reliability and Resiliency: Exposure to reliability and resiliency principles, with an understanding of how to analyze business needs and provide recommendations for enhancing system reliability and resiliency.
- Fundamental understanding of Linux/Unix systems is a plus.
- Fundamental knowledge of Red Hat OpenShift and Kubernetes is a plus.
- Automation/Scripting: In-depth experience with Ansible, Python, Terraform, and CI/CD tools is a plus, but a fundamental understanding is a must.
- Hands‑on experience crafting alerts and dashboards using Python or any other language.
- Experience with AWS, Azure, and GCP.
IBM is committed to creating a diverse environment and is proud to be an equal‑opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.