ISG Plant Management RPE DevOps
Listed on 2026-02-27
IT/Tech
Systems Administrator, Cloud Computing, Database Administrator, SRE/Site Reliability
Overview of the Role:
• This role combines support and DevOps for a range of applications within the firm’s Plant Management (PLM) Department, which is part of the Reliability Production and Engineering (RPE) Organization.
• The candidate will work with the rest of the team to support and engineer the applications the department oversees.
• The team is spread across several regions around the globe to provide twenty-four-hour support for all the applications.
• These applications include the firm’s system for running ready-for-business checks, which is used by most departments within technology; a high-volume alert processing platform; a data science platform that facilitates analytical research; a multi-tenant Kubernetes platform; and a suite of batch management applications.
The goal of the department’s work is to:
• maintain the stability of applications
• allow applications to scale
• help resolve issues encountered by the userbase
• make improvements to the applications
• resolve hygiene items and add new features to the systems
• reduce toil and eliminate manual processes
• manage the release process
Required Skill Set:
• This role requires 2 to 5 years of experience with a DevOps skill set, along with good Linux and Python knowledge and problem-solving skills.
• Other technical skills that we look for include knowledge of relational databases, SQL, Kubernetes, Docker/container technologies, Jenkins, Git, shell scripting, Kafka, ZooKeeper, data science skills (such as pandas and NumPy) and observability tools such as Prometheus and Grafana.
Daily Tasks and Project Work:
• Provide front-line support for the core applications
• Answer user enquiries and suggest solutions to reported problems
• Troubleshoot incidents and provide problem management
• Investigate log files and analyse hosts to identify performance and instability issues
• Identify toil items
• Create automated solutions
• Write scripts and programs to add new features, improve systems and resolve bugs
• Build network shares, load balancers, machines and other plant components to facilitate new features and applications
• Resolve hygiene items and keep the systems up to date
• Manage releases