DevOps Engineer - TS/SCI w/ Poly
Listed on 2026-01-07
IT/Tech
Systems Engineer, Cloud Computing, SRE/Site Reliability
Job Description
Key Responsibilities
- Define standards for monitoring the reliability, availability, maintainability and performance of sponsor-owned and operated systems.
- Design and architect operational solutions for managing applications and infrastructure.
- Drive service acceptance by integrating new processes into operations, developing monitoring that exposes risks, and automating repeatable actions.
- Partner with service and product owners to establish key performance indicators to identify trends and achieve better outcomes.
- Provide deep troubleshooting for production issues.
- Engage with service owners to maximize a team’s ability to identify and remediate the root causes of performance issues quickly, ensuring rapid recovery from service interruptions.
- Build and/or use tools to correlate disparate data sets efficiently and automatically, helping teams quickly identify the root cause of issues and understand how different problems relate to each other.
- Coordinate with the sponsor to support major incidents, large-scale deployments, and SecOps user support.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances.
If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy:
- Working knowledge of Kubernetes (K8s), Docker, Helm, and automated deployment via pipelines (e.g., Concourse or Jenkins)
- Familiarity with distributed version control systems such as Git
- Experience with AWS cloud services
- Experience setting up monitoring and observability solutions across sponsor-owned systems, tools, and data feeds
- Proficient in scripting with Python and Java
- Willingness to work onsite full time
- Ability and willingness to share on-call responsibilities
- Advanced knowledge of Unix/Linux systems, with high comfort level at the command line
- Experience with other cloud services providers beyond AWS
- Experience with CloudWatch or other monitoring tools inside of AWS
- Familiarity with Prometheus/Grafana or other monitoring tools for ETL feeds, APIs, servers, C2S services, networks, and AI/ML capabilities
- Good understanding of networking fundamentals