Site Reliability Engineer (SRE) - Data Center & Infrastructure

Job in St. Louis, Missouri, 63105, USA
Listing for: Exegy
Full Time position
Listed on 2026-01-18
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Position: Site Reliability Engineer (SRE) - Data Center & Infrastructure
Location: St. Louis

About Exegy

Exegy is a global leader in intelligent market data, advanced trading systems, and future-proof technology. Exegy serves as a trusted partner to the complete ecosystem of the buy-side, sell-side, exchanges, and financial services technology firms around the globe. Headquartered in St. Louis with regional offices in North America, the UK/Europe and Asia Pacific, Exegy has the global footprint to deliver world-class support and managed services to its customer base of elite financial market participants.

Job Summary

Exegy is seeking a highly motivated and detail-oriented Site Reliability Engineer (SRE) to support and enhance the reliability, scalability, and performance of our global data center and hybrid infrastructure environments. This role blends software engineering, systems engineering, automation, and operational rigor to ensure high‑availability services powering Exegy's mission‑critical market data products and internal platforms.

As an SRE, you will own and improve operational processes, expand automation, strengthen observability, support capacity planning, and design systems that gracefully handle failure with minimal business impact. You will collaborate across Infrastructure, Network Engineering, Security, and DevOps teams to deliver resilient, secure, and scalable platforms.

Responsibilities

Infrastructure Reliability & Operations
  • Maintain and improve uptime across core systems including compute, storage, virtualization, load balancers, and data center network infrastructure
  • Support production services across on‑prem data centers, co‑locations, and hybrid cloud environments
  • Participate in 24×7 on‑call rotation, major incident response, and post‑mortems
  • Lead root cause analysis (RCA) and drive long‑term remediation plans
  • Identify system failure patterns and implement hardening strategies
Automation & Infrastructure‑as‑Code
  • Develop and maintain automation using Ansible, Terraform, PowerShell, Python, Puppet, or similar tools
  • Automate operational workflows, configuration management, deployments, and fail‑over testing
  • Implement and improve Infrastructure‑as‑Code (IaC) for consistency and reduced drift
Monitoring, Observability & Performance
  • Build and enhance monitoring across systems, networks, and applications (Prometheus, Grafana, Datadog, New Relic, SolarWinds, Splunk, etc.)
  • Improve alert fidelity, create health dashboards, and expand log aggregation
  • Conduct proactive performance tuning across hardware, virtualization, and OS layers (Windows/Linux)
Data Center & Systems Engineering
  • Support physical and virtual data center infrastructure including racking/stacking, cabling, hardware lifecycle, and capacity planning
  • Own patching, firmware upgrades, refresh cycles, and vendor coordination
  • Support DR/BCP testing, multi‑site fail‑over architecture, and replication strategies
  • Maintain secure baseline configurations aligned to CIS Benchmarks, NIST, and ISO standards
Collaboration & Architecture
  • Partner closely with Network, Security, DevOps, and Application Engineering teams to improve reliability end‑to‑end
  • Influence architecture decisions regarding capacity, resiliency, and scalability
  • Create and maintain runbooks, playbooks, standards, and operational documentation
Security & Compliance Integration
  • Implement and maintain security controls including MFA, encryption, logging, PAM, and patch compliance
  • Support audit requirements for SOC 2, ISO 27001, CIS Controls, and internal governance obligations
  • Participate in vulnerability remediation efforts and system hardening
Our Ideal Candidate Has
  • Bachelor’s degree in Computer Science, Engineering, or equivalent experience
  • 5+ years in Site Reliability Engineering, Systems Engineering, or Infrastructure Operations
  • Hands‑on experience with VMware, Hyper‑V, or similar virtualization technologies
  • Strong Linux and Windows server administration background
  • Experience with on‑prem data centers, hardware lifecycle, and networking
  • Proficiency in automation and scripting (PowerShell, Bash, Python, Ansible, Terraform)
  • Experience with monitoring, logging, and observability platforms
  • Familiarity with AWS, Azure, or GCP in hybrid environments
  • Ability to participate in on‑call rotation and support critical incidents