Sr. Site Reliability Engineer Job, Dallas area, Texas, USA, IT/Tech

Analytic Partners is a global leader in commercial measurement and optimization, turning data into expertise for the world’s largest brands for almost 25 years.

Our holistic approach to decisioning is powered by our industry-leading platform and team of experts, who help leaders make better decisions, faster – unlocking business growth and creating powerful customer connections.

With clients in 50+ countries and global offices across New York City, Miami, Dallas, Dublin, London, Paris, Singapore, Shanghai, Munich, Poznan, Sydney, Melbourne, Charlottesville and Denver, we’re growing fast. And we’re looking for top talent to join us in shaping the future of analytics.

To learn more about what we do, visit – and see why we’re recognized as a Leader in the industry by independent research firms Forrester and Gartner.

What You’ll Be Doing

Own the Internal Developer Platform (IDP) as a product, treating engineering teams as customers and optimizing for reliability, usability, and delivery velocity.
Define and execute a platform roadmap aligned with business priorities, developer needs, and long-term scalability.
Design, build, and evolve paved roads for application delivery, including CI/CD pipelines, infrastructure templates, service scaffolding, and standardized deployment patterns.
Build self-service capabilities that enable teams to provision, deploy, observe, and operate services with minimal friction.
Create and maintain reusable platform abstractions across AWS and Azure that standardize security, reliability, networking, and observability.
Reduce developer cognitive load by abstracting unnecessary complexity while enforcing clear guardrails for security, cost, and compliance.
Partner closely with application, product, and security teams to embed reliability, scalability, and security by design.
Establish and evolve platform standards for logging, monitoring, alerting, tracing, and incident response workloads.
Define, measure, and manage SLIs, SLOs, and error budgets for shared platform services.
Drive the reduction of operational toil through automation, standardization, and platform-first solutions.
Ensure shared platform services meet high standards for availability, performance, resilience, and scalability.
Own system-to-system integration and messaging patterns used across the platform.
Lead capacity planning, demand forecasting, and performance tuning for platform services.
Plan and execute zero-downtime upgrades, migrations, and releases of platform components.
Lead platform-level incident response workflows, post-incident reviews, and drive systemic improvements rather than one-off fixes.
Evaluate incoming platform requests and translate them into scalable, productized capabilities.
Mentor engineers and drive platform adoption through documentation, enablement, and technical evangelism.
Participate in a 24x7 on-call rotation as an escalation point for platform reliability and availability issues.
Operate effectively in ambiguous problem spaces, making sound architectural and product decisions with limited guidance.

What We Look For In You:

Bachelor’s degree in Computer Science or equivalent practical experience.
6+ years of experience in Platform Engineering, Site Reliability Engineering, DevOps, or Systems Engineering roles.
Strong expertise in Linux and Windows operating systems.
Advanced automation and scripting skills using Python, Bash, and/or PowerShell.
Deep, hands-on experience designing and operating AWS and Azure platforms at scale.
Strong experience building and operating CI/CD platforms (Jenkins, GitHub Actions, or equivalent).
Strong experience with Infrastructure as Code and configuration management (Terraform, CloudFormation, ARM, or similar).
Production experience with containerization and orchestration platforms such as Docker and Kubernetes.
In-depth experience with the HashiCorp ecosystem (Nomad, Consul, Vault).
Strong understanding of distributed systems, cloud-native architectures, and reliability patterns.
Experience designing and operating observability platforms (e.g., Splunk, Sumo Logic, or similar).
Familiarity with security and compliance practices, including vulnerability scanning and…

