Senior Site Reliability Engineer Job, Dallas area, Texas, USA, IT/Tech

* Own SLOs/SLIs for availability (99.9%), latency, error rate, and quality of service across microservices.
* Design/operate end‑to‑end observability: metrics, logs, traces, synthetic checks, real‑user monitoring (RUM).
* Instrument services (Windows services, APIs, background jobs) with structured logs and trace context.
* Build health probes and SLA monitors for critical transactions and cross-service dependencies.
* Monitor system issues using metrics such as uptime, latency, error rate, throughput, and availability.
* Deploy and maintain monitoring and on-call tools, e.g., Splunk On-Call, Prometheus, Datadog.
* Lead incident response (triage, comms, coordination, real-time mitigation) and conduct blameless postmortems with actionable follow-ups.
* Maintain and continuously improve runbooks, escalation paths, on-call rotations, and paging policies.
* Implement MTTA/MTTR reduction programs.
* Stand up war room protocols and ensure stakeholder updates during incidents.
* Forecast compute, storage, and network needs; track headroom against growth and peak patterns.
* Conduct performance profiling and bottleneck analyses (CPU, memory, I/O, thread pools, connection pools).
* Optimize resource allocation on VMware (DRS, affinity rules, reservations) and Windows VM tuning (kernel, TCP stack, NICs).
* Validate scaling strategies (horizontal vs. vertical) and implement auto-scaling where supported.
* Standardize gold images, configuration baselines, and desired state for Windows Server (PowerShell DSC or equivalent).
* Manage patching (OS, middleware, runtime) with maintenance windows aligned to error budgets.
* Ensure backup, snapshot, and restore strategies meet RPO/RTO; regularly test restores.
* Maintain secure baselines (CIS benchmarks for Windows/VMware), vulnerability management, and patch cadence.
* Support compliance audits (PCI-CP, PCI-DSS, SOC 2/ISO 27001), produce evidence (configs, logs, access reviews), and remediate gaps.
* Automate provisioning (VM templates, DSC/Ansible for Windows, Terraform for VMware) and configuration drift detection/correction.
* Build runbooks to reduce toil (deploy, scale, rollback, etc.).
* Create reliability guardrails (pre‑flight checks, change freeze rules, policy controls) as code.
* Continuously refactor scripts/runbooks into idempotent automation.
* Collaborate with development teams and other stakeholders to identify potential risks, such as security vulnerabilities, performance bottlenecks, deployment issues, or configuration errors.
* Implement risk mitigation strategies, such as patching, backup, redundancy, encryption, or testing.
* Collaborate with product teams and other teams to understand user needs, expectations, and satisfaction.
* Coach engineers on SRE principles, incident handling, and reliability-centric design.
* Lead knowledge sharing, runbook quality, and postmortem culture (blameless, action-oriented).
* Provide after-hours support for production issues on a rotational basis with other team members to ensure system availability 24/7/365.
* Bachelor’s degree in Computer Science, Software Engineering, or an equivalent combination of education and experience.
* 5+ years of related experience as a Software Engineer, DevOps Engineer, Site Reliability Engineer, or in a similar role.
* Extensive experience working with enterprise-level microservices applications, including deployment and maintenance of the applications in distributed environments.
* Demonstrated hands-on experience and expertise with DevOps tooling (Ansible, Terraform, Jenkins, Octopus Deploy, etc.), networks, and network security, plus high-level managerial skills.
* In-depth hands-on experience with on-prem and cloud compute, storage, and networking solutions (VMware, NetApp, Azure, AWS, etc.).
Where You Will Be:

This role is fully in-office, requiring five days a week onsite at one of Entrust’s offices in Minneapolis, Colorado, or Dallas, as specified in the job description. Entrust operates with a distributed workforce, and this position is aligned with our in-office product development teams.

At Entrust, we don’t just offer jobs – we offer career journeys. Here is what you can expect when you join our team:

Flexibility:
Life is all about balance. Whether you’re remote, hybrid, or on-site, we offer flexible options that fit your lifestyle.