Principal DevOps Engineer
Listed on 2026-02-28
IT/Tech
AI Engineer, Systems Engineer, Cloud Computing
Location: Plano, TX (Onsite with 4 days a week in office)
Employment Type: Full-time
Total Compensation Range: $150,000 to $180,000
Applicants must be currently authorized to work in the United States on a full‑time basis and must not require sponsorship now or in the future.
About ServiceLink
ServiceLink is modernizing the mortgage services industry through AI‑accelerated engineering, intelligent automation, and next‑generation software delivery practices. We empower the nation’s top lenders and financial institutions with advanced technology, data‑driven insights, and high‑velocity product development models.
We’re not just evolving legacy workflows - we’re redefining how software is designed, built, tested, and deployed. Generative AI, autonomous systems, and continuous delivery are core to how we operate. Innovation isn’t optional here - it’s the expectation.
If you’re passionate about transforming engineering organizations and operationalizing AI‑driven development models at enterprise scale, you’ll thrive at ServiceLink.
About the Role
We’re hiring a Principal DevOps Engineer to lead our cloud platform, CI/CD, security, and reliability for both traditional services and agentic AI platforms. You’ll own tooling and automation that help product teams ship faster and safer, standardizing GitHub/Azure DevOps, Azure Pipelines, infrastructure as code, runtime security, LLMOps, and agentic AI deployment/observability.
- Operationalize agentic AI using industry‑leading agentic AI frameworks to deploy and operate single- and multi-agent systems with guardrails, state, memory, tool use (MCP), and workflow orchestration, including long‑running agents hosted on Azure (e.g., App Service/Functions/Container Apps/AKS) following async patterns for durability and scale.
- Partner with AI engineers to select and integrate the right orchestration SDKs: Agent Framework, Semantic Kernel (agent orchestration patterns), and AutoGen (asynchronous, event‑driven multi‑agent), and guide teams on when to use which for production vs. experimentation.
- Integrate regression tests (groundedness, relevance, safety) and wire evaluations to CI (GitHub Actions) so model/prompt changes must pass quality gates before release.
- Observability for agents: implement tracing/logging, metrics, and incident automation for multi‑agent workflows.
- Operational Excellence & Reliability - Define and govern KPIs for system reliability, deployment performance, cost efficiency, and platform stability. Drive continuous improvement using DORA‑aligned metrics and AI‑specific indicators (e.g., evaluation quality, agent reliability).
- AI & Agentic Workload Quality - Introduce measurable standards for AI readiness and safety, ensuring prompt flows, agents, and retrieval systems meet defined quality gates before release.
- Governance & Security Metrics - Implement executive dashboards for policy compliance, environment governance, and security posture across pipelines, IaC, and AI systems.
- Developer Experience & Platform Efficiency - Track and improve developer onboarding, reuse of platform components, and reduction in operational toil via automation and self‑service.
- Design, build, and govern reusable CI/CD via GitHub Actions (and/or Azure DevOps as needed): multi‑stage builds, environment promotions, approvals, matrix testing, dependency caching, and rollout strategies.
- Artifact security: integrate image/artifact signing and Azure Container Registry.
- Embed DevSecOps controls: Cred scans, Fortify, dependency scanning and secret scanning, policy checks in PRs, and break‑glass governance for production environments.
- Enforce responsible AI safeguards for generative/agentic workloads by configuring Azure OpenAI content filters and safety system messages; version these policies and test them in CI using evaluation datasets.
- Recommend new DevSecOps tools to the platform engineering team.
- 7+ years in enterprise DevOps and cloud engineering.
- Proven ownership of cloud platform engineering on Azure at scale.
- Deep hands‑on experience with CI/CD pipelines.
- Knowledge of IaC with Terraform or Bicep;…