Lead Software Engineer – Decision Integrity & Observability
Listed on 2026-01-30
IT/Tech
Cybersecurity
Overview
Become a part of our caring community and help us put health first
We are developing an enterprise-scale Next Best Action (NBA) platform that delivers real-time, compliant, and explainable decisions across both digital and assisted channels. As we expand, our focus extends beyond speed to encompass decision integrity: ensuring deterministic outcomes, verifiable explainability, and audit-ready replay.
The Lead Engineer — Decision Integrity & Observability is responsible for the core runtime infrastructure that guarantees every decision is traceable, explainable, and defensible. This role will embed integrity and observability throughout the decision kernel, rules arbitration, Action Library integration, and state/outbox execution boundaries, empowering teams to innovate with confidence. This is a hands-on individual contributor role with organization-wide impact.
Key Responsibilities
Decision Integrity (NBA Core)
- Define and enforce determinism guarantees for the decision kernel (Context → Rules → Scores → Reconciliation → Response).
- Govern the use of artifacts such as rule packs, reason-code taxonomies, model versions, and objective configurations to ensure decisions are fully explained (including plain-English reasons, rule/policy references, and model stamps).
- Enforce contact policy controls (consent, quiet hours, frequency caps) and event-driven suppressions, ensuring all surfaced channels are compliant with policy.
- Design end-to-end traceability, establishing lineage, causality, and cross-service correlation across the Decision Engine, Rules, Action Library, State Machine, Outbox, and channel executors.
- Develop append-only decision ledgers (input summaries, context hashes, rule/model versions, explanations) with hash-verified replay paths for auditability.
- Implement semantic guards to detect and alert on business logic errors (e.g., unauthorized channel offers), extending beyond traditional infrastructure monitoring.
- Integrate OpenTelemetry instrumentation; provide operational dashboards displaying key indicators (p50/p95 latencies, rule-denial distributions, model latency, cache hit rates, replay health).
- Develop determinism validation harnesses using golden datasets to ensure consistent outputs for identical inputs and versions (integrated into CI/CD pipelines).
- Ensure contract compliance for NBA interfaces via OpenAPI, schema validation, and consumer-driven contracts across all platform modules.
- Apply property-based validation to rigorously test rules, scoring, and reconciliation under various policy and event scenarios.
- Automate performance and resilience checks, including timeouts, circuit breakers, fallback logic, and chaos testing to maintain p95 SLOs.
- Maintain explainability regression controls, ensuring that changes to rules or models do not degrade the quality of explanations.
- Collaborate with rules and policy teams on reason-code coverage and pre-release simulation/backtesting.
- Work with machine learning teams on model versioning, feature parity (training vs. serving), latency objectives, and safe rollout strategies (shadow/canary deployments).
- Coordinate with state management and outbox teams to ensure exactly-once intent, idempotent consumers, legal state transitions, and invariant validation.
- Support orchestration and integration needs for Node.js/TypeScript surfaces as required.
Required Qualifications
- Minimum of 8 years in backend/platform service development; at least 3 years in a senior or lead role overseeing mission-critical systems.
- Proven experience delivering backend/platform systems with a focus on correctness, reliability, and traceability.
- Demonstrated ability to reason about stateful, distributed flows and identify failure modes across interconnected services.
- Hands-on implementation of advanced observability (structured logging, tracing).
- Core Platform & Runtime (Java): Advanced Java backend development, with a focus on deterministic execution and explicit invariants. Familiarity with rules/policy engines or DSLs; JVM-compiled DSL…