Senior Reliability Engineer (Backend Focus)
Listed on 2026-01-12
IT/Tech
Systems Engineer, IT Support
Location: New York
About our team
Accrue is a fintech company that powers modern customer loyalty at the payment layer. Centered around its branded stored value wallet solution, Accrue’s platform helps brands take control of their payments and refunds, reward customer loyalty, and bypass traditional payment giants.
Build a deeper, more durable customer base while optimizing your bottom line – all with Accrue.
About the Role
We're looking for a senior backend engineer who builds reliability through elegant, production-ready code architecture. You'll have significant authority to rearchitect critical systems, replacing homegrown solutions with industry-standard tooling and patterns that handle 10k+ req/sec at scale.
This is primarily a programming role focused on building robust, observable systems through code. You'll spend most of your time architecting and implementing reliability improvements, not managing infrastructure.
What makes this role unique:
- Architectural Authority: Drive decisions on adopting technologies like Temporal.io for durable execution vs. maintaining custom retry logic
- Production Scale: Design systems that handle high-throughput payment and loyalty processing with strict SLA requirements
- Code-First Reliability: Improve system reliability by writing better application code, not just adding monitoring
- Industry Standards Over NIH ("not invented here"): Replace internal implementations with proven, production-ready solutions
System Architecture & Reliability Engineering
- Rearchitect core reliability patterns: Replace custom retry mechanisms with durable execution engines like Temporal.io
- Implement robust event processing: Migrate direct webhook handling to reliable delivery systems like Hookdeck with proper delivery semantics
- Build behavioral monitoring: Integrate time-series databases to detect and alert on changes in system behavior
- Eliminate technical debt: Systematically replace "not invented here" solutions with industry-standard, battle-tested alternatives
- Design and implement systems that maintain performance and reliability at 10k+ requests/second
- Write production-grade code for payment processing, wallet operations, and loyalty program mechanics
- Build comprehensive error handling, circuit breakers, and graceful degradation patterns
- Implement distributed system patterns for fault tolerance and observability
- Instrument deep observability into application code using existing frameworks (Datadog)
- Design monitoring that provides actionable insights into system behavior and business metrics
- Build alerting that proactively identifies reliability issues before they impact users
- Lead incident response with focus on permanent architectural fixes rather than band-aid solutions
- Evaluate and recommend new technologies and architectural patterns for production readiness
- Collaborate with product engineering teams to embed reliability patterns into new feature development
- Drive technical decisions around system architecture, scaling, and reliability patterns
- Mentor engineers on production best practices and reliable system design
Required
- 5+ years of backend engineering experience building high-throughput production systems (10k+ req/sec)
- Strong programming skills in modern languages; our stack uses TypeScript, but we value polyglot engineers
- Production architecture experience with distributed systems, microservices, and reliability patterns
- Systems thinking: Ability to identify when to build vs. buy vs. adopt existing solutions
- Cloud-native development with AWS services (ECS, RDS, ELB) and modern deployment patterns
- Technical leadership: Experience making architectural decisions and driving technical improvements independently
- Experience with durable execution systems (Temporal.io, Step Functions, etc.)
- Background in fintech, payments, or high-reliability systems
- Knowledge of event-driven architectures and reliable message processing
- Experience with time-series databases and behavioral analytics
- Track record replacing legacy systems with modern, scalable alternatives
- Startup or high-growth experience where you've scaled systems through rapid growth
Not the focus of this role
- Traditional "infrastructure-first" SRE background
- Focus on Kubernetes administration or infrastructure provisioning
- Scripting-heavy operational work
- Basic monitoring setup (connecting Datadog to ECS is table stakes, not the role)
Tech Stack
- Backend: TypeScript/Node.js, REST APIs, high-throughput transaction processing
- Infrastructure: AWS (ECS, RDS, ELB), Cloudflare
- Observability: Datadog (existing), custom instrumentation and analytics
- Scale: 10k+ requests/second, real-time payment and loyalty processing
- Architecture: Distributed microservices, event-driven systems
Current challenges
- Identifying and replacing fragile custom implementations with industry-standard solutions
- Architecting reliable event processing where current approaches show brittleness
- Building proactive monitoring for behavioral…