Senior Reliability Engineer (Backend Focus) Job, New York, New York, USA, IT/Tech

Position: Senior Reliability Engineer (Backend Focus)
Location: New York


About our team

Accrue is a fintech company that powers modern customer loyalty at the payment layer. Centered around its branded stored-value wallet solution, Accrue’s platform helps brands take control of their payments and refunds, reward customer loyalty, and bypass traditional payment giants.

Build a deeper, more durable customer base while optimizing your bottom line – all with Accrue.

About the Role

We're looking for a senior backend engineer who builds reliability through elegant, production-ready code architecture. You'll have significant authority to rearchitect critical systems, replacing homegrown solutions with industry-standard tooling and patterns that handle 10k+ requests/second.

This is primarily a programming role focused on building robust, observable systems through code. You'll spend most of your time architecting and implementing reliability improvements, not managing infrastructure.

What makes this role unique:

Architectural Authority: Drive decisions on adopting technologies like Temporal.io for durable execution vs. maintaining custom retry logic
Production Scale: Design systems that handle high-throughput payment and loyalty processing with strict SLA requirements
Code-First Reliability: Improve system reliability by writing better application code, not just adding monitoring
Industry Standards Over NIH: Replace internal implementations with proven, production-ready solutions

What you'll do
System Architecture & Reliability Engineering

Rearchitect core reliability patterns: Replace custom retry mechanisms with durable execution engines like Temporal.io
Implement robust event processing: Migrate direct webhook handling to reliable delivery systems like Hookdeck with proper delivery semantics
Build behavioral monitoring: Integrate time-series databases to detect and alert on changes in system behavior
Eliminate technical debt: Systematically replace "not invented here" solutions with industry-standard, battle-tested alternatives
Design and implement systems that maintain performance and reliability at 10k+ requests/second
Write production-grade code for payment processing, wallet operations, and loyalty program mechanics
Build comprehensive error handling, circuit breakers, and graceful degradation patterns
Implement distributed system patterns for fault tolerance and observability
Instrument deep observability into application code using existing frameworks (Datadog)
Design monitoring that provides actionable insights into system behavior and business metrics
Build alerting that proactively identifies reliability issues before they impact users
Lead incident response with a focus on permanent architectural fixes rather than band-aid solutions

Technical Leadership

Evaluate and recommend new technologies and architectural patterns for production readiness
Collaborate with product engineering teams to embed reliability patterns into new feature development
Drive technical decisions around system architecture, scaling, and reliability patterns
Mentor engineers on production best practices and reliable system design

What you'll need
Required

5+ years of backend engineering experience building high-throughput production systems (10k+ req/sec)
Strong programming skills in modern languages; our stack uses TypeScript, but we value polyglot engineers
Production architecture experience with distributed systems, microservices, and reliability patterns
Systems thinking: Ability to identify when to build vs. buy vs. adopt existing solutions
Cloud-native development with AWS services (ECS, RDS, ELB) and modern deployment patterns
Technical leadership: Experience making architectural decisions and driving technical improvements independently

Highly Valued

Experience with durable execution systems (Temporal.io, Step Functions, etc.)
Background in fintech, payments, or high-reliability systems
Knowledge of event-driven architectures and reliable message processing
Experience with time-series databases and behavioral analytics
Track record replacing legacy systems with modern, scalable alternatives
Startup or high-growth experience where you've scaled systems through rapid growth

What We're NOT Looking For

Traditional "infrastructure-first" SRE background
Focus on Kubernetes administration or infrastructure provisioning
Scripting-heavy operational work
Basic monitoring setup (connecting Datadog to ECS is table stakes, not the role)

Tech Stack

Backend: TypeScript/Node.js, REST APIs, high-throughput transaction processing
Infrastructure: AWS (ECS, RDS, ELB), Cloudflare
Observability: Datadog (existing), custom instrumentation and analytics
Scale: 10k+ requests/second, real-time payment and loyalty processing
Architecture: Distributed microservices, event-driven systems

Types of Challenges You'd Tackle

Identifying and replacing fragile custom implementations with industry-standard solutions
Architecting reliable event processing where current approaches show brittleness
Building proactive monitoring for behavioral…