Senior Site Reliability Engineer; SRE
New York City, Richmond County, New York, 10261, USA
Listed on 2025-11-20
IT/Tech
Cloud Computing, Systems Engineer, SRE/Site Reliability
New York, United States | Posted on 11/13/2025
Title: Senior Site Reliability Engineer (SRE)
Location: Remote
At January, we’re transforming the lives of borrowers by bringing humanity to consumer finance. Our data-driven products empower financial institutions to streamline collections and help borrowers regain financial stability and control over their lives. We’re not just expanding access to credit; we’re restoring dignity and paving the way for millions to achieve financial freedom.
About the Role
As a Senior Site Reliability Engineer (SRE), you will establish SRE practices from the ground up, ensuring reliability, scalability, and performance as January scales from thousands to millions of borrowers. You’ll architect resilient infrastructure, design modern observability solutions, and build sustainable on-call processes that evolve with our rapid growth.
Your work will directly address scaling challenges including database optimization, async workflow infrastructure, and data pipeline reliability — enabling the engineering team to ship confidently and efficiently.
Key Responsibilities
- Lead incident response and develop sustainable on-call practices, including runbooks, blameless postmortems, and continuous improvement to reduce MTTR.
- Build and maintain self-service observability tools (Datadog, Prometheus, ELK) for proactive monitoring and troubleshooting.
- Create and maintain Infrastructure as Code (IaC) using Terraform or CloudFormation for consistent, secure AWS environments.
- Partner with development teams to architect resilient, scalable infrastructure for critical components like databases, networking, async workflows, and data pipelines.
- Design and implement robust CI/CD pipelines (GitHub Actions) with advanced deployment strategies (blue/green, canary).
- Drive best practices in reliability and performance early in the design phase to future-proof January’s systems.
- Proven experience leading incident response and postmortem processes for high-availability production systems.
- Deep expertise in designing highly available architectures (EC2, Fargate, auto-scaling, health checks, graceful degradation).
- Strong experience with AWS cloud infrastructure and IaC tools (Terraform, CloudFormation).
- Hands-on experience with CI/CD automation using GitHub Actions or equivalent tools.
- Proficiency in observability and monitoring stacks (Datadog, Prometheus, ELK).
- Solid scripting/programming skills in Python (for automation, tooling, and debugging).
- Excellent communication and documentation skills, with the ability to collaborate across engineering and platform teams.
- Cloud: AWS
- IaC: Terraform, CloudFormation
- CI/CD: GitHub Actions
- Languages: Python
- Infrastructure: EC2, Fargate
- Remote role (NYC-based preferred for hybrid collaboration).
- Opportunity to build and own the entire SRE practice for a growing FinTech startup.
- Fast-paced, innovative environment working on AI-forward consumer finance products.