About SuprSend:
SuprSend is redefining notification infrastructure for businesses, enabling seamless communication. Our platform ensures reliability, scalability, and efficiency in delivering notifications for the world's most demanding applications. We're looking for talented engineers passionate about building robust, high-performing systems to join us on our journey.
Role Summary:
We are seeking a DevOps Engineer/Site Reliability Engineer (SRE) to join our growing team. The ideal candidate will have extensive experience managing high-scale, distributed systems and expertise in modern DevOps practices. You'll ensure that our systems are robust, efficient, and maintainable while supporting millions of notifications across our platform.
Key Responsibilities:
Infrastructure & Orchestration
• Build and maintain highly available systems using Kubernetes and Helm.
• Design and implement scalable solutions for real-time event streaming with Kafka or Pulsar.
Data & Storage Systems
• Optimize and manage data pipelines and storage systems like ClickHouse, PostgreSQL, and Cassandra.
• Implement high-performance data architectures to support analytics and transactional systems.
Cloud & Automation
• Architect, deploy, and manage cloud-based infrastructure on AWS or GCP.
• Automate infrastructure provisioning, scaling, and monitoring using GitOps and Infrastructure-as-Code tools.
• Previous exposure to building or maintaining a BYOC implementation for a SaaS solution would be a big plus.
CI/CD & Reliability
• Enhance and maintain CI/CD pipelines for reliable, automated deployments.
• Implement observability tools and practices to monitor system performance and detect issues proactively.
Collaboration & Support
• Partner with developers to integrate SRE best practices into the development lifecycle.
• Lead incident management, root cause analysis, and preventive measures for system reliability.
Required Skills & Experience:
• Core Proficiencies:
• Expert-level understanding of Kubernetes and Helm for managing containerized applications.
• Strong experience with Kafka or Pulsar for real-time data processing.
• In-depth knowledge of databases like ClickHouse, PostgreSQL, and Cassandra.
• Proven expertise with AWS and/or GCP, including networking, storage, and compute services.
• DevOps Practices:
• Extensive experience with CI/CD tools (e.g., Jenkins, GitLab CI/CD, ArgoCD).
• Proficiency in GitOps workflows and Infrastructure-as-Code tools (Terraform, Pulumi, etc.).
• Experience:
• 4+ years of experience in DevOps or SRE roles, ideally in high-scale environments.
• Prior experience working on distributed, fault-tolerant systems in high-traffic companies.
• Soft Skills:
• Strong problem-solving and analytical abilities.
• Excellent communication and teamwork skills.
Preferred Qualifications:
• Familiarity with observability tools like Prometheus, Grafana, or New Relic.
• Experience with disaster recovery planning and implementing RTO/RPO best practices.
• Understanding of security best practices in cloud-native environments.
What We Offer:
• Work on challenging, high-impact projects with a passionate and skilled team.
• Competitive salary and ESOP.
• Professional growth opportunities through training, conferences, and certifications.
• Opportunity to shape the future of notification infrastructure for global businesses.
Join us to solve engineering challenges at scale!