Drive the design, automation, and reliability of Albert Invent's core platform to support scalable, high-performance AI applications.
You will partner closely with Product Engineering and SRE teams to ensure security, resiliency, and developer productivity while owning end-to-end service operability.
Key Responsibilities
Own the design, reliability, and operability of Albert's mission-critical platform.
Work closely with Product Engineering and SRE to build scalable, secure, and high-performance services.
Plan and deliver core platform capabilities that improve developer velocity, system resilience, and scalability.
Maintain a deep understanding of microservices topology, dependencies, and behavior.
Act as the technical authority for performance, reliability, and availability across services.
Drive automation and orchestration across infrastructure and operations.
Serve as the final escalation point for complex or undocumented production issues.
Lead root-cause analysis, mitigation strategies, and long-term system improvements.
Mentor engineers in building robust, automated, and production-grade systems.
Champion best practices in SRE, reliability, and platform engineering.
Must-Have Requirements
Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
4+ years of strong backend coding experience in Python or Node.js.
4+ years of overall software engineering experience, including 2+ years in an SRE / automation-focused role.
Strong hands-on experience with Infrastructure as Code (Terraform preferred).
Deep experience with AWS cloud infrastructure and distributed systems (microservices, APIs, service-to-service communication).
Experience with observability systems – logs, metrics, and tracing.
Experience using CI/CD pipelines (e.g., CircleCI).
Performance testing experience using k6 or similar tools.
Strong focus on automation, standards, and operational excellence.
Experience building low-latency APIs.
Ability to work in fast-paced, high-ownership environments.
Proven ability to lead technically, mentor engineers, and influence engineering quality.
Good-to-Have Skills
Kubernetes and container orchestration experience.
Observability tools such as Prometheus, Grafana, OpenTelemetry, and Datadog.
Experience building Internal Developer Platforms (IDPs) or reusable engineering frameworks.
Exposure to ML infrastructure or data engineering pipelines.
Experience working in compliance-driven environments (SOC 2, HIPAA, etc.).
Skills:
- Automation, Terraform, Python, Node.js, and Amazon Web Services (AWS)
Position Requirements
10+ years of work experience