Platform Engineer - Reliability & Scale
Listed on 2026-01-12
Software Development
Cloud Engineer - Software, DevOps
About LangChain
At LangChain, our mission is to make intelligent agents ubiquitous. We provide the agent engineering platform and open source frameworks developers need to ship reliable agents fast.
Our open source frameworks, LangChain and LangGraph, see over 90 million downloads per month and help developers build agents with speed and granular control. LangSmith offers observability, evaluation, and deployment for rapid iteration, enabling teams to transform LLM systems into dependable production experiences.
LangChain is trusted by millions of developers worldwide and powers AI teams at companies like Replit, Clay, Cloudflare, Harvey, Rippling, Vanta, Workday, and more.
About the role
Join our platform engineering team as we scale the LangSmith and LangGraph Platform products. You'll architect and operate the critical systems that power our customers' AI observability and LangGraph app deployments, working directly with cutting‑edge technologies at the intersection of AI and distributed systems.
- Scale critical systems: Design and implement high throughput data‑intensive systems supporting our flagship SaaS products (LangSmith and LangGraph Platform)
- Drive reliability: Build monitoring, alerting, and automated recovery systems that maintain high uptime
- Solve complex problems: Debug performance bottlenecks, optimize database queries, and architect solutions for distributed system challenges
- Shape platform strategy: Influence technical decisions around infrastructure, tooling, and operational practices as we grow from startup to enterprise scale
- Respond to incidents: Participate in on‑call rotation with focus on post‑incident learning, automation and prevention
- Experience: 5+ years building and operating production systems at scale
- Database expertise: Production experience with OSS data stores (PostgreSQL, Redis)
- Infrastructure expertise: Deep knowledge of cloud object storage, Kubernetes, containerized infrastructure, and cloud platforms (e.g. GCP)
- Observability mastery: Hands‑on experience with observability stacks (Datadog, Prometheus/Grafana, OpenTelemetry, or similar)
- Programming proficiency: Strong hands‑on software engineering skills (Python, Go, Rust)
- Operational mindset: "You build it, you run it, you own it" philosophy with a focus on sustainable practices
- Knowledge of columnar file and memory formats
- Proficiency with analytical databases
- Background in high‑growth startups
- Previous experience in AI infrastructure
- We offer competitive compensation that includes base salary, meaningful equity, and benefits such as health and dental coverage, flexible vacation, a 401(k) plan, and life insurance. Actual compensation will vary based on role, level, and location. For team members in the EU and UK, we provide locally competitive benefits aligned with regional norms and regulations.
- Annual salary range: $175,000-$225,000 USD for Senior Engineers