Senior Software Engineer
Listed on 2026-03-01
Software Development
AI Engineer
About Nscale
Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility.
At Nscale, our software engineers form the backbone of our product offering. We build state-of-the-art AI products that allow our clients to move quickly in an increasingly competitive digital landscape. We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you will build trust through openness and transparency, where everyone is inspired to do their best work.
If you join our team, you’ll be contributing to building the technology that powers the future.
Nscale is looking for a Senior Software Engineer to build and scale the control/data plane systems and application services that power our GenAI cloud. You'll work alongside domain experts and experienced engineers across our infrastructure, platform, and product teams to build the foundational systems that enable thousands of AI workloads to run reliably. This is a high-impact role where you'll have significant ownership and the opportunity to shape how we build and operate critical platform services.
What You’ll Be Doing
- Build and own the control plane and data plane services that power our cloud platform. You'll contribute to APIs and SDKs for platform consumption, implement reliable distributed state management and storage systems, and create services that coordinate workload scheduling and orchestration across multiple regions.
- Engineer the infrastructure for processing and managing high-throughput workloads and distributed data flows. You'll solve complex challenges around data capture, storage, and accessibility for AI/ML training and inference.
- Drive technical decisions for your systems and champion engineering best practices across the team. You will uphold high standards for reliability, testing, monitoring, and CI/CD in a fast-paced, research-driven environment, and provide technical mentorship to engineers on your team.
- Own the operational health of your systems in production. You'll implement observability, respond to incidents, optimise performance, and continuously improve reliability based on production feedback and metrics.
- You will have the opportunity to develop entirely new platform services and methods, leveraging cloud-native technologies and AI to create novel platform and product capabilities.
- You have extensive hands-on experience designing, building, and operating scalable production systems on or for a major cloud provider (e.g., AWS, GCP), including data-intensive distributed workflows, backend services, and APIs.
- You use AI tools like Claude, Cursor, or similar as a core part of your development workflow – not as a novelty, but as a fundamental multiplier of what you can build. Whether you're already using AI to rapidly prototype complex distributed systems, explore unfamiliar codebases, and architect solutions across new domains, or you're excited to push your AI-assisted development skills to that level, you understand the potential and are committed to mastering how to effectively collaborate with AI while maintaining high code quality and architectural coherence.
- You believe in using the right tool for the job and have strong proficiency with typed languages. Our primary stack is built with Go, with some services in Rust and Python. You're comfortable working across different languages and applying various technical approaches to find the best solution.
- You have delivered multi-service distributed systems from ambiguous requirements to high-adoption operational systems in production, with hands-on experience in day-2 operations including monitoring, alerting, incident response, and…