Principal Cloud Native Platform Engineer, Bangor, Wales, UK (Engineering)

Location: Bangor

About Nscale

Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility.

We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you’ll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you’ll be contributing to building the technology that powers the future.

About The Role

The Principal Cloud Native Platform Engineer is a senior technical leader responsible for the long-term integrity, coherence, and evolution of Nscale’s cloud-native platform. This role extends beyond individual systems, focusing on architecture, standards, and engineering excellence as the organisation and platform scale.

The role combines deep hands-on engineering with strong architectural stewardship. You will act as a technical escalation point, a mentor to senior engineers, and a trusted advisor to engineering leadership, helping shape the direction of the platform and the practices used to build it.

Principal Engineers in this role are expected to accelerate the delivery of Nscale’s platform service offerings, marrying innovation with efficiency and drawing on experience in setting technical direction, while working closely with the Director of Cloud Native Platform Engineering.

What You'll be Doing (Responsibilities)

Own and evolve the core platform architecture across multiple subsystems
Design and review complex, multi-controller Kubernetes-native systems
Maintain a strong bias toward simplicity, explicitness, and long-term maintainability
Act as a technical escalation point for the most complex platform problems

Standardisation & Technical Governance

Define and maintain platform-wide engineering standards, including:
Controller and operator design patterns
API and CRD design guidelines
Versioning, compatibility, and deprecation strategies
Ensure consistency across teams in:
Reconciliation behaviour
Error handling and retry semantics
Review and influence designs to prevent:
Unnecessary divergence
Overlapping abstractions
Establish reference implementations and shared libraries where appropriate

Mentoring & Capability Building

Actively mentor Senior and mid-level engineers in:
Kubernetes internals and control plane design
Distributed systems thinking
Production readiness and failure analysis
Raise the overall technical bar through:
Design reviews
Code reviews focused on correctness and clarity
Knowledge sharing and documentation
Identify skill gaps within the team and contribute to closing them through guidance and example
Serve as a trusted technical advisor to engineering leadership

Cross-Team Influence

Align platform engineering decisions with:
SRE operational requirements
Infrastructure and hardware roadmaps
Product and customer needs
Communicate architectural intent clearly through:
Reviews and technical discussions
Ensure that platform changes are understandable, supportable, and well-documented

About You (Skills / Qualifications)

Demonstrated experience designing and building Kubernetes-native systems, including custom controllers, operators, CRDs, and reconciliation logic that runs reliably in production.
Proven ability to design coherent, multi-component platform architectures that evolve over time without accumulating excessive complexity or technical debt.
Production-Grade Software Engineering in Go
Strong track record of writing maintainable, testable, and resilient Go code for long-lived distributed systems.
Experience designing Kubernetes APIs and internal abstractions that are explicit, stable, and aligned with real operational constraints.
Deep understanding of failure modes in Kubernetes and distributed systems, and the ability to design for graceful degradation, recovery, and operability.
Experienc…

