Platform Engineer
Job in
Fort Worth, Tarrant County, Texas, 76102, USA
Listed on 2026-01-17
Listing for:
HighLevel
Full Time position
Job specializations:
- IT/Tech: Systems Engineer, Cybersecurity
Job Description & How to Apply Below
Responsibilities
- Architect and ship multitenant, planet-scale services (checkout, subscriptions, payments orchestration, invoicing, and tax hooks) with clear domain boundaries (DDD) and hard SLOs.
- Be the custodian of API and schema design: own protobuf/ConnectRPC conventions, versioning policy, deprecation playbooks, and Buf breaking change checks so our contracts stand the test of time.
- Guarantee resilience and availability of core payment paths: timeouts, retries with jitter, circuit breakers, idempotency keys, outbox/Saga patterns, hedged requests, and graceful degradation.
- Ensure complete auditability: append-only double-entry ledger, immutable event streams, trace-linked entities (OTel trace/span IDs), tamper-evident trails, and reconciliations that tie out to the cent.
- Own error boundaries end-to-end: enumerate failure domains (PSP, network, data, concurrency, quota, browser, device); design uniform error contracts; implement compensations/backfills and automated replay.
- Keep track of every deployed thing: services, workers, triggers, cron, subscriptions, the service catalog, and scorecards (owners, SLOs, runbooks, PDBs, HPA/VPA, budgets, quotas, and timeouts).
- Configuration and limits stewardship: enforce sane defaults across GKE, Pub/Sub, Redis, Firestore/Mongo, and ClickHouse: connection pools, ACK deadlines, batch sizes, TTLs, memory/FD limits, and GCP quotas.
- Observability as a product: pervasive OpenTelemetry, RED/USE metrics, exemplars, trace sampling, SLO dashboards, and alerting that wakes humans only for user-impacting issues.
- Production excellence: canary/blue-green rollouts, automated rollbacks, chaos drills, DR playbooks (RPO/RTO), multiregion failover strategies, and incident command on rotation.
- Security and compliance by design: PCI scope minimization, tokenization/vaulting, secrets/KMS hygiene, data retention/archival, and privacy controls, with checks embedded in CI/CD.
- Developer acceleration: pave golden paths (service templates, ADR/RFC process, linting/formatting, contract tests, ephemeral envs, load/perf harnesses) to make the right thing the easy thing.
- Core domain evolution: orchestration, ledger, and reconciliation flows with crisp invariants and consistency guarantees (read-your-writes where needed, eventual where appropriate).
- Reliability strategy: SLIs/SLOs, error budgets, capacity planning, cost/FinOps guardrails, multiregion posture, and DR exercises.
- API and data governance: canonical models, schema lifecycle (compatibility matrix, migrations), and data lifecycle (retention, archival, and compliance).
- Practice leadership for HighLevel: design reviews, postmortems, technical strategy, coding standards, and mentorship across teams to raise the bar for the org.
- Hiring and team growth: help us hire, scale, and train the right team; shape interview loops, rubrics, onboarding, and ongoing learning (brown bags, reviews, and pair design).
- Cross-functional partnership: collaborate with Product/Marketing/Support to translate platform capabilities and constraints into roadmaps, GTM narratives, and reliable customer outcomes.
- Risk and roadmap: maintain a technical risk register, make build vs. buy calls, and propose simplifications or deprecations that meaningfully reduce complexity and MTTR.
Requirements
- 10+ years building and operating backend systems (at least 5 years in Go), with 2-3+ years acting as a staff/principal-level IC or tech lead for critical paths.
- Deep proficiency with protobuf + ConnectRPC/gRPC and API lifecycle management (versioning, compatibility, contract testing, Buf).
- Distributed systems fundamentals: idempotency, exactly-once via dedupe/outbox, ordering, consensus basics, backpressure, and concurrency control.
- Event-driven architectures on GCP (Pub/Sub), plus Redis for fast paths; strong schema design in MongoDB/Firestore and analytics/reporting patterns on ClickHouse.
- Kubernetes/GKE operations at scale: autoscaling (HPA/VPA), PDBs, resource limits/requests, multiregion topologies, CI/CD, and canary/blue-green.
- Reliability engineering: SLIs/SLOs, error budgets, capacity and load testing, incident management, and DR/BCP.
- Security and compliance: secrets/KMS best practices, PCI basics (scope reduction, key rotation), and data governance (retention/archival).
- Testing discipline: unit, integration, contract, property-based, and performance; test data management and deterministic environments.
- Frontend collaboration: solid understanding of Vue.js + TanStack Query to shape clean API surfaces and performance budgets across the boundary.
- Exceptional technical writing and communication: design docs, ADRs/RFCs, postmortems, and stakeholder updates.
- Hands-on integrations with major PSPs/local rails (e.g., UPI, wallets, BNPL, cards/3DS2) and reconciliation at scale.
- Experience with active-active or multiregion designs, chaos engineering, and traffic management.
- Observability leadership with OpenTelemetry at org scale (tail-based sampling, exemplars).
- FinOps experience: cost baselining, quotas, budget…