
DevOps Engineer

Job in New York, New York County, New York, 10261, USA
Listing for: Dune Security
Full Time position
Listed on 2026-01-13
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, SRE/Site Reliability
Location: New York

Base Pay Range

$/yr - $/yr

Company Overview

Dune Security is the world’s first User Adaptive Risk Management solution. Powered by AI, we quantify employee risk with comprehensive data and automatically deliver user‑adaptive training and intervention. For higher‑risk users, our platform integrates seamlessly with the broader security stack to dynamically implement controls. Backed by Craft Ventures, Toba Capital, MassMutual Ventures, Alumni Ventures, Firestreak Ventures, and Antler, we empower CISOs to proactively manage user risk – the leading cause of cybersecurity breaches – and build safer, more resilient organizations.

Role Overview

Dune Security is seeking a Senior DevOps Engineer to own and operate highly reliable, scalable, production‑grade infrastructure and developer platforms. This role carries direct responsibility for availability, deployment safety, incident response, platform design, and long‑term infrastructure quality for customer‑facing web systems.

The ideal candidate has operated web platforms at or above 99.999% availability, responds decisively to live production incidents, and proactively designs and improves systems to reduce operational risk and accumulated technical debt.

Key Responsibilities
  • Own production reliability for customer‑facing web platforms, with demonstrated experience meeting uptime SLOs (e.g., 99.9%+)
  • Serve as an on‑call escalation owner for P0/P1 incidents, driving rapid mitigation and high‑quality post‑incident analysis
  • Proactively design resilient systems to eliminate classes of failures before they occur
  • Maintain deployment safety, including low change‑failure rates and zero manual production changes outside emergency procedures
  • Build and operate infrastructure via infrastructure‑as‑code (Terraform), minimizing manual toil
  • Continuously reduce infrastructure and platform technical debt, including refactoring brittle systems, improving automation coverage, and simplifying operational complexity
  • Operate and scale identity and access platforms (e.g., Keycloak), enforcing MFA and production access hygiene
  • Design and operate production compute environments spanning CPU, GPU, TPU, and FPGA workloads
  • Design and operate AI‑capable infrastructure, including model serving and batch or real‑time inference pipelines
  • Partner with application engineers to improve service operability, resilience, deployment safety, and long‑term maintainability
  • Document systems, incidents, and decisions clearly using Jira and Confluence

Qualifications & Experience
  • 5+ years of DevOps or SRE experience owning real production systems, including customer‑facing web platforms operating at ≥99.999% uptime
  • Proven ownership of incident response, MTTR, and reliability metrics, with hands‑on responsibility during live P0/P1 incidents
  • Strong experience designing and operating AWS cloud infrastructure across multiple environments, using Terraform, Docker, and infrastructure‑as‑code practices
  • Deep CI/CD and release engineering experience with GitLab CI/CD and Jenkins, including safe, automated production deployments
  • Advanced Linux systems administration and Linux internals; kernel‑level tuning and performance optimization preferred
  • Experience designing and operating AI and data platforms in production, including Airflow, Databricks, Snowflake, and AI deployment environments
  • Experience operating and scaling heterogeneous compute environments, including CPU, GPU, TPU, FPGA, and QPU workloads
  • Experience operating and scaling MongoDB Atlas in production environments
  • Strong experience with observability and analytics tooling, including Grafana, Splunk, Elasticsearch/OpenSearch, Kibana, and OpenTelemetry, and using telemetry directly during incident response
  • Experience operating identity and access platforms such as Keycloak, and working with cloud security and posture tooling including AWS GuardDuty and Wiz
  • Hands‑on experience using Jira for incident and work tracking and Confluence for operational and architectural documentation
  • Bachelor’s degree in Computer Science, Computer Engineering, or a related field (required)
  • Cloud or DevOps certifications preferred

What You’ll Bring

We are looking for a proactive and…
