Software Engineer, Enterprise Data Platform
Listed on 2026-03-01
IT/Tech
Data Engineer, Cloud Computing
About Us:
Notion helps you build beautiful tools for your life’s work. In today's world of endless apps and tabs, Notion provides one place for teams to get everything done, seamlessly connecting docs, notes, projects, calendar, and email—with AI built in to find answers and automate work. Millions of users, from individuals to large organizations like Toyota, Figma, and OpenAI, love Notion for its flexibility and choose it because it helps them save time and money.
In-person collaboration is essential to Notion's culture. We require all team members to work from our offices on Mondays and Thursdays, our designated Anchor Days. Certain teams or positions may require additional in-office workdays.
Join Notion’s Data Platform team as we scale our infrastructure for enterprise customers. You’ll help design and build the core data platform that powers Notion’s AI, analytics, and search while meeting stringent security, privacy, and compliance requirements. This role focuses on the data platform layer (storage, compute, pipelines, governance) and partners closely with Security, Search Platform, AI, and Data Engineering.
What You’ll Do:
Design and evolve the data lakehouse
Build and operate core lakehouse components (e.g., Iceberg/Hudi/Delta tables, catalogs, schema management) that serve as the source of truth for analytics, AI, and search.
Own critical data pipelines and services
Design, implement, and harden batch and streaming pipelines (Spark, Kafka, EMR, etc.) that move and transform data reliably across regions and cells.
Advance EKM and encryption-by-design
Work with Security and platform teams to integrate Enterprise Key Management (EKM) into data workflows, including file- and record-level encryption and safe key handling in Spark and storage systems.
Improve data access, auditability, and residency
Build primitives for fine-grained access control, auditing, and data residency so customers can see who accessed what, where, and under which guarantees.
Drive reliability and observability
Raise the operational bar for our data stack: improve on-call experience, debugging, and alerting for data jobs and services.
Optimize large-scale performance and cost
Tackle performance and cost challenges across Kafka, Spark, and storage for very large workspaces (20k+ users, multi-cell deployments), including cluster migrations and workload tuning.
Enable ML and search workflows
Build infrastructure to support training and inference pipelines, ranking workflows, and embedding infrastructure on top of the shared data platform.
Shape the platform roadmap
Contribute to design docs and evaluations that influence our long-term platform direction and vendor choices.
What You’ll Need:
Experience: 5+ years building and operating data platforms or large-scale data infrastructure for SaaS or similar environments.
Programming: Strong skills in at least one of Python, Java, or Scala; comfortable working with SQL for analytics and data modeling.
Distributed data systems: Hands-on experience with Spark or similar distributed processing systems, including debugging and performance tuning.
Streaming & ingestion: Experience with Kafka or equivalent streaming systems; familiarity with CDC/ingestion patterns (e.g., Debezium, Fivetran, custom connectors).
Lakehouse / storage: Experience with data lakes and table formats (Iceberg, Hudi, or Delta) and/or data catalogs and schema evolution.
Security & governance: Practical understanding of access control, encryption at rest/in transit, and auditing as they apply to data platforms.
Cloud infrastructure: Experience with at least one major cloud provider (AWS, GCP, or Azure) and managed data/compute services (e.g., EMR, Dataproc, Kubernetes-based compute).
Operations: Comfortable owning services and pipelines in production, including on-call, incident response, and reliability improvements.
Nice to Have:
Experience working directly with enterprise customers or on features like data residency, EKM, or compliance-driven auditing.
Prior work on Databricks, Unity Catalog, Lake Formation, or similar catalog/governance systems.
Background implementing multi-region / multi-cell data architectures.