Staff Data Engineer
Listed on 2026-01-14
IT/Tech
Location: New York
About the Company
Gemini is a global crypto and Web3 platform founded by Cameron and Tyler Winklevoss in 2014, offering a wide range of simple, reliable, and secure crypto products and services to individuals and institutions in over 70 countries. Our mission is to unlock the next era of financial, creative, and personal freedom by providing trusted access to the decentralized future. We envision a world where crypto reshapes the global financial system, internet, and money to create greater choice, independence, and opportunity for all — bridging traditional finance with the emerging cryptoeconomy in a way that is more open, fair, and secure.
As a publicly traded company, Gemini is poised to accelerate this vision with greater scale, reach, and impact.
Data
At Gemini, our Data Team is the engine that powers insight, innovation, and trust across the company. We bring together world‑class data engineers, platform engineers, machine‑learning engineers, analytics engineers, and data scientists — all working in harmony to transform raw information into secure, reliable, and actionable intelligence. From building scalable pipelines and platforms, to enabling cutting‑edge machine learning, to ensuring governance and cost efficiency, we deliver the foundation for smarter decisions and breakthrough products.
We thrive at the intersection of crypto, technology, and finance, and we’re united by a shared mission: to unlock the full potential of Gemini’s data to drive growth, efficiency, and customer impact.
Staff Data Engineer
The Data team is responsible for designing and operating the data infrastructure that powers insight, reporting, analytics, and machine learning across the business. As a Staff Data Engineer, you will lead architectural initiatives, mentor others, and build high‑scale systems that impact the entire organization. You will partner closely with product, analytics, ML, finance, operations, and engineering teams to move, transform, and model data reliably, with observability, resilience, and agility.
This role requires in-person attendance twice a week at either our San Francisco, CA or New York City, NY office.
Responsibilities
- Lead the architecture, design, and implementation of data infrastructure and pipelines, spanning both batch and real‑time / streaming workloads
- Build and maintain scalable, efficient, and reliable ETL/ELT pipelines using languages and frameworks such as Python, SQL, Spark, Flink, Beam, or equivalents
- Work on real‑time or near‑real‑time data solutions (e.g. CDC, streaming, micro‑batch) for use cases that require timely data
- Partner with data scientists, ML engineers, analysts, and product teams to understand data requirements, define SLAs, and deliver coherent data products that others can self‑serve
- Establish data quality, validation, observability, and monitoring frameworks (data auditing, alerting, anomaly detection, data lineage)
- Investigate and resolve complex production issues: root cause analysis, performance bottlenecks, data integrity, fault tolerance
- Mentor and guide junior and mid‑level data engineers: lead code reviews, design reviews, and best‑practice evangelism
- Stay up to date on new tools, technologies, and patterns in the data and cloud space, bringing proposals and proof‑of‑concepts when appropriate
- Document data flows, data dictionaries, architecture patterns, and operational runbooks
Qualifications
- 8+ years of experience in data engineering (or similar) roles
- Strong experience in ETL/ELT pipeline design, implementation, and optimization
- Deep expertise in Python and SQL, writing production‑quality, maintainable, testable code
- Experience with large‑scale data warehouses (e.g. Databricks, BigQuery, Snowflake)
- Solid grounding in software engineering fundamentals, data structures, and systems thinking
- Hands‑on experience in data modeling (dimensional modeling, normalization, schema design)
- Experience building systems with real‑time or streaming data (e.g. Kafka, Kinesis, Flink, Spark Streaming), and familiarity with CDC frameworks
- Experience with orchestration / workflow frameworks (e.g. Airflow)
- Familiarity with data governance,…