Senior Data Engineer, Data Lakehouse Infrastructure
Listed on 2025-12-01
IT/Tech
Data Engineer, Data Science Manager, Data Analyst, Data Warehousing
TRM Labs is a blockchain intelligence company committed to fighting crime and creating a safer world. By leveraging blockchain data, threat intelligence, and advanced analytics, our products empower governments, financial institutions, and crypto businesses to combat illicit activity and global security threats. At TRM, you'll join a mission-driven, fast-paced team made up of experts in law enforcement, data science, engineering, and financial intelligence, tackling complex global challenges daily.
Whether analyzing blockchain data, developing cutting-edge tools, or collaborating with global organizations, you'll have the opportunity to make a meaningful and lasting impact.
We’re building the foundational data infrastructure powering next-generation analytics. As part of that mission, we’re architecting a modern data lakehouse to support complex workloads, real-time data pipelines, and secure data governance at petabyte scale.
We are looking for a Senior Data Engineer to help us design, implement, and scale core components of our lakehouse architecture. You will have ownership over data modeling, ingestion, query performance optimization, and metadata management using cutting-edge tools and frameworks like Apache Spark, Trino, Hudi, Iceberg, and Snowflake. We’re looking for engineers with deep expertise in at least one area and a solid understanding of the trade-offs among different technologies.
The impact you’ll have here:
- Architect and scale a high-performance data lakehouse on GCP, leveraging technologies like StarRocks, Apache Iceberg, GCS, BigQuery, Dataproc, and Kafka.
- Design, build, and optimize distributed query engines such as Trino, Spark, or Snowflake to support complex analytical workloads.
- Implement metadata management with open table formats like Iceberg, along with data discovery frameworks for governance and observability using Iceberg-compatible catalogs.
- Develop and orchestrate robust ETL/ELT pipelines using Apache Airflow, Spark, and GCP-native tools (e.g., Dataflow, Composer).
- Collaborate across departments, partnering with data scientists, backend engineers, and product managers on design and implementation.
What we’re looking for:
- 5+ years of experience in data or software engineering, with a focus on distributed data systems and cloud-native architectures.
- Proven experience building and scaling data platforms on GCP, including storage, compute, orchestration, and monitoring.
- Strong command of one or more query engines such as Trino, Presto, Spark, or Snowflake.
- Experience with modern table formats like Apache Hudi, Iceberg, or Delta Lake.
- Exceptional programming skills in Python, as well as adeptness in SQL or Spark SQL.
- Hands-on experience orchestrating workflows with Airflow and building streaming/batch pipelines using GCP-native services.
About the team:
- The Data Platform team is the funnel between TRM's data world and its product world. We care about all layers of the stack, including petabyte-scale data stores, pipelines, and processing.
- We have quite a big scope as a team, with new and exciting projects every quarter. As a result, we collaborate with most teams across TRM.
- We believe in async communication, but we're not afraid to jump on a quick huddle if that helps move things faster. We are scrappy when the situation demands it and process-oriented when we need to achieve our OKRs.
- We are always looking for people who can elevate the quality of our tech and our execution. If you enjoy a remote-first, async-friendly environment and want to achieve efficacy and efficiency at petabyte scale, our team could be a great pick for you!
- Team members are based in the US across almost all time zones! On-call tends to run in EST or PST shifts, whichever suits you best.
- We do try to reserve some overlap in the day for meetings. Our north star: no IC spends more than 3-4 hours per week in meetings.
- Build scalable engines to optimize routine…