Data Infrastructure Engineer | Sunnyvale, California, US (Remote)
Listed on 2026-01-12
Category: IT/Tech (Data Engineer, AI Engineer)
Headquartered in Silicon Valley, Meshy is the leading 3D generative AI company on a mission to Unleash 3D Creativity by transforming the content creation pipeline. Meshy makes it effortless for both professional artists and hobbyists to create unique 3D assets—turning text and images into stunning 3D models in just minutes. What once took weeks and cost $1,000 now takes just 2 minutes and $1.
Our world-class team of top experts in computer graphics, AI, and art includes alumni from MIT, Stanford, and Berkeley, as well as veterans from Nvidia and Microsoft. Our talent spans the globe, with team members distributed across North America, Asia, and Oceania, fostering a diverse and innovative multi-regional culture focused on solving global 3D challenges. Meshy is trusted by top developers, backed by premier venture capital firms like Sequoia and GGV, and has successfully raised $52 million in funding.
Meshy is the market leader, recognized as the No. 1 3D AI tool in popularity (according to 2024 A16Z Games) and No. 1 in website traffic (according to SimilarWeb, with 3 million monthly visits). The platform boasts over 5 million users and has generated 40 million models.
Founder and CEO Yuanming (Ethan) Hu earned his Ph.D. in graphics and AI from MIT, where he developed the acclaimed Taichi GPU programming language (27K stars on GitHub, used by 300+ institutes). His work is highly influential, with an honorable mention for the SIGGRAPH 2022 Outstanding Doctoral Dissertation Award and over 2,700 research citations.
We are seeking a Data Infrastructure Engineer to join our growing team. In this role, you will design, build, and operate distributed data systems that power large-scale ingestion, processing, and transformation of datasets used for AI model training. These datasets span traditional structured data as well as unstructured assets such as images and 3D models, which often require specialized preprocessing for pretraining and fine-tuning workflows.
This is a versatile role: you’ll own end-to-end pipelines (from ingestion to transformation), ensure data quality and scalability, and collaborate closely with ML researchers to prepare diverse datasets for cutting-edge model training. You’ll thrive in our fast-paced startup environment, where problem-solving, adaptability, and wearing multiple hats are the norm.
What You’ll Do:

Core Data Pipelines
Design, implement, and maintain distributed ingestion pipelines for structured and unstructured data (images, 3D/2D assets, binaries).
Build scalable ETL/ELT workflows to transform, validate, and enrich datasets for AI/ML model training and analytics.
Distributed Systems & Storage
Architect pipelines across cloud object storage (S3, GCS, Azure Blob), data lakes, and metadata catalogs.
Optimize large-scale processing with distributed frameworks (Spark, Dask, Ray, Flink, or equivalents).
Implement partitioning, sharding, caching strategies, and observability (monitoring, logging, alerting) for reliable pipelines.
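To illustrate the kind of partitioning strategy the role calls for, here is a minimal, self-contained sketch of hash-based sharding of object-store keys, so independent workers can each process one stable partition. The helper name `shard_for_key` and the key format are hypothetical, not part of Meshy's stack.

```python
import hashlib

def shard_for_key(key: str, num_shards: int = 16) -> int:
    """Map an asset key to a stable shard via hashing (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Group object-store keys into shards so workers can process partitions
# independently; the same key always lands in the same shard.
keys = [f"assets/model_{i}.glb" for i in range(100)]
shards: dict[int, list[str]] = {}
for k in keys:
    shards.setdefault(shard_for_key(k), []).append(k)
```

In practice a distributed framework such as Spark or Ray would handle the partition scheduling, but the hashing idea is the same: deterministic placement makes pipelines retryable and cache-friendly.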
Pretrain Data Processing
Support preprocessing of unstructured assets (e.g., images, 3D/2D models, video) for training pipelines, including format conversion, normalization, augmentation, and metadata extraction.
Implement validation and quality checks to ensure datasets meet ML training requirements.
Collaborate with ML researchers to quickly adapt pipelines to evolving pretraining and evaluation needs.
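As a sketch of the validation and quality checks described above, the snippet below screens asset records against a format whitelist and size bounds before they enter a training pipeline. The `AssetRecord` type, the allowed-format set, and the size limit are all illustrative assumptions, not an actual Meshy schema.

```python
from dataclasses import dataclass

# Hypothetical whitelist of asset formats accepted by the training pipeline.
ALLOWED_FORMATS = {".glb", ".obj", ".png", ".jpg"}

@dataclass
class AssetRecord:
    path: str
    size_bytes: int

def validate(record: AssetRecord, max_bytes: int = 500 * 1024 * 1024) -> list[str]:
    """Return a list of validation errors; an empty list means the asset passes."""
    errors: list[str] = []
    ext = ("." + record.path.rsplit(".", 1)[-1].lower()) if "." in record.path else ""
    if ext not in ALLOWED_FORMATS:
        errors.append(f"unsupported format: {ext or 'none'}")
    if record.size_bytes == 0:
        errors.append("empty file")
    if record.size_bytes > max_bytes:
        errors.append("exceeds size limit")
    return errors
```

Checks like these are typically run at ingestion time so that bad assets are quarantined early, rather than discovered mid-training.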
Infrastructure & DevOps
Use infrastructure-as-code (Terraform, Kubernetes, etc.) to manage scalable and reproducible environments.
Data Governance & Collaboration
Maintain data lineage, reproducibility, and governance for datasets used in AI/ML pipelines.
Work cross-functionally with ML researchers, graphics/vision engineers, and platform teams.
Embrace versatility: switch between infrastructure-level challenges and asset/data-level problem solving.
Contribute to a culture of fast iteration, pragmatic trade-offs, and collaborative ownership.
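One common way to make dataset lineage and reproducibility concrete, as this section asks for, is a content-addressed manifest: hash every source file, then hash the sorted entries to get a stable dataset ID. This is a minimal sketch under that assumption; the `dataset_manifest` helper and its fields are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

def dataset_manifest(name: str, source_files: dict[str, bytes]) -> dict:
    """Build a content-addressed manifest so a training dataset can be reproduced.

    The manifest_id depends only on file contents, so the same inputs
    always yield the same ID regardless of when the manifest was built.
    """
    entries = {
        path: hashlib.sha256(data).hexdigest()
        for path, data in sorted(source_files.items())
    }
    manifest = {
        "dataset": name,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "files": entries,
    }
    manifest["manifest_id"] = hashlib.sha256(
        json.dumps(entries, sort_keys=True).encode("utf-8")
    ).hexdigest()[:16]
    return manifest
```

Storing such a manifest alongside each training run ties a model checkpoint back to the exact bytes it was trained on.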
Technical Background
5+ years of experience in data engineering, distributed systems, or similar.
Solid skills in SQL for analytics, transformations, and warehouse/lakehouse integration.
Proficiency with distributed…