Sr Data Scientist
Indiana Borough, Indiana County, Pennsylvania, 15705, USA
Listed on 2026-03-02
IT/Tech
Data Scientist, Data Analyst
Our R&D teams at Lucasfilm and ILM are seeking a Sr Data Scientist to join a strategic R&D initiative focused on Generative AI. The goal of this project is to develop a robust data curation pipeline that can help identify and leverage our most useful assets and media for technical model training.
You will play a critical role in bridging the gap between raw visual data and advanced machine learning applications. You will be responsible for the statistical analysis, sampling strategies, and evaluation metrics required to ensure our training data is diverse, relevant, and optimized for next-generation image and video synthesis.
Please note:
This is a fixed term project position for 6 months.
This role is considered Hybrid, but may be open to remote.
What you’ll do
Data Strategy & Diversity Analysis
- Independently design and implement statistical methods to ensure curated datasets retain representative coverage across various visual attributes, stylistic choices, and subject matter.
- Develop logic to identify and down‑weight low‑variance or repetitive data points to maximize training efficiency.
- Collaborate with key stakeholders on algorithms for de‑duplication to automatically eliminate redundant or near‑identical assets from the training corpus.
- Design and lead implementation of automated metrics to assess the quality of generative images and videos.
- Validate automated quantitative metrics by correlating them against qualitative feedback provided by senior creative stakeholders.
- Establish success criteria for model fidelity, accuracy, and stylistic consistency.
- Work closely with the engineering team to integrate data cleaning, normalization, and sampling modules into a scalable automated pipeline.
- Assist in defining taxonomy and metadata standards to systematically organize unstructured visual assets.
This is a fast‑paced, 6‑month fixed term initiative. You will move through rapidly iterating phases:
- Phase 1: defining data taxonomy and establishing baseline automated metrics.
- Phase 2: refining metrics for temporal consistency and validating against initial model fine‑tuning runs.
- Phase 3: final validation of metrics and delivery of fully curated, optimized datasets for cold storage.
- 5+ years of experience in a related field
- Education: Bachelor’s degree in Data Science, Computer Science, or a related field of study, and/or equivalent work experience. Master’s degree preferred.
- Experience: Proven background in Data Science with a strong emphasis on Computer Vision, Generative AI, or Deep Learning.
- Technical Skills: Proficiency in statistical analysis and dataset curation (distribution analysis, sampling techniques). Experience working with large‑scale unstructured media data is a plus.
- Evaluation Expertise: Familiarity with standard and novel metrics for evaluating generative models (e.g., FID, FVD, or similar).
- Communication: Ability to translate complex statistical insights for engineering partners and non‑technical creative leads.
The hiring range for this remote position is $131,900-$208,400 per year, which factors in various geographic regions. The base pay actually offered will take into account internal equity and also may vary depending on the candidate’s geographic region, job‑related knowledge, skills, and experience among other factors. A bonus and/or long‑term incentive units may be provided as part of the compensation package, in addition to the full range of medical, financial, and/or other benefits, dependent on the level and position offered.