Machine Learning Scientist/Sr Scientist, Federated Benchmarking & Validation Engineering
Listed on 2026-01-12
IT/Tech
Data Scientist, Data Engineer
At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first.
We’re looking for people who are determined to make life better for people around the world.
Lilly TuneLab is an AI‑powered drug discovery platform that provides biotech companies with access to machine learning models trained on Lilly's extensive proprietary pharmaceutical research data. Through federated learning, the platform enables Lilly to build models on broad, diverse datasets from across the biotech ecosystem while preserving partner data privacy and competitive advantages. This collaborative approach accelerates drug discovery by creating continuously improving AI models that benefit both Lilly and our biotech partners.
Key Responsibilities
Federated Test Set Design: Architect and implement privacy‑preserving protocols for constructing representative test sets across distributed partner datasets, ensuring statistical validity while maintaining data isolation.
Benchmark Suite Development: Create comprehensive benchmark suites covering small molecules (ADMET, solubility, permeability), antibodies (affinity, stability, immunogenicity), and RNA therapeutics (stability, delivery, off‑target effects).
Cross‑Domain Validation: Develop validation strategies that assess model generalization across different experimental protocols, cell lines, species, and therapeutic indications while respecting partner data boundaries.
Public Dataset Integration: Systematically benchmark federated models against public datasets (ChEMBL, PubChem, PDB, Therapeutic Antibody Database) to establish performance baselines and identify gaps.
Validation Frameworks: Implement time‑split or proper scaffold‑split validation protocols that assess model performance on prospective data, simulating real‑world deployment scenarios and detecting concept drift.
Reproducibility Infrastructure: Build robust MLOps pipelines ensuring complete reproducibility of federated experiments, including versioning of data snapshots, model checkpoints, and hyperparameter configurations.
Statistical Rigor: Design statistically powered validation studies accounting for multiple testing, hierarchical data structures, and non‑independent observations common in drug discovery datasets.
Performance Profiling: Develop comprehensive performance profiling across diverse molecular scaffolds, target classes, and property ranges, identifying systematic biases and failure modes.
Platform Integration: Collaborate with engineering teams to integrate validation frameworks with the TuneLab federated learning platform built on NVIDIA FLARE, ensuring scalable and automated testing across partner networks.
Qualifications
PhD in Computational Biology, Bioinformatics, Cheminformatics, Computer Science, Statistics, or related field from an accredited college or university
Minimum of 2 years of experience in the biopharmaceutical industry or related fields, with demonstrated expertise in drug discovery and early development
Strong foundation in experimental design, statistical validation, and hypothesis testing
Experience with ML model validation, cross‑validation strategies, and performance metrics
Proficiency in data engineering, pipeline development, and automation
Experience with federated learning platforms and distributed computing
Knowledge of regulatory requirements for AI/ML in pharmaceutical development
Expertise in ADMET assay development and validation
Understanding of antibody engineering and characterization methods
Familiarity with RNA therapeutic design and delivery systems
Experience with clinical biomarker validation and translational research
Proficiency in workflow orchestration tools (Airflow, Kubeflow, Prefect)
Strong knowledge of containerization and cloud computing…