Data Architect II
Listed on 2026-01-12
IT/Tech
Data Engineer, AI Engineer
Location: GSK, Cambridge, MA, USA (and other locations)
This position supports the research data ecosystem and enables scientists to accelerate medical discovery through modern data architecture.
Overview
The Onyx Research Data Tech organization is a full‑stack shop that powers data and analytics at scale, partnering with scientists to deliver tailored solutions.
Onyx focuses on:
- Building a metadata‑enabled data experience for scientists, engineers, and decision‑makers
- Providing AI/ML and data analysis environments to accelerate predictive capabilities
- Engineering data at scale as a unified asset to unlock real‑time value
Responsibilities
- Partner with Scientific Knowledge Engineering to develop physical data models for fit‑for‑purpose products
- Design data architecture aligned with enterprise standards to promote interoperability
- Collaborate with platform teams and data engineers to maintain architecture principles, standards, and guidelines
- Design foundations that support GenAI workflows, including RAG, vector databases, and embedding pipelines
- Work across business areas and stakeholders to ensure consistent implementation of architecture standards
- Lead reviews and maintain architecture documentation and best practices for Onyx and stakeholders
- Adopt a security‑first design with robust authentication and resilient connectivity
- Provide leadership, subject matter expertise, and GSK knowledge to architecture and engineering teams, partners, and vendors
Required qualifications
- Bachelor’s degree in computer science, engineering, data science, or a similar discipline
- 5+ years of data architecture or engineering in pharma, healthcare, or life sciences R&D
- 3+ years defining architecture standards on Big Data platforms
- 3+ years of experience with data warehouse, data lake, and enterprise big data platforms
- 3+ years of enterprise cloud data architecture (Azure or GCP) at scale
- 3+ years of hands‑on relational, dimensional, and analytic experience with RDBMS, NoSQL, ETL, and ingestion protocols
Preferred qualifications
- Master’s or PhD in a relevant discipline
- Deep knowledge of at least one programming language (Python, Scala, Java)
- Experience with AI/ML data workflows: feature stores, vector databases, embedding pipelines, model serving architectures
- Familiarity with GenAI/LLM patterns: RAG, prompt engineering, data preparation
- Experience with GCP data/analytics stack: Spark, Dataflow, Dataproc, GCS, BigQuery
- Experience with enterprise data tools: Ataccama, Collibra, Acryl
- Experience with Agile frameworks and tools: SAFe, Jira, Confluence, Azure DevOps
- Experience applying CI/CD principles to data solutions
- Strong communication skills to explain technical concepts to non‑technical stakeholders
- Pharmaceutical, healthcare, or life sciences background
Compensation and benefits
- Annual base salary: $109,725‑$182,875 (region dependent)
- Annual bonus and long‑term incentive program (share‑based)
- Health care and other insurance benefits for employee and family
- Retirement benefits, paid holidays, vacation, paid caregiver/parental and medical leave
GSK is an Equal Opportunity Employer. All qualified applicants will receive equal consideration for employment without regard to race, color, religion, sex (including pregnancy, gender identity, and sexual orientation), parental status, national origin, age, disability, genetic information, military service, or any basis prohibited under federal, state or local law.