Lead DevOps Engineer; GCP & AWS — ML/AI Medical Imaging; No OPT or H1B
Listed on 2026-03-03
IT/Tech
Data Engineer, Cloud Computing, Systems Engineer
Job Description Summary
The GEHC Advanced Visualization Solutions (AVS) segment, a fast-growing business in GE HealthCare, is the global leader in ultrasound medical devices and solutions. The portfolio spans the continuum of care to enable customers with ultrasound screening, diagnosis, treatment and monitoring of diseases. Our customers' need to improve efficiency in radiology and beyond, and to increase user confidence in order to provide better clinical outcomes, continues to grow.
Consequently, the need for AI, digital solutions, and automation that connect devices and software in one seamless ecosystem continues to grow.
The Lead DevOps Engineer architects, secures, and operates multi-cloud infrastructure (GCP and AWS) that powers ML research, model training/inference, and production software for ultrasound image analysis. This engineer is the technical owner of our cloud platform: designing scalable environments, enabling high-throughput data operations, optimizing cost/performance, and partnering closely with ML researchers, data engineers, and application teams. The role combines hands-on engineering with technical leadership, with a strong emphasis on data governance, security/compliance (e.g., HIPAA), and ML platform reliability.
Job Description
No sponsorship or OPT for this role
Essential Responsibilities
Partner with ML research, data engineering, and application teams to translate requirements into reliable, secure, and cost-effective platform capabilities.
Lead design reviews, RFCs, and proof-of-concepts; mentor team members on cloud, Kubernetes, and data best practices.
Own incident response for platform components and drive continuous improvement through automation and standards.
Design and implement secure, scalable, multi-cloud (GCP + AWS) configurations.
Establish and maintain infrastructure as code (IaC) standards with Terraform.
Lead cloud-to-cloud data migration (e.g., GCS ↔ S3) including secure transfer planning, checksum/manifest validation, parallelization, and cutover strategy.
Implement robust ingestion pipelines for medical images and metadata into structured data stores (e.g., BigQuery/Redshift/Postgres) with schema management, versioning, and data lineage.
Create tools/services for dataset definition, preprocessing, curation, de-identification, and data quality checks.
Architect and manage GPU/CPU clusters for distributed training and batch inference using managed services (e.g., SageMaker) and/or Kubernetes (EKS with autoscaling).
Optimize storage tiers (S3/GCS, Glacier/Archive, Filestore/FSx, EBS/Persistent Disk) and caching strategies for high-throughput image workloads.
Establish cost observability (per team/project/workload) with budgets, alerts, showback/chargeback, and automated idle resource cleanup.
Right-size compute/storage, leverage reserved/committed usage, spot/preemptible strategies, and data lifecycle policies.
Partner with ML teams to optimize training job efficiency (e.g., mixed precision, checkpointing strategies, data locality, sharding) and autoscaling.
Own permissions and access management across clouds (AWS IAM, GCP IAM) with least privilege, role/attribute-based access, and service identities.
Implement secrets management (e.g., AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault) and key management (KMS).
Support compliance and security controls relevant to healthcare/PHI (e.g., HIPAA, SOC 2): encryption in transit/at rest, audit logging, VPC Service Controls, private endpoints, and incident response runbooks.
Plan and execute the wind-down of and exit from prior cloud providers: data egress, dependency mapping, app cutover, contract/savings plan termination, and archival with retention policies.
Validate post-migration integrity and performance; document the final state and reduce operational surface area.
Vertex AI / SageMaker / Kubernetes
Stand up and maintain managed ML platforms (Vertex AI, SageMaker) or managed Kubernetes…