Reliability Engineering L2-2 Job in 500016, Prakāshamnagar, Telangana, India
Listed on 2026-02-08
Listing for: Sanofi
Full Time position
Job specializations:
- IT/Tech: Cloud Computing, Data Engineer, SRE/Site Reliability, Systems Engineer
Job Description & How to Apply Below
About The Job
We are seeking an exceptional Site Reliability Engineer to drive reliability excellence within Sanofi's R&D Data Platform team. This role serves as a critical bridge between Digital Technology and R&D Data Product teams, ensuring the reliability, scalability, and efficiency of data platforms that power scientific innovation and drug discovery.
Strategic Importance
This role directly enables Sanofi's data-driven drug discovery mission by establishing the reliability foundation for R&D data platforms. You'll accelerate scientific innovation by optimizing data platform reliability, reducing data pipeline latency, and ensuring cost-effective cloud operations.
Role Description
Data Platform Reliability & Operations
Design and implement SLOs for data pipelines, data lakes, and analytics platforms
Establish reliability metrics for data freshness, quality, and availability
Implement error budget policies that balance data platform stability with research agility
Monitor and optimize performance of data processing workloads
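By way of illustration only, a minimal Python sketch of the kind of SLO compliance and error-budget calculation described above could look like the following; the availability target and freshness threshold are hypothetical values, not commitments from this listing.

```python
from dataclasses import dataclass

@dataclass
class PipelineRun:
    """One scheduled run of a data pipeline."""
    succeeded: bool
    freshness_minutes: float  # delay between source update and data availability

def slo_report(runs: list[PipelineRun],
               availability_target: float = 0.995,
               freshness_slo_minutes: float = 60.0) -> dict:
    """Compute SLO compliance and remaining error budget for a reporting window.

    The target and threshold defaults are illustrative, not actual team SLOs.
    """
    total = len(runs)
    good = sum(1 for r in runs
               if r.succeeded and r.freshness_minutes <= freshness_slo_minutes)
    compliance = good / total if total else 1.0
    # Error budget: the number of runs allowed to miss the SLO in this window.
    allowed_bad = (1.0 - availability_target) * total
    consumed_bad = total - good
    budget_remaining = 1.0 - (consumed_bad / allowed_bad) if allowed_bad else 0.0
    return {
        "slo_compliance": compliance,
        "error_budget_remaining": budget_remaining,
        "slo_met": compliance >= availability_target,
    }

if __name__ == "__main__":
    runs = [PipelineRun(True, 20.0)] * 398 + [PipelineRun(False, 0.0)] * 2
    print(slo_report(runs))
```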
Cloud Infrastructure & FinOps
Lead AWS infrastructure automation and optimization
Implement FinOps practices for R&D cloud spend management
Design cost-effective architectures for data workloads
Optimize resource utilization for research computing environments
Integration Platform Reliability
Ensure reliability of data integration platforms (Amazon S3, PostgreSQL, DynamoDB, and Snowflake)
Monitor and optimize ETL/ELT pipeline performance
Implement automated recovery procedures for integration failures
Design scalable integration architectures
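Automated recovery for transient integration failures, as mentioned above, could be sketched roughly as follows; `load_batch` is a hypothetical placeholder for an S3-to-Snowflake (or similar) load step, not an actual function from this platform.

```python
import logging
import random
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("integration-recovery")

def load_batch(batch_id: str) -> None:
    """Hypothetical integration step, e.g. copying a batch from S3 into Snowflake.

    Stubbed here to always raise, so the recovery path below can be exercised.
    """
    raise ConnectionError("simulated transient failure")

def run_with_recovery(batch_id: str, max_attempts: int = 5,
                      base_delay: float = 2.0) -> bool:
    """Retry a failed integration step with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            load_batch(batch_id)
            log.info("batch %s loaded on attempt %d", batch_id, attempt)
            return True
        except Exception as exc:  # in practice, catch only retryable error types
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, base_delay)
            log.warning("attempt %d for batch %s failed (%s); retrying in %.1fs",
                        attempt, batch_id, exc, delay)
            time.sleep(delay)
    log.error("batch %s exhausted retries; escalating to on-call", batch_id)
    return False

if __name__ == "__main__":
    run_with_recovery("2026-02-08-batch-001", max_attempts=3, base_delay=0.1)
```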
Cloud Observability & Monitoring
Implement comprehensive observability using Datadog, Prometheus, and Grafana
Design monitoring strategies for data workloads and pipelines
Create dashboards for data platform KPIs and SLOs
Enable end-to-end visibility of data flows
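As one possible shape for the data-freshness monitoring described above, the sketch below publishes a custom gauge to Datadog through the DogStatsD client; the metric and tag names are assumptions, not established conventions from this team, and a local Datadog agent is assumed to be listening.

```python
from datetime import datetime, timezone

from datadog import initialize, statsd  # requires the `datadog` package and a local DogStatsD agent

initialize(statsd_host="127.0.0.1", statsd_port=8125)

def report_freshness(pipeline: str, last_loaded_at: datetime) -> None:
    """Emit a data-freshness gauge: seconds since the last successful load."""
    lag_seconds = (datetime.now(timezone.utc) - last_loaded_at).total_seconds()
    # Hypothetical metric and tag names; real ones would follow team conventions.
    statsd.gauge(
        "rnd.data_platform.freshness_seconds",
        lag_seconds,
        tags=[f"pipeline:{pipeline}", "env:dev"],
    )

if __name__ == "__main__":
    report_freshness("assay_results", datetime(2026, 2, 8, 6, 0, tzinfo=timezone.utc))
```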
DevOps Automation
Implement Infrastructure as Code using Terraform for cloud provisioning
Establish GitHub Actions workflows to implement CI/CD pipelines
Automate deployment of data platform components
Create self-service capabilities for research teams
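Terraform and GitHub Actions configurations are declarative rather than Python, so as a rough illustration only, the sketch below shows how a thin Python wrapper might drive `terraform plan`/`apply` for a self-service provisioning request; the stack directory and variable-file paths are hypothetical, and the terraform CLI plus AWS credentials are assumed to be available.

```python
import subprocess
from pathlib import Path

def provision(stack_dir: str, var_file: str, auto_approve: bool = False) -> None:
    """Run terraform init/plan/apply for one stack (e.g. a hypothetical S3 + Glue landing zone).

    `stack_dir` and `var_file` are illustrative paths, not real repository layout.
    """
    workdir = Path(stack_dir)
    subprocess.run(["terraform", "init", "-input=false"], cwd=workdir, check=True)
    subprocess.run(
        ["terraform", "plan", "-input=false", f"-var-file={var_file}", "-out=tfplan"],
        cwd=workdir, check=True,
    )
    if auto_approve:
        # Applying a saved plan file; review the plan output before enabling this.
        subprocess.run(["terraform", "apply", "-input=false", "tfplan"],
                       cwd=workdir, check=True)

if __name__ == "__main__":
    provision("stacks/rnd-data-landing-zone", "envs/dev.tfvars", auto_approve=False)
```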
Data Quality & Pipeline Reliability
Implement data quality monitoring and alerting
Design automated data validation checks
Establish data pipeline reliability metrics
Create recovery procedures for data pipeline failures
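A first cut at the automated data validation checks listed above might resemble the following sketch; the column names, null-rate threshold, and staleness window are illustrative assumptions rather than requirements from this listing.

```python
from datetime import datetime, timedelta, timezone

def validate_batch(rows: list[dict],
                   required_columns: tuple[str, ...] = ("sample_id", "assay", "value"),
                   max_null_rate: float = 0.01,
                   max_age: timedelta = timedelta(hours=24)) -> list[str]:
    """Run basic data-quality checks on one extracted batch.

    Returns human-readable violations; an empty list means the batch passed.
    """
    violations = []
    if not rows:
        return ["batch is empty"]
    for col in required_columns:
        nulls = sum(1 for r in rows if r.get(col) in (None, ""))
        null_rate = nulls / len(rows)
        if null_rate > max_null_rate:
            violations.append(
                f"column {col!r} null rate {null_rate:.2%} exceeds {max_null_rate:.2%}")
    # Freshness check: flag the batch if the newest record is older than max_age.
    timestamps = [r["loaded_at"] for r in rows
                  if isinstance(r.get("loaded_at"), datetime)]
    if timestamps and datetime.now(timezone.utc) - max(timestamps) > max_age:
        violations.append(f"data is stale: newest record loaded at {max(timestamps).isoformat()}")
    return violations

if __name__ == "__main__":
    batch = [{"sample_id": "S1", "assay": "ELISA", "value": 0.42,
              "loaded_at": datetime.now(timezone.utc)}]
    print(validate_batch(batch) or "batch passed all checks")
```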
Key Performance Indicators
Platform Reliability (40%)
Data pipeline SLO compliance rate: 99.5%+
Data freshness SLA achievement
Integration platform uptime
Mean time to recovery for data incidents
Engineering Efficiency (30%)
Automation coverage for cloud provisioning
Self-service capability adoption
Infrastructure as Code implementation
Reduction in manual operations
Cost Optimization (20%)
FinOps savings achieved
Resource utilization optimization
Cloud cost reduction
Workload efficiency improvements
Stakeholder Success (10%)
Research team satisfaction scores
Data platform user feedback
Cross-team collaboration effectiveness
Knowledge sharing impact
Required Experience
Must Have:
8+ years of experience in Site Reliability Engineering or similar roles
Expert-level AWS cloud infrastructure experience
Strong background in data platforms (Snowflake, Amazon S3, Amazon RDS)
Experience with observability platforms (Datadog)
Proficiency in Infrastructure as Code (Terraform) and GitHub Actions for CI/CD
Understanding of data integration platforms and ETL/ELT processes
Preferred
Experience supporting scientific or research computing environments
Knowledge of FinOps practices and cloud cost optimization
Familiarity with pharmaceutical R&D processes
Experience with streaming platforms (Kafka, Kinesis)
Background in data quality and governance
Cultural Attributes
Passion for enabling scientific innovation through technology
Strong collaboration skills with research teams
Balance between reliability and research agility
Data-driven decision making
Continuous improvement mindset
Ability to translate technical concepts for research audiences
Technologies
Cloud Platforms: AWS (primary), S3, Glue
Data Platforms: Snowflake, Databricks, PostgreSQL, DynamoDB
Integration: Informatica, Amazon API Gateway
Observability: Datadog
Automation: Terraform, GitHub Actions
Containers: EKS, Kubernetes
Streaming: Kafka, Kinesis
Orchestration: Airflow, Step Functions
Success Timeline
First 30 Days
Understand R&D data platform architecture and dependencies
Assess current reliability practices and monitoring coverage
Identify immediate optimization opportunities
Build relationships with R&D and Digital Technology teams
60-90 Days
Implement initial observability improvements
Establish baseline SLOs for critical data pipelines
Begin FinOps analysis and optimization
Create automation roadmap
90-180 Days
Deploy comprehensive monitoring solution
Implement key automation workflows
Establish FinOps practices
Deliver first wave of self-service capabilities
6+ Months
Lead strategic reliability initiatives
Drive continuous platform optimization
Establish best practices and standards
Enable accelerated research capabilities…