Big Data/Data Platform Site Reliability Engineer
Location: 10115 Berlin, Germany
Posted on 2026-01-30
Company: Tides Digital
Full-time
Job specialization:
IT/Information Technology
Systems Engineer, Site Reliability Engineer, Cloud Computing, Network Engineer
Job description
Overview
We're partnering with a fast-growing, data-intensive technology organisation to hire a Site Reliability Engineer focused on large-scale data platforms. This role sits at the heart of a mission-critical data environment and carries responsibility for reliability, scalability and operational excellence across complex distributed systems. It is a senior, hands-on role for an engineer who enjoys owning infrastructure, improving system behaviour over time and working close to production in high-throughput environments.
The role
- Deploy, configure, monitor and maintain multiple large-scale data stores across distributed environments. Reliability, performance and availability are core to the role, with a strong focus on lifecycle management of critical data infrastructure.
- Manage and evolve large Linux-based systems, ensuring predictable performance and high uptime. Define and document configuration standards, operational procedures and best practices that support long-term stability.
- Run performance and reliability tests, and review system configuration, software choices and hardware decisions to identify improvement opportunities. Actively participate in incident response and root cause analysis, and drive lasting reliability improvements across the platform.
- Influence the direction of the technology stack by contributing ideas that improve resilience, observability and operational efficiency.
Requirements
- Strong hands-on experience operating large-scale Linux infrastructure in production environments. Comfortable owning complex systems and debugging issues across storage, compute and networking layers.
- Deep, practical experience with Hadoop-based data platforms (HDFS architecture, security models and operational lifecycle management such as upgrades, scaling and recovery). Experience running Kafka clusters in production environments is also key.
- Experience designing or improving automation and deployment workflows, with proficiency in Python or shell scripting. Solid understanding of networking fundamentals (TCP/IP, DNS, load balancing, basic network security).
- Comfortable taking technical ownership, contributing to on-call and incident processes, and driving continuous reliability improvement.
- The position follows US East Coast working hours and is open to engineers working remotely.
Nice to have
- Experience with large-scale analytical query engines, distributed storage systems or high-availability databases is beneficial. Familiarity with observability platforms, configuration management tools, containerisation and Kubernetes in production environments is valuable.
- Experience mentoring others and helping establish operational standards is desirable.
Please note that applications from your jurisdiction are currently not accepted for this position through this job site. Candidate preferences are at the discretion of the employer or recruiter and are determined solely by them.