
Senior Site Reliability Engineer

Job in Los Angeles, Los Angeles County, California, 90079, USA
Listing for: Mango Inc.
Full Time position
Listed on 2026-03-04
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, SRE/Site Reliability, IT Support
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD yearly
Job Description & How to Apply Below

Mango, Inc. · Senior Site Reliability Engineer · Los Angeles, CA · Full time

About Mango, Inc.

Mango is a new type of microscope for rapid bioburden testing.

Description

We are seeking a Senior Site Reliability Engineer to own and evolve the infrastructure that supports our on‑premise instruments, data systems, and machine learning pipelines. This role combines systems‑level engineering with software craftsmanship, requiring deep understanding of how compute, storage, and networking layers interact under real workloads.

You will be the go‑to expert for diagnosing performance issues in our on‑prem system, from kernel‑level I/O bottlenecks to distributed service latency. You will also build robust automation that keeps our systems consistent and observable.

Key Responsibilities

Infrastructure Design & Reliability

Design, deploy, and maintain our on‑premise and hybrid infrastructure, which includes Dell PowerEdge and PowerVault servers, prosumer NAS units, and high‑throughput data processing clusters. Implement fault‑tolerant systems with reproducible deployments and clear observability.

Performance & Systems Analysis

Investigate complex performance issues across hardware, OS, and software boundaries. You will use Linux tooling in addition to in‑house application‑level metrics to uncover root causes in file systems, caching layers, or I/O scheduling.

Automation & Tooling

Build automation for system provisioning, configuration management, and software deployment using Python, Go, Ansible, or similar frameworks. Develop lightweight services and tools that make reliability visible and maintainable.

Work closely with our software and hardware teams to co‑design systems that meet the needs of high‑resolution imaging and ML inference workloads. Translate hardware realities into software reliability guarantees.

Observability & Incident Response

Develop and maintain monitoring, alerting, and logging systems to ensure early detection of issues. Lead incident response and post‑mortem efforts with a focus on learning and prevention.

Documentation & Communication

Produce clear documentation and communicate findings effectively to the broader team — from network topology diagrams to kernel tuning rationales.

General Qualifications
  • Deep understanding of Linux systems and performance (I/O schedulers, RAID, caching, NUMA, kernel parameters).
  • Hands‑on experience designing and managing on‑premise servers, storage arrays, or HPC clusters.
  • Comfort with automation and software development (Python, Go, Bash, or similar).
  • Strong diagnostic and analytical skills: ability to decompose performance problems across multiple layers.
  • Proven track record of improving system reliability, throughput, and maintainability in a fast‑paced environment.
  • Excellent written and verbal communication skills for cross‑disciplinary collaboration.
  • Self‑driven, curious, and motivated by understanding systems deeply rather than just maintaining them.
Bonus Qualities (Not Required)
  • 5–10 years of relevant industry experience in systems engineering, SRE, or infrastructure software roles.
  • Experience tuning Linux file systems (ext4, btrfs) and software RAID (mdadm).
  • Familiarity with containerization and orchestration (Docker, Compose, Kubernetes).
  • Knowledge of networking fundamentals (VLANs, bonding, LACP, 10 GbE/40 GbE).
  • Experience supporting data‑heavy scientific or ML workloads.
  • Demonstrated technical leadership — mentoring others in debugging, reliability, or performance analysis.
Position Requirements
10+ years of work experience