Senior/Staff Site Reliability Engineer

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: Fal
Full Time position
Listed on 2026-02-28
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer, SRE/Site Reliability, Network Engineer
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD yearly
Job Description & How to Apply Below
Position: Senior/Staff Site Reliability Engineer

You are a seasoned SRE who keeps production infrastructure running. You will own the reliability and availability of customer-facing systems, from Kubernetes clusters and deployment pipelines to the networking layer that connects it all. You think in SLOs, automate ruthlessly, and treat every incident as a chance to make the system better.

Key Responsibilities
  • Own and operate our Kubernetes infrastructure: cluster lifecycle, upgrades, networking, and multi-tenant isolation for customer workloads
  • Build and maintain CI/CD pipelines and deployment infrastructure
  • Make extensive use of AI to automate the analysis and resolution of production issues, and to improve software development speed, reliability, and maintainability
  • Build dashboards, alerting, and anomaly detection across our systems
  • Define and enforce SLOs and build out incident response processes
  • Manage and improve our networking, load balancing, and service mesh configurations
  • Drive reliability improvements across the stack through automation, runbooks, and chaos engineering
Requirements
  • 5+ years of experience managing critical production systems and software development workflows
  • Strong production experience setting up and operating Kubernetes at scale, using infrastructure-as-code (Terraform, Ansible)
  • Deep knowledge of Linux networking, container networking (CNI plugins, VXLAN, BGP), and DNS
  • Experience building CI/CD systems and GitOps workflows (FluxCD, ArgoCD)
  • Proficiency in Python and either Go or Bash for tooling and automation
  • Strong experience with logging, monitoring, and alerting (Prometheus, Grafana, Loki, Thanos, VictoriaMetrics, Datadog)
  • Excellent communication skills and the ability to drive technical decisions across teams
  • Self-starter who executes quickly, takes ownership, and constantly seeks improvement
Nice to have
  • Experience managing GPU and AI/ML workloads
  • Experience with kernel-based monitoring and routing (eBPF, XDP)
  • Experience with security tooling (Falco, Coroot, SIEM)
  • Experience with bare metal Kubernetes networking (Calico, Cilium, MetalLB)
  • Experience with distributed storage systems (Ceph, Longhorn, etc.)
Compensation
  • $,000 plus equity + benefits
What we offer at fal
  • Interesting and challenging work
  • Plenty of learning and growth opportunities
  • We are currently hiring in downtown San Francisco.
  • We offer visa sponsorship and will help you relocate to San Francisco.
  • Health, dental, and vision insurance (US)
  • Regular team events and offsites

Position Requirements
10+ Years work experience