
Infra SRE Consultant / Engineer

Job in Santa Clara, Santa Clara County, California, 95053, USA
Listing for: Diverse Lynx
Full Time position
Listed on 2025-12-28
Job specializations:
  • IT/Tech
    Cloud Computing, SRE/Site Reliability
Job Description & How to Apply Below
Position: Infra SRE Consultant / Engineer

Job Title:

Infra SRE Consultant / Engineer (SRE profiles only; do not share generic DevOps or Cloud profiles)

Location

Santa Clara, CA 95051 - Onsite

Top Skills
  • Strong focus on observability tools and practices, such as Prometheus and Grafana.
  • Excellent problem-solving and troubleshooting skills, capable of handling escalations from L1 up to L4.
  • Able to identify feature-level issues and work with development teams to resolve them.
  • Kubernetes expertise: able to manage deployments, perform deep-level debugging, and handle Kubernetes administration tasks.
  • Strong Linux/Unix fundamentals, including system-level operations/engineering skills.
Job Description / Responsibilities
  • On-prem infrastructure management:
    Manage on-prem infrastructure. Maintain the uptime, reliability, and readiness of the on-prem engineering cloud, which is spread across multiple data centers.
  • Guard SLAs:
    Guard service level agreements (SLAs) for critical engineering services. Implement monitoring, alerting, and incident response procedures to ensure adherence to defined performance targets. Perform root cause analysis and post-mortems for any incidents that breach thresholds.
  • Observability:
    Set up and manage monitoring and logging tools such as Prometheus, Grafana, or the ELK Stack to oversee system health and performance. Maintain KPI pipelines using Jenkins, Python and ELK. Improve monitoring systems by adding custom alerts based on business needs.
  • Automation & Optimization:
    Assist with capacity planning, optimization, and efforts to improve utilization.
  • Day-to-Day Support:
    Support user-reported issues. Monitor alerts and take the necessary action. Actively participate in war rooms for critical issues.
  • Collaboration & Documentation:
    Create and maintain documentation for operational procedures, configurations, and troubleshooting guides.
Tech Stack
  • Bare-metal data center machine management tools such as IPMI, Redfish, and KVM.
  • Automation using Jenkins, Python, Go, Bash.
  • Infrastructure tools like Kubernetes, MySQL, Prometheus, Grafana and ELK.
  • Familiarity with NVIDIA hardware such as GPUs and Tegras is a plus.

Diverse Lynx LLC is an Equal Employment Opportunity employer. All qualified applicants will receive due consideration for employment without any discrimination. All applicants will be evaluated solely on the basis of their ability, competence and their proven capability to perform the functions outlined in the corresponding role. We promote and support a diverse workforce across all levels in the company.
