Sr. Dev-ops Engineer

Job in Sunnyvale, Santa Clara County, California, 94087, USA
Listing for: The New York Consulting Group
Full Time position
Listed on 2025-12-01
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, SRE/Site Reliability
Job Description & How to Apply Below

Responsibilities

  • Ensure system reliability and availability – Monitor system issues, create strategies to detect issues, address those issues, design automated systems to troubleshoot, write and review post-mortems.
  • Mitigate operational risks – Collaborate with development teams and other stakeholders to identify potential risks, perform risk assessments, implement risk mitigation strategies, and continuously monitor and review the effectiveness of those strategies.
  • Monitor system health.
  • Minimize emergency response time (MTTR).
  • Maintain CI/CD pipelines.
  • Drive continuous improvement by collaborating with other teams.
  • Automate processes.
Must have/Required Experience and Skills:
  • 8+ years of experience in DevOps and Site Reliability Engineering.
  • Hands-on experience with containerization and orchestration: Docker, Kubernetes/EKS.
  • Proficiency in infrastructure-as-code tools: Terraform, Ansible, or CloudFormation.
  • Experience setting up and managing services running on Kubernetes.
  • In-depth understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and automation.
  • In-depth knowledge of monitoring and observability tools: Splunk.
  • Knowledge of Linux operating system principles, networking fundamentals, and systems management.
  • Demonstrable fluency in at least one of the following languages: Java or Python.
  • Ability to identify and communicate technical and architectural problems, while working with partners and their team to iteratively find solutions.
  • Experience building and managing CI/CD pipelines: gatekeeping production deployments, developing and implementing Git branching strategies, branch protection rules, and network policies, and scaling load up and down on AWS.
  • Strong problem-solving and analytical skills.
  • Ability to solve performance and scalability issues in the system.
Technical Skills:
  • DevOps and SRE
  • AWS Kubernetes/EKS, Docker
  • Terraform, Ansible, or CloudFormation
  • Splunk, Apache Flink
  • Programming/Scripting using Java or Python
  • CI/CD
  • Databases: Vertica, Snowflake
Behavioral Skills:
  • Excellent communication and collaboration skills
  • Ability to propose and implement improvements in the system
  • Ability to work with cross-functional stakeholders
  • Adaptability and a willingness to learn new technologies and techniques
  • Proactive approach to issues, with the ability to provide prompt resolutions or workarounds