
DevOps Engineer Lead

Job in Richmond, Henrico County, Virginia, 23214, USA
Listing for: Tek Spikes
Full Time position
Listed on 2026-01-15
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer
Salary/Wage Range or Industry Benchmark: 100000 - 125000 USD Yearly
Job Description & How to Apply Below

Job Title:
DevOps Engineer - Lead

Job
-1, 94329-1 & 94503-1

Ex-Capital One candidates only; C2C

Client:
Capital One

Location:
15075 Capital One Drive, Richmond, VA 23238 (Hybrid)

Duration: 12+ months, with possibility of extension

Key Skills & Tools:

Observability Tools:
Proficiency in monitoring, logging, and tracing tools, including Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, Datadog, New Relic, and cloud-native solutions like AWS CloudWatch.

Programming Languages:
Expertise in languages such as Python and Go for scripting and automation.

Infrastructure & Cloud Platforms:
Experience with cloud platforms (AWS, GCP, Azure) and container orchestration systems like Kubernetes.

Infrastructure as Code (IaC):
Familiarity with Terraform and Ansible for managing infrastructure and configurations.

CI/CD & Automation:
Experience with CI/CD pipelines and automation tools like Jenkins.

System & Software Engineering: A strong background in both system operations and software development.

Optimize cloud agent instrumentation, with cloud certifications being a plus.

Datadog Fundamentals, APM and Distributed Tracing Fundamentals, and Datadog Demo Certification (Mandatory)

Strong understanding of Observability concepts (Logs, Metrics, Tracing)

Expertise in security & vulnerability management in observability

Possesses 2 years of experience in cloud-based observability solutions, specializing in monitoring, logging, and tracing across AWS, Azure, and GCP environments.

Job Description:

Design & Implement Solutions:
Build and maintain comprehensive observability platforms that provide deep insights into complex systems, incorporating logs, metrics, and traces.

System Instrumentation:
Instrument applications, infrastructure, and services to collect telemetry data using frameworks like OpenTelemetry.

Data Analysis & Visualization:
Develop dashboards, reports, and alerts using tools like Prometheus, Grafana, and Splunk to visualize system performance and detect issues.

Collaboration:
Work with development, SRE, and DevOps teams to integrate observability best practices and align monitoring with business and operational goals.

Automation:
Develop scripts and use Infrastructure as Code (IaC) tools like Ansible and Terraform to automate monitoring configurations and telemetry collection.

Implement and manage full-stack observability using Datadog, ensuring seamless monitoring across infrastructure, applications, and services.

Instrument agents for on-premise, cloud, and hybrid environments to enable comprehensive monitoring.

Design and deploy key service monitoring, including dashboards, monitor creation, SLA/SLO definitions, and anomaly detection with alert notifications.

Configure and integrate Datadog with third-party services such as ServiceNow, SSO enablement, and other ITSM tools.
