DevOps Architect
Listed on 2026-03-09
IT/Tech
Systems Engineer, Cloud Computing, Cybersecurity, IT Support
Monitoring & Observability Architect (Tools Architect)
Location: Raritan, NJ
About the Role
We are seeking a highly experienced Monitoring & Observability Architect (Tools Architect) to design and mature our enterprise observability ecosystem across distributed, cloud, hybrid, and on‑premises systems. This role is ideal for a technical leader who excels at building end‑to‑end visibility across platforms and giving engineering teams actionable insights for reliability and performance.
You will architect solutions for metrics, logs, traces, APM, RUM, synthetics, and distributed systems observability. You will define best practices, select tools, implement instrumentation, and work with cross‑functional teams to weave observability into CI/CD pipelines, Kubernetes platforms, and mission‑critical applications.
This role is perfect for someone with a passion for scalable observability, cloud‑native monitoring, and enabling DevOps, SRE, and Product teams with high‑quality telemetry that speeds incident resolution and improves customer experience.
1) Architecture & Strategy
- Define an enterprise‑wide observability strategy aligned with business and technology goals.
- Develop reference architectures and governance frameworks for monitoring and telemetry across multi‑cloud and hybrid systems.
- Standardize how metrics, logs, traces, and events are collected, processed, visualized, and alerted on.
- Evaluate, recommend, and rationalize observability tools to reduce redundancy, optimize spending, and enhance coverage.
- Build and communicate an observability roadmap focused on maturity, operational excellence, and business impact.
- Implement and operate observability tools such as:
- Commercial: Datadog, Splunk (incl. Observability Cloud), Dynatrace, New Relic, AppDynamics.
- Design and deploy centralized logging pipelines, indexing strategies, and log retention policies.
- Build reusable dashboard frameworks and standardized visualization templates.
- Establish instrumentation guidelines for developers, platform teams, and SREs.
- Embed observability tooling directly into CI/CD pipelines for testing, validation, and telemetry‑driven release quality.
- Enable IaC‑based monitoring management using Terraform, CloudFormation, ARM/Bicep, and Ansible.
- Architect robust Kubernetes observability including node, cluster, pod, service, mesh, ingress, storage, and autoscaling insights.
- Instrument microservices, serverless platforms, APIs, messaging systems, databases, and distributed workloads.
- Ensure comprehensive observability coverage across AWS, Azure, and GCP with cloud‑native insights.
- Partner with SRE and engineering teams to define SLIs, SLOs, SLAs, and error budgets.
- Build intelligent alerting strategies to reduce noise while increasing actionable detection.
- Support major incident response and drive deep‑dive root cause analysis (RCA).
- Improve MTTR/MTTD through dashboarding, correlation, enrichment, and telemetry‑driven triage processes.
- Develop runbooks, playbooks, auto‑remediation workflows, and operational guardrails.
- Ensure observability platforms align with US compliance standards including SOC 2, PCI, HIPAA, and FedRAMP (as applicable).
- Implement RBAC, encryption, data masking, and secure logging practices.
- Establish guardrails for handling sensitive US data (PII/PHI/PCI).
- Integrate observability systems with SIEM platforms where necessary.
- Work closely with engineering teams, product owners, DevOps, SRE, InfoSec, and business stakeholders.
- Serve as a mentor and coach for engineers adopting observability tooling and best practices.
- Provide architectural oversight for telemetry integration into application and platform designs.
- Present designs, findings, and roadmaps to senior leadership and cross‑functional teams.
- Monitor ingestion volumes, retention policies, data tiering, and storage costs.
- Optimize dashboards, high‑cardinality metrics, and observability pipelines to manage cost‑to‑value.
- Conduct vendor evaluations and help negotiate licensing agreements based on usage patterns.
- 10+ years i…