Infrastructure/DevOps Engineer
Listed on 2026-01-24
IT/Tech
Systems Engineer, Cloud Computing, Cybersecurity, SRE/Site Reliability
Job Description
Veritas Automata is a technology consulting and software development company dedicated to delivering innovative solutions that drive business success. We combine expertise in automation, AI, DevOps, and advanced distributed systems to enhance operational efficiency and streamline complex processes. Our teams build modern, intelligent, and scalable solutions that empower clients across regulated industries, enterprise platforms, and next-generation cloud-native ecosystems. We are committed to innovation, ownership, and measurable outcomes that elevate our clients and partners.
The Infrastructure/DevOps Engineer (L5) is a senior engineer responsible for architecting, implementing, and optimizing cloud and on-premises infrastructure that supports distributed, Kubernetes-based, and high-availability systems. The role calls for 5–8 years of experience and provides deep technical leadership across infrastructure automation, observability, networking, platform reliability, and DevSecOps enablement.
This role partners closely with software engineering, platform engineering, SRE, security, and product teams to deliver stable, performant, and scalable infrastructure environments. The DevOps Engineer (L5) leads complex technical implementations, removes blockers, and ensures the organization’s infrastructure foundation can support growth, resilience, and innovation.
Core Responsibilities
- Design, deploy, and maintain Kubernetes clusters (K3s, RKE2, AKS, EKS, GKE) across cloud and hybrid environments.
- Implement infrastructure-as-code solutions using Terraform, Pulumi, Ansible, or equivalent automation tools.
- Engineer secure, scalable networking architectures including VPCs, subnets, VPNs, firewalls, service meshes, load balancers, and cross-region connectivity.
- Architect and maintain CI/CD pipelines, GitOps tooling, and automated delivery workflows using GitHub Actions, ArgoCD, Flux, or GitLab CI.
- Configure and operate observability platforms including Prometheus, Grafana, Loki, Tempo, OpenTelemetry, and Thanos for full-stack visibility.
- Collaborate with SRE and platform teams to improve reliability, reduce operational toil, and optimize performance and cost.
- Implement and maintain cloud security best practices including IAM, RBAC, secrets management, encryption, and compliance controls.
- Participate in on-call rotation, incident response, and root cause analysis for platform-related production issues.
- Develop and document runbooks, architecture diagrams, operational standards, and troubleshooting guides.
- Mentor junior engineers and contribute to capability-building around modern infrastructure practices.
Qualifications
- 5–8 years of experience in infrastructure engineering, DevOps, SRE, platform engineering, or cloud operations.
- Hands-on experience with Kubernetes cluster administration, operators, workloads, storage, and networking.
- Strong proficiency with infrastructure-as-code, cloud provisioning, and automated configuration management.
- Deep understanding of at least one major cloud platform (AWS, Azure, or GCP) including compute, networking, IAM, and managed services.
- Experience deploying and managing observability stacks including metrics, logs, traces, dashboards, and alerting.
- Familiarity with containers, networking, service meshes, ingress controllers, and distributed application architectures.
- Strong scripting abilities (Bash, Python, PowerShell) for automation and operational efficiency.
- Bachelor’s degree in Engineering, Computer Science, or equivalent practical experience.
- Experience supporting regulated or high-compliance environments such as healthcare, life sciences, financial services, or critical infrastructure.
- Exposure to edge computing, multi-cluster orchestration, zero-trust networking, and global failover routing.
- Experience with storage platforms like Longhorn, Ceph, EBS, Azure Disk, or GCP Persistent Disks.
- Background in SRE practices including SLO/SLI design, error budgets, and resilience engineering.
- Certifications in cloud technologies, Kubernetes (CKA/CKS/CKAD), or DevOps methodologies.
- KubeVirt or OLVM (Oracle Linux…