Job Description
This role is both strategic and hands‑on — responsible for designing scalable architectures, improving automation, ensuring system reliability, and leading the Tech Ops team.
Key Responsibilities
Architect and manage secure, scalable, and highly available infrastructure on AWS.
Design multi-account AWS environments using AWS Organizations.
Implement VPC architecture, IAM policies, networking, and security best practices.
Oversee EC2, ECS/EKS, Lambda, RDS, S3, CloudFront, and related AWS services.
Optimize AWS cost management and resource utilization.
2. Reliability & Production Operations
Implement Site Reliability Engineering (SRE) best practices.
Define SLIs, SLOs, and error budgets.
Manage monitoring and alerting (CloudWatch, Datadog, Prometheus, Grafana).
Lead incident response, root cause analysis (RCA), and postmortems.
Ensure 24/7 uptime and operational resilience.
3. Security & Compliance
Implement IAM best practices and least‑privilege access controls.
Manage secrets and key management (AWS KMS, Secrets Manager).
Conduct vulnerability management and patching.
Support compliance initiatives (SOC 2, ISO 27001, GDPR as applicable).
Lead disaster recovery planning and backup strategies.
4. Leadership & Strategy
Lead and mentor a team of DevOps, Tech Ops, Tools, ServiceNow, and Okta engineers.
Establish operational KPIs and performance benchmarks.
Manage on‑call rotations and escalation processes.
Collaborate with Engineering, Product, Security, and Data teams.
Contribute to long‑term infrastructure strategy and cloud roadmap.
Required Qualifications
Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
10+ years in DevOps, Cloud Engineering, or Infrastructure roles.
3+ years leading technical teams.
Strong hands‑on experience with AWS services (EC2, EKS, RDS, S3, IAM, VPC, Lambda).
Deep knowledge of networking, Linux systems, and distributed systems.
Experience with Infrastructure-as-Code (Terraform or CloudFormation).
Strong scripting skills (Python, Bash, or similar).
Experience with containerization (Docker) and Kubernetes (EKS preferred).
Strong architectural thinking
Hands‑on technical leadership
Crisis and incident management
Strategic planning and execution
Excellent cross‑functional communication
Success Metrics
Reduced deployment lead time
Reduced incident frequency and MTTR
Improved cost efficiency
High‑performing and scalable Tech Ops function