Sr. DevOps/AI Engineer
Listed on 2026-02-28
IT/Tech
Cloud Computing, Systems Engineer, Data Engineer, IT Project Manager
Location: Iselin
Are you ready to shape the future of authentication? Join 1Kosmos and help lead the next wave in identity assurance and passwordless innovation.
1Kosmos is driving the future of identity security, empowering organizations to eliminate passwords and establish trust at every step of the identity lifecycle. As a vibrant team of innovators, we develop advanced authentication solutions trusted by some of the world’s leading brands. Join us as we create a passwordless world and set new standards for digital identity assurance.
Your primary responsibilities are to design, build, and scale the solutions that power custom agent/LLM integrations for our most important customers. You will also help scale complex workloads such as agents, LLM orchestration, vector databases, and event-driven systems.
This will involve automating our processes, ensuring system reliability, integrating CI/CD pipelines, and collaborating closely with engineering and product teams to optimize the delivery of our cutting-edge solutions. You will be instrumental in bridging the gap between development and operations, fostering a culture of collaboration and continuous improvement.
Key Responsibilities:
- Design, implement, and manage CI/CD pipelines for automated testing and deployment of applications.
- Build internal developer platform solutions that streamline CI/CD, environment provisioning, and observability across teams.
- Monitor systems and applications for performance, availability, and reliability, troubleshooting and resolving issues as they arise.
- Develop automation scripts (Python, Bash, Go) to streamline the entire lifecycle of custom integration services, from creation to decommissioning.
- Architect for fault tolerance, auto-scaling, and zero-downtime deployments for distributed microservices and AI pipelines.
- Manage a Kubernetes, Helm, and service mesh (Istio) environment tailored for hosting many diverse integration services, focusing on security, isolation, resource management, and resiliency.
- Collaborate with development teams to improve software deployment and development processes.
- Manage infrastructure in cloud environments (AWS, Azure, GCP, OCI) and ensure best practices for security and performance.
- Build and optimize agents, LLM workflows, caching strategies, and retrieval pipelines for low-latency inference.
- Own and extend Terraform/Crossplane configurations to standardize provisioning across environments.
- Prepare and maintain documentation of systems, processes, and configurations.
- Stay up to date with the latest DevOps tools, techniques, and trends to support continuous improvement in our operations.
Qualifications:
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in a DevOps role or similar capacity.
- Strong experience with major cloud providers (AWS, Azure, GCP, or OCI).
- Proficiency in scripting languages such as Python, Bash, or Go.
- Proven experience creating flexible CI pipelines with tools like Jenkins, GitHub Actions, and Harness, and CD/GitOps workflows with tools like Argo CD.
- Extensive hands-on experience with containerization and orchestration, specifically Kubernetes, Helm, and Docker.
- Strong proficiency in Infrastructure as Code tools (Terraform highly preferred; Pulumi, CloudFormation, or similar).
- Exposure to AI/ML or data-intensive systems, including model serving, vector databases, or RAG pipelines.
- Knowledge of networking, service mesh, and security controls in production environments.
- Experience with monitoring tools (Grafana, New Relic, Datadog) and incident response.
- Excellent problem-solving skills and a proactive attitude towards improving processes.
- Strong debugging and performance tuning skills; ability to reason about failure modes and resilience.
- Strong communication skills and ability to work collaboratively in a team environment.
- Experience in identifying and implementing security best practices in a DevOps environment.
- Knowledge of database technologies (SQL, NoSQL) and associated best practices.
- Background in Agile methodologies and working in Agile teams.
Benefits:
- Cutting-Edge Tech Stack: Build with decentralized identity protocols, FedRAMP High, FIDO2-certified cryptography, and NIST-compliant biometric systems.
- Accelerated Growth: Receive annual stipends for certifications and attend key conferences like Identiverse or EIC.
- Ownership & Impact: We move fast and will enable you to make a big impact with large customers in the US and Canada.
- Flexibility First: Unlimited PTO and 2 days WFH.