Sr. DevOps Engineer
Listed on 2025-12-01
IT/Tech
Systems Engineer
Overview
What we're doing isn't easy, but nothing worth doing ever is. We envision a future powered by robots that work seamlessly with human teams. We build artificial intelligence that enables service robots to collaborate with people and adapt to dynamic human environments. Join our mission-driven, venture-backed team as we build out current and future generations of humanoid robots.
The MLOps Engineer works with engineering teams, IT, and Security to address business challenges through comprehensive solutions while considering system uptime, reliability, and maintainability. You will instrument and monitor the breadth of our full platform stack (hosts, applications, and performance). You will work closely with our engineering and information security teams to enhance automated system provisioning and deployment subsystems within codified infrastructure. You will collaborate with developers to create more robust and scalable services independent of cloud implementations.
You will help isolate, trap, and respond to system failures and develop strategies for continuous monitoring and analysis to reduce downtime and required manual intervention. You will participate in the on-call rotation to maintain platform SLAs.
- Analyze our current operational toolset for shortcomings and product improvements; provide and implement recommendations.
- Create, configure and maintain cloud-based infrastructure and services for the rapid development and monitoring of complex robotics and analytics applications.
- Build tools to automate monitoring and management of robot fleets.
- Build tools to automate and improve MLOps tooling and workflows.
- Build tools to automate and improve data workflows for ML training and simulation.
- Triage issues as they arise, both on robots and in deployed software.
- Automate common operations to allow Diligent's robotic fleet to scale exponentially.
- Collaborate with the software engineering team to improve the organization’s SDLC process and minimize time from code-complete to production.
- Mentor engineers in SRE best practices and modern software engineering.
- Occasional off-hours on-call work required.
- Bachelor's degree in Computer Science, related field, or equivalent experience
- 5+ years of combined experience in MLOps, DevOps, Software Engineering, or related technical roles
- Deep experience in modern cloud infrastructure (AWS, Azure, GCP), especially managed ML/AI services
- Experience with modern data stores at small to medium scale (Firestore, Redshift, Postgres, Mongo, distributed queues like Kafka, Mosquitto MQ)
- Experience automating system provisioning, configuration, and Infrastructure as Code (Terraform, Ansible, etc.)
- Management of hosting environment, including database administration and scaling an application to support load changes
- Experience soliciting systems requirements, designing, and implementing new platform components leveraging infrastructure or SaaS services
- Experience working with distributed, fault tolerant systems
- Experience with converting monolithic applications to microservices and service discovery technology
- Solid Linux skills and proficiency in at least one high-level language (e.g., Python)
- Experience working in an agile development lifecycle
- Mid-Senior level
- Full-time
- Engineering and Information Technology
- Hospitals and Health Care
Austin, TX — $100,000 - $150,000