
ML Infrastructure Engineer

Job in Vancouver, BC, Canada
Listing for: Later Group
Full Time position
Listed on 2026-03-15
Job specializations:
  • IT/Tech
    AI Engineer, Machine Learning/ML Engineer, Data Engineer, Cloud Computing
Salary/Wage Range: CAD 80,000 to 100,000 per year
Job Description & How to Apply Below

Later is the world’s most intelligent influencer marketing company, built to give brands the confidence to create unforgettable campaigns. By combining real creator relationships, trusted intelligence, and expert guidance, Later removes fear and guesswork from one of marketing’s most visible investments. Built on a native, AI-powered platform and more than a decade of proprietary data—including billions of social interactions, impressions, and $2.4B+ in verified influencer-driven purchases—Later helps teams understand what will work before they launch.

This blend of trusted insight and expert guidance removes guesswork from influencer marketing, enabling brands to choose the right creators, execute fully managed campaigns, and drive meaningful growth across awareness, engagement, and revenue. Trusted by leading enterprise brands including Nike, Wayfair, Unilever, and Southwest Airlines, Later bridges creativity and performance so campaigns don't just look good; they deliver results.

About this position:

We’re looking for a Machine Learning Infrastructure Engineer to join our growing Data & Platform team and build the foundation that powers our AI and machine learning capabilities across Later’s product portfolio. As our first dedicated ML Infrastructure Engineer, you will own the systems that support model experimentation, training, deployment, and monitoring at scale.

This role is critical to accelerating our data science initiatives and enabling future AI innovation. You'll design and operate reliable, secure, and scalable ML infrastructure that empowers data scientists and engineers to ship high-impact models with confidence. If you're excited about building robust ML systems in a fast-moving environment, and want to define the standard for MLOps at Later, this is your opportunity.

What you'll be doing:
  • Define and own the long-term ML infrastructure roadmap, ensuring it supports both current experimentation needs and future AI initiatives.
  • Establish best practices for model lifecycle management, deployment standards, monitoring, and governance.
  • Identify infrastructure gaps and proactively design scalable solutions to enable high-velocity ML development.
  • Contribute to cross-functional technical planning, ensuring ML systems align with product and platform strategy.
Technical/Execution
  • Design, build, and maintain production-grade model deployment and inference systems using CI/CD pipelines, containerized services (Docker), and API frameworks (e.g., Flask).
  • Automate end-to-end ML lifecycle workflows including training pipelines, model validation, registry management, deployment, and rollback strategies.
  • Implement robust monitoring systems for model performance, latency, drift detection, and infrastructure health using tools such as CloudWatch, Prometheus, and Grafana.
  • Operate across AWS and GCP environments to manage training and inference workloads, including GPU-based infrastructure and BigQuery datasets.
  • Develop and maintain infrastructure-as-code (Terraform, CloudFormation) to ensure scalable, repeatable, and secure cloud environments.
  • Implement and optimize CI/CD workflows (e.g., GitHub Actions, GitLab CI, Bitbucket Pipelines) for ML and infrastructure automation.
  • Partner closely with Data Scientists, Analysts, Platform Engineers, and Product Engineers to support end-to-end ML workflows.
  • Translate data science experimentation needs into production-ready infrastructure solutions.
  • Serve as the technical bridge between ML experimentation and productized deployment.
  • Share knowledge and best practices to elevate ML maturity across teams.
Research/Best Practices
  • Stay current on emerging MLOps practices, tools, and frameworks to continuously improve system reliability and efficiency.
  • Evaluate and implement model-serving frameworks (e.g., TorchServe, Seldon, TensorRT) where appropriate.
  • Contribute to governance, reproducibility, and auditability standards for ML systems.
  • Experiment with new tooling and workflows to improve reproducibility, performance, and developer velocity.
What success looks like:
  • ML models move from experimentation to production quickly and reliably, with minimal manual…