Member of Technical Staff, Synthetic Data

About Cohere

Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises that are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI. We obsess over what we build and each of us is responsible for contributing to increasing our models’ capabilities.

We like to work hard and move fast to best serve our customers.

Cohere is a team of researchers, engineers, designers, and more, focused on building great products. Diversity of perspectives is required for success.

Why this role?

As a Machine Learning Engineer specializing in synthetic data, you will develop the synthetic data pipeline that is crucial to Cohere’s advanced language models. You will manage synthetic data end to end, maintain and optimize the pipeline, run data ablations and model evaluations to gauge data quality, and transform web and code data using generative models to improve token efficiency and model quality.

You will bridge raw data and cutting‑edge AI models, contributing directly to improvements in throughput and accelerator utilization.

Responsibilities

Design and build scalable inference pipelines that run on large GPU clusters.
Conduct data ablations to assess data quality and experiment with data mixtures to enhance model performance.
Research and implement innovative synthetic data curation methods, leveraging Cohere’s infrastructure to drive advancements in natural language processing.
Collaborate with cross‑functional teams, including researchers and engineers, to ensure data pipelines meet the demands of cutting‑edge language models.

Who you are

Strong software engineering skills, with proficiency in Python and experience building data pipelines.
Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar tools.
Experience working with LLMs through projects, open-source contributions or personal experimentation.
Familiarity with LLM inference frameworks such as vLLM and TensorRT.
Experience working with large-scale datasets, including web data, code data, and multilingual corpora.
A passion for bridging research and engineering to solve complex data‑related challenges in AI model training.

Bonus: papers published at top‑tier venues (e.g., NeurIPS, ICML, ICLR, AISTATS, MLSys, JMLR, AAAI, Nature, COLING, ACL, EMNLP).

We value and celebrate diversity and strive to create an inclusive work environment for all. Should you require any accommodations during the recruitment process, please submit an Accommodations Request Form.

Remote‑friendly

We have offices in London, Paris, Toronto, San Francisco, and New York, but we also embrace remote work. There are no restrictions on where you can be located, as long as you are within the EST and EU windows.

Perks for Full‑Time Employees

🤝 Open and inclusive culture
🧑‍💻 Work closely with a team on the cutting edge of AI research
🍽 Weekly lunch stipend, in‑office lunches & snacks
🦷 Full health and dental benefits, including a separate budget for mental health
🐣 100% parental leave top‑up for up to 6 months
🎨 Personal enrichment benefits towards arts and culture, fitness and well‑being, quality time, and workspace improvement
🏙 Remote‑flexible, offices in Toronto, New York, San Francisco, London and Paris, plus a co‑working stipend
✈️ 6 weeks of vacation (30 working days)
