Speech Generation Consultant Job, Delhi area, Delhi, India, IT/Tech

Sony Research India is driving cutting-edge research and development in various locations around the globe, including laboratories in Japan, the United States, Europe, and Asia. We endeavor to create new technology, products, and services while sustaining Sony Group’s diverse businesses in electronics, entertainment, and financial fields. For our research centre to blaze a trail in the latest technologies, we seek to foster the growth of a diverse pool of research and engineering talent and create a technology talent bank to drive research excellence worldwide.

Sony Research India is offering outstanding career opportunities around frontline technologies such as AI and data analytics.

What we are looking for:
Sony Research India is seeking a dynamic and motivated Speech Synthesis Consultant to join our research team. In this role, you will work on real-world challenges in Text-to-Speech (TTS) and speech generation, focusing on multilingual and emotionally expressive speech synthesis. You will work with state-of-the-art neural TTS and Speech LLM-based TTS models and contribute to improving naturalness, emotional expressiveness, and cross-lingual generalization in synthesized speech. The role involves hands-on experimentation with modern deep learning architectures and collaboration with researchers and engineers on impactful speech AI projects.

Key Responsibilities:

Develop and evaluate techniques for multilingual, cross-lingual, and emotionally expressive speech synthesis.
Improve naturalness, prosody, and emotional controllability in neural TTS systems.
Experiment with recent TTS architectures such as Speech LLM-based, flow-based, and diffusion-based speech models.
Conduct experiments on large-scale speech datasets and evaluate synthesis quality using objective and subjective metrics.
Implement and extend open-source TTS frameworks and research repositories.
Contribute to research publications, technical reports, or open-source tools.
Support business-related tasks on a day-to-day basis as required.

Work Location:

Remote within India.

Duration of the paid one-year contract:
This is a paid, direct one-year contract, and the tenure is extendable.
Ideally, this position will start in the first week of April 2026.
Working hours are 9:00 to 18:00, Monday to Friday (full-time).

Qualifications:

Master’s degree (Research) with some industry experience in deep learning or machine learning, or a PhD candidate in the final stage of their program.
Hands-on experience with speech synthesis, speech processing, or generative AI for speech.

Must-Have Skills:
Strong programming skills in Python and PyTorch.
Experience with speech processing and speech synthesis libraries.
Familiarity with SOTA TTS architectures.
Understanding of prosody modeling, speech representations, and neural audio generation.
Ability to read and implement academic research papers.
Strong foundation in machine learning, signal processing, and deep learning.

Good-to-Have Skills:
Experience with multilingual or cross-lingual TTS systems.
Experience with emotional speech synthesis or style transfer in speech.
Familiarity with diffusion models or generative models for speech.
Experience with voice cloning or speaker adaptation techniques.
Publications in conferences such as ICASSP, Interspeech, NeurIPS, ICLR, AAAI, or ACL.

Our Values:

Dreams & Curiosity: Pioneer the future with dreams and curiosity.
Diversity: Pursue the creation of the very best by harnessing diversity and varying viewpoints.
Integrity & Sincerity: Earn the trust for the Sony brand through ethical and responsible conduct.
Sustainability: Fulfil our stakeholder responsibilities through disciplined business practices.

Sony Research India is committed to equal opportunity in all its employment practices, policies and procedures and to ensuring that no worker or potential worker will receive less favourable treatment due to any characteristic protected under applicable local laws.

