Sony Research India is offering outstanding career opportunities around frontline technologies such as AI and data analytics.
What we are looking for:
Sony Research India is seeking a dynamic and motivated Speech Synthesis Consultant to join our research team. In this role, you will work on real-world challenges in Text-to-Speech (TTS) and speech generation, focusing on multilingual and emotionally expressive speech synthesis. You will work with state-of-the-art neural TTS and Speech LLM-based TTS models and contribute to improving naturalness, emotional expressiveness, and cross-lingual generalization in synthesized speech. The role involves hands-on experimentation with modern deep learning architectures and collaboration with researchers and engineers on impactful speech AI projects.
Key Responsibilities:
Develop and evaluate techniques for multilingual, cross-lingual and emotionally expressive speech synthesis.
Improve naturalness, prosody, and emotional controllability in neural TTS systems.
Experiment with recent TTS architectures such as Speech LLM-based, flow-based, and diffusion-based speech models.
Conduct experiments on large-scale speech datasets and evaluate synthesis quality using objective and subjective metrics.
Implement and extend open-source TTS frameworks and research repositories.
Contribute to research publications, technical reports, or open-source tools.
Support business-related tasks on a day-to-day basis as required.
Work Location:
Remote within India.
Contract Duration:
This is a paid, one-year direct contract; the tenure is extendable.
Ideally, this position will start in the first week of April 2026.
Working hours are 9:00 to 18:00, Monday to Friday (full-time).
Qualifications:
Master’s degree (Research) with some industry experience in deep learning or machine learning, or a PhD candidate in the final stage of their program.
Hands-on experience with speech synthesis, speech processing, or generative AI for speech.
Must-Have Skills:
Strong programming skills in Python and PyTorch.
Experience with speech processing and speech synthesis libraries.
Familiarity with SOTA TTS architectures.
Understanding of prosody modeling, speech representations, and neural audio generation.
Ability to read and implement academic research papers.
Strong foundation in machine learning, signal processing, and deep learning.
Good-to-Have Skills:
Experience with multilingual or cross-lingual TTS systems.
Experience with emotional speech synthesis or style transfer in speech.
Familiarity with diffusion models or generative models for speech.
Experience with voice cloning or speaker adaptation techniques.
Publications in conferences such as ICASSP, Interspeech, NeurIPS, ICLR, AAAI, or ACL.
Our Values:
Dreams & Curiosity: Pioneer the future with dreams and curiosity.
Diversity: Pursue the creation of the very best by harnessing diversity and varying viewpoints.
Integrity & Sincerity: Earn the trust for the Sony brand through ethical and responsible conduct.
Sustainability: Fulfil our stakeholder responsibilities through disciplined business practices.
Sony Research India is committed to equal opportunity in all its employment practices, policies and procedures and to ensuring that no worker or potential worker will receive less favourable treatment due to any characteristic protected under applicable local laws.