Tacotron2: training new languages for speech synthesis using PyTorch

I wanted to see if it's possible to train the Tacotron2 model for languages other than English (the LJ Speech Dataset) using PyTorch.

If so, how do I train the model for a completely new language? What steps do I need to take, and are they documented anywhere so that I can follow them? And what do I need in order to train it, other than audio samples and their text equivalents?



Solution 1:[1]

It is definitely possible. I wouldn't try with less than 10 hours of parallel data (audio paired with its transcription) though; see here: http://aidanpine.ca/static/cv/pdfs/acl2022.pdf. FastSpeech2 will do better if you have less data.
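To check a corpus against that 10-hour guideline before training, a rough sketch like the following can add up the audio duration. It assumes the recordings are uncompressed PCM .wav files (the only kind Python's standard wave module reads); the function name total_hours and the directory layout are hypothetical.

```python
import wave
from pathlib import Path


def total_hours(wav_dir):
    """Sum the duration of every .wav file under wav_dir, in hours.

    Assumes uncompressed PCM WAV files; other formats need a
    different reader.
    """
    total_seconds = 0.0
    for wav_path in sorted(Path(wav_dir).rglob("*.wav")):
        with wave.open(str(wav_path), "rb") as wav_file:
            # duration = number of frames / frames per second
            total_seconds += wav_file.getnframes() / wav_file.getframerate()
    return total_seconds / 3600.0
```

For example, total_hours("my_corpus/wavs") returning something well under 10 would suggest collecting more data or trying FastSpeech2 instead.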

Here is a good Tacotron2 implementation to use with a description of the steps needed: https://github.com/NVIDIA/tacotron2/issues/321#issuecomment-603894212
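Much of the preparation for a new language is data formatting. As far as I know, the NVIDIA repository reads training and validation "filelists" with one "wav_path|transcript" entry per line, and its symbol table and text cleaners are written for English, so they need adapting to the new language's characters. Below is a minimal sketch of that preparation, not the repository's own tooling: build_filelists and collect_symbols are hypothetical helper names, and the 5% validation split is an arbitrary assumption.

```python
import random
from pathlib import Path


def build_filelists(pairs, wav_dir, val_fraction=0.05, seed=0):
    """Return (train_lines, val_lines) in 'wav_path|transcript' form.

    pairs: list of (utterance_id, transcript) tuples, where the audio
    for each utterance is assumed to live at wav_dir/<utterance_id>.wav.
    """
    lines = []
    for utt_id, text in pairs:
        text = " ".join(text.split())  # collapse runs of whitespace and newlines
        if not text or "|" in text:
            continue  # skip empty transcripts and text that would break the delimiter
        lines.append(f"{Path(wav_dir) / (utt_id + '.wav')}|{text}")
    random.Random(seed).shuffle(lines)  # seeded so the split is reproducible
    n_val = max(1, int(len(lines) * val_fraction))
    return lines[n_val:], lines[:n_val]


def collect_symbols(pairs):
    """Return the sorted set of lowercased characters used in the transcripts."""
    return sorted({ch for _, text in pairs for ch in text.lower()})
```

The two returned lists can be written to text files and referenced from the training configuration, and the output of collect_symbols shows which characters the model's symbol set has to cover. Any character missing from the symbol set is likely to be dropped or to cause errors, depending on the implementation.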

Here is a configuration template for using a new language with FastSpeech2: https://github.com/roedoejet/FastSpeech2/tree/master/config/YourLanguage

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 A. Pine