Tacotron2: training new languages for speech synthesis using PyTorch
I wanted to see if it's possible to train the Tacotron2 model for languages other than English (LJ Speech Dataset) using PyTorch.
If so, how do I train the model for a completely new language? What steps do I need to take, and are they documented anywhere so that I can follow them? And what do I need in order to train it, other than audio samples and their text equivalents?
Solution 1:[1]
It is definitely possible. I wouldn't try with less than 10 hours of parallel data (audio paired with its transcript) though (see here: http://aidanpine.ca/static/cv/pdfs/acl2022.pdf). FastSpeech2 will do better if you have less data than that.
Here is a good Tacotron2 implementation to use with a description of the steps needed: https://github.com/NVIDIA/tacotron2/issues/321#issuecomment-603894212
Here is a configuration template for using a new language with FastSpeech2: https://github.com/roedoejet/FastSpeech2/tree/master/config/YourLanguage
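As a rough illustration of the data-preparation side, here is a minimal sketch assuming the NVIDIA repository's LJ Speech-style filelists, where each line is `path/to/audio.wav|transcript`. The helper names `build_filelists` and `collect_symbols` are hypothetical and not part of either repository. The first builds train and validation filelists from (audio path, transcript) pairs; the second lists every character that appears in the transcripts, which is handy when checking that the model's symbol set and text cleaners cover the new language.

```python
import random


def build_filelists(pairs, val_fraction=0.1, seed=0):
    """Turn (wav_path, transcript) pairs into 'wav_path|transcript' lines.

    Hypothetical helper: returns (train_lines, val_lines).
    """
    lines = []
    for wav_path, text in pairs:
        # Collapse newlines and repeated spaces so each entry stays on one line.
        text = " ".join(text.split())
        # Skip empty transcripts and ones containing the '|' delimiter.
        if not text or "|" in text:
            continue
        lines.append(f"{wav_path}|{text}")

    # Shuffle deterministically, then hold out a slice for validation.
    random.Random(seed).shuffle(lines)
    n_val = max(1, int(len(lines) * val_fraction))
    return lines[n_val:], lines[:n_val]


def collect_symbols(lines):
    """Return the sorted set of characters used in the transcripts."""
    chars = set()
    for line in lines:
        # Everything after the first '|' is the transcript.
        chars.update(line.split("|", 1)[1])
    return sorted(chars)
```

The two lists can then be written out one entry per line and referenced from the training configuration. Any character returned by `collect_symbols` that the model's symbol set lacks is likely to be dropped or to cause errors during text processing, so it is worth comparing the two before training.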
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | A. Pine |