Whispered and Lombard neural speech synthesis
These samples were recorded by one of the authors. Readers can create their own whisper or Lombard voice using the method introduced in the paper.
Natural samples from normal, Lombard and whisper style voices
Generated samples from adaptation
Normal style
Lombard style
Whisper style
Whisper style from post-processing of Normal style
Generated samples from SV model
Normal style
Lombard style
Whisper style
Generated samples for testing intelligibility
Normal style
Normal style with post-processing
Lombard style
Lombard style with post-processing
Neural Text-to-Speech Adaptation from Low Quality Public Recordings
For confidentiality reasons, we cannot publish samples from the paper on this webpage.
Instead, this link provides samples of another speaker's voice, produced with a similar adaptation method.
The recordings for this speaker consist of conversational speech captured with Voice Memos on an iPhone at a sampling frequency of 16 kHz and then denoised. In total, 1.5 hours of speech were collected, the same amount as for the character in the paper. For efficiency, the transcripts were produced by an ASR system. A multi-speaker Tacotron system is first trained on the VCTK corpus, and the model is then fine-tuned to learn the target speaker's voice. A speaker representation derived from the Personalized Hey Siri (PHS) system is used here.
Samples from adaptation
Last updated: 30th April, 2020