Digital Library

1 result for Speech segmentation

at Universidad Politécnica de Madrid

A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis

Relevance:

40.00%

Publisher:

Abstract:

Traditional Text-To-Speech (TTS) systems have been developed using specially designed, non-expressive scripted recordings. To develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used to train new voices with a whole new range of speaking styles. However, to process this more spontaneous material, the new systems must be able to deal with imperfect data (multi-speaker recordings, background and foreground music, and noise) by filtering out low-quality audio segments and creating mono-speaker clusters. In this paper we compare several architectures for combining speaker diarization with music and noise detection that improve the precision and overall quality of the segmentation.
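
As a rough illustration of the kind of combination the abstract describes, the sketch below takes speaker-diarization output and music/noise detections, drops the speech segments that are too contaminated, and groups the rest by speaker. It is not one of the architectures compared in the paper. The `Segment` type, the `filter_and_cluster` function, the 10% threshold, and the sample timings are all hypothetical and chosen only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A labelled time interval in seconds (hypothetical type for this sketch)."""
    start: float
    end: float
    label: str  # speaker id for diarization output, or "music" / "noise"


def overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def filter_and_cluster(speech, noise, max_noise_ratio=0.1):
    """Keep speech segments with little music/noise overlap, grouped by speaker.

    Assumes the music/noise intervals do not overlap one another, so their
    overlaps with a speech segment can simply be summed. The threshold is an
    arbitrary illustrative value, not one taken from the paper.
    """
    clusters = defaultdict(list)
    for seg in speech:
        duration = seg.end - seg.start
        if duration <= 0:
            continue  # skip empty or malformed segments
        contaminated = sum(overlap(seg, n) for n in noise)
        if contaminated / duration <= max_noise_ratio:
            clusters[seg.label].append(seg)
    return dict(clusters)


# Made-up example: two speakers, with music over most of the second turn.
diarization = [
    Segment(0.0, 10.0, "spk1"),
    Segment(10.0, 20.0, "spk2"),
    Segment(20.0, 30.0, "spk1"),
]
detections = [Segment(12.0, 18.0, "music")]

clusters = filter_and_cluster(diarization, detections)
```

In this made-up input, the `spk2` turn is discarded because music covers more than the allowed fraction of it, and both `spk1` turns end up in a single mono-speaker cluster.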
