Digital Library

**Author(s):** Gallardo Antolín, Ascensión; Montero Martínez, Juan Manuel; King, Simon
Date(s)	2014
Abstract	Traditional Text-To-Speech (TTS) systems have been developed using specially designed, non-expressive scripted recordings. To develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used to train new voices with a whole new range of speaking styles. However, to process this more spontaneous material, the new systems must be able to deal with imperfect data (multi-speaker recordings, background and foreground music, and noise), filtering out low-quality audio segments and creating mono-speaker clusters. In this paper, we compare several architectures for combining speaker diarization with music and noise detection that improve the precision and overall quality of the segmentation.
Format	application/pdf
Identifier	http://oa.upm.es/37500/
Language(s)	eng
Publisher	E.T.S.I. Telecomunicación (UPM)
Relation	http://oa.upm.es/37500/1/INVE_MEM_2014_193698.pdf info:eu-repo/grantAgreement/EC/FP7/287678
Rights	http://creativecommons.org/licenses/by-nc-nd/3.0/es/ info:eu-repo/semantics/openAccess
Source	Proceedings 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014) \| 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014) \| 14/09/2014 - 18/09/2014 \| Singapore
Keywords	#Telecomunicaciones
Type	info:eu-repo/semantics/conferenceObject Conference or Workshop Paper PeerReviewed
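The abstract describes filtering out low-quality audio (music, noise) and grouping the remaining speech into mono-speaker clusters. The following is a minimal Python sketch of one possible post-filtering arrangement, not the authors' method (the paper compares several architectures). It assumes the diarization output is a list of hypothetical (start, end, speaker) tuples in seconds and that the music/noise detector emits (start, end) regions to discard; the function names and the `min_dur` threshold are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s), assumed to be in seconds


def subtract_intervals(seg: Interval, masks: List[Interval]) -> List[Interval]:
    """Return the parts of `seg` not covered by any of the `masks` intervals."""
    pieces = [seg]
    for m_start, m_end in masks:
        next_pieces = []
        for start, end in pieces:
            # Keep the part of the piece that lies before the mask, if any.
            if m_start > start:
                next_pieces.append((start, min(end, m_start)))
            # Keep the part of the piece that lies after the mask, if any.
            if m_end < end:
                next_pieces.append((max(start, m_end), end))
        # Drop empty pieces produced by boundary cases.
        pieces = [(s, e) for s, e in next_pieces if e > s]
    return pieces


def build_speaker_clusters(
    diarization: List[Tuple[float, float, str]],
    music_noise: List[Interval],
    min_dur: float = 1.0,  # hypothetical minimum usable segment length
) -> Dict[str, List[Interval]]:
    """Remove music/noise regions from diarized speech and group the clean
    pieces by speaker label (one list of intervals per speaker)."""
    clusters: Dict[str, List[Interval]] = defaultdict(list)
    for start, end, speaker in diarization:
        for s, e in subtract_intervals((start, end), music_noise):
            if e - s >= min_dur:  # discard fragments too short to be useful
                clusters[speaker].append((s, e))
    return dict(clusters)


if __name__ == "__main__":
    diar = [(0.0, 10.0, "spk1"), (10.0, 14.0, "spk2"), (14.0, 20.0, "spk1")]
    noise = [(3.0, 5.0), (12.5, 16.0)]
    print(build_speaker_clusters(diar, noise))
```

With the toy input above, `spk1` keeps (0.0, 3.0), (5.0, 10.0) and (16.0, 20.0), while `spk2` keeps only (10.0, 12.5). Filtering after diarization is just one ordering; running music/noise detection before diarization is an equally plausible alternative, and this sketch does not claim which arrangement the paper found best.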
