Digital Library

**Author(s):** Lorenzo Trueba, Jaime; Martínez González, Beatriz; Lopez Ludeña, Veronica; Barra Chicote, Roberto; Ferreiros López, Javier; Yamagishi, J.; Montero Martínez, Juan Manuel
Date(s)	2012
Abstract	Current text-to-speech systems are developed using studio-recorded speech in a neutral style or based on acted emotions. However, the proliferation of media-sharing sites could enable a new generation of speech-based systems that cope with spontaneous and styled speech. This paper proposes an architecture for dealing with realistic recordings and reports experiments on unsupervised speaker diarization. To maximize the speaker purity of the clusters while keeping speaker coverage high, the paper evaluates the F-measure of a diarization module. The module achieves high scores (>85%), especially when the clusters are longer than 30 seconds, even for the more spontaneous and expressive styles (such as talk shows or sports).
Format	application/pdf
Identifier	http://oa.upm.es/20407/
Language(s)	eng
Publisher	E.T.S.I. Telecomunicación (UPM)
Relation	http://oa.upm.es/20407/1/INVE_MEM_2012_134434.pdf
Rights	http://creativecommons.org/licenses/by-nc-nd/3.0/es/ info:eu-repo/semantics/openAccess
Source	InterSpeech 2012 - 13th Annual Conference of the International Speech Communication Association | 09/09/2012 - 13/09/2012 | Portland, Oregon
Keywords	#Telecomunicaciones
Type	info:eu-repo/semantics/conferenceObject Conference or Workshop Paper PeerReviewed
