Multilingual number transcription for text-to-speech conversion


Author(s): San Segundo Hernández, Rubén; Montero Martínez, Juan Manuel; Giurgiu, M.; Muresan, I.; King, Simon
Date(s)

2013

Abstract

This paper describes the text normalization module of a fully trainable text-to-speech conversion system and its application to number transcription. The main goal is to build a language-independent text normalization module that is based on data rather than on expert rules. The paper proposes a general architecture based on statistical machine translation techniques. It comprises three main modules: a tokenizer that splits the input text into a token graph, a phrase-based translation module that translates the tokens, and a post-processing module that removes some tokens. The architecture has been evaluated for number transcription, an important aspect of the text normalization problem, in three languages: English, Spanish and Romanian.
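
The three-module architecture named in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the place-tagged digit tokens (`4_T`, `2_U`), the `<del>` marker, the hand-written `PHRASE_TABLE` (a real system would learn its phrase table from parallel data), the flat token list standing in for a token graph, and the greedy longest-match lookup standing in for a statistical decoder. It only covers English numbers from 0 to 99.

```python
import re

DELETE = "<del>"      # hypothetical marker for tokens that post-processing removes
PLACES = ["U", "T"]   # units and tens: this toy sketch only covers 0-99

UNITS = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty",
        "fifty", "sixty", "seventy", "eighty", "ninety"]


def build_phrase_table():
    """Hand-written stand-in for a phrase table learned from parallel data."""
    table = {("0_T",): DELETE}                    # a leading zero is dropped later
    for d in range(10):
        table[(f"{d}_U",)] = UNITS[d]
        table[("1_T", f"{d}_U")] = TEENS[d]       # 10-19 need two-token phrases
    for d in range(2, 10):
        table[(f"{d}_T",)] = TENS[d]
        table[(f"{d}_T", "0_U")] = TENS[d]        # "40" -> "forty", not "forty zero"
    return table


PHRASE_TABLE = build_phrase_table()


def tokenize(text):
    """Split one- or two-digit numbers into place-tagged digit tokens."""
    tokens = []
    for word in text.split():
        if re.fullmatch(r"\d{1,2}", word):
            n = len(word)
            for i, digit in enumerate(word):
                tokens.append(f"{digit}_{PLACES[n - 1 - i]}")
        else:
            tokens.append(word)                   # other words pass through
    return tokens


def translate(tokens, table, max_len=2):
    """Greedy longest-match phrase lookup (not a statistical decoder)."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(max_len, 0, -1):
            phrase = tuple(tokens[i:i + n])
            if len(phrase) == n and phrase in table:
                out.extend(table[phrase].split())
                i += n
                break
        else:
            out.append(tokens[i])                 # unknown token: copy unchanged
            i += 1
    return out


def postprocess(tokens):
    """Remove marker tokens and join the remaining words."""
    return " ".join(t for t in tokens if t != DELETE)


def normalize(text):
    return postprocess(translate(tokenize(text), PHRASE_TABLE))


print(normalize("room 42"))
print(normalize("13 and 07"))
```

Under these assumptions, supporting another language would mean swapping the phrase table rather than rewriting the pipeline, which is the property the abstract attributes to a data-driven design.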

Format

application/pdf

Identifier

http://oa.upm.es/30110/

Language(s)

eng

Publisher

E.T.S.I. Telecomunicación (UPM)

Relation

http://oa.upm.es/30110/1/INVE_MEM_2013_163244.pdf

Rights

http://creativecommons.org/licenses/by-nc-nd/3.0/es/

info:eu-repo/semantics/openAccess

Source

8th ISCA Speech Synthesis Workshop | 31/08/2013 - 02/09/2013 | Barcelona, Spain

Keywords #Telecommunications

Type

info:eu-repo/semantics/conferenceObject

Conference or Workshop Paper

PeerReviewed