The Unsupervised Acquisition of a Lexicon from Continuous Speech


Author(s): Marcken, Carl de
Date(s)

20/10/2004

20/10/2004

18/01/1996

Abstract

We present an unsupervised learning algorithm that acquires a natural-language lexicon from raw speech. The algorithm is based on the optimal encoding of symbol sequences in an MDL framework, and uses a hierarchical representation of language that overcomes many of the problems that have stymied previous grammar-induction procedures. The forward mapping from symbol sequences to the speech stream is modeled using features based on articulatory gestures. We present results on the acquisition of lexicons and language models from raw speech, text, and phonetic transcripts, and demonstrate that our algorithm compares very favorably to other reported results with respect to segmentation performance and statistical efficiency.
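
As an illustration of the MDL idea the abstract mentions, the sketch below scores a segmented corpus with a generic two-part code: the bits needed to spell out the lexicon plus the bits needed to encode the corpus under a unigram model over lexicon entries. This is not the algorithm from the report. It omits the hierarchical representation and the articulatory-feature speech model, and the function name `description_length` and the flat `bits_per_char` spelling cost are assumptions made only for this example.

```python
import math
from collections import Counter


def description_length(corpus, bits_per_char=5.0):
    """Two-part code length, in bits, of a segmented corpus.

    corpus: list of utterances, each a non-empty list of word strings.
    bits_per_char: assumed flat cost of spelling one character
                   (hypothetical; a real model would learn this).
    """
    counts = Counter(word for utterance in corpus for word in utterance)
    total = sum(counts.values())

    # Data cost: each token costs -log2 p(word) under a unigram model.
    data_bits = -sum(c * math.log2(c / total) for c in counts.values())

    # Lexicon cost: each entry is spelled out, plus one end-of-word marker.
    lexicon_bits = sum((len(word) + 1) * bits_per_char for word in counts)

    return lexicon_bits + data_bits


# Two candidate segmentations of the same unsegmented text.
words = [["the", "dog"], ["the", "cat"]] * 10
chars = [list("thedog"), list("thecat")] * 10

word_bits = description_length(words)
char_bits = description_length(chars)
```

On this toy corpus the word-level segmentation gets a shorter total code than the character-level one, which is the kind of comparison an MDL-based learner uses to prefer one lexicon over another.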

Format

27 p.

310643 bytes

555774 bytes

application/postscript

application/pdf

Identifier

AIM-1558

CBCL-129

http://hdl.handle.net/1721.1/7191

Language(s)

en_US

Relation

AIM-1558

CBCL-129

Keywords #AI #MIT #Artificial Intelligence #induction #unsupervised learning #language acquisition #lexical acquisition #continuous speech