Guessers for Finite-State Transducer Lexicons


Author(s): Lindén, Krister
Contributor(s)

University of Helsinki, Department of Modern Languages

Date(s)

01/03/2009

Abstract

Language software applications encounter new words, e.g., acronyms, technical terminology, names or compounds of such words. To add new words to a lexicon, we need to indicate their inflectional paradigm. We present a new, generally applicable method for creating an entry generator, i.e., a paradigm guesser, for finite-state transducer lexicons. As a guesser tends to produce numerous suggestions, it is important that the correct suggestions be among the first few candidates. We prove some formal properties of the method and evaluate it on Finnish, English and Swedish full-scale transducer lexicons. We use the open-source Helsinki Finite-State Technology to create finite-state transducer lexicons from existing lexical resources and automatically derive guessers for unknown words. The method has a recall of 82-87% and a precision of 71-76% for the three test languages. The model needs no external corpus and can therefore serve as a baseline.

Format

12

Identifier

http://hdl.handle.net/10138/29365

Language(s)

eng

Relation

Computational Linguistics and Intelligent Text Processing: 10th International Conference, CICLing 2009

Source

Lindén, K 2009, 'Guessers for Finite-State Transducer Lexicons' in Computational Linguistics and Intelligent Text Processing: 10th International Conference, CICLing 2009, pp. 158-169.

Keywords

#612 Languages and Literature #113 Computer and information sciences

Type

A4 Article in conference publication (refereed)

info:eu-repo/semantics/conferencePaper

http://purl.org/eprint/status/NonPeerReviewed