62 resultados para 612


Relevância:

10.00% 10.00%

Publicador:

Resumo:

Language software applications encounter new words, e.g., acronyms, technical terminology, names or compounds of such words. In order to add new words to a lexicon, we need to indicate their inflectional paradigm. We present a new generally applicable method for creating an entry generator, i.e. a paradigm guesser, for finite-state transducer lexicons. As a guesser tends to produce numerous suggestions, it is important that the correct suggestions be among the first few candidates. We prove some formal properties of the method and evaluate it on Finnish, English and Swedish full-scale transducer lexicons. We use the open-source Helsinki Finite-State Technology to create finitestate transducer lexicons from existing lexical resources and automatically derive guessers for unknown words. The method has a recall of 82-87 % and a precision of 71-76 % for the three test languages. The model needs no external corpus and can therefore serve as a baseline.

Relevância:

10.00% 10.00%

Publicador:

Relevância:

10.00% 10.00%

Publicador:

Relevância:

10.00% 10.00%

Publicador:

Relevância:

10.00% 10.00%

Publicador:

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Finite-state methods have been adopted widely in computational morphology and related linguistic applications. To enable efficient development of finite-state based linguistic descriptions, these methods should be a freely available resource for academic language research and the language technology industry. The following needs can be identified: (i) a registry that maps the existing approaches, implementations and descriptions, (ii) managing the incompatibilities of the existing tools, (iii) increasing synergy and complementary functionality of the tools, (iv) persistent availability of the tools used to manipulate the archived descriptions, (v) an archive for free finite-state based tools and linguistic descriptions. Addressing these challenges contributes to building a common research infrastructure for advanced language technology.

Relevância:

10.00% 10.00%

Publicador: