2 results for morphological information

in Aston University Research Archive


Relevance:

30.00% 30.00%

Publisher:

Abstract:

Derivational morphology proposes meaningful connections between words and is largely unrepresented in lexical databases. This thesis presents a project to enrich a lexical database with morphological links and to evaluate their contribution to disambiguation. A lexical database with sense distinctions was required. WordNet was chosen because of its free availability and widespread use. Its suitability was assessed through critical evaluation with respect to specifications and criticisms, using a transparent, extensible model. The identification of serious shortcomings suggested a portable enrichment methodology, applicable to alternative resources. Although 40% of the most frequent words are prepositions, they have been largely ignored by computational linguists, so the addition of prepositions was also required. The preferred approach to morphological enrichment was to infer relations from phenomena discovered algorithmically. Both existing databases and existing algorithms can capture regular morphological relations, but cannot capture exceptions correctly; neither provides any semantic information. Some morphological analysis algorithms are subject to the fallacy that morphological analysis can be performed simply by segmentation. Morphological rules, grounded in observation and etymology, govern associations between and attachment of suffixes and contribute to defining the meaning of morphological relationships. Specifying character substitutions circumvents the segmentation fallacy. Morphological rules are prone to undergeneration, minimised through a variable lexical validity requirement, and overgeneration, minimised by rule reformulation and restricting monosyllabic output. Rules take into account the morphology of ancestor languages through co-occurrences of morphological patterns. Multiple rules applicable to an input suffix need their precedence established.
The resistance of prefixations to segmentation has been addressed by identifying linking vowel exceptions and irregular prefixes. The automatic affix discovery algorithm applies heuristics to identify meaningful affixes and is combined with morphological rules into a hybrid model, fed only with empirical data, collected without supervision. Further algorithms apply the rules optimally to automatically pre-identified suffixes and break words into their component morphemes. To handle exceptions, stoplists were created in response to initial errors and fed back into the model through iterative development, leading to 100% precision, contestable only on lexicographic criteria. Stoplist length is minimised by special treatment of monosyllables and reformulation of rules. 96% of words and phrases are analysed. 218,802 directed derivational links have been encoded in the lexicon rather than the wordnet component of the model because the lexicon provides the optimal clustering of word senses. Both links and analyser are portable to an alternative lexicon. The evaluation uses the extended gloss overlaps disambiguation algorithm. The enriched model outperformed WordNet in terms of recall without loss of precision. Failure of all experiments to outperform disambiguation by frequency reflects on WordNet sense distinctions.
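
The idea of suffix rules expressed as character substitutions, checked against a lexicon and filtered by a stoplist, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the rule set, the relation labels, the toy lexicon, the stoplist entry, and the names `Rule` and `derive_bases` are all hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical rule: replace the suffix `source` with `target`."""
    source: str
    target: str
    relation: str  # illustrative label for the meaning of the link


# Illustrative rules only; every applicable rule is tried in this order.
RULES = [
    Rule("ification", "ify", "act of"),  # classification -> classify
    Rule("ation", "e", "act of"),        # derivation -> derive
    Rule("ation", "", "act of"),         # formation -> form
    Rule("iness", "y", "state of"),      # happiness -> happy
    Rule("ness", "", "state of"),        # kindness -> kind
]

# Toy lexicon standing in for a real lexical database (assumption).
LEXICON = {"classify", "derive", "form", "happy", "kind", "ore"}

# Stoplist of (derived, base) pairs that pass the lexicon check
# but are not genuine derivations (assumed example).
STOPLIST = {("oration", "ore")}


def derive_bases(word, lexicon=LEXICON, stoplist=STOPLIST):
    """Return (word, base, relation) links proposed by the rules.

    A candidate base is built by character substitution, not by simply
    cutting the suffix off, so "happiness" yields "happy", not "happi".
    A link is kept only if the base is in the lexicon (lexical validity)
    and the pair is not on the stoplist.
    """
    links = []
    for rule in RULES:
        if not word.endswith(rule.source):
            continue
        candidate = word[: len(word) - len(rule.source)] + rule.target
        if candidate in lexicon and (word, candidate) not in stoplist:
            links.append((word, candidate, rule.relation))
    return links


print(derive_bases("happiness"))
print(derive_bases("oration"))
```

In this sketch, plain segmentation of "happiness" would leave the non-word "happi", whereas the substitution rule recovers "happy". The pair "oration"/"ore" passes the lexicon check but is blocked by the stoplist, mirroring the role the abstract gives to stoplists in handling exceptions.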

Relevance:

30.00% 30.00%

Publisher:

Abstract:

We describe the case of a dysgraphic aphasic individual, S.G.W., who, in writing to dictation, produced high rates of formally related errors consisting of both lexical substitutions and what we call morphological-compound errors involving legal or illegal combinations of morphemes. These errors were produced in the context of a minimal number of semantic errors. We could exclude problems with phonological discrimination and phonological short-term memory. We also excluded rapid decay of lexical information and/or weak activation of word forms and letter representations since S.G.W.'s spelling showed no effect of delay and no consistent length effects, but, instead, paradoxical complexity effects with segmental, lexical, and morphological errors that were more complex than the target. The case of S.G.W. strongly resembles that of another dysgraphic individual reported in the literature, D.W., suggesting that this pattern of errors can be replicated across patients. In particular, both patients show unusual errors resulting in the production of neologistic compounds (e.g., "bed button" in response to "bed"). These patterns can be explained if we accept two claims: (a) Brain damage can produce both a reduction and an increase in lexical activation; and (b) there are direct connections between phonological and orthographic lexical representations (a third spelling route). We suggest that both patients are suffering from a difficulty of lexical selection resulting from excessive activation of formally related lexical representations. This hypothesis is strongly supported by S.G.W.'s worse performance in spelling to dictation than in written naming, which shows that a phonological input, activating a cohort of formally related lexical representations, increases selection difficulties. © 2014 Taylor & Francis.