997 results for Lexical Resources


Relevance: 60.00%

Abstract:

This paper sets out to report on findings about features of task-specific reformulation observed in university students in the middle stretch of the Psychology degree course (N=58) and in a reference group of students from the degree courses in Modern Languages, Spanish and Library Studies (N=33) at the National University of La Plata (Argentina). Three types of reformulation were modelled: summary reformulation, comprehensive reformulation and productive reformulation. The study was based on a corpus of 621 reformulations rendered from different kinds of text. The versions obtained were categorised according to the following criteria: presence or absence of normative, morphosyntactic and semantic difficulties. Findings show that problems arise particularly with paraphrase and summary writing. Observation showed difficulties concerning punctuation, text cohesion and coherence, and semantic distortion or omission when extracting and/or substituting gist, with limited lexical resources and confusion as to suitability of style/register in writing. The findings in this study match those of earlier, more comprehensive research on the issue and report on problems experienced by a significant number of university students when interacting with both academic texts and others of a general nature. Moreover, they lead to questions, on the one hand, as to the nature of such difficulties, which appear to be production-related problems and indirectly account for inadequate text comprehension, and, on the other, as to the features of university tuition when it comes to text handling.


Relevance: 60.00%

Abstract:

Extracting opinions and emotions from text is becoming increasingly important, especially since the advent of micro-blogging and social networking. Opinion mining is particularly popular and now gathers many public services, datasets and lexical resources. Unfortunately, there are few available lexical and semantic resources for emotion recognition that could foster the development of new emotion-aware services and applications. The diversity of theories of emotion and the absence of a common vocabulary are two of the main barriers to the development of such resources. This situation motivated the creation of Onyx, a semantic vocabulary of emotions with a focus on lexical resources and emotion analysis services. It follows a linguistic Linked Data approach, it is aligned with the Provenance Ontology, and it has been integrated with the Lexicon Model for Ontologies (lemon), a popular RDF model for representing lexical entries. This approach also offers a new and interesting way of working with different theories of emotion. As part of this work, Onyx has been aligned with EmotionML and WordNet-Affect.
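
Although Onyx itself is defined in RDF, the core idea of attaching an emotion node to a lexical entry can be sketched with plain Python triples. All namespaces and property names below (`hasEmotion`, `hasEmotionCategory`, and so on) are illustrative stand-ins, not the actual Onyx vocabulary terms.

```python
# Minimal sketch of an emotion annotation as RDF-style triples, stored as
# plain Python tuples. URIs and property names are hypothetical stand-ins.

ONYX = "http://example.org/onyx#"    # hypothetical namespace
LEMON = "http://example.org/lemon#"  # hypothetical namespace

triples = [
    # a lexical entry linked to a reified emotion node...
    (LEMON + "entry/delighted", "rdf:type", LEMON + "LexicalEntry"),
    (LEMON + "entry/delighted", ONYX + "hasEmotion", ONYX + "emo1"),
    # ...so that category and intensity can hang off the emotion node
    (ONYX + "emo1", ONYX + "hasEmotionCategory", ONYX + "joy"),
    (ONYX + "emo1", ONYX + "hasEmotionIntensity", "0.8"),
]

def objects(subject, predicate, graph):
    """Return all objects for a (subject, predicate) pair."""
    return [o for s, p, o in graph if s == subject and p == predicate]

emotions = objects(LEMON + "entry/delighted", ONYX + "hasEmotion", triples)
print(objects(emotions[0], ONYX + "hasEmotionCategory", triples))
```

Reifying the emotion as its own node, rather than attaching a bare label to the entry, is what lets different theories of emotion coexist: each theory contributes its own category vocabulary without changing the entry itself.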

Relevance: 60.00%

Abstract:

Preliminary research demonstrated the relevance of the EmotiBlog annotated corpus as a Machine Learning resource for detecting subjective data. In this paper we compare EmotiBlog with the JRC Quotes corpus in order to check the robustness of its annotation. We concentrate on its coarse-grained labels and carry out extensive Machine Learning experimentation, also including lexical resources. The results obtained are similar to those obtained with the JRC Quotes corpus, demonstrating the validity of EmotiBlog as a resource for the Sentiment Analysis (SA) task.

Relevance: 60.00%

Abstract:

EmotiBlog is a corpus labelled with the homonymous annotation schema, designed for detecting subjectivity in the new textual genres. Preliminary research demonstrated its relevance as a Machine Learning resource for detecting opinionated data. In this paper we compare EmotiBlog with the JRC corpus in order to check the robustness of the EmotiBlog annotation. For this research we concentrate on its coarse-grained labels. We carry out extensive ML experimentation, also including lexical resources. The results obtained are similar to those obtained with the JRC corpus, demonstrating the validity of EmotiBlog as a resource for the SA task.

Relevance: 60.00%

Abstract:

In the chemical textile domain, experts have to analyse chemical components and substances that might be harmful for their usage in clothing and textiles. Part of this analysis is performed by searching for opinions and reports people have expressed concerning these products on the Social Web. However, this type of information on the Internet is not as frequent for this domain as for others, so its detection and classification is difficult and time-consuming. Consequently, problems associated with the use of chemical substances in textiles may not be detected early enough, and could lead to health problems, such as allergies or burns. In this paper, we propose a framework able to detect, retrieve, and classify subjective sentences related to the chemical textile domain, which could be integrated into a wider health surveillance system. We also describe the creation of several datasets with opinions from this domain, the experiments performed using machine learning techniques and different lexical resources such as WordNet, and the evaluation, focusing on sentiment classification and complaint detection (i.e., negativity). Despite the challenges involved in this domain, our approach obtains promising results, with an F-score of 65% for polarity classification and 82% for complaint detection.
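
The abstract does not specify which classifier underlies the polarity step; a minimal stand-in is a Naive Bayes classifier over word counts, sketched below with invented toy examples in place of the real domain datasets.

```python
import math
from collections import Counter

# Toy training data standing in for the domain opinion datasets described
# above (the real corpora are not reproduced here).
train = [
    ("this dye caused a rash and burns", "neg"),
    ("the fabric smells of chemicals, awful", "neg"),
    ("soft fabric, no irritation at all", "pos"),
    ("great shirt, safe dye, very comfortable", "pos"),
]

classes = {"pos", "neg"}
word_counts = {c: Counter() for c in classes}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for c in classes for w in word_counts[c]}

def classify(text):
    """Pick the class with the highest Laplace-smoothed log-probability."""
    best, best_score = None, float("-inf")
    for c in classes:
        score = math.log(class_counts[c] / sum(class_counts.values()))
        total = sum(word_counts[c].values())
        for w in text.split():
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

print(classify("this dye caused burns"))
```

A real system along the lines described above would replace the toy counts with features drawn from the annotated datasets and from lexical resources such as WordNet, but the scoring logic is the same.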

Relevance: 40.00%

Abstract:

In a group of adult dyslexics word reading and, especially, word spelling are predicted more by what we have called lexical learning (tapped by a paired-associate task with pictures and written nonwords) than by phonological skills. Nonword reading and spelling, instead, are not associated with this task but they are predicted by phonological tasks. Consistently, surface and phonological dyslexics show opposite profiles on lexical learning and phonological tasks. The phonological dyslexics are more impaired on the phonological tasks, while the surface dyslexics are equally or more impaired on the lexical learning tasks. Finally, orthographic lexical learning explains more variation in spelling than in reading, and subtyping based on spelling returns more interpretable results than that based on reading. These results suggest that the quality of lexical representations is crucial to adult literacy skills. This is best measured by spelling and best predicted by a task of lexical learning. We hypothesize that lexical learning taps a uniquely human capacity to form new representations by recombining the units of a restricted set.

Relevance: 30.00%

Abstract:

This study investigates coming-out stories in their organizational, lexical and discursive aspects. It seeks to identify which organizational patterns prevail in the stories, which significant lexical items predominate, and how the predominant lexical items are evaluated in terms of Affect, Judgement and Appreciation. For the organizational analysis, the study draws on the analytical framework of Labov and colleagues (1967; 1972), a pioneering approach to oral narratives of personal experience, and then on the elements of the Problem-Solution Pattern (PSP) proposed by Hoey (1983; 2001), which illuminates the cyclical aspect of these stories. For the lexical analysis, the study relies on the principles and techniques of electronic text investigation from Corpus Linguistics (TOGNINI-BONELLI, 2001; SINCLAIR, 2004; BERBER-SARDINHA, 2004; McENERY and HARDIE, 2011), combined with the WordSmith Tools 5.0 suite (SCOTT, 2010). For the discursive aspect, in particular the language of evaluation, the analysis drew on the interpersonal metafunction of Systemic Functional Linguistics (HALLIDAY, 2004) and on the categories of the ATTITUDE subsystem of Appraisal Theory, proposed by Martin (2000) and Martin and White (2005). The corpus analysed consists of seven narratives, collected through the Narrative Interview method, volunteered by homosexual men between twenty (20) and thirty (30) years of age from the northern zone of Rio de Janeiro. The results of the analysis of narrative organization showed that coming-out stories are episodic, are told with many evaluative resources as described by Labov, and are organized according to the Problem-Solution Pattern. The results of the lexical analysis revealed the predominance of the items "I" and "mother"/"she" across the seven stories collectively.
Finally, the discursive analysis, from the perspective of attitudinal language, shows that the items "I" and "mother"/"she", which point to the narrators and their mothers, are marked by Affect (emotions) and Judgement (behaviour). The dissertation concludes by combining the three lines of analysis to reflect on the social weight of what coming out means for the gay subject.

Relevance: 30.00%

Abstract:

This article argues that a native-speaker baseline is a neglected dimension of studies into second language (L2) performance. If we investigate how learners perform language tasks, we should distinguish which performance features are due to their processing an L2 and which are due to their performing a particular task. Having defined what we mean by "native speaker," we present the background to a research study into the effects of task features on nonnative task performance, designed to include native-speaker data as a baseline for interpreting nonnative-speaker performance. The nonnative results, published in this journal (Tavakoli & Foster, 2008), are recapitulated, and then the native-speaker results are presented and discussed in the light of them. The study is guided by the assumption that limited attentional resources impact on L2 performance and explores how narrative design features, namely complexity of storyline and tightness of narrative structure, affect complexity, fluency, accuracy, and lexical diversity in language. The results show that both native and nonnative speakers are prompted by storyline complexity to use more subordinated language, but narrative structure had different effects on native and nonnative fluency. The learners, who were based in either London or Tehran, did not differ in their performance when compared to each other, except in lexical diversity, where the learners in London were close to native-speaker levels. The implications of the results for the applicability of Levelt's model of speaking to an L2 are discussed, as is the potential for further L2 research using native speakers as a baseline.
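
Of the four measures, lexical diversity is the simplest to make concrete. A plain type-token ratio is sketched below; note that studies of this kind typically use length-corrected measures, since raw TTR falls as texts get longer.

```python
import re

def type_token_ratio(text):
    """Lexical diversity as unique word types over total word tokens.
    Raw TTR is length-sensitive, so it is shown here only to make the
    idea concrete, not as the measure an actual study would report."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

sample = "the dog chased the cat and the cat ran"
print(type_token_ratio(sample))  # 6 types over 9 tokens
```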

Relevance: 30.00%

Abstract:

In the field of Natural Language Processing (NLP), the development of lexical-semantic resources is a pressing need. If NLP systems are conceived as an exercise in human language engineering, the development of such resources can benefit from the knowledge representation models developed in Knowledge Engineering. These models, in particular, provide both the theoretical-methodological framework and the formal metalanguage for the computational treatment of the meaning of lexical units. In this article, after presenting the linguistic-computational conception of the lexicon, we outline the main knowledge representation paradigms, emphasizing the approach to meaning and the formal metalanguage associated with each of them.

Relevance: 30.00%

Abstract:

The Princeton WordNet (WN.Pr) lexical database has motivated efficient compilations of bulky relational lexicons since its inception in the 1980s. The EuroWordNet project, the first multilingual initiative built upon WN.Pr, opened up ways of building individual wordnets and interrelating them by means of the so-called Inter-Lingual Index, an unstructured list of the WN.Pr synsets. Another important initiative, relying on a slightly different method of building multilingual wordnets, is the MultiWordNet project, whose key strategy is building language-specific wordnets while keeping as much as possible of the semantic relations available in WN.Pr. This paper, in particular, stresses that an additional advantage of using the WN.Pr lexical database as a resource for building wordnets for other languages is the possibility of implementing an automatic procedure to map WN.Pr conceptual relations such as hyponymy, co-hyponymy, troponymy, meronymy, cause, and entailment onto the lexical database of the wordnet under construction, a viable possibility, for these are language-independent relations that hold between lexicalized concepts, not between lexical units. Accordingly, combining methods from both initiatives, this paper presents the ongoing implementation of the WN.Br lexical database and the aforementioned automation procedure, illustrated with a sample of the automatic encoding of the hyponymy and co-hyponymy relations.
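
The language-independence argument can be made concrete with a toy projection: copy a hyponymy edge into the target wordnet whenever both of its synsets are aligned through an inter-lingual index. The synset IDs and alignments below are invented for illustration and do not reflect the actual WN.Br data.

```python
# Hypothetical sketch of projecting language-independent hyponymy relations
# from a source wordnet onto a target one via an inter-lingual index.

# hyponymy in the source wordnet: hyponym synset -> hypernym synset
src_hypernym = {"en:dog": "en:canine", "en:cat": "en:feline"}

# inter-lingual alignment: source synset -> target synset
ili = {"en:dog": "br:cão", "en:canine": "br:canídeo", "en:cat": "br:gato"}

def project_hyponymy(src_hypernym, ili):
    """Copy a hyponymy edge into the target wordnet whenever both of its
    endpoints are aligned; edges with an unaligned endpoint are skipped."""
    target = {}
    for hyponym, hypernym in src_hypernym.items():
        if hyponym in ili and hypernym in ili:
            target[ili[hyponym]] = ili[hypernym]
    return target

print(project_hyponymy(src_hypernym, ili))
```

Here the en:cat edge is dropped because en:feline has no aligned synset, which is exactly why such a procedure can only seed, not complete, the target wordnet.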

Relevance: 30.00%

Abstract:

We present a methodology for legacy language resource adaptation that generates domain-specific sentiment lexicons organized around domain entities described with lexical information and sentiment words described in the context of these entities. We explain the steps of the methodology and we give a working example of our initial results. The resulting lexicons are modelled as Linked Data resources by use of established formats for Linguistic Linked Data (lemon, NIF) and for linked sentiment expressions (Marl), thereby contributing and linking to existing Language Resources in the Linguistic Linked Open Data cloud.

Relevance: 30.00%

Abstract:

Language resources, such as multilingual lexica and multilingual electronic dictionaries, contain collections of lexical entries in several languages. Having access to the corresponding explicit or implicit translation relations between such entries might be of great interest for many NLP-based applications. By using Semantic Web-based techniques, translations can be available on the Web to be consumed by other (semantic enabled) resources in a direct manner, not relying on application-specific formats. To that end, in this paper we propose a model for representing translations as linked data, as an extension of the lemon model. Our translation module represents some core information associated to term translations and does not commit to specific views or translation theories. As a proof of concept, we have extracted the translations of the terms contained in Terminesp, a multilingual terminological database, and represented them as linked data. We have made them accessible on the Web both for humans (via a Web interface) and software agents (with a SPARQL endpoint).
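
The central design choice, representing each translation as its own record so that metadata can attach to it rather than to either term, can be sketched with plain Python records. The field names below are illustrative stand-ins, not the actual terms of the proposed lemon translation module.

```python
# Sketch of reifying a translation as its own record so that metadata
# (category, provenance, confidence) can hang off it instead of off either
# term. Field names and values are hypothetical stand-ins.

translations = [
    {
        "source": "es:banco_de_datos",
        "target": "en:databank",
        "category": "directEquivalent",  # hypothetical category label
        "provenance": "Terminesp",
    },
]

def targets_for(source, graph):
    """All target terms recorded as translations of a source term."""
    return [t["target"] for t in graph if t["source"] == source]

print(targets_for("es:banco_de_datos", translations))
```

In the linked-data version, each such record becomes a resource with its own URI, which is what makes the translations queryable through a SPARQL endpoint without any application-specific format.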

Relevance: 30.00%

Abstract:

Derivational morphology proposes meaningful connections between words and is largely unrepresented in lexical databases. This thesis presents a project to enrich a lexical database with morphological links and to evaluate their contribution to disambiguation. A lexical database with sense distinctions was required. WordNet was chosen because of its free availability and widespread use. Its suitability was assessed through critical evaluation with respect to specifications and criticisms, using a transparent, extensible model. The identification of serious shortcomings suggested a portable enrichment methodology, applicable to alternative resources. Although 40% of the most frequent words are prepositions, they have been largely ignored by computational linguists, so addition of prepositions was also required.

The preferred approach to morphological enrichment was to infer relations from phenomena discovered algorithmically. Both existing databases and existing algorithms can capture regular morphological relations, but cannot capture exceptions correctly; neither of them provides any semantic information. Some morphological analysis algorithms are subject to the fallacy that morphological analysis can be performed simply by segmentation. Morphological rules, grounded in observation and etymology, govern associations between and attachment of suffixes and contribute to defining the meaning of morphological relationships. Specifying character substitutions circumvents the segmentation fallacy. Morphological rules are prone to undergeneration, minimised through a variable lexical validity requirement, and to overgeneration, minimised by rule reformulation and by restricting monosyllabic output. Rules take into account the morphology of ancestor languages through co-occurrences of morphological patterns. Multiple rules applicable to an input suffix need their precedence established.

The resistance of prefixations to segmentation has been addressed by identifying linking-vowel exceptions and irregular prefixes. The automatic affix discovery algorithm applies heuristics to identify meaningful affixes and is combined with morphological rules into a hybrid model, fed only with empirical data collected without supervision. Further algorithms apply the rules optimally to automatically pre-identified suffixes and break words into their component morphemes. To handle exceptions, stoplists were created in response to initial errors and fed back into the model through iterative development, leading to 100% precision, contestable only on lexicographic criteria. Stoplist length is minimised by special treatment of monosyllables and reformulation of rules. 96% of words and phrases are analysed. 218,802 directed derivational links have been encoded in the lexicon rather than in the wordnet component of the model, because the lexicon provides the optimal clustering of word senses. Both links and analyser are portable to an alternative lexicon.

The evaluation uses the extended gloss overlaps disambiguation algorithm. The enriched model outperformed WordNet in terms of recall without loss of precision. The failure of all experiments to outperform disambiguation by frequency reflects on WordNet sense distinctions.
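
The role of character substitutions, rule precedence, and the lexical validity requirement can be illustrated with a toy rule set: naive segmentation would reduce "happiness" to the non-word "happi", whereas a substitution rule recovers "happy". The rules and mini-lexicon below are invented for illustration and are not the thesis's actual rule inventory.

```python
# Toy sketch of morphological rules as character substitutions rather than
# naive segmentation. Rules and lexicon are invented examples.

lexicon = {"happy", "happiness", "dark", "darkness", "kind", "kindness"}

# (suffix to remove, replacement) pairs, tried in precedence order:
# the more specific substitution rule must precede plain suffix stripping.
rules = [("iness", "y"), ("ness", "")]

def derive_base(word, lexicon, rules):
    """Apply the first rule whose output is a valid lexical entry.
    Requiring the output to be in the lexicon is a simple form of the
    lexical validity requirement that curbs undergeneration errors."""
    for suffix, replacement in rules:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in lexicon:
                return candidate
    return None

print(derive_base("happiness", lexicon, rules))  # substitution rule fires
print(derive_base("darkness", lexicon, rules))   # plain stripping suffices
```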