996 resultados para Corpora (Linguistics)


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The field of natural language processing (NLP) has seen a dramatic shift in both research direction and methodology in the past several years. In the past, most work in computational linguistics tended to focus on purely symbolic methods. Recently, more and more work is shifting toward hybrid methods that combine new empirical corpus-based methods, including the use of probabilistic and information-theoretic techniques, with traditional symbolic methods. This work is made possible by the recent availability of linguistic databases that add rich linguistic annotation to corpora of natural language text. Already, these methods have led to a dramatic improvement in the performance of a variety of NLP systems with similar improvement likely in the coming years. This paper focuses on these trends, surveying in particular three areas of recent progress: part-of-speech tagging, stochastic parsing, and lexical semantics.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

For references, please quote the full paper as published in the above journal.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The importance of the new textual genres such as blogs or forum entries is growing in parallel with the evolution of the Social Web. This paper presents two corpora of blog posts in English and in Spanish, annotated according to the EmotiBlog annotation scheme. Furthermore, we created 20 factual and opinionated questions for each language and also the Gold Standard for their answers in the corpus. The purpose of our work is to study the challenges involved in a mixed fact and opinion question answering setting by comparing the performance of two Question Answering (QA) systems as far as mixed opinion and factual setting is concerned. The first one is open domain, while the second one is opinion-oriented. We evaluate separately the two systems in both languages and propose possible solutions to improve QA systems that have to process mixed questions.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The exponential growth of the subjective information in the framework of the Web 2.0 has led to the need to create Natural Language Processing tools able to analyse and process such data for multiple practical applications. They require training on specifically annotated corpora, whose level of detail must be fine enough to capture the phenomena involved. This paper presents EmotiBlog – a fine-grained annotation scheme for subjectivity. We show the manner in which it is built and demonstrate the benefits it brings to the systems using it for training, through the experiments we carried out on opinion mining and emotion detection. We employ corpora of different textual genres –a set of annotated reported speech extracted from news articles, the set of news titles annotated with polarity and emotion from the SemEval 2007 (Task 14) and ISEAR, a corpus of real-life self-expressed emotion. We also show how the model built from the EmotiBlog annotations can be enhanced with external resources. The results demonstrate that EmotiBlog, through its structure and annotation paradigm, offers high quality training data for systems dealing both with opinion mining, as well as emotion detection.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Estudi de l’evolució semàntica de enze, enza i de les unitats fraseològiques en què participa, des de les primeres atestacions escrites fins als usos contemporanis en català. S’hi té en compte l’evolució dels altres derivats romànics del llatí INDEX, -ICIS. S’hi aplica una anàlisi d’orientació cognitiva i es fonamenta l’estudi en l’aprofitament de corpora textuals (antics i contemporanis), dins dels quals hi ha l’obra literària i gramatical d’Enric Valor.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

There is no question nowadays as to the international and powerful status of English at a global scale and, consequently, as to its presence in non-English speaking countries at different levels. Linguistically speaking, English is one of the languages which have mostly influenced Spanish throughout its history and especially from the late 1960s. In this study, the impact of English on Spanish is considered in the language of sports; particularly, sports Anglicisms and false Anglicisms are analysed. Due attention is paid to the different forms that an Anglicism may adopt and to which of those forms are more widely accepted or rejected by prescriptivists and speakers at large, in the light of a contrastive analysis of their appearance in the Nuevo diccionario de anglicismos, the Diccionario de la Real Academia Española and the Corpus de Referencia del Español Actual.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Aquest article presenta una mostra dels resultats de l’anàlisi detallada de locucions, col·locacions i altres elements fraseològics i d’ordre de mots significatius quant a la caracterització del cabal de llenguatge literari de Joan Roís de Corella. Aquesta anàlisi es fa amb metodologia interdisciplinar de base de lingüistica de corpus i de diacronia lingüistica, i amb el concurs de les tecnologies de la informació i la comunicació (humanitats digitals), que s’apliquen a l’anàlisi de l’aportació lèxica i estilística d’un autor clau com és Roís de Corella a fide calibrar el grau de sintonia i, alhora, d’especificitat del seu llenguatge literari; en quin grau coincideix el seu llenguatge literari amb el d’altres grans clàssics culturals de la Corona d’Aragó, i en què basa, alhora, Roís de Corella la clau de la seua mestria estilística.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The aim of this paper is to describe the use that professional translators make of corpora as translation resources. First, we briefly review the literature on translation practitioners’ use of corpora in the contexts of both translation training and professional translation. Then we present our survey-based study, analyse the uptake of corpora among Spanish translators and describe the use of this kind of translation resource. The results show that even if corpora are not as frequently used as other kinds of resources, such as dictionaries, there are professional translators who do use corpora, in a variety of ways, in their work. Additionally, non-users do not seem entirely sceptical about corpora. Against that backdrop, translator trainers are invited to continue to report on how corpora can be used as translation resources.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The reprise evidential conditional (REC) is nowadays not very usual in Catalan: it is restricted to journalistic language and to some very formal genres (such as academic or legal language), it is not present in spontaneous discourse. On the one hand, it has been described among the rather new modality values of the conditional. On the other, the normative tradition tended to reject it for being a gallicism, or to describe it as an unsuitable neologism. Thanks to the extraction from text corpora, we surprisingly find this REC in Catalan from the beginning of the fourteenth century to the contemporary age, with semantic and pragmatic nuances and different evidence of grammaticalization. Due to the current interest in evidentiality, the REC has been widely studied in French, Italian and Portuguese, focusing mainly on its contemporary uses and not so intensively on the diachronic process that could explain the origin of this value. In line with this research, that we initiated studying the epistemic and evidential future in Catalan, our aim is to describe: a) the pragmatic context that could have been the initial point of the REC in the thirteenth century, before we find indisputable attestations of this use; b) the path of semantic change followed by the conditional from a ‘future in the past’ tense to the acquisition of epistemic and evidential values; and c) the role played by invited inferences, subjectification and intersubjectification in this change.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The purpose of this dissertation is to give a contribution to the translation of the terminology of Cycle and Bike Polo into European Portuguese and hence call the attention of a wide Portuguese public to this fairly new sport, whose roots go back to Elephant and Horse Polo in India and in other parts of the world. Sequencing a characterization of technical translation, translation issues of Bike and Cycle Polo´s terminological units have been dealt with in the light of the Cognitive Linguistics framework and hence intimately associated both with physical experiences and historical facts. In fact, sports terminology coinage in this field is highly motivated by metaphorical and metonymical conceptualization mapped from physical reality dimensions, as well as from already existing sports terminology from other sports modalities. In order to render this research unique, a glossary of technical terms from Bike and Cycle Polo has been gathered, since most of them had not yet undergone translation from English into European Portuguese. For validation of my translations I have resorted to Portuguese bike polo players, with special reference to Catarina Almeida, who introduced me to Bike Polo’s terminology.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

When analysing software metrics, users find that visualisation tools lack support for (1) the detection of patterns within metrics; and (2) enabling analysis of software corpora. In this paper we present Explora, a visualisation tool designed for the simultaneous analysis of multiple metrics of systems in software corpora. Explora incorporates a novel lightweight visualisation technique called PolyGrid that promotes the detection of graphical patterns. We present an example where we analyse the relation of subtype polymorphism with inheritance and invocation in corpora of Smalltalk and Java systems and find that (1) subtype polymorphism is more likely to be found in large hierarchies; (2) as class hierarchies grow horizontally, they also do so vertically; and (3) in polymorphic hierarchies the length of the name of the classes is orthogonal to the cardinality of the call sites.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Mode of access: Internet.