18 results for Lexical Semantics

at Universidad de Alicante


Relevance: 60.00%

Abstract:

This paper addresses the problem of automatically recognizing and classifying temporal expressions and events in human language. Performing these tasks effectively is crucial if the broader task of temporal information processing is to succeed. We analyze whether applying semantic knowledge to these tasks improves the performance of current approaches. To this end, we present and evaluate a data-driven approach implemented in a system, TIPSem. Our approach uses lexical semantics and semantic roles as additional information to extend classical approaches, which are based mainly on morphosyntax. The results obtained for English show that semantic knowledge aids temporal expression and event recognition, achieving error reductions of 59% and 21%, respectively, while its contribution to classification is limited. From the analysis of the results, it may be concluded that applying semantic knowledge leads to more general models and aids the recognition of temporal entities that are ambiguous at shallower levels of language analysis. We also found that lexical semantics and semantic roles have complementary advantages and that it is useful to combine them. Finally, we carried out the same analysis for Spanish and obtained comparable advantages. This supports the hypothesis that the proposed semantic knowledge may be useful across different languages.
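
The abstract reports error reductions without stating how they are computed. A common reading, assumed here rather than taken from the paper, is relative error reduction with error defined as 1 − F1, i.e. the share of the baseline's error that the new system removes. A minimal Python sketch with made-up scores:

```python
def relative_error_reduction(baseline_f1: float, new_f1: float) -> float:
    """Fraction of the baseline's error (assumed to be 1 - F1) removed by the new system."""
    baseline_error = 1.0 - baseline_f1
    new_error = 1.0 - new_f1
    if baseline_error == 0:
        raise ValueError("baseline has no error to reduce")
    return (baseline_error - new_error) / baseline_error


# Hypothetical scores, not taken from the paper: F1 rises from 0.80 to 0.90.
print(round(relative_error_reduction(0.80, 0.90), 2))  # 0.5
```

Under this reading, a 59% reduction would mean that the semantic features removed roughly three-fifths of the baseline's recognition errors.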

Relevance: 30.00%

Abstract:

One of the most important factors of recognition, belonging and identification in scientific communities is their specialized language: doctors, mathematicians and anthropologists feel they are part of a group with which they can interact because they share a common “language”. While ideology is present in all academic registers, it is in the human sciences that its presence (or absence) leads to the most visible linguistic phenomena. An interesting example is that of lesbian studies: as non-heterosexual members of society have become less stigmatized, lesbian studies have developed a language of their own. In this paper, we explore the mechanisms used to create specific vocabulary in this academic area, paying special attention to the refashioning or deconstruction of the meaning of established terms as a result of changes in social perception or challenges to predetermined meanings.

Relevance: 20.00%

Abstract:

Preliminary research demonstrated the relevance of the EmotiBlog annotated corpus as a Machine Learning resource for detecting subjective data. In this paper, we compare EmotiBlog with the JRC Quotes corpus to check the robustness of its annotation. We concentrate on its coarse-grained labels and carry out thorough Machine Learning experiments, including experiments that incorporate lexical resources. The results are similar to those obtained with the JRC Quotes corpus, demonstrating the validity of EmotiBlog as a resource for the Sentiment Analysis (SA) task.

Relevance: 20.00%

Abstract:

In this paper, we present a complete Natural Language Processing (NLP) system for Spanish. The core of this system is the parser, which uses the Lexical-Functional Grammar (LFG) formalism. Another important component of the system is the anaphora resolution module. To resolve anaphora, this module uses a method based on linguistic information (lexical, morphological, syntactic and semantic), structural information (the anaphoric accessibility space within which the anaphor finds its antecedent) and statistical information. The method is based on constraints and preferences and resolves both pronouns and definite descriptions. Moreover, the system handles the features of both dialogue and non-dialogue discourse. The anaphora resolution module draws on several resources, such as a lexical database (Spanish WordNet) that provides semantic information and a part-of-speech (POS) tagger that provides the part of speech and root of each word, which makes the resolution process easier.
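
The constraints-and-preferences strategy can be illustrated with a toy resolver: hard constraints first discard incompatible antecedent candidates, then soft preferences rank the survivors. The `Mention` fields, the gender/number agreement check, the sentence window standing in for the accessibility space, and the preference weights are all hypothetical choices for illustration, not the rules of the system described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    text: str
    gender: str       # hypothetical encoding: "m" or "f"
    number: str       # hypothetical encoding: "sg" or "pl"
    sentence: int     # index of the sentence containing the mention
    is_subject: bool  # whether the mention is a grammatical subject


def resolve(anaphor: Mention, candidates: List[Mention], window: int = 2) -> Optional[Mention]:
    """Pick an antecedent for `anaphor`; `candidates` must be in textual order."""
    # Constraints: drop candidates that disagree in gender/number or lie outside
    # a sentence window (a crude stand-in for the accessibility space).
    viable = [
        (i, c) for i, c in enumerate(candidates)
        if c.gender == anaphor.gender
        and c.number == anaphor.number
        and 0 <= anaphor.sentence - c.sentence <= window
    ]
    if not viable:
        return None

    # Preferences: soft scores over the survivors (weights are arbitrary).
    def score(c: Mention) -> int:
        return (2 if c.is_subject else 0) + (1 if c.sentence == anaphor.sentence else 0)

    # Highest score wins; ties go to the later (closer) candidate.
    _, best = max(viable, key=lambda pair: (score(pair[1]), pair[0]))
    return best


juan = Mention("Juan", "m", "sg", 0, True)
libro = Mention("el libro", "m", "sg", 0, False)
maria = Mention("María", "f", "sg", 0, False)
el = Mention("él", "m", "sg", 1, True)
print(resolve(el, [juan, libro, maria]).text)  # Juan
```

Separating filtering from ranking keeps the hard linguistic restrictions independent of the heuristic weights, so either part can be tuned without touching the other.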

Relevance: 20.00%

Abstract:

EmotiBlog is a corpus labelled with the annotation schema of the same name, designed for detecting subjectivity in new textual genres. Preliminary research demonstrated its relevance as a Machine Learning resource for detecting opinionated data. In this paper, we compare EmotiBlog with the JRC corpus to check the robustness of the EmotiBlog annotation. For this research, we concentrate on its coarse-grained labels and carry out thorough ML experiments, including experiments that incorporate lexical resources. The results are similar to those obtained with the JRC corpus, demonstrating the validity of EmotiBlog as a resource for the Sentiment Analysis (SA) task.

Relevance: 20.00%

Abstract:

One of the current problems in the health domain is reusing and sharing clinical information among professionals, since this information is written using specific terminologies. One possible solution is to use a common knowledge resource onto which existing information can be mapped. Our goal is to test whether adding shallow semantic knowledge can improve the established mappings. To this end, we experiment with a set of NANDA-I labels and a set of SNOMED-CT descriptions in Spanish. The experimental results show that including shallow semantic knowledge significantly improves the lexical mapping between the two resources studied.
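
One way to picture adding shallow semantic knowledge to a lexical mapping is to expand each term's words with synonyms before comparing them. The Python sketch below scores candidate descriptions by Jaccard overlap of word sets; the tokenizer, the synonym table and the example terms are hypothetical and are not taken from NANDA-I, SNOMED-CT or the authors' method.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def tokens(text: str) -> Set[str]:
    # Naive lowercase whitespace tokenization.
    return set(text.lower().split())


def expand(words: Set[str], synonyms: Dict[str, Set[str]]) -> Set[str]:
    # Shallow semantic step: add every listed synonym of every word.
    out = set(words)
    for w in words:
        out |= synonyms.get(w, set())
    return out


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(label: str, descriptions: Iterable[str],
               synonyms: Optional[Dict[str, Set[str]]] = None) -> Tuple[float, str]:
    # Return (score, description) for the highest-scoring description.
    synonyms = synonyms or {}
    src = expand(tokens(label), synonyms)
    return max((jaccard(src, expand(tokens(d), synonyms)), d) for d in descriptions)


descriptions = ["intense pain", "high fever"]
syn = {"acute": {"intense"}, "intense": {"acute"}}
print(best_match("acute pain", descriptions))       # (0.3333333333333333, 'intense pain')
print(best_match("acute pain", descriptions, syn))  # (1.0, 'intense pain')
```

With purely lexical matching, only the shared word "pain" counts; the synonym table lets the two terms match fully.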

Relevance: 20.00%

Abstract:

This article analyzes the solutions adopted in Spanish translations for the morphological creativity displayed in the names of Marvel comic book characters. The English versions almost invariably provide a full description of the hero (or villain) through a wide variety of word-formation mechanisms, resulting in highly expressive charactonyms. Examples are given of comic book heroes' names created through compounding; derivation, including prefixation and suffixation (with affixes of classical and Anglo-Saxon origin, among others); lexical blending; abbreviation; clipping; onomatopoeia; and borrowing from Spanish or other languages. Early Spanish translations seem slightly less expressive than the originals, even when the same word-formation mechanism was used, usually because of either transparency problems, mainly in some of the word constituents, or translation constraints. In later periods, a number of factors, including the influence of other media featuring the same characters and the general trend towards globalization through English, have led translators to choose repetition as the most frequent strategy. This has almost eliminated the creative power of word-formation mechanisms in Spanish and their ability to convey the stylistic effects found in the English versions.

Relevance: 20.00%

Abstract:

Unlike traditional approaches, new communicative trends disregard the role of word-formation mechanisms. They tend to focus on syntax and/or vocabulary without analyzing the mechanisms involved in creating lexical items. In this paper, based on an analysis of the prefixes used by L2 learners in the oral and written productions collected in the SULEC, we emphasize the advantages that word-formation awareness and knowledge may have for learners in terms of production, creativity, understanding, autonomy and proficiency. Through word-formation instruction, learners may more easily decipher, decode and/or encode messages, create words they have never seen before, and so on.

Relevance: 20.00%

Abstract:

In the chemical textile domain, experts must analyse chemical components and substances that might be harmful when used in clothing and textiles. Part of this analysis involves searching the Social Web for opinions and reports that people have expressed about these products. However, such information is less frequent on the Internet for this domain than for others, so detecting and classifying it is difficult and time-consuming. Consequently, problems associated with the use of chemical substances in textiles may not be detected early enough and could lead to health problems such as allergies or burns. In this paper, we propose a framework able to detect, retrieve and classify subjective sentences related to the chemical textile domain, which could be integrated into a wider health surveillance system. We also describe the creation of several datasets of opinions from this domain, the experiments performed using machine learning techniques and different lexical resources such as WordNet, and an evaluation focusing on sentiment classification and complaint detection (i.e., negativity). Despite the challenges of this domain, our approach obtains promising results, with an F-score of 65% for polarity classification and 82% for complaint detection.
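
For reference, the F-score combines precision and recall; the usual F1 is their harmonic mean. The abstract does not say which variant was used, so F1 is assumed here. A small Python helper for the general F-beta measure, with hypothetical counts (not the paper's data):

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: precision 0.8, recall 8/12; F1 = 16/22.
print(round(f_score(tp=8, fp=2, fn=4), 2))  # 0.73
```

Setting `beta` above 1 weights recall more heavily, which may suit complaint detection, where missing a complaint is arguably costlier than a false alarm.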

Relevance: 20.00%

Abstract:

The English language and the Internet, both separately and together, are nowadays widely acknowledged as powerful forces that influence the lexico-grammatical characteristics of other languages worldwide. In fact, many authors, such as Crystal (2004), have pointed out the emergence of so-called Netspeak, that is, the language used on the Net or World Wide Web; as Crystal himself (2004: 19) puts it, ‘a type of language displaying features that are unique to the Internet […] arising out of its character as a medium which is electronic, global and interactive’. This ‘language’, however, may be understood in two ways: either as an adaptation of the English language proper to Internet requirements and purposes, or as a new, rapidly changing and developing language resulting from the rapid adaptation of almost all world languages to Internet requirements, with English as the trendsetter. If the second, and probably more plausible, interpretation is adopted, ‘Netspeak’ has three salient features: (a) the rapid spread of all its new linguistic developments through the Internet itself, which may lead to the generalization and widespread acceptance of new words, coinages or meanings hundreds of times faster than with the printed media; (b) as noted above, the visible influence of English, the most prevalent language on the Internet; and, consequently, (c) a tendency to reduce the ‘distance’ between English and other languages, as well as the unfamiliarity of speakers of other languages with English, since the ‘Netspeak’ version of those languages adopts grammatical, syntactic and lexical features of English. Linguistic differences may even disappear when code-switching and/or borrowing occurs, as whole fragments of English appear in other language contexts.
As a consequence of this new situation, an ideal context emerges for interlanguage or multilingual word formation to thrive: puns, blends, compounds and word creativity in general find on the Web the ideal place to gain rapid worldwide acceptance, whether through fashion, coincidence or the sheer merit of the new linguistic proposals.

Relevance: 20.00%

Abstract:

The vast amount of text produced every day on the Web has made it one of the main sources of linguistic corpora, which are then analyzed with Natural Language Processing techniques. Globally, languages such as Portuguese (official in nine countries) appear on the Web in several varieties, with lexical, morphological and syntactic differences, among others. In addition, a unified spelling system for Portuguese has recently been approved, and its implementation has already begun in some countries. However, the transition will last several years, so different varieties and spelling systems coexist. Since PoS taggers for Portuguese are built for a specific variety, this work analyzes different combinations of training corpora and lexica aimed at building a model that annotates with high precision across several varieties and spelling systems of the language. Moreover, this paper presents several dictionaries of the new orthography (the Spelling Agreement), as well as a new, freely available test corpus containing different varieties and text types.

Relevance: 20.00%

Abstract:

One of the main challenges to be addressed in text summarization is the detection of redundant information. This paper presents a detailed analysis of three methods for achieving this goal. The proposed methods rely on different levels of language analysis: lexical, syntactic and semantic. They are also analyzed for detecting relevance in texts. The results show that semantic-based methods are able to detect up to 90% of redundancy, compared with only 19% for lexical-based ones. This difference is also reflected in the quality of the generated summaries: better summaries are obtained when syntactic- or semantic-based approaches are used to remove redundancy.
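
As an illustration of the lexical end of this spectrum, a greedy filter can drop any sentence whose bag-of-words cosine similarity to an already selected sentence reaches a threshold. This Python sketch, including the 0.5 threshold and whitespace tokenization, is an assumption for illustration and not one of the three methods evaluated in the paper. Because it matches only surface words, it will miss paraphrases, which is where syntactic and semantic methods are expected to help.

```python
import math
from collections import Counter
from typing import List


def cosine(a: str, b: str) -> float:
    # Cosine similarity between bag-of-words count vectors.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[word] for word, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def remove_redundant(sentences: List[str], threshold: float = 0.5) -> List[str]:
    # Greedy pass in input order: keep a sentence only if it is less similar
    # than `threshold` to every sentence already kept.
    kept: List[str] = []
    for s in sentences:
        if all(cosine(s, k) < threshold for k in kept):
            kept.append(s)
    return kept


docs = ["the cat sat on the mat", "the cat sat on a mat", "stocks fell sharply today"]
print(remove_redundant(docs))  # ['the cat sat on the mat', 'stocks fell sharply today']
```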

Relevance: 20.00%

Abstract:

Because of their polyphonic character, many contemporary novels pose lexical problems for the translator by mixing standard vocabulary, slang and technical terms. The question then arises whether dictionaries can be of use to the practitioner. We will see that, for theoretical and practical reasons, the help they provide is limited; a truly useful dictionary would have to change its conceptual assumptions, and therefore become a cultural dictionary, and adopt an electronic form.

Relevance: 20.00%

Abstract:

The article presents research that analyzes, from a lexicometric and factorial perspective, the most relevant linguistic and paralinguistic aspects of the synchronous digital writing of Spanish adolescents in one of the most widely used instant messaging programs today (WhatsApp©). Writing on mobile digital devices (smartphones and tablets) is one of the most common activities in our society and constitutes an essential component of communicative competence in the Information Society. Digital communication is part of our lives, and the analysis of ubiquitous digital communicative use with devices and programs has broad social, linguistic and pedagogical implications. The research was based on a sample of 417 WhatsApp conversations by secondary school students aged 13 to 16 in four Spanish provinces. A quantitative methodology was used, first to carry out a lexicometric analysis of the linguistic-digital corpus with reference to the most relevant linguistic and paralinguistic elements, and then to analyze the correlations between different independent variables that explain linguistic and usage patterns in digital writing. The results show that digital writing in this type of program has a set of specific orthotypographic and audiovisual characteristics conditioned by usage variables, the screen size of the device, the hours of conversation and the relationship between the interlocutors.

Relevance: 20.00%

Abstract:

This article presents a sample of the results of a detailed analysis of idioms, collocations and other phraseological and word-order elements that are significant for characterizing the stock of literary language of Joan Roís de Corella. The analysis follows an interdisciplinary methodology grounded in corpus linguistics and diachronic linguistics, supported by information and communication technologies (digital humanities). These are applied to the lexical and stylistic contribution of a key author such as Roís de Corella in order to gauge both how attuned his literary language is to, and how distinct it is from, that of other great cultural classics of the Crown of Aragon, and on what Roís de Corella bases the key to his stylistic mastery.