47 resultados para Language processing

em Consorci de Serveis Universitaris de Catalunya (CSUC), Spain


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Desarrollo de un sistema capaz de procesar consultas en lenguaje natural introducidas por el usuario mediante el teclado. El sistema es capaz de responder a consultas en castellano, relacionadas con un dominio de aplicación representado mediante una base de datos relacional.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Language Resources are a critical component for Natural Language Processing applications. Throughout the years many resources were manually created for the same task, but with different granularity and coverage information. To create richer resources for a broad range of potential reuses, nformation from all resources has to be joined into one. The hight cost of comparing and merging different resources by hand has been a bottleneck for merging existing resources. With the objective of reducing human intervention, we present a new method for automating merging resources. We have addressed the merging of two verbs subcategorization frame (SCF) lexica for Spanish. The results achieved, a new lexicon with enriched information and conflicting information signalled, reinforce our idea that this approach can be applied for other task of NLP.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Language switching is omnipresent in bilingual individuals. In fact, the ability to switch languages (code switching) is a very fast, efficient, and flexible process that seems to be a fundamental aspect of bilingual language processing. In this study, we aimed to characterize psychometrically self-perceived individual differences in language switching and to create a reliable measure of this behavioral pattern by introducing a bilingual switching questionnaire. As a working hypothesis based on the previous literature about code switching, we decomposed language switching into four constructs: (i) L1 switching tendencies (the tendency to switch to L1; L1-switch); (ii) L2 switching tendencies (L2-switch); (iii) contextual switch, which indexes the frequency of switches usually triggered by a particular situation, topic, or environment; and (iv) unintended switch, which measures the lack of intention and awareness of the language switches. A total of 582 SpanishCatalan bilingual university students were studied. Twelve items were selected (three for each construct). The correlation matrix was factor-analyzed using minimum rank factor analysis followed by oblique direct oblimin rotation. The overall proportion of common variance explained by the four extracted factors was 0.86. Finally, to assess the external validity of the individual differences scored with the new questionnaire, we evaluated the correlations between these measures and several psychometric (language proficiency) and behavioral measures related to cognitive and attentional control. The present study highlights the importance of evaluating individual differences in language switching using self-assessment instruments when studying the interface between cognitive control and bilingualism.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This PhD project aims to study paraphrasing, initially understood as the different ways in which the same content is expressed linguistically. We will go into that concept in depth trying to define and delimit its scope more accurately. In that sense, we also aim to discover which kind of structures and phenomena it covers. Although there exist some paraphrasing typologies, the great majority of them only apply to English, and focus on lexical and syntactic transformations. Our intention is to go further into this subject and propose a paraphrasing typology for Spanish and Catalan combining lexical, syntactic, semantic and pragmatic knowledge. We apply a bottom-up methodology trying to collect evidence of this phenomenon from the data. For this purpose, we are initially using the Spanish Wikipedia as our corpus. The internal structure of this encyclopedia makes it a good resource for extracting paraphrasing examples for our investigation. This empirical approach will be complemented with the use of linguistic knowledge, and by comparing and contrasting our results to previously proposed paraphrasing typologies in order to enlarge the possible paraphrasing forms found in our corpus. The fact that the same content can be expressed in many different ways presents a major challenge for Natural Language Processing (NLP) applications. Thus, research on paraphrasing has recently been attracting increasing attention in the fields of NLP and Computational Linguistics. The results obtained in this investigation would be of great interest in many of these applications.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

En el treball es realitza una transcripció de dos programes de televisió, amb la idea de saber quin és el tipus de llenguatge que usen aquests mitjans per adreçar-se al seu públic. Però seria absurd ignorar altres canals per als quals la llengua és imprescindible. Em refereixo al cinema, sobretot. I malgrat que no es considera un mitjà de comunicació, també és un element importantíssim pel que fa al tractament i transmissió lingüístics. I molts productes del cinema acaben sortint per televisió. La premsa escrita i, com a cas especial, Internet, també hi tenen força a dir.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

El treball té com a objectiu l'estudi de les propietats semàntiques d'un grup de verbs de desplaçament i els seus corresponents arguments. La informació sobre el tipus de complement que demana cada verb és important de cara a conèixer l'estructura sintàctica de la frase i oferir solucions pràctiques en tasques de Processament del Llenguatge Natural. L'anàlisi se centrarà en els verbs conduir, navegar i volar, a partir dels sentits bàsics que el Diccionari d'ús dels verbs catalans (DUVC) descriu per a cadascun d'aquests verbs i de les seves restriccions selectives. Comprovarem, mitjançant un centenar de frases extretes del Corpus d'Ús del Català a la Web de la Universitat Pompeu Fabra i del Corpus Textual Informatitzat de la Llengua Catalana de l'Institut d'Estudis Catalans, si en la llengua es donen només els sentits i usos descrits en el DUVC i quins són els més freqüents. Finalment, descriurem els noms que fan de nucli dels arguments en termes de trets semàntics.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Peer-reviewed

Relevância:

60.00% 60.00%

Publicador:

Resumo:

El presente trabajo proporciona un nivel de ayuda para los usuarios de aplicaciones sociales, brindándoles un criterio adicional al momento de contactar a otra persona, apoyando al usuario con advertencias sobre algún comportamiento de riesgo de sus contactos y fomentando así la prevención de enfermedades.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper analyzes and evaluates, in the context of Ontology learning, some techniques to identify and extract candidate terms to classes of a taxonomy. Besides, this work points out some inconsistencies that may be occurring in the preprocessing of text corpus, and proposes techniques to obtain good terms candidate to classes of a taxonomy.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper we present the theoretical and methodologicalfoundations for the development of a multi-agentSelective Dissemination of Information (SDI) servicemodel that applies Semantic Web technologies for specializeddigital libraries. These technologies make possibleachieving more efficient information management,improving agent–user communication processes, andfacilitating accurate access to relevant resources. Othertools used are fuzzy linguistic modelling techniques(which make possible easing the interaction betweenusers and system) and natural language processing(NLP) techniques for semiautomatic thesaurus generation.Also, RSS feeds are used as “current awareness bulletins”to generate personalized bibliographic alerts.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Lexical Resources are a critical component for Natural Language Processing applications. However, the high cost of comparing and merging different resources has been a bottleneck to have richer resources with a broad range of potential uses for a significant number of languages.With the objective of reducing cost byeliminating human intervention, we present a new method for automating the merging of resources,with special emphasis in what we call the mapping step. This mapping step, which converts the resources into a common format that allows latter the merging, is usually performed with huge manual effort and thus makes the whole process very costly. Thus, we propose a method to perform this mapping fully automatically. To test our method, we have addressed the merging of two verb subcategorization frame lexica for Spanish, The resultsachieved, that almost replicate human work, demonstrate the feasibility of the approach.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Lexical Resources are a critical component for Natural Language Processing applications. However, the high cost of comparing and merging different resources has been a bottleneck to obtain richer resources and a broader range of potential uses for a significant number of languages. With the objective of reducing cost by eliminating human intervention, we present a new method towards the automatic merging of resources. This method includes both, the automatic mapping of resources involved to a common format and merging them, once in this format. This paper presents how we have addressed the merging of two verb subcategorization frame lexica for Spanish, but our method will be extended to cover other types of Lexical Resources. The achieved results, that almost replicate human work, demonstrate the feasibility of the approach.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This research investigates the phenomenon of translationese in two monolingual comparable corpora of original and translated Catalan texts. Translationese has been defined as the dialect, sub-language or code of translated language. This study aims at giving empirical evidence of translation universals regardless the source language.Traditionally, research conducted on translation strategies has been mainly intuition-based. Computational Linguistics and Natural Language Processing techniques provide reliable information of lexical frequencies, morphological and syntactical distribution in corpora. Therefore, they have been applied to observe which translation strategies occur in these corpora.Results seem to prove the simplification, interference and explicitation hypotheses, whereas no sign of normalization has been detected with the methodology used.The data collected and the resources created for identifying lexical, morphological and syntactic patterns of translations can be useful for Translation Studies teachers, scholars and students: teachers will have more tools to help students avoid the reproduction of translationese patterns. Resources developed will help in detecting non-genuine or inadequate structures in the target language. This fact may imply an improvement in stylistic quality in translations. Translation professionals can also take advantage of these resources to improve their translation quality.