1000 resultados para Llenguatge natural (Informàtica) -- Processament


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Therefore, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyse the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. With this aim in mind, we created the P4P corpus, a new resource which uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analysed in the light of this annotation. The presented experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the paraphrase mechanisms used the most when plagiarising, and (iii) paraphrase mechanisms tend to shorten the plagiarized text. For the first time, the paraphrase mechanisms behind plagiarism have been analysed, providing critical insights for the improvement of automatic plagiarism detection systems.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A crucial step for understanding how lexical knowledge is represented is to describe the relative similarity of lexical items, and how it influences language processing. Previous studies of the effects of form similarity on word production have reported conflicting results, notably within and across languages. The aim of the present study was to clarify this empirical issue to provide specific constraints for theoretical models of language production. We investigated the role of phonological neighborhood density in a large-scale picture naming experiment using fine-grained statistical models. The results showed that increasing phonological neighborhood density has a detrimental effect on naming latencies, and re-analyses of independently obtained data sets provide supplementary evidence for this effect. Finally, we reviewed a large body of evidence concerning phonological neighborhood density effects in word production, and discussed the occurrence of facilitatory and inhibitory effects in accuracy measures. The overall pattern shows that phonological neighborhood generates two opposite forces, one facilitatory and one inhibitory. In cases where speech production is disrupted (e.g. certain aphasic symptoms), the facilitatory component may emerge, but inhibitory processes dominate in efficient naming by healthy speakers. These findings are difficult to accommodate in terms of monitoring processes, but can be explained within interactive activation accounts combining phonological facilitation and lexical competition.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we present ClInt (Clinical Interview), a bilingual Spanish-Catalan spoken corpus that contains 15 hours of clinical interviews. It consists of audio files aligned with multiple-level transcriptions comprising orthographic, phonetic and morphological information, as well as linguistic and extralinguistic encoding. This is a previously non-existent resource for these languages and it offers a wide-ranging exploitation potential in a broad variety of disciplines such as Linguistics, Natural Language Processing and related fields.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

CoCo is a collaborative web interface for the compilation of linguistic resources. In this demo we are presenting one of its possible applications: paraphrase acquisition.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

El presente trabajo proporciona un nivel de ayuda para los usuarios de aplicaciones sociales, brindándoles un criterio adicional al momento de contactar a otra persona, apoyando al usuario con advertencias sobre algún comportamiento de riesgo de sus contactos y fomentando así la prevención de enfermedades.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

El treball té com a objectiu l'estudi de les propietats semàntiques d'un grup de verbs de desplaçament i els seus corresponents arguments. La informació sobre el tipus de complement que demana cada verb és important de cara a conèixer l'estructura sintàctica de la frase i oferir solucions pràctiques en tasques de Processament del Llenguatge Natural. L'anàlisi se centrarà en els verbs conduir, navegar i volar, a partir dels sentits bàsics que el Diccionari d'ús dels verbs catalans (DUVC) descriu per a cadascun d'aquests verbs i de les seves restriccions selectives. Comprovarem, mitjançant un centenar de frases extretes del Corpus d'Ús del Català a la Web de la Universitat Pompeu Fabra i del Corpus Textual Informatitzat de la Llengua Catalana de l'Institut d'Estudis Catalans, si en la llengua es donen només els sentits i usos descrits en el DUVC i quins són els més freqüents. Finalment, descriurem els noms que fan de nucli dels arguments en termes de trets semàntics.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Herramienta de transformación de sentencias en lenguaje natural en consultas SQL, realización de peticiones a la base de datos y muestra de los resultados.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The automatic acquisition of lexical associations from corpora is a crucial issue for Natural Language Processing. A lexical association is a recurrent combination of words that co-occur together more often than expected by chance in a given domain. In fact, lexical associations define linguistic phenomena such as idiomes, collocations or compound words. Due to the fact that the sense of a lexical association is not compositionnal, their identification is fundamental for the realization of analysis and synthesis that take into account all the subtilities of the language. In this report, we introduce a new statistically-based architecture that extracts from naturally occurring texts contiguous and non contiguous. For that purpose, three new concepts have been defined : the positional N-gram models, the Mutual Expectation and the GenLocalMaxs algorithm. Thus, the initial text is fisrtly transformed in a set of positionnal N-grams i.e ordered vectors of simple lexical units. Then, an association measure, the Mutual Expectation, evaluates the degree of cohesion of each positional N-grams based on the identification of local maximum values of Mutual Expectation. Great efforts have also been carried out to evaluate our metodology. For that purpose, we have proposed the normalisation of five well-known association measures and shown that both the Mutual Expectation and the GenLocalMaxs algorithm evidence significant improvements comparing to existent metodologies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Amb l'evolució de la tecnologia les capacitats de còmput es van incrementant i problemes irresolubles del passat deixen de ser-ho amb els recursos actuals. La majoria d'aplicacions que s'enfronten a aquests problemes són complexes, ja que per aconseguir taxes elevades de rendiment es fa necessari utilitzar el major nombre de recursos possibles, i això les dota d'una arquitectura inherentment distribuïda. Seguint la tendència de la comunitat investigadora, en aquest treball de recerca es proposa una arquitectura per a entorns grids basada en la virtualització de recursos que possibilita la gestió eficient d'aquests recursos. L'experimentació duta a terme ha permès comprovar la viabilitat d'aquesta arquitectura i la millora en la gestió que la utilització de màquines virtuals proporciona.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

For the execution of the scientific applications, different methods have been proposed to dynamically provide execution environments for such applications that hide the complexity of underlying distributed and heterogeneous infrastructures. Recently virtualization has emerged as a promising technology to provide such environments. Virtualization is a technology that abstracts away the details of physical hardware and provides virtualized resources for high-level scientific applications. Virtualization offers a cost-effective and flexible way to use and manage computing resources. Such an abstraction is appealing in Grid computing and Cloud computing for better matching jobs (applications) to computational resources. This work applies the virtualization concept to the Condor dynamic resource management system by using Condor Virtual Universe to harvest the existing virtual computing resources to their maximum utility. It allows existing computing resources to be dynamically provisioned at run-time by users based on application requirements instead of statically at design-time thereby lay the basis for efficient use of the available resources, thus providing way for the efficient use of the available resources.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Actualmente existen muchas aplicaciones paralelas/distribuidas en las cuales SPMD es el paradigma más usado. Obtener un buen rendimiento en una aplicación paralela de este tipo es uno de los principales desafíos dada la gran cantidad de aplicaciones existentes. Este objetivo no es fácil de resolver ya que existe una gran variedad de configuraciones de hardware, y también la naturaleza de los problemas pueden ser variados así como la forma de implementarlos. En consecuencia, si no se considera adecuadamente la combinación "software/hardware" pueden aparecer problemas inherentes a una aplicación iterativa sin una jerarquía de control definida de acuerdo a este paradigma. En SPMD todos los procesos ejecutan el mismo código pero computan una sección diferente de los datos de entrada. Una solución a un posible problema del rendimiento es proponer una estrategia de balance de carga para homogeneizar el cómputo entre los diferentes procesos. En este trabajo analizamos el benchmark CG con cargas heterogéneas con la finalidad de detectar los posibles problemas de rendimiento en una aplicación real. Un factor que determina el rendimiento en esta aplicación es la cantidad de elementos nonzero contenida en la sección de matriz asignada a cada proceso. Determinamos que es posible definir una estrategia de balance de carga que puede ser implementada de forma dinámica y demostramos experimentalmente que el rendimiento de la aplicación puede mejorarse de forma significativa con dicha estrategia.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Es presenten les bases teòriques de la visualització de la informació i els models visuals més estesos actualment amb exemples en el camp de la biblioteconomia ila documentació. Es presenta un model simple per crear visualitzacions amb un exemple demostratiu. S'ofereix una selecció de fonts d'informació del tema per a un coneixement bàsic i per estar al dia.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

En aquest article presentem el projecte ReSim: representació simbòlica de preguntes, que ha consistit en la creació d'un sistema automàtic capaç de representar en un llenguatge formal una pregunta en llenguatge natural. Tenint en compte la complexitat de l'objectiu, el projecte es va centrar en un àmbit temàtic concret. El tipus de representació usada està basada en el model de les Estructures Lèxico-Conceptuals de Jackendoff 1990.