1000 resultados para Tractament del llenguatge natural (Informàtica) -- TFC
Resumo:
Desarrollo de un sistema capaz de procesar consultas en lenguaje natural introducidas por el usuario mediante el teclado. El sistema es capaz de responder a consultas en castellano, relacionadas con un dominio de aplicación representado mediante una base de datos relacional.
Resumo:
Aquest projecte tracta la implementació d’una eina gràfica multiplataforma de creació i edició de gramàtiques electròniques per representar el Llenguatge Natural. És una eina per lingüistes i projectes com Spanish FrameNet Project amb la quan poden representar fàcilment transductors en un format més visual, les transicions es representen en forma de “caixes”, i guardar els resultats. S’han implementat varies opcions per crear una eina còmode i personalitzable per l’usuari amb funcionalitats enfocades a les seves necessitats com importar/exportar autòmats des d’una Expressió Regular. Es tracta l’implementació de tots els components que s’han necessitat per crear la GUI així com la seva funcionalitat.
Resumo:
Peer-reviewed
Resumo:
Estudi realitzat a partir d’una estada al Computer Science and Artificial Intelligence Lab, del Massachusetts Institute of Technology, entre 2006 i 2008. La recerca desenvolupada en aquest projecte se centra en mètodes d'aprenentatge automàtic per l'anàlisi sintàctica del llenguatge. Com a punt de partida, establim que la complexitat del llenguatge exigeix no només entendre els processos computacionals associats al llenguatge sinó també entendre com es pot aprendre automàticament el coneixement per a dur a terme aquests processos.
Resumo:
This PhD project aims to study paraphrasing, initially understood as the different ways in which the same content is expressed linguistically. We will go into that concept in depth trying to define and delimit its scope more accurately. In that sense, we also aim to discover which kind of structures and phenomena it covers. Although there exist some paraphrasing typologies, the great majority of them only apply to English, and focus on lexical and syntactic transformations. Our intention is to go further into this subject and propose a paraphrasing typology for Spanish and Catalan combining lexical, syntactic, semantic and pragmatic knowledge. We apply a bottom-up methodology trying to collect evidence of this phenomenon from the data. For this purpose, we are initially using the Spanish Wikipedia as our corpus. The internal structure of this encyclopedia makes it a good resource for extracting paraphrasing examples for our investigation. This empirical approach will be complemented with the use of linguistic knowledge, and by comparing and contrasting our results to previously proposed paraphrasing typologies in order to enlarge the possible paraphrasing forms found in our corpus. The fact that the same content can be expressed in many different ways presents a major challenge for Natural Language Processing (NLP) applications. Thus, research on paraphrasing has recently been attracting increasing attention in the fields of NLP and Computational Linguistics. The results obtained in this investigation would be of great interest in many of these applications.
Resumo:
In this paper we present the theoretical and methodologicalfoundations for the development of a multi-agentSelective Dissemination of Information (SDI) servicemodel that applies Semantic Web technologies for specializeddigital libraries. These technologies make possibleachieving more efficient information management,improving agent–user communication processes, andfacilitating accurate access to relevant resources. Othertools used are fuzzy linguistic modelling techniques(which make possible easing the interaction betweenusers and system) and natural language processing(NLP) techniques for semiautomatic thesaurus generation.Also, RSS feeds are used as “current awareness bulletins”to generate personalized bibliographic alerts.
Resumo:
In this paper we present a description of the role of definitional verbal patterns for the extraction of semantic relations. Several studies show that semantic relations can be extracted from analytic definitions contained in machine-readable dictionaries (MRDs). In addition, definitions found in specialised texts are a good starting point to search for different types of definitions where other semantic relations occur. The extraction of definitional knowledge from specialised corpora represents another interesting approach for the extraction of semantic relations. Here, we present a descriptive analysis of definitional verbal patterns in Spanish and the first steps towards the development of a system for the automatic extraction of definitional knowledge.
Resumo:
By providing a better understanding of paraphrase and coreference in terms of similarities and differences in their linguistic nature, this article delimits what the focus of paraphrase extraction and coreference resolution tasks should be, and to what extent they can help each other. We argue for the relevance of this discussion to Natural Language Processing.
Resumo:
Finding an adequate paraphrase representation formalism is a challenging issue in Natural Language Processing. In this paper, we analyse the performance of Tree Edit Distance as a paraphrase representation baseline. Our experiments using Edit Distance Textual Entailment Suite show that, as Tree Edit Distance consists of a purely syntactic approach, paraphrase alternations not based on structural reorganizations do not find an adequate representation. They also show that there is much scope for better modelling of the way trees are aligned.
Resumo:
In this paper, we present a critical analysis of the state of the art in the definition and typologies of paraphrasing. This analysis shows that there exists no characterization of paraphrasing that is comprehensive, linguistically based and computationally tractable at the same time. The following sets out to define and delimit the concept on the basis of the propositional content. We present a general, inclusive and computationally oriented typology of the linguistic mechanisms that give rise to form variations between paraphrase pairs.
Resumo:
Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Therefore, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyse the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. With this aim in mind, we created the P4P corpus, a new resource which uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analysed in the light of this annotation. The presented experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the paraphrase mechanisms used the most when plagiarising, and (iii) paraphrase mechanisms tend to shorten the plagiarized text. For the first time, the paraphrase mechanisms behind plagiarism have been analysed, providing critical insights for the improvement of automatic plagiarism detection systems.
Resumo:
A crucial step for understanding how lexical knowledge is represented is to describe the relative similarity of lexical items, and how it influences language processing. Previous studies of the effects of form similarity on word production have reported conflicting results, notably within and across languages. The aim of the present study was to clarify this empirical issue to provide specific constraints for theoretical models of language production. We investigated the role of phonological neighborhood density in a large-scale picture naming experiment using fine-grained statistical models. The results showed that increasing phonological neighborhood density has a detrimental effect on naming latencies, and re-analyses of independently obtained data sets provide supplementary evidence for this effect. Finally, we reviewed a large body of evidence concerning phonological neighborhood density effects in word production, and discussed the occurrence of facilitatory and inhibitory effects in accuracy measures. The overall pattern shows that phonological neighborhood generates two opposite forces, one facilitatory and one inhibitory. In cases where speech production is disrupted (e.g. certain aphasic symptoms), the facilitatory component may emerge, but inhibitory processes dominate in efficient naming by healthy speakers. These findings are difficult to accommodate in terms of monitoring processes, but can be explained within interactive activation accounts combining phonological facilitation and lexical competition.
Resumo:
In this paper we present ClInt (Clinical Interview), a bilingual Spanish-Catalan spoken corpus that contains 15 hours of clinical interviews. It consists of audio files aligned with multiple-level transcriptions comprising orthographic, phonetic and morphological information, as well as linguistic and extralinguistic encoding. This is a previously non-existent resource for these languages and it offers a wide-ranging exploitation potential in a broad variety of disciplines such as Linguistics, Natural Language Processing and related fields.
Resumo:
CoCo is a collaborative web interface for the compilation of linguistic resources. In this demo we are presenting one of its possible applications: paraphrase acquisition.
Resumo:
Un cercador documental d'arxius locals, en diferents formats, que permet l'extracció i indexació de lemes en diferents idiomes (principalment català, castellà i anglès) en el que s'habilita la cerca a través d'una pàgina web allotjada en un servidor local.