958 results for computational linguistics
Abstract:
The great linguistic and cultural diversity present in our society has often not been very visible. Certain more conspicuous aspects, such as nationality, impose stereotypes that hide complex realities. Moreover, an assimilationist attitude persists, based on the idea that integration must mean giving up a large part of newcomers' cultural heritage. However, Catalonia has a language that is very little known in other societies, and so the awareness that it is important to recognise the other is very common here: nationality says too little about us, and the same is true of many others. This research project aims to gather, in the form of fact sheets, a set of very basic data, especially on linguistic diversity, and to offer them to anyone who is in contact with immigrants, so as to provide a simple resource for expressing recognition of and sensitivity towards diversity.
Abstract:
Finding an adequate paraphrase representation formalism is a challenging issue in Natural Language Processing. In this paper, we analyse the performance of Tree Edit Distance as a paraphrase representation baseline. Our experiments using the Edit Distance Textual Entailment Suite show that, because Tree Edit Distance is a purely syntactic approach, paraphrase alternations not based on structural reorganizations do not find an adequate representation. They also show that there is much scope for better modelling of the way trees are aligned.
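To make the notion of Tree Edit Distance concrete, here is a minimal sketch in Python. It is not the implementation used in the Edit Distance Textual Entailment Suite; it assumes ordered, labelled trees, unit costs for insertion, deletion and relabelling, and a hypothetical tuple encoding `(label, children)` chosen only for illustration.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), children being a tuple of trees.
# A forest is a tuple of trees.

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f and not g:
        return 0
    if not f:
        # Insert the rightmost root of g; its children take its place.
        _, kids = g[-1]
        return forest_distance(f, g[:-1] + kids) + 1
    if not g:
        # Delete the rightmost root of f; its children take its place.
        _, kids = f[-1]
        return forest_distance(f[:-1] + kids, g) + 1
    (label_f, kids_f), (label_g, kids_g) = f[-1], g[-1]
    return min(
        forest_distance(f[:-1] + kids_f, g) + 1,  # delete rightmost root of f
        forest_distance(f, g[:-1] + kids_g) + 1,  # insert rightmost root of g
        # Match the two rightmost roots (relabel if labels differ),
        # then solve their subtrees and the remaining forests separately.
        forest_distance(kids_f, kids_g)
        + forest_distance(f[:-1], g[:-1])
        + (label_f != label_g),
    )

def tree_edit_distance(t1, t2):
    """Edit distance between two single trees."""
    return forest_distance((t1,), (t2,))

# Toy parse trees: the second lacks the object NP under VP.
t1 = ("S", (("NP", ()), ("VP", (("V", ()), ("NP", ())))))
t2 = ("S", (("NP", ()), ("VP", (("V", ()),))))
print(tree_edit_distance(t1, t2))
```

Because the distance only counts node operations on the trees, two sentences that differ in wording but share a tree shape are separated only by relabelling costs, which is one way to read the abstract's remark that the approach is purely syntactic.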
Abstract:
In this paper, we present a critical analysis of the state of the art in the definition and typologies of paraphrasing. This analysis shows that there exists no characterization of paraphrasing that is comprehensive, linguistically based and computationally tractable at the same time. We then set out to define and delimit the concept on the basis of propositional content. We present a general, inclusive and computationally oriented typology of the linguistic mechanisms that give rise to form variations between paraphrase pairs.
Abstract:
This document describes some of the technological aspects of a project devoted to the creation of a factory for language resources. The project's objectives are explained, as well as the idea of creating a distributed infrastructure of web services. This document focuses on two main topics of the factory: (1) the technological approaches chosen to develop the factory, i.e. software, protocols, servers, etc., and (2) interoperability, as the main challenge is to allow different NLP tools to work together in the factory. This document explains why XCES and GrAF were chosen as the main formats for linguistic data exchange.
Abstract:
This paper demonstrates a novel distributed architecture to facilitate the acquisition of Language Resources. We build a factory that automates the stages involved in the acquisition, production, updating and maintenance of these resources. The factory is designed as a platform where functionalities are deployed as web services, which can be combined in complex acquisition chains using workflows. We show a case study, which acquires a Translation Memory for a given pair of languages and a domain using web services for crawling, sentence alignment and conversion to TMX.
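The last step of the case study, conversion to TMX, can be sketched as follows. This is only an illustrative sketch, not the factory's web service: the function name `pairs_to_tmx` and the header values (`creationtool`, `o-tmf`, and so on) are placeholders assumed here, and the input is assumed to be sentence pairs already produced by an alignment step.

```python
import xml.etree.ElementTree as ET

# Qualified name that ElementTree serializes as the attribute xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def pairs_to_tmx(pairs, src_lang, tgt_lang):
    """Serialize aligned (source, target) sentence pairs as a TMX 1.4 string."""
    tmx = ET.Element("tmx", version="1.4")
    # Header attribute values below are placeholders for illustration.
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        # One translation unit per aligned pair, one variant per language.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")

# Hypothetical output of the alignment step.
aligned = [("Good morning.", "Buenos días."), ("Thank you.", "Gracias.")]
print(pairs_to_tmx(aligned, "en", "es"))
```

In a workflow like the one described, a function of this kind would sit behind a web service that receives the aligner's output, so that crawling, alignment and conversion can be chained without the steps knowing about each other's internals.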
Abstract:
In recent decades, technological advances have made extensive documentation available to us. But the philologist must be aware of the dangers of poor use of the documentary corpus in order to avoid creating dreaded ghost words. In this paper we recall the main sources of this type of error: folk etymology phenomena among speakers, copyists' errors, transcribers' errors in the interpretation of some abbreviations and graphic variants of the manuscripts, onomastic changes introduced by cartographers' ignorance of linguistic variants, gaps in the dating of some documents, confusion in the processes of lemmatization and the evaluation of texts... All these sources of error contribute, to a greater or lesser degree, to the distortion or masking of the data on which the research of philologists is based. Hence the importance of philological rigour in the transmission and study of ancient texts.
Abstract:
Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Therefore, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyse the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. With this aim in mind, we created the P4P corpus, a new resource which uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analysed in the light of this annotation. The presented experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the paraphrase mechanisms used the most when plagiarising, and (iii) paraphrase mechanisms tend to shorten the plagiarized text. For the first time, the paraphrase mechanisms behind plagiarism have been analysed, providing critical insights for the improvement of automatic plagiarism detection systems.
Abstract:
Within the language sciences, Critical Discourse Analysis (CDA) draws on the contributions of recent studies of text (discourse analysis, pragmatics, sociolinguistics, ethnography of communication, enunciation theory, etc.) to define the aims and methodology of critical analysis. In the multicultural and globalised world in which we live, the only possible educational response is the need to educate citizens who have critical reading, writing and thinking skills and who participate constructively in the development of a plural, respectful and progressive community. The article explores critical reading in theory and in practice, in order to reflect on the need to foster this type of reading practice in the classroom. To exemplify how critical reading works, some brief fragments of discourse are analysed linguistically from a CDA perspective. The article analyses the degree and type of critical reading comprehension shown in 25 exams by 20-year-old Spanish university students of Translation and Interpreting at the Universidad Pompeu Fabra in Barcelona, who had received 80 hours of instruction on written language and discourse analysis in Spanish and Catalan (with several sessions on Critical Discourse Analysis), during which texts similar to the one set in the exam had been analysed cooperatively.
Abstract:
This article introduces EsPal: a Web-accessible repository containing a comprehensive set of properties of Spanish words. EsPal is based on an extensible set of data sources, beginning with a 300 million token written database and a 460 million token subtitle database. Properties available include word frequency, orthographic structure and neighborhoods, phonological structure and neighborhoods, and subjective ratings such as imageability. Subword structure properties are also available in terms of bigrams and trigrams, bi-phones, and bi-syllables. Lemma and part-of-speech information and their corresponding frequencies are also indexed. The website enables users to either upload a set of words to receive their properties, or to receive a set of words matching constraints on the properties. The properties themselves are easily extensible and will be added over time as they become available. It is freely available from the following website: http://www.bcbl.eu/databases/espal
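Two of the property types mentioned above, subword n-gram counts and orthographic neighborhoods, can be illustrated with a small Python sketch. The definitions used here are assumptions, not EsPal's documented ones: n-grams are counted over word types in a toy list, and a neighbor is taken to be a word of the same length that differs in exactly one letter. The function names and the toy lexicon are hypothetical.

```python
from collections import Counter

def ngram_counts(words, n):
    """Count letter n-grams (n=2 for bigrams, n=3 for trigrams) over a word list."""
    counts = Counter()
    for word in words:
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts

def orthographic_neighbors(word, lexicon):
    """Words of the same length that differ from `word` in exactly one letter."""
    return sorted(
        w for w in lexicon
        if len(w) == len(word)
        and sum(a != b for a, b in zip(w, word)) == 1
    )

# Toy lexicon for illustration only; a real lookup would use the full database.
lexicon = {"casa", "cosa", "cama", "masa", "mesa", "caso"}

print(orthographic_neighbors("casa", lexicon))
print(ngram_counts(sorted(lexicon), 2).most_common(3))
```

A frequency-weighted variant would multiply each n-gram occurrence by the word's corpus frequency; which of the two a given study needs depends on whether type or token statistics are of interest.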
Abstract:
In this demonstration we present our web services to perform Bayesian learning for classification tasks.
Abstract:
In a way, rhetoric is a cognitive art. The art of speaking in concrete situations in the hope of gaining the audience's assent to a given thesis requires a strong cognitive ability: that of being able to represent the way the audience itself represents a rhetorical situation. Yet once we consider that rhetorical or sophistic techniques make it easier to act on other people's representations, verbal deception becomes a matter of social regulation, together with issues of credibility and credulity. In a democratic context, which makes a form of dependence on other people's information even more acute, the necessity of believing as well as the possibility of being duped are challenges for both the social functioning of the City and the evaluation of information and of its sources. The aim of the chapters of this volume is neither to condemn the effects of certain argument schemes that some would judge fallacious nor to add yet another layer to fallacy criticism, but to study how such arguments work and what their cognitive effects are, hic et nunc. What are the linguistic and cognitive mechanisms at play behind the "performance" of arguments reputed to be fallacious? How do rhetorical strategies work at the interface of cognition, language science and society? This book gathers papers that were presented during the international conference Communication & Cognition: manipulation, persuasion and biases in language, held at the University of Neuchâtel from 26 to 28 January 2011. A number of original proposals and stimulating hypotheses emerge from them: we hope that these will inspire researchers in the language sciences who specialise in rhetoric to take on board insights from cognitive psychology, and researchers in that field to engage with the rhetoricity of their own research.