Starting from the need to present a new face of Alejandra Pizarnik, one that does not forget the others but stands apart from them, this essay above all seeks to trace a different reading path, in which the voice of Antonin Artaud resonates as a fundamental intertext, and in which the possibilities of a fragmented and resignified body constitute the exclusive stage for a kind of expressiveness that does not forget the existence of a double subject: woman and writer, body and language.
This PhD project aims to study paraphrasing, initially understood as the different ways in which the same content can be expressed linguistically. We will examine this concept in depth, seeking to define and delimit its scope more precisely, and to determine which structures and phenomena it covers. Although some paraphrase typologies exist, most of them apply only to English and focus on lexical and syntactic transformations. We intend to go further and propose a paraphrase typology for Spanish and Catalan that combines lexical, syntactic, semantic and pragmatic knowledge. We apply a bottom-up methodology, collecting evidence of the phenomenon from the data. To this end, we are initially using the Spanish Wikipedia as our corpus, since the internal structure of this encyclopedia makes it a good resource for extracting paraphrase examples. This empirical approach will be complemented with linguistic knowledge and by comparing and contrasting our results with previously proposed paraphrase typologies, in order to extend the range of paraphrase forms found in our corpus. The fact that the same content can be expressed in many different ways poses a major challenge for Natural Language Processing (NLP) applications. Research on paraphrasing has therefore been attracting increasing attention in NLP and Computational Linguistics, and the results of this investigation should be of interest to many of these applications.
In this paper, we present a critical analysis of the state of the art in the definition and typologies of paraphrasing. This analysis shows that no existing characterization of paraphrasing is at once comprehensive, linguistically based and computationally tractable. We then define and delimit the concept on the basis of propositional content, and present a general, inclusive and computationally oriented typology of the linguistic mechanisms that give rise to formal variation between the members of a paraphrase pair.
Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. As a result, state-of-the-art plagiarism detectors struggle to detect cases of paraphrase plagiarism. In this article, we analyse the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. To this end, we created the P4P corpus, a new resource that uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analysed in the light of this annotation. The experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the most frequently used paraphrase mechanisms when plagiarising, and (iii) paraphrase mechanisms tend to shorten the plagiarised text. For the first time, the paraphrase mechanisms behind plagiarism have been analysed, providing critical insights for improving automatic plagiarism detection systems.
This article introduces EsPal: a Web-accessible repository containing a comprehensive set of properties of Spanish words. EsPal is based on an extensible set of data sources, beginning with a 300 million token written database and a 460 million token subtitle database. Properties available include word frequency, orthographic structure and neighborhoods, phonological structure and neighborhoods, and subjective ratings such as imageability. Subword structure properties are also available in terms of bigrams and trigrams, bi-phones, and bi-syllables. Lemma and part-of-speech information and their corresponding frequencies are also indexed. The website enables users to either upload a set of words to receive their properties, or to receive a set of words matching constraints on the properties. The properties themselves are easily extensible and will be added over time as they become available. It is freely available from the following website: http://www.bcbl.eu/databases/espal
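Two of the properties listed for EsPal, orthographic neighborhoods and letter bigrams, can be illustrated with a minimal sketch. It assumes the common substitution-only definition of an orthographic neighbor (Coltheart's N: a word of the same length differing in exactly one letter position) and token-weighted bigram counts. It is not EsPal's implementation; the function names and the toy lexicon with its frequencies are hypothetical and invented for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> corpus frequency. Invented counts, NOT EsPal data.
LEXICON = {"casa": 120, "cosa": 300, "caso": 80, "masa": 40, "mesa": 90}


def letter_bigrams(word: str) -> list[str]:
    """Return the overlapping two-letter sequences of a word."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def orthographic_neighbors(word: str, lexicon: set[str]) -> set[str]:
    """Same-length words that differ from `word` in exactly one letter position
    (substitution-only neighbors, as in Coltheart's N)."""
    return {
        other
        for other in lexicon
        if len(other) == len(word)
        and sum(a != b for a, b in zip(word, other)) == 1
    }


def bigram_frequencies(lexicon: dict[str, int]) -> Counter:
    """Token-weighted bigram counts: every occurrence of a bigram in a word
    adds that word's corpus frequency to the bigram's total."""
    counts: Counter = Counter()
    for word, freq in lexicon.items():
        for bigram in letter_bigrams(word):
            counts[bigram] += freq
    return counts


if __name__ == "__main__":
    neighbors = orthographic_neighbors("casa", set(LEXICON))
    print(sorted(neighbors))                  # ['caso', 'cosa', 'masa']
    print(len(neighbors))                     # N = 3 ("mesa" differs in two letters)
    print(bigram_frequencies(LEXICON)["sa"])  # 550
```

Weighting bigrams by word frequency is only one choice; type-based counts, position-specific bigrams, or neighbor definitions that also allow letter additions and deletions are common alternatives.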
Nowadays no one questions the interrelation between the syntactic and semantic components, and in this respect diathesis alternations have proved to be an effective means of accessing semantics from syntax. In the work presented here, we start from the hypothesis that verbal semantics conditions the types of phrase structures in which a verb can appear; likewise, we consider that the semantic class of a verb can be identified on the basis of the different diatheses in which it can participate. The research first requires defining the diathesis structures needed to identify the semantic classes, and establishing the thematic roles that characterize each class and that connect syntax and semantics. In this paper we present the methodology followed to establish the verb classes of the languages involved in the Pirápides project, and we also offer a first proposal for the organization and specification of the general diatheses of Spanish and Catalan.
This article presents a class of predicates, change predicates, on the basis of the elements we have defined as basic for describing verbal behaviour (meaning components, diatheses and event structure). We start from the hypothesis that these three aspects interact with one another and are essential for accounting for the actual use of predicates. This information has been incorporated into the lexical entries of a lexical knowledge base, whose implementation we present.
In Riegel v. Medtronic, Inc. (552 U.S. __ (2008); February 20, 2008), Mr. Riegel had to undergo bypass surgery after a catheter manufactured by Medtronic ruptured while his physician was performing an angioplasty. Although the catheter had received FDA marketing approval and met the safety requirements of the federal regulatory system, Mr. Riegel and his wife brought a damages action against Medtronic (and not against the physician) under New York common-law rules of strict liability and negligence. However, the U.S. Supreme Court, in an opinion by Justice Antonin Gregory Scalia, voted by an eight-justice majority to reject Mrs. Riegel's appeal and affirm the appellate judgment dismissing the claim. The Court held that the federal preemption rule on medical device safety [Medical Device Amendments of 1976, 21 U.S.C. § 360k(a)] precludes the application not only of state regulatory law on medical device safety but also of the common law on manufacturer liability.
This article analyses a group of diathesis alternations that denote the same meaning opposition: a shift in focus among the participants in the event. The study is based on an analysis of 1,000 Spanish, English and Catalan verbs. Our aim is to reveal the semantic relationship between various constructions that have traditionally been studied separately because of their formal differences, but that express the same semantic opposition. The choice of one alternation over another serves different communicative purposes.
In this paper we present ClInt (Clinical Interview), a bilingual Spanish-Catalan spoken corpus containing 15 hours of clinical interviews. It consists of audio files aligned with multi-level transcriptions comprising orthographic, phonetic and morphological information, as well as linguistic and extralinguistic annotation. No comparable resource previously existed for these languages, and it can be exploited in a broad range of disciplines, such as Linguistics, Natural Language Processing and related fields.
CoCo is a collaborative web interface for the compilation of linguistic resources. In this demo we present one of its possible applications: paraphrase acquisition.
This article examines the mainstream categorical definition of coreference as "identity of reference." It argues that coreference is best handled when identity is treated as a continuum, ranging from full identity to non-identity, with room for near-identity relations to explain currently problematic cases. This middle ground is needed to account for linguistic expressions in real text that stand in relations that are neither full coreference nor non-coreference, a situation that has led to contradictory treatment of such cases in previous coreference annotation efforts. We discuss key issues for coreference such as conceptual categorization, individuation, criteria of identity, and the discourse model construct. We redefine coreference as a scalar relation between two (or more) linguistic expressions that refer to discourse entities considered to be at the same granularity level relevant to the linguistic and pragmatic context. We view coreference relations in terms of mental space theory and discuss a large number of real-life examples that show varying degrees of near-identity.