956 results for Aligned Corpus


Abstract:

False friends are pairs of words in two languages that are perceived as similar but have different meanings. We present an improved algorithm for acquiring false friends from a sentence-level aligned parallel corpus, based on statistical observations of word occurrences and co-occurrences in the parallel sentences. The results are compared with an entirely semantic measure of cross-lingual similarity between words that uses the Web as a corpus, analyzing the words’ local contexts extracted from the text snippets returned by Google searches. The statistical and semantic measures are then combined into an improved algorithm for identifying false friends that achieves results almost twice as good as those of previously known algorithms. The evaluation is performed on identifying cognates between Bulgarian and Russian, but the proposed methods could be adapted to other language pairs for which parallel corpora and bilingual glossaries are available.
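The abstract does not give the statistical formula, so the following is only a minimal sketch of how occurrence and co-occurrence counts over aligned sentence pairs might be turned into a score. The Dice coefficient, the function name `dice_cooccurrence`, and the placeholder tokens are all assumptions made for illustration, not the authors' method.

```python
def dice_cooccurrence(aligned_pairs, word_src, word_tgt):
    """Dice coefficient over aligned sentence pairs (an assumed measure).

    aligned_pairs: iterable of (source_tokens, target_tokens) tuples.
    Returns 2 * both / (n_src + n_tgt), or 0.0 if neither word occurs.
    """
    n_src = n_tgt = n_both = 0
    for src_tokens, tgt_tokens in aligned_pairs:
        in_src = word_src in src_tokens
        in_tgt = word_tgt in tgt_tokens
        n_src += in_src            # sentence pairs whose source side has word_src
        n_tgt += in_tgt            # sentence pairs whose target side has word_tgt
        n_both += in_src and in_tgt  # pairs where both words occur
    if n_src + n_tgt == 0:
        return 0.0
    return 2 * n_both / (n_src + n_tgt)


# Toy aligned corpus with placeholder tokens (hypothetical data).
pairs = [
    (["x", "y"], ["x", "z"]),
    (["x"], ["q"]),
    (["y"], ["x"]),
]
score = dice_cooccurrence(pairs, "x", "x")
```

Under this assumed measure, an orthographically similar pair that rarely co-occurs in aligned sentences (a low score) would be a candidate false friend, while a high score would point towards a true cognate.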

Abstract:

This paper describes the methodology followed to automatically generate titles for a corpus of questions that belong to sociological opinion polls. Titles for questions have a twofold function: (1) they are the input of user searches and (2) they inform about the whole contents of the question and possible answer options. Thus, generation of titles can be considered a case of automatic summarization. However, the fact that summarization had to be performed over very short texts, together with the aforementioned quality conditions imposed on newly generated titles, led the authors to follow knowledge-rich and domain-dependent strategies for summarization, disregarding the more frequent extractive techniques.

Abstract:

Research in social psychology has shown that public attitudes towards feminism are mostly based on stereotypical views linking feminism with leftist politics and lesbian orientation. It is claimed that such attitudes are due to the negative and sexualised media construction of feminism. Studies concerned with the media representation of feminism seem to confirm this tendency. While most of this research provides significant insights into the representation of feminism, the findings are often based on a small sample of texts. Also, most of the research was conducted in an Anglo-American setting. This study attempts to address some of the shortcomings of previous work by examining the discourse of feminism in a large corpus of German and British newspaper data. It does so by employing the tools of Corpus Linguistics. By investigating the collocation profiles of the search term feminism, we provide evidence of salient discourse patterns surrounding feminism in two different cultural contexts. © The Author(s) 2012.
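As a rough illustration of what a collocation profile of a search term involves, the sketch below counts the words that appear within a fixed window around a node word. The window size, the function name `collocates`, and the placeholder tokens are assumptions; the study's actual tools and association measures are not stated in the abstract, and this sketch uses raw frequency only.

```python
from collections import Counter


def collocates(tokens, node, window=2):
    """Count words occurring within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts


# Placeholder tokens (hypothetical data), with "node" standing in for the search term.
tokens = "w1 w2 node w3 w4 w5 node w1".split()
profile = collocates(tokens, "node", window=2)
```

In practice, raw counts like these would usually be passed to an association measure before collocates are ranked; which measure the authors used is not specified here.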

Abstract:

The paper reports on our ongoing work on the creation of a corpus of Bulgarian and Ukrainian parallel texts. We discuss some differences in the approaches and the interpretation of some concepts, as well as various problems associated with the construction of our corpus, in particular the occasional ‘nonparallelism’ of original and translated texts. We give examples of the application of the parallel corpus to the study of lexical semantics and note the outstanding role of the corpus in the lexicographic description of Ukrainian and Bulgarian translation equivalents. We draw attention to the importance of creating parallel corpora as objects of national as well as global cultural heritage.

Abstract:

In this paper, we develop a new entropic matching kernel for weighted graphs by aligning depth-based representations. We demonstrate that this kernel can be seen as an aligned subtree kernel that incorporates explicit subtree correspondences, and thus addresses the drawback of neglecting the relative locations between substructures that arises in the R-convolution kernels. Experiments on standard datasets demonstrate that our kernel can easily outperform state-of-the-art graph kernels in terms of classification accuracy.

Abstract:

In this paper, I concentrate on court cases with litigants in person (lay people who act on their own behalf in legal proceedings without a counsel or solicitor) and discuss the challenges of building a corpus of courtroom discourse where it is crucial to distinguish between speakers due to their distinct institutional roles. The corpus incorporates seven sub-corpora of verbatim transcripts from different court cases with litigants in person and comprises over eleven million tokens. The focus of this paper is on the interplay between the legal and lay discourse types and how judges project their institutional roles through well-initiated turns directed at litigants in person and counsel. As a versatile discourse marker, well provides a good opportunity to explore how judges have to adapt their roles to ensure that lay litigants in person receive the necessary support and that their lack of competence does not impinge on the fairness of the proceedings. Given the breadth and importance of the topic of litigation in person, I discuss how the tools and approaches of corpus linguistics can be helpful in this multi-disciplinary area, where multiple functions and uses of individual linguistic features need to be explored in depth.

Abstract:

Introduction: Resveratrol (RVT), found in red wine, protects against erectile dysfunction and relaxes penile tissue (corpus cavernosum) via a nitric oxide (NO)-independent pathway. However, the mechanism remains to be elucidated. Hydrogen sulfide (H2S) is a potent vasodilator and neuromodulator generated in the corpus cavernosum. Aims: We investigated whether RVT relaxes mouse corpus cavernosum (MCC) through H2S. Methods: H2S formation was measured by methylene blue assay, and vascular reactivity experiments were performed with a DMT strip myograph in CD1 MCC strips. Main Outcome Measures: The endothelial NO synthase (eNOS) inhibitor Nω-Nitro-L-arginine (L-NNA, 0.1mM), the H2S inhibitor aminooxyacetic acid (AOAA, 2mM), which inhibits both cystathionine-β-synthase (CBS) and cystathionine-gamma-lyase (CSE), or a combination of AOAA with PAG (a CSE inhibitor) was used in the presence or absence of RVT (0.1mM, 30min) to elucidate the role of the NO and H2S pathways in the effects of RVT in MCC. Concentration-dependent relaxations to RVT, L-cysteine, sodium hydrogen sulfide (NaHS) and acetylcholine (ACh) were studied. Results: Exposure of murine corpus cavernosum to RVT increased both basal and L-cysteine-stimulated H2S formation. Both of these effects were reversed by AOAA but not by L-NNA. RVT caused concentration-dependent relaxation of MCC, and this RVT-induced relaxation was significantly inhibited by AOAA or AOAA+PAG but not by L-NNA. L-cysteine caused concentration-dependent relaxations, which were significantly inhibited by AOAA or AOAA+PAG. Incubation of MCC with RVT significantly increased L-cysteine-induced relaxation, and this effect was inhibited by AOAA+PAG. However, RVT did not alter relaxations induced by exogenous H2S (NaHS) or ACh. Conclusions: These results demonstrate that RVT-induced relaxation is at least partly dependent on H2S formation and acts independently of the eNOS pathway. In the phosphodiesterase 5 inhibitor (PDE-5i) nonresponder population, combination therapy with RVT may reverse erectile dysfunction by stimulating endogenous H2S formation. Yetik-Anacak G, Dereli MV, Sevin G, Ozzayim O, Erac Y, and Ahmed A. Resveratrol stimulates hydrogen sulfide (H2S) formation to relax murine corpus cavernosum.

Abstract:

Starting with a description of the software and hardware used for corpus linguistics in the late 1980s to early 1990s, this contribution discusses difficulties faced by the software designer when attempting to allow users to study text. Future human-machine interfaces may develop to be much more sophisticated, and certainly the aspects of text which can be studied will progress beyond plain text without images. Another area which will develop further is the study of patternings involving not just single words but word-relations across large stretches of text.

Abstract:

In this article I argue that the study of the linguistic aspects of epistemology has become unhelpfully focused on the corpus-based study of hedging and that a corpus-driven approach can help to improve upon this. Through focusing on a corpus of texts from one discourse community (that of genetics) and identifying frequent tri-lexical clusters containing highly frequent lexical items identified as keywords, I undertake an inductive analysis identifying patterns of epistemic significance. Several of these patterns are shown to be hedging devices and the whole corpus frequencies of the most salient of these, candidate and putative, are then compared to the whole corpus frequencies for comparable wordforms and clusters of epistemic significance. Finally I interviewed a ‘friendly geneticist’ in order to check my interpretation of some of the terms used and to get an expert interpretation of the overall findings. In summary I argue that the highly unexpected patterns of hedging found in genetics demonstrate the value of adopting a corpus-driven approach and constitute an advance in our current understanding of how to approach the relationship between language and epistemology.
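The cluster-identification step described above can be sketched roughly as follows: slide a three-word window over the tokenised corpus and keep only the clusters that contain at least one keyword. The function name `keyword_trigrams`, the whitespace tokenisation, and the toy sentence are assumptions for illustration; the article's actual software and frequency thresholds are not given in the abstract.

```python
from collections import Counter


def keyword_trigrams(tokens, keywords):
    """Count three-word clusters that contain at least one keyword."""
    counts = Counter()
    for i in range(len(tokens) - 2):
        cluster = tuple(tokens[i:i + 3])
        if any(tok in keywords for tok in cluster):
            counts[cluster] += 1
    return counts


# Toy token stream (hypothetical data); "candidate" is one of the wordforms
# the abstract names as salient.
tokens = "a candidate gene for a candidate gene".split()
clusters = keyword_trigrams(tokens, {"candidate"})
top = clusters.most_common(1)[0]
```

Sorting the resulting counts gives the frequent clusters from which an inductive, corpus-driven reading could start, rather than beginning from a predefined list of hedging devices.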

Abstract:

In recent decades, spillover has become a highly influential concept which has led to the initiation of new theoretical and methodological approaches that are designed to understand how people attempt to reconcile their work and private lives. The very notion of spillover presupposes that these spheres are connected, since the people who move between them bring certain ‘less visible’ content with them such as cognitive or affective mental constructs, skills, behaviors, etc. This paper attempts to create fresh insight into the different areas, themes and methodologies related to how spillover has been addressed over the last ten years. Four main categories are discussed based on the 76 academic articles that were selected: (1) general spillover research, (2) job flexibility and spillover, (3) individual coping strategies, and (4) the spillover effect on the different genders. The final section of the paper provides a tentative synthesis of the main conclusions and findings from the examined papers.

Abstract:

Carbon nanotubes (CNTs) have become one of the most interesting allotropes of carbon due to their intriguing mechanical, electrical, thermal and optical properties. The synthesis and electron emission properties of CNT arrays have been investigated in this work. Vertically aligned CNTs of different densities were synthesized on copper substrate with catalyst dots patterned by nanosphere lithography. The CNTs synthesized with catalyst dots patterned by spheres of 500 nm diameter exhibited the best electron emission properties with the lowest turn-on/threshold electric fields and the highest field enhancement factor. Furthermore, CNTs were treated with NH3 plasma for various durations and the optimum enhancement was obtained for a plasma treatment of 1.0 min. CNT point emitters were also synthesized on a flat-tip or a sharp-tip to understand the effect of emitter geometry on the electron emission. The experimental results show that electron emission can be enhanced by decreasing the screening effect of the electric field by neighboring CNTs. In another part of the dissertation, vertically aligned CNTs were synthesized on stainless steel (SS) substrates with and without chemical etching or catalyst deposition. The density and length of CNTs were determined by synthesis time. For a prolonged growth time, the catalyst activity terminated and the plasma started etching CNTs destructively. CNTs with uniform diameter and length were synthesized on SS substrates subjected to chemical etching for a period of 40 minutes before the growth. The direct contact of CNTs with stainless steel allowed for the better field emission performance of CNTs synthesized on pristine SS as compared to the CNTs synthesized on Ni/Cr coated SS. Finally, fabrication of large arrays of free-standing vertically aligned CNT/SnO2 core-shell structures was explored by using a simple wet-chemical route. The structure of the SnO2 nanoparticles was studied by X-ray diffraction and electron microscopy. 
Transmission electron microscopy reveals that a uniform layer of SnO2 is conformally coated on every tapered CNT. The strong adhesion of CNTs with SS guaranteed the formation of the core-shell structures of CNTs with SnO2 or other metal oxides, which are expected to have applications in chemical sensors and lithium ion batteries.

Abstract:

This work aims to map the “errors” produced by learners of Spanish in the Curso de Letras (Habilitación en Español) at the Universidad Federal de Uberlândia. To this end, a linguistic corpus was compiled from the oral and written productions of students in the second, fourth, sixth and eighth semesters. The main topics and authors that provided the theoretical basis for our study, as far as the descriptive analyses are concerned, were: Interlanguage (CORDER, 1967; SELINKER, 1972; BARALO, 1999, 2004; DURAO, 2007), Contrastive Linguistics (SÖHRMAN, 2007) and the Error Analysis Model (DURAO, 2004; ANDRADE, 2011; SANTOS GARGALLO, 2004), among others. We adopted an empirically based analytical perspective, drawing on the support offered by Corpus Linguistics (BERBER SARDINHA, 2004). Another important component of this thesis was the methodology, which is detailed step by step, from the survey and reading of the theoretical literature to the completion of the writing process. The thesis thus offers itself as a future reference for research that uses Corpus Linguistics (LC) as a methodological approach and for the analysis of learner errors. The analyses carried out in the course of this work comprised, first, the sizing of the corpora used, followed by lists of the most frequent words and by quantitative and qualitative analyses, which together constituted a mapping of the “errors”. This gives the study potential value as a reference for the eventual development of teaching materials designed especially for the Spanish classes offered by the Curso de Letras/Habilitación en Español at the Universidad Federal de Uberlândia.

Abstract:

The study of causal relations and their linguistic expression has been widely approached from different perspectives in recent years. However, few studies have tried to combine different approaches to establish the meaning of these relations, or have investigated contrastively the signals used to express them. This master's thesis is a project to advance knowledge in this area by investigating: a) the possibility of characterising causal relations into different types, using features that combine a functional and a cognitive approach; b) the types of causal relations preferred in English expository texts and their Spanish translations; c) the linguistic expressions preferred for expressing these causal relations in the English originals and their Spanish translations. The methodology is based on the manual annotation of a bilingual corpus of 37 expository texts in total (including the English originals and their Spanish translations) extracted from the MULTINOT corpus, a high-quality, register-diversified, multifunctional English-Spanish bilingual corpus currently being compiled and multidimensionally annotated by members of the FUNCAP research group under the MULTINOT project (see Lavid et al. 2015). The study was carried out in four main steps: first, an annotation scheme for causal relations in English and Spanish was designed, consisting of three interrelated systems and their corresponding features; next, an inventory of signals of causal relations in English and Spanish was compiled and categorised into different types; then, the annotation scheme was implemented in the UAM Corpus Tool and the set of bilingual texts was annotated by the author of this study; finally, the data extracted from the annotation were analysed statistically to test for possible differences between the English originals and their Spanish translations in the choice of causal relation type and its signals. The statistical analysis of the annotated data suggests that the types of causal relations preferred in the English originals are the content and non-volitional types, that the preferred position for these types of signals is second position, and that the most frequent signals used to express these relations are conjunctions, followed by verb phrases. The analysis of the Spanish translations reveals a high degree of similarity with the data from the English originals, which suggests that the Spanish translations preserve the preferences of the originals in most cases and that these choices can be considered indicative of English expository texts. Future work will focus on analysing original Spanish texts to check whether the tendencies observed in the English originals and their Spanish translations also hold for original Spanish texts, and on specifying patterns that may support the automatic analysis of these relations.