960 resultados para Linguistica de Corpus


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Corpus Linguistics is a young discipline. The earliest work was done in the 1960s, but corpora only began to be widely used by lexicographers and linguists in the late 1980s, by language teachers in the late 1990s, and by language students only very recently. This course in corpus linguistics was held at the Departamento de Linguistica Aplicada, E.T.S.I. de Minas, Universidad Politecnica de Madrid from June 15-19 1998. About 45 teachers registered for the course. 30% had PhDs in linguistics, 20% in literature, and the rest were doctorandi or qualified English teachers. The course was designed to introduce the use of corpora and other computational resources in teaching and research, with special reference to scientific and technological discourse in English. Each participant had a computer networked with the lecturer’s machine, whose display could be projected onto a large screen. Application programs were loaded onto the central server, and telnet and a web browser were available. COBUILD gave us permission to access the 323 million word Bank of English corpus, Mike Scott allowed us to use his Wordsmith Tools software, and Tim Johns gave us a copy of his MicroConcord program.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

El presente trabajo tiene como objetivo hacer una representación de los “errores” producidos por los aprendices de español del Curso de Letras (Habilitación en Español de la Universidad Federal de Uberlândia. Para este fin, fue compilado un corpus lingüístico a partir de las producciones orales y escritos de los alumnos del segundo, cuarto, sexto y octavo semestre. Los principales temas y autores que dieron sustento teórico a nuestro estudio, en cuanto a los análisis descriptivos fueron: Interlengua (CORDER, 1967; SELINKER, 1972; BARALO, 1999, 2004; DURAO, 2007), Lingüística Contrastiva (SÖHRMAN, 2007), Modelo para Análisis de Errores (DURAO, 2004; ANDRADE, 2011; SANTOS GARGALLO, 2004), entre los principales. Cabe destacar que adoptamos una perspectiva de análisis de base empírica, apoyados en los subsidios que propicia la Lingüística de Corpus (BERBER SARDINHA, 2004). Otro componente importante en esta tesis fue la metodología. Se detalla paso a paso desde el levantamiento y lectura del referencial teórico, hasta la finalización del proceso de escritura del trabajo. Presentándose de esta manera como un futuro referencial para investigaciones que se basan en la utilización de LC como abordaje metodológica, y en el análisis de errores de aprendices. Los análisis desarrollados en el transcurso de este trabajo, comprendieron primeramente el dimensionamiento de los corpora utilizados, seguido de listas de las palabras más recurrentes, análisis cuantitativos y cualitativos, los cuales constituyeron un mapeo de los “errores”, otorgando de esta manera, un valor potencial al tratarse de un estudio que podrá ser utilizado como referente para una eventual elaboración de material didáctico, pensado especialmente para las clases de español que ofrece el Curso de Letras/Habilitación en Español de la Universidad Federal de Uberlândia.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This research analyzes the average previous stressed vowels [ε] and [e] and later [ɔ] and [o] in nominal and verbal forms in the 1st person singular and 3rd person singular and plural in the present tense, specifically the umlaut process of mid vowels /e/ and /o/, which assimilate in /ε/ and /ᴐ/ in stressed position. The general objective of this research is to describe and quantify the occurrence of umlaut and subsequently analyze in which words there is regularity or not. As specific objectives we have: i) to compile and to label an oral, spontaneous, synchronic and regional corpus, from radio programs produced in the city of Ituiutaba, Minas Gerais; ii) to describe the characteristics of the corpus to be compiled; iii) to investigate the alternating timbre of mid vowels in stressed position; iv) to identify instances of nominal and verbal umlaut of the middle vowels in stressed position; v) to describe the identified cases of nominal and verbal umlaut; vi) to analyze the probable causes for the variation of the middle vowels. To perform the proposed analysis, we have adopted as a theoretical-methodological basis multi-representational models: Phonology of Use (BYBEE, 2001) and Exemplar Theory (PIERREHUMBERT, 2001) combined with the precepts of Corpus Linguistics (BEBER SARDINHA, 2004). The corpus consisted of 16 radio programs – eight political and eight religious – from the city of Ituiutaba-MG, with recordings of about 20 to 40 minutes. We note, by means of the results generated by WordSmith Tools® software, version 6.0 (SCOTT, 2012), that the analyzed forms show little variation, which shows that the umlaut is a process already lexicalized in participants of the radio programs analyzed. We conclude that the results converge with the proposal of the Phonology of Use (BYBEE, 2001; PHILLIPS, 1984) that less frequent words that have no phonetic environment conducive to change, are changed first.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This research investigated the nasality of vowels in the spontaneous speech of inhabitants of the quilombola communities of Brejo dos Crioulos and Poções (MG). As a theoretical framework, we based on the assumptions of Phonetics and Phonology, in renowned scholars on the investigation of nasality (CAGLIARI, 1977; CÂMARA JR., 1984, 2013; BISOL, 2013; ABAURRE; PAGOTTO, 1996; SILVA, 2015), with subsidies of the Corpus Linguistics. Its general goal was to investigate the occurrence of nasality, in the dialect of these quilombola communities, and their linguistic behavior, considering the linguistic factors that can interfere in the phenomenon. Specifically it was aimed to a) detect the occurrence of nasalized vowels with the help of the resources that the Corpus Linguistics provides (Praat and WorldSmith Tolls); b) discriminate the different types of occurring contexts of nasalized vowels; c) make quantitative and qualitative analyzes of the nasalized vowels in the study corpus; d) describe and analyze the behavior of nasalized vowels and; e) contrast the values of F1 and F2 of the oral and nasalized vowels. It was hypothesized that the nasality happens because it is conditioned by the nasal segment following the nasalized vowel - phonological process of “assimilation” - its position as the primary stress and grammatical category. It was believed that the quilombolas communities of Brejo dos Crioulos and Poções produce nasalized vowels in their speech and this linguistic phenomenon is favored by the adjacent presence of consonants or nasal vowels. Furthermore, it was hypothesized that the values of F1 and F2 of oral and nasalized vowels in these communities are distinct. The following research questions were elaborated: (i) is the presence of nasalized vowels in the speech of these quilombola communities conditioned to the presence of a nasal sound segment? (ii) does the nasal sound segment following the nasalized vowel favor the occurrence of the nasality phenomenon? is there a difference between the values of F1 and F2 of the oral and nasalized vowels in both quilombola communities considered? To compose our corpus, 24 interviews recordings were used (12 female speakers and 12 male speakers), a total of 24 participants. It was found that the following nasal sound segment tends to condition the nasalized vowel. In general, it assimilates the lowering of the soft palate of nasal consonant segment immediately following, but there are cases of nasal vowel segment - regressive assimilation; the stressed syllable tends to favor the nasality, but it occurs in pretonic and postonic position as well; F1 and F2 values of oral and nasalized vowels in the quilombola communities of Poções and Brejo dos Crioulos are distinct: the group of Brejo dos Crioulos tends to produce the F1 of oral and nasalized vowels more lowered than the group of Poções and the F2, in a more anterior position. The nasality tends to occur in verbs and nouns, although it is not specific to a grammatical category. This research found cases of spurious nasalization, confirming previous studies. In turn, it revealed cases of lexical items with favorable context for nasalization, but with its non-occurrence. This last case, considered as the lowering of the uniform soft palate in PB, presented pronounced vowels without the soft palate lowering. That is, it was detected variation in the phenomenon of nasalization in PB. With this work, it was promoted the discussion about nasality, in order to contribute to the linguistic studies about the functioning of Brazilian Portuguese in this geographical context.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The QUT-NOISE-TIMIT corpus consists of 600 hours of noisy speech sequences designed to enable a thorough evaluation of voice activity detection (VAD) algorithms across a wide variety of common background noise scenarios. In order to construct the final mixed-speech database, a collection of over 10 hours of background noise was conducted across 10 unique locations covering 5 common noise scenarios, to create the QUT-NOISE corpus. This background noise corpus was then mixed with speech events chosen from the TIMIT clean speech corpus over a wide variety of noise lengths, signal-to-noise ratios (SNRs) and active speech proportions to form the mixed-speech QUT-NOISE-TIMIT corpus. The evaluation of five baseline VAD systems on the QUT-NOISE-TIMIT corpus is conducted to validate the data and show that the variety of noise available will allow for better evaluation of VAD systems than existing approaches in the literature.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Extracellular matrix regulates many cellular processes likely to be important for development and regression of corpora lutea. Therefore, we identified the types and components of the extracellular matrix of the human corpus luteum at different stages of the menstrual cycle. Two different types of extracellular matrix were identified by electron microscopy; subendothelial basal laminas and an interstitial matrix located as aggregates at irregular intervals between the non-vascular cells. No basal laminas were associated with luteal cells. At all stages, collagen type IV α1 and laminins α5, β2 and γ1 were localized by immunohistochemistry to subendothelial basal laminas, and collagen type IV α1 and laminins α2, α5, β1 and β2 localized in the interstitial matrix. Laminin α4 and β1 chains occurred in the subendothelial basal lamina from mid-luteal stage to regression; at earlier stages, a punctate pattern of staining was observed. Therefore, human luteal subendothelial basal laminas potentially contain laminin 11 during early luteal development and, additionally, laminins 8, 9 and 10 at the mid-luteal phase. Laminin α1 and α3 chains were not detected in corpora lutea. Versican localized to the connective tissue extremities of the corpus luteum. Thus, during the formation of the human corpus luteum, remodelling of extracellular matrix does not result in basal laminas as present in the adrenal cortex or ovarian follicle. Instead, novel aggregates of interstitial matrix of collagen and laminin are deposited within the luteal parenchyma, and it remains to be seen whether this matrix is important for maintaining the luteal cell phenotype.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we describe a machine-translated parallel English corpus for the NTCIR Chinese, Japanese and Korean (CJK) Wikipedia collections. This document collection is named CJK2E Wikipedia XML corpus. The corpus could be used by the information retrieval research community and knowledge sharing in Wikipedia in many ways; for example, this corpus could be used for experimentations in cross-lingual information retrieval, cross-lingual link discovery, or omni-lingual information retrieval research. Furthermore, the translated CJK articles could be used to further expand the current coverage of the English Wikipedia.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Measures of semantic similarity between medical concepts are central to a number of techniques in medical informatics, including query expansion in medical information retrieval. Previous work has mainly considered thesaurus-based path measures of semantic similarity and has not compared different corpus-driven approaches in depth. We evaluate the effectiveness of eight common corpus-driven measures in capturing semantic relatedness and compare these against human judged concept pairs assessed by medical professionals. Our results show that certain corpus-driven measures correlate strongly (approx 0.8) with human judgements. An important finding is that performance was significantly affected by the choice of corpus used in priming the measure, i.e., used as evidence from which corpus-driven similarities are drawn. This paper provides guidelines for the implementation of semantic similarity measures for medical informatics and concludes with implications for medical information retrieval.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper evaluates the efficiency of a number of popular corpus-based distributional models in performing discovery on very large document sets, including online collections. Literature-based discovery is the process of identifying previously unknown connections from text, often published literature, that could lead to the development of new techniques or technologies. Literature-based discovery has attracted growing research interest ever since Swanson's serendipitous discovery of the therapeutic effects of fish oil on Raynaud's disease in 1986. The successful application of distributional models in automating the identification of indirect associations underpinning literature-based discovery has been heavily demonstrated in the medical domain. However, we wish to investigate the computational complexity of distributional models for literature-based discovery on much larger document collections, as they may provide computationally tractable solutions to tasks including, predicting future disruptive innovations. In this paper we perform a computational complexity analysis on four successful corpus-based distributional models to evaluate their fit for such tasks. Our results indicate that corpus-based distributional models that store their representations in fixed dimensions provide superior efficiency on literature-based discovery tasks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Although LH is essential for survival and function of the corpus luteum (CL) in higher primates, luteolysis occurs during nonfertile cycles without a discernible decrease in circulating LH levels. Using genome-wide expression analysis, several experiments were performed to examine the processes of luteolysis and rescue of luteal function in monkeys. Induced luteolysis with GnRH receptor antagonist (Cetrorelix) resulted in differential regulation of 3949 genes, whereas replacement with exogenous LH (Cetrorelix plus LH) led to regulation of 4434 genes (1563 down-regulation and 2871 up-regulation). A model system for prostaglandin (PG) F-2 alpha-induced luteolysis in the monkey was standardized and demonstrated that PGF(2 alpha) regulated expression of 2290 genes in the CL. Analysis of the LH-regulated luteal transcriptome revealed that 120 genes were regulated in an antagonistic fashion by PGF(2 alpha). Based on the microarray data, 25 genes were selected for validation by real-time RT-PCR analysis, and expression of these genes was also examined in the CL throughout the luteal phase and from monkeys treated with human chorionic gonadotropin (hCG) to mimic early pregnancy. The results indicated changes in expression of genes favorable to PGF(2 alpha) action during the late to very late luteal phase, and expressions of many of these genes were regulated in an opposite manner by exogenous hCG treatment. Collectively, the findings suggest that curtailment of expression of downstream LH-target genes possibly through PGF(2 alpha) action on the CL is among the mechanisms underlying cross talk between the luteotropic and luteolytic signaling pathways that result in the cessation of luteal function, but hCG is likely to abrogate the PGF(2 alpha)-responsive gene expression changes resulting in luteal rescue crucial for the maintenance of early pregnancy. (Endocrinology 150: 1473-1484, 2009)