875 results for "Corpus de aprendizes" (learner corpus)
Abstract:
This paper presents and discusses initiatives taken at a public university in the state of São Paulo to collect and organize a corpus of argumentative texts for Br-ICLE, the subcorpus of ICLE made up of texts produced by Brazilian students enrolled in Arts and Languages and in Translation courses. The discussion is based on a three-year project in which we observed features of underuse and overuse in the texts produced by Brazilian undergraduate students.
Abstract:
In this paper we present and discuss the first results from the compilation of a learner corpus of texts written by university students (Language, Literature and Translation Course). We drew on Corpus Linguistics and on observations by researchers in the learner corpus field to compile and analyze a corpus of argumentative and descriptive texts written in Spanish. Four hundred compositions (about 120 thousand words) were collected from August 2011 to December 2012. The methodology adopted helped us considerably in maintaining a comprehensive working agenda, taking students’ needs into consideration and using the collected data as input to improve classroom management of content. We also present the difficulties faced during data gathering and propose procedures to avoid or minimize them.
Abstract:
The aim of this article is to present part of the results of a study on the lexicon employed by students learning Spanish in a Languages degree (Undergraduate Education program). We describe the use of two verbal forms and the contexts in which they occur. We drew on Corpus Linguistics to compile two corpora of descriptive and argumentative compositions and to observe the use of two Spanish verbs (haber and tener, third person singular). We collected 250 compositions from first- and second-year students. The WordSmith Tools software was used to generate the word list and the concordance lines. The verbal forms hay and tiene were the most used, and in some cases they were applied inappropriately when compared with traditional Spanish grammar and with an electronic corpus. The results were discussed in class and were important for raising students’ awareness of their own textual production.
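As an illustration of the word list and concordance steps mentioned in this abstract, here is a minimal Python sketch. It is not WordSmith Tools and does not reproduce the study's procedure; the function names, the tokenization rule and the sample sentences are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (accented letters are kept)."""
    return re.findall(r"\w+", text.lower())


def word_list(texts):
    """Build a frequency list over all compositions, most frequent first."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common()


def concordance(texts, node, width=3):
    """Return keyword-in-context lines as (left context, node, right context)."""
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == node:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((left, tok, right))
    return lines


# Invented sample sentences, not data from the study.
sample = [
    "En mi ciudad hay muchas plazas y hay un parque.",
    "Mi hermano tiene un perro y la casa tiene jardín.",
]
freq = dict(word_list(sample))
hits = concordance(sample, "hay")
```

Reading the concordance lines for a form such as hay next to its frequency is what allows questionable uses to be spotted and then checked against a grammar or another corpus.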
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
This study aims to map the “errors” produced by learners of Spanish in the Letras degree program (Spanish track) at the Universidad Federal de Uberlândia. To this end, a linguistic corpus was compiled from the oral and written productions of students in the second, fourth, sixth and eighth semesters. The main topics and authors that gave theoretical support to our study, with regard to the descriptive analyses, were: Interlanguage (CORDER, 1967; SELINKER, 1972; BARALO, 1999, 2004; DURAO, 2007), Contrastive Linguistics (SÖHRMAN, 2007), and the Error Analysis Model (DURAO, 2004; ANDRADE, 2011; SANTOS GARGALLO, 2004), among others. We adopted an empirically based analytical perspective, supported by the resources that Corpus Linguistics provides (BERBER SARDINHA, 2004). Another important component of this thesis was the methodology, which is detailed step by step, from the survey and reading of the theoretical framework to the completion of the writing process. It can thus serve as a future reference for research that uses Corpus Linguistics as a methodological approach and that analyzes learner errors. The analyses carried out in the course of this work comprised first the sizing of the corpora used, followed by lists of the most recurrent words and by quantitative and qualitative analyses, which together constituted a mapping of the “errors”. This gives the study potential value as a reference for the eventual development of teaching material designed specifically for the Spanish classes offered by the Letras degree program (Spanish track) at the Universidad Federal de Uberlândia.
Abstract:
The general aim of this study is to profile the lexico-grammatical choices in the English writing of a group of Brazilian learners in the city of Rio de Janeiro, from 2009 to 2012, by analyzing their production of four-grams (blocks of four lexical items used frequently by several learners) in compositions written as part of the final course assessment. As a specific aim, the research sought to determine whether the four-grams produced were among those previously taught for writing the composition or whether they belonged to some other category, that is, four-grams already incorporated into language use or erroneous four-grams used widely by the population investigated. To this end, compositions were collected from learners at the same proficiency level in several branches of the same private English language school in the city of Rio de Janeiro. These compositions were then typed and annotated to form a digital corpus easily identifiable in terms of text type and genre, learner profile, branch and area of origin within Rio de Janeiro. The study draws on the principles and methods of Corpus Linguistics, the area of Linguistics that compiles large quantities of texts and extracts data from them with the aid of a computer program in order to map the use, frequency, distribution and range of particular linguistic or discursive phenomena. The results show that the learners investigated used few of the taught four-grams and, collectively, preferred others that had not been taught in the classes specific to their level. The study also showed that when the text genre is part of their personal world, learners seem to use more previously taught four-grams. This may mean that genre can influence correct lexico-grammatical choices.
The study opens the way to understanding the importance of lexico-grammatical blocks in L2 writing as a means of ensuring fluency and accuracy in the language, and suggests that learners need more opportunities for practice and awareness-raising regarding the use of such blocks.
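The notion of a four-gram that is both frequent and used by several learners can be sketched in Python as follows. This is only an illustration under stated assumptions: the thresholds `min_freq` and `min_range`, the whitespace tokenization and the sample compositions are invented, and the abstract does not say which software or cut-offs the study used.

```python
from collections import Counter


def four_grams(tokens):
    """Return every contiguous sequence of four tokens."""
    return [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]


def recurrent_four_grams(compositions, min_freq=2, min_range=2):
    """Map each four-gram to (frequency, range).

    frequency: total occurrences across all compositions
    range: number of compositions (learners) that use it at least once
    Only four-grams meeting both hypothetical thresholds are kept.
    """
    freq = Counter()
    spread = Counter()
    for text in compositions:
        grams = four_grams(text.lower().split())
        freq.update(grams)
        spread.update(set(grams))  # count each composition once per four-gram
    return {
        gram: (freq[gram], spread[gram])
        for gram in freq
        if freq[gram] >= min_freq and spread[gram] >= min_range
    }


# Invented sample compositions, one per learner.
sample = [
    "on the other hand i think that it is true",
    "on the other hand people say it is true",
    "i think that it is good",
]
result = recurrent_four_grams(sample)
```

Comparing the keys of `result` with a list of taught four-grams would then separate taught blocks from those learners produced on their own.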
Abstract:
Due to factors such as globalization and the demography of Hispanic countries, the Spanish language has gained influence worldwide, especially since the last decade of the 20th century, and mainly in countries that are geographically close to Spanish American countries, such as Brazil. In this context, it is necessary to consider the linguistic variation of Spanish, of which we emphasize lexical variation. Like any language, Spanish has a vast and rich lexicon, with varieties tied to the characteristics of each region: history, culture, customs, etc. This diversity influences the development, expansion and renewal of a language, as well as the teaching-learning process. We therefore set out to analyze how lexical varieties of Spanish are recorded in the entries of some Spanish-Portuguese bilingual dictionaries for Brazilian learners. To this end, we selected examples of lexical variety from a corpus of texts of different genres and checked whether these lexical items are registered in the chosen dictionaries. Since the corpus is built from texts found in textbooks used in Brazil, our goal is to verify whether the vocabulary that Brazilian learners encounter in formal education is registered in the dictionaries analyzed.
Abstract:
This research, theoretically grounded in Corpus Linguistics and Phraseology, aims to extract and analyze general-language and specialized collocations from the medical field, taken from a parallel corpus of transcriptions of the TV series Grey’s Anatomy. Based on this extraction, we propose compiling a bilingual glossary that can be used by trainee translators as well as English language teachers.
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
The QUT-NOISE-TIMIT corpus consists of 600 hours of noisy speech sequences designed to enable a thorough evaluation of voice activity detection (VAD) algorithms across a wide variety of common background noise scenarios. In order to construct the final mixed-speech database, a collection of over 10 hours of background noise was conducted across 10 unique locations covering 5 common noise scenarios, to create the QUT-NOISE corpus. This background noise corpus was then mixed with speech events chosen from the TIMIT clean speech corpus over a wide variety of noise lengths, signal-to-noise ratios (SNRs) and active speech proportions to form the mixed-speech QUT-NOISE-TIMIT corpus. The evaluation of five baseline VAD systems on the QUT-NOISE-TIMIT corpus is conducted to validate the data and show that the variety of noise available will allow for better evaluation of VAD systems than existing approaches in the literature.
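Mixing speech with background noise at a chosen signal-to-noise ratio, as described in this abstract, can be sketched in plain Python. This is a generic illustration and not the corpus authors' procedure: the function names and the synthetic signals are assumptions, and the sketch covers only the SNR scaling, not the choice of noise lengths or active speech proportions.

```python
import math


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so the speech-to-noise power ratio equals snr_db, then add.

    Returns (mixed signal, scaled noise).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech, p_noise = power(speech), power(noise)
    if p_noise == 0:
        raise ValueError("noise must not be silent")
    # Solve 10*log10(p_speech / (gain**2 * p_noise)) = snr_db for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled)]
    return mixed, scaled


def measured_snr_db(speech, noise):
    """SNR in decibels between a speech signal and a noise signal."""
    return 10 * math.log10(power(speech) / power(noise))


# Synthetic stand-ins for real recordings (assumption for the example).
speech = [math.sin(0.1 * i) for i in range(1000)]
noise = [((i * 37) % 11 - 5) / 5 for i in range(1000)]
mixed, scaled_noise = mix_at_snr(speech, noise, 5.0)
```

Checking `measured_snr_db(speech, scaled_noise)` after mixing is a simple way to confirm that the scaled noise sits at the requested level.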
Abstract:
Extracellular matrix regulates many cellular processes likely to be important for development and regression of corpora lutea. Therefore, we identified the types and components of the extracellular matrix of the human corpus luteum at different stages of the menstrual cycle. Two different types of extracellular matrix were identified by electron microscopy; subendothelial basal laminas and an interstitial matrix located as aggregates at irregular intervals between the non-vascular cells. No basal laminas were associated with luteal cells. At all stages, collagen type IV α1 and laminins α5, β2 and γ1 were localized by immunohistochemistry to subendothelial basal laminas, and collagen type IV α1 and laminins α2, α5, β1 and β2 localized in the interstitial matrix. Laminin α4 and β1 chains occurred in the subendothelial basal lamina from mid-luteal stage to regression; at earlier stages, a punctate pattern of staining was observed. Therefore, human luteal subendothelial basal laminas potentially contain laminin 11 during early luteal development and, additionally, laminins 8, 9 and 10 at the mid-luteal phase. Laminin α1 and α3 chains were not detected in corpora lutea. Versican localized to the connective tissue extremities of the corpus luteum. Thus, during the formation of the human corpus luteum, remodelling of extracellular matrix does not result in basal laminas as present in the adrenal cortex or ovarian follicle. Instead, novel aggregates of interstitial matrix of collagen and laminin are deposited within the luteal parenchyma, and it remains to be seen whether this matrix is important for maintaining the luteal cell phenotype.
Abstract:
In this paper, we describe a machine-translated parallel English corpus for the NTCIR Chinese, Japanese and Korean (CJK) Wikipedia collections. This document collection is named the CJK2E Wikipedia XML corpus. The corpus could serve the information retrieval research community and knowledge sharing in Wikipedia in many ways; for example, it could be used for experiments in cross-lingual information retrieval, cross-lingual link discovery, or omni-lingual information retrieval research. Furthermore, the translated CJK articles could be used to further expand the current coverage of the English Wikipedia.
Abstract:
Measures of semantic similarity between medical concepts are central to a number of techniques in medical informatics, including query expansion in medical information retrieval. Previous work has mainly considered thesaurus-based path measures of semantic similarity and has not compared different corpus-driven approaches in depth. We evaluate the effectiveness of eight common corpus-driven measures in capturing semantic relatedness and compare these against human-judged concept pairs assessed by medical professionals. Our results show that certain corpus-driven measures correlate strongly (approximately 0.8) with human judgements. An important finding is that performance was significantly affected by the choice of corpus used in priming the measure, i.e., used as evidence from which corpus-driven similarities are drawn. This paper provides guidelines for the implementation of semantic similarity measures for medical informatics and concludes with implications for medical information retrieval.