867 results for corpus, collocations, corpus linguistics, EPTIC
Abstract:
[EN] The use of large corpora in the study of languages is a well-established tradition. Scholarship is likewise well represented in the use of corpora for writing grammars of languages, as in the COBUILD grammar and dictionary and the Longman Grammar of Spoken and Written English. In these works, corpora have been analyzed to identify patterns in languages, which learners can later practise by following those patterns as described and exemplified with real instances.
Abstract:
Corpora (large collections of written and/or spoken text stored and accessed electronically) provide a means of investigating language that is of growing importance academically and professionally. Corpora are now routinely used in the following fields: the production of dictionaries and other reference materials; the development of aids to translation; language teaching materials; the investigation of ideologies and cultural assumptions; natural language processing; and the investigation of all aspects of linguistic behaviour, including vocabulary, grammar and pragmatics.
Abstract:
Starting with a description of the software and hardware used for corpus linguistics in the late 1980s to early 1990s, this contribution discusses difficulties faced by the software designer when attempting to allow users to study text. Future human-machine interfaces may develop to be much more sophisticated, and certainly the aspects of text which can be studied will progress beyond plain text without images. Another area which will develop further is the study of patternings involving not just single words but word-relations across large stretches of text.
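Le collocazioni in traduzione e interpretazione tra italiano e francese: Uno studio su eptic_01_2011 [Collocations in translation and interpreting between Italian and French: A study of EPTIC_01_2011]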
Abstract:
This dissertation investigates differences in phraseological patterns in translated and interpreted language, on the basis of the intermodal corpus EPTIC_01_2011 and focusing on Italian and French. First, an overview is offered of the main studies and theories about corpus linguistics and collocations: the notion of corpus is defined and a typology (focusing on intermodal corpora) is presented, before moving on to the linguistic phenomenon of collocation and its investigation through corpus linguistics methods. Second, the general structure of EPTIC_01_2011 is presented, including the ways in which its texts have been assembled, edited through ad hoc conventions and enriched with metadata. The methodology proposed by Durrant and Schmitt (2009), slightly adapted to fit the present study, has been used to extract and compare noun+adjective/adjective+noun bigrams from a quantitative point of view. A subset of these data has then been extracted and analysed manually. The results of the study are presented through graphs and examples, with an in-depth discussion of the bigrams considered. Lastly, the data collected are analysed and categorised in terms of shifts occurring in translation and in interpreting, potential causes are discussed, and ideas for further research and for the development of the EPTIC corpus are sketched.
Abstract:
The aim of this dissertation is to investigate the differences in the phraseological patterns used by Italian and English translators and interpreters through the intermodal corpus EPTIC_01_2011. First, the most important studies and theories about corpus linguistics and collocations are introduced. After defining the notion of “corpus”, the different types of corpora are categorised, giving particular attention to the intermodal one. Then the dissertation focuses on a description of collocations, as defined by the main linguistics scholars, and describes some attempts to apply corpus linguistics to the study of collocations. Second, EPTIC_01_2011 is presented, with a description of its structure and of the text editing process, which was carried out by applying specific editing conventions and adding a set of metadata before each text. The quantitative analysis of collocation candidate bigrams (adjective+noun/noun+adjective) was conducted using a methodology adapted from Durrant and Schmitt (2009). Qualitative analysis was also performed on a subsection of the data. The results of the study are presented through examples and graphs, giving particular attention to the interpretation of the data analysed from a qualitative perspective. Finally, results are summarised and categorised, and suggestions are made concerning the diverging choices made in translation and interpreting. The final section concentrates on further studies that could be carried out in the future, as well as on suggestions for corpus enlargement.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
This research, theoretically founded on Corpus Linguistics and Phraseology, aims to extract and analyze general-language and specialized collocations in the medical field, taken from a parallel corpus comprising transcriptions of the TV series Grey’s Anatomy. Based on this extraction, the compilation of a bilingual glossary is proposed, so that the material can be used by learner translators as well as English language teachers.
Abstract:
This study reports a diachronic corpus investigation of common-number pronouns used to convey unknown or otherwise unspecified reference. The study charts agreement patterns in these pronouns in various diachronic and synchronic corpora. The objective is to provide base-line data on variant frequencies and distributions in the history of English, as there are no previous systematic corpus-based observations on this topic. This study seeks to answer the questions of how pronoun use is linked with the overall typological development in English and how their diachronic evolution is embedded in the linguistic and social structures in which they are used. The theoretical framework draws on corpus linguistics and historical sociolinguistics, grammaticalisation, diachronic typology, and multivariate analysis of modelling sociolinguistic variation. The method employs quantitative corpus analyses from two main electronic corpora, one from Modern English and the other from Present-day English. The Modern English material is the Corpus of Early English Correspondence, and the time frame covered is 1500-1800. The written component of the British National Corpus is used in the Present-day English investigations. In addition, the study draws supplementary data from other electronic corpora. The material is used to compare the frequencies and distributions of common-number pronouns between these two time periods. The study limits the common-number uses to two subsystems, one anaphoric to grammatically singular antecedents and one cataphoric, in which the pronoun is followed by a relative clause. Various statistical tools are used to process the data, ranging from cross-tabulations to multivariate VARBRUL analyses in which the effects of sociolinguistic and systemic parameters are assessed to model their impact on the dependent variable. 
This study shows how one pronoun type has extended its uses in both subsystems, an increase linked with grammaticalisation and the changes in other pronouns in English through the centuries. The variationist sociolinguistic analysis charts how grammaticalisation in the subsystems is embedded in the linguistic and social structures in which the pronouns are used. The study suggests a scale of two statistical generalisations of various sociolinguistic factors which contribute to grammaticalisation and its embedding at various stages of the process.
Abstract:
The present study provides a usage-based account of how three grammatical structures, declarative content clauses, interrogative content clauses and as-predicative constructions, are used in academic research articles. These structures may be used in both knowledge claims and citations, and they often express evaluative meanings. Using the methodology of quantitative corpus linguistics, I investigate how the culture of the academic discipline influences the way in which these constructions are used in research articles. The study compares the rates of occurrence of these grammatical structures and investigates their co-occurrence patterns in articles representing four different disciplines (medicine, physics, law, and literary criticism). The analysis is based on a purpose-built 2-million-word corpus, which has been part-of-speech tagged. The analysis demonstrates that the use of these grammatical structures varies between disciplines, and further shows that the differences observed in the corpus data are linked with differences in the nature of knowledge and the patterns of enquiry. The constructions in focus tend to be more frequently used in the soft disciplines, law and literary criticism, where their co-occurrence patterns are also more varied. This reflects both the greater variety of topics discussed in these disciplines, and the higher frequency of references to statements made by other researchers. Knowledge-building in the soft fields normally requires a careful contextualisation of the arguments, giving rise to statements reporting earlier research employing the constructions in focus. In contrast, knowledge-building in the hard fields is typically a cumulative process, based on agreed-upon methods of analysis. This characteristic is reflected in the structure and contents of research reports, which offer fewer opportunities for using these constructions.
Abstract:
This research investigates how the word paz ("peace") is understood, as a concept, by the United Nations Security Council. To that end, it analyses thirty-seven official reports produced by the Security Council between August 1994 and June 2009 on the peace missions carried out in thirty-one regions/countries that posed a threat to international peace and security during that period. According to Counsellor Gilda Santos Neves, head of the United Nations Division of the Ministry of Foreign Affairs, in her text O Brasil e a Criação da Comissão para a Consolidação da Paz (2008), peace is something that is consolidated, not built. This position guides the present research, since the aim here is to map the linguistic expressions realised through the word paz. The theoretical basis is the cognitive theory of metaphor of Lakoff and Johnson (1980), together with Deignan's (2005) book Metaphor and Corpus Linguistics, which sets out the benefits that the cognitive approach to metaphor can gain from the analysis of digitised corpora. After the Security Council reports had been compiled and prepared for reading by the software WordSmith Tools 3.0, all occurrences of the word paz were extracted from them. Of the 686 occurrences generated, only those with a metaphorical sense were retained for analysis, and nine conceptual schemas were constructed in total. The research suggests that, for the Security Council, peace is something deeply desired both by the population of conflict zones and by the international community. However, peace is not easily built or established. Achieving peace means following a process with different stages, that is, with a beginning, a middle and an end, as well as overcoming the obstacles and setbacks that arise along the way.
To that end, considerable investment has to be made by all those involved and genuinely interested in world peace. Finally, according to the metaphors analysed here, the view of Counsellor Gilda Santos Neves is correct, since, as the results of the present study indicate, the Security Council's concept of peace is not that of something to be built from scratch.
Abstract:
The general aim of this study is to profile the lexico-grammatical choices in the English writing of a group of Brazilian learners in the city of Rio de Janeiro, from 2009 to 2012, by analysing their production of four-grams (blocks of four lexical items used frequently by several learners) in compositions written as part of the end-of-course assessment. As a specific aim, the research sought to determine whether the four-grams produced were among those previously taught for the writing task or belonged to some other category, that is, four-grams already incorporated into language use or erroneous four-grams used widely by the population investigated. To this end, compositions were collected from learners at the same proficiency level across several branches of the same private English language school in Rio de Janeiro. These compositions were then typed up and annotated to form a digital corpus readily identifiable by text type and genre, learner profile, branch and area of origin within Rio de Janeiro. The study draws on the principles and methods of Corpus Linguistics, the area of Linguistics that compiles large quantities of texts and extracts data from them with the aid of a computer program in order to map the use, frequency, distribution and range of particular linguistic or discursive phenomena. The results show that the learners investigated used few of the taught four-grams and, collectively, preferred others that had not been taught in the classes specific to their level. The study also showed that when the text genre is part of their personal world, learners seem to use more previously taught four-grams. This may mean that genre can influence correct lexico-grammatical choices.
The study opens the way to understanding the importance of lexico-grammatical blocks in L2 writing as a means of ensuring fluency and accuracy in the language, and suggests that learners need more opportunities for practice and awareness-raising regarding the use of such blocks.
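The notion of a four-gram used in the last abstract (a block of four lexical items used frequently and by several learners) combines two counts: overall frequency and range across writers. The following is a minimal sketch of that idea, not the study's actual procedure. It assumes plain whitespace tokenisation and lower-casing, and the function names, thresholds (`min_freq`, `min_learners`) and sample essays are all hypothetical.

```python
from collections import Counter, defaultdict


def four_grams(tokens):
    """Return every run of four consecutive tokens as a tuple."""
    return zip(tokens, tokens[1:], tokens[2:], tokens[3:])


def shared_four_grams(essays, min_freq=2, min_learners=2):
    """Map each qualifying four-gram to (frequency, number of learners).

    `essays` maps a learner id to that learner's text. The thresholds are
    illustrative assumptions, not values taken from the study.
    """
    freq = Counter()              # total occurrences of each four-gram
    learners = defaultdict(set)   # which learners produced each four-gram
    for learner_id, text in essays.items():
        tokens = text.lower().split()  # naive tokenisation (assumption)
        for gram in four_grams(tokens):
            freq[gram] += 1
            learners[gram].add(learner_id)
    return {
        " ".join(gram): (freq[gram], len(learners[gram]))
        for gram in freq
        if freq[gram] >= min_freq and len(learners[gram]) >= min_learners
    }


# Invented sample data, for illustration only.
essays = {
    "learner_a": "On the other hand I think that it is good",
    "learner_b": "But on the other hand it is bad",
    "learner_c": "I think that it is a problem",
}
result = shared_four_grams(essays)
```

With this sample, `result` keeps only the four-grams produced by at least two different learners, such as "on the other hand". A real study would need proper tokenisation and a comparison against the list of taught four-grams, which the abstract does not specify.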