Biblioteca Digital

992 results for corpus linguistics

A new venture in corpus-based lexicography: towards a dictionary of academic English

Relevance: 70.00%

Abstract:

This paper asserts the increasing importance of academic English in an increasingly Anglophone world, and looks at the differences between academic English and general English, especially in terms of vocabulary. The creation of wordlists has played an important role in trying to establish the academic English lexicon, but these wordlists are not based on appropriate data, or are implemented inappropriately. There is as yet no adequate dictionary of academic English, and this paper reports on new efforts at Aston University to create a suitable corpus on which such a dictionary could be based.

Designing a model for a corpus-driven dictionary of academic English

Relevance: 70.00%

Abstract:

University students encounter difficulties with academic English because of its vocabulary, phraseology, and variability, and also because academic English differs in many respects from general English, the language which they have experienced before starting their university studies. Although students have been provided with many dictionaries that contain some helpful information on words used in academic English, these dictionaries remain focused on the uses of words in general English. There is therefore a gap in the dictionary market for a dictionary for university students, and this thesis provides a proposal for such a dictionary (called the Dictionary of Academic English; DOAE) in the form of a model which depicts how the dictionary should be designed, compiled, and offered to students. The model draws on state-of-the-art techniques in lexicography, dictionary-use research, and corpus linguistics. The model demanded the creation of a completely new corpus of academic language (Corpus of Academic Journal Articles; CAJA). The main advantages of the corpus are its large size (83.5 million words) and balance. Having access to a large corpus of academic language was essential for a corpus-driven approach to data analysis. A good corpus balance in terms of domains enabled a detailed domain-labelling of senses, patterns, collocates, etc. in the dictionary database, which was then used to tailor the output according to the needs of different types of student. The model proposes an online dictionary that is designed as an online dictionary from the outset. The proposed dictionary is revolutionary in the way it addresses the needs of different types of student. It presents students with a dynamic dictionary whose contents can be customised according to the user's native language, subject of study, variant spelling preferences, and/or visual preferences (e.g. black and white).

A statistical analysis of regional variation in adverb position in a corpus of written Standard American English

Relevance: 70.00%

Abstract:

This paper investigates whether the position of adverb phrases in sentences is regionally patterned in written Standard American English, based on an analysis of a 25-million-word corpus of letters to the editor representing the language of 200 cities from across the United States. Seven measures of adverb position were tested for regional patterns using the global spatial autocorrelation statistic Moran’s I and the local spatial autocorrelation statistic Getis-Ord Gi*. Three of these seven measures were identified as exhibiting significant levels of spatial autocorrelation, contrasting the language of the Northeast with the language of the Southeast and the South Central states. These results demonstrate that continuous regional grammatical variation exists in American English and that regional linguistic variation exists in written Standard English.
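
As an illustration of the global statistic named in this abstract, the following is a minimal sketch of Moran's I in plain Python. It is not the authors' implementation: the function name `morans_i`, the toy values in `rates`, and the binary neighbour matrix `neighbours` are all assumptions made for the example.

```python
def morans_i(values, weights):
    """Global Moran's I for observations x_i and spatial weights w_ij.

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
    where W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Cross-products of deviations, weighted by spatial proximity
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    squared_dev = sum(d * d for d in dev)
    return (n / total_weight) * (cross / squared_dev)


# Hypothetical data: four locations in a row, each adjacent to its neighbours.
rates = [1.0, 2.0, 3.0, 4.0]
neighbours = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Similar values sit next to each other, so I is positive (approximately 0.33).
print(morans_i(rates, neighbours))
```

Values near +1 indicate that neighbouring locations have similar values, values near -1 that they alternate. A real analysis would also need a significance test, which this sketch omits.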

Creation of a Digital Corpus of Bulgarian Dialects

Relevance: 70.00%

Abstract:

The paper presents our considerations related to the creation of a digital corpus of Bulgarian dialects. The dialectological archive of the Bulgarian language consists of more than 250 audio tapes. All tapes were recorded between 1955 and 1965 in the course of regular dialectological expeditions throughout the country. The recordings typically contain interviews with inhabitants of small villages in Bulgaria. The topics covered are usually related to such issues as birth, everyday life, marriage, family relationships, death, etc. Only a few tapes contain folk songs from different regions of the country. Taking into account the progressive deterioration of the magnetic media and the realistic prospect of data loss, the Institute for Bulgarian Language at the Academy of Sciences launched in 1997 a project aimed at the restoration and digital preservation of the dialectological archive. Within the framework of this project, more than half of the recordings were digitized, de-noised and stored on digital recording media. Since then, restoration and digitization activities have been carried out at the Institute on a regular basis. As a result, a large collection of sound files has been gathered. Our further efforts are aimed at the creation of a digital corpus of Bulgarian dialects, which will be made available for phonological and linguistic research. Besides the sound files, such corpora typically include two basic elements: a transcription aligned with the sound file, and a set of standardized metadata that defines the corpus. In our work we present considerations on how these tasks could be realized in the case of the corpus of Bulgarian dialects. Our suggestions are based on a comparative analysis of existing methods and techniques for building such corpora, selecting the ones that best fit the particular needs. Our experience can be used by similar institutions storing folklore archives, history-related spoken records, etc.

On the F word: a corpus-based analysis of the media representation of feminism in British and German press discourse, 1990-2009

Relevance: 70.00%

Abstract:

Research in social psychology has shown that public attitudes towards feminism are mostly based on stereotypical views linking feminism with leftist politics and lesbian orientation. It is claimed that such attitudes are due to the negative and sexualised media construction of feminism. Studies concerned with the media representation of feminism seem to confirm this tendency. While most of this research provides significant insights into the representation of feminism, the findings are often based on a small sample of texts. Also, most of the research was conducted in an Anglo-American setting. This study attempts to address some of the shortcomings of previous work by examining the discourse of feminism in a large corpus of German and British newspaper data. It does so by employing the tools of Corpus Linguistics. By investigating the collocation profiles of the search term feminism, we provide evidence of salient discourse patterns surrounding feminism in two different cultural contexts. © The Author(s) 2012.

Linguistic Corpora as International Cultural Heritage: The Corpus of Bulgarian and Ukrainian Parallel Texts

Relevance: 70.00%

Abstract:

The paper reports on our ongoing work on the creation of a corpus of Bulgarian and Ukrainian parallel texts. We discuss some differences in the approaches and the interpretation of some concepts, as well as various problems associated with the construction of our corpus, in particular the occasional ‘nonparallelism’ of original and translated texts. We give examples of the application of the parallel corpus for the study of lexical semantics and note the outstanding role of the corpus in the lexicographic description of Ukrainian and Bulgarian translation equivalents. We draw attention to the importance of creating parallel corpora as objects of national as well as global cultural heritage.

A corpus-assisted study of the discourse marker well as an indicator of judges' institutional roles in court cases with litigants in person

Relevance: 70.00%

Abstract:

In this paper, I concentrate on court cases with litigants in person (lay people who act on their own behalf in legal proceedings without a counsel or solicitor) and discuss the challenges of building a corpus of courtroom discourse where it is crucial to distinguish between speakers due to their distinct institutional roles. The corpus incorporates seven sub-corpora of verbatim transcripts from different court cases with litigants in person and comprises over eleven million tokens. The focus of this paper is on the interplay between the legal and lay discourse types and on how judges project their institutional roles through well-initiated turns directed at litigants in person and counsel. As a versatile discourse marker, well provides a good opportunity to explore how judges have to adapt their roles to ensure that lay litigants in person receive the necessary support and that their lack of competence does not impinge on the fairness of the proceedings. Given the breadth and importance of the topic of litigation in person, I discuss how the tools and approaches of corpus linguistics can be helpful in this multi-disciplinary area where multiple functions and uses of individual linguistic features need to be explored in depth.

As vogais médias tônicas: um estudo contrastivo da metafonia com base em corpus

Relevance: 70.00%

Abstract:

This research analyzes the stressed front mid vowels [ɛ] and [e] and back mid vowels [ɔ] and [o] in nominal forms and in verbal forms of the 1st person singular and the 3rd person singular and plural of the present tense, specifically the umlaut (metaphony) process by which the mid vowels /e/ and /o/ assimilate to /ɛ/ and /ɔ/ in stressed position. The general objective of this research is to describe and quantify the occurrence of umlaut and subsequently analyze in which words it is regular and in which it is not. The specific objectives are: i) to compile and label an oral, spontaneous, synchronic and regional corpus from radio programs produced in the city of Ituiutaba, Minas Gerais; ii) to describe the characteristics of the corpus to be compiled; iii) to investigate the alternating timbre of mid vowels in stressed position; iv) to identify instances of nominal and verbal umlaut of the mid vowels in stressed position; v) to describe the identified cases of nominal and verbal umlaut; vi) to analyze the probable causes of the variation of the mid vowels. To perform the proposed analysis, we adopted multi-representational models as a theoretical-methodological basis: Phonology of Use (BYBEE, 2001) and Exemplar Theory (PIERREHUMBERT, 2001), combined with the precepts of Corpus Linguistics (BEBER SARDINHA, 2004). The corpus consisted of 16 radio programs – eight political and eight religious – from the city of Ituiutaba-MG, with recordings of about 20 to 40 minutes each. The results generated by WordSmith Tools® software, version 6.0 (SCOTT, 2012), show that the analyzed forms vary little, which indicates that umlaut is a process already lexicalized in the speech of the participants in the radio programs analyzed. We conclude that the results converge with the proposal of the Phonology of Use (BYBEE, 2001; PHILLIPS, 1984) that less frequent words that have no phonetic environment conducive to change are changed first.

A nasalidade no dialeto quilombola do Norte de Minas: uma análise contrastiva baseada em corpus

Relevance: 70.00%

Abstract:

This research investigated the nasality of vowels in the spontaneous speech of inhabitants of the quilombola communities of Brejo dos Crioulos and Poções (MG). As a theoretical framework, we drew on the assumptions of Phonetics and Phonology, following renowned scholars in the investigation of nasality (CAGLIARI, 1977; CÂMARA JR., 1984, 2013; BISOL, 2013; ABAURRE; PAGOTTO, 1996; SILVA, 2015), with support from Corpus Linguistics. The general goal was to investigate the occurrence of nasality in the dialect of these quilombola communities and its linguistic behavior, considering the linguistic factors that can interfere in the phenomenon. Specifically, the study aimed to a) detect the occurrence of nasalized vowels with the help of the resources that Corpus Linguistics provides (Praat and WordSmith Tools); b) discriminate the different types of context in which nasalized vowels occur; c) make quantitative and qualitative analyses of the nasalized vowels in the study corpus; d) describe and analyze the behavior of nasalized vowels; and e) contrast the values of F1 and F2 of the oral and nasalized vowels. It was hypothesized that nasality is conditioned by the nasal segment following the nasalized vowel (the phonological process of assimilation), by its position relative to primary stress, and by grammatical category. It was believed that the quilombola communities of Brejo dos Crioulos and Poções produce nasalized vowels in their speech and that this linguistic phenomenon is favored by the adjacent presence of nasal consonants or nasal vowels. Furthermore, it was hypothesized that the values of F1 and F2 of oral and nasalized vowels in these communities are distinct. The following research questions were formulated: (i) is the presence of nasalized vowels in the speech of these quilombola communities conditioned by the presence of a nasal sound segment? (ii) does the nasal sound segment following the nasalized vowel favor the occurrence of the nasality phenomenon? (iii) is there a difference between the values of F1 and F2 of the oral and nasalized vowels in the two quilombola communities considered?
To compose our corpus, recordings of 24 interviews were used (12 female speakers and 12 male speakers), a total of 24 participants. It was found that the following nasal sound segment tends to condition the nasalized vowel. In general, the vowel assimilates the lowering of the soft palate from the immediately following nasal consonant, but there are also cases in which the following segment is a nasal vowel (regressive assimilation); the stressed syllable tends to favor nasality, but it occurs in pretonic and posttonic position as well; F1 and F2 values of oral and nasalized vowels in the quilombola communities of Poções and Brejo dos Crioulos are distinct: the group from Brejo dos Crioulos tends to produce the F1 of oral and nasalized vowels lower than the group from Poções and the F2 in a more anterior position. Nasality tends to occur in verbs and nouns, although it is not specific to a grammatical category. This research found cases of spurious nasalization, confirming previous studies. It also revealed lexical items with a context favorable to nasalization in which nasalization did not occur. In this last case, where lowering of the soft palate is considered uniform in Brazilian Portuguese (PB), vowels were pronounced without soft palate lowering. That is, variation was detected in the phenomenon of nasalization in PB. This work promotes discussion of nasality in order to contribute to linguistic studies of how Brazilian Portuguese functions in this geographical context.

A representação social da violência de gênero contra a mulher no Espírito Santo

Relevance: 60.00%

Abstract:

The main objective of this work is to analyze how the media help to construct the social representation of gender-based violence against women in Espírito Santo, which leads the national ranking of femicides, with a rate of 9.8 homicides per 100,000 women. As our research corpus we chose news stories about gender-based violence in Espírito Santo (ES) published in 2013 in the newspapers A Gazeta and A Tribuna. Our hypothesis is that these news stories help to construct social representations of gender-based violence by presenting stereotypes of victim and aggressor in society, by individualizing the problem of violence, by associating this problem with less privileged social classes, and by presenting the crime of gender-based violence as a crime of passion. The study of these news stories is complex, involving not only linguistic information but also information of a social, historical, cultural and cognitive nature, since discourse analysis cannot be dissociated from the context, from the social actors and institutions involved in producing the news, or from the ideologies present in that process. For this reason, we adopted as the theoretical basis of our investigation a multidisciplinary proposal: the Sociocognitive Theory of Teun A. van Dijk (1999a; 2011a; 2012; 2014b). In addition, we draw on the contributions of studies on gender and discourse by Cameron (1985, 1997), Wodak (1997), West, Lazar and Kramarae (2000), Fernández Díaz (2003), Lazar (1993, 2005, 2007), Magalhães (2005; 2009), and Heberle, Ostermann and Figueiredo (2006). Besides the discourse-analytic analyses, we also used the corpus linguistics program WordSmith Tools to carry out quantitative analyses.
The results of the analyses led us to confirm the initial hypotheses: the discourse of the news stories reinforces stereotypes of victim and aggressor typical of a patriarchal social structure, in which responsibility for the violence suffered is attributed to the victim or to addictions (alcohol and drugs); moreover, gender-based violence is presented as an individual problem and associated with less privileged social classes; and, lastly, the discourse of the news stories presents a large share of gender-based violence crimes as crimes of passion.

O uso de corpora na análise da representação do discurso em tradução

Relevance: 60.00%

Abstract:

This article presents a study of the representation of fictional discourse based on the systemic-functional grammar proposed by Halliday and on Corpus Linguistics, using the WordSmith Tools software. The analysis focuses on the ideational metafunction, realized by the transitivity system, concentrating on mental processes and the logico-semantic relation of projection. The aim of the study was to observe how the thoughts of the characters in a fictional corpus are represented through the reporting verbs THINK and PENSAR, seeking to describe textual patterns in the three novels that make up the corpus.

Der Wissenschaftlichkeit auf der Spur: zum Einsatz von Korpora in der Vermittlung des Deutschen als (fremder) Wissenschaftssprache

Relevance: 60.00%

Abstract:

Recently, Corpus Linguistics has become a popular research tool in the field of German as a Foreign Language. However, little attention has been paid to the teaching and learning potential that corpora and corpus-based teaching offer. This paper seeks to demonstrate some of the ways in which corpus-based techniques can be used for teaching purposes, even by those who have little experience in Corpus Linguistics. The focus will be on teaching and learning German for Academic Purposes in German Studies abroad.

Formulaic sequences in native and non-native argumentative writing in German

Relevance: 60.00%

Abstract:

Whereas there is substantial scholarship on formulaic language in L1 and L2 English, there is less research on formulaicity in other languages. The aim of this paper is to contribute to learner corpus research into formulaic language in native and non-native German. To this effect, a corpus of argumentative essays written by advanced British students of German (WHiG) was compared with a corpus of argumentative essays written by German native speakers (Falko-L1). A corpus-driven analysis reveals a larger number of 3-grams in WHiG than in Falko-L1, which suggests that British advanced learners of German are more likely to use formulaic language in argumentative writing than their native-speaker counterparts. Secondly, by classifying the formulaic sequences according to their functions, this study finds that native speakers of German prefer discourse-structuring devices to stance expressions, whilst British advanced learners display the opposite preferences. Thirdly, the results show that learners of German make greater use of macro-discourse-structuring devices and cautious language, whereas native speakers favour micro-discourse structuring devices and tend to use more direct language. This study increases our understanding of formulaic language typical of British advanced learners of German and reveals how diverging cultural paradigms can shape written native speaker and learner output.
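
The 3-gram comparison described in this abstract can be pictured with a short sketch. It only illustrates the counting step and is not the procedure used in the study: the function names `trigram_counts` and `recurrent_trigrams`, the whitespace tokenisation, the frequency threshold, and the toy sentences are assumptions.

```python
from collections import Counter


def trigram_counts(text):
    """Count every sequence of three consecutive tokens in one text."""
    tokens = text.lower().split()  # naive whitespace tokenisation (assumption)
    return Counter(zip(tokens, tokens[1:], tokens[2:]))


def recurrent_trigrams(texts, min_freq=2):
    """Return the 3-grams that occur at least min_freq times across texts."""
    counts = Counter()
    for text in texts:
        counts.update(trigram_counts(text))
    return {gram: n for gram, n in counts.items() if n >= min_freq}


# Hypothetical toy data standing in for a set of learner essays.
learner_essays = [
    "auf der einen seite ist das gut",
    "auf der einen seite ist es schlecht",
]

for gram, n in sorted(recurrent_trigrams(learner_essays).items()):
    print(" ".join(gram), n)
```

When two corpora differ in size, raw counts like these would normally be normalised (for example per million words) before being compared.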

A Linguistic Analysis of the Written Production of Second Language Learners: The Variation of Article Usage by Adult Chinese Learners of English

Relevance: 60.00%

Abstract:

This study aims to test Robertson’s lexical transfer principle, which posits that Chinese learners use demonstratives (particularly this) and the numeral one as markers of definiteness and indefiniteness. The principle is tested by analysing Chinese learners’ written production collected from the Spoken and Written English Corpus of Chinese Learners 2.0 (SWECCL 2.0). The purpose is to understand the variation of article usage by adult Chinese learners of English. More specifically, the study examines to what extent articles and possessive and demonstrative pronouns are used in Chinese learners’ English and how definite and indefinite articles are used by the Chinese learners. The findings corroborate Robertson’s lexical transfer principle. In addition, Chinese learners prefer to use demonstrative determiners, the possessive determiner our, and the numeral one to mark definiteness and indefiniteness. In particular, the learners try to use the demonstrative determiners that and this in the anaphoric function instead of the definite article, and the demonstrative determiner those is frequently used in the cataphoric function. Moreover, the learners use the numeral one as a marker of indefiniteness, and it is also used as a marker of definiteness in the anaphoric function. Further, the possessive determiner our is used as a marker of definiteness in larger situation uses referring to something unique. The study also shows that the definite article is used to mark indefiniteness, and in some particular contexts the definite article functions as a Chinese specifier in Chinese learners’ English. Also, the indefinite article is frequently used in quantifier phrases but is rarely used in other functions. Three main reasons may explain why Chinese learners vary in their use of determiners. Firstly, the choice of determiners by Chinese learners is influenced by linguistic contexts. Secondly, because of learning strategies, Chinese learners try to ignore the anaphoric and cataphoric functions that they are not yet ready to process in article usage. Thirdly, interlanguage grammar influences the optionality in the use of articles.

The use of the pronouns we, us, and our in political speeches: A comparative study of the inaugural addresses of Bush and Obama

Relevance: 60.00%

Abstract:

Pronouns carry considerable importance in language. The speaker’s identity and connection to the audience emerge through the consistent use of certain pronouns (De Fina, 1995). This research is about the use of we, us, and our in political discourse. Specifically, their use is examined in the inaugural addresses of George W. Bush in 2005 and Barack Obama in 2009. The aim of this research is to examine the frequencies and the co-occurrences of these pronouns and then compare their use in these two speeches. More specifically, it asks how the pronouns examined affect the message and enhance hearer credibility. This is done by applying (a) a quantitative corpus linguistics analysis and (b) a qualitative analysis of the context of use. The results show that there is a difference in the frequency of pronoun use; however, the usage of pronouns is rather similar in the two speeches.
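
The quantitative step mentioned in this abstract, comparing pronoun frequencies between two speeches, might look like the following minimal sketch. The function name `pronoun_rates`, the regular-expression tokeniser, the per-1,000-token normalisation, and the sample sentence are assumptions for illustration and are not taken from the study.

```python
import re
from collections import Counter

PRONOUNS = ("we", "us", "our")


def pronoun_rates(text, per=1000):
    """Frequency of each target pronoun, normalised per `per` tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        # No tokens at all: report a rate of zero for every pronoun.
        return {p: 0.0 for p in PRONOUNS}
    counts = Counter(t for t in tokens if t in PRONOUNS)
    return {p: counts[p] * per / len(tokens) for p in PRONOUNS}


# Hypothetical sample sentence, not a quotation from either address.
sample = "We will defend our freedom, and freedom will defend us."
print(pronoun_rates(sample))
```

Normalising by text length makes speeches of different lengths comparable. Analysing co-occurrences would additionally require looking at the words around each hit, which this sketch leaves out.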
