21 results for corpus analysis
in Aston University Research Archive
Abstract:
Following Andersen's (1986, 1991) study of untutored anglophone learners of Spanish, aspectual features have been at the centre of hypotheses on the development of past verbal morphology in language acquisition. The Primacy of Aspect Hypothesis (PAH) claims that the association of any verb category (Aktionsart) with any aspect (perfective or imperfective) constitutes the endpoint of acquisition. However, its predictions rely on the observation of a limited number of untutored learners at the early stages of acquisition and have yet to be confirmed in other settings. The aim of the present thesis is to evaluate the explanatory power of the PAH with respect to the acquisition of French past tenses, an aspect of the language that constitutes a serious stumbling block for foreign learners, even those at the highest levels of proficiency (Coppieters 1987). The present research applies the PAH to the production of 61 anglophone 'advanced learners' (as defined in Bartning 1997) in a tutored environment. In so doing, it tests competing explanations, including the influence of the input, the influence of chunking, and the hypothesis of cyclic development. Finally, it discusses the cotextual and contextual factors that still provoke what Andersen (1991) terms "non-native glitches" at the final stage, as predicted by the PAH. The first part of the thesis provides the theoretical background to the corpus analysis. It opens with a diachronic presentation of the French past tense system, focusing on present areas of competition and on developments that emphasize the complexity of the system to be acquired. The concepts of time, grammatical aspect and lexical aspect (Aktionsart) are introduced and discussed in the second chapter, and a distinctive formal representation of the French past tenses is offered in the third chapter. The second part of the thesis is devoted to a corpus analysis.
The data gathering procedures and the choice of tasks (oral and written film narratives based on Modern Times, cloze tests and acceptability judgement tests) are described and justified in the research methodology chapter. The research design was shaped by previous studies and therefore allows comparison with them. The second chapter is devoted to the analysis of the narratives and the third to the grammatical tasks. This part closes with a summary of findings and a comparison with previous results. The conclusion addresses the initial research questions in the light of both theory and practice. It shows that the PAH fails to account for the complex phenomenon of past tense development in the acquisitional settings under study, as it adopts a local (the verb phrase) and linear (steady progression towards native usage) approach. It is thus suggested that past tense acquisition instead follows a pendular development, as learners reformulate their learning hypotheses and become increasingly able to shift from local to global cues and thus to integrate the influence of cotext and context into their tense choice.
Abstract:
The institutions of the European Union represent a complex setting and a specific case of institutional translation. The European Central Bank (ECB) is a particular context, as the documents translated there belong to the field of economics and thus contain many specialised terms and neologisms that pose challenges to translators. This study aims to investigate the translation practices at the ECB and to analyse their effects on the translated texts. In order to illustrate how texts are translated at the ECB, the thesis focuses on metaphorical expressions and the conceptual metaphors that sanction them. Metaphor is often associated with literature rather than with specialised texts. However, according to Lakoff and Johnson’s (1980) conceptual metaphor theory, our conceptual system is fundamentally metaphorical in nature, and metaphors are pervasive elements of thought and speech. The corpus compiled comprises economic documents translated at the ECB, mainly from English into Romanian. Through corpus analysis, the most salient metaphorical expressions were identified in the source and target texts and explained with reference to the main conceptual metaphors. Translation strategies are discussed on the basis of a comparison of the source and target texts. The text-based analysis is complemented by questionnaires distributed to translators, which give insights into the institution’s translation practices. As translation is an institutional process, translators have to follow certain guidelines and practices; these are discussed with reference to translators’ agency. A gap was identified in the field of institutional translation: the translation process in the EU institutions has been insufficiently explored, especially regarding the new languages of the European Union.
By combining the analysis of the institutional practices, the texts produced in the institution and the translators’ work (through the questionnaires distributed to translators), this thesis aims to contribute to the study of institutional translation and metaphor translation, particularly with regard to a new EU language, Romanian.
Abstract:
Based on a corpus of English, German, and Polish spoken academic discourse, this article analyzes the distribution and function of humor in academic research presentations. The corpus, the result of a European research cooperation project, consists of 300,000 tokens of spoken academic language and focuses on the genres of research presentation, student presentation, and oral examination. The article investigates differences between the German and English research cultures as expressed in the genre of specialist research presentations, and the role of humor as a pragmatic device in their respective contexts. The data are analyzed according to the paradigms of corpus-assisted discourse studies (CADS). The findings show that humor is used in research presentations as an expression of discourse reflexivity. They also reveal a considerable difference in the quantitative distribution of humor in research presentations depending on the educational, linguistic, and cultural background of the presenters, thus confirming the notion of different research cultures. Such research cultures nurture distinct attitudes to genres of academic language: whereas researchers in one of the cultures identified conform to the constraints and structures of the genre, those working in another attempt to subvert them, for example through the use of humor. © 2012 Elsevier B.V.
Abstract:
This paper investigates whether the position of adverb phrases in sentences is regionally patterned in written Standard American English, based on an analysis of a 25-million-word corpus of letters to the editor representing the language of 200 cities from across the United States. Seven measures of adverb position were tested for regional patterns using the global spatial autocorrelation statistic Moran’s I and the local spatial autocorrelation statistic Getis-Ord Gi*. Three of these seven measures were identified as exhibiting significant levels of spatial autocorrelation, contrasting the language of the Northeast with that of the Southeast and the South Central states. These results demonstrate that continuous regional grammatical variation exists in American English and that regional linguistic variation exists in written Standard English.
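For readers unfamiliar with the global statistic named above, the following is a minimal sketch of the standard Moran's I formula, I = (n / W) · Σᵢ Σⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)². It is not the authors' code: the abstract does not state how the spatial weights between the 200 cities were defined, so the binary-neighbour weight matrix and the four-location toy data below are hypothetical, and the local Getis-Ord Gi* statistic is not shown. Values well above the null expectation of −1/(n − 1) indicate that similar values cluster in space; values below it indicate that neighbouring locations tend to differ.

```python
from typing import Sequence


def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Global Moran's I for one measure observed at n locations.

    weights[i][j] is the spatial weight between locations i and j
    (in this sketch: 1 if the two locations are neighbours, else 0; diagonal 0).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]                  # deviations from the mean
    w_total = sum(sum(row) for row in weights)        # W, the sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]       # spatially weighted cross-products
                for i in range(n) for j in range(n))
    var_sum = sum(d * d for d in dev)
    return (n / w_total) * (cross / var_sum)


# Hypothetical toy data: four locations on a line, each adjacent to the next.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1.0, 1.0, 0.0, 0.0], path))  # clustered pattern: positive I
print(morans_i([1.0, 0.0, 1.0, 0.0], path))  # alternating pattern: negative I
```

In practice a significance test (for example a permutation test over shuffled location values) would accompany the statistic; that step is omitted here for brevity.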
Abstract:
Research in social psychology has shown that public attitudes towards feminism are mostly based on stereotypical views linking feminism with leftist politics and lesbian orientation. It is claimed that such attitudes are due to the negative and sexualised media construction of feminism. Studies concerned with the media representation of feminism seem to confirm this tendency. While most of this research provides significant insights into the representation of feminism, the findings are often based on a small sample of texts. Also, most of the research was conducted in an Anglo-American setting. This study attempts to address some of the shortcomings of previous work by examining the discourse of feminism in a large corpus of German and British newspaper data. It does so by employing the tools of Corpus Linguistics. By investigating the collocation profiles of the search term feminism, we provide evidence of salient discourse patterns surrounding feminism in two different cultural contexts. © The Author(s) 2012.
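The abstract does not say which association measure was used to build the collocation profiles of feminism. The sketch below assumes a common choice, a mutual information (MI) score computed from co-occurrence counts within a symmetric window around the node word, with expected frequency f(node) · f(collocate) · span / N. The function name, the window size and the one-sentence example text are illustrative only and do not come from the study's German and British newspaper data.

```python
import math
from collections import Counter


def collocates(tokens, node, window=4):
    """Rank words co-occurring with `node` within +/- `window` tokens by MI score."""
    n = len(tokens)
    freq = Counter(tokens)
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Count every token inside the window, excluding the node itself.
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i:
                observed[tokens[j]] += 1
    scores = {}
    for word, o in observed.items():
        # Expected co-occurrence if node and word were independent:
        # f(node) * f(word) * span / N, with span = 2 * window.
        expected = freq[node] * freq[word] * (2 * window) / n
        scores[word] = math.log2(o / expected)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


text = "radical feminism is debated and radical feminism is criticised the debate continues"
for word, mi in collocates(text.split(), "feminism", window=2):
    print(f"{word:12s} {mi:.3f}")
```

MI is known to favour rare collocates, so corpus studies often combine it with a frequency threshold or with measures such as log-likelihood; the choice here is an assumption made only to illustrate the idea of a collocation profile.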
Abstract:
The density of axons in the optic nerve, olfactory tract and corpus callosum was quantified in non-demented elderly subjects and in Alzheimer’s disease (AD) using an image analysis system. In each fibre tract, there was a significant reduction in the density of axons in AD compared with non-demented subjects, the greatest reductions being observed in the olfactory tract and corpus callosum. Axonal loss in the optic nerve and olfactory tract mainly affected axons with smaller myelinated cross-sectional areas. In the corpus callosum, a reduction in the number of ‘thin’ and ‘thick’ fibres was observed in AD, but there was a proportionally greater loss of the ‘thick’ fibres. The data suggest significant degeneration of white matter fibre tracts in AD involving the smaller axons in the two sensory nerves and both large and small axons in the corpus callosum. Loss of axons in AD could reflect an associated white matter disorder and/or be secondary to neuronal degeneration.
Abstract:
The judicial interest in ‘scientific’ evidence has driven recent work to quantify results for forensic linguistic authorship analysis. Through a methodological discussion and a worked example, this paper examines the issues that complicate attempts to quantify results in such work. The solution suggested to some of these difficulties is a sampling and testing strategy that helps to identify potentially useful, valid and reliable markers of authorship. An important feature of the sampling strategy is that markers identified as generally valid and reliable are retested for use in specific authorship analysis cases. The suggested approach for drawing quantified conclusions combines discriminant function analysis and Bayesian likelihood measures. The worked example starts with twenty comparison texts for each of three potential authors and then uses a progressively smaller comparison corpus, reducing to fifteen, ten, five and finally three texts per author. This worked example demonstrates how reducing the amount of data affects the way conclusions can be drawn. With greater numbers of reference texts, quantified and safe attributions are shown to be possible, but as the number of reference texts decreases, the analysis shows that the appropriate conclusion is that no attribution can be made. At no point does the testing process result in a misattribution.
Abstract:
Building on Goffman’s definition of frames as general ‘schemata of interpretation’ that people use to ‘locate, perceive, identify, and label’, other scholars have used the concept in a more specific way to analyze media coverage. Frames are used in the sense of organizing devices that allow journalists to select and emphasise topics, to decide ‘what matters’ (Gitlin 1980). Gamson and Modigliani (1989) consider frames as being embedded within ‘media packages’ that can be seen as ‘giving meaning’ to an issue. According to Entman (1993), framing comprises a combination of different activities such as problem definition, causal interpretation, moral evaluation, and/or treatment recommendation for the item described. Previous research has analysed climate change in order to test Downs’s model of the issue attention cycle (Trumbo 1996), to uncover media biases in the US press (Boykoff and Boykoff 2004), to highlight differences between nations (Brossard et al. 2004; Grundmann 2007) or to analyze cultural reconstructions of scientific knowledge (Carvalho and Burgess 2005). In this paper we present data from a corpus linguistics-based approach, drawing on the results of a pilot study conducted in Spring 2008 using the Nexis news media archive. Based on comparative data from the US, the UK, France and Germany, we aim to show how the climate change issue has been framed differently in these countries and how this framing indicates differences in national climate change policies.
Abstract:
A set of full-color images of objects is described for use in experiments investigating the effects of in-depth rotation on the identification of three-dimensional objects. The corpus contains up to 11 perspective views of 70 nameable objects. We also provide ratings of the "goodness" of each view, based on Thurstonian scaling of subjects' preferences in a paired-comparison experiment. An exploratory cluster analysis on the scaling solutions indicates that the amount of information available in a given view is generally the major determinant of the goodness of the view. For instance, objects with an elongated front-back axis tend to cluster together, and the front and back views of these objects, which do not reveal the object's major surfaces and features, are evaluated as the worst views.
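To illustrate how paired-comparison preferences can be turned into "goodness" scores, here is a minimal sketch of Thurstone's Case V, one common variant of Thurstonian scaling; the abstract does not say which case the authors used, so this choice is an assumption. Each proportion of judges preferring view i over view j is converted to a standard normal deviate, and a view's scale value is the mean of its deviates. The preference matrix below is hypothetical, not data from the study.

```python
from statistics import NormalDist


def thurstone_case_v(p, eps=0.01):
    """Thurstone Case V scale values from a paired-comparison matrix.

    p[i][j] is the proportion of judgements preferring item i over item j,
    with p[j][i] == 1 - p[i][j]. Proportions are clipped to [eps, 1 - eps]
    because the inverse normal CDF is infinite at 0 and 1.
    """
    z = NormalDist().inv_cdf
    n = len(p)
    scale = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i != j:                     # the diagonal contributes z(0.5) = 0
                total += z(min(max(p[i][j], eps), 1 - eps))
        scale.append(total / n)
    return scale


# Hypothetical preference proportions for three views of one object.
prefs = [
    [0.5, 0.8, 0.9],
    [0.2, 0.5, 0.7],
    [0.1, 0.3, 0.5],
]
print(thurstone_case_v(prefs))  # higher value = view preferred more often
```

Because z(p) = −z(1 − p), the scale values sum to zero; implementations often shift them so that the worst view sits at 0, which changes only the origin, not the ordering or spacing.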
Abstract:
The goal of this study is to determine whether various measures of contraction rate are regionally patterned in written Standard American English. To answer this question, the study employs a corpus-based approach to data collection and a statistical approach to data analysis. Based on a spatial autocorrelation analysis of the values of eleven measures of contraction across a 25-million-word corpus of letters to the editor representing the language of 200 cities from across the contiguous United States, two primary regional patterns were identified: easterners tend to produce relatively few standard contractions (not contraction, verb contraction) compared to westerners, and northeasterners tend to produce relatively few non-standard contractions (to contraction, non-standard not contraction) compared to southeasterners. These findings demonstrate that regional linguistic variation exists in written Standard American English and that regional linguistic variation is more common than is generally assumed.
Abstract:
University students encounter difficulties with academic English because of its vocabulary, phraseology, and variability, and also because academic English differs in many respects from general English, the language they experienced before starting their university studies. Although students have access to many dictionaries that contain some helpful information on words used in academic English, these dictionaries remain focused on the uses of words in general English. There is therefore a gap in the dictionary market for a dictionary for university students, and this thesis proposes such a dictionary (called the Dictionary of Academic English; DOAE) in the form of a model that depicts how the dictionary should be designed, compiled, and offered to students. The model draws on state-of-the-art techniques in lexicography, dictionary-use research, and corpus linguistics. The model required the creation of a completely new corpus of academic language (Corpus of Academic Journal Articles; CAJA). The main advantages of the corpus are its large size (83.5 million words) and its balance. Access to a large corpus of academic language was essential for a corpus-driven approach to data analysis. A good balance of domains in the corpus enabled detailed domain labelling of senses, patterns, collocates, etc. in the dictionary database, which was then used to tailor the output to the needs of different types of student. The model proposes a dictionary designed for online use from the outset. The proposed dictionary is revolutionary in the way it addresses the needs of different types of student: it presents students with a dynamic dictionary whose contents can be customised according to the user's native language, subject of study, variant spelling preferences, and/or visual preferences (e.g. black and white).
Abstract:
This research sets out to compare the values in British and German political discourse, especially the discourse of social policy, and to analyse their relationship to political culture through an analysis of the values of health care reform. The work proceeds from the hypothesis that the known differences in political culture between the two countries will be reflected in the values of political discourse, and takes a comparison of two major recent legislative debates on health care reform as a case study. The starting point in the first chapter is a brief comparative survey of the post-war political cultures of the two countries, including a brief account of the historical background to their development and an overview of explanatory theoretical models. From this, the expected contrasts in values are developed in accordance with the hypothesis. The second chapter explains the basis for selecting the corpus texts and the contextual information that needs to be recorded to make a comparative analysis, including the context and content of the reform proposals that make up the case study. It also examines any contextual factors that may need to be taken into account in the analysis. The third and fourth chapters explain the analytical method, which centres on the use of definition-based taxonomies of value items and value appeal methods to identify, on a sentence-by-sentence basis, the value items in the corpus texts and the methods used to make appeals to those value items. The third chapter is concerned with the classification and analysis of values, the fourth with the classification and analysis of value appeal methods. The fifth chapter presents and explains the results of the analysis, and the sixth summarizes the conclusions and makes suggestions for further research.
Abstract:
We analyze a Big Data set of geo-tagged tweets collected over one year (Oct. 2013–Oct. 2014) to understand regional linguistic variation in the U.S. Prior work on regional linguistic variation usually required long data-collection periods and focused on either rural or urban areas. Geo-tagged Twitter data offer an unprecedented database with rich linguistic representation at fine spatiotemporal resolution and with continuity. From the one-year Twitter corpus, we extract lexical characteristics for Twitter users by summarizing the frequencies of a set of lexical alternations that each user has used. We spatially aggregate and smooth each lexical characteristic to derive county-based linguistic variables, from which orthogonal dimensions are extracted using principal component analysis (PCA). Finally, a regionalization method is used to discover hierarchical dialect regions using the PCA components. The regionalization results reveal interesting regional linguistic variation in the U.S. The discovered regions not only confirm past research findings in the literature but also provide new insights and a more detailed understanding of very recent linguistic patterns in the U.S.
Abstract:
Sentiment analysis is concerned with automatically identifying the sentiment or opinion expressed in a given piece of text. Most prior work either uses prior lexical knowledge, defined as the sentiment polarity of words, or views the task as a text classification problem and relies on labeled corpora to train a sentiment classifier. While lexicon-based approaches do not adapt well to different domains, corpus-based approaches require expensive manual annotation effort. In this paper, we propose a novel framework in which an initial classifier is learned by incorporating prior information extracted from an existing sentiment lexicon, with preferences on the expected sentiment labels of those lexicon words expressed using generalized expectation criteria. Documents classified with high confidence are then used as pseudo-labeled examples for automatic domain-specific feature acquisition. The word-class distributions of such self-learned features are estimated from the pseudo-labeled examples and are used to train another classifier by constraining the model's predictions on unlabeled instances. Experiments on both the movie-review data and the multi-domain sentiment dataset show that our approach attains comparable or better performance than existing weakly-supervised sentiment classification methods despite using no labeled documents.
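The pseudo-labeling step described above can be sketched as follows. This is not the authors' framework: their initial classifier is trained with generalized expectation criteria, whereas this sketch swaps in a much simpler stand-in that averages the polarity of lexicon words in each document. The confidence threshold, the add-alpha smoothing and all function names are hypothetical, and the final stage (training a second classifier constrained by the estimated distributions on unlabeled data) is not shown. What the sketch does illustrate is how high-confidence documents become pseudo-labeled examples from which word-class distributions P(class | word) are estimated, including for words such as "acting" that are not in the lexicon.

```python
from collections import Counter, defaultdict


def lexicon_label(doc, lexicon):
    """Stand-in initial classifier: mean polarity of lexicon words in the doc."""
    hits = [lexicon[w] for w in doc if w in lexicon]
    if not hits:
        return None, 0.0
    s = sum(hits) / len(hits)              # in [-1, 1]
    if s == 0:
        return None, 0.0                   # balanced evidence: no label
    return ("pos" if s > 0 else "neg"), abs(s)


def pseudo_label(docs, lexicon, threshold=0.6):
    """Keep only documents the initial classifier labels with high confidence."""
    kept = []
    for doc in docs:
        label, conf = lexicon_label(doc, lexicon)
        if label is not None and conf >= threshold:
            kept.append((doc, label))
    return kept


def word_class_distributions(labelled, alpha=1.0):
    """Estimate P(class | word) from pseudo-labelled docs with add-alpha smoothing."""
    counts = defaultdict(Counter)
    for doc, label in labelled:
        for w in doc:
            counts[w][label] += 1
    dist = {}
    for w, c in counts.items():
        total = c["pos"] + c["neg"] + 2 * alpha
        dist[w] = {k: (c[k] + alpha) / total for k in ("pos", "neg")}
    return dist


lexicon = {"great": 1, "excellent": 1, "boring": -1, "awful": -1}
docs = [
    ["great", "plot", "excellent", "acting"],
    ["boring", "plot", "awful", "pacing"],
    ["great", "but", "boring"],            # mixed: not confident, so not pseudo-labelled
]
labelled = pseudo_label(docs, lexicon)
dist = word_class_distributions(labelled)
print(dist["acting"], dist["pacing"])
```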
Abstract:
This paper discusses three important aspects of John Sinclair’s legacy: the corpus, lexicography, and the notion of ‘corpus-driven’. The corpus represents his concern with the nature of linguistic evidence. Lexicography is for him the canonical mode of language description at the lexical level. And his belief that the corpus should ‘drive’ the description is reflected in his constant attempts to utilize the emergent computer technologies to automate the initial stages of analysis and defer the intuitive, interpretative contributions of linguists to increasingly later stages in the process. Sinclair’s model of corpus-driven lexicography has spread far beyond its initial implementation at Cobuild, to most EFL dictionaries, to native-speaker dictionaries (e.g. the New Oxford Dictionary of English, and many national language dictionaries in emerging or re-emerging speech communities) and bilingual dictionaries (e.g. Collins, Oxford-Hachette).