940 results for "Corpus lingüístico" (linguistic corpus)
Abstract:
Based on Goffman’s definition of frames as general ‘schemata of interpretation’ that people use to ‘locate, perceive, identify, and label’, other scholars have used the concept more specifically to analyze media coverage. Frames are used in the sense of organizing devices that allow journalists to select and emphasise topics, to decide ‘what matters’ (Gitlin 1980). Gamson and Modigliani (1989) consider frames to be embedded within ‘media packages’ that can be seen as ‘giving meaning’ to an issue. According to Entman (1993), framing comprises a combination of different activities such as problem definition, causal interpretation, moral evaluation, and/or treatment recommendation for the item described. Previous research has analysed media coverage of climate change to test Downs’s model of the issue attention cycle (Trumbo 1996), to uncover media biases in the US press (Boykoff and Boykoff 2004), to highlight differences between nations (Brossard et al. 2004; Grundmann 2007) or to analyze cultural reconstructions of scientific knowledge (Carvalho and Burgess 2005). In this paper we present data obtained with a corpus-linguistic approach, drawing on the results of a pilot study conducted in Spring 2008 using the Nexis news media archive. Based on comparative data from the US, the UK, France and Germany, we aim to show how the climate change issue has been framed differently in these countries and how this framing indicates differences in national climate change policies.
Abstract:
This paper argues for the growing importance of academic English in an increasingly Anglophone world and examines the differences between academic English and general English, especially in terms of vocabulary. The creation of wordlists has played an important role in attempts to establish the academic English lexicon, but existing wordlists are either not based on appropriate data or are implemented inappropriately. There is as yet no adequate dictionary of academic English, and this paper reports on new efforts at Aston University to create a suitable corpus on which such a dictionary could be based.
Abstract:
This paper is a progress report on a research path I first outlined in my contribution to “Words in Context: A Tribute to John Sinclair on his Retirement” (Heffer and Sauntson, 2000). Therefore, I first summarize that paper here, in order to provide the relevant background. The second half of the current paper consists of some further manual analyses, exploring various parameters and procedures that might assist in the design of an automated computational process for the identification of lexical sets. The automation itself is beyond the scope of the current paper.
Abstract:
Almost everyone who has an email account receives unwanted emails from time to time. These may be jokes from friends or commercial product offers from unknown senders. In this paper we focus on unwanted messages that try to promote a product or service, or to offer some “hot” business opportunities. These messages are called junk emails. Several methods to filter junk emails have been proposed, but none considers their linguistic characteristics. In this paper, we investigate the linguistic features of a corpus of junk emails and try to decide whether they constitute a distinct genre. Our corpus was built from the messages received by the authors over a period of time. Initially, the corpus consisted of 1,563 files, but after duplicates were eliminated automatically we kept only 673 files, totalling just over 373,000 tokens. To decide whether junk emails constitute a distinct genre, we compare them with a corpus of leaflets extracted from the British National Corpus (BNC) and with the whole BNC. Several characteristics at the lexical and grammatical levels were identified.
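The abstract above mentions automatic removal of duplicate messages, a token count, and a comparison with the BNC, but does not say how these steps were carried out. A minimal Python sketch, assuming that "duplicates" means identical messages after lowercasing and whitespace normalisation, a naive regular-expression tokenizer, and the log-likelihood (G2) keyness statistic commonly used to compare word frequencies across corpora. All function names are hypothetical, and this is not the authors' documented procedure.

```python
import hashlib
import math
import re

# Hypothetical tokenizer pattern: runs of lowercase letters, digits, apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def deduplicate(messages: list[str]) -> list[str]:
    """Keep the first copy of each message, comparing hashes of normalised text."""
    seen: set[str] = set()
    unique: list[str] = []
    for msg in messages:
        digest = hashlib.sha1(normalise(msg).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(msg)
    return unique


def tokenize(text: str) -> list[str]:
    """Very simple word tokenizer applied to lowercased text."""
    return TOKEN_RE.findall(text.lower())


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 keyness of a word seen a times in a corpus of c tokens
    versus b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


mails = ["FREE offer!!", "free   OFFER!!", "Meeting at noon"]
unique = deduplicate(mails)
tokens = [t for m in unique for t in tokenize(m)]
print(len(unique), len(tokens))  # prints "2 5"
```

Exact-hash matching only catches verbatim (normalised) copies; near-duplicate junk mail that varies a name or link would need a fuzzier method such as shingling, which the abstract does not mention.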
Abstract:
A set of full-color images of objects is described for use in experiments investigating the effects of in-depth rotation on the identification of three-dimensional objects. The corpus contains up to 11 perspective views of 70 nameable objects. We also provide ratings of the "goodness" of each view, based on Thurstonian scaling of subjects' preferences in a paired-comparison experiment. An exploratory cluster analysis on the scaling solutions indicates that the amount of information available in a given view generally is the major determinant of the goodness of the view. For instance, objects with an elongated front-back axis tend to cluster together, and the front and back views of these objects, which do not reveal the object's major surfaces and features, are evaluated as the worst views.
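The view-goodness ratings described above come from Thurstonian scaling of paired-comparison preferences. The sketch below assumes Thurstone's Case V (equal, uncorrelated discriminal dispersions), which the abstract does not specify: each observed preference proportion is converted to a standard-normal z-score, and a view's scale value is the mean of its z-scores. The function name and the clipping of proportions to 0.01/0.99 (to avoid infinite z-scores) are illustrative assumptions, not the authors' choices.

```python
from statistics import NormalDist


def thurstone_case_v(wins: list[list[int]], clip: float = 0.01) -> list[float]:
    """Scale values from a paired-comparison matrix under Thurstone's Case V.

    wins[i][j] is the number of times view i was preferred to view j.
    """
    n = len(wins)
    inv_cdf = NormalDist().inv_cdf  # standard normal quantile function
    raw = []
    for i in range(n):
        zs = []
        for j in range(n):
            if i == j:
                continue
            total = wins[i][j] + wins[j][i]
            if total == 0:
                continue  # this pair was never compared
            p = wins[i][j] / total
            # Clip proportions of 0 or 1, whose z-scores would be infinite.
            p = min(max(p, clip), 1 - clip)
            zs.append(inv_cdf(p))
        raw.append(sum(zs) / len(zs) if zs else 0.0)
    lowest = min(raw)
    # Interval scale: shift so the least preferred view sits at 0.
    return [r - lowest for r in raw]
```

For example, with three views where view 0 usually beats the others, `thurstone_case_v([[0, 8, 9], [2, 0, 7], [1, 3, 0]])` ranks view 0 highest and places view 2 at zero. Only differences between scale values are meaningful, which is why the result is shifted rather than normalised further.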
Abstract:
University students encounter difficulties with academic English because of its vocabulary, phraseology, and variability, and also because academic English differs in many respects from general English, the language they experienced before starting their university studies. Although students have been provided with many dictionaries that contain some helpful information on words used in academic English, these dictionaries remain focused on the uses of words in general English. There is therefore a gap in the dictionary market for a dictionary aimed at university students, and this thesis proposes such a dictionary (the Dictionary of Academic English; DOAE) in the form of a model that depicts how the dictionary should be designed, compiled, and offered to students. The model draws on state-of-the-art techniques in lexicography, dictionary-use research, and corpus linguistics. The model required the creation of a completely new corpus of academic language (Corpus of Academic Journal Articles; CAJA). The main advantages of the corpus are its large size (83.5 million words) and its balance. Access to a large corpus of academic language was essential for a corpus-driven approach to data analysis. A good balance of domains in the corpus enabled detailed domain-labelling of senses, patterns, collocates, etc. in the dictionary database, which was then used to tailor the output to the needs of different types of student. The model proposes a dictionary that is designed as an online dictionary from the outset. The proposed dictionary is revolutionary in the way it addresses the needs of different types of student: it presents students with a dynamic dictionary whose contents can be customised according to the user's native language, subject of study, variant spelling preferences, and/or visual preferences (e.g. black and white).
Abstract:
Based on a corpus of English, German, and Polish spoken academic discourse, this article analyzes the distribution and function of humor in academic research presentations. The corpus, the result of a European research cooperation project, consists of 300,000 tokens of spoken academic language and covers the genres of research presentation, student presentation, and oral examination. The article investigates differences between the German and English research cultures as expressed in the genre of specialist research presentations, and the role of humor as a pragmatic device in their respective contexts. The data are analyzed within the paradigm of corpus-assisted discourse studies (CADS). The findings show that humor is used in research presentations as an expression of discourse reflexivity. They also reveal a considerable difference in the quantitative distribution of humor in research presentations depending on the educational, linguistic, and cultural background of the presenters, thus confirming the notion of different research cultures. Such research cultures nurture distinct attitudes to genres of academic language: whereas researchers in one of the cultures identified conform to the constraints and structures of the genre, those working in another attempt to subvert them, for example through humor. © 2012 Elsevier B.V.
Abstract:
We propose a hybrid generative/discriminative framework for semantic parsing that combines the hidden vector state (HVS) model and hidden Markov support vector machines (HM-SVMs). The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. HM-SVMs combine the advantages of hidden Markov models and support vector machines. Using a modified K-means clustering method, a small set of the most representative sentences can be selected automatically from an unannotated corpus. These sentences, together with their abstract annotations, are used to train an HVS model, which can then be applied to the whole corpus to generate semantic parsing results. The most confident parsing results are selected to generate a fully annotated corpus, which is used to train the HM-SVMs. The proposed framework has been tested on the DARPA Communicator Data. Experimental results show that the hybrid framework improves on the baseline HVS parser. Compared with HM-SVMs trained on the fully annotated corpus, the hybrid framework achieved comparable performance using only a small set of lightly annotated sentences. © 2008. Licensed under the Creative Commons.
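The sentence-selection step in the abstract above relies on a modified K-means method whose modifications are not described there. As a rough illustration only, the Python sketch below uses plain K-means (with a deterministic farthest-first initialisation) over L2-normalised bag-of-words vectors and picks, for each cluster, the sentence nearest its centroid. The feature representation, the initialisation, and all function names are assumptions, not the authors' method.

```python
import math
from collections import Counter


def vectorize(sentences: list[str]) -> list[list[float]]:
    """L2-normalised bag-of-words vectors over the corpus vocabulary."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for s in sentences:
        v = [0.0] * len(vocab)
        for word, count in Counter(s.lower().split()).items():
            v[index[word]] = float(count)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def dist2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: list[list[float]], k: int, iters: int = 20):
    """Plain K-means with deterministic farthest-first initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the vector farthest from all seeds chosen so far.
        centroids.append(max(vectors, key=lambda v: min(dist2(v, c) for c in centroids)))
    dim = len(vectors[0])
    clusters: list[list[int]] = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for i, v in enumerate(vectors):
            nearest = min(range(k), key=lambda j: dist2(v, centroids[j]))
            clusters[nearest].append(i)
        updated = []
        for j, members in enumerate(clusters):
            if not members:
                updated.append(centroids[j])  # keep an empty cluster's centroid
                continue
            updated.append([sum(vectors[i][d] for i in members) / len(members) for d in range(dim)])
        if updated == centroids:
            break  # assignments have converged
        centroids = updated
    return centroids, clusters


def select_representatives(sentences: list[str], k: int) -> list[str]:
    """Return, for each non-empty cluster, the sentence closest to its centroid."""
    vectors = vectorize(sentences)
    centroids, clusters = kmeans(vectors, k)
    reps = []
    for j, members in enumerate(clusters):
        if members:
            best = min(members, key=lambda i: dist2(vectors[i], centroids[j]))
            reps.append(sentences[best])
    return reps
```

Given four travel queries, two about flights and two about fares, `select_representatives(sentences, 2)` returns one sentence from each group; in the framework above, such representatives would be the ones handed to annotators before training the HVS model.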