37 resultados para Bilingual Corpus
em Aston University Research Archive
Resumo:
This paper discusses three important aspects of John Sinclair’s legacy: the corpus, lexicography, and the notion of ‘corpus-driven’. The corpus represents his concern with the nature of linguistic evidence. Lexicography is for him the canonical mode of language description at the lexical level. And his belief that the corpus should ‘drive’ the description is reflected in his constant attempts to utilize the emergent computer technologies to automate the initial stages of analysis and defer the intuitive, interpretative contributions of linguists to increasingly later stages in the process. Sinclair’s model of corpus-driven lexicography has spread far beyond its initial implementation at Cobuild, to most EFL dictionaries, to native-speaker dictionaries (e.g. the New Oxford Dictionary of English, and many national language dictionaries in emerging or re-emerging speech communities) and bilingual dictionaries (e.g. Collins, Oxford-Hachette).
Resumo:
Corpus Linguistics is a young discipline. The earliest work was done in the 1960s, but corpora only began to be widely used by lexicographers and linguists in the late 1980s, by language teachers in the late 1990s, and by language students only very recently. This course in corpus linguistics was held at the Departamento de Linguistica Aplicada, E.T.S.I. de Minas, Universidad Politecnica de Madrid from June 15-19 1998. About 45 teachers registered for the course. 30% had PhDs in linguistics, 20% in literature, and the rest were doctorandi or qualified English teachers. The course was designed to introduce the use of corpora and other computational resources in teaching and research, with special reference to scientific and technological discourse in English. Each participant had a computer networked with the lecturer’s machine, whose display could be projected onto a large screen. Application programs were loaded onto the central server, and telnet and a web browser were available. COBUILD gave us permission to access the 323 million word Bank of English corpus, Mike Scott allowed us to use his Wordsmith Tools software, and Tim Johns gave us a copy of his MicroConcord program.
Resumo:
This article analyses how speakers of an autochthonous heritage language (AHL) make use of digital media, through the example of Low German, a regional language used by a decreasing number of speakers mainly in northern Germany. The focus of the analysis is on Web 2.0 and its interactive potential for individual speakers. The study therefore examines linguistic practices on the social network site Facebook, with special emphasis on language choice, bilingual practices and writing in the autochthonous heritage language. The findings suggest that social network sites such as Facebook have the potential to provide new mediatized spaces for speakers of an AHL that can instigate sociolinguistic change.
Resumo:
Based on Goffman’s definition that frames are general ‘schemata of interpretation’ that people use to ‘locate, perceive, identify, and label’, other scholars have used the concept in a more specific way to analyze media coverage. Frames are used in the sense of organizing devices that allow journalists to select and emphasise topics, to decide ‘what matters’ (Gitlin 1980). Gamson and Modigliani (1989) consider frames as being embedded within ‘media packages’ that can be seen as ‘giving meaning’ to an issue. According to Entman (1993), framing comprises a combination of different activities such as: problem definition, causal interpretation, moral evaluation, and/or treatment recommendation for the item described. Previous research has analysed climate change with the purpose of testing Downs’s model of the issue attention cycle (Trumbo 1996), to uncover media biases in the US press (Boykoff and Boykoff 2004), to highlight differences between nations (Brossard et al. 2004; Grundmann 2007) or to analyze cultural reconstructions of scientific knowledge (Carvalho and Burgess 2005). In this paper we shall present data from a corpus linguistics-based approach. We will be drawing on results of a pilot study conducted in Spring 2008 based on the Nexis news media archive. Based on comparative data from the US, the UK, France and Germany, we aim to show how the climate change issue has been framed differently in these countries and how this framing indicates differences in national climate change policies.
Resumo:
This paper asserts the increasing importance of academic English in an increasingly Anglophone world, and looks at the differences between academic English and general English, especially in terms of vocabulary. The creation of wordlists has played an important role in trying to establish the academic English lexicon, but these wordlists are not based on appropriate data, or are implemented inappropriately. There is as yet no adequate dictionary of academic English, and this paper reports on new efforts at Aston University to create a suitable corpus on which such a dictionary could be based.
Resumo:
This paper is a progress report on a research path I first outlined in my contribution to “Words in Context: A Tribute to John Sinclair on his Retirement” (Heffer and Sauntson, 2000). Therefore, I first summarize that paper here, in order to provide the relevant background. The second half of the current paper consists of some further manual analyses, exploring various parameters and procedures that might assist in the design of an automated computational process for the identification of lexical sets. The automation itself is beyond the scope of the current paper.
Resumo:
Almost everyone who has an email account receives from time to time unwanted emails. These emails can be jokes from friends or commercial product offers from unknown people. In this paper we focus on these unwanted messages which try to promote a product or service, or to offer some “hot” business opportunities. These messages are called junk emails. Several methods to filter junk emails were proposed, but none considers the linguistic characteristics of junk emails. In this paper, we investigate the linguistic features of a corpus of junk emails, and try to decide if they constitute a distinct genre. Our corpus of junk emails was build from the messages received by the authors over a period of time. Initially, the corpus consisted of 1563, but after eliminating the duplications automatically we kept only 673 files, totalising just over 373,000 tokens. In order to decide if the junk emails constitute a different genre, a comparison with a corpus of leaflets extracted from BNC and with the whole BNC corpus is carried out. Several characteristics at the lexical and grammatical levels were identified.