31 results for "Linguistica de Corpus" (Corpus Linguistics)
Abstract:
University students encounter difficulties with academic English because of its vocabulary, phraseology, and variability, and also because academic English differs in many respects from general English, the language which they have experienced before starting their university studies. Although students have been provided with many dictionaries that contain some helpful information on words used in academic English, these dictionaries remain focused on the uses of words in general English. There is therefore a gap in the dictionary market for a dictionary for university students, and this thesis provides a proposal for such a dictionary (called the Dictionary of Academic English; DOAE) in the form of a model which depicts how the dictionary should be designed, compiled, and offered to students. The model draws on state-of-the-art techniques in lexicography, dictionary-use research, and corpus linguistics. The model demanded the creation of a completely new corpus of academic language (Corpus of Academic Journal Articles; CAJA). The main advantages of the corpus are its large size (83.5 million words) and balance. Having access to a large corpus of academic language was essential for a corpus-driven approach to data analysis. A good corpus balance in terms of domains enabled a detailed domain-labelling of senses, patterns, collocates, etc. in the dictionary database, which was then used to tailor the output according to the needs of different types of student. The model proposes an online dictionary that is designed as an online dictionary from the outset. The proposed dictionary is revolutionary in the way it addresses the needs of different types of student. It presents students with a dynamic dictionary whose contents can be customised according to the user's native language, subject of study, variant spelling preferences, and/or visual preferences (e.g. black and white).
Abstract:
Based on a corpus of English, German, and Polish spoken academic discourse, this article analyzes the distribution and function of humor in academic research presentations. The corpus is the result of a European research cooperation project consisting of 300,000 tokens of spoken academic language, focusing on the genres research presentation, student presentation, and oral examination. The article investigates differences between the German and English research cultures as expressed in the genre of specialist research presentations, and the role of humor as a pragmatic device in their respective contexts. The data is analyzed according to the paradigms of corpus-assisted discourse studies (CADS). The findings show that humor is used in research presentations as an expression of discourse reflexivity. They also reveal a considerable difference in the quantitative distribution of humor in research presentations depending on the educational, linguistic, and cultural background of the presenters, thus confirming the notion of different research cultures. Such research cultures nurture distinct attitudes to genres of academic language: whereas in one of the cultures identified researchers conform to the constraints and structures of the genre, those working in another attempt to subvert them, for example by the application of humor. © 2012 Elsevier B.V.
Abstract:
We propose a hybrid generative/discriminative framework for semantic parsing which combines the hidden vector state (HVS) model and the hidden Markov support vector machines (HM-SVMs). The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. The HM-SVMs combine the advantages of the hidden Markov models and the support vector machines. By employing a modified K-means clustering method, a small set of the most representative sentences can be automatically selected from an un-annotated corpus. These sentences, together with their abstract annotations, are used to train an HVS model, which can then be applied to the whole corpus to generate semantic parsing results. The most confident semantic parsing results are selected to generate a fully-annotated corpus, which is used to train the HM-SVMs. The proposed framework has been tested on the DARPA Communicator Data. Experimental results show that the hybrid framework improves on the baseline HVS parser. When compared with the HM-SVMs trained from the fully-annotated corpus, the hybrid framework gave a comparable performance with only a small set of lightly annotated sentences. © 2008. Licensed under the Creative Commons.
Abstract:
This paper discusses three important aspects of John Sinclair’s legacy: the corpus, lexicography, and the notion of ‘corpus-driven’. The corpus represents his concern with the nature of linguistic evidence. Lexicography is for him the canonical mode of language description at the lexical level. And his belief that the corpus should ‘drive’ the description is reflected in his constant attempts to utilize the emergent computer technologies to automate the initial stages of analysis and defer the intuitive, interpretative contributions of linguists to increasingly later stages in the process. Sinclair’s model of corpus-driven lexicography has spread far beyond its initial implementation at Cobuild, to most EFL dictionaries, to native-speaker dictionaries (e.g. the New Oxford Dictionary of English, and many national language dictionaries in emerging or re-emerging speech communities) and bilingual dictionaries (e.g. Collins, Oxford-Hachette).
Abstract:
UK universities are accepting increasing numbers of students whose L1 is not English on a wide range of programmes at all levels. These students require additional support and training in English, focussing on their academic disciplines. Corpora have been used in EAP since the 1980s, mainly for research, but a growing number of researchers and practitioners have been advocating the use of corpora in EAP pedagogy, and such use is gradually increasing. This paper outlines the processes and factors to be considered in the design and compilation of an EAP corpus (e.g., the selection and acquisition of texts, metadata, data annotation, software tools and outputs, web interface, and screen displays), especially one intended to be used for teaching. Such a corpus would also facilitate EAP research in terms of longitudinal studies, student progression and development, and course and materials design. The paper has been informed by the preparatory work on the EAP subcorpus of the ACORN corpus project at Aston University. © 2007 Elsevier Ltd. All rights reserved.
Abstract:
Corpora, large collections of written and/or spoken text stored and accessed electronically, provide a means of investigating language that is of growing importance academically and professionally. Corpora are now routinely used in the following fields: the production of dictionaries and other reference materials; the development of aids to translation; language teaching materials; the investigation of ideologies and cultural assumptions; natural language processing; and the investigation of all aspects of linguistic behaviour, including vocabulary, grammar and pragmatics.
Abstract:
This paper investigates whether the position of adverb phrases in sentences is regionally patterned in written Standard American English, based on an analysis of a 25 million word corpus of letters to the editor representing the language of 200 cities from across the United States. Seven measures of adverb position were tested for regional patterns using the global spatial autocorrelation statistic Moran’s I and the local spatial autocorrelation statistic Getis-Ord Gi*. Three of these seven measures were identified as exhibiting significant levels of spatial autocorrelation, contrasting the language of the Northeast with the language of the Southeast and the South Central states. These results demonstrate that continuous regional grammatical variation exists in American English and that regional linguistic variation exists in written Standard English.
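As an illustration of the global statistic named in this abstract, the following is a minimal sketch of Moran's I in plain Python. The function name `morans_i`, the toy values, and the chain-shaped neighbour matrix are assumptions made for this example; the abstract does not describe how the paper defined its spatial weights or tested significance.

```python
def morans_i(values, weights):
    """Global Moran's I for one measurement per location.

    values  -- list of numbers, one per location (hypothetical data)
    weights -- square matrix (list of lists); weights[i][j] > 0 means
               locations i and j are treated as neighbours
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]

    # Sum of all spatial weights (the normalising constant W).
    total_weight = sum(sum(row) for row in weights)

    # Weighted cross-products of deviations between every pair of locations.
    cross = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )

    # Sum of squared deviations (n times the population variance).
    squared = sum(d * d for d in deviations)

    return (n / total_weight) * (cross / squared)


# Toy example: four locations in a row, each adjacent to the next one.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 1, 5, 5], chain)    # similar values sit together
alternating = morans_i([1, 5, 1, 5], chain)  # neighbours always differ
```

In this toy setup the clustered values give a positive score and the alternating values a negative one, which is the sense in which the statistic separates regionally patterned measures from unpatterned ones.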
Abstract:
This study uses a purpose-built corpus to explore the linguistic legacy of Britain’s maritime history found in the form of hundreds of specialised ‘Maritime Expressions’ (MEs), such as TAKEN ABACK, ANCHOR and ALOOF, that permeate modern English. Selecting just those expressions commencing with ‘A’, it analyses 61 MEs in detail and describes the processes by which these technical expressions, from a highly specialised occupational discourse community, have made their way into modern English. The Maritime Text Corpus (MTC) comprises 8.8 million words, encompassing a range of text types and registers, selected to provide a cross-section of ‘maritime’ writing. It is analysed using WordSmith analytical software (Scott, 2010), with the 100 million-word British National Corpus (BNC) as a reference corpus. Using the MTC, a list of keywords of specific salience within the maritime discourse has been compiled and, using frequency data, concordances and collocations, these MEs are described in detail and their use and form in the MTC and the BNC are compared. The study examines the transformation from ME to figurative use in the general discourse, in terms of form and metaphoricity. MEs are classified according to their metaphorical strength and their transference from maritime usage into new registers and domains such as those of business, politics, sports and reportage. A revised model of metaphoricity is developed and a new category of figurative expression, the ‘resonator’, is proposed. Additionally, developing the work of Lakoff and Johnson, Kövecses and others on Conceptual Metaphor Theory (CMT), a number of Maritime Conceptual Metaphors are identified and their cultural significance is discussed.
Abstract:
Research in social psychology has shown that public attitudes towards feminism are mostly based on stereotypical views linking feminism with leftist politics and lesbian orientation. It is claimed that such attitudes are due to the negative and sexualised media construction of feminism. Studies concerned with the media representation of feminism seem to confirm this tendency. While most of this research provides significant insights into the representation of feminism, the findings are often based on a small sample of texts. Also, most of the research was conducted in an Anglo-American setting. This study attempts to address some of the shortcomings of previous work by examining the discourse of feminism in a large corpus of German and British newspaper data. It does so by employing the tools of Corpus Linguistics. By investigating the collocation profiles of the search term feminism, we provide evidence of salient discourse patterns surrounding feminism in two different cultural contexts. © The Author(s) 2012.
A corpus-based regional dialect survey of grammatical variation in written standard American English
Abstract:
In this paper, I concentrate on court cases with litigants in person (lay people who act on their own behalf in legal proceedings without counsel or a solicitor) and discuss the challenges of building a corpus of courtroom discourse where it is crucial to distinguish between speakers due to their distinct institutional roles. The corpus incorporates seven sub-corpora of verbatim transcripts from different court cases with litigants in person and comprises over eleven million tokens. The focus of this paper is on the interplay between the legal and lay discourse types and how judges project their institutional roles through well-initiated turns directed at litigants in person and counsel. As a versatile discourse marker, well provides a good opportunity to explore how judges have to adapt their roles to ensure that lay litigants in person receive the necessary support and that their lack of competence does not impinge on the fairness of the proceedings. Given the breadth and importance of the topic of litigation in person, I discuss how the tools and approaches of corpus linguistics can be helpful in this multi-disciplinary area where multiple functions and uses of individual linguistic features need to be explored in depth.
Abstract:
Introduction: Resveratrol (RVT) found in red wine protects against erectile dysfunction and relaxes penile tissue (corpus cavernosum) via a nitric oxide (NO) independent pathway. However, the mechanism remains to be elucidated. Hydrogen sulfide (H2S) is a potent vasodilator and neuromodulator generated in corpus cavernosum. Aims: We investigated whether RVT caused the relaxation of mouse corpus cavernosum (MCC) through H2S. Methods: H2S formation was measured by methylene blue assay, and vascular reactivity experiments were performed with a DMT strip myograph in CD1 MCC strips. Main Outcome Measures: The endothelial NO synthase (eNOS) inhibitor Nω-Nitro-L-arginine (L-NNA, 0.1 mM), the H2S inhibitor aminooxyacetic acid (AOAA, 2 mM), which inhibits both the cystathionine-β-synthase (CBS) and cystathionine-γ-lyase (CSE) enzymes, or a combination of AOAA with PAG (a CSE inhibitor) was used in the presence or absence of RVT (0.1 mM, 30 min) to elucidate the role of the NO or H2S pathways in the effects of RVT in MCC. Concentration-dependent relaxations to RVT, L-cysteine, sodium hydrogen sulfide (NaHS) and acetylcholine (ACh) were studied. Results: Exposure of murine corpus cavernosum to RVT increased both basal and L-cysteine-stimulated H2S formation. Both of these effects were reversed by AOAA but not by L-NNA. RVT caused concentration-dependent relaxation of MCC, and this RVT-induced relaxation was significantly inhibited by AOAA or AOAA+PAG but not by L-NNA. L-cysteine caused concentration-dependent relaxations, which were significantly inhibited by AOAA or AOAA+PAG. Incubation of MCC with RVT significantly increased L-cysteine-induced relaxation, and this effect was inhibited by AOAA+PAG. However, RVT did not alter the relaxations induced by exogenous H2S (NaHS) or ACh. Conclusions: These results demonstrate that RVT-induced relaxation is at least partly dependent on H2S formation and acts independently of the eNOS pathway. In the phosphodiesterase 5 inhibitor (PDE-5i) nonresponder population, combination therapy with RVT may reverse erectile dysfunction by stimulating endogenous H2S formation.
Yetik-Anacak G, Dereli MV, Sevin G, Ozzayim O, Erac Y, and Ahmed A. Resveratrol stimulates hydrogen sulfide (H2S) formation to relax murine corpus cavernosum.
Abstract:
Starting with a description of the software and hardware used for corpus linguistics in the late 1980s to early 1990s, this contribution discusses difficulties faced by the software designer when attempting to allow users to study text. Future human-machine interfaces may develop to be much more sophisticated, and certainly the aspects of text which can be studied will progress beyond plain text without images. Another area which will develop further is the study of patternings involving not just single words but word-relations across large stretches of text.
Abstract:
In this article I argue that the study of the linguistic aspects of epistemology has become unhelpfully focused on the corpus-based study of hedging and that a corpus-driven approach can help to improve upon this. Through focusing on a corpus of texts from one discourse community (that of genetics) and identifying frequent tri-lexical clusters containing highly frequent lexical items identified as keywords, I undertake an inductive analysis identifying patterns of epistemic significance. Several of these patterns are shown to be hedging devices, and the whole-corpus frequencies of the most salient of these, candidate and putative, are then compared to the whole-corpus frequencies for comparable wordforms and clusters of epistemic significance. Finally, I interviewed a ‘friendly geneticist’ in order to check my interpretation of some of the terms used and to get an expert interpretation of the overall findings. In summary, I argue that the highly unexpected patterns of hedging found in genetics demonstrate the value of adopting a corpus-driven approach and constitute an advance in our current understanding of how to approach the relationship between language and epistemology.
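The cluster-identification step mentioned in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `trigram_clusters`, the simple regular-expression tokeniser, the frequency threshold, and the sample sentences are all hypothetical, and the abstract does not say which software or settings the author used.

```python
import re
from collections import Counter


def trigram_clusters(text, keywords, min_count=2):
    """Count three-word clusters that contain at least one keyword.

    text      -- raw corpus text (hypothetical sample below)
    keywords  -- set of lower-case keywords to look for
    min_count -- minimum frequency for a cluster to be reported (assumed)
    """
    # Very simple tokeniser: lower-case alphabetic words, hyphens allowed.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())

    # Slide a three-word window over the token stream and count each cluster.
    counts = Counter(zip(tokens, tokens[1:], tokens[2:]))

    # Keep frequent clusters in which at least one word is a keyword.
    return {
        " ".join(cluster): count
        for cluster, count in counts.items()
        if count >= min_count and keywords & set(cluster)
    }


sample = (
    "We identified a candidate gene on chromosome four. "
    "Sequencing confirmed a candidate gene in the region. "
    "The putative gene was not expressed."
)
clusters = trigram_clusters(sample, {"candidate", "putative"})
```

Note that this window runs across sentence boundaries; a fuller treatment would probably segment sentences first and derive the keyword list by comparison with a reference corpus.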