867 results for corpus, collocations, corpus linguistics, EPTIC
Abstract:
This paper is a progress report on a research path I first outlined in my contribution to “Words in Context: A Tribute to John Sinclair on his Retirement” (Heffer and Sauntson, 2000). Therefore, I first summarize that paper here, in order to provide the relevant background. The second half of the current paper consists of some further manual analyses, exploring various parameters and procedures that might assist in the design of an automated computational process for the identification of lexical sets. The automation itself is beyond the scope of the current paper.
Abstract:
Almost everyone who has an email account receives unwanted emails from time to time. These emails can be jokes from friends or commercial product offers from unknown people. In this paper we focus on those unwanted messages that try to promote a product or service, or to offer some “hot” business opportunities. These messages are called junk emails. Several methods for filtering junk emails have been proposed, but none considers the linguistic characteristics of junk emails. In this paper, we investigate the linguistic features of a corpus of junk emails and try to decide whether they constitute a distinct genre. Our corpus of junk emails was built from the messages received by the authors over a period of time. Initially, the corpus consisted of 1563 files, but after eliminating the duplicates automatically we kept only 673 files, totalising just over 373,000 tokens. In order to decide whether the junk emails constitute a different genre, a comparison with a corpus of leaflets extracted from the BNC and with the whole BNC is carried out. Several characteristics at the lexical and grammatical levels were identified.
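The abstract does not say how duplicates were detected or how tokens were counted. As a rough illustration only, the sketch below assumes that duplicates are messages whose text is identical after lowercasing and whitespace normalisation, and that a token is any run of word characters. The function names `normalize`, `deduplicate`, and `count_tokens` are hypothetical and are not taken from the paper.

```python
import hashlib
import re


def normalize(text):
    # Assumption: copies differing only in case or spacing count as duplicates.
    return re.sub(r"\s+", " ", text.lower()).strip()


def deduplicate(messages):
    # Keep the first occurrence of each distinct normalised message.
    seen = set()
    unique = []
    for msg in messages:
        digest = hashlib.sha256(normalize(msg).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(msg)
    return unique


def count_tokens(messages):
    # Assumption: a token is a maximal run of word characters.
    return sum(len(re.findall(r"\w+", m)) for m in messages)


sample = ["Buy now!  Hot offer", "buy now! hot offer", "Earn money from home"]
unique = deduplicate(sample)
print(len(unique), count_tokens(unique))
```

A real junk-email corpus would probably need fuzzier matching (for example, ignoring headers or personalised greetings), which this exact-match sketch does not attempt.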
Abstract:
A set of full-color images of objects is described for use in experiments investigating the effects of in-depth rotation on the identification of three-dimensional objects. The corpus contains up to 11 perspective views of 70 nameable objects. We also provide ratings of the "goodness" of each view, based on Thurstonian scaling of subjects' preferences in a paired-comparison experiment. An exploratory cluster analysis on the scaling solutions indicates that the amount of information available in a given view generally is the major determinant of the goodness of the view. For instance, objects with an elongated front-back axis tend to cluster together, and the front and back views of these objects, which do not reveal the object's major surfaces and features, are evaluated as the worst views.
Abstract:
Based on a corpus of English, German, and Polish spoken academic discourse, this article analyzes the distribution and function of humor in academic research presentations. The corpus is the result of a European research cooperation project and consists of 300,000 tokens of spoken academic language, focusing on the genres research presentation, student presentation, and oral examination. The article investigates differences between the German and English research cultures as expressed in the genre of specialist research presentations, and the role of humor as a pragmatic device in their respective contexts. The data is analyzed according to the paradigms of corpus-assisted discourse studies (CADS). The findings show that humor is used in research presentations as an expression of discourse reflexivity. They also reveal a considerable difference in the quantitative distribution of humor in research presentations depending on the educational, linguistic, and cultural background of the presenters, thus confirming the notion of different research cultures. Such research cultures nurture distinct attitudes to genres of academic language: whereas in one of the cultures identified researchers conform to the constraints and structures of the genre, those working in another attempt to subvert them, for example by the application of humor. © 2012 Elsevier B.V.
Abstract:
This paper discusses three important aspects of John Sinclair’s legacy: the corpus, lexicography, and the notion of ‘corpus-driven’. The corpus represents his concern with the nature of linguistic evidence. Lexicography is for him the canonical mode of language description at the lexical level. And his belief that the corpus should ‘drive’ the description is reflected in his constant attempts to utilize the emergent computer technologies to automate the initial stages of analysis and defer the intuitive, interpretative contributions of linguists to increasingly later stages in the process. Sinclair’s model of corpus-driven lexicography has spread far beyond its initial implementation at Cobuild, to most EFL dictionaries, to native-speaker dictionaries (e.g. the New Oxford Dictionary of English, and many national language dictionaries in emerging or re-emerging speech communities) and bilingual dictionaries (e.g. Collins, Oxford-Hachette).
Abstract:
UK universities are accepting increasing numbers of students whose L1 is not English on a wide range of programmes at all levels. These students require additional support and training in English, focussing on their academic disciplines. Corpora have been used in EAP since the 1980s, mainly for research, but a growing number of researchers and practitioners have been advocating the use of corpora in EAP pedagogy, and such use is gradually increasing. This paper outlines the processes and factors to be considered in the design and compilation of an EAP corpus (e.g., the selection and acquisition of texts, metadata, data annotation, software tools and outputs, web interface, and screen displays), especially one intended to be used for teaching. Such a corpus would also facilitate EAP research in terms of longitudinal studies, student progression and development, and course and materials design. The paper has been informed by the preparatory work on the EAP subcorpus of the ACORN corpus project at Aston University. © 2007 Elsevier Ltd. All rights reserved.
Abstract:
This paper describes the methodology followed to automatically generate titles for a corpus of questions that belong to sociological opinion polls. Titles for questions have a twofold function: (1) they are the input of user searches and (2) they inform about the whole contents of the question and its possible answer options. Thus, the generation of titles can be considered a case of automatic summarization. However, the fact that summarization had to be performed over very short texts, together with the aforementioned quality conditions imposed on newly generated titles, led the authors to follow knowledge-rich and domain-dependent strategies for summarization, disregarding the more frequent extractive techniques.
A corpus-based regional dialect survey of grammatical variation in written standard American English
Abstract:
Introduction: Resveratrol (RVT), found in red wine, protects against erectile dysfunction and relaxes penile tissue (corpus cavernosum) via a nitric oxide (NO) independent pathway. However, the mechanism remains to be elucidated. Hydrogen sulfide (H2S) is a potent vasodilator and neuromodulator generated in corpus cavernosum. Aims: We investigated whether RVT caused the relaxation of mouse corpus cavernosum (MCC) through H2S. Methods: H2S formation was measured by methylene blue assay, and vascular reactivity experiments were performed with a DMT strip myograph in CD1 MCC strips. Main Outcome Measures: The endothelial NO synthase (eNOS) inhibitor Nω-Nitro-L-arginine (L-NNA, 0.1mM), the H2S inhibitor aminooxyacetic acid (AOAA, 2mM), which inhibits both the cystathionine-β-synthase (CBS) and cystathionine-gamma-lyase (CSE) enzymes, or a combination of AOAA with PAG (CSE inhibitor) was used in the presence or absence of RVT (0.1mM, 30min) to elucidate the role of the NO or H2S pathways in the effects of RVT in MCC. Concentration-dependent relaxations to RVT, L-cysteine, sodium hydrogen sulfide (NaHS) and acetylcholine (ACh) were studied. Results: Exposure of murine corpus cavernosum to RVT increased both basal and L-cysteine-stimulated H2S formation. Both of these effects were reversed by AOAA but not by L-NNA. RVT caused concentration-dependent relaxation of MCC, and this RVT-induced relaxation was significantly inhibited by AOAA or AOAA+PAG but not by L-NNA. L-cysteine caused concentration-dependent relaxations, which were significantly inhibited by AOAA or AOAA+PAG. Incubation of MCC with RVT significantly increased L-cysteine-induced relaxation, and this effect was inhibited by AOAA+PAG. However, RVT did not alter the relaxations induced by exogenous H2S (NaHS) or ACh. Conclusions: These results demonstrate that RVT-induced relaxation is at least partly dependent on H2S formation and acts independently of the eNOS pathway. In the phosphodiesterase 5 inhibitor (PDE-5i) nonresponder population, combination therapy with RVT may reverse erectile dysfunction by stimulating endogenous H2S formation.
Yetik-Anacak G, Dereli MV, Sevin G, Ozzayim O, Erac Y, and Ahmed A. Resveratrol stimulates hydrogen sulfide (H2S) formation to relax murine corpus cavernosum.
Abstract:
In this article I argue that the study of the linguistic aspects of epistemology has become unhelpfully focused on the corpus-based study of hedging and that a corpus-driven approach can help to improve upon this. By focusing on a corpus of texts from one discourse community (that of genetics) and identifying frequent tri-lexical clusters containing highly frequent lexical items identified as keywords, I undertake an inductive analysis identifying patterns of epistemic significance. Several of these patterns are shown to be hedging devices, and the whole-corpus frequencies of the most salient of these, candidate and putative, are then compared to the whole-corpus frequencies for comparable wordforms and clusters of epistemic significance. Finally, I interviewed a ‘friendly geneticist’ in order to check my interpretation of some of the terms used and to get an expert interpretation of the overall findings. In summary, I argue that the highly unexpected patterns of hedging found in genetics demonstrate the value of adopting a corpus-driven approach and constitute an advance in our current understanding of how to approach the relationship between language and epistemology.
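The abstract gives no implementation details for the cluster extraction. Purely as an illustration, the sketch below assumes that a tri-lexical cluster is a contiguous three-word sequence, that the keyword list is supplied in advance, and that "frequent" means meeting a minimum count. The function name `trigram_clusters`, the `min_freq` parameter, and the sample sentence are all hypothetical.

```python
from collections import Counter
import re


def trigram_clusters(text, keywords, min_freq=2):
    # Assumption: tokens are lowercase alphabetic strings.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Slide a three-word window over the token sequence.
    trigrams = zip(tokens, tokens[1:], tokens[2:])
    # Keep only windows containing at least one keyword.
    counts = Counter(t for t in trigrams if any(w in keywords for w in t))
    return [(" ".join(t), n) for t, n in counts.most_common() if n >= min_freq]


# Invented sample text, not drawn from the genetics corpus.
sample = ("a candidate gene for the disease was found and "
          "a candidate gene for the trait was a putative target")
clusters = trigram_clusters(sample, {"candidate", "putative"})
print(clusters)
```

Deriving the keyword list itself would need a comparison against a reference corpus, which is outside the scope of this sketch.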