8 results for Tokens
in Aston University Research Archive
Abstract:
Almost everyone who has an email account receives unwanted emails from time to time. These emails can be jokes from friends or commercial product offers from unknown people. In this paper we focus on those unwanted messages which try to promote a product or service, or to offer some “hot” business opportunities. These messages are called junk emails. Several methods to filter junk emails have been proposed, but none considers the linguistic characteristics of junk emails. In this paper, we investigate the linguistic features of a corpus of junk emails, and try to decide if they constitute a distinct genre. Our corpus of junk emails was built from the messages received by the authors over a period of time. Initially, the corpus consisted of 1563 files, but after automatically eliminating duplicates we kept only 673 files, totalling just over 373,000 tokens. In order to decide if the junk emails constitute a different genre, a comparison with a corpus of leaflets extracted from the BNC and with the whole BNC is carried out. Several characteristics at the lexical and grammatical levels were identified.
Abstract:
This thesis begins with a sociolinguistic correlational study of three phonetic variables - (h), (t) and (ing) - as used by four occupational groups - nurses, chefs, hairdressers and taxi-drivers. The groups were selected to incorporate three independent variables: sex (male-dominated versus female-dominated occupations); training (length and specialisation - nurses and chefs being more specialised than hairdressers and taxi-drivers) and location (the populations were selected from two cities - Liverpool and Birmingham). Although the correlational work demonstrates intra-sex and occupation consistency in speakers' choice of linguistic variants (females (particularly nurses) being significantly closer to the prestige norm), it is essentially non-explanatory and cannot account for narrative dynamics and style shift. Therefore, an in-depth qualitative examination of the data (which draws mainly on Narrative and Discourse Analysis) forms the major part of the analysis. The study first analyses features common to all the narratives: direct speech, expressive phonology and linguistic ambiguity emerge as characteristic of all humorous storytelling. Secondly, three major sources of inter-personal variation are investigated: narrator perspective, sex and occupational role. Perspective is found to vary with topic and personality, greater narrator involvement coinciding with a higher proportion of internal evaluation devices. Sex differences include topic choice and bonding in the storytelling sessions. Sex differences are also evident in style shifting, where the narrator mimics the voice of a character in the narrative (adopting segmental and/or prosodic tokens to signal a change of persona). The research finds that female narrators rarely employ segmental accommodation downwards on the social scale (whereas men do), but are on the other hand adept at using prosodic effects for mimicry.
Taxi-drivers emerge as the group with the most distinctive narrative flair, a fact which is related to their occupation. The conclusion stresses a need for both quantitative and qualitative approaches to data; the importance of occupational role, as opposed to sex role per se in determining narrative conventions; the view of narrative as a negotiable entity, which is the product of relationships among participants; and the importance of considering the totality of the communicative act.
Abstract:
Based on a corpus of English, German, and Polish spoken academic discourse, this article analyzes the distribution and function of humor in academic research presentations. The corpus is the result of a European research cooperation project and consists of 300,000 tokens of spoken academic language, focusing on the genres research presentation, student presentation, and oral examination. The article investigates differences between the German and English research cultures as expressed in the genre of specialist research presentations, and the role of humor as a pragmatic device in their respective contexts. The data is analyzed according to the paradigms of corpus-assisted discourse studies (CADS). The findings show that humor is used in research presentations as an expression of discourse reflexivity. They also reveal a considerable difference in the quantitative distribution of humor in research presentations depending on the educational, linguistic, and cultural background of the presenters, thus confirming the notion of different research cultures. Such research cultures nurture distinct attitudes to genres of academic language: whereas in one of the cultures identified researchers conform to the constraints and structures of the genre, those working in another attempt to subvert them, for example by the application of humor. © 2012 Elsevier B.V.
Abstract:
Despite the growth of spoken academic corpora in recent years, relatively little is known about the language of seminar discussions in higher education. This thesis compares seminar discussions across three disciplinary areas. The aim of this thesis is to uncover the functions and patterns of talk used in different disciplinary discussions and to highlight language on a macro and micro level that would be useful for materials design and teaching purposes. A framework for identifying and analysing genres in spoken language based on Hallidayan Systemic Functional Linguistics (SFL) is used. Stretches of talk sharing a similar purpose and predictable functional staging, termed Discussion Macro Genres (DMGs), are identified. Language is compared across DMGs and across disciplines through use of corpus techniques in conjunction with SFL genre theory. Data for the study comprises just over 180,000 tokens and is drawn from the British Academic Spoken English corpus (BASE), recorded at two universities in the UK. The discipline areas investigated are Arts and Humanities, Social Sciences and Physical Sciences. Findings from this study make theoretical, empirical and methodological contributions to the field of spoken EAP. The empirical findings are, firstly, that the majority of the seminar discussion can be assigned to one of the three main DMGs in the corpus: Responding, Debating and Problem Solving. Secondly, the study characterises each discipline area according to two DMGs. Thirdly, the majority of the discussion is non-oppositional in nature, suggesting that ‘debate’ is not the only form of discussion that students need to be prepared for. Finally, while some characteristics of the discussion are tied to the DMG and common across disciplines, others are discipline specific.
On a theoretical level, this study shows that an SFL genre model for investigating spoken discourse can be successfully extended to investigate longer stretches of discourse than have previously been identified. The methodological contribution is to demonstrate how corpus techniques can be combined with SFL genre theory to investigate extended stretches of spoken discussion. The thesis will be of value to those working in the field of teaching spoken EAP/ESAP as well as to materials developers.
Abstract:
This thesis examines the ways Indonesian politicians exploit the rhetorical power of metaphors in Indonesian political discourse. The research applies Conceptual Metaphor Theory, Metaphorical Frame Analysis and Critical Discourse Analysis to textual and oral data. The corpus comprises: 150 political news articles from two newspapers (Harian Kompas and Harian Waspada, 2010-2011 editions), 30 recordings of two television news and talk-show programmes (TV-One and Metro-TV), and 20 interviews with four legislators, two educated persons and two laymen. For this study, a corpus of written bahasa Indonesia was also compiled, which comprises 150 texts of approximately 439,472 tokens. The data analysis shows the potential power of metaphors in relation to how politicians communicate the results of their thinking, reasoning and meaning-making through language and discourse and its social consequences. Firstly, the data analysis revealed 1155 metaphors. These metaphors were then classified into the categories of conventional metaphor, cognitive function of metaphor, metaphorical mapping and metaphor variation. The degree of conventionality of metaphors is established based on the sum of expressions in each group of metaphors. Secondly, the analysis revealed that metaphor variation is influenced by the broader Indonesian cultural context and the natural and physical environment, along dimensions such as the social, the regional, the stylistic and the individual. The mapping system of metaphor is unidirectional. Thirdly, the data show that metaphoric thought pervades political discourse in relation to its uses as: (1) a felicitous tool for the rhetoric of political leaders, (2) part of meaning-making that keeps the discourse contexts alive and active, and (3) the degree to which metaphor and discourse shape the conceptual structures of politicians' rhetoric.
Fourthly, the analysis of data revealed that Indonesian political discourse attempts to create both distance and solidarity towards general and specific social categories, accomplished via metaphorical and frame references to the conceptualisations of us/them. The result of the analysis shows that metaphor and frame are excellent indicators of the us/them categories which work dialectically in the discourse. The acts of categorisation via metaphors and frames at both textual and conceptual level activate asymmetrical concepts and contribute to social and political hierarchical constructs, e.g. WEAKNESS vs. POWER, STUDENT vs. TEACHER, GHOST vs. CHOSEN WARRIOR, and so on. This analysis underscores the dynamic nature of categories by documenting metaphorical transfers between, i.e. ENEMY, DISEASE, BUSINESS, MYSTERIOUS OBJECT and CORRUPTION, LAW, POLITICS and CASE. The metaphorical transfers showed that politicians try to dictate how they categorise each other in order to mobilise audiences to act on behalf of their ideologies and to create distance and solidarity.
Abstract:
This study explored the role of formant transitions and F0-contour continuity in binding together speech sounds into a coherent stream. Listening to a repeating recorded word produces verbal transformations to different forms; stream segregation contributes to this effect and so it can be used to measure changes in perceptual coherence. In experiment 1, monosyllables with strong formant transitions between the initial consonant and following vowel were monotonized; each monosyllable was paired with a weak-transitions counterpart. Further stimuli were derived by replacing the consonant-vowel transitions with samples from adjacent steady portions. Each stimulus was concatenated into a 3-min-long sequence. Listeners reported more forms in the transitions-removed condition only for strong-transitions words, for which formant-frequency discontinuities were substantial. In experiment 2, the F0 contour of all-voiced monosyllables was shaped to follow a rising or falling pattern, spanning one octave. Consecutive tokens either had the same contour, giving an abrupt F0 change between each token, or alternated, giving a continuous contour. Discontinuous sequences caused more transformations and forms, and shorter times to the first transformation. Overall, these findings support the notion that continuity cues provided by formant transitions and the F0 contour play an important role in maintaining the perceptual coherence of speech.
Abstract:
Short text messages, a.k.a. Microposts (e.g. Tweets), have proven to be an effective channel for revealing information about trends and events, ranging from those related to Disaster (e.g. hurricane Sandy) to those related to Violence (e.g. Egyptian revolution). Being informed about such events as they occur could be extremely important to authorities and emergency professionals by allowing such parties to respond immediately. In this work we study the problem of topic classification (TC) of Microposts, which aims to automatically classify short messages based on the subject(s) discussed in them. The accurate TC of Microposts, however, is a challenging task since the limited number of tokens in a post often implies a lack of sufficient contextual information. In order to provide contextual information to Microposts, we present and evaluate several graph structures surrounding concepts present in linked knowledge sources (KSs). Traditional TC techniques enrich the content of Microposts with features extracted only from the Microposts' content. In contrast, our approach relies on the generation of different weighted semantic meta-graphs extracted from linked KSs. We introduce a new semantic graph, called category meta-graph. This novel meta-graph provides a more fine-grained categorisation of concepts, providing a set of novel semantic features. Our findings show that such category meta-graph features effectively improve the performance of a topic classifier of Microposts. Furthermore, our goal is also to understand which semantic features contribute to the performance of a topic classifier. For this reason we propose an approach for automatic estimation of accuracy loss of a topic classifier on new, unseen Microposts. We introduce and evaluate novel topic similarity measures, which capture the similarity between the KS documents and Microposts at a conceptual level, considering the enriched representation of these documents.
Extensive evaluation in the context of Emergency Response (ER) and Violence Detection (VD) revealed that our approach outperforms previous approaches using a single KS without linked data, and those using Twitter data only, by up to 31.4% in terms of F1 measure. Our main findings indicate that the new category graph contains useful information for TC and achieves comparable results to previously used semantic graphs. Furthermore, our results also indicate that the accuracy of a topic classifier can be accurately predicted using the enhanced text representation, outperforming previous approaches considering content-based similarity measures. © 2014 Elsevier B.V. All rights reserved.
Abstract:
In this paper, I concentrate on court cases with litigants in person (lay people who act on their own behalf in legal proceedings without a counsel or solicitor) and discuss the challenges of building a corpus of courtroom discourse where it is crucial to distinguish between speakers due to their distinct institutional roles. The corpus incorporates seven sub-corpora of verbatim transcripts from different court cases with litigants in person and comprises over eleven million tokens. The focus of this paper is on the interplay between the legal and lay discourse types and how judges project their institutional roles through “well”-initiated turns directed at litigants in person and counsels. As a versatile discourse marker, “well” provides a good opportunity to explore how judges have to adapt their roles to ensure lay litigants in person receive the necessary support and that their lack of competence does not impede the fairness of the proceedings. Given the breadth and importance of the topic of litigation in person, I discuss how the tools and approaches of corpus linguistics can be helpful in this multi-disciplinary area where multiple functions and uses of individual linguistic features need to be explored in depth.