874 results for CORPUS-LUTEUM


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Thematization is recognized by different linguistic schools as a fundamental phenomenon in the construction of messages and texts. Thematic position within a text privileges the elements that guide the reader in the orientation and interpretation of discourse at different levels. Thematizing a linguistic unit by locating it in the initial position of a clause, paragraph, or text confers upon it a special status: it signals the organizational strategy that characterizes different text types and acts as a variable in distinguishing registers, text types and genres. However, in spite of the importance of the study of thematization for message and textual structuring, to date no linguistic studies have undertaken the task of validating its aspects in a comparative manner, either for linguistic or computational purposes. This study therefore fills a research gap by implementing a methodology based on contrastive corpus annotation, which allows aspects of the phenomenon of Thematization in English and Spanish to be validated empirically. It also seeks to develop a bilingual English-Spanish comparable corpus of newspaper texts automatically annotated with thematic features at clausal and discourse levels. The empirically validated categories (Thematic Field and its elements: Textual Theme, Interpersonal Theme, PreHead and Head) are used to annotate a larger corpus of three newspaper genres (news reports, editorials and letters to the editor) in terms of thematic choices. This characterization reveals interesting results, such as the use of genre-specific strategies in thematic position. In addition, the thesis investigates the possibility of automating the annotation of thematic features in the bilingual corpus through the development of a set of JAVA rules implemented in GATE. It also shows the efficacy of this method in comparison with the manual annotation results...

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The broad dominance of English as a global language is also leaving its mark on the academic world. Initially it was the language through which much of the research and publication of each academic discipline's specialized knowledge was carried out. It is now also gradually becoming a language of instruction. Although in many contexts throughout history teaching through a foreign language has been the rule rather than the exception, the repercussions it is having at every level (political, economic, social, educational and pedagogical) make this educational phenomenon a necessary object of research. One of the main factors that have led to the adoption of English as a language of instruction in higher education has been the internationalization of the university. Moreover, since its implementation is already a widespread and accepted practice at earlier educational levels, owing to the expansion of content and language integrated learning (CLIL) in primary and secondary education (Dafouz & Guerrini, 2009; Dalton-Puffer, Nikula & Smit, 2010), continuing with this approach seems a logical choice and, in principle, not a very costly or problematic one (Coleman, 2013: XIV). Added to this is the competitive factor that leads universities to attract national and international students, committed and successful teachers and researchers from all over the world, and talented postgraduate students, with the aim of enhancing the university's reputation and prestige (Graddol, 2006; Ramos, 2013; Dafouz, 2015)...

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Between 1598 and 1606 the Corpus Christi procession in Antequera was suspended because of disagreements between the members of the council and the clergy of the collegiate church over who should occupy the best positions in the procession, those closest to the consecrated Host, and thereby signal their predominant standing in the society of the time. The disturbances described in this article form part of the process of institutionalization of a festivity whose aesthetic and symbolic culmination was confirmed during the seventeenth century, under the Counter-Reformation ideology of Trent.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Little is known about Ancient Arabia before the arrival of Islam, as it was an area with few inhabited settlements and served mostly as a passageway for traders. Some settled Arabs lived in those settlements, but the prevailing lifestyle was that of the rest of the population: nomadic Bedouin Arabs who travelled from place to place looking for water and pasture for the cattle they lived off. The desert was their natural habitat, a hostile environment full of danger where life was not easy. Camel taming made that nomadic lifestyle possible, and the Bedouins became inseparable from their camels, horses and cattle. To make a living they worked as hunters, transported caravans, and also plundered. In the pre-Islamic era knowledge was transmitted orally, so very little written information about that time and place remains. One thing that has been handed down is a body of proverbs, which from the 8th century onward were collected by several writers in various written works. Given the characteristics of those proverbs, which are conserved almost intact from their origins, we can learn much about the lifestyle in Ancient Arabia. This thesis investigates whether Paremiology makes it possible to learn more about this area in the historical period that precedes the arrival of Islam and during the first years of this religion. To learn about history we usually rely on historians and palaeontologists, but this work will demonstrate that through Paremiology it is possible to learn about other aspects of a culture: its knowledge, way of life, thinking, society, etc...

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A distinct metonymic pattern was discovered in the course of conducting a corpus-based study of figurative uses of WORD. The pattern involved examples such as Not one word of it made any sense and I agree with every word. It was labelled ‘hyperbolic synecdoche’, defined as a case in which a lexeme which typically refers to part of an entity (a) is used to stand for the whole entity and (b) is described with reference to the end point on a scale. Specifically, the speaker/writer selects the perspective of a lower-level unit (such as word for ‘utterance’), which is quantified as NOTHING or ALL, thus forming a subset of ‘extreme case formulations’. Hyperbolic synecdoche was found to exhibit a restricted range of lexicogrammatical patterns involving word, with the negated NOTHING patterns being considerably more common than the ALL patterns. The phenomenon was shown to be common in metonymic uses in general, constituting one-fifth of all cases of metonymy in word. The examples of hyperbolic synecdoche were found not to be covered by the oft-quoted ‘abbreviation’ rationale for metonymy; instead, they represent a more roundabout way of expression. It is shown that other cases of hyperbolic synecdoche exist outside of word and the domain of communication (such as ‘time’ and ‘money’).

Relevance:

20.00% 20.00%

Publisher:

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The effectiveness of higher-order spectral (HOS) phase features in speaker recognition is investigated by comparison with Mel-Cepstral features on the same speech data. HOS phase features retain phase information from the Fourier spectrum, unlike Mel-frequency Cepstral coefficients (MFCC). Gaussian mixture models are constructed from Mel-Cepstral features and HOS features, respectively, for the same data from various speakers in the Switchboard telephone Speech Corpus. Feature clusters, model parameters and classification performance are analyzed. HOS phase features on their own provide a correct identification rate of about 97% on the chosen subset of the corpus. This is the same level of accuracy as provided by MFCCs. Cluster plots and model parameters are compared to show that HOS phase features can provide complementary information to better discriminate between speakers.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this paper we analyse a 600,000-word corpus of policy statements produced within supranational, national, state and local legislatures about the nature and causes of (un)employment. We identify significant rhetorical and discursive features deployed by third sector (un)employment policy authors that function to extend their legislative grasp to encompass the most intimate aspects of human association.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this article I outline and demonstrate a synthesis of the methods developed by Lemke (1998) and Martin (2000) for analyzing evaluations in English. I demonstrate the synthesis using examples from a 1.3-million-word technology policy corpus drawn from institutions at the local, state, national, and supranational levels. Lemke's (1998) critical model is organized around the broad 'evaluative dimensions' that are deployed to evaluate propositions and proposals in English. Martin's (2000) model is organized with a more overtly systemic-functional orientation around the concept of 'encoded feeling'. In applying both these models at different times, whilst recognizing their individual usefulness and complementarity, I found specific limitations that led me to work towards a synthesis of the two approaches. I also argue for the need to consider genre, media, and institutional aspects more explicitly when claiming intertextual and heteroglossic relations as the basis for inferred evaluations. A basic assertion made in this article is that the perceived Desirability of a process, person, circumstance, or thing is identical to its 'value'. But the Desirability of anything is a socially and thus historically conditioned attribution that requires significant amounts of institutional inculcation of other 'types' of value: appropriateness, importance, beauty, power, and so on. I therefore propose a method informed by critical discourse analysis (CDA) that sees evaluation as happening on at least four interdependent levels of abstraction.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Although internet chat is a significant aspect of many internet users’ lives, the manner in which participants in quasi-synchronous chat situations orient to issues of social and moral order remains to be studied in depth. The research presented here is therefore at the forefront of a continually developing area of study. This work contributes new insights into how members construct and make accountable the social and moral orders of an adult-oriented Internet Relay Chat (IRC) channel by addressing three questions: (1) What conversational resources do participants use in addressing matters of social and moral order? (2) How are these conversational resources deployed within IRC interaction? and (3) What interactional work is locally accomplished through use of these resources? A survey of the literature reveals considerable research in the field of computer-mediated communication, exploring both asynchronous and quasi-synchronous discussion forums. The research discussed represents a range of communication interests including group and collaborative interaction, the linguistic construction of social identity, and the linguistic features of online interaction. It is suggested that the present research differs from previous studies in three ways: (1) it focuses on the interaction itself, rather than the ways in which the medium affects the interaction; (2) it offers turn-by-turn analysis of interaction in situ; and (3) it discusses membership categories only insofar as they are shown to be relevant by participants through their talk. Through consideration of the literature, the present study is firmly situated within the broader computer-mediated communication field. 
Ethnomethodology, conversation analysis and membership categorization analysis were adopted as appropriate methodological approaches to explore the research focus on interaction in situ, and in particular to investigate the ways in which participants negotiate and co-construct social and moral orders in the course of their interaction. IRC logs collected from one chat room were analysed using a two-pass method, based on a modification of the approaches proposed by Pomerantz and Fehr (1997) and ten Have (1999). From this detailed examination of the data corpus three interaction topics are identified by means of which participants clearly orient to issues of social and moral order: challenges to rule violations, ‘trolling’ for cybersex, and experiences regarding the 9/11 attacks. Instances of these interactional topics are subjected to fine-grained analysis, to demonstrate the ways in which participants draw upon various interactional resources in their negotiation and construction of channel social and moral orders. While these analytical topics stand alone in individual focus, together they illustrate different instances in which participants’ talk serves to negotiate social and moral orders or collaboratively construct new orders. Building on the work of Vallis (2001), Chapter 5 illustrates three ways that rule violation is initiated as a channel discussion topic: (1) through a visible violation in open channel, (2) through an official warning or sanction by a channel operator regarding the violation, and (3) through a complaint or announcement of a rule violation by a non-channel operator participant. Once the topic has been initiated, it is shown to become available as a topic for others, including the perceived violator. The fine-grained analysis of challenges to rule violations ultimately demonstrates that channel participants orient to the rules as a resource in developing categorizations of both the rule violation and violator. 
These categorizations are contextual in that they are locally based and understood within specific contexts and practices. Thus, it is shown that compliance with rules and an orientation to rule violations as inappropriate within the social and moral orders of the channel serves two purposes: (1) to orient the speaker as a group member, and (2) to reinforce the social and moral orders of the group. Chapter 6 explores a particular type of rule violation, solicitations for ‘cybersex’ known in IRC parlance as ‘trolling’. In responding to trolling violations participants are demonstrated to use affiliative and aggressive humour, in particular irony, sarcasm and insults. These conversational resources perform solidarity building within the group, positioning non-Troll respondents as compliant group members. This solidarity work is shown to have three outcomes: (1) consensus building, (2) collaborative construction of group membership, and (3) the continued construction and negotiation of existing social and moral orders. Chapter 7, the final data analysis chapter, offers insight into how participants, in discussing the events of 9/11 on the actual day, collaboratively constructed new social and moral orders, while orienting to issues of appropriate and reasonable emotional responses. This analysis demonstrates how participants go about ‘doing being ordinary’ (Sacks, 1992b) in formulating their ‘first thoughts’ (Jefferson, 2004). Through sharing their initial impressions of the event, participants perform support work within the interaction, in essence working to normalize both the event and their initial misinterpretation of it. Normalising as a support work mechanism is also shown in relation to participants constructing the ‘quiet’ following the event as unusual. 
Normalising is accomplished by reference to the indexical ‘it’ and location formulations, which participants use both to negotiate who can claim to experience the ‘unnatural quiet’ and to identify the extent of the quiet. Through their talk participants upgrade the quiet from something legitimately experienced by one person in a particular place to something that could be experienced ‘anywhere’, moving the phenomenon from local to global provenance. With its methodological design and detailed analysis and findings, this research contributes to existing knowledge in four ways. First, it shows how rules are used by participants as a resource in negotiating and constructing social and moral orders. Second, it demonstrates that irony, sarcasm and insults are three devices of humour which can be used to perform solidarity work and reinforce existing social and moral orders. Third, it demonstrates how new social and moral orders are collaboratively constructed in relation to extraordinary events, which serve to frame the event and evoke reasonable responses for participants. And last, the detailed analysis and findings further support the use of conversation analysis and membership categorization as valuable methods for approaching quasi-synchronous computer-mediated communication.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

An information filtering (IF) system monitors an incoming document stream to find the documents that match the information needs specified by the user profiles. Learning to use the user profiles effectively is one of the most challenging tasks when developing an IF system. With the document selection criteria better defined based on the users’ needs, filtering large streams of information can be more efficient and effective. To learn the user profiles, term-based approaches have been widely used in the IF community because of their simplicity and directness. Term-based approaches are relatively well established. However, these approaches have problems when dealing with polysemy and synonymy, which often lead to an information overload problem. Recently, pattern-based approaches (or Pattern Taxonomy Models (PTM) [160]) have been proposed for IF by the data mining community. These approaches are better at capturing semantic information and have shown encouraging results for improving the effectiveness of the IF system. On the other hand, pattern discovery from large data streams is not computationally efficient. These approaches also have to deal with low-frequency pattern issues. The measures used by data mining techniques (for example, “support” and “confidence”) to learn the profile have turned out to be unsuitable for filtering and can lead to a mismatch problem. This thesis combines rough set-based reasoning (term-based) and pattern mining in a unified framework for information filtering to overcome the aforementioned problems. The system consists of two stages: topic filtering and pattern mining. The topic filtering stage is intended to minimize information overload by filtering out the most likely irrelevant information based on the user profiles. A novel user-profile learning method and a theoretical model of the threshold setting have been developed by using rough set decision theory. 
The second stage (pattern mining) aims at solving the problem of information mismatch. This stage is precision-oriented. A new document-ranking function has been derived by exploiting the patterns in the pattern taxonomy. The most likely relevant documents are assigned higher scores by the ranking function. Because a relatively small number of documents is left after the first stage, the computational cost is markedly reduced; at the same time, pattern discovery yields more accurate results. The overall performance of the system was improved significantly. The new two-stage information filtering model has been evaluated by extensive experiments. Tests were based on the well-known IR benchmarking processes, using the latest version of the Reuters dataset, namely the Reuters Corpus Volume 1 (RCV1). The performance of the new two-stage model was compared with both the term-based and data mining-based IF models. The results demonstrate that the proposed information filtering system significantly outperforms the other IF systems, such as the traditional Rocchio IF model, the state-of-the-art term-based models, including BM25 and Support Vector Machines (SVM), and the Pattern Taxonomy Model (PTM).
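The two-stage flow described in this abstract can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the abstract does not give the rough-set threshold model or the ranking function, so `topic_score`, `pattern_score`, the profile weights, the pattern weights and the threshold are all hypothetical placeholders.

```python
def topic_score(doc_terms, profile_weights):
    """Stage 1 score (placeholder): sum of profile weights of the distinct terms in the document."""
    return sum(profile_weights.get(term, 0.0) for term in set(doc_terms))


def pattern_score(doc_terms, patterns):
    """Stage 2 score (placeholder): total weight of patterns (term sets) fully contained in the document."""
    terms = set(doc_terms)
    return sum(weight for pattern, weight in patterns if pattern <= terms)


def two_stage_filter(docs, profile_weights, patterns, threshold):
    """Drop documents scoring below the stage-1 threshold, then rank the survivors by pattern score."""
    survivors = {
        doc_id: terms
        for doc_id, terms in docs.items()
        if topic_score(terms, profile_weights) >= threshold
    }
    return sorted(
        survivors,
        key=lambda doc_id: pattern_score(survivors[doc_id], patterns),
        reverse=True,
    )
```

Because the second stage only scores documents that survive the first, the more expensive pattern matching runs on a smaller set, which is the cost argument the abstract makes.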

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this paper, we propose an unsupervised segmentation approach, named "n-gram mutual information", or NGMI, which is used to segment Chinese documents into n-character words or phrases, using language statistics drawn from the Chinese Wikipedia corpus. The approach alleviates the tremendous effort that is required in preparing and maintaining the manually segmented Chinese text for training purposes, and manually maintaining ever expanding lexicons. Previously, mutual information was used to achieve automated segmentation into 2-character words. The NGMI approach extends the approach to handle longer n-character words. Experiments with heterogeneous documents from the Chinese Wikipedia collection show good results.
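The pairwise mutual-information baseline that this abstract says NGMI extends can be sketched as follows. This is not the NGMI algorithm itself, whose details the abstract does not give. It is a minimal illustration that places a word boundary wherever two adjacent characters have low pointwise mutual information; the function names and the threshold of 0.0 are assumptions.

```python
import math
from collections import Counter


def train_counts(corpus):
    """Count single characters and adjacent character pairs over the training texts."""
    unigrams, bigrams = Counter(), Counter()
    for text in corpus:
        unigrams.update(text)
        bigrams.update(text[i:i + 2] for i in range(len(text) - 1))
    return unigrams, bigrams


def pmi(pair, unigrams, bigrams):
    """Pointwise mutual information of a two-character pair; -inf if the pair was never seen."""
    if bigrams[pair] == 0:
        return float("-inf")
    p_xy = bigrams[pair] / sum(bigrams.values())
    p_x = unigrams[pair[0]] / sum(unigrams.values())
    p_y = unigrams[pair[1]] / sum(unigrams.values())
    return math.log(p_xy / (p_x * p_y))


def segment(text, unigrams, bigrams, threshold=0.0):
    """Split text at every adjacent pair whose PMI falls below the (assumed) threshold."""
    if not text:
        return []
    words, current = [], text[0]
    for prev, nxt in zip(text, text[1:]):
        if pmi(prev + nxt, unigrams, bigrams) >= threshold:
            current += nxt
        else:
            words.append(current)
            current = nxt
    words.append(current)
    return words
```

In practice the counts would come from a large unsegmented collection such as the Chinese Wikipedia corpus mentioned in the abstract, and the threshold would need tuning.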

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The problem of impostor dataset selection for GMM-based speaker verification is addressed through the recently proposed data-driven background dataset refinement technique. The SVM-based refinement technique selects from a candidate impostor dataset those examples that are most frequently selected as support vectors when training a set of SVMs on a development corpus. This study demonstrates the versatility of dataset refinement in the task of selecting suitable impostor datasets for use in GMM-based speaker verification. The use of refined Z- and T-norm datasets provided performance gains of 15% in EER in the NIST 2006 SRE over the use of heuristically selected datasets. The refined datasets were shown to generalise well to the unseen data of the NIST 2008 SRE.
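The selection step described in this abstract, keeping the candidates most frequently chosen as support vectors, can be sketched in a few lines of Python. SVM training is out of scope here: the sketch assumes the support-vector IDs of each trained SVM are already available, and the function name and the `keep` parameter are hypothetical.

```python
from collections import Counter


def refine_impostor_set(candidates, support_vector_sets, keep):
    """Return the `keep` candidates most often selected as support vectors.

    support_vector_sets holds, for each SVM trained on the development
    corpus, the IDs of the candidates that became its support vectors.
    """
    counts = Counter()
    for sv_ids in support_vector_sets:
        counts.update(set(sv_ids))  # count each candidate at most once per SVM
    # Stable sort by descending frequency; ties keep the original candidate order.
    ranked = sorted(candidates, key=lambda candidate: -counts[candidate])
    return ranked[:keep]
```

For example, `refine_impostor_set(["a", "b", "c", "d"], [["a", "c"], ["c", "d"], ["c", "a"]], 2)` keeps `"c"` (selected three times) and `"a"` (selected twice).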