14 results for Comparable Corpora

in Aston University Research Archive


Relevance:

30.00%

Abstract:

This study presents a detailed contrastive description of the textual functioning of connectives in English and Arabic. Particular emphasis is placed on the organisational force of connectives and their role in sustaining cohesion. The description is intended as a contribution to a better understanding of the variations in the dominant tendencies for text organisation in each language. The findings are expected to be utilised for pedagogical purposes, particularly in improving the teaching of EFL writing at the undergraduate level. The study is based on an empirical investigation of the phenomenon of connectivity and, for optimal efficiency, employs computer-aided procedures, particularly those adopted in corpus linguistics, for investigatory purposes. One important methodological requirement is the establishment of two comparable and statistically adequate corpora, together with the design of software and the use of existing packages to carry out the basic analysis. Each corpus comprises ca 250,000 words of newspaper material sampled in accordance with a specific set of criteria and assembled in machine-readable form prior to the computer-assisted analysis. A suite of programmes has been written in SPITBOL to accomplish a variety of analytical tasks, and in particular to perform a battery of measurements intended to quantify the textual functioning of connectives in each corpus. Concordances and some word lists are produced using OCP. The results confirm the existence of fundamental differences in text organisation between Arabic and English. These differences manifest themselves in the way textual operations of grouping and sequencing are performed and in the intensity of the textual role of connectives in imposing linearity and continuity and in maintaining overall stability. Furthermore, computation of connective functionality and range of operationality has identified fundamental differences in the way favourable choices for text organisation are made and implemented.
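
One of the simplest measurements of this kind can be sketched in a few lines. The study's own programmes were written in SPITBOL and are not reproduced in the abstract, so the Python below is only an illustrative assumption: the function name `connective_rates`, the connective list, and the toy texts are all hypothetical. It counts each connective per 1,000 tokens so that two corpora of different sizes can be compared on the same scale.

```python
import re
from collections import Counter


def connective_rates(text, connectives, per=1000):
    """Return occurrences of each connective per `per` tokens of `text`."""
    # \w+ matches Unicode word characters, so the same tokeniser runs on Arabic script.
    tokens = re.findall(r"\w+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(t for t in tokens if t in connectives)
    return {c: counts[c] * per / total for c in sorted(connectives)}


# Hypothetical toy inputs; the corpora in the study held ca 250,000 words each.
CONNECTIVES = {"and", "but", "however"}
corpus_a = "The talks failed and the delegates left but the press stayed"
corpus_b = "The talks failed. However, the delegates stayed."

rates_a = connective_rates(corpus_a, CONNECTIVES)
rates_b = connective_rates(corpus_b, CONNECTIVES)
print(rates_a)
print(rates_b)
```

Whole-word matching is a simplification: Arabic connectives that attach to the following word as clitics would need morphological segmentation before they could be counted this way.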

Relevance:

20.00%

Abstract:

Language learners ask a variety of questions about words and their meanings and uses: “What does X mean? What is the word for X in English? Can you say X? When do you use X and when do you use Y (e.g. synonyms, grammatical structures, prepositional choices, variant phrases, etc)?”

Relevance:

20.00%

Abstract:

Corpora amylacea (CA) are spherical or ovoid bodies 50-50 microns in diameter. They have been described in normal elderly brain as well as in a number of neurodegenerative disorders. In this study, the incidence of CA in the optic nerves of Alzheimer's disease (AD) patients was compared with that in normal elderly controls. Samples of optic nerves (MRC Brain Bank, Institute of Psychiatry) were taken from 12 AD patients (age range 69-94 years) and 18 controls (43-82 years). Optic nerves were fixed in 2% buffered glutaraldehyde, post-fixed in osmium tetroxide, embedded in epoxy resin and then sectioned to a thickness of 2 microns. Sections were stained with toluidine blue. CA were present in all of the optic nerves examined. In addition, a number of similarly stained but more irregularly shaped bodies were present. Fewer CA were found in the optic nerves of AD patients than in those of controls. By contrast, the number of irregularly shaped bodies was increased in AD. In AD, there may be a preferential decline in the large diameter fibres which may mediate the M-cell pathway. Hence, the decline in the incidence of CA in AD may be associated with a reduction in these fibres. It is also possible that the irregularly shaped bodies are a degeneration product of the CA.

Relevance:

20.00%

Abstract:

Sentiment analysis is concerned with automatically identifying the sentiment or opinion expressed in a given piece of text. Most prior work either uses prior lexical knowledge, defined as the sentiment polarity of words, or views the task as a text classification problem and relies on labeled corpora to train a sentiment classifier. While lexicon-based approaches do not adapt well to different domains, corpus-based approaches require expensive manual annotation effort. In this paper, we propose a novel framework where an initial classifier is learned by incorporating prior information extracted from an existing sentiment lexicon, with preferences on expectations of sentiment labels of those lexicon words being expressed using generalized expectation criteria. Documents classified with high confidence are then used as pseudo-labeled examples for automatic domain-specific feature acquisition. The word-class distributions of such self-learned features are estimated from the pseudo-labeled examples and are used to train another classifier by constraining the model's predictions on unlabeled instances. Experiments on both the movie-review data and the multi-domain sentiment dataset show that our approach attains comparable or better performance than existing weakly-supervised sentiment classification methods despite using no labeled documents.
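
The pseudo-labeling step described above can be illustrated with a deliberately simplified sketch. It is not the paper's method: a plain majority vote over lexicon words stands in for the classifier trained with generalized expectation criteria, the final constrained classifier is omitted, and the lexicon, threshold, documents, and function names are all hypothetical. The sketch shows only how confidently labeled documents can supply word-class counts for words outside the lexicon.

```python
from collections import Counter

# Hypothetical toy lexicon, for illustration only.
LEXICON = {"good": "pos", "great": "pos", "bad": "neg", "awful": "neg"}


def lexicon_vote(tokens):
    """Stand-in initial classifier: majority vote of lexicon words.

    Returns (label, confidence), where confidence is the vote margin
    divided by the number of lexicon words found; (None, 0.0) on a tie.
    """
    votes = Counter(LEXICON[t] for t in tokens if t in LEXICON)
    pos, neg = votes["pos"], votes["neg"]
    if pos == neg:
        return None, 0.0
    label = "pos" if pos > neg else "neg"
    return label, abs(pos - neg) / (pos + neg)


def learn_features(docs, threshold=0.8):
    """Count non-lexicon words per class in confidently labeled documents."""
    word_class = {"pos": Counter(), "neg": Counter()}
    for doc in docs:
        tokens = doc.lower().split()
        label, conf = lexicon_vote(tokens)
        if label is not None and conf >= threshold:
            word_class[label].update(t for t in tokens if t not in LEXICON)
    return word_class


docs = [
    "great plot and good acting",
    "awful plot and bad pacing",
    "good pacing but bad acting",  # tied vote, so it is not pseudo-labeled
]
features = learn_features(docs)
print(features)
```

The counts in `features` play the role of the self-learned word-class distributions; in the framework described in the abstract they would then constrain a second classifier rather than being used directly.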

Relevance:

20.00%

Abstract:

This paper presents a comparative study of three closely related Bayesian models for unsupervised document-level sentiment classification, namely, the latent sentiment model (LSM), the joint sentiment-topic (JST) model, and the Reverse-JST model. Extensive experiments have been conducted on two corpora, the movie review dataset and the multi-domain sentiment dataset. It has been found that while all three models achieve either better or comparable performance on these two corpora when compared to the existing unsupervised sentiment classification approaches, both JST and Reverse-JST are able to extract sentiment-oriented topics. In addition, Reverse-JST always performs worse than JST, suggesting that the JST model is more appropriate for joint sentiment topic detection.

Relevance:

20.00%

Abstract:

We present a new method for term extraction from a domain-relevant corpus using natural language processing for the purposes of semi-automatic ontology learning. The literature shows that topical words occur in bursts. We find that the ranking of extracted terms is insensitive to the choice of population model, but calculating frequencies relative to the burst size rather than the document length in words yields significantly different results.

Relevance:

20.00%

Abstract:

Translation training in the university context needs to train students in the processes, in order to enhance and optimise the product as the outcome of these processes. Evaluation of a target text as product has often been accused of being a subjective process, which does not easily lend itself to the type of feedback that could enable students to apply criteria more widely. For students, it often seems as though they make different inappropriate or incorrect choices every time they translate a new text, and the learning process appears unpredictable and haphazard. Within functionalist approaches to translation, with their focus on the target text in terms of functional adequacy to the intended purpose, as stipulated in the translation brief, there are guidelines for text production that can help to develop a more systematic approach not only to text production, but also to translation evaluation. In the context of a focus on user knowledge needs, target language conventions and acceptability, the use of corpora is indispensable for the trainee translator. Evaluation can take place against the student's own reasoned selection process, based on hard evidence, and against criteria which currently obtain in the TL and the TL culture. When trainee and evaluator work within the same guidelines, there is more scope for constructive learning and feedback.

Relevance:

20.00%

Abstract:

In this article, I argue that the study of the linguistic aspects of epistemology has become unhelpfully focused on the corpus-based study of hedging and that a corpus-driven approach can help to improve upon this. Through focusing on a corpus of texts from one discourse community (that of genetics) and identifying frequent tri-lexical clusters containing highly frequent lexical items identified as keywords, I undertake an inductive analysis identifying patterns of epistemic significance. Several of these patterns are shown to be hedging devices, and the whole-corpus frequencies of the most salient of these, ‘candidate’ and ‘putative’, are then compared to the whole-corpus frequencies for comparable wordforms and clusters of epistemic significance. Finally, I interviewed a ‘friendly geneticist’ in order to check my interpretation of some of the terms used and to get an expert interpretation of the overall findings. In summary, I argue that the highly unexpected patterns of hedging found in genetics demonstrate the value of adopting a corpus-driven approach and constitute an advance in our current understanding of how to approach the relationship between language and epistemology.