974 results for Corpus Linguistic


Relevance: 20.00%

Abstract:

This paper develops and evaluates an enhanced corpus-based approach for semantic processing. Corpus-based models that build representations of words directly from text do not require pre-existing linguistic knowledge, and have demonstrated psychologically relevant performance on a number of cognitive tasks. However, they have been criticised in the past for not incorporating sufficient structural information. Using ideas underpinning recent attempts to overcome this weakness, we develop an enhanced tensor encoding model to build representations of word meaning for semantic processing. Our enhanced model demonstrates superior performance when compared to a robust baseline model on a number of semantic processing tasks.

Relevance: 20.00%

Abstract:

As increasing numbers of Chinese language learners choose to learn English online (CNNIC, 2012), there is a need to investigate popular websites and their language learning designs. This paper reports on the first stage of a study that analysed the pedagogical, linguistic and content features of 25 Chinese English Language Learning (ELL) websites ranked according to their value and importance to users. The website ranking was undertaken using a system known as PageRank. The aim of the study was to identify the features characterising popular sites as opposed to those of less popular sites for the purpose of producing a framework for ELL website design in the Chinese context. The study found that a pedagogical focus with developmental instructional materials accommodating diverse proficiency levels was a major contributor to website popularity. Chinese language use for translations and teaching directives and intermediate level English for learning materials were also significant features. Content topics included Anglophone/Western and non-Anglophone/Eastern contexts. Overall, popular websites were distinguished by their mediation of access to and scaffolded support for ELL.

Relevance: 20.00%

Abstract:

In this paper, we describe a machine-translated parallel English corpus for the NTCIR Chinese, Japanese and Korean (CJK) Wikipedia collections. This document collection is named the CJK2E Wikipedia XML corpus. The corpus could be used in many ways by the information retrieval research community and for knowledge sharing in Wikipedia; for example, it could support experiments in cross-lingual information retrieval, cross-lingual link discovery, or omni-lingual information retrieval. Furthermore, the translated CJK articles could be used to expand the current coverage of the English Wikipedia.

Relevance: 20.00%

Abstract:

This paper proposes a framework for analysing performance on multiple-choice questions, with a focus on linguistic factors. Item Response Theory (IRT) is used to estimate ability and question difficulty levels. A logistic regression model is used to detect questions that exhibit Differential Item Functioning. Probit models test the relationships between performance and linguistic factors, controlling for the effects of question construction and students’ background. The empirical results have important implications. The lexical density of stems affects performance. The use of non-Economics specialised vocabulary has differing impacts on the performance of students with different language backgrounds. The IRT-based ability and difficulty estimates help explain performance variations.
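
As a rough illustration of how IRT relates ability and question difficulty, the sketch below uses the one-parameter (Rasch) model. The abstract does not say which IRT model the authors used, so the Rasch form, the function name `rasch_probability`, and all numeric values are assumptions made for illustration only.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct answer under the Rasch (1PL) model.

    Both arguments are on the same logit scale. The choice of the Rasch
    model is an assumption; the abstract only says that IRT is used.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical ability values, not taken from the paper.
for ability in (-1.0, 0.0, 1.0):
    p = rasch_probability(ability, difficulty=0.0)
    print(f"ability={ability:+.1f} -> P(correct)={p:.3f}")
```

Under this model, a student whose ability equals the question's difficulty answers correctly with probability 0.5, and the probability rises as ability exceeds difficulty.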

Relevance: 20.00%

Abstract:

Measures of semantic similarity between medical concepts are central to a number of techniques in medical informatics, including query expansion in medical information retrieval. Previous work has mainly considered thesaurus-based path measures of semantic similarity and has not compared different corpus-driven approaches in depth. We evaluate the effectiveness of eight common corpus-driven measures in capturing semantic relatedness and compare these against concept pairs judged by medical professionals. Our results show that certain corpus-driven measures correlate strongly (approximately 0.8) with human judgements. An important finding is that performance was significantly affected by the choice of corpus used in priming the measure, i.e., used as evidence from which corpus-driven similarities are drawn. This paper provides guidelines for the implementation of semantic similarity measures for medical informatics and concludes with implications for medical information retrieval.
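
The kind of evaluation described above can be sketched as follows: score concept pairs with a corpus-driven measure, then correlate those scores with human ratings. This is a minimal sketch, not the authors' procedure. Cosine similarity over context vectors stands in for one of the eight unnamed measures, Spearman rank correlation is an assumed choice of correlation statistic, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse context vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In use, `x` would hold the measure's scores for each concept pair and `y` the corresponding human ratings; repeating the computation with context vectors built from different corpora would expose the corpus effect the abstract reports.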

Relevance: 20.00%

Abstract:

The linguistic turn within philosophy has recently gained increased attention within the social sciences. It can be seen as an attempt to investigate traditional philosophical problems by analysing the linguistic expressions used for these investigations. More generally, the phenomenon of language itself must be considered because of its (constitutional) impact on the investigation of phenomena in the social sciences. In order to understand the consequences of the linguistic turn, its origins in philosophy are important and will be discussed. Within the social sciences, the linguistic turn has already had a significant impact. As an example, we will therefore discuss what directions the linguistic turn has enabled for organizational analysis. Information Systems as a discipline must face the consequences of the linguistic turn as well. We will discuss how the linguistic framework introduced affects the development of knowledge management and of managerial and organizational support systems. This example shows what different perspectives the linguistic turn can provide for investigations within Information Systems. In addition, we will briefly outline the impact of the linguistic turn with respect to methodologies in Information Systems research.

Relevance: 20.00%

Abstract:

Many successful query expansion techniques ignore information about the term dependencies that exist within natural language. However, researchers have recently demonstrated that consistent and significant improvements in retrieval effectiveness can be achieved by explicitly modelling term dependencies within the query expansion process. This has created an increased interest in dependency-based models. State-of-the-art dependency-based approaches primarily model term associations known within structural linguistics as syntagmatic associations, which are formed when terms co-occur more often than by chance. However, structural linguistics proposes that the meaning of a word is also dependent on its paradigmatic associations, which are formed between words that can substitute for each other without affecting the acceptability of a sentence. Given the reliance on word meanings when a user formulates their query, our approach takes the novel step of modelling both syntagmatic and paradigmatic associations within the query expansion process based on the (pseudo) relevant documents returned in web search. The results demonstrate that this approach can provide significant improvements in web retrieval effectiveness when compared to a strong benchmark retrieval system.
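
The distinction between the two association types can be illustrated with a toy corpus. The sketch below is not the authors' model: it assumes raw windowed co-occurrence counts as a proxy for syntagmatic association and cosine similarity between context vectors as a proxy for paradigmatic association. The corpus, window size, and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def cooccurrence_counts(sentences: List[List[str]], window: int = 2) -> Dict[str, Counter]:
    """Count, for each word, the words seen within `window` positions of it."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two context-count vectors."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented for illustration.
corpus = [
    "the doctor treated the patient".split(),
    "the nurse treated the patient".split(),
]
counts = cooccurrence_counts(corpus)

# Syntagmatic proxy: how often two words appear near each other.
print("doctor~treated co-occurrences:", counts["doctor"]["treated"])
print("doctor~nurse co-occurrences:", counts["doctor"]["nurse"])

# Paradigmatic proxy: how similar the contexts of two words are.
print("doctor/nurse context similarity:", cosine(counts["doctor"], counts["nurse"]))
```

In this toy corpus "doctor" and "nurse" never co-occur, so they have no syntagmatic association, yet they appear in identical contexts and could substitute for each other, which is the paradigmatic case. A practical system would likely weight counts against chance co-occurrence rather than use raw frequencies.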

Relevance: 20.00%

Abstract:

This paper evaluates the efficiency of a number of popular corpus-based distributional models in performing discovery on very large document sets, including online collections. Literature-based discovery is the process of identifying previously unknown connections from text, often published literature, that could lead to the development of new techniques or technologies. Literature-based discovery has attracted growing research interest ever since Swanson's serendipitous discovery of the therapeutic effects of fish oil on Raynaud's disease in 1986. The successful application of distributional models in automating the identification of indirect associations underpinning literature-based discovery has been extensively demonstrated in the medical domain. However, we wish to investigate the computational complexity of distributional models for literature-based discovery on much larger document collections, as they may provide computationally tractable solutions to tasks such as predicting future disruptive innovations. In this paper we perform a computational complexity analysis on four successful corpus-based distributional models to evaluate their fitness for such tasks. Our results indicate that corpus-based distributional models that store their representations in fixed dimensions provide superior efficiency on literature-based discovery tasks.
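
The notion of an indirect association can be made concrete with a small sketch: two terms that never appear together but share an intermediate term. The code below uses explicit set-based co-occurrence links rather than any of the four distributional models evaluated in the paper, which the abstract does not name. The documents and function names are hypothetical and chosen only to echo the fish oil example.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set


def build_links(documents: Iterable[Iterable[str]]) -> Dict[str, Set[str]]:
    """Link every pair of terms that appear in the same document."""
    links: Dict[str, Set[str]] = defaultdict(set)
    for terms in documents:
        for a, b in combinations(sorted(set(terms)), 2):
            links[a].add(b)
            links[b].add(a)
    return links


def indirect_candidates(links: Dict[str, Set[str]], start: str) -> Dict[str, Set[str]]:
    """Find terms reachable from `start` only through an intermediate term.

    Returns a mapping from each candidate term to the bridging terms
    that connect it to `start`.
    """
    direct = links[start]
    candidates: Dict[str, Set[str]] = defaultdict(set)
    for bridge in direct:
        for target in links[bridge]:
            if target != start and target not in direct:
                candidates[target].add(bridge)
    return candidates


# Hypothetical term sets standing in for two separate bodies of literature.
documents: List[Set[str]] = [
    {"fish oil", "blood viscosity"},
    {"blood viscosity", "raynaud's disease"},
]
links = build_links(documents)
for target, bridges in indirect_candidates(links, "fish oil").items():
    print(f"fish oil -> {sorted(bridges)} -> {target}")
```

Storing explicit links like this grows with the vocabulary, which is one reason the efficiency of fixed-dimension representations matters on very large collections.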

Relevance: 20.00%

Abstract:

Localization of technology is now widely applied to the preservation and revival of the culture of indigenous peoples around the world, most commonly through translation into indigenous languages, which has been proven to increase the adoption of technology. However, this current form of localization excludes two demographic groups that are key to the effectiveness of localization efforts in the African context: the younger generation (under the age of thirty), who hold an Anglo-American cultural view and have no need for or interest in their indigenous culture; and the older generation (over the age of fifty), who are very knowledgeable about their indigenous culture but have little or no knowledge of how to use a computer. This paper presents the design of a computer game engine that can be used to provide an interface for both technology and indigenous culture learning for both generations. Four indigenous Ugandan games are analyzed and identified for their attractiveness to both generations, to both rural and urban populations, and for their propensity to develop IT skills in older generations.

Relevance: 20.00%

Abstract:

Teachers in the Pacific region have often signalled the need for more locally produced information texts in both the vernacular and English, to engage their readers with local content and to support literacy development across the curriculum. The Information Text Awareness Project (ITAP), initially informed by the work of Nea Stewart-Dore, has provided a means to address this need through supporting local teachers to write their own information texts. The article reports on the impact of an ITAP workshop carried out in Nadi, Fiji in 2012. Nine teacher volunteers from the project trialled the use of the texts in their classrooms with positive results in relation to student learning and belief in themselves as writers.

Relevance: 20.00%

Abstract:

Relational elements of language (e.g. spatial prepositions) act to direct attention to aspects of an incoming message. The listener or reader must be able to use these elements to focus and refocus attention on the mental representation being constructed. Research has shown that this type of attention control is specific to language and can be distinguished from attention control for non-relational (semantic or content) elements. Twenty-two monolinguals (18–30 years) and nineteen bilinguals (18–30 years) completed two conditions of an alternating-runs task-switching paradigm in their first language. The relational condition involved processing spatial prepositions, and the non-relational condition involved processing concrete nouns and adjectives. Overall, monolinguals had significantly larger shift costs (i.e. greater attention control burden) in the relational condition than the non-relational condition, whereas bilinguals performed similarly in both conditions. This suggests that proficiency in a second language has a positive impact on linguistic attention control in one's native language.