890 resultados para 380202 Computational Linguistics
Resumo:
It is important to help researchers find valuable papers from a large literature collection. To this end, many graph-based ranking algorithms have been proposed. However, most of these algorithms suffer from the problem of ranking bias. Ranking bias hurts the usefulness of a ranking algorithm because it returns a ranking list with an undesirable time distribution. This paper is a focused study on how to alleviate ranking bias by leveraging the heterogeneous network structure of the literature collection. We propose a new graph-based ranking algorithm, MutualRank, that integrates mutual reinforcement relationships among networks of papers, researchers, and venues to achieve a more synthetic, accurate, and less-biased ranking than previous methods. MutualRank provides a unified model that involves both intra- and inter-network information for ranking papers, researchers, and venues simultaneously. We use the ACL Anthology Network as the benchmark data set and construct the gold standard from computer linguistics course websites of well-known universities and two well-known textbooks. The experimental results show that MutualRank greatly outperforms the state-of-the-art competitors, including PageRank, HITS, CoRank, Future Rank, and P-Rank, in ranking papers in both improving ranking effectiveness and alleviating ranking bias. Rankings of researchers and venues by MutualRank are also quite reasonable.
Resumo:
This work discusses a proposition for organizing the lexical items from the conceptual domain labeled THE EMBROIDERY INDUSTRY OF IBITINGA in terms of a natural ontology. It also aims to establish the alignment between this ontology and the bases WordNet.Pr and WordNet.Br. © 2009 IEEE.
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
In computational linguistics, information retrieval and applied cognition, words and concepts are often represented as vectors in high dimensional spaces computed from a corpus of text. These high dimensional spaces are often referred to as Semantic Spaces. We describe a novel and efficient approach to computing these semantic spaces via the use of complex valued vector representations. We report on the practical implementation of the proposed method and some associated experiments. We also briefly discuss how the proposed system relates to previous theoretical work in Information Retrieval and Quantum Mechanics and how the notions of probability, logic and geometry are integrated within a single Hilbert space representation. In this sense the proposed system has more general application and gives rise to a variety of opportunities for future research.
Resumo:
In this paper, we describe a voting mechanism for accurate named entity (NE) translation in English–Chinese question answering (QA). This mechanism involves translations from three different sources: machine translation,online encyclopaedia, and web documents. The translation with the highest number of votes is selected. We evaluated this approach using test collection, topics and assessment results from the NTCIR-8 evaluation forum. This mechanism achieved 95% accuracy in NEs translation and 0.3756 MAP in English–Chinese cross-lingual information retrieval of QA.
Resumo:
This paper presents some theoretical perspectives that might inform the design and development of information and communications technology (ICT) tools to support integrated (in-session) reflection and deep learning during e-learning. The role of why questioning provides the focus of discussion and is informed by the literature on critical thinking, sense-making, and reflective practice, as well as recent developments in knowledge management, computational linguistics and automated question generation. It is argued that there exists enormous scope for the development of ICT scaffolding targeted at supporting reflective practice during e-learning. The first generations of e-Portfolio tools provide some evidence for the significance of the benefits of integrating reflection into the design of ICT systems; however, following the review of a number of such systems, as well as a range of ICT applications and services designed to support e-learning, it is argued that the scope of implementation is limited.
Resumo:
This paper develops a framework for classifying term dependencies in query expansion with respect to the role terms play in structural linguistic associations. The framework is used to classify and compare the query expansion terms produced by the unigram and positional relevance models. As the unigram relevance model does not explicitly model term dependencies in its estimation process it is often thought to ignore dependencies that exist between words in natural language. The framework presented in this paper is underpinned by two types of linguistic association, namely syntagmatic and paradigmatic associations. It was found that syntagmatic associations were a more prevalent form of linguistic association used in query expansion. Paradoxically, it was the unigram model that exhibited this association more than the positional relevance model. This surprising finding has two potential implications for information retrieval models: (1) if linguistic associations underpin query expansion, then a probabilistic term dependence assumption based on position is inadequate for capturing them; (2) the unigram relevance model captures more term dependency information than its underlying theoretical model suggests, so its normative position as a baseline that ignores term dependencies should perhaps be reviewed.
Resumo:
This paper presents a combined structure for using real, complex, and binary valued vectors for semantic representation. The theory, implementation, and application of this structure are all significant. For the theory underlying quantum interaction, it is important to develop a core set of mathematical operators that describe systems of information, just as core mathematical operators in quantum mechanics are used to describe the behavior of physical systems. The system described in this paper enables us to compare more traditional quantum mechanical models (which use complex state vectors), alongside more generalized quantum models that use real and binary vectors. The implementation of such a system presents fundamental computational challenges. For large and sometimes sparse datasets, the demands on time and space are different for real, complex, and binary vectors. To accommodate these demands, the Semantic Vectors package has been carefully adapted and can now switch between different number types comparatively seamlessly. This paper describes the key abstract operations in our semantic vector models, and describes the implementations for real, complex, and binary vectors. We also discuss some of the key questions that arise in the field of quantum interaction and informatics, explaining how the wide availability of modelling options for different number fields will help to investigate some of these questions.
Resumo:
This paper presents some theoretical and interdisciplinary perspectives that might inform the design and development of information and communications technology (ICT) tools to support reflective inquiry during e-learning. The role of why-questioning provides the focus of discussion and is guided by literature that spans critical thinking, inquiry-based and problem-based learning, storytelling, sense-making, and reflective practice, as well as knowledge management, information science, computational linguistics and automated question generation. It is argued that there exists broad scope for the development of ICT scaffolding targeted at supporting reflective inquiry duringe-learning. Evidence suggests that wiki-based learning tasks, digital storytelling, and e-portfolio tools demonstrate the value of accommodating reflective practice and explanatory content in supporting learning; however, it is also argued that the scope for ICT tools that directly support why-questioning as a key aspect of reflective inquiry is a frontier ready for development.
Resumo:
INEX investigates focused retrieval from structured documents by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. This paper reports on the INEX 2013 evaluation campaign, which consisted of four activities addressing three themes: searching professional and user generated data (Social Book Search track); searching structured or semantic data (Linked Data track); and focused retrieval (Snippet Retrieval and Tweet Contextualization tracks). INEX 2013 was an exciting year for INEX in which we consolidated the collaboration with (other activities in) CLEF and for the second time ran our workshop as part of the CLEF labs in order to facilitate knowledge transfer between the evaluation forums. This paper gives an overview of all the INEX 2013 tracks, their aims and task, the built test-collections, and gives an initial analysis of the results
Resumo:
INEX investigates focused retrieval from structured documents by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. This paper reports on the INEX 2014 evaluation campaign, which consisted of three tracks: The Interactive Social Book Search Track investigated user information seeking behavior when interacting with various sources of information, for realistic task scenarios, and how the user interface impacts search and the search experience. The Social Book Search Track investigated the relative value of authoritative metadata and user-generated content for search and recommendation using a test collection with data from Amazon and LibraryThing, including user profiles and personal catalogues. The Tweet Contextualization Track investigated tweet contextualization, helping a user to understand a tweet by providing him with a short background summary generated from relevant Wikipedia passages aggregated into a coherent summary. INEX 2014 was an exciting year for INEX in which we for the third time ran our workshop as part of the CLEF labs. This paper gives an overview of all the INEX 2014 tracks, their aims and task, the built test-collections, the participants, and gives an initial analysis of the results.
Resumo:
Experiences showed that developing business applications that base on text analysis normally requires a lot of time and expertise in the field of computer linguistics. Several approaches of integrating text analysis systems with business applications have been proposed, but so far there has been no coordinated approach which would enable building scalable and flexible applications of text analysis in enterprise scenarios. In this paper, a service-oriented architecture for text processing applications in the business domain is introduced. It comprises various groups of processing components and knowledge resources. The architecture, created as a result of our experiences with building natural language processing applications in business scenarios, allows for the reuse of text analysis and other components, and facilitates the development of business applications. We verify our approach by showing how the proposed architecture can be applied to create a text analytics enabled business application that addresses a concrete business scenario. © 2010 IEEE.
Resumo:
This article presents and evaluates Quantum Inspired models of Target Activation using Cued-Target Recall Memory Modelling over multiple sources of Free Association data. Two components were evaluated: Whether Quantum Inspired models of Target Activation would provide a better framework than their classical psychological counterparts and how robust these models are across the different sources of Free Association data. In previous work, a formal model of cued-target recall did not exist and as such Target Activation was unable to be assessed directly. Further to that, the data source used was suspected of suffering from temporal and geographical bias. As a consequence, Target Activation was measured against cued-target recall data as an approximation of performance. Since then, a formal model of cued-target recall (PIER3) has been developed [10] with alternative sources of data also becoming available. This allowed us to directly model target activation in cued-target recall with human cued-target recall pairs and use multiply sources of Free Association Data. Featural Characteristics known to be important to Target Activation were measured for each of the data sources to identify any major differences that may explain variations in performance for each of the models. Each of the activation models were used in the PIER3 memory model for each of the data sources and was benchmarked against cued-target recall pairs provided by the University of South Florida (USF). Two methods where used to evaluate performance. The first involved measuring the divergence between the sets of results using the Kullback Leibler (KL) divergence with the second utilizing a previous statistical analysis of the errors [9]. Of the three sources of data, two were sourced from human subjects being the USF Free Association Norms and the University of Leuven (UL) Free Association Networks. The third was sourced from a new method put forward by Galea and Bruza, 2015 in which pseudo Free Association Networks (Corpus Based Association Networks - CANs) are built using co-occurrence statistics on large text corpus. It was found that the Quantum Inspired Models of Target Activation not only outperformed the classical psychological model but was more robust across a variety of data sources.