512 resultados para GUIDE-O (Information retrieval system)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we describe a voting mechanism for accurate named entity (NE) translation in English–Chinese question answering (QA). This mechanism involves translations from three different sources: machine translation,online encyclopaedia, and web documents. The translation with the highest number of votes is selected. We evaluated this approach using test collection, topics and assessment results from the NTCIR-8 evaluation forum. This mechanism achieved 95% accuracy in NEs translation and 0.3756 MAP in English–Chinese cross-lingual information retrieval of QA.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the emergence of Web 2.0, Web users can classify Web items of their interest by using tags. Tags reflect users’ understanding to the items collected in each tag. Exploring user tagging behavior provides a promising way to understand users’ information needs. However, free and relatively uncontrolled vocabulary has its drawback in terms of lack of standardization and semantic ambiguity. Moreover, the relationships among tags have not been explored even there exist rich relationships among tags which could provide valuable information for us to better understand users. In this paper, we propose a novel approach to construct tag ontology based on the widely used general ontology WordNet to capture the semantics and the structural relationships of tags. Ambiguity of tags is a challenging problem to deal with in order to construct high quality tag ontology. We propose strategies to find the semantic meanings of tags and a strategy to disambiguate the semantics of tags based on the opinion of WordNet lexicographers. In order to evaluate the usefulness of the constructed tag ontology, in this paper we apply the extracted tag ontology in a tag recommendation experiment. We believe this is the first application of tag ontology for recommendation making. The initial result shows that by using the tag ontology to re-rank the recommended tags, the accuracy of the tag recommendation can be improved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Information has no value unless it is accessible. Information must be connected together so a knowledge network can then be built. Such a knowledge base is a key resource for Internet users to interlink information from documents. Information retrieval, a key technology for knowledge management, guarantees access to large corpora of unstructured text. Collaborative knowledge management systems such as Wikipedia are becoming more popular than ever; however, their link creation function is not optimized for discovering possible links in the collection and the quality of automatically generated links has never been quantified. This research begins with an evaluation forum which is intended to cope with the experiments of focused link discovery in a collaborative way as well as with the investigation of the link discovery application. The research focus was on the evaluation strategy: the evaluation framework proposal, including rules, formats, pooling, validation, assessment and evaluation has proved to be efficient, reusable for further extension and efficient for conducting evaluation. The collection-split approach is used to re-construct the Wikipedia collection into a split collection comprising single passage files. This split collection is proved to be feasible for improving relevant passages discovery and is devoted to being a corpus for focused link discovery. Following these experiments, a mobile client-side prototype built on iPhone is developed to resolve the mobile Search issue by using focused link discovery technology. According to the interview survey, the proposed mobile interactive UI does improve the experience of mobile information seeking. Based on this evaluation framework, a novel cross-language link discovery proposal using multiple text collections is developed. A dynamic evaluation approach is proposed to enhance both the collaborative effort and the interacting experience between submission and evaluation. A realistic evaluation scheme has been implemented at NTCIR for cross-language link discovery tasks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In information retrieval, a user's query is often not a complete representation of their real information need. The user's information need is a cognitive construction, however the use of cognitive models to perform query expansion have had little study. In this paper, we present a cognitively motivated query expansion technique that uses semantic features for use in ad hoc retrieval. This model is evaluated against a state-of-the-art query expansion technique. The results show our approach provides significant improvements in retrieval effectiveness for the TREC data sets tested.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Two decades after its inception, Latent Semantic Analysis(LSA) has become part and parcel of every modern introduction to Information Retrieval. For any tool that matures so quickly, it is important to check its lore and limitations, or else stagnation will set in. We focus here on the three main aspects of LSA that are well accepted, and the gist of which can be summarized as follows: (1) that LSA recovers latent semantic factors underlying the document space, (2) that such can be accomplished through lossy compression of the document space by eliminating lexical noise, and (3) that the latter can best be achieved by Singular Value Decomposition. For each aspect we performed experiments analogous to those reported in the LSA literature and compared the evidence brought to bear in each case. On the negative side, we show that the above claims about LSA are much more limited than commonly believed. Even a simple example may show that LSA does not recover the optimal semantic factors as intended in the pedagogical example used in many LSA publications. Additionally, and remarkably deviating from LSA lore, LSA does not scale up well: the larger the document space, the more unlikely that LSA recovers an optimal set of semantic factors. On the positive side, we describe new algorithms to replace LSA (and more recent alternatives as pLSA, LDA, and kernel methods) by trading its l2 space for an l1 space, thereby guaranteeing an optimal set of semantic factors. These algorithms seem to salvage the spirit of LSA as we think it was initially conceived.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Intuitively, any ‘bag of words’ approach in IR should benefit from taking term dependencies into account. Unfortunately, for years the results of exploiting such dependencies have been mixed or inconclusive. To improve the situation, this paper shows how the natural language properties of the target documents can be used to transform and enrich the term dependencies to more useful statistics. This is done in three steps. The term co-occurrence statistics of queries and documents are each represented by a Markov chain. The paper proves that such a chain is ergodic, and therefore its asymptotic behavior is unique, stationary, and independent of the initial state. Next, the stationary distribution is taken to model queries and documents, rather than their initial distributions. Finally, ranking is achieved following the customary language modeling paradigm. The main contribution of this paper is to argue why the asymptotic behavior of the document model is a better representation then just the document’s initial distribution. A secondary contribution is to investigate the practical application of this representation in case the queries become increasingly verbose. In the experiments (based on Lemur’s search engine substrate) the default query model was replaced by the stable distribution of the query. Just modeling the query this way already resulted in significant improvements over a standard language model baseline. The results were on a par or better than more sophisticated algorithms that use fine-tuned parameters or extensive training. Moreover, the more verbose the query, the more effective the approach seems to become.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Failing injectors are one of the most common faults in diesel engines. The severity of these faults could have serious effects on diesel engine operations such as engine misfire, knocking, insufficient power output or even cause a complete engine breakdown. It is thus essential to prevent such faults from occurring by monitoring the condition of these injectors. In this paper, the authors present the results of an experimental investigation on identifying the signal characteristics of a simulated incipient injector fault in a diesel engine using both in-cylinder pressure and acoustic emission (AE) techniques. A time waveform event driven synchronous averaging technique was used to minimize or eliminate the effect of engine speed variation and amplitude fluctuation. It was found that AE is an effective method to detect the simulated injector fault in both time (crank angle) and frequency (order) domains. It was also shown that the time domain in-cylinder pressure signal is a poor indicator for condition monitoring and diagnosis of the simulated injector fault due to the small effect of the simulated fault on the engine combustion process. Nevertheless, good correlations between the simulated injector fault and the lower order components of the enveloped in-cylinder pressure spectrum were found at various engine loading conditions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Modelling the power systems load is a challenge since the load level and composition varies with time. An accurate load model is important because there is a substantial component of load dynamics in the frequency range relevant to system stability. The composition of loads need to be charaterised because the time constants of composite loads affect the damping contributions of the loads to power system oscillations, and their effects vary with the time of the day, depending on the mix of motors loads. This chapter has two main objectives: 1) describe the load modelling in small signal using on-line measurements; and 2) present a new approach to develop models that reflect the load response to large disturbances. Small signal load characterisation based on on-line measurements allows predicting the composition of load with improved accuracy compared with post-mortem or classical load models. Rather than a generic dynamic model for small signal modelling of the load, an explicit induction motor is used so the performance for larger disturbances can be more reliably inferred. The relation between power and frequency/voltage can be explicitly formulated and the contribution of induction motors extracted. One of the main features of this work is the induction motor component can be associated to nominal powers or equivalent motors

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Discovering proper search intents is a vi- tal process to return desired results. It is constantly a hot research topic regarding information retrieval in recent years. Existing methods are mainly limited by utilizing context-based mining, query expansion, and user profiling techniques, which are still suffering from the issue of ambiguity in search queries. In this pa- per, we introduce a novel ontology-based approach in terms of a world knowledge base in order to construct personalized ontologies for identifying adequate con- cept levels for matching user search intents. An iter- ative mining algorithm is designed for evaluating po- tential intents level by level until meeting the best re- sult. The propose-to-attempt approach is evaluated in a large volume RCV1 data set, and experimental results indicate a distinct improvement on top precision after compared with baseline models.