947 results for Document Ranking


Relevance:

80.00%

Publisher:

Abstract:

Document ranking is an important process in information retrieval (IR). It presents retrieved documents in order of their estimated degree of relevance to the query. Traditional document ranking methods are mostly based on similarity computations between documents and the query. In this paper we argue that similarity-based document ranking is insufficient in some cases, for two reasons. The first is the increased variety of information: far too many different types of documents are now available for users to search. The second is the variety of users: in many cases a user may want to retrieve documents that are not only similar to the query but also general or broad with regard to a certain topic. This is particularly the case in domains such as biomedical IR. We propose a novel approach that re-ranks the retrieved documents by combining their similarity with their generality. Document generality is quantified through an ontology-based analysis of the semantic cohesion of the text. The retrieved documents are then re-ranked by a combined score of their similarity and the closeness of each document’s generality to the query’s. Our experiments show encouraging performance on a large biomedical document collection, OHSUMED, containing 348,566 medical journal references and 101 test queries.
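The combined scoring described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' method: the linear weight `alpha`, the closeness measure `1 - |generality - query_generality|`, and all function and parameter names are assumptions, and similarity and generality are taken as precomputed values in [0, 1].

```python
def rerank(docs, query_generality, alpha=0.5):
    """Re-rank documents by a combined similarity/generality score.

    docs: list of (doc_id, similarity, generality) tuples, values in [0, 1].
    alpha: hypothetical weight on similarity (not given in the abstract).
    """
    scored = []
    for doc_id, similarity, generality in docs:
        # Closeness is 1.0 when the document is exactly as general as the query.
        closeness = 1.0 - abs(generality - query_generality)
        combined = alpha * similarity + (1.0 - alpha) * closeness
        scored.append((combined, doc_id))
    # Highest combined score first; ties broken by doc_id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]
```

With `alpha=1.0` the function reduces to a purely similarity-based ranking, which makes the effect of the generality term easy to compare.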

Relevance:

70.00%

Publisher:

Abstract:

Domain-specific information retrieval is increasingly in demand. Not only domain experts but also average non-expert users are interested in searching online resources for domain-specific (e.g., medical and health) information. However, a typical problem for average users is that the search results are a mixture of documents with different levels of readability. Non-expert users may want to see documents with higher readability at the top of the list. Consequently, the search results need to be re-ranked in descending order of readability. It is often not practical for domain experts to manually label the readability of documents in large databases, so computational models of readability need to be investigated. However, traditional readability formulas are designed for general-purpose text and are insufficient for the technical material encountered in domain-specific information retrieval. More advanced algorithms, such as the textual coherence model, are computationally expensive for re-ranking a large number of retrieved documents. In this paper, we propose an effective and computationally tractable concept-based model of text readability. In addition to the textual genre of a document, our model also takes into account domain-specific knowledge, i.e., how the domain-specific concepts contained in the document affect the document’s readability. Three major readability formulas are proposed and applied to health and medical information retrieval. Experimental results show that our proposed readability formulas lead to remarkable improvements in correlation with users’ readability ratings over four traditional readability measures.
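To make the re-ranking step concrete, here is a minimal sketch that orders documents by a traditional readability formula. The abstract does not name the four traditional measures it compares against; Flesch Reading Ease is used here only as a well-known example, and the vowel-run syllable counter is a crude heuristic. Neither is the concept-based model the paper proposes.

```python
import re

def count_syllables(word):
    # Crude heuristic: each run of consecutive vowels counts as one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores mean easier text."""
    words = re.findall(r"[A-Za-z]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

def rerank_by_readability(documents):
    # Most readable document first (descending readability).
    return sorted(documents, key=flesch_reading_ease, reverse=True)
```

A general-purpose formula like this only looks at sentence and word length, which is the limitation the abstract points out for technical material.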

Relevance:

60.00%

Publisher:

Abstract:

Learning of preference relations has recently received significant attention in the machine learning community. It is closely related to classification and regression analysis and can be reduced to these tasks. However, preference learning involves predicting an ordering of the data points rather than a single numerical value, as in regression, or a class label, as in classification. Therefore, studying preference relations within a separate framework not only facilitates a better theoretical understanding of the problem but also motivates the development of efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics, and natural language processing. For example, algorithms that learn to rank are frequently used in search engines to order the documents retrieved by a query. Preference learning methods have also been applied to collaborative filtering problems, to predict individual customer choices from the vast amount of user-generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from the well-founded and robust class of regularized least-squares methods and have many attractive computational properties. To improve the performance of our methods, we introduce several non-linear kernel functions. Thus, the contribution of this thesis is twofold: kernel functions for structured data, which take advantage of various non-vectorial data representations, and preference learning algorithms suitable for different tasks, namely efficient learning of preference relations, learning with large amounts of training data, and semi-supervised preference learning. The proposed kernel-based algorithms and kernels are applied to the parse ranking task in natural language processing, document ranking in information retrieval, and remote homology detection in bioinformatics.
Training kernel-based ranking algorithms can be infeasible when the training set is large. This problem is addressed by proposing a preference learning algorithm whose computational complexity scales linearly with the number of training data points. We also introduce a sparse approximation of the algorithm that can be trained efficiently on large amounts of data. For situations in which only a small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only the efficient training of the algorithms but also fast regularization parameter selection, multiple output prediction, and cross-validation. Furthermore, the proposed algorithms lead to notably better performance in many of the preference learning tasks considered.
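The idea of learning preferences with a regularized least-squares objective can be illustrated with a small sketch. It assumes a linear scoring function and plain gradient descent, which are simplifications chosen here for readability; the thesis itself works with kernel methods and its own efficient training algorithms, and the names `train_pairwise_rls`, `lam`, `lr` and `steps` are hypothetical.

```python
from itertools import combinations

def train_pairwise_rls(X, y, lam=0.01, lr=0.05, steps=2000):
    """Fit a linear scorer w by minimizing the mean over all pairs (i, j) of
    ((y_i - y_j) - w.(x_i - x_j))^2, plus the penalty lam * ||w||^2."""
    dim = len(X[0])
    # Precompute feature differences and target differences for every pair.
    pairs = [
        ([a - b for a, b in zip(X[i], X[j])], y[i] - y[j])
        for i, j in combinations(range(len(X)), 2)
    ]
    w = [0.0] * dim
    for _ in range(steps):
        grad = [2.0 * lam * wk for wk in w]  # gradient of the penalty term
        for diff, target in pairs:
            residual = target - sum(wk * dk for wk, dk in zip(w, diff))
            for k in range(dim):
                grad[k] -= 2.0 * residual * diff[k] / len(pairs)
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w

def score(w, x):
    # Higher score means the item is predicted to be preferred.
    return sum(wk * xk for wk, xk in zip(w, x))
```

Sorting items by `score` then yields the predicted preference order. Because this sketch enumerates every pair, its cost grows quadratically with the number of training points, which is the kind of scaling the thesis sets out to avoid.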

Relevance:

60.00%

Publisher:

Abstract:

This paper describes an English-Malayalam Cross-Lingual Information Retrieval system. The system retrieves Malayalam documents in response to a query given in English or Malayalam, so monolingual information retrieval is also supported. Malayalam is one of the most prominent regional languages of the Indian subcontinent. It is spoken by more than 37 million people and is the native language of the state of Kerala in India. Since we had neither a full-fledged online bilingual dictionary nor any parallel corpora with which to build a statistical lexicon, we used a bilingual dictionary developed in house for translation. Other language-specific resources developed in house, such as a Malayalam stemmer and a Malayalam morphological root analyzer, were also used in this work.

Relevance:

30.00%

Publisher:

Abstract:

The aim of this paper is twofold: firstly, to carry out a theoretical review of the most recent stated preference techniques used for eliciting consumers' preferences and, secondly, to compare the empirical results of two different stated preference discrete choice approaches. They differ in the measurement scale for the dependent variable and, therefore, in the estimation method, despite both using a multinomial logit. One of the approaches uses a complete ranking of full profiles (contingent ranking), that is, individuals must rank a set of alternatives from the most to the least preferred, and the other uses a first-choice rule in which individuals must select the most preferred option from a choice set (choice experiment). From the results we see how important the measurement scale for the dependent variable becomes and to what extent procedure invariance is satisfied.
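The difference between the two measurement scales can be made concrete with a short sketch. It assumes the standard rank-ordered ("exploded") logit for the contingent ranking case, in which a complete ranking is treated as a sequence of first choices among the alternatives not yet ranked; the abstract only states that both approaches use a multinomial logit, so this formulation and the utility values passed in are assumptions.

```python
import math
from itertools import permutations

def first_choice_probs(utilities):
    """Multinomial logit probability that each alternative is chosen first."""
    m = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - m) for u in utilities]
    total = sum(weights)
    return [wt / total for wt in weights]

def ranking_prob(utilities, ranking):
    """Rank-ordered logit: probability of a complete ranking (best to worst),
    computed as a product of successive first-choice probabilities."""
    remaining = list(ranking)
    prob = 1.0
    while len(remaining) > 1:
        probs = first_choice_probs([utilities[i] for i in remaining])
        prob *= probs[0]  # remaining[0] is picked from what is still left
        remaining.pop(0)
    return prob
```

A choice experiment observes only the outcome modelled by `first_choice_probs`, while a contingent ranking observes the full ordering modelled by `ranking_prob`; summing `ranking_prob` over all rankings that start with a given alternative recovers its first-choice probability.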

Relevance:

30.00%

Publisher:

Abstract:

Machine learning provides tools for automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance in a wide range of application domains. Historically, the problems of classification and regression have gained the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we can recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods based on this approach has in the past proven to be challenging. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, or how the techniques can be implemented efficiently. The contributions of this thesis are as follows.
First, we develop RankRLS, a computationally efficient kernel method for learning to rank that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, which is one of the most well-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results; Part II consists of the five original research articles that are the main contribution of this thesis.
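The link between bipartite ranking and AUC mentioned in this abstract can be shown directly: AUC equals the fraction of positive-negative pairs that a scoring function orders correctly. The sketch below is a plain illustration of that pairwise definition (counting ties as half, a common convention assumed here), not code from the thesis.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: predicted scores, higher meaning "more likely positive".
    labels: 1 for positive examples, 0 for negative examples.
    """
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0   # pair ordered correctly
            elif p == n:
                correct += 0.5   # tie counted as half (assumed convention)
    return correct / (len(positives) * len(negatives))
```

One minus this value is the empirical pairwise misranking rate, the quantity that the pairwise approach to learning to rank aims to minimize.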

Relevance:

30.00%

Publisher:

Abstract:

Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Relevance:

30.00%

Publisher:

Abstract:

This document analyzes some internationalization strategies applied to case studies of the Empresa Colombiana de Petróleos, Ecopetrol, in order to find the reasons that have made the company a benchmark for growth. To that end, it first explains the context in which the company operates and how it went from being a state-owned company to occupying 280th place in the Fortune Global 500 ranking. Second, the work focuses on the different internationalization strategies and theories on which Ecopetrol bases its success. Finally, a financial analysis is carried out based on data from the Bloomberg tool and on interviews with specialists in stock-market and internationalization topics.

Relevance:

30.00%

Publisher:

Abstract:

Ranking and educational quality. An (un)likely relationship? A study in two public schools. With this study we aim to understand possible links between a school's position in the ranking of the 9th-grade national exams of basic education and the quality of the educational service provided. The study took place in two school clusters with different positions in the rankings published in 2013. We collected information through interviews with the heads of the management bodies, document analysis and a questionnaire survey. We adopted a qualitative and quantitative methodology, and the data were triangulated and analyzed in the light of the theoretical framework. The markedly different positions of the two clusters in the ranking (249th and 848th, respectively) do not seem to be related to the provision of the educational service: besides the classification being the same in the external evaluation reports, the differences identified through the questionnaires and interviews are isolated and of little relevance, reinforcing that the ranking position says very little about the work done in schools or about their dynamics and logics of action.

Relevance:

20.00%

Publisher:

Abstract:

Universidade Estadual de Campinas. Faculdade de Educação Física

Relevance:

20.00%

Publisher:

Abstract:

Universidade Estadual de Campinas. Faculdade de Educação Física

Relevance:

20.00%

Publisher:

Abstract:

Due to both the widespread, multipurpose use of document images and the current availability of a large number of document image repositories, robust information retrieval mechanisms and systems are increasingly in demand. This paper presents an approach to support the automatic generation of relationships among document images by exploiting Latent Semantic Indexing (LSI) and Optical Character Recognition (OCR). We developed the LinkDI (Linking of Document Images) service, which extracts and indexes the content of document images, computes its latent semantics, and defines relationships among images as hyperlinks. LinkDI was tested on document image repositories, and its performance was evaluated by comparing the quality of the relationships created among textual documents with that of the relationships created among their respective document images. Using the same document images, we ran further experiments to compare the performance of LinkDI with and without the LSI technique. Experimental results showed that LSI can mitigate the effects of common OCR misrecognition, which reinforces the feasibility of using LinkDI to relate highly degraded OCR output.

Relevance:

20.00%

Publisher:

Abstract:

High-density polyethylene resins have increasingly been used in the production of pipes for pressurized water and gas distribution systems and are expected to remain in service for several years, but they can fail prematurely by creep fracture. The standard methods usually used to rank resins in terms of their resistance to fracture are expensive and impractical for quality control purposes, justifying the search for alternative methods. The essential work of fracture (EWF) method provides a relatively simple procedure to characterize the fracture behavior of ductile polymers, such as polyethylene resins. In the present work, six resins were analyzed using the EWF methodology. The results show that the plastic work dissipation factor, βw_p, is the most reliable parameter for evaluating performance. Attention must be given to specimen preparation, which might result in excessive dispersion in the results, especially for the essential work of fracture, w_e.
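The two EWF parameters named in this abstract are usually obtained from a straight-line fit. The sketch below assumes the relation commonly used in the EWF literature, w_f = w_e + βw_p · L, where w_f is the specific total work of fracture and L is the ligament length of the specimen; the abstract itself does not state this relation, and the function name and arguments are hypothetical.

```python
def fit_ewf(ligament_lengths, specific_work):
    """Ordinary least-squares fit of w_f = w_e + (beta * w_p) * L.

    Returns (w_e, beta_w_p): the intercept is the essential work of
    fracture and the slope is the plastic work dissipation factor.
    """
    n = len(ligament_lengths)
    mean_l = sum(ligament_lengths) / n
    mean_w = sum(specific_work) / n
    sxx = sum((l - mean_l) ** 2 for l in ligament_lengths)
    sxy = sum((l - mean_l) * (w - mean_w)
              for l, w in zip(ligament_lengths, specific_work))
    slope = sxy / sxx                    # beta * w_p
    intercept = mean_w - slope * mean_l  # w_e, extrapolated to L = 0
    return intercept, slope
```

Because w_e is an extrapolation to zero ligament length, scatter in the measurements affects it more strongly than the slope, which is consistent with the abstract's remark about dispersion in w_e.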

Relevance:

20.00%

Publisher:

Abstract:

Application of geographic information system (GIS) and global positioning system (GPS) technology in the Hlabisa community-based tuberculosis treatment programme documents the increase in accessibility to treatment after the expansion of the service from health facilities to include community workers and volunteers.