844 results for MEDINA, JOSE TORIBIO
Abstract:
In this paper, we consider the problem of document ranking in a non-traditional retrieval task called subtopic retrieval. This task involves promoting relevant documents that cover many subtopics of a query at early ranks, thus providing diversity within the ranking. In recent years, several approaches have been proposed to diversify retrieval results. These approaches can be classified into two main paradigms, depending upon how the ranks of documents are revised to promote diversity. In the first paradigm, subtopic diversification is achieved implicitly, by choosing documents that are different from each other, while in the second it is achieved explicitly, by estimating the subtopics covered by documents. Within this context, we compare methods belonging to the two paradigms. Furthermore, we investigate possible strategies for integrating the two paradigms with the aim of formulating a new ranking method for subtopic retrieval. We conduct a number of experiments to empirically validate and contrast the state-of-the-art approaches as well as instantiations of our integration approach. The results show that the integration approach outperforms state-of-the-art strategies with respect to a number of measures.
Abstract:
Ranking documents according to the Probability Ranking Principle has been theoretically shown to guarantee optimal retrieval effectiveness in tasks such as ad hoc document retrieval. This ranking strategy assumes independence among document relevance assessments. This assumption, however, often does not hold, for example in scenarios where redundancy in retrieved documents is of major concern, as is the case in the sub-topic retrieval task. In this chapter, we propose a new ranking strategy for sub-topic retrieval that builds upon the interdependent document relevance and topic-oriented models. With respect to the topic-oriented model, we investigate both static and dynamic clustering techniques, aiming to group topically similar documents. Evidence from clusters is then combined with information about document dependencies to form a new document ranking. We compare and contrast the proposed method against state-of-the-art approaches, such as Maximal Marginal Relevance, Portfolio Theory for Information Retrieval, and standard cluster-based diversification strategies. The empirical investigation is performed on the ImageCLEF 2009 Photo Retrieval collection, where images are assessed with respect to sub-topics of a more general query topic. The experimental results show that our approaches outperform the state-of-the-art strategies with respect to a number of diversity measures.
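As an illustration of one of the baselines named above, the following is a minimal sketch of greedy Maximal Marginal Relevance re-ranking. It is not the ranking strategy proposed in the chapter. The function name `mmr_rerank`, the trade-off parameter `lam`, and the toy relevance and similarity values are assumptions made for this example.

```python
def mmr_rerank(relevance, similarity, lam=0.5, k=None):
    """Greedy MMR: repeatedly pick the document that balances its own
    relevance against its maximum similarity to documents already picked.

    relevance:  list of relevance scores, one per document (hypothetical values)
    similarity: square matrix, similarity[i][j] between documents i and j
    lam:        1.0 ranks by relevance only; lower values favour novelty
    """
    n = len(relevance)
    k = n if k is None else min(k, n)
    selected = []
    remaining = list(range(n))
    while remaining and len(selected) < k:
        def score(d):
            # Redundancy is the similarity to the closest selected document.
            max_sim = max((similarity[d][s] for s in selected), default=0.0)
            return lam * relevance[d] - (1 - lam) * max_sim
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: documents 0 and 1 are near-duplicates, document 2 is different.
relevance = [0.9, 0.85, 0.5]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
print(mmr_rerank(relevance, similarity, lam=0.5))
```

With `lam=0.5` the less relevant but dissimilar document 2 is promoted above the near-duplicate document 1; with `lam=1.0` the order is by relevance alone.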
Abstract:
The aim of this paper is to investigate the role of emotion features in diversifying document rankings to improve the effectiveness of Information Retrieval (IR) systems. For this purpose, two approaches are proposed to consider emotion features for diversification, and they are empirically tested on the TREC 678 Interactive Track collection. The results show that emotion features are capable of enhancing retrieval effectiveness.
Abstract:
For TREC Crowdsourcing 2011 (Stage 2) we propose a network-based approach for assigning an indicative measure of worker trustworthiness in crowdsourced labelling tasks. Workers, the gold standard and worker/gold standard agreements are modelled as a network. For the purpose of worker trustworthiness assignment, a variant of the PageRank algorithm, named TurkRank, is used to adaptively combine evidence that suggests worker trustworthiness, i.e., agreement with other trustworthy co-workers and agreement with the gold standard. A single parameter controls the importance of co-worker agreement versus gold standard agreement. The TurkRank score calculated for each worker is incorporated into a worker-weighted mean label aggregation.
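The abstract does not give the TurkRank update rule, so the following is only a sketch of the general idea using a personalised-PageRank-style iteration. The function names, the agreement matrix, the normalisation, and the parameter `beta` (standing in for the single parameter mentioned above) are all assumptions, not the authors' formulation.

```python
def trust_scores(agree, gold, beta=0.5, iters=100):
    """Hypothetical PageRank-style trust propagation.

    agree[j][i]: agreement between workers j and i (symmetric, 0 on diagonal)
    gold[i]:     agreement of worker i with the gold standard
    beta:        weight of co-worker agreement versus gold agreement
    """
    n = len(gold)
    gold_total = sum(gold)
    # The gold-standard agreement acts as the "teleport" distribution.
    prior = [g / gold_total for g in gold] if gold_total else [1.0 / n] * n
    # Each worker passes trust to co-workers in proportion to agreement.
    out = [sum(agree[j][i] for i in range(n) if i != j) for j in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            flow = sum(scores[j] * agree[j][i] / out[j]
                       for j in range(n) if j != i and out[j] > 0)
            new.append(beta * flow + (1 - beta) * prior[i])
        total = sum(new)
        if total == 0:
            return prior
        scores = [s / total for s in new]
    return scores


def weighted_mean_label(labels, scores):
    """Aggregate numeric labels, weighting each worker by its trust score."""
    return sum(s * l for s, l in zip(scores, labels)) / sum(scores)


# Workers 0 and 1 agree with each other and with the gold standard;
# worker 2 agrees with almost nobody.
agree = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.1], [0.1, 0.1, 0.0]]
gold = [1.0, 0.9, 0.2]
scores = trust_scores(agree, gold, beta=0.5)
print(scores, weighted_mean_label([1, 1, 0], scores))
```

In this toy setting the dissenting worker receives the lowest score, so its label contributes least to the aggregated value.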
Abstract:
In this paper we define two models of users that require diversity in search results; these models are theoretically grounded in the notion of intrinsic and extrinsic diversity. We then examine Intent-Aware Expected Reciprocal Rank (ERR-IA), one of the official measures used to assess diversity in TREC 2011-12, with respect to the proposed user models. By analyzing ranking preferences as expressed by the user models and those estimated by ERR-IA, we investigate whether ERR-IA assesses document rankings according to the requirements of the diversity retrieval task expressed by the two models. Empirical results demonstrate that ERR-IA neglects the coverage of query intents by attributing excessive importance to redundant relevant documents. This behavior is contrary to the user models, which require measures to first assess diversity through the coverage of intents, and then assess the redundancy of relevant intents. Furthermore, diversity should be considered separately from document relevance and from the positions of documents in the ranking.
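To make the measure concrete, here is a small sketch of ERR-IA using the standard definition (per-intent Expected Reciprocal Rank, averaged by intent probability). The toy rankings and intent probabilities are assumptions chosen for illustration and are not taken from the paper's experiments.

```python
def err(grades, max_grade=1):
    """Expected Reciprocal Rank for one intent.

    grades: relevance grade of each ranked document for this intent.
    """
    p_continue = 1.0  # probability the user has not yet been satisfied
    total = 0.0
    for rank, g in enumerate(grades, start=1):
        r = (2 ** g - 1) / 2 ** max_grade  # probability of satisfaction
        total += p_continue * r / rank
        p_continue *= 1 - r
    return total


def err_ia(ranking, intent_probs, max_grade=1):
    """Intent-aware ERR: per-intent ERR weighted by intent probability.

    ranking: list of dicts mapping intent -> grade for each document.
    """
    return sum(
        p * err([doc.get(intent, 0) for doc in ranking], max_grade)
        for intent, p in intent_probs.items()
    )


# Hypothetical rankings: one repeats intent "a", the other covers both intents.
redundant = [{"a": 1}, {"a": 1}]
covering = [{"a": 1}, {"b": 1}]
skewed = {"a": 0.8, "b": 0.2}
print(err_ia(redundant, skewed), err_ia(covering, skewed))
```

With these skewed intent probabilities the redundant ranking scores about 0.50 and the covering ranking about 0.45, so the measure prefers redundancy over coverage in this toy case; with equal probabilities the preference reverses.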
Abstract:
In the field of information retrieval (IR), researchers and practitioners are often faced with a demand for valid approaches to evaluate the performance of retrieval systems. The Cranfield experiment paradigm has been dominant for the in-vitro evaluation of IR systems. As an alternative to this paradigm, laboratory-based user studies have been widely used to evaluate interactive information retrieval (IIR) systems and, at the same time, to investigate users' information searching behaviours. Major drawbacks of laboratory-based user studies for evaluating IIR systems include the high monetary and temporal costs involved in setting up and running those experiments, the lack of heterogeneity amongst the user population and the limited scale of the experiments, which usually involve a relatively restricted set of users. In this article, we propose an alternative experimental methodology to laboratory-based user studies. Our novel experimental methodology uses a crowdsourcing platform as a means of engaging study participants. Through crowdsourcing, our experimental methodology can capture user interactions and searching behaviours at a lower cost, with more data, and within a shorter period than traditional laboratory-based user studies, and can therefore be used to assess the performance of IIR systems. We show the characteristic differences of our approach with respect to traditional IIR experimental and evaluation procedures. We also perform a use case study comparing crowdsourcing-based evaluation with laboratory-based evaluation of IIR systems, which can serve as a tutorial for setting up crowdsourcing-based IIR evaluations.
Abstract:
A pseudonym provides anonymity by protecting the identity of a legitimate user. A user with a pseudonym can interact with an unknown entity and be confident that his/her identity is secret even if the other entity is dishonest. In this work, we present a system that allows users to create pseudonyms from a trusted master public-secret key pair. The proposed system is based on the intractability of factoring and of finding square roots of a quadratic residue modulo a composite number, where the composite number is a product of two large primes. Our proposal differs from previously published pseudonym systems in that, in addition to the standard notion of protecting the privacy of a user, our system offers colligation between seemingly independent pseudonyms. This new property, when combined with a trusted platform that stores a master secret key, is extremely beneficial to a user, as it offers a convenient way to generate a large number of pseudonyms using relatively small storage.
Abstract:
Implementation of an electronic tendering (e-tendering) system requires careful attention to the needs of the system and its various participants. Fairness in e-tendering is of utmost importance. Current proposals and implementations do not provide fairness and are thus vulnerable to collusion and favouritism. Dishonest participants, whether the principal or a tenderer, may collude to alter or view competing tenders, which would give the favoured tenderer a greater chance of winning the contract. This paper proposes an e-tendering system that is secure and fair to all participants. We employ the techniques of an anonymous token system along with a signed commitment approach to achieve a publicly verifiable fair e-tendering protocol. We also provide an analysis of the protocol that confirms the security of our proposal against security goals for an e-tendering system.
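The abstract does not describe the protocol itself. As background only, the sketch below shows a generic hash-based commitment, the kind of building block a commitment approach typically relies on: a tenderer can publish a digest of a tender before the deadline and reveal the tender later, without being able to change it. The function names are hypothetical, and the signing and anonymous-token steps mentioned above are deliberately omitted.

```python
import hashlib
import hmac
import secrets


def commit(tender: bytes):
    """Return (commitment, nonce). The random nonce hides the tender
    content until it is revealed (a generic construction, not the
    paper's protocol)."""
    nonce = secrets.token_bytes(32)
    commitment = hashlib.sha256(nonce + tender).hexdigest()
    return commitment, nonce


def verify(commitment: str, nonce: bytes, tender: bytes) -> bool:
    """Check that a revealed tender matches the earlier commitment."""
    expected = hashlib.sha256(nonce + tender).hexdigest()
    return hmac.compare_digest(commitment, expected)


commitment, nonce = commit(b"bid: 100")
print(verify(commitment, nonce, b"bid: 100"))  # the original tender verifies
print(verify(commitment, nonce, b"bid: 90"))   # an altered tender does not
```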
Abstract:
High quality, micron-sized interpenetrating grains of MgB2 with high density are produced at low temperatures (~420 °C < T < ~500 °C) under autogenous pressure by pre-mixing Mg powder and NaBH4 and heating in an Inconel 601 alloy reactor for 5-15 hours. Optimum production of MgB2 with yields greater than 75% occurs for autogenous pressure in the range 1.0 MPa to 2.0 MPa with the reactor at ~500 °C. Autogenous pressure is induced by the decomposition of NaBH4 in the presence of Mg and/or other Mg-based compounds. The morphology, transition temperature and magnetic properties of MgB2 are dependent on the heating regime. Significant improvement in physical properties accrues when the reactor temperature is held at 250 °C for >20 minutes prior to a hold at 500 °C.
Abstract:
Entomological surveillance and control are essential to the management of dengue fever (DF). Hence, understanding the spatial and temporal patterns of DF vectors, Aedes (Stegomyia) aegypti (L.) and Ae. (Stegomyia) albopictus (Skuse), is paramount. In the Philippines, resources are limited and entomological surveillance and control are generally commenced during epidemics, when transmission is difficult to control. Recent improvements in spatial epidemiological tools and methods offer opportunities to explore more efficient DF surveillance and control solutions; however, there are few examples in the literature from resource-poor settings. The objectives of this study were to: (i) explore spatial patterns of Aedes populations and (ii) predict areas of high and low vector density to inform DF control in San Jose village, Muntinlupa city, Philippines. Adult female Aedes mosquitoes were collected fortnightly from 50 double-sticky ovitraps (SOs) located in San Jose village for the period June-November 2011. Spatial clustering analysis was performed to identify high and low density clusters of Ae. aegypti and Ae. albopictus mosquitoes. Spatial autocorrelation was assessed by examination of semivariograms, and ordinary kriging was undertaken to create a smoothed surface of predicted vector density in the study area. Our results show that both Ae. aegypti and Ae. albopictus were present in San Jose village during the study period. However, one Aedes species was dominant in a given geographic area at a time, suggesting differing habitat preferences and interspecies competition between vectors. Density maps provide information to direct entomological control activities and support the development of geographically enhanced surveillance and control systems to improve DF management in the Philippines.
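For readers unfamiliar with semivariograms, the following minimal sketch computes an empirical semivariogram, gamma(h) = sum of squared differences over pairs at lag h, divided by 2N(h). The trap coordinates and counts are invented for illustration; the function name and lag binning are assumptions and do not reproduce the study's analysis.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, values, lag_width):
    """Return {lag_bin: gamma}, where lag_bin = floor(distance / lag_width)
    and gamma is half the mean squared difference of values in that bin."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            lag = int(math.dist(points[i], points[j]) // lag_width)
            sums[lag] += (values[i] - values[j]) ** 2
            counts[lag] += 1
    return {lag: sums[lag] / (2 * counts[lag]) for lag in sorted(sums)}


# Hypothetical trap positions on a line and mosquito counts per trap.
traps = [(0, 0), (1, 0), (2, 0), (3, 0)]
counts_per_trap = [1, 2, 4, 7]
print(empirical_semivariogram(traps, counts_per_trap, lag_width=1.0))
```

A semivariance that rises with distance, as in this toy data, indicates that nearby traps have more similar counts than distant ones, which is the spatial autocorrelation that kriging relies on.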
Abstract:
In the TREC Web Diversity track, novelty-biased cumulative gain (α-NDCG) is one of the official measures used to assess the retrieval performance of IR systems. The measure is characterised by a parameter, α, the effect of which has not been thoroughly investigated. We find that common settings of α, i.e. α=0.5, may prevent the measure from behaving as desired when evaluating result diversification. This is because it excessively penalises systems that cover many intents while it rewards those that redundantly cover only a few intents. This issue is crucial since it strongly influences systems at top ranks. We revisit our previously proposed threshold, suggesting that α be set on a per-query basis. The intuitiveness of the measure is then studied by examining actual rankings from TREC 09-10 Web track submissions. Varying α according to our query-based threshold does not harm the discriminative power of α-NDCG; in fact, our approach improves the robustness of α-NDCG. Experimental results show that the threshold for α can make the measure more intuitive than its common settings do.
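The role of α can be seen in a small sketch of α-NDCG with binary intent judgments, following the usual definition (novelty-discounted gain, log2 rank discount, greedy approximation of the ideal ranking). The function names and toy documents are assumptions; the per-query threshold proposed in the paper is not reproduced here.

```python
import math


def novelty_gain(intents, seen, alpha):
    """Gain of a document: each covered intent is discounted by
    (1 - alpha) for every earlier document that already covered it."""
    return sum((1 - alpha) ** seen.get(i, 0) for i in intents)


def alpha_dcg(ranking, alpha=0.5, depth=None):
    """ranking: list of sets, each holding the intents a document covers."""
    seen, total = {}, 0.0
    for rank, intents in enumerate(ranking[:depth], start=1):
        total += novelty_gain(intents, seen, alpha) / math.log2(1 + rank)
        for i in intents:
            seen[i] = seen.get(i, 0) + 1
    return total


def greedy_ideal(pool, alpha, depth):
    """Greedy approximation of the ideal ranking over the judged pool."""
    remaining, seen, ideal = list(pool), {}, []
    while remaining and len(ideal) < depth:
        best = max(remaining, key=lambda d: novelty_gain(d, seen, alpha))
        remaining.remove(best)
        ideal.append(best)
        for i in best:
            seen[i] = seen.get(i, 0) + 1
    return ideal


def alpha_ndcg(ranking, pool, alpha=0.5, depth=10):
    ideal = alpha_dcg(greedy_ideal(pool, alpha, depth), alpha, depth)
    return alpha_dcg(ranking, alpha, depth) / ideal if ideal else 0.0


pool = [{"a"}, {"a"}, {"b"}]
diverse = [{"a"}, {"b"}, {"a"}]
redundant = [{"a"}, {"a"}, {"b"}]
print(alpha_ndcg(diverse, pool, 0.5, 3), alpha_ndcg(redundant, pool, 0.5, 3))
```

With α = 0.5 the diverse ordering reaches the ideal score while the redundant one falls below it; with α = 0 redundancy is not penalised at all and both orderings score the same.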
Abstract:
Novelty-biased cumulative gain (α-NDCG) has become the de facto measure within the information retrieval (IR) community for evaluating retrieval systems in the context of sub-topic retrieval. Setting an incorrect value for the parameter α in α-NDCG prevents the measure from behaving as desired in particular circumstances. In fact, when α is set according to common practice (i.e. α = 0.5), the measure favours systems that promote redundant relevant sub-topics rather than provide novel relevant ones. Recognising this characteristic of the measure is important because it affects the comparison and the ranking of retrieval systems. We propose an approach to overcome this problem by defining a safe threshold for the value of α on a per-query basis. Moreover, we study its impact on system rankings through a comprehensive simulation.
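The effect described above can be reproduced with a few lines of arithmetic on the per-document gain used by α-NDCG. The sub-topic labels and the alternative value α = 0.8 are assumptions chosen for illustration; 0.8 is not the threshold derived in the paper.

```python
def novelty_gain(covered, seen_counts, alpha):
    """alpha-NDCG gain of one document: each covered sub-topic counts
    (1 - alpha) ** (number of earlier documents that covered it)."""
    return sum((1 - alpha) ** seen_counts.get(s, 0) for s in covered)


# Hypothetical state: sub-topics s1..s3 have each been covered once already.
seen = {"s1": 1, "s2": 1, "s3": 1}
redundant_doc = {"s1", "s2", "s3"}  # repeats three known sub-topics
novel_doc = {"s4"}                  # introduces one new sub-topic

for alpha in (0.5, 0.8):
    print(alpha,
          novelty_gain(redundant_doc, seen, alpha),
          novelty_gain(novel_doc, seen, alpha))
```

At α = 0.5 the redundant document earns a gain of 1.5 against 1.0 for the novel one, so it would be ranked first; at the larger α the novel document earns more.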