39 results for Relevance ranking
Abstract:
Ranking problems have become increasingly important in machine learning and data mining in recent years, with applications ranging from information retrieval and recommender systems to computational biology and drug discovery. In this paper, we describe a new ranking algorithm that directly maximizes the number of relevant objects retrieved at the absolute top of the list. The algorithm is a support vector style algorithm, but due to the different objective, it no longer leads to a quadratic programming problem. Instead, the dual optimization problem involves ℓ1,∞ constraints; we solve this dual problem using the recent ℓ1,∞ projection method of Quattoni et al. (2009). Our algorithm can be viewed as an ℓ∞-norm extreme of the ℓp-norm-based algorithm of Rudin (2009) (albeit in a support vector setting rather than a boosting setting); thus we refer to the algorithm as the ‘Infinite Push’. Experiments on real-world data sets confirm the algorithm’s focus on accuracy at the absolute top of the list.
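The quantity the Infinite Push objective targets can be made concrete with a small evaluation metric: the fraction of relevant (positive) items scored above every irrelevant (negative) item. The function name and toy scores below are illustrative, assuming real-valued scores from some learned ranker; this is a sketch of the evaluation criterion, not the paper's optimization code.

```python
def pos_at_abs_top(scores_pos, scores_neg):
    """Fraction of positives ranked above the highest-scoring negative,
    i.e. accuracy at the absolute top of the list (hypothetical helper,
    not from the paper)."""
    top_neg = max(scores_neg)
    return sum(s > top_neg for s in scores_pos) / len(scores_pos)

# toy scores: 3 of 4 positives beat every negative
print(pos_at_abs_top([0.9, 0.8, 0.7, 0.2], [0.5, 0.3]))  # 0.75
```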
Abstract:
The rapid growth in the field of data mining has led to the development of various methods for outlier detection. Though the detection of outliers has been well explored in the context of numerical data, dealing with categorical data is still evolving. In this paper, we propose a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers. In the first phase, the algorithm computes a clustering of the given data; the subsequent ranking phase then determines the set of most likely outliers. The proposed algorithm is expected to perform better as it can identify different types of outliers, employing two independent ranking schemes based on the attribute value frequencies and the inherent clustering structure of the given data. Unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of the algorithm is demonstrated through experiments on various public-domain categorical data sets.
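One of the two ranking ideas the abstract names, scoring by attribute value frequencies, can be sketched in a few lines: records whose attribute values are rare across the data set get low scores and are the likelier outliers. This is a generic frequency-based scheme under that assumption, not the paper's actual algorithm, which additionally uses the clustering structure.

```python
from collections import Counter

def avf_scores(rows):
    """Score each categorical record by the average frequency of its
    attribute values; low scores flag likely outliers. Generic sketch
    of a frequency-based ranking scheme, not the paper's code."""
    n_cols = len(rows[0])
    freq = [Counter(r[j] for r in rows) for j in range(n_cols)]
    return [sum(freq[j][r[j]] for j in range(n_cols)) / n_cols for r in rows]

data = [("a", "x"), ("a", "x"), ("a", "x"), ("b", "y")]
print(avf_scores(data))  # [3.0, 3.0, 3.0, 1.0] -- last record is the likeliest outlier
```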
Abstract:
Abzymes are immunoglobulins endowed with enzymatic activities. The catalytic activity of an abzyme resides in the variable domain of the antibody, which is constituted by the close spatial arrangement of amino acid residues involved in catalysis. The origin of abzymes is conferred by the innate diversity of the immunoglobulin gene repertoire. Under deregulated immune conditions, as in autoimmune diseases, the generation of abzymes to self-antigens could be deleterious. Technical advancement in the ability to generate monoclonal antibodies has been exploited in the generation of abzymes with defined specificities and activities. Therapeutic applications of abzymes are being investigated with the generation of monoclonal abzymes against several pathogenesis-associated antigens. Here, we review the different contexts in which abzymes are generated, and we discuss the relevance of monoclonal abzymes for the treatment of human diseases.
Abstract:
Estimating program worst-case execution time (WCET) accurately and efficiently is a challenging task. Several programs exhibit phase behavior, wherein cycles per instruction (CPI) varies in phases during execution. Recent work has suggested the use of phases in such programs to estimate WCET with minimal instrumentation. However, the suggested model uses a function of mean CPI that has no probabilistic guarantees. We propose to use Chebyshev's inequality, which applies to any arbitrary distribution of CPI samples, to probabilistically bound the CPI of a phase. Applying Chebyshev's inequality to phases that exhibit high CPI variation leads to pessimistic upper bounds. We propose a mechanism that refines such phases into sub-phases based on program counter (PC) signatures collected using profiling, and also allows the user to control the variance of CPI within a sub-phase. We describe a WCET analyzer built on these lines and evaluate it with standard WCET and embedded benchmark suites on two different architectures for three chosen probabilities, p = {0.9, 0.95, 0.99}. For p = 0.99, refinement based on PC signatures alone reduces the average pessimism of the WCET estimate by 36% (77%) on Arch1 (Arch2). Compared to Chronos, an open-source static WCET analyzer, the average improvement in estimates obtained by refinement is 5% (125%) on Arch1 (Arch2). On limiting the variance of CPI within a sub-phase to {50%, 10%, 5%, 1%} of its original value, the average accuracy of the WCET estimate improves further to {9%, 11%, 12%, 13%}, respectively, on Arch1. On Arch2, average accuracy improves to 159% when CPI variance is limited to 50% of its original value, and improvement is marginal beyond that point.
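The core probabilistic step is distribution-free: Chebyshev's inequality gives P(|X − μ| ≥ kσ) ≤ 1/k², so choosing k = 1/√(1 − p) bounds the CPI of a phase by μ + kσ with probability at least p. The sketch below shows only this bounding step under that reading of the abstract; the paper's analyzer additionally refines phases via PC signatures, and the function name here is hypothetical.

```python
import math
import statistics

def chebyshev_cpi_bound(cpi_samples, p):
    """Distribution-free upper bound on phase CPI holding with
    probability at least p, from Chebyshev's inequality with
    k = 1/sqrt(1 - p). Illustrative sketch, not the paper's analyzer."""
    mu = statistics.mean(cpi_samples)
    sigma = statistics.pstdev(cpi_samples)
    k = 1.0 / math.sqrt(1.0 - p)
    return mu + k * sigma

samples = [1.2, 1.3, 1.1, 1.4, 1.2, 1.3]
print(chebyshev_cpi_bound(samples, 0.99))  # well above the sample mean of 1.25
```

Note how high CPI variance inflates σ and hence the bound, which is exactly the pessimism that motivates splitting a phase into lower-variance sub-phases.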
Abstract:
Eleven GCMs (BCCR-BCCM2.0, INGV-ECHAM4, GFDL2.0, GFDL2.1, GISS, IPSL-CM4, MIROC3, MRI-CGCM2, NCAR-PCMI, UKMO-HADCM3 and UKMO-HADGEM1) were evaluated for India (covering 73 grid points of 2.5° × 2.5°) for the climate variable ‘precipitation rate’ using 5 performance indicators. The performance indicators used were the correlation coefficient, normalised root mean square error, absolute normalised mean bias error, average absolute relative error and skill score. We used a nested bias correction methodology to remove the systematic biases in GCM simulations. The Entropy method was employed to obtain weights for these 5 indicators. Ranks of the 11 GCMs were obtained through a multicriterion decision-making outranking method, PROMETHEE-2 (Preference Ranking Organisation Method for Enrichment Evaluation). An equal-weight scenario (assigning 0.2 weight to each indicator) was also used to rank the GCMs. An effort was also made to rank the GCMs for 4 river basins (Godavari, Krishna, Mahanadi and Cauvery) in peninsular India. The upper Malaprabha catchment in Karnataka, India, was chosen to demonstrate the Entropy and PROMETHEE-2 methods. The Spearman rank correlation coefficient was employed to assess the association between the ranking patterns. Our results suggest that the ensemble of GFDL2.0, MIROC3, BCCR-BCCM2.0, UKMO-HADCM3, MPIECHAM4 and UKMO-HADGEM1 is suitable for India. The methodology proposed can be extended to rank GCMs for any selected region.
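The Entropy method the abstract names derives objective indicator weights from the data itself: indicators whose values vary more across the alternatives (lower entropy) carry more discriminating information and get larger weights. The sketch below shows the standard formulation under that assumption, with a made-up decision matrix; it is not the paper's implementation.

```python
import math

def entropy_weights(matrix):
    """Entropy-method weights for a (alternatives x indicators) matrix
    of non-negative performance values. Standard formulation; the
    matrix below is illustrative, not the paper's data."""
    m = len(matrix)                 # number of alternatives
    n = len(matrix[0])              # number of indicators
    weights = []
    for j in range(n):
        col = [row[j] for row in matrix]
        total = sum(col)
        p = [v / total for v in col]
        # normalized Shannon entropy of the column (0*log 0 treated as 0)
        e = -sum(v * math.log(v) for v in p if v > 0) / math.log(m)
        weights.append(1.0 - e)     # degree of diversification
    s = sum(weights)
    return [w / s for w in weights]

# two indicators over three alternatives; the second varies more
print(entropy_weights([[0.9, 0.2], [0.8, 0.9], [0.7, 0.1]]))
```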
Abstract:
We consider two variants of the classical gossip algorithm. The first variant is a version of asynchronous stochastic approximation. We highlight a fundamental difficulty associated with the classical asynchronous gossip scheme, viz., that it may not converge to a desired average, and suggest an alternative scheme based on reinforcement learning that has guaranteed convergence to the desired average. We then discuss a potential application to a wireless network setting with simultaneous link activation constraints. The second variant is a gossip algorithm for distributed computation of the Perron-Frobenius eigenvector of a nonnegative matrix. While the first variant draws upon a reinforcement learning algorithm for an average cost controlled Markov decision problem, the second variant draws upon a reinforcement learning algorithm for risk-sensitive control. We then discuss potential applications of the second variant to ranking schemes, reputation networks, and principal component analysis.
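For context on the baseline the abstract departs from, the classical gossip scheme is easy to sketch: repeatedly pick a random pair of nodes and replace both values with their average. Because each symmetric update preserves the sum, the values converge to the global mean; the difficulty the abstract highlights arises in asynchronous one-sided variants, which this toy does not model.

```python
import random

def gossip_average(values, rounds=2000, seed=0):
    """Classical symmetric pairwise gossip: each step averages two
    randomly chosen nodes in place. Toy sketch of the baseline scheme,
    not the paper's reinforcement-learning variant."""
    rng = random.Random(seed)
    x = list(values)
    n = len(x)
    for _ in range(rounds):
        i, j = rng.sample(range(n), 2)
        x[i] = x[j] = (x[i] + x[j]) / 2.0  # sum-preserving update
    return x

print(gossip_average([0.0, 4.0, 8.0, 12.0]))  # every entry near the true mean, 6.0
```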
Abstract:
The problem of bipartite ranking, where instances are labeled positive or negative and the goal is to learn a scoring function that minimizes the probability of mis-ranking a pair of positive and negative instances (or equivalently, that maximizes the area under the ROC curve), has been widely studied in recent years. A dominant theoretical and algorithmic framework for the problem has been to reduce bipartite ranking to pairwise classification; in particular, it is well known that the bipartite ranking regret can be formulated as a pairwise classification regret, which in turn can be upper bounded using usual regret bounds for classification problems. Recently, Kotlowski et al. (2011) showed regret bounds for bipartite ranking in terms of the regret associated with balanced versions of the standard (non-pairwise) logistic and exponential losses. In this paper, we show that such (non-pairwise) surrogate regret bounds for bipartite ranking can be obtained in terms of a broad class of proper (composite) losses that we term strongly proper. Our proof technique is much simpler than that of Kotlowski et al. (2011), and relies on properties of proper (composite) losses as elucidated recently by Reid and Williamson (2010, 2011) and others. Our result yields explicit surrogate bounds (with no hidden balancing terms) in terms of a variety of strongly proper losses, including for example logistic, exponential, squared and squared hinge losses as special cases. An important consequence is that standard algorithms minimizing a (non-pairwise) strongly proper loss, such as logistic regression and boosting algorithms (assuming a universal function class and appropriate regularization), are in fact consistent for bipartite ranking; moreover, our results allow us to quantify the bipartite ranking regret in terms of the corresponding surrogate regret. We also obtain tighter surrogate bounds under certain low-noise conditions via a recent result of Clemencon and Robbiano (2011).
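The quantity at stake, the probability of mis-ranking a positive/negative pair, is one minus the empirical AUC, which can be computed directly from any model's scores. The sketch below illustrates that empirical quantity only; the consistency results above concern the population risk, and the function name is illustrative.

```python
def auc(scores_pos, scores_neg):
    """Empirical AUC: probability a random positive outscores a random
    negative, counting ties as half. One minus this is the empirical
    bipartite ranking risk. Illustrative helper, not from the paper."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# scores from any model trained with a strongly proper loss
# (e.g. logistic regression) can be plugged in directly
print(auc([0.9, 0.6, 0.4], [0.5, 0.2]))  # 5/6, i.e. one pair mis-ranked
```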
Abstract:
Eleven general circulation models/global climate models (GCMs) - BCCR-BCCM2.0, INGV-ECHAM4, GFDL2.0, GFDL2.1, GISS, IPSL-CM4, MIROC3, MRI-CGCM2, NCAR-PCMI, UKMO-HADCM3 and UKMO-HADGEM1 - are evaluated for Indian climate conditions using the performance indicator, skill score (SS). Two climate variables, temperature T (at three levels, i.e. 500, 700 and 850 mb) and precipitation rate (Pr), are considered, resulting in four SS-based evaluation criteria (T500, T700, T850, Pr). The multicriterion decision-making method, Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS), is applied to rank the 11 GCMs. Efforts are made to rank the GCMs for the Upper Malaprabha catchment and two river basins, namely, Krishna and Mahanadi (covered by 17 and 15 grids of size 2.5° × 2.5°, respectively). Similar efforts are also made for India (covered by 73 grid points of size 2.5° × 2.5°), for which an ensemble of GFDL2.0, INGV-ECHAM4, UKMO-HADCM3, MIROC3, BCCR-BCCM2.0 and GFDL2.1 is found to be suitable. It is concluded that the proposed methodology can be applied to similar situations with ease.
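The ranking method named in the abstract, TOPSIS, scores each alternative by its closeness to an ideal solution relative to the anti-ideal. The sketch below is the generic textbook formulation, assuming all criteria are benefit criteria (larger skill score is better) and using made-up numbers; it is not the paper's code or data.

```python
import math

def topsis(matrix, weights):
    """Closeness coefficients for a (alternatives x criteria) matrix,
    all criteria benefit-type. Higher coefficient = better rank.
    Generic TOPSIS sketch, not the paper's implementation."""
    n = len(matrix[0])
    # vector-normalize each criterion column, then apply its weight
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    V = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    best = [max(col) for col in zip(*V)]    # ideal solution
    worst = [min(col) for col in zip(*V)]   # anti-ideal solution
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return [dist(v, worst) / (dist(v, worst) + dist(v, best)) for v in V]

# four toy "GCMs" scored on (T500, T700, T850, Pr) skill scores
scores = topsis([[0.8, 0.7, 0.6, 0.5],
                 [0.6, 0.6, 0.7, 0.8],
                 [0.4, 0.5, 0.4, 0.3],
                 [0.9, 0.8, 0.8, 0.7]],
                [0.25, 0.25, 0.25, 0.25])
print(scores)  # the fourth alternative ranks first
```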
Abstract:
Halogenated nucleosides can be incorporated into the newly synthesized DNA of replicating cells and therefore are commonly used in the detection of proliferating cells in living tissues. Dehalogenation of these modified nucleosides is one of the key pathways involved in DNA repair mediated by the uracil-DNA glycosylase. Herein, we report the first example of a selenium-mediated dehalogenation of halogenated nucleosides. We also show that the mechanism for the debromination is remarkably different from that of deiodination and that the presence of a ribose or deoxyribose moiety in the nucleosides facilitates the deiodination. The results described herein should help in understanding the metabolism of halogenated nucleosides in DNA and RNA.