5 results for ROC
at Indian Institute of Science - Bangalore - India
Abstract:
Learning to rank from relevance judgment is an active research area. Itemwise score regression, pairwise preference satisfaction, and listwise structured learning are the major techniques in use. Listwise structured learning has been applied recently to optimize important non-decomposable ranking criteria like AUC (area under the ROC curve) and MAP (mean average precision). We propose new, almost-linear-time algorithms to optimize for two other criteria widely used to evaluate search systems: MRR (mean reciprocal rank) and NDCG (normalized discounted cumulative gain) in the max-margin structured learning framework. We also demonstrate that, for different ranking criteria, one may need to use different feature maps. Search applications should not be optimized in favor of a single criterion, because they need to cater to a variety of queries. For example, MRR is best for navigational queries, while NDCG is best for informational queries. A key contribution of this paper is to fold multiple ranking loss functions into a multi-criteria max-margin optimization. The result is a single, robust ranking model that is close to the best accuracy of learners trained on individual criteria. In fact, experiments over the popular LETOR and TREC data sets show that, contrary to conventional wisdom, a test criterion is often not best served by training with the same individual criterion.
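For readers unfamiliar with the two criteria named in this abstract, here is a minimal sketch of how MRR and NDCG are commonly computed for ranked lists of relevance labels. It is not the paper's optimization algorithm. The function names and the exponential gain 2^rel - 1 are assumptions chosen for illustration; other gain and discount conventions exist.

```python
import math

def reciprocal_rank(relevances):
    """Return 1/rank of the first relevant item (label > 0), or 0.0 if none."""
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(ranked_lists):
    """Average the reciprocal rank over one ranked list per query."""
    return sum(reciprocal_rank(r) for r in ranked_lists) / len(ranked_lists)

def dcg(relevances, k=None):
    """Discounted cumulative gain with gain 2^rel - 1 (an assumed convention)."""
    rels = relevances[:k] if k is not None else relevances
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(rels, start=1))

def ndcg(relevances, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0
```

MRR depends only on the position of the first relevant result, while NDCG rewards placing every highly relevant result near the top. This is consistent with the abstract's pairing of MRR with navigational queries and NDCG with informational ones.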
Abstract:
Land cover (LC) and land use (LU) dynamics induced by human and natural processes play a major role in global as well as regional patterns of landscapes influencing biodiversity, hydrology, ecology and climate. Changes in LC features resulting in forest fragmentation have posed direct threats to biodiversity, endangering the sustainability of ecological goods and services. Habitat fragmentation is of added concern as the residual spatial patterns mitigate or exacerbate edge effects. LU dynamics are obtained by classifying temporal remotely sensed satellite imagery of different spatial and spectral resolutions. This paper reviews five different image classification algorithms using spatio-temporal data of a temperate watershed in Himachal Pradesh, India. The Gaussian Maximum Likelihood classifier was found to be apt for analysing spatial pattern at regional scale based on accuracy assessment through error matrix and ROC (receiver operating characteristic) curves. The LU information thus derived was then used to assess spatial changes from temporal data using principal component analysis and correspondence analysis based image differencing. The forest area dynamics were further studied by analysing the different types of fragmentation through forest fragmentation models. The computed forest fragmentation and landscape metrics show a decline of interior intact forests with a substantial increase in patch forest during 1972-2007.
Abstract:
The performance of postdetection integration (PDI) techniques for the detection of Global Navigation Satellite Systems (GNSS) signals in the presence of uncertainties in frequency offsets, noise variance, and unknown data-bits is studied. It is shown that the conventional PDI techniques are generally not robust to uncertainty in the data-bits and/or the noise variance. Two new modified PDI techniques are proposed, and they are shown to be robust to these uncertainties. The receiver operating characteristics (ROC) and sample complexity performance of the PDI techniques in the presence of model uncertainties are analytically derived. It is shown that the proposed methods significantly outperform existing methods, and hence they could become increasingly important as the GNSS receivers attempt to push the envelope on the minimum signal-to-noise ratio (SNR) for reliable detection.
Abstract:
The problem of bipartite ranking, where instances are labeled positive or negative and the goal is to learn a scoring function that minimizes the probability of mis-ranking a pair of positive and negative instances (or equivalently, that maximizes the area under the ROC curve), has been widely studied in recent years. A dominant theoretical and algorithmic framework for the problem has been to reduce bipartite ranking to pairwise classification; in particular, it is well known that the bipartite ranking regret can be formulated as a pairwise classification regret, which in turn can be upper bounded using usual regret bounds for classification problems. Recently, Kotlowski et al. (2011) showed regret bounds for bipartite ranking in terms of the regret associated with balanced versions of the standard (non-pairwise) logistic and exponential losses. In this paper, we show that such (non-pairwise) surrogate regret bounds for bipartite ranking can be obtained in terms of a broad class of proper (composite) losses that we term strongly proper. Our proof technique is much simpler than that of Kotlowski et al. (2011), and relies on properties of proper (composite) losses as elucidated recently by Reid and Williamson (2010, 2011) and others. Our result yields explicit surrogate bounds (with no hidden balancing terms) in terms of a variety of strongly proper losses, including for example logistic, exponential, squared and squared hinge losses as special cases. An important consequence is that standard algorithms minimizing a (non-pairwise) strongly proper loss, such as logistic regression and boosting algorithms (assuming a universal function class and appropriate regularization), are in fact consistent for bipartite ranking; moreover, our results allow us to quantify the bipartite ranking regret in terms of the corresponding surrogate regret. We also obtain tighter surrogate bounds under certain low-noise conditions via a recent result of Clemencon and Robbiano (2011).
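The equivalence stated at the start of this abstract can be illustrated with a small sketch: the empirical AUC is the fraction of positive-negative pairs that the scoring function orders correctly, so the empirical mis-ranking rate is one minus that value. The function name is hypothetical, and counting a tie as half a correct pair is an assumed (though common) convention.

```python
def pairwise_auc(scores, labels):
    """Empirical AUC as the fraction of correctly ordered (positive, negative) pairs.

    Labels are 1 for positive and 0 for negative; ties count as half a pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))

def misranking_rate(scores, labels):
    """Fraction of (positive, negative) pairs ranked in the wrong order."""
    return 1.0 - pairwise_auc(scores, labels)
```

This quadratic-time loop is meant only to make the definition concrete; sorting-based computations give the same value faster.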
Abstract:
The disclosure of information and its misuse in Privacy Preserving Data Mining (PPDM) systems is a concern to the parties involved. In PPDM systems, data is distributed amongst multiple parties collaborating to achieve cumulative mining accuracy. The vertically partitioned data held by each party cannot, on its own, provide mining results as accurate as the collaborative mining results. To overcome the privacy issue in data disclosure, this paper describes a Key Distribution-Less Privacy Preserving Data Mining (KDLPPDM) system in which the local association rules generated by the parties are published. The association rules are securely combined to form the combined rule set using the Commutative RSA algorithm. The combined rule sets are then used to classify or mine the data. The results discussed in this paper compare the accuracy of the rules generated using the C4.5 based KDLPPDM system and the C5.0 based KDLPPDM system using receiver operating characteristic (ROC) curves.
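The commutative property that this abstract relies on can be shown with a toy sketch. When two parties use RSA-style exponents over a shared modulus, encrypting with one key and then the other gives the same ciphertext in either order. This is not the KDLPPDM protocol itself; the tiny primes, exponents, and helper names are assumptions for illustration and are far too small to be secure.

```python
from math import gcd

# Toy parameters for illustration only; real RSA needs large random primes.
P, Q = 61, 53
N = P * Q                     # shared modulus
PHI = (P - 1) * (Q - 1)       # Euler's totient of N

def make_keypair(e):
    """Return (e, d) with e * d = 1 mod PHI; e must be coprime to PHI."""
    if gcd(e, PHI) != 1:
        raise ValueError("exponent must be coprime to phi(N)")
    return e, pow(e, -1, PHI)  # modular inverse (Python 3.8+)

def apply_key(value, exponent):
    """Modular exponentiation, used for both encryption and decryption."""
    return pow(value, exponent, N)

e_a, d_a = make_keypair(17)   # party A's hypothetical key pair
e_b, d_b = make_keypair(7)    # party B's hypothetical key pair

message = 1234
a_then_b = apply_key(apply_key(message, e_a), e_b)
b_then_a = apply_key(apply_key(message, e_b), e_a)

# Decryption layers can also be removed in either order.
recovered = apply_key(apply_key(a_then_b, d_a), d_b)
```

Because (m^a)^b and (m^b)^a are both m^(ab) mod N, the order in which the parties apply their keys does not matter. That order independence is what allows separately encrypted items to be compared or combined.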