981 resultados para Ligante RANK


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The detection and correction of defects remains among the most time consuming and expensive aspects of software development. Extensive automated testing and code inspections may mitigate their effect, but some code fragments are necessarily more likely to be faulty than others, and automated identification of fault prone modules helps to focus testing and inspections, thus limiting wasted effort and potentially improving detection rates. However, software metrics data is often extremely noisy, with enormous imbalances in the size of the positive and negative classes. In this work, we present a new approach to predictive modelling of fault proneness in software modules, introducing a new feature representation to overcome some of these issues. This rank sum representation offers improved or at worst comparable performance to earlier approaches for standard data sets, and readily allows the user to choose an appropriate trade-off between precision and recall to optimise inspection effort to suit different testing environments. The method is evaluated using the NASA Metrics Data Program (MDP) data sets, and performance is compared with existing studies based on the Support Vector Machine (SVM) and Naïve Bayes (NB) Classifiers, and with our own comprehensive evaluation of these methods.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A known limitation of the Probability Ranking Principle (PRP) is that it does not cater for dependence between documents. Recently, the Quantum Probability Ranking Principle (QPRP) has been proposed, which implicitly captures dependencies between documents through “quantum interference”. This paper explores whether this new ranking principle leads to improved performance for subtopic retrieval, where novelty and diversity is required. In a thorough empirical investigation, models based on the PRP, as well as other recently proposed ranking strategies for subtopic retrieval (i.e. Maximal Marginal Relevance (MMR) and Portfolio Theory(PT)), are compared against the QPRP. On the given task, it is shown that the QPRP outperforms these other ranking strategies. And unlike MMR and PT, one of the main advantages of the QPRP is that no parameter estimation/tuning is required; making the QPRP both simple and effective. This research demonstrates that the application of quantum theory to problems within information retrieval can lead to significant improvements.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we define two models of users that require diversity in search results; these models are theoretically grounded in the notion of intrinsic and extrinsic diversity. We then examine Intent-Aware Expected Reciprocal Rank (ERR-IA), one of the official measures used to assess diversity in TREC 2011-12, with respect to the proposed user models. By analyzing ranking preferences as expressed by the user models and those estimated by ERR-IA, we investigate whether ERR-IA assesses document rankings according to the requirements of the diversity retrieval task expressed by the two models. Empirical results demonstrate that ERR-IA neglects query-intents coverage by attributing excessive importance to redundant relevant documents. ERR-IA behavior is contrary to the user models that require measures to first assess diversity through the coverage of intents, and then assess the redundancy of relevant intents. Furthermore, diversity should be considered separately from document relevance and the documents positions in the ranking.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Discounted Cumulative Gain (DCG) is a well-known ranking evaluation measure for models built with multiple relevance graded data. By handling tagging data used in recommendation systems as an ordinal relevance set of {negative,null,positive}, we propose to build a DCG based recommendation model. We present an efficient and novel learning-to-rank method by optimizing DCG for a recommendation model using the tagging data interpretation scheme. Evaluating the proposed method on real-world datasets, we demonstrate that the method is scalable and outperforms the benchmarking methods by generating a quality top-N item recommendation list.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

To classify each stage for a progressing disease such as Alzheimer’s disease is a key issue for the disease prevention and treatment. In this study, we derived structural brain networks from diffusion-weighted MRI using whole-brain tractography since there is growing interest in relating connectivity measures to clinical, cognitive, and genetic data. Relatively little work has usedmachine learning to make inferences about variations in brain networks in the progression of the Alzheimer’s disease. Here we developed a framework to utilize generalized low rank approximations of matrices (GLRAM) and modified linear discrimination analysis for unsupervised feature learning and classification of connectivity matrices. We apply the methods to brain networks derived from DWI scans of 41 people with Alzheimer’s disease, 73 people with EMCI, 38 people with LMCI, 47 elderly healthy controls and 221 young healthy controls. Our results show that this new framework can significantly improve classification accuracy when combining multiple datasets; this suggests the value of using data beyond the classification task at hand to model variations in brain connectivity.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A smoothed rank-based procedure is developed for the accelerated failure time model to overcome computational issues. The proposed estimator is based on an EM-type procedure coupled with the induced smoothing. "The proposed iterative approach converges provided the initial value is based on a consistent estimator, and the limiting covariance matrix can be obtained from a sandwich-type formula. The consistency and asymptotic normality of the proposed estimator are also established. Extensive simulations show that the new estimator is not only computationally less demanding but also more reliable than the other existing estimators.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Ordinal qualitative data are often collected for phenotypical measurements in plant pathology and other biological sciences. Statistical methods, such as t tests or analysis of variance, are usually used to analyze ordinal data when comparing two groups or multiple groups. However, the underlying assumptions such as normality and homogeneous variances are often violated for qualitative data. To this end, we investigated an alternative methodology, rank regression, for analyzing the ordinal data. The rank-based methods are essentially based on pairwise comparisons and, therefore, can deal with qualitative data naturally. They require neither normality assumption nor data transformation. Apart from robustness against outliers and high efficiency, the rank regression can also incorporate covariate effects in the same way as the ordinary regression. By reanalyzing a data set from a wheat Fusarium crown rot study, we illustrated the use of the rank regression methodology and demonstrated that the rank regression models appear to be more appropriate and sensible for analyzing nonnormal data and data with outliers.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Rank-based inference is widely used because of its robustness. This article provides optimal rank-based estimating functions in analysis of clustered data with random cluster effects. The extensive simulation studies carried out to evaluate the performance of the proposed method demonstrate that it is robust to outliers and is highly efficient given the existence of strong cluster correlations. The performance of the proposed method is satisfactory even when the correlation structure is misspecified, or when heteroscedasticity in variance is present. Finally, a real dataset is analyzed for illustration.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

For clustered survival data, the traditional Gehan-type estimator is asymptotically equivalent to using only the between-cluster ranks, and the within-cluster ranks are ignored. The contribution of this paper is two fold: - (i) incorporating within-cluster ranks in censored data analysis, and; - (ii) applying the induced smoothing of Brown and Wang (2005, Biometrika) for computational convenience. Asymptotic properties of the resulting estimating functions are given. We also carry out numerical studies to assess the performance of the proposed approach and conclude that the proposed approach can lead to much improved estimators when strong clustering effects exist. A dataset from a litter-matched tumorigenesis experiment is used for illustration.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

With growing population and fast urbanization in Australia, it is a challenging task to maintain our water quality. It is essential to develop an appropriate statistical methodology in analyzing water quality data in order to draw valid conclusions and hence provide useful advices in water management. This paper is to develop robust rank-based procedures for analyzing nonnormally distributed data collected over time at different sites. To take account of temporal correlations of the observations within sites, we consider the optimally combined estimating functions proposed by Wang and Zhu (Biometrika, 93:459-464, 2006) which leads to more efficient parameter estimation. Furthermore, we apply the induced smoothing method to reduce the computational burden. Smoothing leads to easy calculation of the parameter estimates and their variance-covariance matrix. Analysis of water quality data from Total Iron and Total Cyanophytes shows the differences between the traditional generalized linear mixed models and rank regression models. Our analysis also demonstrates the advantages of the rank regression models for analyzing nonnormal data.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Environmental data usually include measurements, such as water quality data, which fall below detection limits, because of limitations of the instruments or of certain analytical methods used. The fact that some responses are not detected needs to be properly taken into account in statistical analysis of such data. However, it is well-known that it is challenging to analyze a data set with detection limits, and we often have to rely on the traditional parametric methods or simple imputation methods. Distributional assumptions can lead to biased inference and justification of distributions is often not possible when the data are correlated and there is a large proportion of data below detection limits. The extent of bias is usually unknown. To draw valid conclusions and hence provide useful advice for environmental management authorities, it is essential to develop and apply an appropriate statistical methodology. This paper proposes rank-based procedures for analyzing non-normally distributed data collected at different sites over a period of time in the presence of multiple detection limits. To take account of temporal correlations within each site, we propose an optimal linear combination of estimating functions and apply the induced smoothing method to reduce the computational burden. Finally, we apply the proposed method to the water quality data collected at Susquehanna River Basin in United States of America, which dearly demonstrates the advantages of the rank regression models.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We consider rank regression for clustered data analysis and investigate the induced smoothing method for obtaining the asymptotic covariance matrices of the parameter estimators. We prove that the induced estimating functions are asymptotically unbiased and the resulting estimators are strongly consistent and asymptotically normal. The induced smoothing approach provides an effective way for obtaining asymptotic covariance matrices for between- and within-cluster estimators and for a combined estimator to take account of within-cluster correlations. We also carry out extensive simulation studies to assess the performance of different estimators. The proposed methodology is substantially Much faster in computation and more stable in numerical results than the existing methods. We apply the proposed methodology to a dataset from a randomized clinical trial.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We consider ranked-based regression models for clustered data analysis. A weighted Wilcoxon rank method is proposed to take account of within-cluster correlations and varying cluster sizes. The asymptotic normality of the resulting estimators is established. A method to estimate covariance of the estimators is also given, which can bypass estimation of the density function. Simulation studies are carried out to compare different estimators for a number of scenarios on the correlation structure, presence/absence of outliers and different correlation values. The proposed methods appear to perform well, in particular, the one incorporating the correlation in the weighting achieves the highest efficiency and robustness against misspecification of correlation structure and outliers. A real example is provided for illustration.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We consider rank-based regression models for repeated measures. To account for possible withinsubject correlations, we decompose the total ranks into between- and within-subject ranks and obtain two different estimators based on between- and within-subject ranks. A simple perturbation method is then introduced to generate bootstrap replicates of the estimating functions and the parameter estimates. This provides a convenient way for combining the corresponding two types of estimating function for more efficient estimation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Adaptions of weighted rank regression to the accelerated failure time model for censored survival data have been successful in yielding asymptotically normal estimates and flexible weighting schemes to increase statistical efficiencies. However, for only one simple weighting scheme, Gehan or Wilcoxon weights, are estimating equations guaranteed to be monotone in parameter components, and even in this case are step functions, requiring the equivalent of linear programming for computation. The lack of smoothness makes standard error or covariance matrix estimation even more difficult. An induced smoothing technique overcame these difficulties in various problems involving monotone but pure jump estimating equations, including conventional rank regression. The present paper applies induced smoothing to the Gehan-Wilcoxon weighted rank regression for the accelerated failure time model, for the more difficult case of survival time data subject to censoring, where the inapplicability of permutation arguments necessitates a new method of estimating null variance of estimating functions. Smooth monotone parameter estimation and rapid, reliable standard error or covariance matrix estimation is obtained.