58 results for THRESHOLD SELECTION METHOD

in Deakin Research Online - Australia


Relevance:

100.00%

Publisher:

Abstract:

The purpose of instance selection is to identify which instances (examples, patterns) in a large dataset should be selected as representatives of the entire dataset, without significant loss of information. When a machine learning method is applied to the reduced dataset, the accuracy of the model should not be significantly worse than if the same method were applied to the entire dataset. The reducibility of any dataset, and hence the success of instance selection methods, depends on the characteristics of the dataset as well as on the machine learning method. This paper adopts a meta-learning approach, via an empirical study of 112 classification datasets from the UCI Repository [1], to explore the relationship between data characteristics, machine learning methods, and the success of instance selection methods.

Relevance:

100.00%

Publisher:

Abstract:

This paper introduces a novel method for gene selection based on a modification of the analytic hierarchy process (AHP). The modified AHP (MAHP) is able to deal with quantitative factors that are statistics of five individual gene ranking methods: two-sample t-test, entropy test, receiver operating characteristic curve, Wilcoxon test, and signal-to-noise ratio. The most prominent discriminant genes serve as inputs to a range of classifiers including linear discriminant analysis, k-nearest neighbors, probabilistic neural network, support vector machine, and multilayer perceptron. Gene subsets selected by MAHP are compared with those of four competing approaches: information gain, symmetrical uncertainty, Bhattacharyya distance, and ReliefF. Four benchmark microarray datasets (diffuse large B-cell lymphoma, leukemia, prostate, and colon) are used for the experiments. As the number of samples in microarray datasets is limited, leave-one-out cross-validation is applied rather than traditional cross-validation. Experimental results demonstrate the significant dominance of the proposed MAHP over the competing methods in terms of both accuracy and stability. With the benefit of inexpensive computational cost, MAHP is useful for cancer diagnosis using DNA gene expression profiles in real clinical practice.
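The leave-one-out strategy mentioned in this abstract can be illustrated with a minimal sketch. It assumes a plain 1-nearest-neighbor classifier and a tiny hypothetical dataset; the function names are illustrative and none of this is taken from the paper itself.

```python
def predict_1nn(train_x, train_y, x):
    """Return the label of the training sample closest to x (squared Euclidean distance)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[best]


def loocv_accuracy(xs, ys):
    """Hold out each sample once, fit on the remaining samples, and average the hits."""
    hits = 0
    for i in range(len(xs)):
        # Training set is everything except sample i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += predict_1nn(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)


# Hypothetical two-class data: two well-separated clusters of three samples each.
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 1.1], [0.9, 1.0], [1.1, 0.9]]
ys = [0, 0, 0, 1, 1, 1]
accuracy = loocv_accuracy(xs, ys)
```

With n samples the classifier is trained n times on n - 1 samples each, which is why the strategy suits small-sample settings such as microarray data.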

Relevance:

100.00%

Publisher:

Abstract:

Segmentation is the process of extracting objects from an image. This paper proposes a new algorithm to construct an intuitionistic fuzzy set (IFS) from multiple fuzzy sets as an application to image segmentation. The hesitation degree in the IFS is formulated as the degree of ignorance (due to the lack of knowledge) in determining whether the chosen membership function is best for image segmentation. An image is thresholded by minimizing the entropy of the IFS generated from various fuzzy sets. Experimental results are provided to show the effectiveness of the proposed method.

Relevance:

100.00%

Publisher:

Abstract:

In this paper, we investigate parameter selection for Eigenfaces. Our focus is on the eigenvector and threshold selection issues. We propose a systematic approach to selecting the eigenvectors based on the relative errors of the eigenvalues of the covariance matrix. In addition, we propose a method for selecting the classification threshold that utilizes the information obtained from the training data set. Experimentation was conducted on two benchmark face databases, ORL and AMP, with results indicating that the proposed automatic eigenvector and threshold selection methods produce better recognition performance in terms of precision and recall rates. Furthermore, we show that the eigenvector selection method outperforms the energy and stretching dimension methods in terms of the number of selected eigenvectors and computation cost.

Relevance:

100.00%

Publisher:

Abstract:

In this paper, we investigate the parameter selection issues for Eigenfaces. Our focus is on the eigenvector and threshold selection issues. We propose a systematic approach to selecting the eigenvectors based on the relative errors of the eigenvalues. In addition, we have designed a method for selecting the classification threshold that effectively utilizes the information obtained from the training database. Experimentation was conducted on the ORL and AMP face databases, with results indicating that the automatic eigenvector and threshold selection methods provide optimal recognition in terms of precision and recall rates. Furthermore, we show that the eigenvector selection method outperforms the energy and stretching dimension methods in terms of the number of selected eigenvectors and computation cost.

Relevance:

90.00%

Publisher:

Abstract:

The Generalized Estimating Equations (GEE) method is one of the most commonly used statistical methods for the analysis of longitudinal data in epidemiological studies. The method requires a working correlation structure to be specified for the repeated measures of each subject's outcome variable. However, statistical criteria for selecting the best correlation structure and the best subset of explanatory variables in GEE have become available only recently, because the GEE method was developed on the basis of quasi-likelihood theory. Maximum likelihood based model selection methods, such as the widely used Akaike Information Criterion (AIC), are not directly applicable to GEE. Pan (2001) proposed a selection method called QIC, which can be used to select the best correlation structure and the best subset of explanatory variables. Based on the QIC method, we developed a computing program to calculate the QIC value for a range of different distributions, link functions, and correlation structures. The program was written in Stata. In this article, we introduce this program and demonstrate, through several representative examples, how to use it to select the most parsimonious model in GEE analyses of longitudinal data.

Relevance:

90.00%

Publisher:

Abstract:

Correctly separating legitimate user emails from spam is an important research issue for anti-spam researchers. This paper presents an effective and efficient email classification technique based on a data filtering method. In our testing we introduce an innovative filtering technique that uses an instance selection method (ISM) to remove pointless data instances from the training model and then classify the test data. The objective of ISM is to identify which instances (examples, patterns) in email corpora should be selected as representatives of the entire dataset, without significant loss of information. We used the WEKA interface in our integrated classification model and tested diverse classification algorithms. Our empirical studies show strong performance in terms of classification accuracy, with a reduction in false-positive instances.

Relevance:

90.00%

Publisher:

Abstract:

One of the issues associated with pattern classification using data-based machine learning systems is the “curse of dimensionality”. In this paper, the circle-segments method is proposed as a feature selection method to identify important input features before the entire data set is provided to machine learning systems for learning. Specifically, four machine learning systems are deployed for classification, viz. Multilayer Perceptron (MLP), Support Vector Machine (SVM), Fuzzy ARTMAP (FAM), and k-Nearest Neighbour (kNN). The integration of the circle-segments method with the machine learning systems has been applied to two case studies comprising one benchmark data set and one real data set. Overall, the results after feature selection using the circle-segments method demonstrate improvements in performance even with more than 50% of the input features eliminated from the original data sets.

Relevance:

80.00%

Publisher:

Abstract:

The lasso procedure is an estimator-shrinkage and variable selection method. This paper shows that there always exists an interval of tuning parameter values such that the corresponding mean squared prediction error for the lasso estimator is smaller than for the ordinary least squares estimator. For an estimator satisfying some condition such as unbiasedness, the paper defines the corresponding generalized lasso estimator. Its mean squared prediction error is shown to be smaller than that of the original estimator for values of the tuning parameter in some interval. This implies that all unbiased estimators are inadmissible. Simulation results for five models support the theoretical results.

Relevance:

80.00%

Publisher:

Abstract:

This thesis proposes an innovative adaptive multi-classifier spam filtering model, with a grey-list analyser and a dynamic feature selection method, to overcome false-positive problems in email classification. It also presents additional techniques to minimize the added complexity. Empirical evidence indicates the success of this model over existing approaches.

Relevance:

80.00%

Publisher:

Abstract:

An in vitro selection method based on the autolytic cleavage of yeast tRNAPhe by Pb2+ was applied to obtain tRNA derivatives with the anticodon hairpin replaced by four single-stranded nucleotides. Based on the rates of the site-specific cleavage by Pb2+ and the presence of a specific UV-induced crosslink, certain tetranucleotide sequences allow proper folding of the rest of the tRNA molecule, whereas others do not. One such successful tetramer sequence was also used to replace the acceptor stem of yeast tRNAPhe and the anticodon hairpin of E. coli tRNAPhe without disrupting folding. These experiments suggest that certain tetramers may be able to replace structurally nonessential hairpins in any RNA.

Relevance:

80.00%

Publisher:

Abstract:

We present improved algorithms for cut, fade, and dissolve detection which are fundamental steps in digital video analysis. In particular, we propose a new adaptive threshold determination method that is shown to reduce artifacts created by noise and motion in scene cut detection. We also describe new two-step algorithms for fade and dissolve detection, and introduce a method for eliminating false positives from a list of detected candidate transitions. In our detailed study of these gradual shot transitions, our objective has been to accurately classify the type of transitions (fade-in, fade-out, and dissolve) and to precisely locate the boundary of the transitions. This distinguishes our work from other early work in scene change detection which tends to focus primarily on identifying the existence of a transition rather than its precise temporal extent. We evaluate our improved algorithms against two other commonly used shot detection techniques on a comprehensive data set, and demonstrate the improved performance due to our enhancements.
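The abstract does not state the thresholding rule, so the following is only a generic sketch of what an adaptive threshold for cut detection can look like. It assumes a list of per-frame difference scores and flags a frame when its score exceeds the local mean plus `k` local standard deviations; the names `detect_cuts`, `half_window`, and `k` are hypothetical.

```python
from statistics import mean, pstdev


def detect_cuts(diffs, half_window=3, k=3.0):
    """Return indices whose frame difference exceeds a locally adaptive threshold.

    The threshold at position t is mean + k * std of the neighbouring scores,
    computed over a sliding window that excludes position t itself.
    """
    cuts = []
    for t, d in enumerate(diffs):
        lo = max(0, t - half_window)
        hi = min(len(diffs), t + half_window + 1)
        neighbours = diffs[lo:t] + diffs[t + 1:hi]
        if len(neighbours) < 2:
            continue  # too little context to estimate local statistics
        threshold = mean(neighbours) + k * pstdev(neighbours)
        if d > threshold:
            cuts.append(t)
    return cuts


# Hypothetical frame-difference scores with one abrupt change at index 4.
diffs = [1.0, 1.2, 0.9, 1.1, 9.0, 1.0, 1.1, 0.8, 1.2]
cuts = detect_cuts(diffs)
```

Because the threshold follows the local statistics, a noisy or high-motion passage raises the bar for declaring a cut, whereas a single global threshold would not adapt.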

Relevance:

80.00%

Publisher:

Abstract:

The Intelligent Water Drop (IWD) algorithm is a recent stochastic swarm-based method that is useful for solving combinatorial and function optimization problems. In this paper, we investigate the effectiveness of the selection method in the solution construction phase of the IWD algorithm. Instead of the fitness proportionate selection method in the original IWD algorithm, two ranking-based selection methods, namely linear ranking and exponential ranking, are proposed. Both ranking-based selection methods aim to solve the identified limitations of the fitness proportionate selection method as well as to enable the IWD algorithm to escape from local optima and ensure its search diversity. To evaluate the usefulness of the proposed ranking-based selection methods, a series of experiments pertaining to three combinatorial optimization problems, i.e., rough set feature subset selection, multiple knapsack and travelling salesman problems, is conducted. The results demonstrate that the exponential ranking selection method is able to preserve the search diversity, therefore improving the performance of the IWD algorithm. © 2014 Elsevier Ltd. All rights reserved.
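The difference between fitness-proportionate and ranking-based selection can be sketched as follows. This uses the textbook forms of linear and exponential ranking; the parameter names `sp` (selection pressure) and `c` (decay base) are assumptions, and the paper's exact formulation inside the IWD algorithm may differ.

```python
import random


def proportionate_probs(fitness):
    """Fitness-proportionate (roulette-wheel) probabilities; assumes positive fitness."""
    total = sum(fitness)
    return [f / total for f in fitness]


def linear_ranking_probs(fitness, sp=1.5):
    """Linear ranking: probability depends on rank only; sp in [1, 2]."""
    n = len(fitness)
    if n == 1:
        return [1.0]
    order = sorted(range(n), key=lambda i: fitness[i])  # rank 0 = worst
    probs = [0.0] * n
    for rank, i in enumerate(order):
        probs[i] = (2 - sp + 2 * (sp - 1) * rank / (n - 1)) / n
    return probs


def exponential_ranking_probs(fitness, c=0.8):
    """Exponential ranking: weights decay geometrically from best to worst; 0 < c < 1."""
    n = len(fitness)
    order = sorted(range(n), key=lambda i: fitness[i])  # rank 0 = worst
    weights = [0.0] * n
    for rank, i in enumerate(order):
        weights[i] = c ** (n - 1 - rank)
    total = sum(weights)
    return [w / total for w in weights]


def select(probs, rng=random):
    """Draw one candidate index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical candidates: one dominates in raw fitness.
fitness = [1.0, 2.0, 1000.0]
p_prop = proportionate_probs(fitness)
p_lin = linear_ranking_probs(fitness)
p_exp = exponential_ranking_probs(fitness)
```

With one dominant candidate, proportionate selection picks it almost every time, whereas both ranking schemes cap its probability and leave the weaker candidates a real chance, which is the diversity-preserving effect the abstract describes.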