64 resultados para K-nearest neighbors method

em Deakin Research Online - Australia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper introduces a novel method for gene selection based on a modification of analytic hierarchy process (AHP). The modified AHP (MAHP) is able to deal with quantitative factors that are statistics of five individual gene ranking methods: two-sample t-test, entropy test, receiver operating characteristic curve, Wilcoxon test, and signal to noise ratio. The most prominent discriminant genes serve as inputs to a range of classifiers including linear discriminant analysis, k-nearest neighbors, probabilistic neural network, support vector machine, and multilayer perceptron. Gene subsets selected by MAHP are compared with those of four competing approaches: information gain, symmetrical uncertainty, Bhattacharyya distance and ReliefF. Four benchmark microarray datasets: diffuse large B-cell lymphoma, leukemia cancer, prostate and colon are utilized for experiments. As the number of samples in microarray data datasets are limited, the leave one out cross validation strategy is applied rather than the traditional cross validation. Experimental results demonstrate the significant dominance of the proposed MAHP against the competing methods in terms of both accuracy and stability. With a benefit of inexpensive computational cost, MAHP is useful for cancer diagnosis using DNA gene expression profiles in the real clinical practice.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

k-nearest neighbors (kNN) is a popular method for function approximation and classification. One drawback of this method is that the nearest neighbors can be all located on one side of the point in question x. An alternative natural neighbors method is expensive for more than three variables. In this paper we propose the use of the discrete Choquet integral for combining the values of the nearest neighbors so that redundant information is canceled out. We design a fuzzy measure based on location of the nearest neighbors, which favors neighbors located all around x.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A new online neural-network-based regression model for noisy data is proposed in this paper. It is a hybrid system combining the Fuzzy ART (FA) and General Regression Neural Network (GRNN) models. Both the FA and GRNN models are fast incremental learning systems. The proposed hybrid model, denoted as GRNNFA-online, retains the online learning properties of both models. The kernel centers of the GRNN are obtained by compressing the training samples using the FA model. The width of each kernel is then estimated by the K-nearest-neighbors (kNN) method. A heuristic is proposed to tune the value of Kof the kNN dynamically based on the concept of gradient-descent. The performance of the GRNNFA-online model was evaluated using two benchmark datasets, i.e., OZONE and Friedman#1. The experimental results demonstrated the convergence of the prediction errors. Bootstrapping was employed to assess the performance statistically. The final prediction errors are analyzed and compared with those from other systems.Bootstrapping was employed to assess the performance statistically. The final prediction errors are analyzed and compared with those from other systems.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We discuss the problem of texture recognition based on the grey level co-occurrence matrix (GLCM). We performed a number of numerical experiments to establish whether the accuracy of classification is optimal when GLCM entries are aggregated into standard metrics like contrast, dissimilarity, homogeneity, entropy, etc., and compared these metrics to several alternative aggregation methods.We conclude that k nearest neighbors classification based on raw GLCM entries typically works better than classification based on the standard metrics for noiseless data, that metrics based on principal component analysis inprove classification, and that a simple change from the arithmetic to quadratic mean in calculating the standard metrics also improves classification.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Feature based camera model identification plays an important role for forensics investigations on images. The conventional feature based identification schemes suffer from the problem of unknown models, that is, some images are captured by the camera models previously unknown to the identification system. To address this problem, we propose a new scheme: Source Camera Identification with Unknown models (SCIU). It has the capability of identifying images of the unknown models as well as distinguishing images of the known models. The new SCIU scheme consists of three stages: 1) unknown detection; 2) unknown expansion; and 3) (K+1)-class classification. Unknown detection applies a k-nearest neighbours method to recognize a few sample images of unknown models from the unlabeled images. Unknown expansion further extends the set of unknown sample images using a self-training strategy. Then, we address a specific (K+1)-class classification, in which the sample images of unknown (1-class) and known models (K-class) are combined to train a classifier. In addition, we develop a parameter optimization method for unknown detection, and investigate the stopping criterion for unknown expansion. The experiments carried out on the Dresden image collection confirm the effectiveness of the proposed SCIU scheme. When unknown models present, the identification accuracy of SCIU is significantly better than the four state-of-art methods: 1) multi-class Support Vector Machine (SVM); 2) binary SVM; 3) combined classification framework; and 4) decision boundary carving.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the increase use of location-based services, location privacy has recently raised serious concerns. To protect a user from being identified, a cloaked spatial region that contains other k-1 nearest neighbors of the user is used to replace the accurate position. In this paper, we consider location-aware applications that services are different among regions. To search nearest neighbors, we define a novel distance measurement that combines the semantic distance and the Euclidean distance to address the privacy preserving issue in the above-mentioned applications. We also propose an algorithm kNNH to implement our proposed method. The experimental results further suggest that the proposed distance metric and the algorithm can successfully retain the utility of the location services while preserving users’ privacy.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the increasing use of location-based services, location privacy has recently started raising serious concerns. Location perturbation and obfuscation are most widely used for location privacy preserving. To protect a user from being identified, a cloaked spatial region that contains other k - 1 nearest neighbors of the user is submitted to the location-based service provider, instead of the accurate position. In this paper, we consider the location-aware applications that services are different among regions. In such scenarios, the semantic distance between users should be considered besides the Euclidean distance for searching the neighbors of a user. We define a novel distance measurement that combines the semantic and the Euclidean distance to address the privacy-preserving issue in the aforementioned applications. We also present an algorithm kNNH to implement our proposed method. Moreover, we conduct performance study experiments on the proposed algorithm. The experimental results further suggest that the proposed distance metric and the algorithm can successfully retain the utility of the location services while preserving users' privacy.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In big-data-driven traffic flow prediction systems, the robustness of prediction performance depends on accuracy and timeliness. This paper presents a new MapReduce-based nearest neighbor (NN) approach for traffic flow prediction using correlation analysis (TFPC) on a Hadoop platform. In particular, we develop a real-time prediction system including two key modules, i.e., offline distributed training (ODT) and online parallel prediction (OPP). Moreover, we build a parallel k-nearest neighbor optimization classifier, which incorporates correlation information among traffic flows into the classification process. Finally, we propose a novel prediction calculation method, combining the current data observed in OPP and the classification results obtained from large-scale historical data in ODT, to generate traffic flow prediction in real time. The empirical study on real-world traffic flow big data using the leave-one-out cross validation method shows that TFPC significantly outperforms four state-of-the-art prediction approaches, i.e., autoregressive integrated moving average, Naïve Bayes, multilayer perceptron neural networks, and NN regression, in terms of accuracy, which can be improved 90.07% in the best case, with an average mean absolute percent error of 5.53%. In addition, it displays excellent speedup, scaleup, and sizeup.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Protein mass spectrometry (MS) pattern recognition has recently emerged as a new method for cancer diagnosis. Unfortunately, classification performance may degrade owing to the enormously high dimensionality of the data. This paper investigates the use of Random Projection in protein MS data dimensionality reduction. The effectiveness of Random Projection (RP) is analyzed and compared against Principal Component Analysis (PCA) by using three classification algorithms, namely Support Vector Machine, Feed-forward Neural Networks and K-Nearest Neighbour. Three real-world cancer data sets are employed to evaluate the performances of RP and PCA. Through the investigations, RP method demonstrated better or at least comparable classification performance as PCA if the dimensionality of the projection matrix is sufficiently large. This paper also explores the use of RP as a pre-processing step prior to PCA. The results show that without sacrificing classification accuracy, performing RP prior to PCA significantly improves the computational time.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

One of the issues associated with pattern classification using data based machine learning systems is the “curse of dimensionality”. In this paper, the circle-segments method is proposed as a feature selection method to identify important input features before the entire data set is provided for learning with machine learning systems. Specifically, four machine learning systems are deployed for classification, viz. Multilayer Perceptron (MLP), Support Vector Machine (SVM), Fuzzy ARTMAP (FAM), and k-Nearest Neighbour (kNN). The integration between the circle-segments method and the machine learning systems has been applied to two case studies comprising one benchmark and one real data sets. Overall, the results after feature selection using the circle segments method demonstrate improvements in performance even with more than 50% of the input features eliminated from the original data sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a comparative evaluation of popular multi-label classification methods on several multi-label problems from different domains. The methods include multi-label k-nearest neighbor, binary relevance, label power set, random k-label set ensemble learning, calibrated label ranking, hierarchy of multi-label classifiers and triple random ensemble multi-label classification algorithms. These multi-label learning algorithms are evaluated using several widely used MLC evaluation metrics. The evaluation results show that for each multi-label classification problem a particular MLC method can be recommended. The multi-label evaluation datasets used in this study are related to scene images, multimedia video frames, diagnostic medical report, email messages, emotional music data, biological genes and multi-structural proteins categorization.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the context of collaborative filtering, the well known data sparsity issue makes two like-minded users have little similarity, and consequently renders the k nearest neighbour rule inapplicable. In this paper, we address the data sparsity problem in the neighbourhood-based CF methods by proposing an Adaptive-Maximum imputation method (AdaM). The basic idea is to identify an imputation area that can maximize the imputation benefit for recommendation purposes, while minimizing the imputation error brought in. To achieve the maximum imputation benefit, the imputation area is determined from both the user and the item perspectives; to minimize the imputation error, there is at least one real rating preserved for each item in the identified imputation area. A theoretical analysis is provided to prove that the proposed imputation method outperforms the conventional neighbourhood-based CF methods through more accurate neighbour identification. Experiment results on benchmark datasets show that the proposed method significantly outperforms the other related state-of-the-art imputation-based methods in terms of accuracy.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Texture classification is one of the most important tasks in computer vision field and it has been extensively investigated in the last several decades. Previous texture classification methods mainly used the template matching based methods such as Support Vector Machine and k-Nearest-Neighbour for classification. Given enough training images the state-of-the-art texture classification methods could achieve very high classification accuracies on some benchmark databases. However, when the number of training images is limited, which usually happens in real-world applications because of the high cost of obtaining labelled data, the classification accuracies of those state-of-the-art methods would deteriorate due to the overfitting effect. In this paper we aim to develop a novel framework that could correctly classify textural images with only a small number of training images. By taking into account the repetition and sparsity property of textures we propose a sparse representation based multi-manifold analysis framework for texture classification from few training images. A set of new training samples are generated from each training image by a scale and spatial pyramid, and then the training samples belonging to each class are modelled by a manifold based on sparse representation. We learn a dictionary of sparse representation and a projection matrix for each class and classify the test images based on the projected reconstruction errors. The framework provides a more compact model than the template matching based texture classification methods, and mitigates the overfitting effect. Experimental results show that the proposed method could achieve reasonably high generalization capability even with as few as 3 training images, and significantly outperforms the state-of-the-art texture classification approaches on three benchmark datasets. © 2014 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper introduces a method to classify EEG signals using features extracted by an integration of wavelet transform and the nonparametric Wilcoxon test. Orthogonal Haar wavelet coefficients are ranked based on the Wilcoxon test’s statistics. The most prominent discriminant wavelets are assembled to form a feature set that serves as inputs to the naïve Bayes classifier. Two benchmark datasets, named Ia and Ib, downloaded from the brain–computer interface (BCI) competition II are employed for the experiments. Classification performance is evaluated using accuracy, mutual information, Gini coefficient and F-measure. Widely used classifiers, including feedforward neural network, support vector machine, k-nearest neighbours, ensemble learning Adaboost and adaptive neuro-fuzzy inference system, are also implemented for comparisons. The proposed combination of Haar wavelet features and naïve Bayes classifier considerably dominates the competitive classification approaches and outperforms the best performance on the Ia and Ib datasets reported in the BCI competition II. Application of naïve Bayes also provides a low computational cost approach that promotes the implementation of a potential real-time BCI system.