25 results for MULTIFACTOR-DIMENSIONALITY REDUCTION

in Deakin Research Online - Australia


Relevance: 100.00%

Publisher:

Abstract:

Visualization is one of the most effective methods for analyzing how high-dimensional data are distributed. Dimensionality reduction techniques, such as PCA, can be used to map high-dimensional data to a two- or three-dimensional space. In this paper, we propose an algorithm called HyperMap that can be effectively applied to visualization. Our algorithm can be seen as a generalization of FastMap: it preserves FastMap's linear computational complexity while overcoming several of its main shortcomings, especially in visualization. Since each axis of the target space uses more than two pivot objects, more distance information is preserved in each dimension, and in visualization the number of pivot objects can go beyond the limit of six (2 pivot objects × 3 dimensions). HyperMap also gives more flexibility to the target space, so that the data distribution can be observed from various viewpoints. Its effectiveness is confirmed by empirical evaluations on both real and synthetic datasets.
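For context, a minimal sketch of the two-pivot FastMap projection that HyperMap generalizes, assuming Euclidean data; the pivot-selection heuristic and all names are illustrative, not taken from the paper:

```python
import numpy as np

def fastmap_coordinate(X):
    """One FastMap step: project every row of X onto the line through two pivots.

    Returns the 1-D coordinates and the pivot indices (Euclidean distances assumed).
    """
    d = lambda i, j: np.linalg.norm(X[i] - X[j])
    # Heuristic pivot choice: start anywhere, take the farthest point, then the
    # point farthest from that one.
    a = 0
    b = max(range(len(X)), key=lambda j: d(a, j))
    a = max(range(len(X)), key=lambda j: d(b, j))
    dab = d(a, b)
    coords = np.array([
        (d(a, o) ** 2 + dab ** 2 - d(b, o) ** 2) / (2 * dab)  # cosine-law projection
        for o in range(len(X))
    ])
    return coords, (a, b)

# Toy usage: one FastMap step gives the first axis; repeating it on the residual
# distances would give the second axis of a 2-D visualization.
X = np.random.default_rng(0).normal(size=(30, 3))
coords, pivots = fastmap_coordinate(X)
```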

Relevance: 100.00%

Publisher:

Abstract:

The thesis investigates various machine learning approaches to reducing data dimensionality and studies the impact of asymmetric data on learning in image retrieval. Efficient algorithms are proposed to reduce data dimensionality, and integration strategies for one-class classification are designed to address the asymmetric-data issue and improve retrieval effectiveness.

Relevance: 100.00%

Publisher:

Abstract:

In this paper, a hybrid neural classifier combining the auto-encoder neural network and the Lattice Vector Quantization (LVQ) model is described. The auto-encoder network is used for dimensionality reduction, projecting high-dimensional data into a two-dimensional space. The LVQ model is used for data visualization by forming and adapting the granularity of a data map, and the mapped data are employed to predict the target classes of new data samples. To improve classification accuracy, a majority voting scheme is adopted by the hybrid classifier. To demonstrate its applicability, a series of experiments using simulated and real fault data from induction motors is conducted. The results show that the hybrid classifier outperforms the Multi-Layer Perceptron neural network and produces very good classification accuracy for various fault conditions of induction motors.
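As a rough illustration of this kind of pipeline (not the paper's specific auto-encoder/LVQ implementation), the sketch below trains a small auto-encoder with a 2-unit bottleneck, uses the 2-D codes as the visualizable map, and classifies samples by nearest class prototype; the dataset and all parameters are assumptions:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPRegressor
from sklearn.neighbors import NearestCentroid

X, y = load_iris(return_X_y=True)

# Auto-encoder: a regressor trained to reproduce its input through a 2-unit bottleneck.
ae = MLPRegressor(hidden_layer_sizes=(2,), activation="tanh",
                  max_iter=5000, random_state=0).fit(X, X)

# 2-D codes = bottleneck activations (tanh of the first layer's affine map).
codes = np.tanh(X @ ae.coefs_[0] + ae.intercepts_[0])

# Nearest-prototype classifier on the 2-D map, a crude stand-in for the LVQ step.
proto = NearestCentroid().fit(codes, y)
print("training accuracy on the 2-D map:", proto.score(codes, y))
```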

Relevance: 100.00%

Publisher:

Abstract:

Protein mass spectrometry (MS) pattern recognition has recently emerged as a new method for cancer diagnosis. Unfortunately, classification performance may degrade owing to the enormously high dimensionality of the data. This paper investigates the use of Random Projection (RP) for dimensionality reduction of protein MS data. The effectiveness of RP is analyzed and compared against Principal Component Analysis (PCA) using three classification algorithms, namely Support Vector Machine, Feed-forward Neural Network and K-Nearest Neighbour. Three real-world cancer data sets are employed to evaluate the performance of RP and PCA. In our investigations, RP demonstrated classification performance better than, or at least comparable to, PCA when the dimensionality of the projection matrix is sufficiently large. This paper also explores the use of RP as a pre-processing step prior to PCA. The results show that, without sacrificing classification accuracy, performing RP prior to PCA significantly reduces the computation time.
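A minimal sketch of this kind of comparison, assuming scikit-learn and a synthetic stand-in for a high-dimensional MS dataset; the dimensions, classifier and RP-then-PCA pipeline below are illustrative, not the paper's exact experimental setup:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.random_projection import GaussianRandomProjection
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Stand-in for a high-dimensional MS dataset.
X, y = make_classification(n_samples=200, n_features=5000, n_informative=50, random_state=0)

reducers = {
    "RP":        GaussianRandomProjection(n_components=200, random_state=0),
    "PCA":       PCA(n_components=50),
    "RP -> PCA": make_pipeline(GaussianRandomProjection(n_components=500, random_state=0),
                               PCA(n_components=50)),
}
for name, reducer in reducers.items():
    clf = make_pipeline(StandardScaler(), reducer, SVC(kernel="rbf"))
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name:8s} accuracy: {scores.mean():.3f}")
```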

Relevance: 90.00%

Publisher:

Abstract:

This paper presents a novel dimensionality reduction algorithm for kernel-based classification. In the feature space, the proposed algorithm maximizes the ratio of the squared between-class distance to the sum of the within-class variances of the training samples, for a given reduced dimension. The algorithm has lower complexity than the recently reported kernel dimension reduction (KDR) method for supervised learning. Several simulations with large training datasets demonstrate that the proposed algorithm performs similarly to, or marginally better than, KDR while having the advantage of computational efficiency. We further applied the proposed dimension reduction algorithm to face recognition, where the number of training samples is very small. The resulting face recognition approach outperforms the eigenface approach based on principal component analysis (PCA) when the training data are complete, that is, representative of the whole dataset.
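As a loose illustration of this kind of criterion (not the paper's algorithm), the sketch below maps data into an approximate RBF feature space and finds projection directions that maximize a between-class/within-class scatter ratio via a generalized eigenproblem; the dataset, kernel approximation and parameters are all assumptions:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import load_iris
from sklearn.kernel_approximation import Nystroem

X, y = load_iris(return_X_y=True)

# Explicit (approximate) feature space for an RBF kernel.
phi = Nystroem(kernel="rbf", gamma=0.5, n_components=50, random_state=0).fit_transform(X)

mean_all = phi.mean(axis=0)
Sb = np.zeros((phi.shape[1], phi.shape[1]))   # between-class scatter
Sw = np.zeros_like(Sb)                        # within-class scatter
for c in np.unique(y):
    Pc = phi[y == c]
    mc = Pc.mean(axis=0)
    Sb += len(Pc) * np.outer(mc - mean_all, mc - mean_all)
    Sw += (Pc - mc).T @ (Pc - mc)

# Directions maximizing the between/within ratio: top generalized eigenvectors.
k = 2
w, V = eigh(Sb, Sw + 1e-6 * np.eye(Sw.shape[0]))
projection = V[:, -k:]        # eigh returns eigenvalues in ascending order
Z = phi @ projection          # reduced k-dimensional representation
```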

Relevance: 80.00%

Publisher:

Abstract:

Recently, DTW (dynamic time warping) has been recognized as the most robust distance function for measuring the similarity between two time series, and this fact has spawned a flurry of research on the topic. Most indexing methods proposed for DTW are based on the R-tree structure. Because of the high dimensionality and the loose lower bounds for the time warping distance, the pruning power of these tree structures is quite weak, resulting in inefficient search. In this paper, we propose a dimensionality reduction method motivated by observations about the inherent character of each time series. A very compact index file is constructed; by scanning the index file we obtain a very small candidate set, so that the number of page accesses is dramatically reduced. We demonstrate the effectiveness of our approach on real and synthetic datasets.
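For reference, a minimal dynamic-programming implementation of the DTW distance itself (with an optional Sakoe-Chiba band); this is the standard textbook formulation, not the paper's indexing scheme:

```python
import numpy as np

def dtw_distance(a, b, band=None):
    """Classic O(len(a)*len(b)) DTW between two 1-D sequences.

    `band` optionally restricts warping to a Sakoe-Chiba band of that half-width.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        lo = 1 if band is None else max(1, i - band)
        hi = m if band is None else min(m, i + band)
        for j in range(lo, hi + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])

q = np.sin(np.linspace(0.0, 6.0, 80))
c = np.sin(np.linspace(0.3, 6.3, 80))
print(dtw_distance(q, c, band=10))
```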

Relevance: 80.00%

Publisher:

Abstract:

The tree index structure is a traditional method for searching for similar data in large datasets. It is based on the presupposition that most sub-trees are pruned during the search, so that the number of page accesses is reduced. However, time-series datasets generally have very high dimensionality and, because of the so-called dimensionality curse, pruning effectiveness drops in high dimensions. Consequently, the tree index structure is not well suited to time-series datasets. In this paper, we propose a two-phase (filtering and refinement) method for searching time-series datasets. In the filtering step, a quantized time series is used to construct a compact file, which is scanned to filter out irrelevant series; the small set of remaining candidates is passed to the second step for refinement. In this step, we introduce an effective index compression method named grid-based datawise dimensionality reduction (DRR), which attempts to preserve the characteristics of the time series. An experimental comparison with existing techniques demonstrates the utility of our approach.
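A generic filter-and-refine sketch in the same spirit, using piecewise-aggregate quantization and its standard Euclidean lower bound rather than the paper's grid-based DRR index; segment counts and data are illustrative:

```python
import numpy as np

def paa(series, n_segments):
    """Piecewise aggregate approximation: mean of each equal-length segment."""
    return series.reshape(n_segments, -1).mean(axis=1)

def lower_bound(q_paa, c_paa, seg_len):
    # For equal-length segments, sqrt(seg_len) * ||PAA(q) - PAA(c)|| <= ||q - c||.
    return np.sqrt(seg_len) * np.linalg.norm(q_paa - c_paa)

rng = np.random.default_rng(0)
database = rng.normal(size=(1000, 128))                 # 1000 series of length 128
query = rng.normal(size=128)
n_seg, seg_len = 16, 128 // 16

q_paa = paa(query, n_seg)
index = np.array([paa(s, n_seg) for s in database])     # the compact "index file"
lbs = np.array([lower_bound(q_paa, c, seg_len) for c in index])

# Phase 1 (filter): scan candidates in lower-bound order; stop once the bound
# can no longer beat the best exact distance. Phase 2 (refine): exact distance
# only for the surviving candidates.
best, best_id = np.inf, -1
for i in np.argsort(lbs):
    if lbs[i] >= best:
        break
    dist = np.linalg.norm(query - database[i])
    if dist < best:
        best, best_id = dist, i
print(best_id, best)
```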

Relevance: 80.00%

Publisher:

Abstract:

ISOMap is a popular method for nonlinear dimensionality reduction in batch mode, but it must be rerun in its entirety, which is inefficient, when data arrive sequentially. In this paper, we present an extension of ISOMap, namely I-ISOMap, which augments the existing ISOMap framework to handle additional points that become available after the initial manifold is constructed. The MDS step, a key component of ISOMap, is adapted by introducing a spring model and a sampling strategy. As a result, owing to the spring model's iterative nature, only linear time is needed to obtain a stable layout. The proposed method outperforms earlier work by Law [1], whose MDS step runs in quadratic time. Experimental results show that I-ISOMap is a precise and efficient technique for capturing an evolving manifold.
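For context, batch ISOMap (the starting point that I-ISOMap extends) is readily available; a short, assumed scikit-learn usage example on a synthetic manifold:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Batch ISOMap: geodesic distances over a k-NN graph followed by classical MDS.
X, _ = make_swiss_roll(n_samples=1500, random_state=0)
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(embedding.shape)   # (1500, 2)

# In the incremental setting discussed above, points arriving later would force
# a re-run of this whole pipeline, which is what I-ISOMap avoids.
```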

Relevance: 80.00%

Publisher:

Abstract:

The Information Bottleneck method can be used as a dimensionality reduction approach by grouping "similar" features together [1]. In applications, a natural question is how many "feature groups" are appropriate; this dependency on prior knowledge restricts the applicability of many Information Bottleneck algorithms. In this paper, we alleviate this dependency by formulating parameter determination as a model selection problem and solving it using the minimum message length principle. An efficient encoding scheme is designed to describe the information bottleneck solutions and the original data, and the minimum message length principle is then used to automatically determine the optimal cardinality. Empirical results in a document clustering scenario indicate that the proposed method works well for determining the optimal parameter value for the Information Bottleneck method.
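The general model-selection idea can be sketched as follows; this toy version scores each candidate number of feature groups with a BIC-style description-length surrogate rather than the paper's minimum-message-length encoding, and clusters feature vectors with a Gaussian mixture rather than an Information Bottleneck algorithm, so all of it is an assumption for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy feature representation: each row is one feature's co-occurrence profile.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(loc=c, scale=0.5, size=(40, 10)) for c in (0.0, 2.0, 4.0)])

# Try several cardinalities and keep the one with the lowest description-length
# surrogate (BIC here, standing in for a true MML score).
scores = {}
for k in range(1, 8):
    gm = GaussianMixture(n_components=k, covariance_type="diag", random_state=0).fit(features)
    scores[k] = gm.bic(features)
best_k = min(scores, key=scores.get)
print("selected number of feature groups:", best_k)
```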

Relevance: 80.00%

Publisher:

Abstract:

Human identification by gait has generated a great deal of interest in the computer vision community because it allows inconspicuous recognition at a relatively far distance. This paper provides a comprehensive survey of recent developments in gait recognition. The survey emphasizes three major issues in a general gait recognition system, namely gait image representation, feature dimensionality reduction and gait classification. A review of the available public gait datasets is also presented. The concluding discussion outlines a number of research challenges and promising future directions for the field.

Relevance: 80.00%

Publisher:

Abstract:

This paper presents a human daily activity classification approach based on sensory data collected from a single tri-axial accelerometer worn on a waist belt. The classification algorithm distinguishes six different activities, namely standing, jumping, sitting down, walking, running and falling, through three major steps: wavelet transformation, Principal Component Analysis (PCA)-based dimensionality reduction, and a radial basis function (RBF) kernel Support Vector Machine (SVM) classifier. Two trials were conducted to evaluate different aspects of the classification scheme. In the first trial, the classifier was trained and evaluated on a dataset of 420 samples collected from seven subjects using k-fold cross-validation, and the parameters σ and C of the RBF-kernel SVM were optimized by automatic search for the highest recognition accuracy and robustness. In the second trial, the generalization capability of the classifier was validated on data collected from six new subjects. Average classification rates of 95% and 93% were obtained in trials 1 and 2, respectively; the results of trial 2 show that the system also classifies the activity signals of new subjects well. It can be concluded that the combination of a single accelerometer, a suitable sensor placement and an efficient classifier makes this wearable sensing system realistic and comfortable enough for long-term human activity monitoring and classification in ambulatory environments, and therefore more acceptable to users.
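A compact sketch of the PCA + RBF-SVM stage with automatic parameter search, assuming a feature matrix X (for example, wavelet coefficients per window) and activity labels y have already been extracted; scikit-learn's C and gamma play the roles of C and σ, and all values are illustrative rather than the paper's settings:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for wavelet features from 420 accelerometer windows, 6 activity classes.
X, y = make_classification(n_samples=420, n_features=60, n_informative=20,
                           n_classes=6, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=10)),
    ("svm", SVC(kernel="rbf")),
])
param_grid = {"svm__C": [1, 10, 100], "svm__gamma": [1e-3, 1e-2, 1e-1]}

# Automatic search over the kernel and regularization parameters with k-fold CV.
search = GridSearchCV(pipe, param_grid, cv=StratifiedKFold(n_splits=5))
search.fit(X, y)
print(search.best_params_, search.best_score_)
```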

Relevance: 80.00%

Publisher:

Abstract:

Collaborative filtering is an effective recommendation technique in which an individual's preferences can be predicted from the preferences of other members. Early algorithms often relied on strong locality in the preference data: it is enough to predict a user's preference for a particular item from a small subset of other users with similar tastes, or of other items with similar properties. More recently, dimensionality reduction techniques, which are based on co-occurrence patterns rather than locality, have proved equally competitive. This paper explores and extends a probabilistic model known as the Boltzmann Machine for collaborative filtering tasks. It seamlessly integrates both similarity and co-occurrence in a principled manner. In particular, we study parameterisation options for dealing with the ordinal nature of the preferences, and propose joint modelling of both the user-based and item-based processes. Experiments on moderate- and large-scale movie recommendation show that our framework rivals existing well-known methods.
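As a rough, assumed illustration of the restricted Boltzmann machine family this work draws on (not the paper's ordinal, joint user-and-item model), the sketch below fits scikit-learn's BernoulliRBM to binarized user preference vectors and uses the learned hidden features for a simple neighbourhood-style prediction:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.metrics.pairwise import cosine_similarity

# Toy binarized user-item matrix: 1 if the user liked the item, else 0.
rng = np.random.default_rng(0)
ratings = (rng.random((200, 50)) > 0.8).astype(float)   # 200 users, 50 items

# The RBM learns a low-dimensional hidden representation of each user's preferences.
rbm = BernoulliRBM(n_components=10, learning_rate=0.05, n_iter=30, random_state=0)
hidden = rbm.fit_transform(ratings)

# Neighbourhood-style prediction in the hidden space: score items for user 0 by
# similarity-weighted preferences of the other users.
sims = cosine_similarity(hidden[0:1], hidden)[0]
sims[0] = 0.0
scores = sims @ ratings / (sims.sum() + 1e-12)
recommended = np.argsort(-scores)[:5]
print("top recommendations for user 0:", recommended)
```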

Relevance: 80.00%

Publisher:

Abstract:

We present a computational framework to automatically discover high-order temporal social patterns from very noisy and sparse location data. We introduce the concept of a social footprint and present a method to construct a codebook, enabling the transformation of raw sensor data into a collection of social pages. Each page captures a user's social activities over a regular time period and is represented as a sequence of encoded footprints. Computable patterns are then defined as repeated structures found in these sequences. To discover them, we draw on modeling tools from document analysis and propose a Latent Social theme Dirichlet Allocation (LSDA) model, a version of the N-gram topic model in [6] with additional modeling of personal context. This model can be viewed as a Bayesian clustering method that jointly discovers temporal collocations of footprints and exploits statistical strength across social pages to automatically discover high-order patterns. Alternatively, it can be viewed as a dimensionality reduction method in which the reduced latent space can be interpreted as the hidden social 'theme', a more abstract perception of a user's daily activities. Applying this framework to a real-world noisy dataset collected over 1.5 years, we show that many useful and interesting patterns can be computed, and that interpretable social themes can be deduced from the discovered patterns.
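A minimal, assumed sketch of the plain topic-model baseline this work builds on: treat each social page as a document of codebook footprints and fit standard LDA with scikit-learn (the LSDA extensions for n-grams and personal context are not reproduced here, and the toy pages are invented):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy "social pages": each page is one day's sequence of codebook footprints.
pages = [
    "home home cafe office office office gym home",
    "home cafe office office office office home home",
    "home park park cafe cafe home home home",
    "home park cafe cafe park home home home",
]

counts = CountVectorizer().fit_transform(pages)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Per-page mixture over latent "social themes" (the reduced representation).
print(lda.transform(counts))
```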

Relevance: 80.00%

Publisher:

Abstract:

Learning a robust projection from a small number of training samples is still a challenging problem in face recognition, especially when the unseen faces have extreme variations in pose, illumination and facial expression. To address this problem, we propose a framework, formulated under statistical learning theory, that facilitates robust learning of a discriminative projection. Dimensionality reduction using the projection matrix is combined with a linear classifier in the regularized framework of lasso regression, and the projection matrix together with the classifier parameters is found by solving an optimization problem over the Stiefel manifold. Experimental results on standard face databases suggest that the proposed method outperforms some recent regularized techniques when the number of training samples is small.
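A rough numerical sketch of the underlying idea, assuming a squared loss with an l1-regularized linear model as a stand-in for the classifier: alternate between fitting lasso coefficients on the projected data and taking a gradient step on the projection, retracting back onto the Stiefel manifold via a QR factorization. This is an illustrative toy, not the paper's optimizer, and all data and parameters are invented:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d, k = 100, 50, 5
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Random orthonormal initial projection W (d x k): a point on the Stiefel manifold.
W, _ = np.linalg.qr(rng.normal(size=(d, k)))

lasso = Lasso(alpha=0.1)
step = 1e-3
for _ in range(50):
    Z = X @ W                              # reduced-dimension features
    lasso.fit(Z, y)
    beta = lasso.coef_                     # l1-regularized linear model
    resid = y - Z @ beta - lasso.intercept_
    grad_W = -2.0 * X.T @ np.outer(resid, beta) / n   # gradient of the squared loss w.r.t. W
    W = W - step * grad_W
    W, _ = np.linalg.qr(W)                 # QR retraction back onto the Stiefel manifold
```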