876 resultados para Feature extraction
Resumo:
In this paper, we develop a game theoretic approach for clustering features in a learning problem. Feature clustering can serve as an important preprocessing step in many problems such as feature selection, dimensionality reduction, etc. In this approach, we view features as rational players of a coalitional game where they form coalitions (or clusters) among themselves in order to maximize their individual payoffs. We show how Nash Stable Partition (NSP), a well known concept in the coalitional game theory, provides a natural way of clustering features. Through this approach, one can obtain some desirable properties of the clusters by choosing appropriate payoff functions. For a small number of features, the NSP based clustering can be found by solving an integer linear program (ILP). However, for large number of features, the ILP based approach does not scale well and hence we propose a hierarchical approach. Interestingly, a key result that we prove on the equivalence between a k-size NSP of a coalitional game and minimum k-cut of an appropriately constructed graph comes in handy for large scale problems. In this paper, we use feature selection problem (in a classification setting) as a running example to illustrate our approach. We conduct experiments to illustrate the efficacy of our approach.
Resumo:
We address the classical problem of delta feature computation, and interpret the operation involved in terms of Savitzky- Golay (SG) filtering. Features such as themel-frequency cepstral coefficients (MFCCs), obtained based on short-time spectra of the speech signal, are commonly used in speech recognition tasks. In order to incorporate the dynamics of speech, auxiliary delta and delta-delta features, which are computed as temporal derivatives of the original features, are used. Typically, the delta features are computed in a smooth fashion using local least-squares (LS) polynomial fitting on each feature vector component trajectory. In the light of the original work of Savitzky and Golay, and a recent article by Schafer in IEEE Signal Processing Magazine, we interpret the dynamic feature vector computation for arbitrary derivative orders as SG filtering with a fixed impulse response. This filtering equivalence brings in significantly lower latency with no loss in accuracy, as validated by results on a TIMIT phoneme recognition task. The SG filters involved in dynamic parameter computation can be viewed as modulation filters, proposed by Hermansky.
Resumo:
Analysis of high resolution satellite images has been an important research topic for urban analysis. One of the important features of urban areas in urban analysis is the automatic road network extraction. Two approaches for road extraction based on Level Set and Mean Shift methods are proposed. From an original image it is difficult and computationally expensive to extract roads due to presences of other road-like features with straight edges. The image is preprocessed to improve the tolerance by reducing the noise (the buildings, parking lots, vegetation regions and other open spaces) and roads are first extracted as elongated regions, nonlinear noise segments are removed using a median filter (based on the fact that road networks constitute large number of small linear structures). Then road extraction is performed using Level Set and Mean Shift method. Finally the accuracy for the road extracted images is evaluated based on quality measures. The 1m resolution IKONOS data has been used for the experiment.
Resumo:
We analyze the spectral zero-crossing rate (SZCR) properties of transient signals and show that SZCR contains accurate localization information about the transient. For a train of pulses containing transient events, the SZCR computed on a sliding window basis is useful in locating the impulse locations accurately. We present the properties of SZCR on standard stylized signal models and then show how it may be used to estimate the epochs in speech signals. We also present comparisons with some state-of-the-art techniques that are based on the group-delay function. Experiments on real speech show that the proposed SZCR technique is better than other group-delay-based epoch detectors. In the presence of noise, a comparison with the zero-frequency filtering technique (ZFF) and Dynamic programming projected Phase-Slope Algorithm (DYPSA) showed that performance of the SZCR technique is better than DYPSA and inferior to that of ZFF. For highpass-filtered speech, where ZFF performance suffers drastically, the identification rates of SZCR are better than those of DYPSA.
Resumo:
Classification of a large document collection involves dealing with a huge feature space where each distinct word is a feature. In such an environment, classification is a costly task both in terms of running time and computing resources. Further it will not guarantee optimal results because it is likely to overfit by considering every feature for classification. In such a context, feature selection is inevitable. This work analyses the feature selection methods, explores the relations among them and attempts to find a minimal subset of features which are discriminative for document classification.
Resumo:
In this paper, we present a methodology for identifying best features from a large feature space. In high dimensional feature space nearest neighbor search is meaningless. In this feature space we see quality and performance issue with nearest neighbor search. Many data mining algorithms use nearest neighbor search. So instead of doing nearest neighbor search using all the features we need to select relevant features. We propose feature selection using Non-negative Matrix Factorization(NMF) and its application to nearest neighbor search. Recent clustering algorithm based on Locally Consistent Concept Factorization(LCCF) shows better quality of document clustering by using local geometrical and discriminating structure of the data. By using our feature selection method we have shown further improvement of performance in the clustering.
Resumo:
Outlier detection in high dimensional categorical data has been a problem of much interest due to the extensive use of qualitative features for describing the data across various application areas. Though there exist various established methods for dealing with the dimensionality aspect through feature selection on numerical data, the categorical domain is actively being explored. As outlier detection is generally considered as an unsupervised learning problem due to lack of knowledge about the nature of various types of outliers, the related feature selection task also needs to be handled in a similar manner. This motivates the need to develop an unsupervised feature selection algorithm for efficient detection of outliers in categorical data. Addressing this aspect, we propose a novel feature selection algorithm based on the mutual information measure and the entropy computation. The redundancy among the features is characterized using the mutual information measure for identifying a suitable feature subset with less redundancy. The performance of the proposed algorithm in comparison with the information gain based feature selection shows its effectiveness for outlier detection. The efficacy of the proposed algorithm is demonstrated on various high-dimensional benchmark data sets employing two existing outlier detection methods.
Resumo:
Clustering has been the most popular method for data exploration. Clustering is partitioning the data set into sub-partitions based on some measures say the distance measure, each partition has its own significant information. There are a number of algorithms explored for this purpose, one such algorithm is the Particle Swarm Optimization(PSO) which is a population based heuristic search technique derived from swarm intelligence. In this paper we present an improved version of the Particle Swarm Optimization where, each feature of the data set is given significance accordingly by adding some random weights, which also minimizes the distortions in the dataset if any. The performance of the above proposed algorithm is evaluated using some benchmark datasets from Machine Learning Repository. The experimental results shows that our proposed methodology performs significantly better than the previously performed experiments.
Resumo:
Identifying symmetry in scalar fields is a recent area of research in scientific visualization and computer graphics communities. Symmetry detection techniques based on abstract representations of the scalar field use only limited geometric information in their analysis. Hence they may not be suited for applications that study the geometric properties of the regions in the domain. On the other hand, methods that accumulate local evidence of symmetry through a voting procedure have been successfully used for detecting geometric symmetry in shapes. We extend such a technique to scalar fields and use it to detect geometrically symmetric regions in synthetic as well as real-world datasets. Identifying symmetry in the scalar field can significantly improve visualization and interactive exploration of the data. We demonstrate different applications of the symmetry detection method to scientific visualization: query-based exploration of scalar fields, linked selection in symmetric regions for interactive visualization, and classification of geometrically symmetric regions and its application to anomaly detection.
Resumo:
We consider the problem of finding the best features for value function approximation in reinforcement learning and develop an online algorithm to optimize the mean square Bellman error objective. For any given feature value, our algorithm performs gradient search in the parameter space via a residual gradient scheme and, on a slower timescale, also performs gradient search in the Grassman manifold of features. We present a proof of convergence of our algorithm. We show empirical results using our algorithm as well as a similar algorithm that uses temporal difference learning in place of the residual gradient scheme for the faster timescale updates.
Resumo:
Data clustering groups data so that data which are similar to each other are in the same group and data which are dissimilar to each other are in different groups. Since generally clustering is a subjective activity, it is possible to get different clusterings of the same data depending on the need. This paper attempts to find the best clustering of the data by first carrying out feature selection and using only the selected features, for clustering. A PSO (Particle Swarm Optimization)has been used for clustering but feature selection has also been carried out simultaneously. The performance of the above proposed algorithm is evaluated on some benchmark data sets. The experimental results shows the proposed methodology outperforms the previous approaches such as basic PSO and Kmeans for the clustering problem.
Resumo:
Acoustic signal variation and female preference for different signal components constitute the prerequisite framework to study the mechanisms of sexual selection that shape acoustic communication. Despite several studies of acoustic communication in crickets, information on both male calling song variation in the field and female preference in the same system is lacking for most species. Previous studies on acoustic signal variation either were carried out on populations maintained in the laboratory or did not investigate signal repeatability. We therefore used repeatability analysis to quantify variation in the spectral, temporal and amplitudinal characteristics of the male calling song of the field cricket Plebeiogryllus guttiventris in a wild population, at two temporal scales, within and across nights. Carrier frequency (CF) was the most repeatable character across nights, whereas chirp period (CP) had low repeatability across nights. We investigated whether female preferences were more likely to be based on features with high (CF) or low (CP) repeatability. Females showed no consistent preferences for CF but were significantly more attracted towards signals with short CPs. The attractiveness of lower CP calls disappeared, however, when traded off with sound pressure level (SPL). SPL was the only acoustic feature that was significantly positively correlated with male body size. Since relative SPL affects female phonotaxis strongly and can vary unpredictably based on male spacing, our results suggest that even strong female preferences for acoustic features may not necessarily translate into greater advantage for males possessing these features in the field. (C) 2013 The Association for the Study of Animal Behaviour. Published by Elsevier Ltd. All rights reserved.
Resumo:
Epoch is defined as the instant of significant excitation within a pitch period of voiced speech. Epoch extraction continues to attract the interest of researchers because of its significance in speech analysis. Existing high performance epoch extraction algorithms require either dynamic programming techniques or a priori information of the average pitch period. An algorithm without such requirements is proposed based on integrated linear prediction residual (ILPR) which resembles the voice source signal. Half wave rectified and negated ILPR (or Hilbert transform of ILPR) is used as the pre-processed signal. A new non-linear temporal measure named the plosion index (PI) has been proposed for detecting `transients' in speech signal. An extension of PI, called the dynamic plosion index (DPI) is applied on pre-processed signal to estimate the epochs. The proposed DPI algorithm is validated using six large databases which provide simultaneous EGG recordings. Creaky and singing voice samples are also analyzed. The algorithm has been tested for its robustness in the presence of additive white and babble noise and on simulated telephone quality speech. The performance of the DPI algorithm is found to be comparable or better than five state-of-the-art techniques for the experiments considered.
Resumo:
Variable Endmember Constrained Least Square (VECLS) technique is proposed to account endmember variability in the linear mixture model by incorporating the variance for each class, the signals of which varies from pixel to pixel due to change in urban land cover (LC) structures. VECLS is first tested with a computer simulated three class endmember considering four bands having small, medium and large variability with three different spatial resolutions. The technique is next validated with real datasets of IKONOS, Landsat ETM+ and MODIS. The results show that correlation between actual and estimated proportion is higher by an average of 0.25 for the artificial datasets compared to a situation where variability is not considered. With IKONOS, Landsat ETM+ and MODIS data, the average correlation increased by 0.15 for 2 and 3 classes and by 0.19 for 4 classes, when compared to single endmember per class. (C) 2013 COSPAR. Published by Elsevier Ltd. All rights reserved.
Resumo:
Non-human primate populations, other than responding appropriately to naturally occurring challenges, also need to cope with anthropogenic factors such as environmental pollution, resource depletion, and habitat destruction. Populations and individuals are likely to show considerable variations in food extraction abilities, with some populations and individuals more efficient than others at exploiting a set of resources. In this study, we examined among urban free-ranging bonnet macaques, Macaca radiata (a) local differences in food extraction abilities, (b) between-individual variation and within-individual consistency in problem-solving success and the underlying problem-solving characteristics, and (c) behavioral patterns associated with higher efficiency in food extraction. When presented with novel food extraction tasks, the urban macaques having more frequent exposure to novel physical objects in their surroundings, extracted food material from PET bottles and also solved another food extraction task (i.e., extracting an orange from a wire mesh box), more often than those living under more natural conditions. Adults solved the tasks more frequently than juveniles, and females more frequently than males. Both solution-technique and problem-solving characteristics varied across individuals but remained consistent within each individual across the successive presentations of PET bottles. The macaques that solved the tasks showed lesser within-individual variation in their food extraction behavior as compared to those that failed to solve the tasks. A few macaques appropriately modified their problem-solving behavior in accordance with the task requirements and solved the modified versions of the tasks without trial-and-error learning. These observations are ecologically relevant - they demonstrate considerable local differences in food extraction abilities, between-individual variation and within-individual consistency in food extraction techniques among free-ranging bonnet macaques, possibly affecting the species' local adaptability and resilience to environmental changes.