968 results for Unsupervised classification
Abstract:
Segmental dynamic time warping (DTW) has been demonstrated to be a useful technique for finding acoustic similarity scores between segments of two speech utterances. Due to its high computational requirements, it has had to be computed offline, limiting the applications of the technique. In this paper, we present results of parallelizing this task by distributing the workload in either a static or dynamic way on an 8-processor cluster, and we discuss the trade-offs among different distribution schemes. We show that online unsupervised pattern discovery using segmental DTW is plausible with as few as 8 processors. This brings the task within reach of today's general purpose multi-core servers. We also show results on a 32-processor system and discuss factors affecting the scalability of our methods.
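As a rough illustration of the computation that makes this task expensive, here is a minimal sketch of classic DTW between two one-dimensional sequences. It is not the segmental variant or the parallel scheme from the paper; the function name `dtw_distance` and the absolute-difference local cost are assumptions chosen for brevity.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two numeric sequences (illustrative only)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

The table has one cell per pair of frames, so the work grows with the product of the two sequence lengths; this quadratic cost, repeated over many utterance pairs, is the kind of workload that benefits from being distributed across processors.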
Abstract:
Land cover (LC) refers to what is actually present on the ground and provides insights into the underlying solution for improving the conditions of many issues, from water pollution to sustainable economic development. One of the greatest challenges of modeling LC changes using remotely sensed (RS) data is scale-resolution mismatch: the spatial resolution of detail is less than what is required, and the resulting sub-pixel heterogeneity is important but not readily knowable. In practice, many pixels consist of a mixture of multiple classes. The solution to the mixed pixel problem typically centers on soft classification techniques that estimate the proportion of a certain class within each pixel. However, the spatial distribution of these class components within the pixel remains unknown. This study investigates Orthogonal Subspace Projection, an unmixing technique, and uses a pixel-swapping algorithm to predict the spatial distribution of LC at sub-pixel resolution. Both algorithms are applied to many simulated and actual satellite images for validation. The accuracy on the simulated images is ~100%, while IRS LISS-III and MODIS data show accuracies of 76.6% and 73.02%, respectively. This demonstrates the relevance of these techniques for applications such as urban-nonurban and forest-nonforest classification studies.
Abstract:
Structural alignments are the most widely used tools for comparing proteins with low sequence similarity. The main contribution of this paper is to derive various kernels on proteins from structural alignments, which do not use sequence information. Central to the kernels is a novel alignment algorithm which matches substructures of fixed size using spectral graph matching techniques. We derive positive semi-definite kernels which capture the notion of similarity between substructures. Using these as a base, more sophisticated kernels on protein structures are proposed. To empirically evaluate the kernels, we used 40% sequence non-redundant structures from 15 different SCOP superfamilies. The kernels, when used with SVMs, show competitive performance with CE, a state-of-the-art structure comparison program.
Abstract:
This paper presents a novel Second Order Cone Programming (SOCP) formulation for large scale binary classification tasks. Assuming that the class conditional densities are mixture distributions, where each component of the mixture has a spherical covariance, the second order statistics of the components can be estimated efficiently using clustering algorithms like BIRCH. For each cluster, the second order moments are used to derive a second order cone constraint via a Chebyshev-Cantelli inequality. This constraint ensures that any data point in the cluster is classified correctly with a high probability. This leads to a large margin SOCP formulation whose size depends on the number of clusters rather than the number of training data points. Hence, the proposed formulation scales well for large datasets when compared to the state-of-the-art classifiers, Support Vector Machines (SVMs). Experiments on real world and synthetic datasets show that the proposed algorithm outperforms SVM solvers in terms of training time and achieves similar accuracies.
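The per-cluster constraint described above can be sketched as follows. This assumes the standard form that a Chebyshev-Cantelli bound yields for a component with mean `mu` and spherical covariance `sigma**2 * I`: the cluster is on the correct side of the margin with probability at least `eta` whenever `label * (w.mu + b) >= 1 + kappa * sigma * ||w||`, with `kappa = sqrt(eta / (1 - eta))`. The abstract does not give the exact formulation, so the function names and the margin constant of 1 are assumptions.

```python
import math


def cantelli_kappa(eta):
    """Multiplier kappa = sqrt(eta / (1 - eta)) for confidence level eta."""
    if not 0.0 <= eta < 1.0:
        raise ValueError("eta must lie in [0, 1)")
    return math.sqrt(eta / (1.0 - eta))


def cluster_constraint_holds(w, b, mu, sigma, label, eta):
    """Check the assumed second order cone constraint for one cluster.

    w, mu : sequences of floats (hyperplane normal, cluster mean)
    b     : hyperplane offset
    sigma : standard deviation of the spherical cluster
    label : +1 or -1
    """
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    margin = label * (sum(wi * mi for wi, mi in zip(w, mu)) + b)
    return margin >= 1.0 + cantelli_kappa(eta) * sigma * norm_w
```

One such constraint is needed per cluster rather than per training point, which is why the size of the resulting problem depends on the number of clusters.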
Abstract:
The covalent linkage between the side-chain and the backbone nitrogen atom of proline leads to the formation of the five-membered pyrrolidine ring and hence restriction of the backbone torsional angle phi to values of -60 degrees +/- 30 degrees for L-proline. Diproline segments constitute a chain fragment with considerably reduced conformational choices. In the current study, the conformational states for the diproline segment ((L)Pro-(L)Pro) found in proteins have been investigated, with an emphasis on the cis and trans states for the Pro-Pro peptide bond. The occurrence of diproline segments in turns and other secondary structures has been studied and compared to that of Xaa-Pro-Yaa segments in proteins, which gives a better understanding of the restriction imposed on other residues by the diproline segment and the single proline residue. The study indicates that P(II)-P(II) and P(II)-alpha are the most favorable conformational states for the diproline segment. The analysis of Xaa-Pro-Yaa sequences reveals that the Xaa-Pro peptide bond exists preferably as the trans conformer rather than the cis conformer. The present study may lead to a better understanding of the behavior of proline occurring in diproline segments, which can facilitate various designed diproline-based synthetic templates for biological and structural studies. (C) 2011 Wiley Periodicals, Inc. Biopolymers 97: 54-64, 2012.
Abstract:
A technique is proposed for classifying respiratory volume waveforms (RVW) into normal and abnormal categories of respiratory pathways. The proposed method transforms the temporal sequence into the frequency domain by using an orthogonal transform, namely the discrete cosine transform (DCT), and the transformed signal is pole-zero modelled. A Bayes classifier using model pole angles as the feature vector performed satisfactorily when a limited number of RVWs recorded under the deep and rapid (DR) manoeuvre were classified.
Abstract:
Earthquakes cause massive road damage, which in turn adversely affects society. Previous studies have quantified the damage caused to residential and commercial buildings; however, not many studies have been conducted to quantify road damage caused by earthquakes. In this study, an attempt has been made to propose a new scale to classify and quantify road damage due to earthquakes based on data collected from major earthquakes in the past. The proposed classification for road damage due to earthquakes is called the road damage scale (RDS). Earthquake details such as magnitude, distance of road damage from the epicenter, focal depth, and photographs of damaged roads have been collected from various sources with reported modified Mercalli intensity (MMI). The widely used MMI scale is found to be inadequate to clearly define road damage. The proposed RDS is applied to various reported instances of road damage, which are reclassified as per RDS. The correlation between RDS and the earthquake parameters of magnitude, epicenter distance, hypocenter distance, and the combination of magnitude with epicenter and hypocenter distance has been studied using available data. It is observed that the proposed RDS correlates well with the available earthquake data when compared with the MMI scale. Among the several correlations, the correlation between RDS and the combination of magnitude and epicenter distance is appropriate. A summary of these correlations, their limitations, and the applicability of the proposed scale to forecasting road damage and carrying out vulnerability analysis in urban areas are presented in the paper.
Abstract:
The widely used Bayesian classifier is based on the assumption of equal prior probabilities for all the classes. However, inclusion of equal prior probabilities may not guarantee high classification accuracy for the individual classes. Here, we propose a novel technique, the Hybrid Bayesian Classifier (HBC), in which the class prior probabilities are determined by unmixing supplemental low spatial-high spectral resolution multispectral (MS) data and are assigned to every pixel in high spatial-low spectral resolution MS data during Bayesian classification. This is demonstrated with two separate experiments. In the first, class abundances are estimated per pixel by unmixing Moderate Resolution Imaging Spectroradiometer data to be used as prior probabilities, while posterior probabilities are determined from the training data obtained from the ground. These have been used for classifying the Indian Remote Sensing Satellite LISS-III MS data through the Bayesian classifier. In the second experiment, abundances obtained by unmixing Landsat Enhanced Thematic Mapper Plus data are used as priors, and posterior probabilities are determined from the ground data to classify IKONOS MS images through the Bayesian classifier. The results indicated that HBC systematically exploited the information from two image sources, improving the overall accuracy of LISS-III MS classification by 6% and IKONOS MS classification by 9%. Inclusion of prior probabilities increased the average producer's and user's accuracies by 5.5% and 6.5% in the case of LISS-III MS with six classes, and by 12.5% and 5.4% in IKONOS MS for the five classes considered.
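The effect of replacing equal priors with per-pixel priors can be seen in a small sketch of Bayes' rule. It assumes that each pixel comes with class-conditional likelihoods (from training data) and a vector of class abundances used as priors; the abstract does not spell out these details, and the function names are hypothetical.

```python
def posterior(likelihoods, priors):
    """Normalised posterior over classes: likelihood * prior / evidence."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)
    if total == 0:
        raise ValueError("all classes have zero joint probability")
    return [j / total for j in joint]


def classify(likelihoods, priors):
    """Index of the class with the highest posterior probability."""
    post = posterior(likelihoods, priors)
    return max(range(len(post)), key=post.__getitem__)


# Same likelihoods, different priors (made-up numbers for illustration).
pixel_likelihoods = [0.5, 0.4]
equal_priors = [0.5, 0.5]
abundance_priors = [0.2, 0.8]  # e.g. taken from unmixed coarse-resolution data
label_equal = classify(pixel_likelihoods, equal_priors)        # class 0
label_hybrid = classify(pixel_likelihoods, abundance_priors)   # class 1
```

With equal priors the decision depends on the likelihoods alone; an informative prior can change the label when the likelihoods are close.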
Abstract:
In this paper, we give a brief review of pattern classification algorithms based on discriminant analysis. We then apply these algorithms to classify movement direction based on multivariate local field potentials recorded from a microelectrode array in the primary motor cortex of a monkey performing a reaching task. Using different methods, we obtain prediction accuracies between 55% and 90%, which are significantly above the chance level of 12.5%.
Abstract:
Proving the unsatisfiability of propositional Boolean formulas has applications in a wide range of fields. Minimal Unsatisfiable Sets (MUS) are signatures of the property of unsatisfiability in formulas and our understanding of these signatures can be very helpful in answering various algorithmic and structural questions relating to unsatisfiability. In this paper, we explore some combinatorial properties of MUS and use them to devise a classification scheme for MUS. We also derive bounds on the sizes of MUS in Horn, 2-SAT and 3-SAT formulas.
Abstract:
In this paper, we consider the problem of time series classification. Using piecewise linear interpolation, various novel kernels are obtained which can be used with support vector machines to design classifiers capable of deciding the class of a given time series. The approach is general and is applicable in many scenarios. We apply the method to the task of online Tamil handwritten character recognition, with promising results.
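One simple way piecewise linear interpolation can lead to a kernel on time series of different lengths is sketched below: each series is resampled to a fixed number of points and a linear kernel is taken on the resulting vectors. This is an illustrative assumption, not necessarily one of the kernels derived in the paper; all function names are hypothetical.

```python
def interpolate(times, values, t):
    """Piecewise linear interpolation at time t (times strictly increasing)."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    for k in range(1, len(times)):
        if t <= times[k]:
            t0, t1 = times[k - 1], times[k]
            frac = (t - t0) / (t1 - t0)
            return values[k - 1] + frac * (values[k] - values[k - 1])


def resample(times, values, n):
    """Sample the interpolant at n >= 2 equally spaced times."""
    start, end = times[0], times[-1]
    step = (end - start) / (n - 1)
    return [interpolate(times, values, start + i * step) for i in range(n)]


def linear_kernel(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def series_kernel(series_a, series_b, n=16):
    """Kernel between two (times, values) series via fixed-length resampling."""
    xa = resample(series_a[0], series_a[1], n)
    xb = resample(series_b[0], series_b[1], n)
    return linear_kernel(xa, xb)
```

Because the kernel is a dot product of fixed-length feature vectors, it is positive semi-definite and can be passed to an SVM implementation that accepts precomputed kernel matrices.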