977 resultados para Document classification
Resumo:
Land cover (LC) refers to what is actually present on the ground and provide insights into the underlying solution for improving the conditions of many issues, from water pollution to sustainable economic development. One of the greatest challenges of modeling LC changes using remotely sensed (RS) data is of scale-resolution mismatch: that the spatial resolution of detail is less than what is required, and that this sub-pixel level heterogeneity is important but not readily knowable. However, many pixels consist of a mixture of multiple classes. The solution to mixed pixel problem typically centers on soft classification techniques that are used to estimate the proportion of a certain class within each pixel. However, the spatial distribution of these class components within the pixel remains unknown. This study investigates Orthogonal Subspace Projection - an unmixing technique and uses pixel-swapping algorithm for predicting the spatial distribution of LC at sub-pixel resolution. Both the algorithms are applied on many simulated and actual satellite images for validation. The accuracy on the simulated images is ~100%, while IRS LISS-III and MODIS data show accuracy of 76.6% and 73.02% respectively. This demonstrates the relevance of these techniques for applications such as urban-nonurban, forest-nonforest classification studies etc.
Resumo:
In general the objective of accurately encoding the input data and the objective of extracting good features to facilitate classification are not consistent with each other. As a result, good encoding methods may not be effective mechanisms for classification. In this paper, an earlier proposed unsupervised feature extraction mechanism for pattern classification has been extended to obtain an invertible map. The method of bimodal projection-based features was inspired by the general class of methods called projection pursuit. The principle of projection pursuit concentrates on projections that discriminate between clusters and not faithful representations. The basic feature map obtained by the method of bimodal projections has been extended to overcome this. The extended feature map is an embedding of the input space in the feature space. As a result, the inverse map exists and hence the representation of the input space in the feature space is exact. This map can be naturally expressed as a feedforward neural network.
Resumo:
Structural alignments are the most widely used tools for comparing proteins with low sequence similarity. The main contribution of this paper is to derive various kernels on proteins from structural alignments, which do not use sequence information. Central to the kernels is a novel alignment algorithm which matches substructures of fixed size using spectral graph matching techniques. We derive positive semi-definite kernels which capture the notion of similarity between substructures. Using these as base more sophisticated kernels on protein structures are proposed. To empirically evaluate the kernels we used a 40% sequence non-redundant structures from 15 different SCOP superfamilies. The kernels when used with SVMs show competitive performance with CE, a state of the art structure comparison program.
Resumo:
This paper presents a novel Second Order Cone Programming (SOCP) formulation for large scale binary classification tasks. Assuming that the class conditional densities are mixture distributions, where each component of the mixture has a spherical covariance, the second order statistics of the components can be estimated efficiently using clustering algorithms like BIRCH. For each cluster, the second order moments are used to derive a second order cone constraint via a Chebyshev-Cantelli inequality. This constraint ensures that any data point in the cluster is classified correctly with a high probability. This leads to a large margin SOCP formulation whose size depends on the number of clusters rather than the number of training data points. Hence, the proposed formulation scales well for large datasets when compared to the state-of-the-art classifiers, Support Vector Machines (SVMs). Experiments on real world and synthetic datasets show that the proposed algorithm outperforms SVM solvers in terms of training time and achieves similar accuracies.
Resumo:
The covalent linkage between the side-chain and the backbone nitrogen atom of proline leads to the formation of the five-membered pyrrolidine ring and hence restriction of the backbone torsional angle phi to values of -60 degrees +/- 30 degrees for the L-proline. Diproline segments constitute a chain fragment with considerably reduced conformational choices. In the current study, the conformational states for the diproline segment ((L)Pro-(L)Pro) found in proteins has been investigated with an emphasis on the cis and trans states for the Pro-Pro peptide bond. The occurrence of diproline segments in turns and other secondary structures has been studied and compared to that of Xaa-Pro-Yaa segments in proteins which gives us a better understanding on the restriction imposed on other residues by the diproline segment and the single proline residue. The study indicates that P(II)-P(II) and P(II)-alpha are the most favorable conformational states for the diproline segment. The analysis on Xaa-Pro-Yaa sequences reveals that the XaaPro peptide bond exists preferably as the trans conformer rather than the cis conformer. The present study may lead to a better understanding of the behavior of proline occurring in diproline segments which can facilitate various designed diproline-based synthetic templates for biological and structural studies. (C) 2011 Wiley Periodicals, Inc. Biopolymers 97: 54-64, 2012.
Resumo:
This paper investigates a new Glowworm Swarm Optimization (GSO) clustering algorithm for hierarchical splitting and merging of automatic multi-spectral satellite image classification (land cover mapping problem). Amongst the multiple benefits and uses of remote sensing, one of the most important has been its use in solving the problem of land cover mapping. Image classification forms the core of the solution to the land cover mapping problem. No single classifier can prove to classify all the basic land cover classes of an urban region in a satisfactory manner. In unsupervised classification methods, the automatic generation of clusters to classify a huge database is not exploited to their full potential. The proposed methodology searches for the best possible number of clusters and its center using Glowworm Swarm Optimization (GSO). Using these clusters, we classify by merging based on parametric method (k-means technique). The performance of the proposed unsupervised classification technique is evaluated for Landsat 7 thematic mapper image. Results are evaluated in terms of the classification efficiency - individual, average and overall.
Resumo:
A technique is proposed for classifying respiratory volume waveforms(RVW) into normal and abnormal categories of respiratory pathways. The proposed method transforms the temporal sequence into frequency domain by using an orthogonal transform, namely discrete cosine transform (DCT) and the transformed signal is pole-zero modelled. A Bayes classifier using model pole angles as the feature vector performed satisfactorily when a limited number of RVWs recorded under deep and rapid (DR) manoeuvre are classified.
Resumo:
We present a fractal coding method to recognize online handwritten Tamil characters and propose a novel technique to increase the efficiency in terms of time while coding and decoding. This technique exploits the redundancy in data, thereby achieving better compression and usage of lesser memory. It also reduces the encoding time and causes little distortion during reconstruction. Experiments have been conducted to use these fractal codes to classify the online handwritten Tamil characters from the IWFHR 2006 competition dataset. In one approach, we use fractal coding and decoding process. A recognition accuracy of 90% has been achieved by using DTW for distortion evaluation during classification and encoding processes as compared to 78% using nearest neighbor classifier. In other experiments, we use the fractal code, fractal dimensions and features derived from fractal codes as features in separate classifiers. While the fractal code is successful as a feature, the other two features are not able to capture the wide within-class variations.
Resumo:
Earthquakes cause massive road damage which in turn causes adverse effects on the society. Previous studies have quantified the damage caused to residential and commercial buildings; however, not many studies have been conducted to quantify road damage caused by earthquakes. In this study, an attempt has been made to propose a new scale to classify and quantify the road damage due to earthquakes based on the data collected from major earthquakes in the past. The proposed classification for road damage due to earthquake is called as road damage scale (RDS). Earthquake details such as magnitude, distance of road damage from the epicenter, focal depth, and photographs of damaged roads have been collected from various sources with reported modified Mercalli intensity (MMI). The widely used MMI scale is found to be inadequate to clearly define the road damage. The proposed RDS is applied to various reported road damage and reclassified as per RDS. The correlation between RDS and earthquake parameters of magnitude, epicenter distance, hypocenter distance, and combination of magnitude with epicenter and hypocenter distance has been studied using available data. It is observed that the proposed RDS correlates well with the available earthquake data when compared with the MMI scale. Among several correlations, correlation between RDS and combination of magnitude and epicenter distance is appropriate. Summary of these correlations, their limitations, and the applicability of the proposed scale to forecast road damages and to carry out vulnerability analysis in urban areas is presented in the paper.