41 results for Supervised and Unsupervised Classification
Abstract:
Heat shock protein information resource (HSPIR) is a concerted database of six major heat shock proteins (HSPs), namely, Hsp70, Hsp40, Hsp60, Hsp90, Hsp100 and small HSP. The HSPs are essential for the survival of all living organisms, as they protect the conformations of proteins on exposure to various stress conditions. They are a highly conserved group of proteins involved in diverse physiological functions, including de novo folding, disaggregation and protein trafficking. Moreover, their critical role in the control of disease progression made them a prime target of research. Presently, limited information is available on HSPs in reference to their identification and structural classification across genera. To that extent, HSPIR provides manually curated information on sequence, structure, classification, ontology, domain organization, localization and possible biological functions extracted from UniProt, GenBank, Protein Data Bank and the literature. The database offers interactive search with incorporated tools, which enhances the analysis. HSPIR is a reliable resource for researchers exploring structure, function and evolution of HSPs.
Abstract:
This paper discusses a novel high-speed approach for human action recognition in the H.264/AVC compressed domain. The proposed algorithm utilizes cues from quantization parameters and motion vectors extracted from the compressed video sequence for feature extraction and further classification using Support Vector Machines (SVM). The ultimate goal of our work is to present a much faster algorithm than pixel-domain counterparts, with comparable accuracy, utilizing only the sparse information from compressed video. Partial decoding rules out the complexity of full decoding, and minimizes computational load and memory usage, which can result in reduced hardware utilization and fast recognition results. The proposed approach can handle illumination changes, scale, and appearance variations, and is robust in outdoor as well as indoor testing scenarios. We have tested our method on two benchmark action datasets and achieved more than 85% accuracy. The proposed algorithm classifies actions at a speed (>2000 fps) approximately 100 times faster than existing state-of-the-art pixel-domain algorithms.
Abstract:
This paper discusses a novel high-speed approach for human action recognition in the H.264/AVC compressed domain. The proposed algorithm utilizes cues from quantization parameters and motion vectors extracted from the compressed video sequence for feature extraction and further classification using Support Vector Machines (SVM). The ultimate goal of the proposed work is to present a much faster algorithm than pixel-domain counterparts, with comparable accuracy, utilizing only the sparse information from compressed video. Partial decoding rules out the complexity of full decoding, and minimizes computational load and memory usage, which can result in reduced hardware utilization and faster recognition results. The proposed approach can handle illumination changes, scale, and appearance variations, and is robust to outdoor as well as indoor testing scenarios. We have evaluated the performance of the proposed method on two benchmark action datasets and achieved more than 85% accuracy. The proposed algorithm classifies actions at a speed (> 2,000 fps) approximately 100 times faster than existing state-of-the-art pixel-domain algorithms.
Abstract:
Gaussian Processes (GPs) are promising Bayesian methods for classification and regression problems. They have also been used for semi-supervised learning tasks. In this paper, we propose a new algorithm for solving the semi-supervised binary classification problem using sparse GP regression (GPR) models. It is closely related to semi-supervised learning based on support vector regression (SVR) and maximum margin clustering. The proposed algorithm is simple and easy to implement. Unlike the SVR-based algorithm, it gives a sparse solution directly. Also, the hyperparameters are estimated easily without resorting to an expensive cross-validation technique. Use of a sparse GPR model helps in making the proposed algorithm scalable. Preliminary results on synthetic and real-world data sets demonstrate the efficacy of the new algorithm.
Abstract:
In general, the objective of accurately encoding the input data and the objective of extracting good features to facilitate classification are not consistent with each other. As a result, good encoding methods may not be effective mechanisms for classification. In this paper, an earlier proposed unsupervised feature extraction mechanism for pattern classification has been extended to obtain an invertible map. The method of bimodal projection-based features was inspired by the general class of methods called projection pursuit. The principle of projection pursuit concentrates on projections that discriminate between clusters rather than on faithful representations. The basic feature map obtained by the method of bimodal projections has been extended to overcome this. The extended feature map is an embedding of the input space in the feature space. As a result, the inverse map exists and hence the representation of the input space in the feature space is exact. This map can be naturally expressed as a feedforward neural network.
Abstract:
Acoustic feature based speech (syllable) rate estimation and syllable nuclei detection are important problems in automatic speech recognition (ASR), computer assisted language learning (CALL) and fluency analysis. A typical solution for both problems consists of two stages. The first stage involves computing a short-time feature contour such that most of the peaks of the contour correspond to the syllabic nuclei. In the second stage, the peaks corresponding to the syllable nuclei are detected. In this work, instead of peak detection, we perform a mode-shape classification, which is formulated as a supervised binary classification problem, with mode-shapes representing the syllabic nuclei as one class and the remaining mode-shapes as the other. We use the temporal correlation and selected sub-band correlation (TCSSBC) feature contour, and the mode-shapes in the TCSSBC feature contour are converted into a set of feature vectors using an interpolation technique. A support vector machine classifier is used for the classification. Experiments are performed separately using the Switchboard, TIMIT and CTIMIT corpora in a five-fold cross-validation setup. The average correlation coefficients for the syllable rate estimation turn out to be 0.6761, 0.6928 and 0.3604 for the three corpora, respectively, which outperform those obtained by the best of the existing peak detection techniques. Similarly, the average F-scores (syllable level) for the syllable nuclei detection are 0.8917, 0.8200 and 0.7637 for the three corpora, respectively. (C) 2016 Elsevier B.V. All rights reserved.
Abstract:
Ninety-two strong-motion earthquake records from the California region, U.S.A., have been statistically studied using principal component analysis in terms of twelve important standardized strong-motion characteristics. The first two principal components account for about 57 per cent of the total variance. Based on these two components the earthquake records are classified into nine groups in a two-dimensional principal component plane. Also a unidimensional engineering rating scale is proposed. The procedure can be used as an objective approach for classifying and rating future earthquakes.
Abstract:
This paper presents the site classification of the Bangalore Mahanagar Palike (BMP) area using geophysical data and the evaluation of spectral acceleration at ground level using a probabilistic approach. Site classification has been carried out using experimental data from the shallow geophysical method of Multichannel Analysis of Surface Waves (MASW). A one-dimensional (1-D) MASW survey has been carried out at 58 locations and the respective velocity profiles are obtained. The average shear wave velocity for 30 m depth (Vs(30)) has been calculated and is used for the site classification of the BMP area as per NEHRP (National Earthquake Hazards Reduction Program). Based on the Vs(30) values, the major part of the BMP area can be classified as "site class D" and "site class C". A smaller portion of the study area, in and around Lalbagh Park, is classified as "site class B". Further, probabilistic seismic hazard analysis has been carried out to map the seismic hazard in terms of spectral acceleration (S_a) at rock and ground level, considering the site classes and the six seismogenic sources identified. The mean annual rate of exceedance and cumulative probability hazard curve for S_a have been generated. The quantified hazard values in terms of spectral acceleration for short period and long period are mapped for rock, site class C and D with 10% probability of exceedance in 50 years on a grid size of 0.5 km. In addition to this, the Uniform Hazard Response Spectrum (UHRS) at surface level has been developed for 5% damping and 10% probability of exceedance in 50 years for rock, site class C and D. These spectral acceleration values and uniform hazard spectra can be used to assess the design force for important structures and also to develop the design spectrum.
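As a rough illustration of the Vs(30)-based classification step described above, the following Python sketch computes the time-averaged shear wave velocity over the top 30 m of a layered profile and maps it to a NEHRP site class. The function names and the layer values are hypothetical and are not taken from the paper; the velocity boundaries used are the standard NEHRP ones (1500, 760, 360 and 180 m/s).

```python
def vs30(layers):
    """Time-averaged shear wave velocity (m/s) over the top 30 m.

    layers: list of (thickness_m, velocity_m_per_s), ordered from the surface down.
    """
    remaining = 30.0      # depth still to be covered, in metres
    travel_time = 0.0     # vertical shear wave travel time, in seconds
    for thickness, velocity in layers:
        used = min(thickness, remaining)   # only count the part above 30 m
        travel_time += used / velocity
        remaining -= used
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("profile is shallower than 30 m")
    return 30.0 / travel_time


def nehrp_site_class(vs30_value):
    """Map a Vs(30) value in m/s to a NEHRP site class letter (A to E)."""
    if vs30_value > 1500:
        return "A"
    if vs30_value > 760:
        return "B"
    if vs30_value > 360:
        return "C"
    if vs30_value >= 180:
        return "D"
    return "E"


# Hypothetical three-layer profile, for illustration only.
example_profile = [(5.0, 200.0), (10.0, 350.0), (20.0, 600.0)]
example_vs30 = vs30(example_profile)
example_class = nehrp_site_class(example_vs30)
print(round(example_vs30, 1), example_class)
```

Note that Vs(30) is a travel-time average (30 m divided by the total travel time), not an arithmetic mean of the layer velocities, so slow shallow layers weigh heavily on the result.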
Abstract:
We propose a novel technique for robust voiced/unvoiced segment detection in noisy speech, based on local polynomial regression. The local polynomial model is well-suited for voiced segments in speech. The unvoiced segments are noise-like and do not exhibit any smooth structure. This property of smoothness is used for devising a new metric called the variance ratio metric, which, after thresholding, indicates the voiced/unvoiced boundaries with 75% accuracy at 0 dB global signal-to-noise ratio (SNR). A novelty of our algorithm is that it processes the signal continuously, sample-by-sample rather than frame-by-frame. Simulation results on the TIMIT speech database (downsampled to 8 kHz) for various SNRs are presented to illustrate the performance of the new algorithm. Results indicate that the algorithm is robust even at high noise levels.
Abstract:
Elephants use vocalizations for both long and short distance communication. Whereas the acoustic repertoire of the African elephant (Loxodonta africana) has been extensively studied in its savannah habitat, very little is known about the structure and social context of the vocalizations of the Asian elephant (Elephas maximus), which is mostly found in forests. In this study, the vocal repertoire of wild Asian elephants in southern India was examined. The calls could be classified into four mutually exclusive categories, namely, trumpets, chirps, roars, and rumbles, based on quantitative analyses of their spectral and temporal features. One of the call types, the rumble, exhibited high structural diversity, particularly in the direction and extent of frequency modulation of calls. Juveniles produced three of the four call types, including trumpets, roars, and rumbles, in the context of play and distress. Adults produced trumpets and roars in the context of disturbance, aggression, and play. Chirps were typically produced in situations of confusion and alarm. Rumbles were used for contact calling within and among herds, by matriarchs to assemble the herd, in close-range social interactions, and during disturbance and aggression. Spectral and temporal features of the four call types were similar between Asian and African elephants.
Abstract:
Partitional clustering algorithms, which partition the dataset into a pre-defined number of clusters, can be broadly classified into two types: algorithms which explicitly take the number of clusters as input and algorithms that take the expected size of a cluster as input. In this paper, we propose a variant of the k-means algorithm and prove that it is more efficient than standard k-means algorithms. An important contribution of this paper is the establishment of a relation between the number of clusters and the size of the clusters in a dataset through the analysis of our algorithm. We also demonstrate that the integration of this algorithm as a pre-processing step in classification algorithms reduces their running-time complexity.
Abstract:
The present study deals with the application of cluster analysis, Fuzzy Cluster Analysis (FCA) and Kohonen Artificial Neural Networks (KANN) methods for classification of 159 meteorological stations in India into meteorologically homogeneous groups. Eight parameters, namely latitude, longitude, elevation, average temperature, humidity, wind speed, sunshine hours and solar radiation, are considered as the classification criteria for grouping. The optimal number of groups is determined as 14 based on the Davies-Bouldin index approach. It is observed that the FCA approach performed better than the other two methodologies for the present study.
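To make the group-count selection concrete, here is a minimal Python sketch of the Davies-Bouldin index, where a lower value indicates more compact and better separated groups. The toy points and the two candidate groupings are invented for illustration; they are not the 159 stations or the eight parameters of the study, and the sketch does not reproduce the clustering methods compared there.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def davies_bouldin(clusters):
    """Davies-Bouldin index of a partition given as a list of point lists."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean Euclidean distance of each cluster's points to its centroid.
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Hypothetical 2-D points forming three visually separate groups.
group_a = [(0.0, 0.0), (0.0, 1.0)]
group_b = [(10.0, 0.0), (10.0, 1.0)]
group_c = [(5.0, 20.0), (5.0, 21.0)]

# Candidate partitions keyed by their number of groups.
candidates = {
    2: [group_a + group_b, group_c],
    3: [group_a, group_b, group_c],
}
scores = {k: davies_bouldin(parts) for k, parts in candidates.items()}
best_k = min(scores, key=scores.get)  # the lowest index wins
print(scores, best_k)
```

In practice each candidate partition would come from running the clustering method with that number of groups, and the count with the lowest index would be retained.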
Abstract:
In this paper, we propose a novel method using wavelets as input to neural network self-organizing maps and a support vector machine for classification of magnetic resonance (MR) images of the human brain. The proposed method classifies MR brain images as either normal or abnormal. We have tested the proposed approach using a dataset of 52 MR brain images. A good classification percentage of more than 94% was achieved using the neural network self-organizing maps (SOM) and 98% from the support vector machine. We observed that the classification rate is higher for the support vector machine classifier than for the self-organizing map-based approach.
Abstract:
Background: Protein phosphorylation is a generic way to regulate signal transduction pathways in all kingdoms of life. In many organisms, it is achieved by the large family of Ser/Thr/Tyr protein kinases which are traditionally classified into groups and subfamilies on the basis of the amino acid sequence of their catalytic domains. Many protein kinases are multidomain in nature but the diversity of the accessory domains and their organization are usually not taken into account while classifying kinases into groups or subfamilies. Methodology: Here, we present an approach which considers amino acid sequences of complete gene products, in order to suggest refinements in sets of pre-classified sequences. The strategy is based on alignment-free similarity scores and iterative Area Under the Curve (AUC) computation. Similarity scores are computed by detecting common patterns between two sequences and scoring them using a substitution matrix, with a consistent normalization scheme. This allows us to handle full-length sequences, and implicitly takes into account domain diversity and domain shuffling. We quantitatively validate our approach on a subset of 212 human protein kinases. We then employ it on the complete repertoire of human protein kinases and suggest a few qualitative refinements in the subfamily assignment stored in the KinG database, which is based on catalytic domains only. Based on our new measure, we delineate 37 cases of potential hybrid kinases: sequences for which classical classification based entirely on catalytic domains is inconsistent with the full-length similarity scores computed here, which implicitly consider multi-domain nature and regions outside the catalytic kinase domain. We also provide some examples of hybrid kinases of the protozoan parasite Entamoeba histolytica. Conclusions: The implicit consideration of multi-domain architectures is a valuable inclusion to complement other classification schemes. The proposed algorithm may also be employed to classify other families of enzymes with multidomain architecture.
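The AUC mentioned in the methodology can be illustrated generically. The Python sketch below computes it as the fraction of (same-subfamily, other-subfamily) score pairs in which the same-subfamily sequence gets the higher similarity score, counting ties as one half. It is not the authors' alignment-free scoring or their iterative procedure; the function name and the score values are assumptions made up for the example.

```python
def auc(positive_scores, negative_scores):
    """Rank-based AUC: chance that a positive outscores a negative (ties count 0.5)."""
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical similarity scores of a query against members of its assigned
# subfamily (positives) and against members of other subfamilies (negatives).
same_subfamily = [0.9, 0.8, 0.4]
other_subfamilies = [0.7, 0.3, 0.2]
example_auc = auc(same_subfamily, other_subfamilies)
print(example_auc)
```

An AUC near 1 means the scores separate the assigned subfamily from the rest almost perfectly, while a value near 0.5 means the scores carry little information about that assignment.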
Abstract:
Equilibrium sediment volume tests are conducted on field soils to classify them based on their degree of expansivity and/or to predict the liquid limit of soils. The present technical paper examines different equilibrium sediment volume tests, critically evaluating each of them. It discusses the settling behavior of fine-grained soils during soil sediment formation to develop a rationale for conducting the latest version of the equilibrium sediment volume test. Probable limitations of the equilibrium sediment volume test and possible solutions to overcome them are also indicated.