96 results for Multiclass classification

at Indian Institute of Science - Bangalore - India


Relevance:

100.00%

Publisher:

Abstract:

The maximum entropy approach to classification is well studied in applied statistics and machine learning, and almost all the methods that exist in the literature are discriminative in nature. In this paper, we introduce a maximum entropy classification method with feature selection for large dimensional data, such as text datasets, that is generative in nature. To tackle the curse of dimensionality of large data sets, we employ the conditional independence assumption (Naive Bayes) and perform feature selection simultaneously, by enforcing a 'maximum discrimination' between estimated class conditional densities. For two-class problems, the proposed method uses Jeffreys (J) divergence to discriminate the class conditional densities. To extend our method to the multi-class case, we propose a completely new approach by considering a multi-distribution divergence: we replace Jeffreys divergence by Jensen-Shannon (JS) divergence to discriminate the conditional densities of multiple classes. In order to reduce computational complexity, we employ a modified Jensen-Shannon divergence (JS(GM)), based on the AM-GM inequality. We show that the resulting divergence is a natural generalization of Jeffreys divergence to the case of multiple distributions. As far as theoretical justification is concerned, we show that when one intends to select the best features in a generative maximum entropy approach, maximum discrimination using J-divergence emerges naturally in binary classification. The performance of the proposed algorithms, together with a comparative study, is demonstrated on large dimensional text and gene expression datasets, showing that our methods scale well with large dimensional datasets.
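
As a rough illustration of the two divergences named in this abstract, the following minimal sketch computes the Jeffreys divergence between two discrete distributions and the standard (arithmetic-mean) Jensen-Shannon divergence among several. It does not implement the paper's modified JS(GM) divergence, whose exact form is not given here. The function names are hypothetical, and the code assumes discrete distributions given as lists of probabilities, with strictly positive entries wherever a ratio is taken.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q), natural log.

    Assumes q[k] > 0 wherever p[k] > 0 (illustrative assumption).
    """
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

def jeffreys(p, q):
    """Jeffreys (J) divergence: the symmetrised KL divergence."""
    return kl(p, q) + kl(q, p)

def jensen_shannon(dists, weights=None):
    """Jensen-Shannon divergence among several distributions.

    Computed as the weighted sum of KL(p_i || m), where m is the
    weighted arithmetic mean (mixture) of the distributions.
    """
    n = len(dists)
    if weights is None:
        weights = [1.0 / n] * n  # uniform weights by default
    mixture = [
        sum(w * d[k] for w, d in zip(weights, dists))
        for k in range(len(dists[0]))
    ]
    return sum(w * kl(d, mixture) for w, d in zip(weights, dists))
```

Both quantities are zero when all distributions coincide and grow as the class-conditional distributions become easier to tell apart, which is the sense in which they can serve as a discrimination criterion.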

Relevance:

80.00%

Publisher:

Abstract:

This paper aims at evaluating the methods of multiclass support vector machines (SVMs) for effective use in distance relay coordination. Also, it describes a strategy of supportive systems to aid the conventional protection philosophy in combating situations where protection systems have maloperated and/or information is missing and provide selective and secure coordinations. SVMs have considerable potential as zone classifiers of distance relay coordination. This typically requires a multiclass SVM classifier to effectively analyze/build the underlying concept between reach of different zones and the apparent impedance trajectory during fault. Several methods have been proposed for multiclass classification where typically several binary SVM classifiers are combined together. Some authors have extended binary SVM classification to one-step single optimization operation considering all classes at once. In this paper, one-step multiclass classification, one-against-all, and one-against-one multiclass methods are compared for their performance with respect to accuracy, number of iterations, number of support vectors, training, and testing time. The performance analysis of these three methods is presented on three data sets belonging to training and testing patterns of three supportive systems for a region and part of a network, which is an equivalent 526-bus system of the practical Indian Western grid.
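
The two decomposition schemes compared in this abstract differ mainly in how binary decisions are combined. The sketch below shows only that combination step, under stated assumptions: the binary classifiers are taken as already trained, `scores` and `pairwise_winner` are hypothetical stand-ins for their outputs, and ties are broken arbitrarily. It is not the paper's implementation.

```python
from collections import Counter
from itertools import combinations

def predict_one_against_all(scores):
    """One-against-all: one 'class vs rest' classifier per class.

    scores maps each class label to the decision value of its
    classifier; the class with the largest value wins.
    """
    return max(scores, key=scores.get)

def predict_one_against_one(pairwise_winner, classes):
    """One-against-one: one classifier per pair of classes.

    pairwise_winner(a, b) returns whichever of a or b the pairwise
    classifier prefers; the class with the most votes wins.
    """
    votes = Counter(pairwise_winner(a, b) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]
```

For K classes, one-against-all needs K binary classifiers while one-against-one needs K(K-1)/2, which is one reason training and testing times differ between the methods.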

Relevance:

80.00%

Publisher:

Abstract:

We study consistency properties of surrogate loss functions for general multiclass classification problems, defined by a general loss matrix. We extend the notion of classification calibration, which has been studied for binary and multiclass 0-1 classification problems (and for certain other specific learning problems), to the general multiclass setting, and derive necessary and sufficient conditions for a surrogate loss to be classification calibrated with respect to a loss matrix in this setting. We then introduce the notion of classification calibration dimension of a multiclass loss matrix, which measures the smallest 'size' of a prediction space for which it is possible to design a convex surrogate that is classification calibrated with respect to the loss matrix. We derive both upper and lower bounds on this quantity, and use these results to analyze various loss matrices. In particular, as one application, we provide a different route from the recent result of Duchi et al. (2010) for analyzing the difficulty of designing 'low-dimensional' convex surrogates that are consistent with respect to pairwise subset ranking losses. We anticipate the classification calibration dimension may prove to be a useful tool in the study and design of surrogate losses for general multiclass learning problems.
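
To make the role of the loss matrix concrete, the sketch below computes the prediction that minimises expected loss for a given class-probability vector. This is the target that a calibrated surrogate is meant to recover; it does not compute the calibration dimension itself. The names and the convention `loss[y][t]` (true class y, prediction t) are assumptions made for illustration.

```python
def expected_losses(p, loss):
    """Expected loss of each prediction t under class probabilities p.

    loss[y][t] is the loss for predicting t when the true class is y.
    The number of predictions may differ from the number of classes.
    """
    n_predictions = len(loss[0])
    return [
        sum(p[y] * loss[y][t] for y in range(len(p)))
        for t in range(n_predictions)
    ]

def bayes_optimal_prediction(p, loss):
    """Index of the prediction with the smallest expected loss."""
    risks = expected_losses(p, loss)
    return min(range(len(risks)), key=risks.__getitem__)
```

With the 0-1 loss matrix this reduces to picking the most probable class; with an asymmetric matrix, or one with an extra 'abstain' column, the optimal prediction can differ from the most probable class.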

Relevance:

70.00%

Publisher:

Abstract:

Even though several techniques have been proposed in the literature for achieving multiclass classification using the Support Vector Machine (SVM), the scalability of these approaches to large data sets still needs much exploration. The Core Vector Machine (CVM) is a technique for scaling up a two-class SVM to handle large data sets. In this paper we propose a Multiclass Core Vector Machine (MCVM). Here we formulate the multiclass SVM problem as a Quadratic Programming (QP) problem defining an SVM with vector-valued output. This QP problem is then solved using the CVM technique to achieve scalability to large data sets. Experiments with several large synthetic and real-world data sets show that the proposed MCVM technique gives generalization performance comparable to that of SVM at a much lower computational expense. Further, it is observed that MCVM scales well with the size of the data set.

Relevance:

30.00%

Publisher:

Abstract:

Crop type classification using remote sensing data plays a vital role in planning cultivation activities and in the optimal usage of the available fertile land. Thus a reliable and precise classification of agricultural crops can help improve agricultural productivity. Hence, in this paper, a gene expression programming (GEP) based fuzzy logic approach for multiclass crop classification using a multispectral satellite image is proposed. The purpose of this work is to utilize the optimization capabilities of GEP for tuning the fuzzy membership functions. The capabilities of GEP as a classifier are also studied. The proposed method is compared with Bayesian and maximum likelihood classifiers in terms of performance. From the results we can conclude that the proposed method is effective for classification.

Relevance:

20.00%

Publisher:

Abstract:

Remote sensing provides a lucid and effective means for crop coverage identification. Crop coverage identification is a very important technique, as it provides vital information on the type and extent of crop cultivated in a particular area. This information has immense potential in the planning for further cultivation activities and for optimal usage of the available fertile land. As the frontiers of space technology advance, the knowledge derived from the satellite data has also grown in sophistication. Further, image classification forms the core of the solution to the crop coverage identification problem. No single classifier can prove to satisfactorily classify all the basic crop cover mapping problems of a cultivated region. We present in this paper the experimental results of multiple classification techniques for the problem of crop cover mapping of a cultivated region. A detailed comparison of the algorithms inspired by social behaviour of insects and conventional statistical method for crop classification is presented in this paper. These include the Maximum Likelihood Classifier (MLC), Particle Swarm Optimisation (PSO) and Ant Colony Optimisation (ACO) techniques. The high resolution satellite image has been used for the experiments.

Relevance:

20.00%

Publisher:

Abstract:

A complete list of homogeneous operators in the Cowen-Douglas class B_n(D) is given. This classification is obtained from an explicit realization of all the homogeneous Hermitian holomorphic vector bundles on the unit disc under the action of the universal covering group of the bi-holomorphic automorphism group of the unit disc.

Relevance:

20.00%

Publisher:

Abstract:

Ninety-two strong-motion earthquake records from the California region, U.S.A., have been statistically studied using principal component analysis in terms of twelve important standardized strong-motion characteristics. The first two principal components account for about 57 per cent of the total variance. Based on these two components the earthquake records are classified into nine groups in a two-dimensional principal component plane. Also a unidimensional engineering rating scale is proposed. The procedure can be used as an objective approach for classifying and rating future earthquakes.

Relevance:

20.00%

Publisher:

Abstract:

This paper presents the site classification of the Bangalore Mahanagar Palike (BMP) area using geophysical data and the evaluation of spectral acceleration at ground level using a probabilistic approach. Site classification has been carried out using experimental data from the shallow geophysical method of Multichannel Analysis of Surface Waves (MASW). One-dimensional (1-D) MASW surveys have been carried out at 58 locations and the respective velocity profiles obtained. The average shear wave velocity for 30 m depth (Vs(30)) has been calculated and is used for the site classification of the BMP area as per NEHRP (National Earthquake Hazards Reduction Program). Based on the Vs(30) values, the major part of the BMP area can be classified as "site class D" and "site class C". A smaller portion of the study area, in and around Lalbagh Park, is classified as "site class B". Further, probabilistic seismic hazard analysis has been carried out to map the seismic hazard in terms of spectral acceleration (S_a) at rock and at ground level, considering the site classes and the six seismogenic sources identified. The mean annual rate of exceedance and cumulative probability hazard curves for S_a have been generated. The quantified hazard values in terms of spectral acceleration for short period and long period are mapped for rock, site class C and site class D with 10% probability of exceedance in 50 years on a grid size of 0.5 km. In addition, the Uniform Hazard Response Spectrum (UHRS) at surface level has been developed for 5% damping and 10% probability of exceedance in 50 years for rock, site class C and site class D. These spectral accelerations and uniform hazard spectra can be used to assess the design force for important structures and also to develop the design spectrum.
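
The quantities in this abstract can be illustrated with a short sketch. It assumes the usual time-averaged definition of Vs(30) (30 m divided by the total shear wave travel time through the top 30 m), the commonly quoted NEHRP velocity boundaries of 180, 360, 760 and 1500 m/s, and a Poisson occurrence model for converting a probability of exceedance into a return period. The function names, the layered profile format and the treatment of values exactly on a boundary are illustrative assumptions, not details taken from the paper.

```python
import math

def vs30(layers):
    """Time-averaged shear wave velocity (m/s) over the top 30 m.

    layers: list of (thickness_m, velocity_m_per_s), surface downward.
    """
    remaining = 30.0
    travel_time = 0.0
    for thickness, velocity in layers:
        used = min(thickness, remaining)  # clip the layer at 30 m depth
        travel_time += used / velocity
        remaining -= used
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("profile is shallower than 30 m")
    return 30.0 / travel_time

def nehrp_site_class(vs30_value):
    """NEHRP site class from Vs(30); boundary handling is an assumption."""
    if vs30_value > 1500:
        return "A"
    if vs30_value > 760:
        return "B"
    if vs30_value > 360:
        return "C"
    if vs30_value >= 180:
        return "D"
    return "E"

def return_period(prob_exceedance, years):
    """Return period (years) for a given exceedance probability,
    assuming Poisson occurrences."""
    return -years / math.log(1.0 - prob_exceedance)
```

Under the Poisson assumption, a 10% probability of exceedance in 50 years corresponds to a return period of roughly 475 years.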

Relevance:

20.00%

Publisher:

Abstract:

This paper presents a Chance-constraint Programming approach for constructing maximum-margin classifiers which are robust to interval-valued uncertainty in training examples. The methodology ensures that uncertain examples are classified correctly with high probability by employing chance-constraints. The main contribution of the paper is to pose the resultant optimization problem as a Second Order Cone Program by using large deviation inequalities due to Bernstein. Apart from the support and mean of the uncertain examples, these Bernstein-based relaxations make no further assumptions on the underlying uncertainty. Classifiers built using the proposed approach are less conservative, yield higher margins and hence are expected to generalize better than existing methods. Experimental results on synthetic and real-world datasets show that the proposed classifiers are better equipped to handle interval-valued uncertainty than state-of-the-art methods.

Relevance:

20.00%

Publisher:

Abstract:

We propose a novel technique for robust voiced/unvoiced segment detection in noisy speech, based on local polynomial regression. The local polynomial model is well-suited for voiced segments in speech. The unvoiced segments are noise-like and do not exhibit any smooth structure. This property of smoothness is used for devising a new metric called the variance ratio metric, which, after thresholding, indicates the voiced/unvoiced boundaries with 75% accuracy for 0dB global signal-to-noise ratio (SNR). A novelty of our algorithm is that it processes the signal continuously, sample-by-sample rather than frame-by-frame. Simulation results on TIMIT speech database (downsampled to 8kHz) for various SNRs are presented to illustrate the performance of the new algorithm. Results indicate that the algorithm is robust even in high noise levels.

Relevance:

20.00%

Publisher:

Abstract:

This paper presents two algorithms for smoothing and feature extraction for fingerprint classification. Deutsch's(2) Thinning algorithm (rectangular array) is used for thinning the digitized fingerprint (binary version). A simple algorithm is also suggested for classifying the fingerprints. Experimental results obtained using such algorithms are presented.

Relevance:

20.00%

Publisher:

Abstract:

Protein Kinase-Like Non-kinases (PKLNKs), which are closely related to protein kinases but lack the crucial catalytic aspartate in the catalytic loop and hence cannot function as protein kinases, have been analysed. Using various sensitive sequence analysis methods, we have recognized 82 PKLNKs from four higher eukaryotic organisms, namely, Homo sapiens, Mus musculus, Rattus norvegicus, and Drosophila melanogaster. On the basis of their domain combination and function, PKLNKs have been classified mainly into four categories: (1) Ligand binding PKLNKs, (2) PKLNKs with extracellular protein-protein interaction domain, (3) PKLNKs involved in dimerization, and (4) PKLNKs with cytoplasmic protein-protein interaction module. While members of the first two classes of PKLNKs have a transmembrane domain tethered to the PKLNK domain, members of the other two classes of PKLNKs are cytoplasmic in nature. The current classification scheme aims to provide a convenient framework for classifying the PKLNKs from other eukaryotes, which would be helpful in deciphering their roles in cellular processes.

Relevance:

20.00%

Publisher:

Abstract:

Elephants use vocalizations for both long and short distance communication. Whereas the acoustic repertoire of the African elephant (Loxodonta africana) has been extensively studied in its savannah habitat, very little is known about the structure and social context of the vocalizations of the Asian elephant (Elephas maximus), which is mostly found in forests. In this study, the vocal repertoire of wild Asian elephants in southern India was examined. The calls could be classified into four mutually exclusive categories, namely, trumpets, chirps, roars, and rumbles, based on quantitative analyses of their spectral and temporal features. One of the call types, the rumble, exhibited high structural diversity, particularly in the direction and extent of frequency modulation of calls. Juveniles produced three of the four call types, including trumpets, roars, and rumbles, in the context of play and distress. Adults produced trumpets and roars in the context of disturbance, aggression, and play. Chirps were typically produced in situations of confusion and alarm. Rumbles were used for contact calling within and among herds, by matriarchs to assemble the herd, in close-range social interactions, and during disturbance and aggression. Spectral and temporal features of the four call types were similar between Asian and African elephants.

Relevance:

20.00%

Publisher:

Abstract:

The i + 5 → i hydrogen bonded turn conformation (pi-turn) with the fifth residue adopting alpha L conformation is frequently found at the C-terminus of helices in proteins and hence is speculated to be a "helix termination signal." An analysis of the occurrence of the i + 5 → i hydrogen bonded turn conformation at any general position in proteins (not specifically at the helix C-terminus), using coordinates of 228 protein crystal structures determined by X-ray crystallography to better than 2.5 Å resolution, is reported in this paper. Of 486 detected pi-turn conformations, 367 have the (i + 4)th residue in alpha L conformation, generally occurring at the C-terminus of alpha-helices, consistent with previous observations. However, a significant number (111) of pi-turn conformations occur with the (i + 4)th residue in alpha R conformation also, generally occurring in alpha-helices as distortions either at the termini or at the middle, a novel finding. These two sets of pi-turn conformations are referred to by the names pi alpha L and pi alpha R-turns, respectively, depending upon whether the (i + 4)th residue adopts alpha L or alpha R conformations. Four pi-turns, named pi alpha L'-turns, were noticed to be mirror images of pi alpha L-turns, and four more pi-turns, which have the (i + 4)th residue in beta conformation and are denoted as pi beta-turns, occur as a part of a hairpin bend connecting twisted beta-strands. Consecutive pi-turns occur, but only with pi alpha R-turns. The preference for amino acid residues is different in pi alpha L and pi alpha R-turns. However, both show a preference for Pro after the C-termini. Hydrophilic residues are preferred at positions i + 1, i + 2, and i + 3 of pi alpha L-turns, whereas positions i and i + 5 prefer hydrophobic residues. Residue i + 4 in pi alpha L-turns is mainly Gly and less often Asn. Although pi alpha R-turns generally occur as distortions in helices, their amino acid preference is different from that of helices. Poor helix formers, such as His, Tyr, and Asn, also were found to be preferred for pi alpha R-turns, whereas the good helix former Ala is not preferred. pi-Turns in peptides provide a picture of the pi-turn at atomic resolution. Only nine peptide-based pi-turns are reported so far, and all of them belong to the pi alpha L-turn type with an achiral residue in position i + 4. The results are of importance for structure prediction, modeling, and de novo design of proteins.