24 resultados para Classification and Regression Trees

em Indian Institute of Science - Bangalore - Índia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Gaussian Processes (GPs) are promising Bayesian methods for classification and regression problems. They have also been used for semi-supervised learning tasks. In this paper, we propose a new algorithm for solving semi-supervised binary classification problem using sparse GP regression (GPR) models. It is closely related to semi-supervised learning based on support vector regression (SVR) and maximum margin clustering. The proposed algorithm is simple and easy to implement. It gives a sparse solution directly unlike the SVR based algorithm. Also, the hyperparameters are estimated easily without resorting to expensive cross-validation technique. Use of sparse GPR model helps in making the proposed algorithm scalable. Preliminary results on synthetic and real-world data sets demonstrate the efficacy of the new algorithm.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Ninety-two strong-motion earthquake records from the California region, U.S.A., have been statistically studied using principal component analysis in terms of twelve important standardized strong-motion characteristics. The first two principal components account for about 57 per cent of the total variance. Based on these two components the earthquake records are classified into nine groups in a two-dimensional principal component plane. Also a unidimensional engineering rating scale is proposed. The procedure can be used as an objective approach for classifying and rating future earthquakes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents the site classification of Bangalore Mahanagar Palike (BMP) area using geophysical data and the evaluation of spectral acceleration at ground level using probabilistic approach. Site classification has been carried out using experimental data from the shallow geophysical method of Multichannel Analysis of Surface wave (MASW). One-dimensional (1-D) MASW survey has been carried out at 58 locations and respective velocity profiles are obtained. The average shear wave velocity for 30 m depth (Vs(30)) has been calculated and is used for the site classification of the BMP area as per NEHRP (National Earthquake Hazards Reduction Program). Based on the Vs(30) values major part of the BMP area can be classified as ``site class D'', and ``site class C'. A smaller portion of the study area, in and around Lalbagh Park, is classified as ``site class B''. Further, probabilistic seismic hazard analysis has been carried out to map the seismic hazard in terms spectral acceleration (S-a) at rock and the ground level considering the site classes and six seismogenic sources identified. The mean annual rate of exceedance and cumulative probability hazard curve for S. have been generated. The quantified hazard values in terms of spectral acceleration for short period and long period are mapped for rock, site class C and D with 10% probability of exceedance in 50 years on a grid size of 0.5 km. In addition to this, the Uniform Hazard Response Spectrum (UHRS) at surface level has been developed for the 5% damping and 10% probability of exceedance in 50 years for rock, site class C and D These spectral acceleration and uniform hazard spectrums can be used to assess the design force for important structures and also to develop the design spectrum.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Elephants use vocalizations for both long and short distance communication. Whereas the acoustic repertoire of the African elephant (Loxodonta africana) has been extensively studied in its savannah habitat, very little is known about the structure and social context of the vocalizations of the Asian elephant (Elephas maximus), which is mostly found in forests. In this study, the vocal repertoire of wild Asian elephants in southern India was examined. The calls could be classified into four mutually exclusive categories, namely, trumpets, chirps, roars, and rumbles, based on quantitative analyses of their spectral and temporal features. One of the call types, the rumble, exhibited high structural diversity, particularly in the direction and extent of frequency modulation of calls. Juveniles produced three of the four call types, including trumpets, roars, and rumbles, in the context of play and distress. Adults produced trumpets and roars in the context of disturbance, aggression, and play. Chirps were typically produced in situations of confusion and alarm. Rumbles were used for contact calling within and among herds, by matriarchs to assemble the herd, in close-range social interactions, and during disturbance and aggression. Spectral and temporal features of the four call types were similar between Asian and African elephants.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose a novel second order cone programming formulation for designing robust classifiers which can handle uncertainty in observations. Similar formulations are also derived for designing regression functions which are robust to uncertainties in the regression setting. The proposed formulations are independent of the underlying distribution, requiring only the existence of second order moments. These formulations are then specialized to the case of missing values in observations for both classification and regression problems. Experiments show that the proposed formulations outperform imputation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Our ability to infer the protein quaternary structure automatically from atom and lattice information is inadequate, especially for weak complexes, and heteromeric quaternary structures. Several approaches exist, but they have limited performance. Here, we present a new scheme to infer protein quaternary structure from lattice and protein information, with all-around coverage for strong, weak and very weak affinity homomeric and heteromeric complexes. The scheme combines naive Bayes classifier and point group symmetry under Boolean framework to detect quaternary structures in crystal lattice. It consistently produces >= 90% coverage across diverse benchmarking data sets, including a notably superior 95% coverage for recognition heteromeric complexes, compared with 53% on the same data set by current state-of-the-art method. The detailed study of a limited number of prediction-failed cases offers interesting insights into the intriguing nature of protein contacts in lattice. The findings have implications for accurate inference of quaternary states of proteins, especially weak affinity complexes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Proving the unsatisfiability of propositional Boolean formulas has applications in a wide range of fields. Minimal Unsatisfiable Sets (MUS) are signatures of the property of unsatisfiability in formulas and our understanding of these signatures can be very helpful in answering various algorithmic and structural questions relating to unsatisfiability. In this paper, we explore some combinatorial properties of MUS and use them to devise a classification scheme for MUS. We also derive bounds on the sizes of MUS in Horn, 2-SAT and 3-SAT formulas.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper discusses an approach for river mapping and flood evaluation based on multi-temporal time series analysis of satellite images utilizing pixel spectral information for image classification and region-based segmentation for extracting water-covered regions. Analysis of MODIS satellite images is applied in three stages: before flood, during flood and after flood. Water regions are extracted from the MODIS images using image classification (based on spectral information) and image segmentation (based on spatial information). Multi-temporal MODIS images from ``normal'' (non-flood) and flood time-periods are processed in two steps. In the first step, image classifiers such as Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs) separate the image pixels into water and non-water groups based on their spectral features. The classified image is then segmented using spatial features of the water pixels to remove the misclassified water. From the results obtained, we evaluate the performance of the method and conclude that the use of image classification (SVM and ANN) and region-based image segmentation is an accurate and reliable approach for the extraction of water-covered regions. (c) 2012 COSPAR. Published by Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Subsurface lithology and seismic site classification of Lucknow urban center located in the central part of the Indo-Gangetic Basin (IGB) are presented based on detailed shallow subsurface investigations and borehole analysis. These are done by carrying out 47 seismic surface wave tests using multichannel analysis of surface waves (MASW) and 23 boreholes drilled up to 30 m with standard penetration test (SPT) N values. Subsurface lithology profiles drawn from the drilled boreholes show low- to medium-compressibility clay and silty to poorly graded sand available till depth of 30 m. In addition, deeper boreholes (depth >150 m) were collected from the Lucknow Jal Nigam (Water Corporation), Government of Uttar Pradesh to understand deeper subsoil stratification. Deeper boreholes in this paper refer to those with depth over 150 m. These reports show the presence of clay mix with sand and Kankar at some locations till a depth of 150 m, followed by layers of sand, clay, and Kankar up to 400 m. Based on the available details, shallow and deeper cross-sections through Lucknow are presented. Shear wave velocity (SWV) and N-SPT values were measured for the study area using MASW and SPT testing. Measured SWV and N-SPT values for the same locations were found to be comparable. These values were used to estimate 30 m average values of N-SPT (N-30) and SWV (V-s(30)) for seismic site classification of the study area as per the National Earthquake Hazards Reduction Program (NEHRP) soil classification system. Based on the NEHRP classification, the entire study area is classified into site class C and D based on V-s(30) and site class D and E based on N-30. The issue of larger amplification during future seismic events is highlighted for a major part of the study area which comes under site class D and E. Also, the mismatch of site classes based on N-30 and V-s(30) raises the question of the suitability of the NEHRP classification system for the study region. Further, 17 sets of SPT and SWV data are used to develop a correlation between N-SPT and SWV. This represents a first attempt of seismic site classification and correlation between N-SPT and SWV in the Indo-Gangetic Basin.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The present work presents the results of experimental investigation of semi-solid rheocasting of A356 Al alloy using a cooling slope. The experiments have been carried out following Taguchi method of parameter design (orthogonal array of L-9 experiments). Four key process variables (slope angle, pouring temperature, wall temperature, and length of travel of the melt) at three different levels have been considered for the present experimentation. Regression analysis and analysis of variance (ANOVA) has also been performed to develop a mathematical model for degree of sphericity evolution of primary alpha-Al phase and to find the significance and percentage contribution of each process variable towards the final outcome of degree of sphericity, respectively. The best processing condition has been identified for optimum degree of sphericity (0.83) as A(3), B-3, C-2, D-1 i.e., slope angle of 60 degrees, pouring temperature of 650 degrees C, wall temperature 60 degrees C, and 500 mm length of travel of the melt, based on mean response and signal to noise ratio (SNR). ANOVA results shows that the length of travel has maximum impact on degree of sphericity evolution. The predicted sphericity obtained from the developed regression model and the values obtained experimentally are found to be in good agreement with each other. The sphericity values obtained from confirmation experiment, performed at 95% confidence level, ensures that the optimum result is correct and also the confirmation experiment values are within permissible limits. (c) 2014 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Clock synchronization in wireless sensor networks (WSNs) assures that sensor nodes have the same reference clock time. This is necessary not only for various WSN applications but also for many system level protocols for WSNs such as MAC protocols, and protocols for sleep scheduling of sensor nodes. Clock value of a node at a particular instant of time depends on its initial value and the frequency of the crystal oscillator used in the sensor node. The frequency of the crystal oscillator varies from node to node, and may also change over time depending upon many factors like temperature, humidity, etc. As a result, clock values of different sensor nodes diverge from each other and also from the real time clock, and hence, there is a requirement for clock synchronization in WSNs. Consequently, many clock synchronization protocols for WSNs have been proposed in the recent past. These protocols differ from each other considerably, and so, there is a need to understand them using a common platform. Towards this goal, this survey paper categorizes the features of clock synchronization protocols for WSNs into three types, viz, structural features, technical features, and global objective features. Each of these categories has different options to further segregate the features for better understanding. The features of clock synchronization protocols that have been used in this survey include all the features which have been used in existing surveys as well as new features such as how the clock value is propagated, when the clock value is propagated, and when the physical clock is updated, which are required for better understanding of the clock synchronization protocols in WSNs in a systematic way. This paper also gives a brief description of a few basic clock synchronization protocols for WSNs, and shows how these protocols fit into the above classification criteria. In addition, the recent clock synchronization protocols for WSNs, which are based on the above basic clock synchronization protocols, are also given alongside the corresponding basic clock synchronization protocols. Indeed, the proposed model for characterizing the clock synchronization protocols in WSNs can be used not only for analyzing the existing protocols but also for designing new clock synchronization protocols. (C) 2014 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Gaussian processes (GPs) are promising Bayesian methods for classification and regression problems. Design of a GP classifier and making predictions using it is, however, computationally demanding, especially when the training set size is large. Sparse GP classifiers are known to overcome this limitation. In this letter, we propose and study a validation-based method for sparse GP classifier design. The proposed method uses a negative log predictive (NLP) loss measure, which is easy to compute for GP models. We use this measure for both basis vector selection and hyperparameter adaptation. The experimental results on several real-world benchmark data sets show better orcomparable generalization performance over existing methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an approach for identifying the faulted line section and fault location on transmission systems using support vector machines (SVMs) for diagnosis/post-fault analysis purpose. Power system disturbances are often caused by faults on transmission lines. When fault occurs on a transmission system, the protective relay detects the fault and initiates the tripping operation, which isolates the affected part from the rest of the power system. Based on the fault section identified, rapid and corrective restoration procedures can thus be taken to minimize the power interruption and limit the impact of outage on the system. The approach is particularly important for post-fault diagnosis of any mal-operation of relays following a disturbance in the neighboring line connected to the same substation. This may help in improving the fault monitoring/diagnosis process, thus assuring secure operation of the power systems. In this paper we compare SVMs with radial basis function neural networks (RBFNN) in data sets corresponding to different faults on a transmission system. Classification and regression accuracy is reported for both strategies. Studies on a practical 24-Bus equivalent EHV transmission system of the Indian Southern region is presented for indicating the improved generalization with the large margin classifiers in enhancing the efficacy of the chosen model.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an approach for identifying the faulted line section and fault location on transmission systems using support vector machines (SVMs) for diagnosis/post-fault analysis purpose. Power system disturbances are often caused by faults on transmission lines. When fault occurs on a transmission system, the protective relay detects the fault and initiates the tripping operation, which isolates the affected part from the rest of the power system. Based on the fault section identified, rapid and corrective restoration procedures can thus be taken to minimize the power interruption and limit the impact of outage on the system. The approach is particularly important for post-fault diagnosis of any mal-operation of relays following a disturbance in the neighboring line connected to the same substation. This may help in improving the fault monitoring/diagnosis process, thus assuring secure operation of the power systems. In this paper we compare SVMs with radial basis function neural networks (RBFNN) in data sets corresponding to different faults on a transmission system. Classification and regression accuracy is reported for both strategies. Studies on a practical 24-Bus equivalent EHV transmission system of the Indian Southern region is presented for indicating the improved generalization with the large margin classifiers in enhancing the efficacy of the chosen model.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: The hot dog fold has been found in more than sixty proteins since the first report of its existence about a decade ago. The fold appears to have a strong association with fatty acid biosynthesis, its regulation and metabolism, as the proteins with this fold are predominantly coenzyme A-binding enzymes with a variety of substrates located at their active sites. Results: We have analyzed the structural features and sequences of proteins having the hot dog fold. This study reveals that though the basic architecture of the fold is well conserved in these proteins, significant differences exist in their sequence, nature of substrate and oligomerization. Segments with certain conserved sequence motifs seem to play crucial structural and functional roles in various classes of these proteins. Conclusion: The analysis led to predictions regarding the functional classification and identification of possible catalytic residues of a number of hot dog fold-containing hypothetical proteins whose structures were determined in high throughput structural genomics projects.