208 results for Functional Classification Trees
Abstract:
The q-Gaussian distribution results from maximizing certain generalizations of Shannon entropy under some constraints. The importance of q-Gaussian distributions stems from the fact that they exhibit power-law behavior and also generalize Gaussian distributions. In this paper, we propose a Smoothed Functional (SF) scheme for gradient estimation using the q-Gaussian distribution, and we also propose an optimization algorithm based on this scheme. Convergence results for the algorithm are presented, and its performance is demonstrated through simulation results on a queuing model.
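As a rough illustration of the idea, the sketch below implements a one-sided SF gradient estimate with q-Gaussian perturbations, drawing the perturbations via the generalized Box-Muller method of Thistleton et al. The objective `J`, the fixed smoothing parameter `beta`, and the constant step size are illustrative assumptions, not the paper's tuned algorithm (which would use decreasing step-size sequences for its convergence guarantees).

```python
import numpy as np

def q_log(x, q):
    """q-logarithm: ln_q(x) = (x**(1-q) - 1) / (1 - q), reducing to ln(x) as q -> 1."""
    if abs(q - 1.0) < 1e-12:
        return np.log(x)
    return (x**(1.0 - q) - 1.0) / (1.0 - q)

def q_gaussian_sample(q, size, rng):
    """Generalized Box-Muller draw of standard q-Gaussian variates (valid for q < 3)."""
    qp = (1.0 + q) / (3.0 - q)
    u1, u2 = rng.random(size), rng.random(size)
    return np.sqrt(-2.0 * q_log(u1, qp)) * np.cos(2.0 * np.pi * u2)

def sf_gradient_descent(J, x0, q=1.5, beta=0.1, step=0.01, iters=2000, seed=0):
    """One-sided smoothed-functional gradient descent with q-Gaussian perturbations."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        z = q_gaussian_sample(q, x.shape, rng)
        grad_est = z * (J(x + beta * z) - J(x)) / beta   # SF gradient estimate
        x -= step * grad_est
    return x

# Toy objective standing in for a simulation-based cost (e.g., a queueing model).
print(sf_gradient_descent(lambda x: np.sum((x - 2.0)**2), np.zeros(3)))
```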
Abstract:
Glycosyl hydrolase family 1 beta-glucosidases are important enzymes that serve many diverse functions in plants, including defense, in which they hydrolyze defensive compounds such as hydroxynitrile glucosides. A hydroxynitrile glucoside-cleaving beta-glucosidase gene (Llbglu1) was isolated from Leucaena leucocephala, cloned into pET-28a(+) and expressed in E. coli BL21 (DE3) cells. The recombinant enzyme was purified by Ni-NTA affinity chromatography. The optimal temperature and pH for this beta-glucosidase were found to be 45 °C and 4.8, respectively. The purified Llbglu1 enzyme hydrolyzed the synthetic glycosides pNP-glucoside (pNPGlc) and pNP-galactoside (pNPGal). The enzyme also hydrolyzed amygdalin, a hydroxynitrile glycoside, and a few of the tested flavonoid and isoflavonoid glucosides. The kinetic parameters Km and Vmax were found to be 38.59 μM and 0.8237 μM/mg/min for pNPGlc, whereas for pNPGal the values were 1845 μM and 0.1037 μM/mg/min. In the present study, a three-dimensional (3D) model of Llbglu1 was built with the MODELLER software to identify the substrate binding sites, and the quality of the model was examined using the program PROCHECK. Docking studies indicated that the conserved active site residues are Glu 199, Glu 413, His 153, Asn 198, Val 270, Asn 340, and Trp 462. Docking of rhodiocyanoside A with the modeled Llbglu1 resulted in a binding free energy change (ΔG) of -5.52 kcal/mol, on which basis rhodiocyanoside A can be considered a potential substrate.
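The reported Km and Vmax values come from Michaelis-Menten kinetics, v = Vmax·[S] / (Km + [S]). The snippet below shows one common way such parameters are estimated, by fitting that equation to rate-versus-substrate data with SciPy; the assay numbers here are made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def michaelis_menten(s, vmax, km):
    """Initial velocity v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)

# Hypothetical pNP-glucoside assay: substrate (uM) vs. initial rate (uM/mg/min).
s = np.array([5, 10, 20, 40, 80, 160, 320], dtype=float)
v = np.array([0.09, 0.17, 0.28, 0.42, 0.55, 0.66, 0.74])

popt, pcov = curve_fit(michaelis_menten, s, v, p0=[1.0, 50.0])
vmax, km = popt
print(f"Vmax = {vmax:.3f} uM/mg/min, Km = {km:.1f} uM")
```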
Abstract:
In this paper we study the problem of designing SVM classifiers when the kernel matrix, K, is affected by uncertainty. Specifically, K is modeled as a positive affine combination of given positive semidefinite kernels, with the coefficients ranging over a norm-bounded uncertainty set. We treat the problem using the Robust Optimization methodology. This reduces the uncertain SVM problem to a deterministic conic quadratic problem, which can in principle be solved by a polynomial-time Interior Point (IP) algorithm. However, for large-scale classification problems, IP methods become intractable and one has to resort to first-order gradient-type methods. The strategy we use here is to reformulate the robust counterpart of the uncertain SVM problem as a saddle point problem and employ a special gradient scheme that works directly on the convex-concave saddle function. The algorithm is a simplified version of a general scheme due to Juditsky and Nemirovski (2011). It achieves an O(1/T^2) reduction of the initial error after T iterations. A comprehensive empirical study on both synthetic data and real-world protein structure data sets shows that the proposed formulations achieve the desired robustness, and the saddle point based algorithm outperforms the IP method significantly.
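To make the saddle-point view concrete, here is a deliberately simplified sketch: plain projected gradient descent-ascent on the robust SVM dual with two candidate kernels. This is not the Juditsky-Nemirovski scheme the paper uses (and lacks its O(1/T^2) guarantee); the step size, uncertainty radius, and projection heuristic are all illustrative assumptions.

```python
import numpy as np

def robust_svm_gda(Ks, y, C=1.0, rho=0.3, eta=0.01, iters=500):
    """Projected gradient descent-ascent on the robust SVM dual:
    max_alpha min_u  sum(alpha) - 0.5 * alpha' Y K(u) Y alpha,
    with K(u) = sum_k u_k K_k and u in a ball around uniform weights.
    (A crude stand-in for the paper's saddle-point algorithm.)"""
    m, n = len(Ks), len(y)
    Y = np.diag(y.astype(float))
    YKY = np.stack([Y @ K @ Y for K in Ks])          # precompute Y K_k Y
    alpha = np.zeros(n)
    u0 = np.full(m, 1.0 / m)
    u = u0.copy()
    for _ in range(iters):
        Ku = np.tensordot(u, YKY, axes=1)
        # ascent in alpha, projected onto the box [0, C]
        alpha = np.clip(alpha + eta * (1.0 - Ku @ alpha), 0.0, C)
        # descent in u, heuristically projected onto {u >= 0, ||u - u0|| <= rho}
        g_u = -0.5 * np.array([alpha @ Kk @ alpha for Kk in YKY])
        u = np.maximum(u - eta * g_u, 0.0)
        d = u - u0
        if np.linalg.norm(d) > rho:
            u = u0 + rho * d / np.linalg.norm(d)
    return alpha, u

# Tiny example with two PSD kernels (linear and Gaussian) and random labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K1 = X @ X.T
K2 = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))
y = np.sign(rng.normal(size=20))
alpha, u = robust_svm_gda([K1, K2], y)
print("kernel weights:", u)
```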
Abstract:
We consider the problem of optimal routing in a multi-stage network of queues with constraints on queue lengths. We develop three algorithms for probabilistic routing for this problem using only the total end-to-end delays. These algorithms use the smoothed functional (SF) approach to optimize the routing probabilities. In our model, all the queues are assumed to have constraints on the average queue length. We also propose a novel quasi-Newton based SF algorithm. Policies like Join Shortest Queue or Least Work Left work only for unconstrained routing; moreover, they assume knowledge of the queue length at all the queues, so when the only information available is the expected end-to-end delay, as in our case, such policies cannot be used. We also give simulation results showing the performance of the SF algorithms for this problem.
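A minimal sketch of the setting, under strong simplifying assumptions: two parallel M/M/1 queues with an analytic mean-delay formula standing in for the observed noisy end-to-end delay, and a plain Gaussian SF descent on a single routing probability. The paper's algorithms (q-Gaussian and quasi-Newton SF variants, with queue-length constraints) are more elaborate than this.

```python
import numpy as np

def end_to_end_delay(p, lam=0.8, mu=np.array([1.0, 1.5])):
    """Mean sojourn time of two parallel M/M/1 queues under probabilistic routing;
    stands in for the noisy end-to-end delay observed from a real network."""
    rates = lam * np.array([p, 1.0 - p])
    if np.any(rates >= mu):
        return 1e3  # unstable split, large penalty
    delays = 1.0 / (mu - rates)
    return float(rates @ delays / lam) + np.random.default_rng().normal(0, 0.01)

def sf_routing(iters=3000, beta=0.05, step=0.002):
    """Gaussian smoothed-functional descent on the routing probability p in (0,1)."""
    theta = 0.0  # p = sigmoid(theta) keeps the probability in (0,1)
    rng = np.random.default_rng(1)
    for _ in range(iters):
        z = rng.normal()
        p = 1.0 / (1.0 + np.exp(-(theta + beta * z)))
        p0 = 1.0 / (1.0 + np.exp(-theta))
        grad = z * (end_to_end_delay(p) - end_to_end_delay(p0)) / beta
        theta -= step * grad
    return 1.0 / (1.0 + np.exp(-theta))

print("routing probability to queue 1:", round(sf_routing(), 3))
```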
Abstract:
In the design of practical web page classification systems, one often encounters a situation in which the labeled training set is created by choosing some examples from each class, but the class proportions in this set are not the same as those in the test distribution to which the classifier will actually be applied. The problem is made worse when the amount of training data is also small. In this paper we explore and adapt binary SVM methods that make use of unlabeled data from the test distribution, viz., Transductive SVMs (TSVMs) and expectation regularization/constraint (ER/EC) methods, to deal with this situation. We empirically show that when the labeled training data is small, a TSVM designed using the class ratio tuned by minimizing the loss on the labeled set yields the best performance; its performance is good even when the deviation between the class ratios of the labeled training set and the test set is quite large. When the labeled training data is sufficiently large, an unsupervised Gaussian mixture model can be used to get a very good estimate of the class ratio in the test set; moreover, when this estimate is used, both TSVM and ER/EC give their best possible performance, with TSVM coming out superior. The ideas in the paper can be easily extended to multi-class SVMs and MaxEnt models.
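The Gaussian-mixture step is straightforward to sketch: fit a two-component mixture to the unlabeled test-distribution data and read the class ratio off the mixing weights. The one-dimensional synthetic data below is an illustrative assumption; in practice the mixture would be fit in the document feature space.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Unlabeled "test distribution" with a 70/30 class mix (hypothetical 1-D feature).
rng = np.random.default_rng(0)
X_unlab = np.concatenate([rng.normal(-2, 1, 700), rng.normal(2, 1, 300)]).reshape(-1, 1)

# Fit a 2-component mixture; the mixing weights estimate the class proportions.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X_unlab)
print("estimated class ratio:", np.round(gmm.weights_, 3))
# This estimate would then be supplied to a TSVM as its assumed positive-class
# fraction (the fraction-of-positives parameter in transductive SVM solvers).
```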
Abstract:
The present approach uses stopwords and the gaps that occur between successive stopwords, formed by content words, as features for sentiment classification.
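One plausible reading of this feature scheme is sketched below with a toy stopword list: each document is represented by its stopword counts together with the lengths of the content-word runs ("gaps") between successive stopwords. The stopword list and the exact encoding are assumptions for illustration, not the paper's specification.

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "was", "it", "and", "of", "to", "in"}  # tiny illustrative list

def stopword_gap_features(text):
    """Represent a document by its stopwords plus the lengths of the
    content-word gaps between successive stopwords."""
    tokens = text.lower().split()
    feats = Counter()
    gap = 0
    for tok in tokens:
        if tok in STOPWORDS:
            feats[tok] += 1           # stopword feature
            feats[f"gap={gap}"] += 1  # run of content words since the last stopword
            gap = 0
        else:
            gap += 1
    return feats

print(stopword_gap_features("the movie was a complete waste of my time and money"))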
Abstract:
Time series classification deals with the problem of classifying data that is multivariate in nature, meaning that one or more of the attributes takes the form of a sequence. The notion of similarity or distance used for time series data is significant and affects the accuracy, time, and space complexity of the classification algorithm. Numerous similarity measures exist for time series data, but each has its own disadvantages. Instead of relying on a single similarity measure, our aim is to find a near-optimal solution to the classification problem by combining different similarity measures. In this work, we use genetic algorithms to combine the similarity measures so as to get the best performance: the weights given to the different similarity measures evolve over a number of generations to arrive at the best combination. We test our approach on a number of benchmark time series datasets and present promising results.
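A toy version of the idea is sketched below: a tiny mutation-and-selection GA evolves nonnegative weights over two stand-in similarity measures, with leave-one-out 1-NN accuracy as the fitness. The measures, GA operators, and data are illustrative assumptions (the paper would combine real time series measures such as DTW).

```python
import numpy as np

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def mean_shift(a, b):
    """Cheap stand-in for a second similarity measure (e.g., DTW in the paper)."""
    return abs(float(a.mean() - b.mean()))

MEASURES = [euclidean, mean_shift]

def combined_dist(a, b, w):
    return sum(wi * m(a, b) for wi, m in zip(w, MEASURES))

def one_nn_accuracy(X, y, w):
    """Leave-one-out 1-NN accuracy under the weighted distance (GA fitness)."""
    hits = 0
    for i in range(len(X)):
        d = [combined_dist(X[i], X[j], w) if j != i else np.inf for j in range(len(X))]
        hits += y[int(np.argmin(d))] == y[i]
    return hits / len(X)

def evolve_weights(X, y, pop=20, gens=30, seed=0):
    """Tiny elitist GA over nonnegative measure weights."""
    rng = np.random.default_rng(seed)
    P = rng.random((pop, len(MEASURES)))
    for _ in range(gens):
        fit = np.array([one_nn_accuracy(X, y, w) for w in P])
        elite = P[np.argsort(fit)[-pop // 2:]]                      # keep best half
        children = np.abs(elite + rng.normal(0, 0.1, elite.shape))  # mutate
        P = np.vstack([elite, children])
    fit = np.array([one_nn_accuracy(X, y, w) for w in P])
    return P[int(np.argmax(fit))]

# Toy series: class 0 is flat noise, class 1 has an upward trend.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (10, 50)),
               np.linspace(0, 3, 50) + rng.normal(0, 1, (10, 50))])
y = np.array([0] * 10 + [1] * 10)
print("best weights:", np.round(evolve_weights(X, y), 3))
```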
Abstract:
This paper presents a new hierarchical clustering algorithm for crop stage classification using hyperspectral satellite images. Among the many benefits and uses of remote sensing, one important application is the problem of crop stage classification. Modern commercial imaging satellites, owing to their large volume of imagery, offer greater opportunities for automated image analysis. Hence, we propose an unsupervised algorithm, the Hierarchical Artificial Immune System (HAIS), consisting of two steps: splitting the cluster centers and merging them. The high dimensionality of the data is reduced with the help of Principal Component Analysis (PCA). The classification results have been compared with the K-means and Artificial Immune System algorithms. From the results obtained, we conclude that the proposed hierarchical clustering algorithm is accurate.
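The immune-system operators themselves are not reproduced here; the sketch below only illustrates the split-then-merge skeleton on PCA-reduced data, with KMeans standing in for the HAIS operators and with made-up thresholds and synthetic "pixels".

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def split_merge_cluster(X, k_init=2, k_max=8, merge_tol=0.5, seed=0):
    """Loose sketch of a split-then-merge clustering pass (not the actual HAIS):
    repeatedly split the widest cluster, then fuse centers that are too close."""
    centers = KMeans(n_clusters=k_init, n_init=10, random_state=seed).fit(X).cluster_centers_
    # Split phase: keep dividing the highest-variance cluster.
    while len(centers) < k_max:
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        spreads = [X[labels == i].var() if np.any(labels == i) else 0 for i in range(len(centers))]
        i = int(np.argmax(spreads))
        pts = X[labels == i]
        if len(pts) < 4:
            break
        sub = KMeans(n_clusters=2, n_init=10, random_state=seed).fit(pts).cluster_centers_
        centers = np.vstack([np.delete(centers, i, axis=0), sub])
    # Merge phase: fuse center pairs closer than merge_tol.
    merged = True
    while merged and len(centers) > 1:
        merged = False
        D = np.linalg.norm(centers[:, None] - centers[None], axis=2)
        np.fill_diagonal(D, np.inf)
        i, j = np.unravel_index(np.argmin(D), D.shape)
        if D[i, j] < merge_tol:
            centers = np.vstack([np.delete(centers, [i, j], axis=0),
                                 (centers[i] + centers[j]) / 2])
            merged = True
    return centers

# Hyperspectral-pixel stand-in: reduce dimensionality with PCA, then cluster.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, (50, 40)) for m in (0.0, 1.0, 2.0)])
X2 = PCA(n_components=3).fit_transform(X)
print("found", len(split_merge_cluster(X2)), "cluster centers")
```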
Abstract:
Subsurface lithology and seismic site classification of the Lucknow urban center, located in the central part of the Indo-Gangetic Basin (IGB), are presented based on detailed shallow subsurface investigations and borehole analysis. These were carried out through 47 seismic surface wave tests using multichannel analysis of surface waves (MASW) and 23 boreholes drilled up to 30 m with standard penetration test (SPT) N values. Subsurface lithology profiles drawn from the drilled boreholes show low- to medium-compressibility clay and silty to poorly graded sand down to a depth of 30 m. In addition, deeper borehole reports (depth >150 m) were collected from the Lucknow Jal Nigam (Water Corporation), Government of Uttar Pradesh, to understand the deeper subsoil stratification. These reports show the presence of clay mixed with sand and kankar at some locations down to a depth of 150 m, followed by layers of sand, clay, and kankar up to 400 m. Based on the available details, shallow and deeper cross-sections through Lucknow are presented. Shear wave velocity (SWV) and N-SPT values were measured for the study area using MASW and SPT testing; measured SWV and N-SPT values for the same locations were found to be comparable. These values were used to estimate 30 m average values of N-SPT (N30) and SWV (Vs30) for seismic site classification of the study area as per the National Earthquake Hazards Reduction Program (NEHRP) soil classification system. Based on the NEHRP classification, the study area falls into site classes C and D according to Vs30, and site classes D and E according to N30. The possibility of larger amplification during future seismic events is highlighted for the major part of the study area that comes under site classes D and E. Also, the mismatch between the site classes based on N30 and Vs30 raises the question of the suitability of the NEHRP classification system for the study region. Further, 17 sets of SPT and SWV data are used to develop a correlation between N-SPT and SWV. This represents the first attempt at seismic site classification, and at correlating N-SPT with SWV, in the Indo-Gangetic Basin.
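The 30 m average used for NEHRP classification is the travel-time average Vs30 = 30 / Σ(d_i / v_i), where d_i and v_i are the thickness and shear-wave velocity of each layer in the top 30 m. The snippet below computes it for a hypothetical layered profile and maps it to the standard NEHRP class boundaries.

```python
def vs30(thicknesses_m, vels_mps):
    """Time-averaged shear-wave velocity over the top 30 m:
    Vs30 = 30 / sum(d_i / v_i), with layer thicknesses summing to 30 m."""
    assert abs(sum(thicknesses_m) - 30.0) < 1e-6
    return 30.0 / sum(d / v for d, v in zip(thicknesses_m, vels_mps))

def nehrp_class(v):
    """NEHRP site class from Vs30 (m/s)."""
    if v > 1500: return "A"
    if v > 760:  return "B"
    if v > 360:  return "C"
    if v > 180:  return "D"
    return "E"

# Hypothetical MASW profile: 5 m of soft silt, 10 m of sand, 15 m of stiffer sand.
v = vs30([5, 10, 15], [150, 220, 320])
print(f"Vs30 = {v:.0f} m/s -> site class {nehrp_class(v)}")
```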
Abstract:
Density functional theory (DFT) calculations were performed to investigate the geometric, vibrational, and electronic properties of the chlorogenic acid isomer 3-CQA ((1R,3R,4S,5R)-3-{[(2E)-3-(3,4-dihydroxyphenyl)prop-2-enoyl]oxy}-1,4,5-trihydroxycyclohexanecarboxylic acid), a major phenolic compound in coffee. DFT calculations with the 6-311G(d,p) basis set produce very good results. The electrostatic potential mapped onto an isodensity surface has been obtained. A natural bond orbital (NBO) analysis has been performed in order to study intramolecular bonding, interactions among bonds, and the delocalization of unpaired electrons. HOMO-LUMO studies give insight into the interaction of the molecule with other species, and the calculated HOMO and LUMO energies indicate that charge transfer occurs within the molecule.
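For readers who want to reproduce this kind of calculation, a minimal sketch with the open-source PySCF package is shown below, run on water rather than 3-CQA (which is far too large for a quick demo). The B3LYP functional is an assumption, since the abstract names only the basis set; 6-311G** is the same basis as 6-311G(d,p).

```python
from pyscf import gto, dft

# Water stands in for 3-CQA here; a real run would supply the full geometry.
mol = gto.M(atom="O 0 0 0; H 0 0.757 0.587; H 0 -0.757 0.587",
            basis="6-311g**")      # 6-311G** == 6-311G(d,p)
mf = dft.RKS(mol)
mf.xc = "b3lyp"                    # assumed functional; the paper's choice may differ
mf.kernel()

# HOMO/LUMO from the Kohn-Sham orbital energies of the closed-shell molecule.
nocc = mol.nelectron // 2
homo, lumo = mf.mo_energy[nocc - 1], mf.mo_energy[nocc]
print(f"HOMO = {homo:.4f} Ha, LUMO = {lumo:.4f} Ha, gap = {lumo - homo:.4f} Ha")
```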
Abstract:
Supramolecular chemistry is an emerging tool for devising materials that can perform specified functions. The self-assembly of facially amphiphilic bile acid molecules has been extensively utilized for the development of functional soft materials. Supramolecular hydrogels derived from the bile acid backbone act as useful templates for the intercalation of multiple components. On this basis, the synthesis of gel-nanoparticle hybrid materials and photoluminescent coating materials, the development of a new enzyme assay technique, and related advances were achieved in the author's laboratory. The present account highlights some of these achievements.
Abstract:
This paper presents an efficient approach to the modeling and classification of vehicles using the magnetic signature of the vehicle. A database was created using magnetic signatures collected over a wide range of vehicles (cars). A vehicle is modeled as an array of magnetic dipoles; the strength of each magnetic dipole and the separation between the dipoles vary across vehicles, depending on the metallic composition and configuration of the vehicle. Based on this magnetic dipole data model, we present a novel method to extract a feature vector from the magnetic signature. For the classification of vehicles, a linear support vector machine configuration is used to classify the vehicles based on the obtained feature vectors.
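A toy end-to-end version of this pipeline is sketched below: signatures are synthesized from a point-dipole superposition (a crude stand-in for the paper's dipole-array model), reduced to a simple feature vector, and fed to a linear SVM. The feature choices, sensor geometry, and two vehicle "classes" are all invented for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC

def dipole_signature(strengths, positions, x):
    """Field magnitude along the sensor axis from point dipoles (1/r^3 falloff)."""
    h = 0.5  # assumed sensor offset (m)
    sig = np.zeros_like(x)
    for m, p in zip(strengths, positions):
        r = np.sqrt((x - p) ** 2 + h ** 2)
        sig += m / r ** 3
    return sig

def features(sig):
    """Simple feature vector: peak value, signal energy, width above half-max."""
    half = sig.max() / 2
    return [sig.max(), float(np.sum(sig ** 2)), int(np.sum(sig > half))]

x = np.linspace(-5, 5, 200)
rng = np.random.default_rng(0)
X, y = [], []
for _ in range(100):
    if rng.random() < 0.5:   # "small car": two weak dipoles, short wheelbase
        sig = dipole_signature([1.0, 0.8], [-0.8, 0.8], x)
        y.append(0)
    else:                    # "SUV": three stronger dipoles, longer
        sig = dipole_signature([1.5, 1.2, 1.4], [-1.5, 0.0, 1.5], x)
        y.append(1)
    X.append(features(sig + rng.normal(0, 0.02, x.shape)))

clf = LinearSVC(C=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```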
Abstract:
The goal of optimization in vehicle design is often blurred by the myriad requirements belonging to attributes that may not be closely related. If solutions are sought by separately optimizing attribute performance-related objectives, starting from a common baseline design configuration as in a traditional design environment, it becomes an arduous task to integrate the potentially conflicting solutions into one satisfactory design. It may thus be more desirable to carry out a combined multi-disciplinary design optimization (MDO) with vehicle weight as the objective function and cross-functional attribute performance targets as constraints. For the particular case of vehicle body structure design, the initial design is likely to be arrived at taking into account styling, packaging, and market-driven requirements. The problem with performing a combined cross-functional optimization is the time required to run CAE algorithms that can provide a single optimal solution for heterogeneous areas such as NVH and crash safety. In the present paper, a practical MDO methodology is suggested that can be applied to the weight optimization of automotive body structures by specifying constraints on frequency and crash performance. Because of the reduced number of cases to be analyzed for crash safety in comparison with other MDO approaches, the present methodology can generate a single size-optimized solution without having to resort to empirical techniques such as response surface-based prediction of crash performance and the associated successive response surface updating for convergence. An example of weight optimization of the spaceframe-based BIW of an aluminum-intensive vehicle is given to illustrate the steps involved in the optimization process.
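A toy version of such a constrained size optimization is sketched below with SciPy's SLSQP: minimize mass over panel gauges subject to a frequency floor and a crash-surrogate ceiling. All three response models are invented algebraic surrogates standing in for the CAE analyses, so only the problem structure, not the numbers, reflects the paper.

```python
import numpy as np
from scipy.optimize import minimize

# Toy gauge optimization: choose panel thicknesses (mm) to minimize mass,
# subject to a first-frequency floor and a crash-intrusion ceiling.
areas = np.array([0.8, 1.2, 0.6])                          # panel areas, m^2
rho = 2700.0                                               # aluminum, kg/m^3

mass = lambda t: rho * float(areas @ t) * 1e-3             # t in mm -> mass in kg
first_freq = lambda t: 8.0 * np.sqrt(float(areas @ t))     # Hz, fake modal surrogate
intrusion = lambda t: 300.0 / float(areas @ t)             # mm, fake crash surrogate

res = minimize(mass, x0=np.full(3, 3.0), method="SLSQP",
               bounds=[(1.0, 6.0)] * 3,
               constraints=[{"type": "ineq", "fun": lambda t: first_freq(t) - 25.0},
                            {"type": "ineq", "fun": lambda t: 60.0 - intrusion(t)}])
print("gauges (mm):", np.round(res.x, 2), "| mass (kg):", round(mass(res.x), 1))
```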
Abstract:
Effective conservation and management of natural resources requires up-to-date information on land cover (LC) types and their dynamics. LC dynamics are captured using multi-resolution remote sensing (RS) data with appropriate classification strategies. RS data combined with important environmental layers (either remotely acquired or derived from ground measurements) would, however, be more effective in addressing LC dynamics and the associated changes. These ancillary layers provide additional information for delineating LC class decision boundaries compared to conventional classification techniques. This communication ascertains the possibility of improved classification accuracy of RS data with ancillary and derived geographical layers such as vegetation index, temperature, digital elevation model (DEM), aspect, slope, and texture. This has been implemented in three terrains of varying topography. The study should help in the selection of appropriate ancillary data, depending on the terrain, for better classification results.
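Mechanically, using ancillary layers amounts to stacking them as extra feature columns alongside the spectral bands before classification. The sketch below shows this with synthetic bands, a derived vegetation index, a DEM, and a slope layer; the random data, layer choices, and random-forest classifier are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical 50x50-pixel scene: 4 spectral bands plus ancillary layers.
rng = np.random.default_rng(0)
h = w = 50
bands = rng.random((4, h, w))                                # red, nir, ... (stand-ins)
ndvi = (bands[1] - bands[0]) / (bands[1] + bands[0] + 1e-9)  # derived vegetation index
dem = rng.random((h, w)) * 500.0                             # elevation layer (m)
slope = np.hypot(*np.gradient(dem))                          # derived from the DEM

# Stack everything pixel-wise: conventional classification would use bands only;
# here the ancillary layers become extra feature columns.
X = np.stack([*bands, ndvi, dem, slope], axis=0).reshape(7, -1).T
y = rng.integers(0, 3, h * w)                                # fake LC labels

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print("feature importances:", np.round(clf.feature_importances_, 3))
```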
Abstract:
In this paper, we describe a method for feature extraction and classification of characters manually isolated from scene or natural images. Characters in a scene image may be affected by low resolution, uneven illumination, or occlusion. We propose a novel method to binarize gray scale images by minimizing an energy functional. The Discrete Cosine Transform and the Angular Radial Transform are used to extract features from the characters after normalization for scale and translation. We have evaluated our method on the complete test set of the Chars74k dataset for the English and Kannada scripts, consisting of handwritten and synthesized characters as well as characters extracted from camera-captured images; only the synthesized and handwritten characters from this dataset are used as the training set. Nearest neighbor classification is used in our experiments.
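As a small illustration of the feature-and-classifier pipeline, the sketch below takes the low-frequency block of the 2-D DCT of size-normalized toy "characters" and classifies with a nearest neighbor rule. The Angular Radial Transform and the energy-minimizing binarization are omitted, and the images are synthetic, so this only mirrors the DCT/1-NN portion of the method.

```python
import numpy as np
from scipy.fft import dctn

def dct_features(img, k=8):
    """Low-frequency k x k block of the 2-D DCT of a size-normalized image."""
    c = dctn(img.astype(float), norm="ortho")
    return c[:k, :k].ravel()

def nearest_neighbor(train_feats, train_labels, feat):
    d = np.linalg.norm(train_feats - feat, axis=1)
    return train_labels[int(np.argmin(d))]

# Toy 32x32 "characters": a vertical bar vs. a horizontal bar.
bar_v = np.zeros((32, 32)); bar_v[:, 14:18] = 1.0
bar_h = np.zeros((32, 32)); bar_h[14:18, :] = 1.0
train = np.stack([dct_features(bar_v), dct_features(bar_h)])
labels = np.array(["I", "-"])

rng = np.random.default_rng(0)
query = bar_v + rng.normal(0, 0.1, bar_v.shape)      # noisy test character
print("predicted:", nearest_neighbor(train, labels, dct_features(query)))
```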