52 resultados para machine learning, decision tree, concept drift, ensemble learning, classication, random forest

em Indian Institute of Science - Bangalore - Índia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Screening and early identification of primary immunodeficiency disease (PID) genes is a major challenge for physicians. Many resources have catalogued molecular alterations in known PID genes along with their associated clinical and immunological phenotypes. However, these resources do not assist in identifying candidate PID genes. We have recently developed a platform designated Resource of Asian PDIs, which hosts information pertaining to molecular alterations, protein-protein interaction networks, mouse studies and microarray gene expression profiling of all known PID genes. Using this resource as a discovery tool, we describe the development of an algorithm for prediction of candidate PID genes. Using a support vector machine learning approach, we have predicted 1442 candidate PID genes using 69 binary features of 148 known PID genes and 3162 non-PID genes as a training data set. The power of this approach is illustrated by the fact that six of the predicted genes have recently been experimentally confirmed to be PID genes. The remaining genes in this predicted data set represent attractive candidates for testing in patients where the etiology cannot be ascribed to any of the known PID genes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study describes two machine learning techniques applied to predict liquefaction susceptibility of soil based on the standard penetration test (SPT) data from the 1999 Chi-Chi, Taiwan earthquake. The first machine learning technique which uses Artificial Neural Network (ANN) based on multi-layer perceptions (MLP) that are trained with Levenberg-Marquardt backpropagation algorithm. The second machine learning technique uses the Support Vector machine (SVM) that is firmly based on the theory of statistical learning theory, uses classification technique. ANN and SVM have been developed to predict liquefaction susceptibility using corrected SPT (N-1)(60)] and cyclic stress ratio (CSR). Further, an attempt has been made to simplify the models, requiring only the two parameters (N-1)(60) and peck ground acceleration (a(max)/g)], for the prediction of liquefaction susceptibility. The developed ANN and SVM models have also been applied to different case histories available globally. The paper also highlights the capability of the SVM over the ANN models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we present a new algorithm for learning oblique decision trees. Most of the current decision tree algorithms rely on impurity measures to assess the goodness of hyperplanes at each node while learning a decision tree in top-down fashion. These impurity measures do not properly capture the geometric structures in the data. Motivated by this, our algorithm uses a strategy for assessing the hyperplanes in such a way that the geometric structure in the data is taken into account. At each node of the decision tree, we find the clustering hyperplanes for both the classes and use their angle bisectors as the split rule at that node. We show through empirical studies that this idea leads to small decision trees and better performance. We also present some analysis to show that the angle bisectors of clustering hyperplanes that we use as the split rules at each node are solutions of an interesting optimization problem and hence argue that this is a principled method of learning a decision tree.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Models of river flow time series are essential in efficient management of a river basin. It helps policy makers in developing efficient water utilization strategies to maximize the utility of scarce water resource. Time series analysis has been used extensively for modeling river flow data. The use of machine learning techniques such as support-vector regression and neural network models is gaining increasing popularity. In this paper we compare the performance of these techniques by applying it to a long-term time-series data of the inflows into the Krishnaraja Sagar reservoir (KRS) from three tributaries of the river Cauvery. In this study flow data over a period of 30 years from three different observation points established in upper Cauvery river sub-basin is analyzed to estimate their contribution to KRS. Specifically, ANN model uses a multi-layer feed forward network trained with a back-propagation algorithm and support vector regression with epsilon intensive-loss function is used. Auto-regressive moving average models are also applied to the same data. The performance of different techniques is compared using performance metrics such as root mean squared error (RMSE), correlation, normalized root mean squared error (NRMSE) and Nash-Sutcliffe Efficiency (NSE).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The minimum cost classifier when general cost functionsare associated with the tasks of feature measurement and classification is formulated as a decision graph which does not reject class labels at intermediate stages. Noting its complexities, a heuristic procedure to simplify this scheme to a binary decision tree is presented. The optimizationof the binary tree in this context is carried out using ynamicprogramming. This technique is applied to the voiced-unvoiced-silence classification in speech processing.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A new automata model Mr,k, with a conceptually significant innovation in the form of multi-state alternatives at each instance, is proposed in this study. Computer simulations of the Mr,k, model in the context of feature selection in an unsupervised environment has demonstrated the superiority of the model over similar models without this multi-state-choice innovation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Land cover (LC) changes play a major role in global as well as at regional scale patterns of the climate and biogeochemistry of the Earth system. LC information presents critical insights in understanding of Earth surface phenomena, particularly useful when obtained synoptically from remote sensing data. However, for developing countries and those with large geographical extent, regular LC mapping is prohibitive with data from commercial sensors (high cost factor) of limited spatial coverage (low temporal resolution and band swath). In this context, free MODIS data with good spectro-temporal resolution meet the purpose. LC mapping from these data has continuously evolved with advances in classification algorithms. This paper presents a comparative study of two robust data mining techniques, the multilayer perceptron (MLP) and decision tree (DT) on different products of MODIS data corresponding to Kolar district, Karnataka, India. The MODIS classified images when compared at three different spatial scales (at district level, taluk level and pixel level) shows that MLP based classification on minimum noise fraction components on MODIS 36 bands provide the most accurate LC mapping with 86% accuracy, while DT on MODIS 36 bands principal components leads to less accurate classification (69%).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The design and operation of the minimum cost classifier, where the total cost is the sum of the measurement cost and the classification cost, is computationally complex. Noting the difficulties associated with this approach, decision tree design directly from a set of labelled samples is proposed in this paper. The feature space is first partitioned to transform the problem to one of discrete features. The resulting problem is solved by a dynamic programming algorithm over an explicitly ordered state space of all outcomes of all feature subsets. The solution procedure is very general and is applicable to any minimum cost pattern classification problem in which each feature has a finite number of outcomes. These techniques are applied to (i) voiced, unvoiced, and silence classification of speech, and (ii) spoken vowel recognition. The resulting decision trees are operationally very efficient and yield attractive classification accuracies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we present a novel algorithm for learning oblique decision trees. Most of the current decision tree algorithms rely on impurity measures to assess goodness of hyperplanes at each node. These impurity measures do not properly capture the geometric structures in the data. Motivated by this, our algorithm uses a strategy, based on some recent variants of SVM, to assess the hyperplanes in such a way that the geometric structure in the data is taken into account. We show through empirical studies that our method is effective.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we present a novel macroblock mode decision algorithm to speedup H.264/SVC Intra frame encoding. We replace the complex mode-decision calculations by a classifier which has been trained specifically to minimize the reduction in RD performance. This results in a significant speedup in encoding. The results show that machine learning has a great potential and can reduce the complexity substantially with negligible impact on quality. The results show that the proposed method reduces encoding time to about 70% in base layer and up to 50% in enhancement layer of the reference implementation with a negligible loss in quality.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we propose a new algorithm for learning polyhedral classifiers. In contrast to existing methods for learning polyhedral classifier which solve a constrained optimization problem, our method solves an unconstrained optimization problem. Our method is based on a logistic function based model for the posterior probability function. We propose an alternating optimization algorithm, namely, SPLA1 (Single Polyhedral Learning Algorithm1) which maximizes the loglikelihood of the training data to learn the parameters. We also extend our method to make it independent of any user specified parameter (e.g., number of hyperplanes required to form a polyhedral set) in SPLA2. We show the effectiveness of our approach with experiments on various synthetic and real world datasets and compare our approach with a standard decision tree method (OC1) and a constrained optimization based method for learning polyhedral sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we present a machine learning approach to measure the visual quality of JPEG-coded images. The features for predicting the perceived image quality are extracted by considering key human visual sensitivity (HVS) factors such as edge amplitude, edge length, background activity and background luminance. Image quality assessment involves estimating the functional relationship between HVS features and subjective test scores. The quality of the compressed images are obtained without referring to their original images ('No Reference' metric). Here, the problem of quality estimation is transformed to a classification problem and solved using extreme learning machine (ELM) algorithm. In ELM, the input weights and the bias values are randomly chosen and the output weights are analytically calculated. The generalization performance of the ELM algorithm for classification problems with imbalance in the number of samples per quality class depends critically on the input weights and the bias values. Hence, we propose two schemes, namely the k-fold selection scheme (KS-ELM) and the real-coded genetic algorithm (RCGA-ELM) to select the input weights and the bias values such that the generalization performance of the classifier is a maximum. Results indicate that the proposed schemes significantly improve the performance of ELM classifier under imbalance condition for image quality assessment. The experimental results prove that the estimated visual quality of the proposed RCGA-ELM emulates the mean opinion score very well. The experimental results are compared with the existing JPEG no-reference image quality metric and full-reference structural similarity image quality metric.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Learning automata are adaptive decision making devices that are found useful in a variety of machine learning and pattern recognition applications. Although most learning automata methods deal with the case of finitely many actions for the automaton, there are also models of continuous-action-set learning automata (CALA). A team of such CALA can be useful in stochastic optimization problems where one has access only to noise-corrupted values of the objective function. In this paper, we present a novel formulation for noise-tolerant learning of linear classifiers using a CALA team. We consider the general case of nonuniform noise, where the probability that the class label of an example is wrong may be a function of the feature vector of the example. The objective is to learn the underlying separating hyperplane given only such noisy examples. We present an algorithm employing a team of CALA and prove, under some conditions on the class conditional densities, that the algorithm achieves noise-tolerant learning as long as the probability of wrong label for any example is less than 0.5. We also present some empirical results to illustrate the effectiveness of the algorithm.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper(1) presents novel algorithms and applications for a particular class of mixed-norm regularization based Multiple Kernel Learning (MKL) formulations. The formulations assume that the given kernels are grouped and employ l(1) norm regularization for promoting sparsity within RKHS norms of each group and l(s), s >= 2 norm regularization for promoting non-sparse combinations across groups. Various sparsity levels in combining the kernels can be achieved by varying the grouping of kernels-hence we name the formulations as Variable Sparsity Kernel Learning (VSKL) formulations. While previous attempts have a non-convex formulation, here we present a convex formulation which admits efficient Mirror-Descent (MD) based solving techniques. The proposed MD based algorithm optimizes over product of simplices and has a computational complexity of O (m(2)n(tot) log n(max)/epsilon(2)) where m is no. training data points, n(max), n(tot) are the maximum no. kernels in any group, total no. kernels respectively and epsilon is the error in approximating the objective. A detailed proof of convergence of the algorithm is also presented. Experimental results show that the VSKL formulations are well-suited for multi-modal learning tasks like object categorization. Results also show that the MD based algorithm outperforms state-of-the-art MKL solvers in terms of computational efficiency.