995 resultados para classifier design
Resumo:
Gaussian processes (GPs) are promising Bayesian methods for classification and regression problems. Design of a GP classifier and making predictions using it is, however, computationally demanding, especially when the training set size is large. Sparse GP classifiers are known to overcome this limitation. In this letter, we propose and study a validation-based method for sparse GP classifier design. The proposed method uses a negative log predictive (NLP) loss measure, which is easy to compute for GP models. We use this measure for both basis vector selection and hyperparameter adaptation. The experimental results on several real-world benchmark data sets show better orcomparable generalization performance over existing methods.
Resumo:
Structural Support Vector Machines (SSVMs) and Conditional Random Fields (CRFs) are popular discriminative methods used for classifying structured and complex objects like parse trees, image segments and part-of-speech tags. The datasets involved are very large dimensional, and the models designed using typical training algorithms for SSVMs and CRFs are non-sparse. This non-sparse nature of models results in slow inference. Thus, there is a need to devise new algorithms for sparse SSVM and CRF classifier design. Use of elastic net and L1-regularizer has already been explored for solving primal CRF and SSVM problems, respectively, to design sparse classifiers. In this work, we focus on dual elastic net regularized SSVM and CRF. By exploiting the weakly coupled structure of these convex programming problems, we propose a new sequential alternating proximal (SAP) algorithm to solve these dual problems. This algorithm works by sequentially visiting each training set example and solving a simple subproblem restricted to a small subset of variables associated with that example. Numerical experiments on various benchmark sequence labeling datasets demonstrate that the proposed algorithm scales well. Further, the classifiers designed are sparser than those designed by solving the respective primal problems and demonstrate comparable generalization performance. Thus, the proposed SAP algorithm is a useful alternative for sparse SSVM and CRF classifier design.
Resumo:
The design of binary morphological operators that are translation-invariant and locally defined by a finite neighborhood window corresponds to the problem of designing Boolean functions. As in any supervised classification problem, morphological operators designed from a training sample also suffer from overfitting. Large neighborhood tends to lead to performance degradation of the designed operator. This work proposes a multilevel design approach to deal with the issue of designing large neighborhood-based operators. The main idea is inspired by stacked generalization (a multilevel classifier design approach) and consists of, at each training level, combining the outcomes of the previous level operators. The final operator is a multilevel operator that ultimately depends on a larger neighborhood than of the individual operators that have been combined. Experimental results show that two-level operators obtained by combining operators designed on subwindows of a large window consistently outperform the single-level operators designed on the full window. They also show that iterating two-level operators is an effective multilevel approach to obtain better results.
Resumo:
Analyzing “nuggety” gold samples commonly produces erratic fire assay results, due to random inclusion or exclusion of coarse gold in analytical samples. Preconcentrating gold samples might allow the nuggets to be concentrated and fire assayed separately. In this investigation synthetic gold samples were made using similar density tungsten powder and silica, and were preconcentrated using two approaches: an air jig and an air classifier. Current analytical gold sampling method is time and labor intensive and our aim is to design a set-up for rapid testing. It was observed that the preliminary air classifier design showed more promise than the air jig in terms of control over mineral recovery and preconcentrating bulk ore sub-samples. Hence the air classifier was modified with the goal of producing 10-30 grams samples aiming to capture all of the high density metallic particles, tungsten in this case. Effects of air velocity and feed rate on the recovery of tungsten from synthetic tungsten-silica mixtures were studied. The air classifier achieved optimal high density metal recovery of 97.7% at an air velocity of 0.72 m/s and feed rate of 160 g/min. Effects of density on classification were investigated by using iron as the dense metal instead of tungsten and the recovery was seen to drop from 96.13% to 20.82%. Preliminary investigations suggest that preconcentration of gold samples is feasible using the laboratory designed air classifier.
Resumo:
Selection of machine learning techniques requires a certain sensitivity to the requirements of the problem. In particular, the problem can be made more tractable by deliberately using algorithms that are biased toward solutions of the requisite kind. In this paper, we argue that recurrent neural networks have a natural bias toward a problem domain of which biological sequence analysis tasks are a subset. We use experiments with synthetic data to illustrate this bias. We then demonstrate that this bias can be exploitable using a data set of protein sequences containing several classes of subcellular localization targeting peptides. The results show that, compared with feed forward, recurrent neural networks will generally perform better on sequence analysis tasks. Furthermore, as the patterns within the sequence become more ambiguous, the choice of specific recurrent architecture becomes more critical.
Resumo:
This paper presents a composite multi-layer classifier system for predicting the subcellular localization of proteins based on their amino acid sequence. The work is an extension of our previous predictor PProwler v1.1 which is itself built upon the series of predictors SignalP and TargetP. In this study we outline experiments conducted to improve the classifier design. The major improvement came from using Support Vector machines as a "smart gate" sorting the outputs of several different targeting peptide detection networks. Our final model (PProwler v1.2) gives MCC values of 0.873 for non-plant and 0.849 for plant proteins. The model improves upon the accuracy of our previous subcellular localization predictor (PProwler v1.1) by 2% for plant data (which represents 7.5% improvement upon TargetP).
Resumo:
Usually, data mining projects that are based on decision trees for classifying test cases will use the probabilities provided by these decision trees for ranking classified test cases. We have a need for a better method for ranking test cases that have already been classified by a binary decision tree because these probabilities are not always accurate and reliable enough. A reason for this is that the probability estimates computed by existing decision tree algorithms are always the same for all the different cases in a particular leaf of the decision tree. This is only one reason why the probability estimates given by decision tree algorithms can not be used as an accurate means of deciding if a test case has been correctly classified. Isabelle Alvarez has proposed a new method that could be used to rank the test cases that were classified by a binary decision tree [Alvarez, 2004]. In this paper we will give the results of a comparison of different ranking methods that are based on the probability estimate, the sensitivity of a particular case or both.
Resumo:
This paper presents the design and development of a novel optical vehicle classifier system, which is based on interruption of laser beams, that is suitable for use in places with poor transportation infrastructure. The system can estimate the speed, axle count, wheelbase, tire diameter, and the lane of motion of a vehicle. The design of the system eliminates the need for careful optical alignment, whereas the proposed estimation strategies render the estimates insensitive to angular mounting errors and to unevenness of the road. Strategies to estimate vehicular parameters are described along with the optimization of the geometry of the system to minimize estimation errors due to quantization. The system is subsequently fabricated, and the proposed features of the system are experimentally demonstrated. The relative errors in the estimation of velocity and tire diameter are shown to be within 0.5% and to change by less than 17% for angular mounting errors up to 30 degrees. In the field, the classifier demonstrates accuracy better than 97.5% and 94%, respectively, in the estimation of the wheelbase and lane of motion and can classify vehicles with an average accuracy of over 89.5%.
Resumo:
We have discovered a novel approach of intrusion detection system using an intelligent data classifier based on a self organizing map (SOM). We have surveyed all other unsupervised intrusion detection methods, different alternative SOM based techniques and KDD winner IDS methods. This paper provides a robust designed and implemented intelligent data classifier technique based on a single large size (30x30) self organizing map (SOM) having the capability to detect all types of attacks given in the DARPA Archive 1999 the lowest false positive rate being 0.04 % and higher detection rate being 99.73% tested using full KDD data sets and 89.54% comparable detection rate and 0.18% lowest false positive rate tested using corrected data sets.
Resumo:
In this paper, a computer-aided diagnostic (CAD) system for the classification of hepatic lesions from computed tomography (CT) images is presented. Regions of interest (ROIs) taken from nonenhanced CT images of normal liver, hepatic cysts, hemangiomas, and hepatocellular carcinomas have been used as input to the system. The proposed system consists of two modules: the feature extraction and the classification modules. The feature extraction module calculates the average gray level and 48 texture characteristics, which are derived from the spatial gray-level co-occurrence matrices, obtained from the ROIs. The classifier module consists of three sequentially placed feed-forward neural networks (NNs). The first NN classifies into normal or pathological liver regions. The pathological liver regions are characterized by the second NN as cyst or "other disease." The third NN classifies "other disease" into hemangioma or hepatocellular carcinoma. Three feature selection techniques have been applied to each individual NN: the sequential forward selection, the sequential floating forward selection, and a genetic algorithm for feature selection. The comparative study of the above dimensionality reduction methods shows that genetic algorithms result in lower dimension feature vectors and improved classification performance.
Resumo:
Inspection of solder joints has been a critical process in the electronic manufacturing industry to reduce manufacturing cost, improve yield, and ensure product quality and reliability. This paper proposes two inspection modules for an automatic solder joint classification system. The “front-end” inspection system includes illumination normalisation, localisation and segmentation. The “back-end” inspection involves the classification of solder joints using the Log Gabor filter and classifier fusion. Five different levels of solder quality with respect to the amount of solder paste have been defined. The Log Gabor filter has been demonstrated to achieve high recognition rates and is resistant to misalignment. This proposed system does not need any special illumination system, and the images are acquired by an ordinary digital camera. This system could contribute to the development of automated non-contact, non-destructive and low cost solder joint quality inspection systems.
Resumo:
Design of speaker identification schemes for a small number of speakers (around 10) with a high degree of accuracy in controlled environment is a practical proposition today. When the number of speakers is large (say 50–100), many of these schemes cannot be directly extended, as both recognition error and computation time increase monotonically with population size. The feature selection problem is also complex for such schemes. Though there were earlier attempts to rank order features based on statistical distance measures, it has been observed only recently that the best two independent measurements are not the same as the combination in two's for pattern classification. We propose here a systematic approach to the problem using the decision tree or hierarchical classifier with the following objectives: (1) Design of optimal policy at each node of the tree given the tree structure i.e., the tree skeleton and the features to be used at each node. (2) Determination of the optimal feature measurement and decision policy given only the tree skeleton. Applicability of optimization procedures such as dynamic programming in the design of such trees is studied. The experimental results deal with the design of a 50 speaker identification scheme based on this approach.
Resumo:
The design and operation of the minimum cost classifier, where the total cost is the sum of the measurement cost and the classification cost, is computationally complex. Noting the difficulties associated with this approach, decision tree design directly from a set of labelled samples is proposed in this paper. The feature space is first partitioned to transform the problem to one of discrete features. The resulting problem is solved by a dynamic programming algorithm over an explicitly ordered state space of all outcomes of all feature subsets. The solution procedure is very general and is applicable to any minimum cost pattern classification problem in which each feature has a finite number of outcomes. These techniques are applied to (i) voiced, unvoiced, and silence classification of speech, and (ii) spoken vowel recognition. The resulting decision trees are operationally very efficient and yield attractive classification accuracies.