89 resultados para Naïve Bayes classifier
Resumo:
In this paper, we propose a novel heuristic approach to segment recognizable symbols from online Kannada word data and perform recognition of the entire word. Two different estimates of first derivative are extracted from the preprocessed stroke groups and used as features for classification. Estimate 2 proved better resulting in 88% accuracy, which is 3% more than that achieved with estimate 1. Classification is performed by statistical dynamic space warping (SDSW) classifier which uses X, Y co-ordinates and their first derivatives as features. Classifier is trained with data from 40 writers. 295 classes are handled covering Kannada aksharas, with Kannada numerals, Indo-Arabic numerals, punctuations and other special symbols like $ and #. Classification accuracies obtained are 88% at the akshara level and 80% at the word level, which shows the scope for further improvement in segmentation algorithm
Resumo:
The problem of on-line recognition and retrieval of relatively weak industrial signals such as partial discharges (PD), buried in excessive noise, has been addressed in this paper. The major bottleneck being the recognition and suppression of stochastic pulsive interference (PI) due to the overlapping broad band frequency spectrum of PI and PD pulses. Therefore, on-line, onsite, PD measurement is hardly possible in conventional frequency based DSP techniques. The observed PD signal is modeled as a linear combination of systematic and random components employing probabilistic principal component analysis (PPCA) and the pdf of the underlying stochastic process is obtained. The PD/PI pulses are assumed as the mean of the process and modeled instituting non-parametric methods, based on smooth FIR filters, and a maximum aposteriori probability (MAP) procedure employed therein, to estimate the filter coefficients. The classification of the pulses is undertaken using a simple PCA classifier. The methods proposed by the authors were found to be effective in automatic retrieval of PD pulses completely rejecting PI.
Resumo:
In this paper, we present an unrestricted Kannada online handwritten character recognizer which is viable for real time applications. It handles Kannada and Indo-Arabic numerals, punctuation marks and special symbols like $, &, # etc, apart from all the aksharas of the Kannada script. The dataset used has handwriting of 69 people from four different locations, making the recognition writer independent. It was found that for the DTW classifier, using smoothed first derivatives as features, enhanced the performance to 89% as compared to preprocessed co-ordinates which gave 85%, but was too inefficient in terms of time. To overcome this, we used Statistical Dynamic Time Warping (SDTW) and achieved 46 times faster classification with comparable accuracy i.e. 88%, making it fast enough for practical applications. The accuracies reported are raw symbol recognition results from the classifier. Thus, there is good scope of improvement in actual applications. Where domain constraints such as fixed vocabulary, language models and post processing can be employed. A working demo is also available on tablet PC for recognition of Kannada words.
Resumo:
In this paper, we compare the experimental results for Tamil online handwritten character recognition using HMM and Statistical Dynamic Time Warping (SDTW) as classifiers. HMM was used for a 156-class problem. Different feature sets and values for the HMM states & mixtures were tried and the best combination was found to be 16 states & 14 mixtures, giving an accuracy of 85%. The features used in this combination were retained and a SDTW model with 20 states and single Gaussian was used as classifier. Also, the symbol set was increased to include numerals, punctuation marks and special symbols like $, & and #, taking the number of classes to 188. It was found that, with a small addition to the feature set, this simple SDTW classifier performed on par with the more complicated HMM model, giving an accuracy of 84%. Mixture density estimation computations was reduced by 11 times. The recognition is writer independent, as the dataset used is quite large, with a variety of handwriting styles.
Resumo:
The design and operation of the minimum cost classifier, where the total cost is the sum of the measurement cost and the classification cost, is computationally complex. Noting the difficulties associated with this approach, decision tree design directly from a set of labelled samples is proposed in this paper. The feature space is first partitioned to transform the problem to one of discrete features. The resulting problem is solved by a dynamic programming algorithm over an explicitly ordered state space of all outcomes of all feature subsets. The solution procedure is very general and is applicable to any minimum cost pattern classification problem in which each feature has a finite number of outcomes. These techniques are applied to (i) voiced, unvoiced, and silence classification of speech, and (ii) spoken vowel recognition. The resulting decision trees are operationally very efficient and yield attractive classification accuracies.
Resumo:
Land cover (LC) and land use (LU) dynamics induced by human and natural processes play a major role in global as well as regional patterns of landscapes influencing biodiversity, hydrology, ecology and climate. Changes in LC features resulting in forest fragmentations have posed direct threats to biodiversity, endangering the sustainability of ecological goods and services. Habitat fragmentation is of added concern as the residual spatial patterns mitigate or exacerbate edge effects. LU dynamics are obtained by classifying temporal remotely sensed satellite imagery of different spatial and spectral resolutions. This paper reviews five different image classification algorithms using spatio-temporal data of a temperate watershed in Himachal Pradesh, India. Gaussian Maximum Likelihood classifier was found to be apt for analysing spatial pattern at regional scale based on accuracy assessment through error matrix and ROC (receiver operating characteristic) curves. The LU information thus derived was then used to assess spatial changes from temporal data using principal component analysis and correspondence analysis based image differencing. The forest area dynamics was further studied by analysing the different types of fragmentation through forest fragmentation models. The computed forest fragmentation and landscape metrics show a decline of interior intact forests with a substantial increase in patch forest during 1972-2007.
Resumo:
In this paper we propose a new algorithm for learning polyhedral classifiers. In contrast to existing methods for learning polyhedral classifier which solve a constrained optimization problem, our method solves an unconstrained optimization problem. Our method is based on a logistic function based model for the posterior probability function. We propose an alternating optimization algorithm, namely, SPLA1 (Single Polyhedral Learning Algorithm1) which maximizes the loglikelihood of the training data to learn the parameters. We also extend our method to make it independent of any user specified parameter (e.g., number of hyperplanes required to form a polyhedral set) in SPLA2. We show the effectiveness of our approach with experiments on various synthetic and real world datasets and compare our approach with a standard decision tree method (OC1) and a constrained optimization based method for learning polyhedral sets.
Resumo:
In this paper, knowledge-based approach using Support Vector Machines (SVMs) are used for estimating the coordinated zonal settings of a distance relay. The approach depends on the detailed simulation studies of apparent impedance loci as seen by distance relay during disturbance, considering various operating conditions including fault resistance. In a distance relay, the impedance loci given at the relay location is obtained from extensive transient stability studies. SVMs are used as a pattern classifier for obtaining distance relay co-ordination. The scheme utilizes the apparent impedance values observed during a fault as inputs. An improved performance with the use of SVMs, keeping the reach when faced with different fault conditions as well as system power flow changes, are illustrated with an equivalent 265 bus system of a practical Indian Western Grid.
Resumo:
Given the increasing cost of designing and building new highway pavements, reliability analysis has become vital to ensure that a given pavement performs as expected in the field. Recognizing the importance of failure analysis to safety, reliability, performance, and economy, back analysis has been employed in various engineering applications to evaluate the inherent uncertainties of the design and analysis. The probabilistic back analysis method formulated on Bayes' theorem and solved using the Markov chain Monte Carlo simulation method with a Metropolis-Hastings algorithm has proved to be highly efficient to address this issue. It is also quite flexible and is applicable to any type of prior information. In this paper, this method has been used to back-analyze the parameters that influence the pavement life and to consider the uncertainty of the mechanistic-empirical pavement design model. The load-induced pavement structural responses (e.g., stresses, strains, and deflections) used to predict the pavement life are estimated using the response surface methodology model developed based on the results of linear elastic analysis. The failure criteria adopted for the analysis were based on the factor of safety (FOS), and the study was carried out for different sample sizes and jumping distributions to estimate the most robust posterior statistics. From the posterior statistics of the case considered, it was observed that after approximately 150 million standard axle load repetitions, the mean values of the pavement properties decrease as expected, with a significant decrease in the values of the elastic moduli of the expected layers. An analysis of the posterior statistics indicated that the parameters that contribute significantly to the pavement failure were the moduli of the base and surface layer, which is consistent with the findings from other studies. After the back analysis, the base modulus parameters show a significant decrease of 15.8% and the surface layer modulus a decrease of 3.12% in the mean value. The usefulness of the back analysis methodology is further highlighted by estimating the design parameters for specified values of the factor of safety. The analysis revealed that for the pavement section considered, a reliability of 89% and 94% can be achieved by adopting FOS values of 1.5 and 2, respectively. The methodology proposed can therefore be effectively used to identify the parameters that are critical to pavement failure in the design of pavements for specified levels of reliability. DOI: 10.1061/(ASCE)TE.1943-5436.0000455. (C) 2013 American Society of Civil Engineers.
Resumo:
The study extends the first order reliability method (FORM) and inverse FORM to update reliability models for existing, statically loaded structures based on measured responses. Solutions based on Bayes' theorem, Markov chain Monte Carlo simulations, and inverse reliability analysis are developed. The case of linear systems with Gaussian uncertainties and linear performance functions is shown to be exactly solvable. FORM and inverse reliability based methods are subsequently developed to deal with more general problems. The proposed procedures are implemented by combining Matlab based reliability modules with finite element models residing on the Abaqus software. Numerical illustrations on linear and nonlinear frames are presented. (c) 2012 Elsevier Ltd. All rights reserved.
Resumo:
In the design of practical web page classification systems one often encounters a situation in which the labeled training set is created by choosing some examples from each class; but, the class proportions in this set are not the same as those in the test distribution to which the classifier will be actually applied. The problem is made worse when the amount of training data is also small. In this paper we explore and adapt binary SVM methods that make use of unlabeled data from the test distribution, viz., Transductive SVMs (TSVMs) and expectation regularization/constraint (ER/EC) methods to deal with this situation. We empirically show that when the labeled training data is small, TSVM designed using the class ratio tuned by minimizing the loss on the labeled set yields the best performance; its performance is good even when the deviation between the class ratios of the labeled training set and the test set is quite large. When the labeled training data is sufficiently large, an unsupervised Gaussian mixture model can be used to get a very good estimate of the class ratio in the test set; also, when this estimate is used, both TSVM and EC/ER give their best possible performance, with TSVM coming out superior. The ideas in the paper can be easily extended to multi-class SVMs and MaxEnt models.
Resumo:
This paper considers sequential hypothesis testing in a decentralized framework. We start with two simple decentralized sequential hypothesis testing algorithms. One of which is later proved to be asymptotically Bayes optimal. We also consider composite versions of decentralized sequential hypothesis testing. A novel nonparametric version for decentralized sequential hypothesis testing using universal source coding theory is developed. Finally we design a simple decentralized multihypothesis sequential detection algorithm.
Resumo:
This paper describes a new method of color text localization from generic scene images containing text of different scripts and with arbitrary orientations. A representative set of colors is first identified using the edge information to initiate an unsupervised clustering algorithm. Text components are identified from each color layer using a combination of a support vector machine and a neural network classifier trained on a set of low-level features derived from the geometric, boundary, stroke and gradient information. Experiments on camera-captured images that contain variable fonts, size, color, irregular layout, non-uniform illumination and multiple scripts illustrate the robustness of the method. The proposed method yields precision and recall of 0.8 and 0.86 respectively on a database of 100 images. The method is also compared with others in the literature using the ICDAR 2003 robust reading competition dataset.
Resumo:
In this paper we propose a new algorithm for learning polyhedral classifiers which we call as Polyceptron. It is a Perception like algorithm which updates the parameters only when the current classifier misclassifies any training data. We give both batch and online version of Polyceptron algorithm. Finally we give experimental results to show the effectiveness of our approach.
Resumo:
In this paper, we present a fast learning neural network classifier for human action recognition. The proposed classifier is a fully complex-valued neural network with a single hidden layer. The neurons in the hidden layer employ the fully complex-valued hyperbolic secant as an activation function. The parameters of the hidden layer are chosen randomly and the output weights are estimated analytically as a minimum norm least square solution to a set of linear equations. The fast leaning fully complex-valued neural classifier is used for recognizing human actions accurately. Optical flow-based features extracted from the video sequences are utilized to recognize 10 different human actions. The feature vectors are computationally simple first order statistics of the optical flow vectors, obtained from coarse to fine rectangular patches centered around the object. The results indicate the superior performance of the complex-valued neural classifier for action recognition. The superior performance of the complex neural network for action recognition stems from the fact that motion, by nature, consists of two components, one along each of the axes.