930 results for Classification algorithms
Abstract:
This special issue is focused on the assessment of algorithms for the observation of Earth’s climate from environmental satellites. Climate data records derived by remote sensing are increasingly a key source of insight into the workings of and changes in Earth’s climate system. Producers of data sets must devote considerable effort and expertise to maximise the true climate signals in their products and minimise effects of data processing choices and changing sensors. A key choice is the selection of algorithm(s) for classification and/or retrieval of the climate variable. Within the European Space Agency Climate Change Initiative, science teams undertook systematic assessment of algorithms for a range of essential climate variables. The papers in the special issue report some of these exercises (for ocean colour, aerosol, ozone, greenhouse gases, clouds, soil moisture, sea surface temperature and glaciers). The contributions show that assessment exercises must be designed with care, considering issues such as the relative importance of different aspects of data quality (accuracy, precision, stability, sensitivity, coverage, etc.), the availability and degree of independence of validation data and the limitations of validation in characterising some important aspects of data (such as long-term stability or spatial coherence). Although they require a significant investment of expertise and effort, systematic comparisons are found to be highly valuable. They reveal the relative strengths and weaknesses of different algorithmic approaches under different observational contexts, and help ensure that scientific conclusions drawn from climate data records are not influenced by observational artifacts, but are robust.
Abstract:
Predictive performance evaluation is a fundamental issue in the design, development, and deployment of classification systems. As predictive performance evaluation is a multidimensional problem, single scalar summaries such as error rate, although quite convenient due to their simplicity, can seldom evaluate all the aspects that a complete and reliable evaluation must consider. Due to this, various graphical performance evaluation methods are increasingly drawing the attention of the machine learning, data mining, and pattern recognition communities. The main advantage of these types of methods resides in their ability to depict the trade-offs between evaluation aspects in a multidimensional space rather than reducing these aspects to an arbitrarily chosen (and often biased) single scalar measure. Furthermore, to appropriately select a suitable graphical method for a given task, it is crucial to identify its strengths and weaknesses. This paper surveys various graphical methods often used for predictive performance evaluation. By presenting these methods in the same framework, we hope this paper may shed some light on deciding which methods are more suitable to use in different situations.
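A ROC curve is one widely used graphical evaluation method: it shows the trade-off between true positive rate and false positive rate as the decision threshold varies. The abstract does not say which methods the survey covers, so the following is only a minimal sketch of the general idea. The function names `roc_points` and `auc` are hypothetical, and the scores and labels in the usage note are made up for illustration.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR) obtained by sweeping a threshold.

    Assumptions (illustrative only): labels are 1 (positive) or 0 (negative),
    and a higher score means "more likely positive".
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Rank examples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Examples with tied scores are consumed together, so a tie
        # yields a single diagonal segment instead of an arbitrary staircase.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under a piecewise-linear curve, by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

For example, `auc(roc_points([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 0]))` gives 1.0 for a perfectly ranked toy set. The full list of points keeps the trade-off visible, whereas the single area value collapses it into one scalar.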
Abstract:
This work proposes and discusses an approach for inducing Bayesian classifiers aimed at balancing the tradeoff between the precise probability estimates produced by time-consuming unrestricted Bayesian networks and the computational efficiency of Naive Bayes (NB) classifiers. The proposed approach is based on the fundamental principles of Heuristic Search Bayesian network learning. The Markov Blanket concept, as well as a proposed "approximate Markov Blanket", are used to reduce the number of nodes that form the Bayesian network to be induced from data. Consequently, the usually high computational cost of the heuristic search learning algorithms can be lessened, while Bayesian network structures better than NB can be achieved. The resulting algorithms, called DMBC (Dynamic Markov Blanket Classifier) and A-DMBC (Approximate DMBC), are empirically assessed in twelve domains that illustrate scenarios of particular interest. The obtained results are compared with NB and Tree Augmented Network (TAN) classifiers, and confirm that both proposed algorithms can provide good classification accuracies and better probability estimates than NB and TAN, while being more computationally efficient than the widely used K2 Algorithm.
Abstract:
There is an increasing interest in the application of Evolutionary Algorithms (EAs) to induce classification rules. This hybrid approach can benefit areas where classical methods for rule induction have not been very successful. One example is the induction of classification rules in imbalanced domains. Imbalanced data occur when one or more classes heavily outnumber other classes. Frequently, classical machine learning (ML) classifiers are not able to learn in the presence of imbalanced data sets, inducing classification models that always predict the most numerous classes. In this work, we propose a novel hybrid approach to deal with this problem. We create several balanced data sets with all minority class cases and a random sample of majority class cases. These balanced data sets are fed to classical ML systems that produce rule sets. The rule sets are combined creating a pool of rules and an EA is used to build a classifier from this pool of rules. This hybrid approach has some advantages over undersampling, since it reduces the amount of discarded information, and some advantages over oversampling, since it avoids overfitting. The proposed approach was experimentally analysed and the experimental results show an improvement in the classification performance measured as the area under the receiver operating characteristics (ROC) curve.
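The data preparation step described above (every minority case plus a random sample of majority cases, repeated several times) can be sketched as follows. This is a hedged illustration, not the authors' implementation. The function name `balanced_subsets`, the choice of sampling as many majority cases as there are minority cases, and sampling without replacement are all assumptions that the abstract does not confirm.

```python
import random


def balanced_subsets(examples, labels, minority_label, n_subsets, seed=0):
    """Build several balanced data sets from an imbalanced one.

    Each subset keeps ALL minority cases and adds a fresh random sample of
    majority cases. Assumption: the sample size equals the minority size
    and is drawn without replacement within a subset.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    minority = [(x, y) for x, y in zip(examples, labels) if y == minority_label]
    majority = [(x, y) for x, y in zip(examples, labels) if y != minority_label]
    if not minority:
        raise ValueError("no minority cases found")
    if len(majority) < len(minority):
        raise ValueError("minority_label is not actually the minority class")

    subsets = []
    for _ in range(n_subsets):
        sampled_majority = rng.sample(majority, len(minority))
        subset = minority + sampled_majority
        rng.shuffle(subset)  # avoid presenting the classes in blocks
        subsets.append(subset)
    return subsets
```

Each subset would then be handed to a rule learner, and the resulting rule sets pooled. Because different subsets draw different majority cases, less majority information is discarded overall than with a single undersampled set.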
Abstract:
Parkinson's disease (PD) is a degenerative illness whose cardinal symptoms include rigidity, tremor, and slowness of movement. In addition to its widely recognized effects, PD can have a profound effect on speech and voice. The speech symptoms most commonly demonstrated by patients with PD are reduced vocal loudness, monopitch, disruptions of voice quality, and an abnormally fast rate of speech. This cluster of speech symptoms is often termed hypokinetic dysarthria. The disease can be difficult to diagnose accurately, especially in its early stages. For this reason, automatic techniques based on artificial intelligence should increase diagnostic accuracy and help doctors make better decisions. The aim of this thesis work is to predict PD from audio files collected from various patients. The audio files are preprocessed to extract features. The preprocessed data contain 23 attributes and 195 instances, with an average of six voice recordings per person. A data compression technique, the Discrete Cosine Transform (DCT), is used to reduce the number of instances. After data compression, attribute selection is performed using several WEKA built-in methods such as ChiSquared, GainRatio, and InfoGain. After identifying the important attributes, we evaluate them one by one using stepwise regression. Based on the selected attributes, classification is performed in WEKA using a cost-sensitive classifier with various algorithms such as MultiPass LVQ, Logistic Model Tree (LMT), and K-Star. The classification results show an average accuracy of 80%. Using these features, a classification accuracy of approximately 95% for PD is achieved. This shows that, using the audio dataset, PD could be predicted with a high level of accuracy.
Abstract:
This paper reports on a sensor array able to distinguish tastes and used to classify red wines. The array comprises sensing units made from Langmuir-Blodgett (LB) films of conducting polymers and lipids and layer-by-layer (LBL) films from chitosan deposited onto gold interdigitated electrodes. Using impedance spectroscopy as the principle of detection, we show that distinct clusters can be identified in principal component analysis (PCA) plots for six types of red wine. Distinction can be made with regard to vintage, vineyard and brands of the red wine. Furthermore, if the data are treated with artificial neural networks (ANNs), this artificial tongue can identify wine samples stored under different conditions. This is illustrated by considering 900 wine samples, obtained with 30 measurements for each of the five bottles of the six wines, which could be recognised with 100% accuracy using the algorithms Standard Backpropagation and Backpropagation momentum in the ANNs. (C) 2003 Elsevier B.V. All rights reserved.
Abstract:
This article presents a quantitative and objective approach to cat ganglion cell characterization and classification. Several biologically relevant features, such as diameter, eccentricity, fractal dimension, influence histogram, influence area, convex hull area, and convex hull diameter, are derived from geometrical transforms and then processed by three different clustering methods (Ward's hierarchical scheme, K-means and genetic algorithm), whose results are then combined by a voting strategy. These experiments indicate the superiority of some features and also suggest some possible biological implications.
Abstract:
This paper presents a method to enhance microcalcifications and classify their borders by applying the wavelet transform. By decomposing an image and removing its low-frequency sub-band, the microcalcifications are enhanced. By analyzing the effects of perturbations on the high-frequency sub-bands, it is possible to classify the borders as smooth, rugged or undefined. Results show a false positive reduction of 69.27% using a region growing algorithm. © 2008 IEEE.
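The enhancement idea (decompose, discard the low-frequency sub-band, reconstruct) can be illustrated with a one-dimensional, single-level Haar transform. This is only a simplified sketch: the abstract does not state which wavelet family or how many decomposition levels the authors use, and their method operates on two-dimensional images. All function names here are hypothetical.

```python
def haar_decompose(signal):
    """One-level Haar decomposition (averaging variant, an assumption).

    Returns (approx, detail): pairwise means carry the low-frequency
    content, pairwise half-differences carry the high-frequency content.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_reconstruct(approx, detail):
    """Invert haar_decompose: each (a, d) pair yields samples a + d and a - d."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out


def remove_low_frequency(signal):
    """Zero the approximation sub-band and reconstruct from details only."""
    approx, detail = haar_decompose(signal)
    return haar_reconstruct([0.0] * len(approx), detail)
```

On a flat row with one bright sample, such as `[10, 10, 10, 14, 10, 10]`, the smooth background cancels and only the local change around the bright sample remains. That is the sense in which small, sharp structures stand out once the low-frequency sub-band is dropped.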
Abstract:
This paper describes an investigation of the hybrid PSO/ACO algorithm for automatically classifying well drilling operation stages. The feasibility of the method is demonstrated by its application to a real mud-logging dataset. The results are compared with bio-inspired methods, and with rule induction and decision tree algorithms for data mining. © 2009 Springer Berlin Heidelberg.
Abstract:
In this paper we shed light on the problem of efficiency and effectiveness of image classification in large datasets. As the amount of data to be processed and classified has increased in recent years, there is a need for faster and more precise pattern recognition algorithms to perform online and offline training and classification procedures. We deal here with the problem of classifying moist areas in radar images in a fast manner. Experimental results using Optimum-Path Forest and its training set pruning algorithm are also provided and discussed. © 2011 IEEE.
Abstract:
The correct classification of sugar according to its physico-chemical characteristics directly influences the value of the product and its acceptance by the market. This study shows that using an electronic tongue system along with established techniques of supervised learning leads to the correct classification of sugar samples according to their qualities. In this paper, we offer two new real, public and non-encoded sugar datasets whose attributes were automatically collected using an electronic tongue, with and without pH control. Moreover, we compare the performance achieved by several established machine learning methods. Our experiments were carefully designed to ensure statistically sound results, and they indicate that the k-nearest neighbors method outperforms the other evaluated classifiers and, hence, can be used as a good baseline for further comparison. © 2012 IEEE.
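Since k-nearest neighbors is singled out as the baseline, a minimal sketch of the method may help. The abstract does not give the value of k, the distance metric, or the attribute layout of the sugar datasets, so Euclidean distance, `k=3`, the function name `knn_predict`, and the toy data in the usage note are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Assumptions (illustrative): numeric feature vectors of equal length and
    Euclidean distance. If the vote is tied, the label of the closest tied
    neighbor wins.
    """
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")

    # Pair each training point's distance to the query with its label.
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Sort on distance only, so labels never need to be comparable.
    distances.sort(key=lambda pair: pair[0])

    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

With toy data `train_X = [(0, 0), (0, 1), (5, 5), (6, 5)]` and `train_y = ["a", "a", "b", "b"]`, the query `(0.2, 0.1)` is assigned `"a"`. In practice the attributes would usually be scaled first, since Euclidean distance is sensitive to differing units.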
Abstract:
The efficiency in image classification tasks can be improved using combined information provided by several sources, such as shape, color, and texture visual properties. Although many works proposed to combine different feature vectors, we model the descriptor combination as an optimization problem to be addressed by evolutionary-based techniques, which compute distances between samples that maximize their separability in the feature space. The robustness of the proposed technique is assessed by the Optimum-Path Forest classifier. Experiments showed that the proposed methodology can outperform individual information provided by single descriptors in well-known public datasets. © 2012 IEEE.
Abstract:
This article deals with classification problems involving unequal class probabilities and discusses metrics for systems that use multilayer perceptron (MLP) neural networks to classify new patterns. In addition, we propose three new pruning methods, which were compared with seven existing methods from the literature for MLP networks. All pruning algorithms presented in this paper have been modified by the authors to prune neurons, in order to produce MLP networks that are fully connected but small in their intermediate layer. Experiments were carried out involving the E. coli unbalanced classification problem and ten pruning methods. The proposed methods obtained good results, in fact better results than other pruning methods previously defined in the MLP neural network area. (C) 2014 Elsevier Ltd. All rights reserved.
Abstract:
In this letter, we present different approaches for music genre classification. The proposed techniques, which are composed of a feature extraction stage followed by a classification procedure, explore both the variations of parameters used as input and the classifier architecture. Tests were carried out with three styles of music, namely blues, classical, and lounge, which are considered informally by some musicians as being “big dividers” among music genres, showing the efficacy of the proposed algorithms and establishing a relationship between the relevance of each set of parameters for each music style and each classifier. In contrast to other works, entropies and fractal dimensions are the features adopted for the classifications.
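To make the notion of an entropy feature concrete, the sketch below computes the Shannon entropy of a signal's amplitude histogram. The abstract does not specify which entropies the authors use or how they are estimated, so the histogram estimator, the bin count, and the function name `shannon_entropy` are assumptions made purely for illustration.

```python
import math


def shannon_entropy(samples, n_bins=16):
    """Shannon entropy (in bits) of the amplitude histogram of `samples`.

    Assumption: amplitudes are binned into `n_bins` equal-width bins that
    span the observed range. A constant signal has zero entropy.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")

    lo, hi = min(samples), max(samples)
    if hi == lo:
        return 0.0  # all mass falls in one bin

    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in samples:
        # The maximum value would index one past the end, so clamp it.
        idx = min(int((s - lo) / width), n_bins - 1)
        counts[idx] += 1

    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
```

A feature vector for a classifier could then hold such values computed over frames of a recording. A signal whose amplitudes spread evenly over the bins scores close to `log2(n_bins)`, while a nearly constant one scores close to zero.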
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)