930 resultados para classification algorithm


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Prism is a modular classification rule generation method based on the ‘separate and conquer’ approach that is alternative to the rule induction approach using decision trees also known as ‘divide and conquer’. Prism often achieves a similar level of classification accuracy compared with decision trees, but tends to produce a more compact noise tolerant set of classification rules. As with other classification rule generation methods, a principle problem arising with Prism is that of overfitting due to over-specialised rules. In addition, over-specialised rules increase the associated computational complexity. These problems can be solved by pruning methods. For the Prism method, two pruning algorithms have been introduced recently for reducing overfitting of classification rules - J-pruning and Jmax-pruning. Both algorithms are based on the J-measure, an information theoretic means for quantifying the theoretical information content of a rule. Jmax-pruning attempts to exploit the J-measure to its full potential because J-pruning does not actually achieve this and may even lead to underfitting. A series of experiments have proved that Jmax-pruning may outperform J-pruning in reducing overfitting. However, Jmax-pruning is computationally relatively expensive and may also lead to underfitting. This paper reviews the Prism method and the two existing pruning algorithms above. It also proposes a novel pruning algorithm called Jmid-pruning. The latter is based on the J-measure and it reduces overfitting to a similar level as the other two algorithms but is better in avoiding underfitting and unnecessary computational effort. The authors conduct an experimental study on the performance of the Jmid-pruning algorithm in terms of classification accuracy and computational efficiency. The algorithm is also evaluated comparatively with the J-pruning and Jmax-pruning algorithms.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Recent studies showed that features extracted from brain MRIs can well discriminate Alzheimer’s disease from Mild Cognitive Impairment. This study provides an algorithm that sequentially applies advanced feature selection methods for findings the best subset of features in terms of binary classification accuracy. The classifiers that provided the highest accuracies, have been then used for solving a multi-class problem by the one-versus-one strategy. Although several approaches based on Regions of Interest (ROIs) extraction exist, the prediction power of features has not yet investigated by comparing filter and wrapper techniques. The findings of this work suggest that (i) the IntraCranial Volume (ICV) normalization can lead to overfitting and worst the accuracy prediction of test set and (ii) the combined use of a Random Forest-based filter with a Support Vector Machines-based wrapper, improves accuracy of binary classification.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper discusses ECG signal classification after parametrizing the ECG waveforms in the wavelet domain. Signal decomposition using perfect reconstruction quadrature mirror filter banks can provide a very parsimonious representation of ECG signals. In the current work, the filter parameters are adjusted by a numerical optimization algorithm in order to minimize a cost function associated to the filter cut-off sharpness. The goal consists of achieving a better compromise between frequency selectivity and time resolution at each decomposition level than standard orthogonal filter banks such as those of the Daubechies and Coiflet families. Our aim is to optimally decompose the signals in the wavelet domain so that they can be subsequently used as inputs for training to a neural network classifier.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This work proposes and discusses an approach for inducing Bayesian classifiers aimed at balancing the tradeoff between the precise probability estimates produced by time consuming unrestricted Bayesian networks and the computational efficiency of Naive Bayes (NB) classifiers. The proposed approach is based on the fundamental principles of the Heuristic Search Bayesian network learning. The Markov Blanket concept, as well as a proposed ""approximate Markov Blanket"" are used to reduce the number of nodes that form the Bayesian network to be induced from data. Consequently, the usually high computational cost of the heuristic search learning algorithms can be lessened, while Bayesian network structures better than NB can be achieved. The resulting algorithms, called DMBC (Dynamic Markov Blanket Classifier) and A-DMBC (Approximate DMBC), are empirically assessed in twelve domains that illustrate scenarios of particular interest. The obtained results are compared with NB and Tree Augmented Network (TAN) classifiers, and confinn that both proposed algorithms can provide good classification accuracies and better probability estimates than NB and TAN, while being more computationally efficient than the widely used K2 Algorithm.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The substitution of missing values, also called imputation, is an important data preparation task for many domains. Ideally, the substitution of missing values should not insert biases into the dataset. This aspect has been usually assessed by some measures of the prediction capability of imputation methods. Such measures assume the simulation of missing entries for some attributes whose values are actually known. These artificially missing values are imputed and then compared with the original values. Although this evaluation is useful, it does not allow the influence of imputed values in the ultimate modelling task (e.g. in classification) to be inferred. We argue that imputation cannot be properly evaluated apart from the modelling task. Thus, alternative approaches are needed. This article elaborates on the influence of imputed values in classification. In particular, a practical procedure for estimating the inserted bias is described. As an additional contribution, we have used such a procedure to empirically illustrate the performance of three imputation methods (majority, naive Bayes and Bayesian networks) in three datasets. Three classifiers (decision tree, naive Bayes and nearest neighbours) have been used as modelling tools in our experiments. The achieved results illustrate a variety of situations that can take place in the data preparation practice.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The latest version of CATH (class, architecture, topology, homology) (version 3.2), released in July 2008 (http://www.cathdb.info), contains 1 14215 domains, 2178 Homologous superfamilies and 1110 fold groups. We have assigned 20 330 new domains, 87 new homologous superfamilies and 26 new folds since CATH release version 3.1. A total of 28 064 new domains have been assigned since our NAR 2007 database publication (CATH version 3.0). The CATH website has been completely redesigned and includes more comprehensive documentation. We have revisited the CATH architecture level as part of the development of a `Protein Chart` and present information on the population of each architecture. The CATHEDRAL structure comparison algorithm has been improved and used to characterize structural diversity in CATH superfamilies and structural overlaps between superfamilies. Although the majority of superfamilies in CATH are not structurally diverse and do not overlap significantly with other superfamilies, similar to 4% of superfamilies are very diverse and these are the superfamilies that are most highly populated in both the PDB and in the genomes. Information on the degree of structural diversity in each superfamily and structural overlaps between superfamilies can now be downloaded from the CATH website.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

With the service life of water supply network (WSN) growth, the growing phenomenon of aging pipe network has become exceedingly serious. As urban water supply network is hidden underground asset, it is difficult for monitoring staff to make a direct classification towards the faults of pipe network by means of the modern detecting technology. In this paper, based on the basic property data (e.g. diameter, material, pressure, distance to pump, distance to tank, load, etc.) of water supply network, decision tree algorithm (C4.5) has been carried out to classify the specific situation of water supply pipeline. Part of the historical data was used to establish a decision tree classification model, and the remaining historical data was used to validate this established model. Adopting statistical methods were used to access the decision tree model including basic statistical method, Receiver Operating Characteristic (ROC) and Recall-Precision Curves (RPC). These methods has been successfully used to assess the accuracy of this established classification model of water pipe network. The purpose of classification model was to classify the specific condition of water pipe network. It is important to maintain the pipeline according to the classification results including asset unserviceable (AU), near perfect condition (NPC) and serious deterioration (SD). Finally, this research focused on pipe classification which plays a significant role in maintaining water supply networks in the future.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

LEÃO, Adriano de Castro; DÓRIA NETO, Adrião Duarte; SOUSA, Maria Bernardete Cordeiro de. New developmental stages for common marmosets (Callithrix jacchus) using mass and age variables obtained by K-means algorithm and self-organizing maps (SOM). Computers in Biology and Medicine, v. 39, p. 853-859, 2009

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we deal with the problem of feature selection by introducing a new approach based on Gravitational Search Algorithm (GSA). The proposed algorithm combines the optimization behavior of GSA together with the speed of Optimum-Path Forest (OPF) classifier in order to provide a fast and accurate framework for feature selection. Experiments on datasets obtained from a wide range of applications, such as vowel recognition, image classification and fraud detection in power distribution systems are conducted in order to asses the robustness of the proposed technique against Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and a Particle Swarm Optimization (PSO)-based algorithm for feature selection.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

ObjectiveTo describe onset features, classification and treatment of juvenile dermatomyositis (JDM) and juvenile polymyositis (JPM) from a multicentre registry.MethodsInclusion criteria were onset age lower than 18 years and a diagnosis of any idiopathic inflammatory myopathy (IIM) by attending physician. Bohan & Peter (1975) criteria categorisation was established by a scoring algorithm to define JDM and JPM based oil clinical protocol data.ResultsOf the 189 cases included, 178 were classified as JDM, 9 as JPM (19.8: 1) and 2 did not fit the criteria; 6.9% had features of chronic arthritis and connective tissue disease overlap. Diagnosis classification agreement occurred in 66.1%. Medial? onset age was 7 years, median follow-up duration was 3.6 years. Malignancy was described in 2 (1.1%) cases. Muscle weakness occurred in 95.8%; heliotrope rash 83.5%; Gottron plaques 83.1%; 92% had at least one abnormal muscle enzyme result. Muscle biopsy performed in 74.6% was abnormal in 91.5% and electromyogram performed in 39.2% resulted abnormal in 93.2%. Logistic regression analysis was done in 66 cases with all parameters assessed and only aldolase resulted significant, as independent variable for definite JDM (OR=5.4, 95%CI 1.2-24.4, p=0.03). Regarding treatment, 97.9% received steroids; 72% had in addition at least one: methotrexate (75.7%), hydroxychloroquine (64.7%), cyclosporine A (20.6%), IV immunoglobulin (20.6%), azathioprine (10.3%) or cyclophosphamide (9.6%). In this series 24.3% developed calcinosis and mortality rate was 4.2%.ConclusionEvaluation of predefined criteria set for a valid diagnosis indicated aldolase as the most important parameter associated with de, methotrexate combination, was the most indicated treatment.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This article presents a quantitative and objective approach to cat ganglion cell characterization and classification. The combination of several biologically relevant features such as diameter, eccentricity, fractal dimension, influence histogram, influence area, convex hull area, and convex hull diameter are derived from geometrical transforms and then processed by three different clustering methods (Ward's hierarchical scheme, K-means and genetic algorithm), whose results are then combined by a voting strategy. These experiments indicate the superiority of some features and also suggest some possible biological implications.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a method to enhance microcalcifications and classify their borders by applying the wavelet transform. Decomposing an image and removing its low frequency sub-band the microcalcifications are enhanced. Analyzing the effects of perturbations on high frequency subband it's possible to classify its borders as smooth, rugged or undefined. Results show a false positive reduction of 69.27% using a region growing algorithm. © 2008 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we would like to shed light the problem of efficiency and effectiveness of image classification in large datasets. As the amount of data to be processed and further classified has increased in the last years, there is a need for faster and more precise pattern recognition algorithms in order to perform online and offline training and classification procedures. We deal here with the problem of moist area classification in radar image in a fast manner. Experimental results using Optimum-Path Forest and its training set pruning algorithm also provided and discussed. © 2011 IEEE.