974 resultados para Classification Tree Pruning


Relevância:

40.00% 40.00%

Publicador:

Resumo:

Prism is a modular classification rule generation method based on the ‘separate and conquer’ approach that is alternative to the rule induction approach using decision trees also known as ‘divide and conquer’. Prism often achieves a similar level of classification accuracy compared with decision trees, but tends to produce a more compact noise tolerant set of classification rules. As with other classification rule generation methods, a principle problem arising with Prism is that of overfitting due to over-specialised rules. In addition, over-specialised rules increase the associated computational complexity. These problems can be solved by pruning methods. For the Prism method, two pruning algorithms have been introduced recently for reducing overfitting of classification rules - J-pruning and Jmax-pruning. Both algorithms are based on the J-measure, an information theoretic means for quantifying the theoretical information content of a rule. Jmax-pruning attempts to exploit the J-measure to its full potential because J-pruning does not actually achieve this and may even lead to underfitting. A series of experiments have proved that Jmax-pruning may outperform J-pruning in reducing overfitting. However, Jmax-pruning is computationally relatively expensive and may also lead to underfitting. This paper reviews the Prism method and the two existing pruning algorithms above. It also proposes a novel pruning algorithm called Jmid-pruning. The latter is based on the J-measure and it reduces overfitting to a similar level as the other two algorithms but is better in avoiding underfitting and unnecessary computational effort. The authors conduct an experimental study on the performance of the Jmid-pruning algorithm in terms of classification accuracy and computational efficiency. The algorithm is also evaluated comparatively with the J-pruning and Jmax-pruning algorithms.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Information was collated on the seed storage behaviour of 67 tree species native to the Amazon rainforest of Brazil; 38 appeared to show orthodox, 23 recalcitrant and six intermediate seed storage behaviour. A double-criteria key based on thousand-seed weight and seed moisture content at shedding to estimate likely seed storage behaviour, developed previously, showed good agreement with the above classifications. The key can aid seed storage behaviour identification considerably.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Parkinson's disease (PD) is a degenerative illness whose cardinal symptoms include rigidity, tremor, and slowness of movement. In addition to its widely recognized effects PD can have a profound effect on speech and voice.The speech symptoms most commonly demonstrated by patients with PD are reduced vocal loudness, monopitch, disruptions of voice quality, and abnormally fast rate of speech. This cluster of speech symptoms is often termed Hypokinetic Dysarthria.The disease can be difficult to diagnose accurately, especially in its early stages, due to this reason, automatic techniques based on Artificial Intelligence should increase the diagnosing accuracy and to help the doctors make better decisions. The aim of the thesis work is to predict the PD based on the audio files collected from various patients.Audio files are preprocessed in order to attain the features.The preprocessed data contains 23 attributes and 195 instances. On an average there are six voice recordings per person, By using data compression technique such as Discrete Cosine Transform (DCT) number of instances can be minimized, after data compression, attribute selection is done using several WEKA build in methods such as ChiSquared, GainRatio, Infogain after identifying the important attributes, we evaluate attributes one by one by using stepwise regression.Based on the selected attributes we process in WEKA by using cost sensitive classifier with various algorithms like MultiPass LVQ, Logistic Model Tree(LMT), K-Star.The classified results shows on an average 80%.By using this features 95% approximate classification of PD is acheived.This shows that using the audio dataset, PD could be predicted with a higher level of accuracy.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

With the service life of water supply network (WSN) growth, the growing phenomenon of aging pipe network has become exceedingly serious. As urban water supply network is hidden underground asset, it is difficult for monitoring staff to make a direct classification towards the faults of pipe network by means of the modern detecting technology. In this paper, based on the basic property data (e.g. diameter, material, pressure, distance to pump, distance to tank, load, etc.) of water supply network, decision tree algorithm (C4.5) has been carried out to classify the specific situation of water supply pipeline. Part of the historical data was used to establish a decision tree classification model, and the remaining historical data was used to validate this established model. Adopting statistical methods were used to access the decision tree model including basic statistical method, Receiver Operating Characteristic (ROC) and Recall-Precision Curves (RPC). These methods has been successfully used to assess the accuracy of this established classification model of water pipe network. The purpose of classification model was to classify the specific condition of water pipe network. It is important to maintain the pipeline according to the classification results including asset unserviceable (AU), near perfect condition (NPC) and serious deterioration (SD). Finally, this research focused on pipe classification which plays a significant role in maintaining water supply networks in the future.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This article deals with classification problems involving unequal probabilities in each class and discusses metrics to systems that use multilayer perceptrons neural networks (MLP) for the task of classifying new patterns. In addition we propose three new pruning methods that were compared to other seven existing methods in the literature for MLP networks. All pruning algorithms presented in this paper have been modified by the authors to do pruning of neurons, in order to produce fully connected MLP networks but being small in its intermediary layer. Experiments were carried out involving the E. coli unbalanced classification problem and ten pruning methods. The proposed methods had obtained good results, actually, better results than another pruning methods previously defined at the MLP neural network area. (C) 2014 Elsevier Ltd. All rights reserved.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Background: Tuberculosis (TB) remains a public health issue worldwide. The lack of specific clinical symptoms to diagnose TB makes the correct decision to admit patients to respiratory isolation a difficult task for the clinician. Isolation of patients without the disease is common and increases health costs. Decision models for the diagnosis of TB in patients attending hospitals can increase the quality of care and decrease costs, without the risk of hospital transmission. We present a predictive model for predicting pulmonary TB in hospitalized patients in a high prevalence area in order to contribute to a more rational use of isolation rooms without increasing the risk of transmission. Methods: Cross sectional study of patients admitted to CFFH from March 2003 to December 2004. A classification and regression tree (CART) model was generated and validated. The area under the ROC curve (AUC), sensitivity, specificity, positive and negative predictive values were used to evaluate the performance of model. Validation of the model was performed with a different sample of patients admitted to the same hospital from January to December 2005. Results: We studied 290 patients admitted with clinical suspicion of TB. Diagnosis was confirmed in 26.5% of them. Pulmonary TB was present in 83.7% of the patients with TB (62.3% with positive sputum smear) and HIV/AIDS was present in 56.9% of patients. The validated CART model showed sensitivity, specificity, positive predictive value and negative predictive value of 60.00%, 76.16%, 33.33%, and 90.55%, respectively. The AUC was 79.70%. Conclusions: The CART model developed for these hospitalized patients with clinical suspicion of TB had fair to good predictive performance for pulmonary TB. The most important variable for prediction of TB diagnosis was chest radiograph results. Prospective validation is still necessary, but our model offer an alternative for decision making in whether to isolate patients with clinical suspicion of TB in tertiary health facilities in countries with limited resources.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Active optical sensing (LIDAR and light curtain transmission) devices mounted on a mobile platform can correctly detect, localize, and classify trees. To conduct an evaluation and comparison of the different sensors, an optical encoder wheel was used for vehicle odometry and provided a measurement of the linear displacement of the prototype vehicle along a row of tree seedlings as a reference for each recorded sensor measurement. The field trials were conducted in a juvenile tree nursery with one-year-old grafted almond trees at Sierra Gold Nurseries, Yuba City, CA, United States. Through these tests and subsequent data processing, each sensor was individually evaluated to characterize their reliability, as well as their advantages and disadvantages for the proposed task. Test results indicated that 95.7% and 99.48% of the trees were successfully detected with the LIDAR and light curtain sensors, respectively. LIDAR correctly classified, between alive or dead tree states at a 93.75% success rate compared to 94.16% for the light curtain sensor. These results can help system designers select the most reliable sensor for the accurate detection and localization of each tree in a nursery, which might allow labor-intensive tasks, such as weeding, to be automated without damaging crops.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Mode of access: Internet.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Retrospective clinical data presents many challenges for data mining and machine learning. The transcription of patient records from paper charts and subsequent manipulation of data often results in high volumes of noise as well as a loss of other important information. In addition, such datasets often fail to represent expert medical knowledge and reasoning in any explicit manner. In this research we describe applying data mining methods to retrospective clinical data to build a prediction model for asthma exacerbation severity for pediatric patients in the emergency department. Difficulties in building such a model forced us to investigate alternative strategies for analyzing and processing retrospective data. This paper describes this process together with an approach to mining retrospective clinical data by incorporating formalized external expert knowledge (secondary knowledge sources) into the classification task. This knowledge is used to partition the data into a number of coherent sets, where each set is explicitly described in terms of the secondary knowledge source. Instances from each set are then classified in a manner appropriate for the characteristics of the particular set. We present our methodology and outline a set of experiential results that demonstrate some advantages and some limitations of our approach. © 2008 Springer-Verlag Berlin Heidelberg.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present a molecular phylogenetic analysis of caenophidian (advanced) snakes using sequences from two mitochondrial genes (12S and 16S rRNA) and one nuclear (c-mos) gene (1681 total base pairs), and with 131 terminal taxa sampled from throughout all major caenophidian lineages but focussing on Neotropical xenodontines. Direct optimization parsimony analysis resulted in a well-resolved phylogenetic tree, which corroborates some clades identified in previous analyses and suggests new hypotheses for the composition and relationships of others. The major salient points of our analysis are: (1) placement of Acrochordus, Xenodermatids, and Pareatids as successive outgroups to all remaining caenophidians (including viperids, elapids, atractaspidids, and all other "colubrid" groups); (2) within the latter group, viperids and homalopsids are sucessive sister clades to all remaining snakes; (3) the following monophyletic clades within crown group caenophidians: Afro-Asian psammophiids (including Mimophis from Madagascar), Elapidae (including hydrophiines but excluding Homoroselaps), Pseudoxyrhophiinae, Colubrinae, Natricinae, Dipsadinae, and Xenodontinae. Homoroselaps is associated with atractaspidids. Our analysis suggests some taxonomic changes within xenodontines, including new taxonomy for Alsophis elegans, Liophis amarali, and further taxonomic changes within Xenodontini and the West Indian radiation of xenodontines. Based on our molecular analysis, we present a revised classification for caenophidians and provide morphological diagnoses for many of the included clades; we also highlight groups where much more work is needed. We name as new two higher taxonomic clades within Caenophidia, one new subfamily within Dipsadidae, and, within Xenodontinae five new tribes, six new genera and two resurrected genera. We synonymize Xenoxybelis and Pseudablabes with Philodryas; Erythrolamprus with Liophis; and Lystrophis and Waglerophis with Xenodon.