974 resultados para Classification Tree Pruning
Resumo:
We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness in two magnitude intervals: 14 <= r <= 21 (85.2%) and r >= 19 (82.1%). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT, and Ball et al. We find that our FT classifier is comparable to or better in completeness over the full magnitude range 15 <= r <= 21, with much lower contamination than all but the Ball et al. classifier. At the faintest magnitudes (r > 19), our classifier is the only one that maintains high completeness (> 80%) while simultaneously achieving low contamination (similar to 2.5%). We also examine the SDSS parametric classifier (psfMag - modelMag) to see if the dividing line between stars and galaxies can be adjusted to improve the classifier. We find that currently stars in close pairs are often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 <= r <= 21.
Resumo:
PURPOSE: Fatty liver disease (FLD) is an increasing prevalent disease that can be reversed if detected early. Ultrasound is the safest and ubiquitous method for identifying FLD. Since expert sonographers are required to accurately interpret the liver ultrasound images, lack of the same will result in interobserver variability. For more objective interpretation, high accuracy, and quick second opinions, computer aided diagnostic (CAD) techniques may be exploited. The purpose of this work is to develop one such CAD technique for accurate classification of normal livers and abnormal livers affected by FLD. METHODS: In this paper, the authors present a CAD technique (called Symtosis) that uses a novel combination of significant features based on the texture, wavelet transform, and higher order spectra of the liver ultrasound images in various supervised learning-based classifiers in order to determine parameters that classify normal and FLD-affected abnormal livers. RESULTS: On evaluating the proposed technique on a database of 58 abnormal and 42 normal liver ultrasound images, the authors were able to achieve a high classification accuracy of 93.3% using the decision tree classifier. CONCLUSIONS: This high accuracy added to the completely automated classification procedure makes the authors' proposed technique highly suitable for clinical deployment and usage.
Resumo:
In this work the identification and diagnosis of various stages of chronic liver disease is addressed. The classification results of a support vector machine, a decision tree and a k-nearest neighbor classifier are compared. Ultrasound image intensity and textural features are jointly used with clinical and laboratorial data in the staging process. The classifiers training is performed by using a population of 97 patients at six different stages of chronic liver disease and a leave-one-out cross-validation strategy. The best results are obtained using the support vector machine with a radial-basis kernel, with 73.20% of overall accuracy. The good performance of the method is a promising indicator that it can be used, in a non invasive way, to provide reliable information about the chronic liver disease staging.
Resumo:
Radio link quality estimation is essential for protocols and mechanisms such as routing, mobility management and localization, particularly for low-power wireless networks such as wireless sensor networks. Commodity Link Quality Estimators (LQEs), e.g. PRR, RNP, ETX, four-bit and RSSI, can only provide a partial characterization of links as they ignore several link properties such as channel quality and stability. In this paper, we propose F-LQE (Fuzzy Link Quality Estimator, a holistic metric that estimates link quality on the basis of four link quality properties—packet delivery, asymmetry, stability, and channel quality—that are expressed and combined using Fuzzy Logic. We demonstrate through an extensive experimental analysis that F-LQE is more reliable than existing estimators (e.g., PRR, WMEWMA, ETX, RNP, and four-bit) as it provides a finer grain link classification. It is also more stable as it has lower coefficient of variation of link estimates. Importantly, we evaluate the impact of F-LQE on the performance of tree routing, specifically the CTP (Collection Tree Protocol). For this purpose, we adapted F-LQE to build a new routing metric for CTP, which we dubbed as F-LQE/RM. Extensive experimental results obtained with state-of-the-art widely used test-beds show that F-LQE/RM improves significantly CTP routing performance over four-bit (the default LQE of CTP) and ETX (another popular LQE). F-LQE/RM improves the end-to-end packet delivery by up to 16%, reduces the number of packet retransmissions by up to 32%, reduces the Hop count by up to 4%, and improves the topology stability by up to 47%.
Resumo:
Grasslands in semi-arid regions, like Mongolian steppes, are facing desertification and degradation processes, due to climate change. Mongolia’s main economic activity consists on an extensive livestock production and, therefore, it is a concerning matter for the decision makers. Remote sensing and Geographic Information Systems provide the tools for advanced ecosystem management and have been widely used for monitoring and management of pasture resources. This study investigates which is the higher thematic detail that is possible to achieve through remote sensing, to map the steppe vegetation, using medium resolution earth observation imagery in three districts (soums) of Mongolia: Dzag, Buutsagaan and Khureemaral. After considering different thematic levels of detail for classifying the steppe vegetation, the existent pasture types within the steppe were chosen to be mapped. In order to investigate which combination of data sets yields the best results and which classification algorithm is more suitable for incorporating these data sets, a comparison between different classification methods were tested for the study area. Sixteen classifications were performed using different combinations of estimators, Landsat-8 (spectral bands and Landsat-8 NDVI-derived) and geophysical data (elevation, mean annual precipitation and mean annual temperature) using two classification algorithms, maximum likelihood and decision tree. Results showed that the best performing model was the one that incorporated Landsat-8 bands with mean annual precipitation and mean annual temperature (Model 13), using the decision tree. For maximum likelihood, the model that incorporated Landsat-8 bands with mean annual precipitation (Model 5) and the one that incorporated Landsat-8 bands with mean annual precipitation and mean annual temperature (Model 13), achieved the higher accuracies for this algorithm. The decision tree models consistently outperformed the maximum likelihood ones.
Resumo:
Aniba canelilla (H.B.K.) Mez. is a tree species from Amazon that produces essential oil. The oil extraction from its leaves and stems can be an alternative way to avoid the tree cutting for production of essential oil. The aim of this study was to analyse factors that may influence the essential oil production and the biomass of resprouts after pruning the leaves and stems of A. canelilla trees. The tree crowns were pruned in the wet season and after nine months the leaves and stems of the remaining crown and the resprouts were collected, in the dry season. The results showed that the essential oil yield and chemical composition differed among the stems, leaves and resprouts. The stems' essential oil production differed between the seasons and had a higher production in the resprouting stems than the old stems of the remaining crown. The production of essential oil and leaf biomass of resprouts were differently related to the canopy openness, indicating that light increases the production of the essential oil and decreases the biomass of resprouting leaves. This study revealed that plant organs differ in their essential oil production and that the canopy openness must be taken into account when pruning the A. canelilla tree crown in order to achieve higher oil productivity.
Resumo:
Background Individual signs and symptoms are of limited value for the diagnosis of influenza. Objective To develop a decision tree for the diagnosis of influenza based on a classification and regression tree (CART) analysis. Methods Data from two previous similar cohort studies were assembled into a single dataset. The data were randomly divided into a development set (70%) and a validation set (30%). We used CART analysis to develop three models that maximize the number of patients who do not require diagnostic testing prior to treatment decisions. The validation set was used to evaluate overfitting of the model to the training set. Results Model 1 has seven terminal nodes based on temperature, the onset of symptoms and the presence of chills, cough and myalgia. Model 2 was a simpler tree with only two splits based on temperature and the presence of chills. Model 3 was developed with temperature as a dichotomous variable (≥38°C) and had only two splits based on the presence of fever and myalgia. The area under the receiver operating characteristic curves (AUROCC) for the development and validation sets, respectively, were 0.82 and 0.80 for Model 1, 0.75 and 0.76 for Model 2 and 0.76 and 0.77 for Model 3. Model 2 classified 67% of patients in the validation group into a high- or low-risk group compared with only 38% for Model 1 and 54% for Model 3. Conclusions A simple decision tree (Model 2) classified two-thirds of patients as low or high risk and had an AUROCC of 0.76. After further validation in an independent population, this CART model could support clinical decision making regarding influenza, with low-risk patients requiring no further evaluation for influenza and high-risk patients being candidates for empiric symptomatic or drug therapy.
Resumo:
This article reviews current concepts of the biology of Endotrypanum spp. Data summarized here on parasite classification and taxonomic divergence found among these haemoflagellates come from our studies of molecular characterization of Endotrypanum stocks (representing an heterogenous population of reference strains and isolates from the Brazilian Amazon region) and from scientific literature. Using numerical zymotaxonomy we have demonstrated genetic diversity among these parasites. The molecular trees obtained revealed that there are, at least, three groups (distinct species?) of Endotrypanum, which are distributed in Central and South America. In concordance with this classification of the parasites there are further newer molecular data obtained using distinct markers. Moreover, comparative studies (based on the molecular genetics of the organisms) have shown the phylogenetic relationships between some Endotrypanum and related kinetoplastid lineages.
Resumo:
Schistosomiasis mansoni is not just a physical disease, but is related to social and behavioural factors as well. Snails of the Biomphalaria genus are an intermediate host for Schistosoma mansoni and infect humans through water. The objective of this study is to classify the risk of schistosomiasis in the state of Minas Gerais (MG). We focus on socioeconomic and demographic features, basic sanitation features, the presence of accumulated water bodies, dense vegetation in the summer and winter seasons and related terrain characteristics. We draw on the decision tree approach to infection risk modelling and mapping. The model robustness was properly verified. The main variables that were selected by the procedure included the terrain's water accumulation capacity, temperature extremes and the Human Development Index. In addition, the model was used to generate two maps, one that included risk classification for the entire of MG and another that included classification errors. The resulting map was 62.9% accurate.
Resumo:
OBJECTIVE. The main goal of this paper is to obtain a classification model based on feed-forward multilayer perceptrons in order to improve postpartum depression prediction during the 32 weeks after childbirth with a high sensitivity and specificity and to develop a tool to be integrated in a decision support system for clinicians. MATERIALS AND METHODS. Multilayer perceptrons were trained on data from 1397 women who had just given birth, from seven Spanish general hospitals, including clinical, environmental and genetic variables. A prospective cohort study was made just after delivery, at 8 weeks and at 32 weeks after delivery. The models were evaluated with the geometric mean of accuracies using a hold-out strategy. RESULTS. Multilayer perceptrons showed good performance (high sensitivity and specificity) as predictive models for postpartum depression. CONCLUSIONS. The use of these models in a decision support system can be clinically evaluated in future work. The analysis of the models by pruning leads to a qualitative interpretation of the influence of each variable in the interest of clinical protocols.
Resumo:
Aleppo pine (Pinus halepensis Mill.) is a relevant conifer species for studying adaptive responses to drought and fire regimes in the Mediterranean region. In this study, we performed Illumina next-generation sequencing of two phenotypically divergent Aleppo pine accessions with the aims of (i) characterizing the transcriptome through Illumina RNA-Seq on trees phenotypically divergent for adaptive traits linked to fire adaptation and drought, (ii) performing a functional annotation of the assembled transcriptome, (iii) identifying genes with accelerated evolutionary rates, (iv) studying the expression levels of the annotated genes and (v) developing gene-based markers for population genomic and association genetic studies. The assembled transcriptome consisted of 48,629 contigs and covered about 54.6 Mbp. The comparison of Aleppo pine transcripts to Picea sitchensis protein-coding sequences resulted in the detection of 34,014 SNPs across species, with a Ka /Ks average value of 0.216, suggesting that the majority of the assembled genes are under negative selection. Several genes were differentially expressed across the two pine accessions with contrasted phenotypes, including a glutathione-s-transferase, a cellulose synthase and a cobra-like protein. A large number of new markers (3334 amplifiable SSRs and 28,236 SNPs) have been identified which should facilitate future population genomics and association genetics in this species. A 384-SNP Oligo Pool Assay for genotyping with the Illumina VeraCode technology has been designed which showed an high overall SNP conversion rate (76.6%). Our results showed that Illumina next-generation sequencing is a valuable technology to obtain an extensive overview on whole transcriptomes of nonmodel species with large genomes.
Resumo:
We characterize divergence times, intraspecific diversity and distributions for recently recognized lineages within the Hyla arborea species group, based on mitochondrial and nuclear sequences from 160 localities spanning its whole distribution. Lineages of H. arborea, H. orientalis, H. molleri have at least Pliocene age, supporting species level divergence. The genetically uniform Iberian H. molleri, although largely isolated by the Pyrenees, is parapatric to H. arborea, with evidence for successful hybridization in a small Aquitanian corridor (southwestern France), where the distribution also overlaps with H. meridionalis. The genetically uniform H. arborea, spread from Crete to Brittany, exhibits molecular signatures of a postglacial range expansion. It meets different mtDNA clades of H. orientalis in NE-Greece, along the Carpathians, and in Poland along the Vistula River (there including hybridization). The East-European H. orientalis is strongly structured genetically. Five geographic mitochondrial clades are recognized, with a molecular signature of postglacial range expansions for the clade that reached the most northern latitudes. Hybridization with H. savignyi is suggested in southwestern Turkey. Thus, cryptic diversity in these Pliocene Hyla lineages covers three extremes: a genetically poor, quasi-Iberian endemic (H. molleri), a more uniform species distributed from the Balkans to Western Europe (H. arborea), and a well-structured Asia Minor-Eastern European species (H. orientalis).
Resumo:
In patients undergoing non-cardiac surgery, cardiac events are the most common cause of perioperative morbidity and mortality. It is often difficult to choose adequate cardiologic examinations before surgery. This paper, inspired by the guidelines of the European and American societies of cardiology (ESC, AHA, ACC), discusses the place of standard ECG, echocardiography, treadmill or bicycle ergometer and pharmacological stress testing in preoperative evaluations. The role of coronary angiography and prophylactic revascularization will also be discussed. Finally, we provide a decision tree which will be helpful to both general practitioners and specialists.
Resumo:
This paper presents 3-D brain tissue classificationschemes using three recent promising energy minimizationmethods for Markov random fields: graph cuts, loopybelief propagation and tree-reweighted message passing.The classification is performed using the well knownfinite Gaussian mixture Markov Random Field model.Results from the above methods are compared with widelyused iterative conditional modes algorithm. Theevaluation is performed on a dataset containing simulatedT1-weighted MR brain volumes with varying noise andintensity non-uniformities. The comparisons are performedin terms of energies as well as based on ground truthsegmentations, using various quantitative metrics.