12 results for Functional Classification Trees

in University of Queensland eSpace - Australia


Relevance:

100.00%

Publisher:

Abstract:

Most of the modern developments with classification trees are aimed at improving their predictive capacity. This article considers a curiously neglected aspect of classification trees, namely the reliability of predictions that come from a given classification tree. In the sense that a node of a tree represents a point in the predictor space in the limit, the aim of this article is the development of localized assessment of the reliability of prediction rules. A classification tree may be used either to provide a probability forecast, where for each node the membership probabilities for each class constitute the prediction, or a true classification, where each new observation is predictively assigned to a unique class. Correspondingly, two types of reliability measure will be derived, namely prediction reliability and classification reliability. We use bootstrapping methods as the main tool to construct these measures. We also provide a suite of graphical displays by which they may be easily appreciated. In addition to providing some estimate of the reliability of specific forecasts of each type, these measures can also be used to guide future data collection to improve the effectiveness of the tree model. The motivating example we give has a binary response, namely the presence or absence of a species of Eucalypt, Eucalyptus cloeziana, at a given sampling location in response to a suite of environmental covariates (although the methods are not restricted to binary response data).
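The bootstrap idea in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a single predictor, a one-split tree (a "stump"), and invented toy data, and it simply refits the stump on resampled data to see how much the predicted presence probability at one point varies. The names `fit_stump`, `predict` and `bootstrap_prediction_spread` are hypothetical.

```python
import random
import statistics

def fit_stump(xs, ys):
    """Fit a one-split tree; return (threshold, p_left, p_right)."""
    def weighted_gini(group):
        p = sum(group) / len(group)
        return 2 * p * (1 - p) * len(group)

    best = None
    # Candidate thresholds: every distinct value except the largest,
    # so both child nodes are always non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = weighted_gini(left) + weighted_gini(right)
        if best is None or score < best[0]:
            best = (score, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        # Degenerate resample (one distinct x): no split is possible.
        p = sum(ys) / len(ys)
        return xs[0], p, p
    return best[1], best[2], best[3]

def predict(stump, x):
    """Predicted probability of class 1 at x."""
    t, p_left, p_right = stump
    return p_left if x <= t else p_right

def bootstrap_prediction_spread(xs, ys, x_new, n_boot=200, seed=0):
    """Mean and standard deviation of the prediction at x_new over resamples."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        preds.append(predict(fit_stump(bx, by), x_new))
    return statistics.mean(preds), statistics.stdev(preds)

# Toy data (invented): presence (1) or absence (0) along one covariate.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
mean_p, sd_p = bootstrap_prediction_spread(xs, ys, x_new=9)
print(f"predicted probability at x=9: {mean_p:.2f} (bootstrap sd {sd_p:.2f})")
```

A large bootstrap standard deviation at a given point would suggest that the forecast there is unreliable, which is the kind of localized signal the abstract says could guide further data collection.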

Relevance:

100.00%

Publisher:

Abstract:

Risk assessment systems for introduced species are being developed and applied globally, but methods for rigorously evaluating them are still in their infancy. We explore classification and regression tree models as an alternative to the current Australian Weed Risk Assessment system, and demonstrate how the performance of screening tests for unwanted alien species may be quantitatively compared using receiver operating characteristic (ROC) curve analysis. The optimal classification tree model for predicting weediness included just four out of a possible 44 attributes of introduced plants examined, namely: (i) intentional human dispersal of propagules; (ii) evidence of naturalization beyond native range; (iii) evidence of being a weed elsewhere; and (iv) a high level of domestication. Intentional human dispersal of propagules in combination with evidence of naturalization beyond a plant's native range led to the strongest prediction of weediness. A high level of domestication in combination with no evidence of naturalization mitigated the likelihood of an introduced plant becoming a weed resulting from intentional human dispersal of propagules. Unlikely intentional human dispersal of propagules combined with no evidence of being a weed elsewhere led to the lowest predicted probability of weediness. The failure to include intrinsic plant attributes in the model suggests that either these attributes are not useful general predictors of weediness, or data and analysis were inadequate to elucidate the underlying relationship(s). This concurs with the historical pessimism that we will ever be able to accurately predict invasive plants. Given the apparent importance of propagule pressure (the number of individuals of a species released), future attempts at evaluating screening model performance for identifying unwanted plants need to account for propagule pressure when collating and/or analysing datasets.
The classification tree had a cross-validated sensitivity of 93.6% and specificity of 36.7%. Based on the area under the ROC curve, the performance of the classification tree in correctly classifying plants as weeds or non-weeds was slightly inferior (area under ROC curve = 0.83 +/- 0.021 (+/- SE)) to that of the current risk assessment system in use (area under ROC curve = 0.89 +/- 0.018 (+/- SE)), although it requires many fewer questions to be answered.
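For readers unfamiliar with the metrics quoted in this abstract, the following minimal sketch shows how sensitivity, specificity and the area under the ROC curve are commonly computed. The labels and scores are invented toy values, not the study's data, and the function names are hypothetical. The AUC is computed as the fraction of (weed, non-weed) pairs in which the weed receives the higher score, with ties counted as one half.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Classify score >= threshold as positive; return (sensitivity, specificity)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example (invented): 1 = weed, 0 = non-weed, with predicted weediness scores.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
auc = roc_auc(labels, scores)
print(sens, spec, auc)  # 1.0 0.75 0.9375
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes performance across all thresholds, which is why the abstract uses it to compare the two screening systems.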

Relevance:

80.00%

Publisher:

Abstract:

Zinc-finger-containing proteins can be classified into evolutionary and functionally divergent protein families that share one or more domains in which a zinc ion is tetrahedrally coordinated by cysteines and histidines. The zinc finger domain defines one of the largest protein superfamilies in mammalian genomes; 46 different conserved zinc finger domains are listed in InterPro (http://www.ebi.ac.uk/InterPro). Zinc finger proteins can bind to DNA, RNA, other proteins, or lipids as a modular domain in combination with other conserved structures. Owing to this combinatorial diversity, different members of zinc finger superfamilies contribute to many distinct cellular processes, including transcriptional regulation, mRNA stability and processing, and protein turnover. Accordingly, mutations of zinc finger genes lead to aberrations in a broad spectrum of biological processes such as development, differentiation, apoptosis, and immunological responses. This study provides the first comprehensive classification of zinc finger proteins in a mammalian transcriptome. Specific detailed analysis of the SP/Kruppel-like factors and the E3 ubiquitin-ligase RING-H2 families illustrates the importance of such an analysis for a more comprehensive functional classification of large protein families. We describe the characterization of a new family of C2H2 zinc-finger-containing proteins and a new conserved domain characteristic of this family, the identification and characterization of Sp8, a new member of the Sp family of transcriptional regulators, and the identification of five new RING-H2 proteins.

Relevance:

80.00%

Publisher:

Abstract:

In a first step toward understanding the molecular basis of pineapple fruit development, a sequencing project was initiated to survey a range of expressed sequences from green unripe and yellow ripe fruit tissue. A highly abundant metallothionein transcript was identified during library construction, and was estimated to account for up to 50% of all EST library clones. Library clones with metallothionein subtracted were sequenced, and 408 unripe green and 1140 ripe yellow edited EST clone sequences were retrieved. Clone redundancy was high, with the combined 1548 clone sequences clustering into just 634 contigs comprising 191 consensus sequences and 443 singletons. Half of the EST clone sequences clustered within 13.5% and 9.3% of contigs from green unripe and yellow ripe libraries, respectively, indicating that a small subset of genes dominate the majority of the transcriptome. Furthermore, sequence cluster analysis, northern analysis, and functional classification revealed major differences between genes expressed in the unripe green and ripe yellow fruit tissues. Abundant genes identified from the green fruit include a fruit bromelain and a bromelain inhibitor. Abundant genes identified in the yellow fruit library include a MADS box gene, and several genes normally associated with protein synthesis, including homologues of ribosomal L10 and the translation factors SUI1 and eIF5A. Both the green unripe and yellow ripe libraries contained high proportions of clones associated with oxidative stress responses and the detoxification of free radicals.

Relevance:

80.00%

Publisher:

Abstract:

Background. The factors behind the reemergence of severe, invasive group A streptococcal (GAS) diseases are unclear, but it could be caused by altered genetic endowment in these organisms. However, data from previous studies assessing the association between single genetic factors and invasive disease are often conflicting, suggesting that other, as-yet unidentified factors are necessary for the development of this class of disease. Methods. In this study, we used a targeted GAS virulence microarray containing 226 GAS genes to determine the virulence gene repertoires of 68 GAS isolates (42 associated with invasive disease and 28 associated with noninvasive disease) collected in a defined geographic location during a contiguous time period. We then employed 3 advanced machine learning methods (genetic algorithm neural network, support vector machines, and classification trees) to identify genes with an increased association with invasive disease. Results. Virulence gene profiles of individual GAS isolates varied extensively among these geographically and temporally related strains. Using genetic algorithm neural network analysis, we identified 3 genes with a marginal overrepresentation in invasive disease isolates. Significantly, 2 of these genes, ssa and mf4, encoded superantigens but were only present in a restricted set of GAS M-types. The third gene, spa, was found in variable distributions in all M-types in the study. Conclusions. Our comprehensive analysis of GAS virulence profiles provides strong evidence for the incongruent relationships among any of the 226 genes represented on the array and the overall propensity of GAS to cause invasive disease, underscoring the pathogenic complexity of these diseases, as well as the importance of multiple bacterial and/or host factors.

Relevance:

80.00%

Publisher:

Abstract:

An expanding human population and associated demands for goods and services continue to exert an increasing pressure on ecological systems. Although the rate of expansion of agricultural lands has slowed since 1960, rapid deforestation still occurs in many tropical countries, including Colombia. However, the location and extent of deforestation and associated ecological impacts within tropical countries are often not well known. The primary aim of this study was to obtain an understanding of the spatial patterns of forest conversion for agricultural land uses in Colombia. We modeled native forest conversion in Colombia at regional and national levels using logistic regression and classification trees. We investigated the impact of ignoring the regional variability of model parameters, and identified biophysical and socioeconomic factors that best explain the current spatial pattern and inter-regional variation in forest cover. We validated our predictions for the Amazon region using MODIS satellite imagery. The regional-level classification tree that accounted for regional heterogeneity had the greatest discrimination ability. Factors related to accessibility (distance to roads and towns) were related to the presence of forest cover, although this relationship varied regionally. In order to identify areas with a high risk of deforestation, we used predictions from the best model, refined by areas with rural population growth rates of > 2%. We ranked forest ecosystem types in terms of levels of threat of conversion. Our results provide useful inputs to planning for biodiversity conservation in Colombia, by identifying areas and ecosystem types that are vulnerable to deforestation. Several of the predicted deforestation hotspots coincide with areas that are outstanding in terms of biodiversity value.

Relevance:

80.00%

Publisher:

Abstract:

Traditional vegetation mapping methods use high cost, labour-intensive aerial photography interpretation. This approach can be subjective and is limited by factors such as the extent of remnant vegetation, and the differing scale and quality of aerial photography over time. An alternative approach is proposed which integrates a data model, a statistical model and an ecological model using sophisticated Geographic Information Systems (GIS) techniques and rule-based systems to support fine-scale vegetation community modelling. This approach is based on a more realistic representation of vegetation patterns with transitional gradients from one vegetation community to another. Arbitrary, though often unrealistic, sharp boundaries can be imposed on the model by the application of statistical methods. This GIS-integrated multivariate approach is applied to the problem of vegetation mapping in the complex vegetation communities of the Innisfail Lowlands in the Wet Tropics bioregion of Northeastern Australia. The paper presents the full cycle of this vegetation modelling approach including sampling sites, variable selection, model selection, model implementation, internal model assessment, model prediction assessments, integration of discrete vegetation community models to generate a composite pre-clearing vegetation map, independent data set model validation and model prediction's scale assessments. An accurate pre-clearing vegetation map of the Innisfail Lowlands was generated (r² = 0.83) through GIS integration of 28 separate statistical models. This modelling approach has good potential for wider application, including provision of: vital information for conservation planning and management; a scientific basis for rehabilitation of disturbed and cleared areas; a viable method for the production of adequate vegetation maps for conservation and forestry planning of poorly-studied areas. (c) 2006 Elsevier B.V. All rights reserved.

Relevance:

40.00%

Publisher:

Abstract:

Background and Purpose - Although the Australian National Subacute and Nonacute Patient (AN-SNAP) Casemix Classification was implemented in 1998, no research has examined how well it predicts length of stay (LOS), discharge destination, and functional improvement in public hospital stroke rehabilitation units in Australia. Methods - 406 consecutive admissions to 3 stroke rehabilitation units in Queensland, Australia were studied. Sociodemographic, clinical, and functional data were collected. General linear modeling and logistic regression were used to assess the ability of AN-SNAP to predict outcomes. Results - AN-SNAP significantly predicted each outcome. There were clear relationships between the outcomes of longer LOS, poorer functional improvement and discharge into care, and the AN-SNAP classes that reflected poorer functional ability and older age. Other predictors included living situation, acute LOS, comorbidity, and stroke type. Conclusions - AN-SNAP is a consistent predictor of LOS, functional change and discharge destination, and has utility in assisting clinicians to set rehabilitation goals and plan discharge.

Relevance:

30.00%

Publisher:

Abstract:

Methodological criticisms of research undertaken in the area of paediatric burns are widespread. To date, quasi-experimental research designs have most frequently been used to examine the impact of impairments such as scarring and reduced range of motion on functional outcomes. Predominantly, these studies have utilised a narrow definition of functioning (e.g. school attendance) to determine a child's level of participation in activities post-burn injury. Until recently, there had been little attempt to develop and/or test a theoretical model of functional outcome with these children. Using a conceptual model of functional outcome based on the International Classification of Functioning, Disability and Health, this review paper outlines the current state of the research literature and presents explanatory case study methodology as an alternative research design to further advance the study of functional outcome post-burn injury. (C) 2004 Elsevier Ltd and ISBI. All rights reserved.

Relevance:

30.00%

Publisher:

Abstract:

Scorpion toxins are common experimental tools for studies of biochemical and pharmacological properties of ion channels. The number of functionally annotated scorpion toxins is steadily growing, but the number of identified toxin sequences is increasing at a much faster pace. With an estimated 100,000 different variants, bioinformatic analysis of scorpion toxins is becoming a necessary tool for their systematic functional analysis. Here, we report a bioinformatics-driven system involving scorpion toxin structural classification, functional annotation, database technology, sequence comparison, nearest neighbour analysis, and decision rules which produces highly accurate predictions of scorpion toxin functional properties. (c) 2005 Elsevier Inc. All rights reserved.