53 results for statistical classification
in University of Queensland eSpace - Australia
Abstract:
This paper proposes a template for modelling complex datasets that integrates traditional statistical modelling approaches with more recent advances in statistics and modelling through an exploratory framework. Our approach builds on the well-known and long-standing idea of 'good practice in statistics' by establishing a comprehensive framework for modelling that focuses on exploration, prediction, interpretation and reliability assessment, a relatively new idea that allows individual assessment of predictions. The integrated framework we present comprises two stages. The first involves the use of exploratory methods to help visually understand the data and identify a parsimonious set of explanatory variables. The second encompasses a two-step modelling process, where non-parametric methods such as decision trees and generalized additive models are promoted to identify important variables and their modelling relationship with the response before a final predictive model is considered. We focus on fitting the predictive model using parametric, non-parametric and Bayesian approaches. This paper is motivated by a medical problem where interest focuses on developing a risk stratification system for morbidity of 1,710 cardiac patients given a suite of demographic, clinical and preoperative variables. Although the methods we use are applied specifically to this case study, they can be applied across any field, irrespective of the type of response.
Abstract:
Most of the modern developments with classification trees are aimed at improving their predictive capacity. This article considers a curiously neglected aspect of classification trees, namely the reliability of predictions that come from a given classification tree. In the sense that a node of a tree represents a point in the predictor space in the limit, the aim of this article is the development of localized assessment of the reliability of prediction rules. A classification tree may be used either to provide a probability forecast, where for each node the membership probabilities for each class constitute the prediction, or a true classification where each new observation is predictively assigned to a unique class. Correspondingly, two types of reliability measure will be derived, namely prediction reliability and classification reliability. We use bootstrapping methods as the main tool to construct these measures. We also provide a suite of graphical displays by which they may be easily appreciated. In addition to providing some estimate of the reliability of specific forecasts of each type, these measures can also be used to guide future data collection to improve the effectiveness of the tree model. The motivating example we give has a binary response, namely the presence or absence of a species of Eucalypt, Eucalyptus cloeziana, at a given sampling location in response to a suite of environmental covariates (although the methods are not restricted to binary response data).
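As a rough illustration of how bootstrapping can attach a reliability estimate to a single node's probability forecast, the sketch below resamples the observations that fall in one terminal node and reports a percentile interval for the class proportion. This is a simplified assumption, not the article's procedure: it holds the tree fixed and resamples within a node only, and the function name `bootstrap_node_interval` and its parameters are hypothetical.

```python
import random


def bootstrap_node_interval(labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the proportion of 1s in one node.

    labels: list of 0/1 outcomes (e.g. absence/presence) for the
            observations that fall in a single terminal node.
    Returns (observed proportion, lower bound, upper bound).
    Assumption: the tree structure is held fixed; only the node's
    observations are resampled.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    n = len(labels)
    # Resample the node with replacement and record each proportion.
    props = sorted(
        sum(rng.choices(labels, k=n)) / n for _ in range(n_boot)
    )
    lo = props[int((alpha / 2) * n_boot)]
    hi = props[int((1 - alpha / 2) * n_boot) - 1]
    return sum(labels) / n, lo, hi


# Hypothetical node: species present at 30 of 40 sampling locations.
node = [1] * 30 + [0] * 10
p_hat, lower, upper = bootstrap_node_interval(node)
print(f"forecast {p_hat:.2f}, interval ({lower:.2f}, {upper:.2f})")
```

A wide interval flags a node whose forecast rests on few or mixed observations, which is the kind of signal that could guide further data collection.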
Abstract:
Purpose: To evaluate the clinical features, treatment, and outcomes of a cohort of patients with ocular adnexal lymphoproliferative disease classified according to the World Health Organization modification of the Revised European-American Classification of Lymphoid neoplasms and to perform a robust statistical analysis of these data. Methods: Sixty-nine cases of ocular adnexal lymphoproliferative disease, seen in a tertiary referral center from 1992 to 2003, were included in the study. Lesions were classified by using the World Health Organization modification of the Revised European-American Classification of Lymphoid neoplasms. Outcome variables included disease-specific survival, relapse-free survival, local control, and distant control. Results: Stage IV disease at presentation, aggressive lymphoma histology, the presence of prior or concurrent systemic lymphoma at presentation, and bilateral adnexal disease were significant predictors for reduced disease-specific survival, local control, and distant control. Multivariate analysis found that aggressive histology and bilateral adnexal disease were associated with significantly reduced disease-specific survival. Conclusions: The typical presentation of adnexal lymphoproliferative disease is with a painless mass, swelling, or proptosis; however, pain and inflammation occurred in 20% and 30% of patients, respectively. Stage at presentation, tumor histology, primary or secondary status, and whether the process was unilateral or bilateral were significant variables for disease outcome. In this study, distant spread of lymphoma was lower in patients who received greater than 20 Gy of orbital radiotherapy.
Abstract:
Aims: This paper presents the recommendations, developed from a 3-year consultation process, for a program of research to underpin the development of diagnostic concepts and criteria in the Substance Use Disorders section of the Diagnostic and Statistical Manual of Mental Disorders (DSM) and potentially the relevant section of the next revision of the International Classification of Diseases (ICD). Methods: A preliminary list of research topics was developed at the DSM-V Launch Conference in 2004. This led to the presentation of articles on these topics at a specific Substance Use Disorders Conference in February 2005, at the end of which a preliminary list of research questions was developed. This was further refined through an iterative process involving conference participants over the following year. Results: Research questions have been placed into four categories: (1) questions that could be addressed immediately through secondary analyses of existing data sets; (2) items likely to require position papers to propose criteria or more focused questions with a view to subsequent analyses of existing data sets; (3) issues that could be proposed for literature reviews, but with a lower probability that these might progress to a data analytic phase; and (4) suggestions or comments that might not require immediate action, but that could be considered by the DSM-V and ICD-11 revision committees as part of their deliberations. Conclusions: A broadly based research agenda for the development of diagnostic concepts and criteria for substance use disorders is presented.
Abstract:
Aims: This paper describes the background to the establishment of the Substance Use Disorders Workgroup, which was charged with developing the research agenda for the development of the next edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM). It summarizes 18 articles that were commissioned to inform that process. Methods: A preliminary list of research topics, developed at the DSM-V Launch Conference in 2004, led to the identification of topics that were the subject of formal presentations and detailed discussion at the Substance Use Disorders Conference in February 2005. Results: The 18 articles presented in this supplement examine: (1) categorical versus dimensional diagnoses; (2) the neurobiological basis of substance use disorders; (3) social and cultural perspectives; (4) the crosswalk between DSM-IV and the International Classification of Diseases Tenth Revision (ICD-10); (5) comorbidity of substance use disorders and mental health disorders; (6) subtypes of disorders; (7) issues in adolescence; (8) substance-specific criteria; (9) the place of non-substance addictive disorders; and (10) the available research resources. Conclusions: In the final paper a broadly based research agenda for the development of diagnostic concepts and criteria for substance use disorders is presented.
Abstract:
Objective: This paper compares four techniques used to assess change in neuropsychological test scores before and after coronary artery bypass graft surgery (CABG), and includes a rationale for the classification of a patient as overall impaired. Methods: A total of 55 patients were tested before and after surgery on the MicroCog neuropsychological test battery. A matched control group underwent the same testing regime to generate test–retest reliabilities and practice effects. Two techniques designed to assess statistical change were used: the Reliable Change Index (RCI), modified for practice, and the Standardised Regression-based (SRB) technique. These were compared against two fixed cutoff techniques (standard deviation and 20% change methods). Results: The incidence of decline across test scores varied markedly depending on which technique was used to describe change. The SRB method identified more patients as declined on most measures. In comparison, the two fixed cutoff techniques displayed relatively reduced sensitivity in the detection of change. Conclusions: Overall change in an individual can be described provided the investigators choose a rational cutoff based on likely spread of scores due to chance. A cutoff value of ≥20% of test scores used provided acceptable probability based on the number of tests commonly encountered. Investigators must also choose a test battery that minimises shared variance among test scores.
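For readers unfamiliar with the Reliable Change Index, the sketch below shows one commonly used practice-adjusted form: the observed change minus the control group's mean practice effect, divided by the standard error of the difference derived from the test-retest reliability. The abstract does not state the exact formula the authors used, so this form, the function names, and the one-tailed cutoff of -1.645 are assumptions for illustration only.

```python
import math


def reliable_change_index(pre, post, sd_baseline, retest_r, practice_effect=0.0):
    """Practice-adjusted RCI for one patient on one test score.

    sd_baseline:     standard deviation of baseline scores (control group)
    retest_r:        test-retest reliability from the control group
    practice_effect: mean retest gain observed in the control group
    Assumed form: SEM = SD * sqrt(1 - r), SE_diff = sqrt(2) * SEM.
    """
    sem = sd_baseline * math.sqrt(1.0 - retest_r)
    se_diff = math.sqrt(2.0) * sem
    return (post - pre - practice_effect) / se_diff


def has_declined(rci, z_cutoff=-1.645):
    """Flag decline when the RCI falls below a chosen z cutoff (assumed)."""
    return rci < z_cutoff


# Hypothetical patient: score drops from 100 to 90; controls gained 2 points.
rci = reliable_change_index(pre=100, post=90, sd_baseline=10,
                            retest_r=0.8, practice_effect=2)
print(round(rci, 3), has_declined(rci))
```

Unlike a fixed cutoff such as a 20% drop, this index scales the change by how much a score is expected to fluctuate by chance on that particular test.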
Abstract:
We consider the statistical problem of catalogue matching from a machine learning perspective with the goal of producing probabilistic outputs, and using all available information. A framework is provided that unifies two existing approaches to producing probabilistic outputs in the literature, one based on combining distribution estimates and the other based on combining probabilistic classifiers. We apply both of these to the problem of matching the HI Parkes All Sky Survey radio catalogue with large positional uncertainties to the much denser SuperCOSMOS catalogue with much smaller positional uncertainties. We demonstrate the utility of probabilistic outputs by a controllable completeness and efficiency trade-off and by identifying objects that have high probability of being rare. Finally, possible biasing effects in the output of these classifiers are also highlighted and discussed.
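The completeness and efficiency trade-off mentioned above can be illustrated with a small sketch: given a match probability for each candidate pair, raising the acceptance threshold trades completeness (fraction of true matches recovered) for efficiency (fraction of accepted matches that are true). These definitions follow common usage and are assumed here rather than taken from the paper; the function name and the toy data are hypothetical.

```python
def completeness_efficiency(probs, is_true_match, threshold):
    """Completeness and efficiency of matches accepted at a threshold.

    probs:         match probability for each candidate pair
    is_true_match: 1 if the pair is a genuine match, else 0
    threshold:     accept a pair when its probability is >= threshold
    """
    accepted = [t for p, t in zip(probs, is_true_match) if p >= threshold]
    n_true = sum(is_true_match)
    true_accepted = sum(accepted)
    completeness = true_accepted / n_true if n_true else 0.0
    # With nothing accepted there are no false matches; report 1.0.
    efficiency = true_accepted / len(accepted) if accepted else 1.0
    return completeness, efficiency


# Hypothetical classifier output for five candidate pairs.
probs = [0.95, 0.80, 0.60, 0.40, 0.20]
truth = [1, 1, 0, 1, 0]
for thr in (0.5, 0.9):
    print(thr, completeness_efficiency(probs, truth, thr))
```

Because the classifier outputs probabilities rather than hard labels, the threshold can be chosen after the fact to suit whichever of the two quantities matters more for a given analysis.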
Abstract:
Traditional vegetation mapping methods use high cost, labour-intensive aerial photography interpretation. This approach can be subjective and is limited by factors such as the extent of remnant vegetation, and the differing scale and quality of aerial photography over time. An alternative approach is proposed which integrates a data model, a statistical model and an ecological model using sophisticated Geographic Information Systems (GIS) techniques and rule-based systems to support fine-scale vegetation community modelling. This approach is based on a more realistic representation of vegetation patterns with transitional gradients from one vegetation community to another. Arbitrary, though often unrealistic, sharp boundaries can be imposed on the model by the application of statistical methods. This GIS-integrated multivariate approach is applied to the problem of vegetation mapping in the complex vegetation communities of the Innisfail Lowlands in the Wet Tropics bioregion of Northeastern Australia. The paper presents the full cycle of this vegetation modelling approach including sampling sites, variable selection, model selection, model implementation, internal model assessment, model prediction assessments, integration of discrete vegetation community models to generate a composite pre-clearing vegetation map, model validation against an independent data set, and assessment of the scale of model predictions. An accurate pre-clearing vegetation map of the Innisfail Lowlands was generated (r² = 0.83) through GIS integration of 28 separate statistical models. This modelling approach has good potential for wider application, including provision of vital information for conservation planning and management; a scientific basis for rehabilitation of disturbed and cleared areas; and a viable method for the production of adequate vegetation maps for conservation and forestry planning of poorly-studied areas. (c) 2006 Elsevier B.V. All rights reserved.
Abstract:
Ecological regions are increasingly used as a spatial unit for planning and environmental management. It is important to define these regions in a scientifically defensible way to justify any decisions made on the basis that they are representative of broad environmental assets. The paper describes a methodology and tool to identify cohesive bioregions. The methodology applies an elicitation process to obtain geographical descriptions for bioregions; each of these is transformed into a Normal density estimate on environmental variables within that region. This prior information is balanced with data classification of environmental datasets using a Bayesian statistical modelling approach to objectively map ecological regions. The method is called model-based clustering as it fits a Normal mixture model to the clusters associated with regions, and it addresses issues of uncertainty in environmental datasets due to overlapping clusters.
Abstract:
Developing a unified classification system to replace four of the systems currently used in disability athletics (i.e., track and field) has been widely advocated. The diverse impairments to be included in a unified system require several assessment methods, results of which cannot be meaningfully compared. Therefore, the taxonomic basis of current classification systems is invalid in a unified system. Biomechanical analysis establishes that force, a vector described in terms of magnitude and direction, is a key determinant of success in all athletic disciplines. It is posited that all impairments to be included in a unified system may be classified as either force magnitude impairments (FMI) or force control impairments (FCI). This framework would provide a valid taxonomic basis for a unified system, creating the opportunity to decrease the number of classes and enhance the viability of disability athletics.
Abstract:
Three main models of parameter setting have been proposed: the Variational model proposed by Yang (2002; 2004), the Structured Acquisition model endorsed by Baker (2001; 2005), and the Very Early Parameter Setting (VEPS) model advanced by Wexler (1998). The VEPS model contends that parameters are set early. The Variational model supposes that children employ statistical learning mechanisms to decide among competing parameter values, so this model anticipates delays in parameter setting when critical input is sparse, and gradual setting of parameters. On the Structured Acquisition model, delays occur because parameters form a hierarchy, with higher-level parameters set before lower-level parameters. Assuming that children freely choose the initial value, children will sometimes mis-set parameters. However, when that happens, the input is expected to trigger a precipitous rise in one parameter value and a corresponding decline in the other value. We will point to the kind of child language data that is needed in order to adjudicate among these competing models.