968 resultados para statistical classification


Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present an approach for assessing the significance of sequence and structure comparisons by using nearly identical statistical formalisms for both sequence and structure. Doing so involves an all-vs.-all comparison of protein domains [taken here from the Structural Classification of Proteins (scop) database] and then fitting a simple distribution function to the observed scores. By using this distribution, we can attach a statistical significance to each comparison score in the form of a P value, the probability that a better score would occur by chance. As expected, we find that the scores for sequence matching follow an extreme-value distribution. The agreement, moreover, between the P values that we derive from this distribution and those reported by standard programs (e.g., blast and fasta validates our approach. Structure comparison scores also follow an extreme-value distribution when the statistics are expressed in terms of a structural alignment score (essentially the sum of reciprocated distances between aligned atoms minus gap penalties). We find that the traditional metric of structural similarity, the rms deviation in atom positions after fitting aligned atoms, follows a different distribution of scores and does not perform as well as the structural alignment score. Comparison of the sequence and structure statistics for pairs of proteins known to be related distantly shows that structural comparison is able to detect approximately twice as many distant relationships as sequence comparison at the same error rate. The comparison also indicates that there are very few pairs with significant similarity in terms of sequence but not structure whereas many pairs have significant similarity in terms of structure but not sequence.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper proposes a new feature representation method based on the construction of a Confidence Matrix (CM). This representation consists of posterior probability values provided by several weak classifiers, each one trained and used in different sets of features from the original sample. The CM allows the final classifier to abstract itself from discovering underlying groups of features. In this work the CM is applied to isolated character image recognition, for which several set of features can be extracted from each sample. Experimentation has shown that the use of CM permits a significant improvement in accuracy in most cases, while the others remain the same. The results were obtained after experimenting with four well-known corpora, using evolved meta-classifiers with the k-Nearest Neighbor rule as a weak classifier and by applying statistical significance tests.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The elemental analysis of Spanish palm dates by inductively coupled plasma atomic emission spectrometry and inductively coupled plasma mass spectrometry is reported for the first time. To complete the information about the mineral composition of the samples, C, H, and N are determined by elemental analysis. Dates from Israel, Tunisia, Saudi Arabia, Algeria and Iran have also been analyzed. The elemental composition have been used in multivariate statistical analysis to discriminate the dates according to its geographical origin. A total of 23 elements (As, Ba, C, Ca, Cd, Co, Cr, Cu, Fe, H, In, K, Li, Mg, Mn, N, Na, Ni, Pb, Se, Sr, V, and Zn) at concentrations from major to ultra-trace levels have been determined in 13 date samples (flesh and seeds). A careful inspection of the results indicate that Spanish samples show higher concentrations of Cd, Co, Cr, and Ni than the remaining ones. Multivariate statistical analysis of the obtained results, both in flesh and seed, indicate that the proposed approach can be successfully applied to discriminate the Spanish date samples from the rest of the samples tested.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Nineteen samples of the Cape Roberts-1 drillcore were taken from Miocene- age deposits, from 90.25 - 146.50 metres below seafloor (mbsf) for thin section and laser grain-size analysis. Using the grain-size distribution, detailed core logging, X-radiography and thin-section analysis of microstructures, coupled with a statistical grouping of the grain-size data, three main styles of gravity-flow sedimentation were revealed. Thin (centimetre-scale) muddy debris-flow deposits are the most common and are possibly tirggered by debris rain-out from sea-ice These deposits are characterised by very poorly sorted, faintly laminated muddy sandstones with coarse granules toward their base. Contacts are gradational to sharp. Variations on this style of mass-wasting deposit are rhythmically stacked sequences of pebbly-coarse sandstones representing successive thin debris-flow events. These suggest very high sedimentation rates on an unstable slope in a shallow-water proximal glacimarine environment. Sandy-silty turbidites appear more common in the lower sections of the core, below approximately 141.00 mbsf, although they occur occasionally with the debris flow deposits The turbidites are characterised by inversely to normally graded, well-laminated siltstones with occasional lonestones, and represent a more distal shallow-water glacimarine environment.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Mode of access: Internet.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Mode of access: Internet.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Most of the modem developments with classification trees are aimed at improving their predictive capacity. This article considers a curiously neglected aspect of classification trees, namely the reliability of predictions that come from a given classification tree. In the sense that a node of a tree represents a point in the predictor space in the limit, the aim of this article is the development of localized assessment of the reliability of prediction rules. A classification tree may be used either to provide a probability forecast, where for each node the membership probabilities for each class constitutes the prediction, or a true classification where each new observation is predictively assigned to a unique class. Correspondingly, two types of reliability measure will be derived-namely, prediction reliability and classification reliability. We use bootstrapping methods as the main tool to construct these measures. We also provide a suite of graphical displays by which they may be easily appreciated. In addition to providing some estimate of the reliability of specific forecasts of each type, these measures can also be used to guide future data collection to improve the effectiveness of the tree model. The motivating example we give has a binary response, namely the presence or absence of a species of Eucalypt, Eucalyptus cloeziana, at a given sampling location in response to a suite of environmental covariates, (although the methods are not restricted to binary response data).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Purpose: To evaluate the clinical features, treatment, and outcomes of a cohort of patients with ocular adnexal lymphoproliferative disease classified according to the World Health Organization modification of the Revised European-American Classification of Lymphoid neoplasms and to perform a robust statistical analysis of these data. Methods: Sixty-nine cases of ocular adnexal lymphoproliferative disease, seen in a tertiary referral center from 1992 to 2003, were included in the study. Lesions were classified by using the World Health Organization modification of the Revised European-American Classification of Lymphoid neoplasms classification. Outcome variables included disease-specific Survival, relapse-free survival, local control, and distant control. Results: Stage IV disease at presentation, aggressive lymphoma histology, the presence of prior or concurrent systemic lymphoma at presentation, and bilateral adnexal disease were significant predictors for reduced disease-specific survival, local control, and distant control. Multivariate analysis found that aggressive histology and bilateral adnexal disease had significantly reduced disease-specific Survival. Conclusions: The typical presentation of adnexal lymphoproliferative disease is with a painless mass, swelling, or proptosis; however, pain and inflammation occurred in 20% and 30% of patients, respectively. Stage at presentation, tumor histology, primary or secondary status, and whether the process was unilateral or bilateral were significant variables for disease outcome. In this study, distant spread of lymphoma was lower in patients who received greater than 20 Gy of orbital radiotherapy.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Aims This paper presents the recommendations, developed from a 3-year consultation process, for a program of research to underpin the development of diagnostic concepts and criteria in the Substance Use Disorders section of the Diagnostic and Statistical Manual of Mental Disorders (DSM) and potentially the relevant section of the next revision of the International Classification of Diseases (ICD). Methods A preliminary list of research topics was developed at the DSM-V Launch Conference in 2004. This led to the presentation of articles on these topics at a specific Substance Use Disorders Conference in February 2005, at the end of which a preliminary list of research questions was developed. This was further refined through an iterative process involving conference participants over the following year. Results Research questions have been placed into four categories: (1) questions that could be addressed immediately through secondary analyses of existing data sets; (2) items likely to require position papers to propose criteria or more focused questions with a view to subsequent analyses of existing data sets; (3) issues that could be proposed for literature reviews, but with a lower probability that these might progress to a data analytic phase; and (4) suggestions or comments that might not require immediate action, but that could be considered by the DSM-V and ICD 11 revision committees as part of their deliberations. Conclusions A broadly based research agenda for the development of diagnostic concepts and criteria for substance use disorders is presented.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Aims paper describes the background to the establishment of the Substance Use Disorders Workgroup, which was charged with developing the research agenda for the development of the next edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM). It summarizes 18 articles that were commissioned to inform that process. Methods A preliminary list of research topics, developed at the DSM-V Launch Conference in 2004, led to the identification of subjects that were subject to formal presentations and detailed discussion at the Substance Use Disorders Conference in February 2005. Results The 18 articles presented in this supplement examine: (1) categorical versus dimensional diagnoses; (2) the neurobiological basis of substance use disorders; (3) social and cultural perspectives; (4) the crosswalk between DSM-IV and the International Classification of Diseases Tenth Revision (ICD-10); (5) comorbidity of substance use disorders and mental health disorders; (6) subtypes of disorders; (7) issues in adolescence; (8) substance-specific criteria; (9) the place of non-substance addictive disorders; and (10) the available research resources. Conclusions In the final paper a broadly based research agenda for the development of diagnostic concepts and criteria for substance use disorders is presented.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Objective: This paper compares four techniques used to assess change in neuropsychological test scores before and after coronary artery bypass graft surgery (CABG), and includes a rationale for the classification of a patient as overall impaired. Methods: A total of 55 patients were tested before and after surgery on the MicroCog neuropsychological test battery. A matched control group underwent the same testing regime to generate test–retest reliabilities and practice effects. Two techniques designed to assess statistical change were used: the Reliable Change Index (RCI), modified for practice, and the Standardised Regression-based (SRB) technique. These were compared against two fixed cutoff techniques (standard deviation and 20% change methods). Results: The incidence of decline across test scores varied markedly depending on which technique was used to describe change. The SRB method identified more patients as declined on most measures. In comparison, the two fixed cutoff techniques displayed relatively reduced sensitivity in the detection of change. Conclusions: Overall change in an individual can be described provided the investigators choose a rational cutoff based on likely spread of scores due to chance. A cutoff value of ≥20% of test scores used provided acceptable probability based on the number of tests commonly encountered. Investigators must also choose a test battery that minimises shared variance among test scores.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We consider the statistical problem of catalogue matching from a machine learning perspective with the goal of producing probabilistic outputs, and using all available information. A framework is provided that unifies two existing approaches to producing probabilistic outputs in the literature, one based on combining distribution estimates and the other based on combining probabilistic classifiers. We apply both of these to the problem of matching the HI Parkes All Sky Survey radio catalogue with large positional uncertainties to the much denser SuperCOSMOS catalogue with much smaller positional uncertainties. We demonstrate the utility of probabilistic outputs by a controllable completeness and efficiency trade-off and by identifying objects that have high probability of being rare. Finally, possible biasing effects in the output of these classifiers are also highlighted and discussed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Traditional vegetation mapping methods use high cost, labour-intensive aerial photography interpretation. This approach can be subjective and is limited by factors such as the extent of remnant vegetation, and the differing scale and quality of aerial photography over time. An alternative approach is proposed which integrates a data model, a statistical model and an ecological model using sophisticated Geographic Information Systems (GIS) techniques and rule-based systems to support fine-scale vegetation community modelling. This approach is based on a more realistic representation of vegetation patterns with transitional gradients from one vegetation community to another. Arbitrary, though often unrealistic, sharp boundaries can be imposed on the model by the application of statistical methods. This GIS-integrated multivariate approach is applied to the problem of vegetation mapping in the complex vegetation communities of the Innisfail Lowlands in the Wet Tropics bioregion of Northeastern Australia. The paper presents the full cycle of this vegetation modelling approach including sampling sites, variable selection, model selection, model implementation, internal model assessment, model prediction assessments, models integration of discrete vegetation community models to generate a composite pre-clearing vegetation map, independent data set model validation and model prediction's scale assessments. An accurate pre-clearing vegetation map of the Innisfail Lowlands was generated (0.83r(2)) through GIS integration of 28 separate statistical models. This modelling approach has good potential for wider application, including provision of. vital information for conservation planning and management; a scientific basis for rehabilitation of disturbed and cleared areas; a viable method for the production of adequate vegetation maps for conservation and forestry planning of poorly-studied areas. (c) 2006 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Ecological regions are increasingly used as a spatial unit for planning and environmental management. It is important to define these regions in a scientifically defensible way to justify any decisions made on the basis that they are representative of broad environmental assets. The paper describes a methodology and tool to identify cohesive bioregions. The methodology applies an elicitation process to obtain geographical descriptions for bioregions, each of these is transformed into a Normal density estimate on environmental variables within that region. This prior information is balanced with data classification of environmental datasets using a Bayesian statistical modelling approach to objectively map ecological regions. The method is called model-based clustering as it fits a Normal mixture model to the clusters associated with regions, and it addresses issues of uncertainty in environmental datasets due to overlapping clusters.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Using methods of Statistical Physics, we investigate the generalization performance of support vector machines (SVMs), which have been recently introduced as a general alternative to neural networks. For nonlinear classification rules, the generalization error saturates on a plateau, when the number of examples is too small to properly estimate the coefficients of the nonlinear part. When trained on simple rules, we find that SVMs overfit only weakly. The performance of SVMs is strongly enhanced, when the distribution of the inputs has a gap in feature space.