980 results for Statistical testing
Abstract:
Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test whether two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et al. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow problem over a related graph, which can be solved using a Ford-Fulkerson algorithm in time polynomial in that number. We apply the test to 10 randomly chosen protein domain families from the seed of the Pfam-A database (high quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes.
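To illustrate the max-flow step named in the abstract above, here is a minimal Python sketch of the Edmonds-Karp variant of Ford-Fulkerson (augmenting paths found by breadth-first search). The function name `max_flow` and the toy graph are illustrative assumptions. The abstract does not describe how the graph is built from the two samples of context trees, so that construction is not attempted here.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: Ford-Fulkerson with shortest (BFS) augmenting paths.

    capacity: dict of dicts, capacity[u][v] = capacity of edge u -> v.
    """
    # Residual graph: forward edges plus reverse edges of capacity 0.
    residual = {source: {}, sink: {}}
    for u, nbrs in capacity.items():
        residual.setdefault(u, {})
        for v, c in nbrs.items():
            residual[u][v] = residual[u].get(v, 0) + c
            residual.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # BFS from the source over edges with remaining capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left

        # Bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push flow along the path and update residual capacities.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

# Toy graph (hypothetical, not derived from any protein data).
toy = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
print(max_flow(toy, "s", "t"))
```

With BFS augmenting paths the running time is polynomial in the numbers of nodes and edges, which is the property the abstract relies on.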
Abstract:
Background: Recent advances in high-throughput technologies have produced a vast number of protein sequences, while the number of high-resolution structures has seen only a limited increase. This has driven the development of many strategies to build protein structures from their sequences, generating a considerable number of alternative models. Selecting the model closest to the native conformation has thus become crucial for structure prediction. Several methods have been developed to score protein models by energies, knowledge-based potentials, and combinations of both. Results: Here, we present and demonstrate a theory to split the knowledge-based potentials into biologically meaningful scoring terms and to combine them in new scores to predict near-native structures. Our strategy circumvents the problem of defining the reference state. In this approach we give the proof for a simple, linear application that can be further improved by optimizing the combination of Zscores. Using the simplest composite score () we obtained predictions similar to state-of-the-art methods. In addition, our approach has the advantage of identifying the most relevant terms involved in the stability of the protein structure. Finally, we also use the composite Zscores to assess the conformation of models and to detect local errors. Conclusion: We have introduced a method to split knowledge-based potentials and to solve the problem of defining a reference state. The new scores have detected near-native structures as accurately as state-of-the-art methods and have been successful in identifying wrongly modeled regions of many near-native conformations.
Abstract:
This paper examines the statistical analysis of social reciprocity, that is, the balance between addressing and receiving behaviour in social interactions. Specifically, it focuses on the measurement of social reciprocity by means of directionality and skew-symmetry statistics at different levels. Two statistics have been used as overall measures of social reciprocity at group level: the directional consistency and the skew-symmetry statistics. Furthermore, the skew-symmetry statistic allows social researchers to obtain complementary information at dyadic and individual levels. However, having computed these measures, social researchers may be interested in testing statistical hypotheses regarding social reciprocity. For this reason, a statistical procedure based on Monte Carlo sampling has been developed to allow social researchers to describe groups and make statistical decisions.
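The two group-level statistics and a Monte Carlo test of the kind described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the directional consistency index is taken as the sum of |x_ij - x_ji| over dyads divided by the sum of (x_ij + x_ji), the skew-symmetry index as tr(K'K)/tr(X'X) with K = (X - X')/2, and the null model (each dyad total split binomially with probability 0.5) is an assumption made for this sketch. The function names and the example matrix are hypothetical.

```python
import random

def directional_consistency(x):
    """DC = sum over dyads of |x_ij - x_ji| / sum over dyads of (x_ij + x_ji)."""
    n = len(x)
    pairs = [(x[i][j], x[j][i]) for i in range(n) for j in range(i + 1, n)]
    return sum(abs(a - b) for a, b in pairs) / sum(a + b for a, b in pairs)

def skew_symmetry(x):
    """Phi = tr(K'K) / tr(X'X) with K = (X - X') / 2; zero diagonal assumed."""
    n = len(x)
    idx = [(i, j) for i in range(n) for j in range(n)]
    num = sum(((x[i][j] - x[j][i]) / 2) ** 2 for i, j in idx)
    den = sum(x[i][j] ** 2 for i, j in idx)
    return num / den

def monte_carlo_p_value(x, statistic, n_sim=2000, seed=0):
    """One-sided p-value under an assumed null: within each dyad, every
    observed interaction is directed i -> j with probability 0.5."""
    rng = random.Random(seed)
    n = len(x)
    observed = statistic(x)
    hits = 0
    for _ in range(n_sim):
        sim = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                total = x[i][j] + x[j][i]
                k = sum(rng.random() < 0.5 for _ in range(total))
                sim[i][j], sim[j][i] = k, total - k
        if statistic(sim) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)

# Hypothetical 3-individual matrix: rows are actors, columns are receivers.
x = [[0, 5, 3], [1, 0, 4], [1, 2, 0]]
print(directional_consistency(x), skew_symmetry(x))
print(monte_carlo_p_value(x, directional_consistency))
```

Both indices equal 0 for a perfectly symmetric matrix; the skew-symmetry index reaches its maximum of 0.5 when all interactions within every dyad run in one direction.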
Abstract:
Many people regard the concept of hypothesis testing as fundamental to inferential statistics. Various schools of thought, in particular frequentist and Bayesian, have promoted radically different solutions for deciding on the plausibility of competing hypotheses. Comprehensive philosophical comparisons of their advantages and drawbacks are widely available and continue to fuel extensive debate in the literature. More recently, a controversial discussion was initiated by the editorial decision of a scientific journal [1] to refuse any paper submitted for publication containing null hypothesis testing procedures. Since the large majority of papers published in forensic journals propose the evaluation of statistical evidence based on so-called p-values, it is of interest to present the discussion of this journal's decision to the forensic science community. This paper aims to provide forensic science researchers with a primer on the main concepts and their implications for making informed methodological choices.
Abstract:
This paper uses Colombian household survey data collected over the period 1984-2005 to estimate Gini coefficients along with their corresponding standard errors. We find a statistically significant increase in wage income inequality following the adoption of the liberalisation measures of the early 1990s, and mixed evidence during the recovery years that followed the economic recession of the late 1990s. We also find that in several cases the observed differences in the Gini coefficients across cities have not been statistically significant.
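As a rough illustration of estimating a Gini coefficient together with a standard error, the following Python sketch uses the rank-based formula for the Gini coefficient and a nonparametric bootstrap for the standard error. The bootstrap is an assumption made for this sketch; the abstract does not say which standard-error estimator the paper uses. The wage figures and function names are hypothetical.

```python
import math
import random
import statistics

def gini(values):
    """Gini coefficient from the rank formula
    G = 2 * sum_i(i * x_(i)) / (n * sum(x)) - (n + 1) / n, ranks i = 1..n."""
    xs = sorted(values)
    n = len(xs)
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * sum(xs)) - (n + 1) / n

def bootstrap_se(values, n_boot=1000, seed=0):
    """Bootstrap standard error: resample with replacement, recompute G."""
    rng = random.Random(seed)
    n = len(values)
    reps = [gini(rng.choices(values, k=n)) for _ in range(n_boot)]
    return statistics.stdev(reps)

def z_difference(g1, se1, g2, se2):
    """z statistic for the difference of two independent Gini estimates."""
    return (g1 - g2) / math.sqrt(se1 ** 2 + se2 ** 2)

# Hypothetical wage samples for two cities (strictly positive values).
city_a = [120, 150, 180, 200, 260, 300, 450, 900]
city_b = [140, 160, 170, 210, 230, 280, 320, 400]
ga, gb = gini(city_a), gini(city_b)
sa, sb = bootstrap_se(city_a), bootstrap_se(city_b)
print(ga, sa, gb, sb, z_difference(ga, sa, gb, sb))
```

A |z| value below roughly 1.96 would indicate that the difference between the two cities is not significant at the 5% level, which is the kind of comparison the abstract reports.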
Abstract:
The question as to whether it is better to diversify a real estate portfolio within a property type across the regions or within a region across the property types is one of continuing interest for academics and practitioners alike. The current study, however, differs from the usual sector/regional analysis in that it takes account of the fact that holdings in the UK real estate market are heavily concentrated in a single region, London. This study is therefore designed to investigate whether a real estate fund manager can obtain a statistically significant improvement in risk/return performance by extending out of a London-based portfolio, firstly into the rest of the South East of England and then into the remainder of the UK, or whether the manager would be better off staying within London and diversifying across the various property types. The results indicate that staying within London and diversifying across the various property types may offer performance comparable with regional diversification, although this conclusion largely depends on the time period and the fund manager’s ability to diversify efficiently.
Abstract:
The use of inter-laboratory test comparisons to determine the performance of individual laboratories for specific tests (or for calibration) [ISO/IEC Guide 43-1, 1997. Proficiency testing by interlaboratory comparisons - Part 1: Development and operation of proficiency testing schemes] is called Proficiency Testing (PT). In this paper we propose the use of the generalized likelihood ratio test to compare the performance of the group of laboratories for specific tests relative to the assigned value, and we illustrate the procedure using actual data from the PT program in the area of volume. The proposed test extends the test criteria in use by making it possible to test for the consistency of the group of laboratories. Moreover, the class of elliptical distributions is considered for the obtained measurements. (C) 2008 Elsevier B.V. All rights reserved.
Abstract:
With the development of genotyping and next-generation sequencing technologies, multi-marker testing in genome-wide association studies and rare variant association studies has become an active research area in statistical genetics. This dissertation presents three methodologies for association studies that exploit different features of genetic data and demonstrates how to use those methods to test genetic association hypotheses. The methods fall into three scenarios: 1) multi-marker testing for strong Linkage Disequilibrium regions, 2) multi-marker testing for family-based association studies, 3) multi-marker testing for rare variant association studies. I also discuss the advantages of these methods and demonstrate their power through simulation studies and applications to real genetic data.
Abstract:
Mode of access: Internet.
Abstract:
"Notes prepared by Ralph J. Brookner."
Abstract:
Mode of access: Internet.
Abstract:
We have undertaken two-dimensional gel electrophoresis proteomic profiling on a series of cell lines with different recombinant antibody production rates. Due to the nature of gel-based experiments not all protein spots are detected across all samples in an experiment, and hence datasets are invariably incomplete. New approaches are therefore required for the analysis of such graduated datasets. We approached this problem in two ways. Firstly, we applied a missing value imputation technique to calculate missing data points. Secondly, we combined a singular value decomposition based hierarchical clustering with the expression variability test to identify protein spots whose expression correlates with increased antibody production. The results have shown that while imputation of missing data was a useful method to improve the statistical analysis of such data sets, this was of limited use in differentiating between the samples investigated, and highlighted a small number of candidate proteins for further investigation. (c) 2006 Elsevier B.V. All rights reserved.
Abstract:
A procedure for calculating the critical level and power of the likelihood ratio test, based on Monte Carlo simulation, is proposed. General principles for building software to implement it are given. Some examples of its application are shown.
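A Monte Carlo procedure of this general kind can be sketched in Python as follows. The model is an assumption chosen for illustration (an exponential sample with null hypothesis rate = `rate0`); the abstract does not specify the models or the software design the authors use, and all function names are hypothetical. The critical value is the empirical (1 - alpha) quantile of the statistic simulated under the null, and the power is the rejection frequency when data are simulated under an alternative.

```python
import math
import random

def lrt_statistic(sample, rate0=1.0):
    """-2 log likelihood ratio for H0: rate = rate0 in an exponential model.

    With r = rate0 * mean(sample), the statistic is 2n (r - 1 - log r)."""
    n = len(sample)
    r = rate0 * sum(sample) / n
    return 2.0 * n * (r - 1.0 - math.log(r))

def simulate_statistics(rate, n, n_sim, rng, rate0=1.0):
    """Simulate n_sim samples of size n at the given rate; return statistics."""
    return [
        lrt_statistic([rng.expovariate(rate) for _ in range(n)], rate0)
        for _ in range(n_sim)
    ]

def critical_value(n, alpha=0.05, n_sim=5000, seed=0, rate0=1.0):
    """Empirical (1 - alpha) quantile of the statistic under H0."""
    rng = random.Random(seed)
    stats = sorted(simulate_statistics(rate0, n, n_sim, rng, rate0))
    return stats[math.ceil((1 - alpha) * n_sim) - 1]

def power(true_rate, n, crit, n_sim=5000, seed=1, rate0=1.0):
    """Fraction of simulated samples at true_rate that exceed crit."""
    rng = random.Random(seed)
    stats = simulate_statistics(true_rate, n, n_sim, rng, rate0)
    return sum(s > crit for s in stats) / n_sim

crit = critical_value(n=20)
print(crit, power(1.0, 20, crit), power(2.0, 20, crit))
```

For large samples the simulated critical value should be close to the asymptotic chi-square quantile with one degree of freedom (about 3.84 at alpha = 0.05); the simulation is mainly useful when the sample is small or the asymptotic approximation is doubtful.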
Abstract:
In this Letter, we propose a new and model-independent cosmological test for the distance-duality (DD) relation, $\eta = D_L(z)(1+z)^{-2}/D_A(z) = 1$, where $D_L$ and $D_A$ are, respectively, the luminosity and angular diameter distances. For $D_L$ we consider two sub-samples of Type Ia supernovae (SNe Ia) taken from Constitution data, whereas $D_A$ distances are provided by two samples of galaxy clusters compiled by De Filippis et al. and Bonamente et al. by combining the Sunyaev-Zeldovich effect and X-ray surface brightness. The SNe Ia redshifts of each sub-sample were carefully chosen to coincide with the ones of the associated galaxy cluster sample ($\Delta z < 0.005$), thereby allowing a direct test of the DD relation. Since for very low redshifts $D_A(z) \approx D_L(z)$, we have tested the DD relation by assuming that $\eta$ is a function of the redshift parameterized by two different expressions: $\eta(z) = 1 + \eta_0 z$ and $\eta(z) = 1 + \eta_0 z/(1+z)$, where $\eta_0$ is a constant parameter quantifying a possible departure from the strict validity of the reciprocity relation ($\eta_0 = 0$). In the best scenario (linear parameterization), we obtain $\eta_0 = -0.28^{+0.44}_{-0.44}$ ($2\sigma$, statistical + systematic errors) for the De Filippis et al. sample (elliptical geometry), a result only marginally compatible with the DD relation. However, for the Bonamente et al. sample (spherical geometry) the constraint is $\eta_0 = -0.42^{+0.34}_{-0.34}$ ($3\sigma$, statistical + systematic errors), which is clearly incompatible with the distance-duality relation.
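The two parameterizations above are linear in the departure parameter, so a weighted least-squares (chi-square) fit has a closed form. The Python sketch below shows that calculation under stated assumptions: independent Gaussian errors on each observed ratio, and a plain chi-square minimization, which may differ from the likelihood analysis actually used in the Letter. The function names and the numbers in the example are hypothetical and are not taken from the supernova or cluster samples.

```python
def eta_observed(d_l, d_a, z):
    """Observed ratio eta = D_L / (D_A * (1 + z)^2) for one matched pair."""
    return d_l / (d_a * (1.0 + z) ** 2)

def fit_eta0(z, eta, sigma, basis=lambda zi: zi):
    """Chi-square fit of eta(z) = 1 + eta0 * basis(z).

    Minimizing sum(((eta_i - 1 - eta0 * b_i) / sigma_i)^2) gives
    eta0 = sum(b_i (eta_i - 1) / sigma_i^2) / sum(b_i^2 / sigma_i^2),
    with 1-sigma uncertainty 1 / sqrt(sum(b_i^2 / sigma_i^2))."""
    b = [basis(zi) for zi in z]
    num = sum(bi * (ei - 1.0) / si ** 2 for bi, ei, si in zip(b, eta, sigma))
    den = sum(bi ** 2 / si ** 2 for bi, si in zip(b, sigma))
    return num / den, den ** -0.5

# Hypothetical matched redshifts, ratios and errors (illustration only).
z = [0.05, 0.15, 0.30, 0.55]
eta = [0.99, 0.95, 0.93, 0.85]
sigma = [0.10, 0.10, 0.12, 0.15]

linear = fit_eta0(z, eta, sigma)                                # 1 + eta0 z
nonlinear = fit_eta0(z, eta, sigma, lambda zi: zi / (1 + zi))   # 1 + eta0 z/(1+z)
print(linear, nonlinear)
```

A fitted value compatible with zero within its uncertainty would be consistent with the DD relation; the intervals quoted in the abstract are such constraints at the stated confidence levels.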
Abstract:
This field study was a combined chemical and biological investigation of the relative effects of using dispersants to treat oil spills impacting mangrove habitats. The aim of the chemistry was to determine whether dispersant affected the short- or long-term composition of a medium range crude oil (Gippsland) stranded in a tropical mangrove environment in Queensland, Australia. Sediment cores from three replicate plots of each treatment (oil only and oil plus dispersant) were analyzed for total hydrocarbons and for individual molecular markers (alkanes, aromatics, triterpanes, and steranes). Sediments were collected at 2 days, then 1, 7, 13 and 22 months post-spill. Over this time, oil in the six treated plots decreased exponentially from 36.6 ± 16.5 to 1.2 ± 0.8 mg/g dry wt. There was no statistical difference in initial oil concentrations, penetration of oil to depth, or in the rates of oil dissipation between oil-only and dispersed-oil plots. At 13 months, alkanes were >50% degraded and aromatics were ~30% degraded, based upon ratios of labile to resistant markers. However, there was no change in the triterpane or sterane biomarker signatures of the retained oil. This is of general forensic interest for pollution events. The predominant removal processes were evaporation (≤27%) and dissolution (≥56%), with a lag-phase of 1 month before the start of significant microbial degradation (≤7%). The most resistant fraction of the oil that remained after 7 months (the higher molecular weight hydrocarbons) correlated with the initial total organic carbon content of the soil. Removal rate in the Queensland mangroves was significantly faster than that observed in the Caribbean and was related to tidal flushing. (C) 1999 Elsevier Science Ltd. All rights reserved.