995 resultados para Statistical bias
Resumo:
The proportional odds model provides a powerful tool for analysing ordered categorical data and setting sample size, although for many clinical trials its validity is questionable. The purpose of this paper is to present a new class of constrained odds models which includes the proportional odds model. The efficient score and Fisher's information are derived from the profile likelihood for the constrained odds model. These results are new even for the special case of proportional odds where the resulting statistics define the Mann-Whitney test. A strategy is described involving selecting one of these models in advance, requiring assumptions as strong as those underlying proportional odds, but allowing a choice of such models. The accuracy of the new procedure and its power are evaluated.
Resumo:
Proportion estimators are quite frequently used in many application areas. The conventional proportion estimator (number of events divided by sample size) encounters a number of problems when the data are sparse as will be demonstrated in various settings. The problem of estimating its variance when sample sizes become small is rarely addressed in a satisfying framework. Specifically, we have in mind applications like the weighted risk difference in multicenter trials or stratifying risk ratio estimators (to adjust for potential confounders) in epidemiological studies. It is suggested to estimate p using the parametric family (see PDF for character) and p(1 - p) using (see PDF for character), where (see PDF for character). We investigate the estimation problem of choosing c 0 from various perspectives including minimizing the average mean squared error of (see PDF for character), average bias and average mean squared error of (see PDF for character). The optimal value of c for minimizing the average mean squared error of (see PDF for character) is found to be independent of n and equals c = 1. The optimal value of c for minimizing the average mean squared error of (see PDF for character) is found to be dependent of n with limiting value c = 0.833. This might justifiy to use a near-optimal value of c = 1 in practice which also turns out to be beneficial when constructing confidence intervals of the form (see PDF for character).
Resumo:
We consider the case of a multicenter trial in which the center specific sample sizes are potentially small. Under homogeneity, the conventional procedure is to pool information using a weighted estimator where the weights used are inverse estimated center-specific variances. Whereas this procedure is efficient for conventional asymptotics (e. g. center-specific sample sizes become large, number of center fixed), it is commonly believed that the efficiency of this estimator holds true also for meta-analytic asymptotics (e.g. center-specific sample size bounded, potentially small, and number of centers large). In this contribution we demonstrate that this estimator fails to be efficient. In fact, it shows a persistent bias with increasing number of centers showing that it isnot meta-consistent. In addition, we show that the Cochran and Mantel-Haenszel weighted estimators are meta-consistent and, in more generality, provide conditions on the weights such that the associated weighted estimator is meta-consistent.
Resumo:
OBJECTIVES: This contribution provides a unifying concept for meta-analysis integrating the handling of unobserved heterogeneity, study covariates, publication bias and study quality. It is important to consider these issues simultaneously to avoid the occurrence of artifacts, and a method for doing so is suggested here. METHODS: The approach is based upon the meta-likelihood in combination with a general linear nonparametric mixed model, which lays the ground for all inferential conclusions suggested here. RESULTS: The concept is illustrated at hand of a meta-analysis investigating the relationship of hormone replacement therapy and breast cancer. The phenomenon of interest has been investigated in many studies for a considerable time and different results were reported. In 1992 a meta-analysis by Sillero-Arenas et al. concluded a small, but significant overall effect of 1.06 on the relative risk scale. Using the meta-likelihood approach it is demonstrated here that this meta-analysis is due to considerable unobserved heterogeneity. Furthermore, it is shown that new methods are available to model this heterogeneity successfully. It is argued further to include available study covariates to explain this heterogeneity in the meta-analysis at hand. CONCLUSIONS: The topic of HRT and breast cancer has again very recently become an issue of public debate, when results of a large trial investigating the health effects of hormone replacement therapy were published indicating an increased risk for breast cancer (risk ratio of 1.26). Using an adequate regression model in the previously published meta-analysis an adjusted estimate of effect of 1.14 can be given which is considerably higher than the one published in the meta-analysis of Sillero-Arenas et al. In summary, it is hoped that the method suggested here contributes further to a good meta-analytic practice in public health and clinical disciplines.
Resumo:
This paper investigates the applications of capture-recapture methods to human populations. Capture-recapture methods are commonly used in estimating the size of wildlife populations but can also be used in epidemiology and social sciences, for estimating prevalence of a particular disease or the size of the homeless population in a certain area. Here we focus on estimating the prevalence of infectious diseases. Several estimators of population size are considered: the Lincoln-Petersen estimator and its modified version, the Chapman estimator, Chao's lower bound estimator, the Zelterman's estimator, McKendrick's moment estimator and the maximum likelihood estimator. In order to evaluate these estimators, they are applied to real, three-source, capture-recapture data. By conditioning on each of the sources of three source data, we have been able to compare the estimators with the true value that they are estimating. The Chapman and Chao estimators were compared in terms of their relative bias. A variance formula derived through conditioning is suggested for Chao's estimator, and normal 95% confidence intervals are calculated for this and the Chapman estimator. We then compare the coverage of the respective confidence intervals. Furthermore, a simulation study is included to compare Chao's and Chapman's estimator. Results indicate that Chao's estimator is less biased than Chapman's estimator unless both sources are independent. Chao's estimator has also the smaller mean squared error. Finally, the implications and limitations of the above methods are discussed, with suggestions for further development.
Resumo:
BACKGROUND: The widespread occurrence of feminized male fish downstream of some wastewater treatment works has led to substantial interest from ecologists and public health professionals. This concern stems from the view that the effects observed have a parallel in humans, and that both phenomena are caused by exposure to mixtures of contaminants that interfere with reproductive development. The evidence for a "wildlife-human connection" is, however, weak: Testicular dysgenesis syndrome, seen in human males, is most easily reproduced in rodent models by exposure to mixtures of antiandrogenic chemicals. In contrast, the accepted explanation for feminization of wild male fish is that it results mainly from exposure to steroidal estrogens originating primarily from human excretion. OBJECTIVES: We sought to further explore the hypothesis that endocrine disruption in fish is multi-causal, resulting from exposure to mixtures of chemicals with both estrogenic and antiandrogenic properties. METHODS: We used hierarchical generalized linear and generalized additive statistical modeling to explore the associations between modeled concentrations and activities of estrogenic and antiandrogenic chemicals in 30 U.K. rivers and feminized responses seen in wild fish living in these rivers. RESULTS: In addition to the estrogenic substances, antiandrogenic activity was prevalent in almost all treated sewage effluents tested. Further, the results of the modeling demonstrated that feminizing effects in wild fish could be best modeled as a function of their predicted exposure to both anti-androgens and estrogens or to antiandrogens alone. CONCLUSION: The results provide a strong argument for a multicausal etiology of widespread feminization of wild fish in U.K. rivers involving contributions from both steroidal estrogens and xeno-estrogens and from other (as yet unknown) contaminants with antiandrogenic properties. These results may add farther credence to the hypothesis that endocrine-disrupting effects seen in wild fish and in humans are caused by similar combinations of endocrine-disrupting chemical cocktails.
Resumo:
In conventional phylogeographic studies, historical demographic processes are elucidated from the geographical distribution of individuals represented on an inferred gene tree. However, the interpretation of gene trees in this context can be difficult as the same demographic/geographical process can randomly lead to multiple different genealogies. Likewise, the same gene trees can arise under different demographic models. This problem has led to the emergence of many statistical methods for making phylogeographic inferences. A popular phylogeographic approach based on nested clade analysis is challenged by the fact that a certain amount of the interpretation of the data is left to the subjective choices of the user, and it has been argued that the method performs poorly in simulation studies. More rigorous statistical methods based on coalescence theory have been developed. However, these methods may also be challenged by computational problems or poor model choice. In this review, we will describe the development of statistical methods in phylogeographic analysis, and discuss some of the challenges facing these methods.
Resumo:
Genetic data obtained on population samples convey information about their evolutionary history. Inference methods can extract part of this information but they require sophisticated statistical techniques that have been made available to the biologist community (through computer programs) only for simple and standard situations typically involving a small number of samples. We propose here a computer program (DIY ABC) for inference based on approximate Bayesian computation (ABC), in which scenarios can be customized by the user to fit many complex situations involving any number of populations and samples. Such scenarios involve any combination of population divergences, admixtures and population size changes. DIY ABC can be used to compare competing scenarios, estimate parameters for one or more scenarios and compute bias and precision measures for a given scenario and known values of parameters (the current version applies to unlinked microsatellite data). This article describes key methods used in the program and provides its main features. The analysis of one simulated and one real dataset, both with complex evolutionary scenarios, illustrates the main possibilities of DIY ABC.
Resumo:
There is great interest in using amplified fragment length polymorphism (AFLP) markers because they are inexpensive and easy to produce. It is, therefore, possible to generate a large number of markers that have a wide coverage of species genotnes. Several statistical methods have been proposed to study the genetic structure using AFLP's but they assume Hardy-Weinberg equilibrium and do not estimate the inbreeding coefficient, F-IS. A Bayesian method has been proposed by Holsinger and colleagues that relaxes these simplifying assumptions but we have identified two sources of bias that can influence estimates based on these markers: (i) the use of a uniform prior on ancestral allele frequencies and (ii) the ascertainment bias of AFLP markers. We present a new Bayesian method that avoids these biases by using an implementation based on the approximate Bayesian computation (ABC) algorithm. This new method estimates population-specific F-IS and F-ST values and offers users the possibility of taking into account the criteria for selecting the markers that are used in the analyses. The software is available at our web site (http://www-leca.uif-grenoble.fi-/logiciels.htm). Finally, we provide advice on how to avoid the effects of ascertainment bias.
Resumo:
An appropriate model of recent human evolution is not only important to understand our own history, but it is necessary to disentangle the effects of demography and selection on genome diversity. Although most genetic data support the view that our species originated recently in Africa, it is still unclear if it completely replaced former members of the Homo genus, or if some interbreeding occurred during its range expansion. Several scenarios of modern human evolution have been proposed on the basis of molecular and paleontological data, but their likelihood has never been statistically assessed. Using DNA data from 50 nuclear loci sequenced in African, Asian and Native American samples, we show here by extensive simulations that a simple African replacement model with exponential growth has a higher probability (78%) as compared with alternative multiregional evolution or assimilation scenarios. A Bayesian analysis of the data under this best supported model points to an origin of our species approximate to 141 thousand years ago (Kya), an exit out-of-Africa approximate to 51 Kya, and a recent colonization of the Americas approximate to 10.5 Kya. We also find that the African replacement model explains not only the shallow ancestry of mtDNA or Y-chromosomes but also the occurrence of deep lineages at some autosomal loci, which has been formerly interpreted as a sign of interbreeding with Homo erectus.
Resumo:
Background: We report an analysis of a protein network of functionally linked proteins, identified from a phylogenetic statistical analysis of complete eukaryotic genomes. Phylogenetic methods identify pairs of proteins that co-evolve on a phylogenetic tree, and have been shown to have a high probability of correctly identifying known functional links. Results: The eukaryotic correlated evolution network we derive displays the familiar power law scaling of connectivity. We introduce the use of explicit phylogenetic methods to reconstruct the ancestral presence or absence of proteins at the interior nodes of a phylogeny of eukaryote species. We find that the connectivity distribution of proteins at the point they arise on the tree and join the network follows a power law, as does the connectivity distribution of proteins at the time they are lost from the network. Proteins resident in the network acquire connections over time, but we find no evidence that 'preferential attachment' - the phenomenon of newly acquired connections in the network being more likely to be made to proteins with large numbers of connections - influences the network structure. We derive a 'variable rate of attachment' model in which proteins vary in their propensity to form network interactions independently of how many connections they have or of the total number of connections in the network, and show how this model can produce apparent power-law scaling without preferential attachment. Conclusion: A few simple rules can explain the topological structure and evolutionary changes to protein-interaction networks: most change is concentrated in satellite proteins of low connectivity and small phenotypic effect, and proteins differ in their propensity to form attachments. Given these rules of assembly, power law scaled networks naturally emerge from simple principles of selection, yielding protein interaction networks that retain a high-degree of robustness on short time scales and evolvability on longer evolutionary time scales.
Resumo:
A physically motivated statistical model is used to diagnose variability and trends in wintertime ( October - March) Global Precipitation Climatology Project (GPCP) pentad (5-day mean) precipitation. Quasi-geostrophic theory suggests that extratropical precipitation amounts should depend multiplicatively on the pressure gradient, saturation specific humidity, and the meridional temperature gradient. This physical insight has been used to guide the development of a suitable statistical model for precipitation using a mixture of generalized linear models: a logistic model for the binary occurrence of precipitation and a Gamma distribution model for the wet day precipitation amount. The statistical model allows for the investigation of the role of each factor in determining variations and long-term trends. Saturation specific humidity q(s) has a generally negative effect on global precipitation occurrence and with the tropical wet pentad precipitation amount, but has a positive relationship with the pentad precipitation amount at mid- and high latitudes. The North Atlantic Oscillation, a proxy for the meridional temperature gradient, is also found to have a statistically significant positive effect on precipitation over much of the Atlantic region. Residual time trends in wet pentad precipitation are extremely sensitive to the choice of the wet pentad threshold because of increasing trends in low-amplitude precipitation pentads; too low a choice of threshold can lead to a spurious decreasing trend in wet pentad precipitation amounts. However, for not too small thresholds, it is found that the meridional temperature gradient is an important factor for explaining part of the long-term trend in Atlantic precipitation.