577 resultados para biostatistics


Relevância:

10.00% 10.00%

Publicador:

Resumo:

Mendelian models can predict who carries an inherited deleterious mutation of known disease genes based on family history. For example, the BRCAPRO model is commonly used to identify families who carry mutations of BRCA1 and BRCA2, based on familial breast and ovarian cancers. These models incorporate the age of diagnosis of diseases in relatives and current age or age of death. We develop a rigorous foundation for handling multiple diseases with censoring. We prove that any disease unrelated to mutations can be excluded from the model, unless it is sufficiently common and dependent on a mutation-related disease time. Furthermore, if a family member has a disease with higher probability density among mutation carriers, but the model does not account for it, then the carrier probability is deflated. However, even if a family only has diseases the model accounts for, if the model excludes a mutation-related disease, then the carrier probability will be inflated. In light of these results, we extend BRCAPRO to account for surviving all non-breast/ovary cancers as a single outcome. The extension also enables BRCAPRO to extract more useful information from male relatives. Using 1500 familes from the Cancer Genetics Network, accounting for surviving other cancers improves BRCAPRO’s concordance index from 0.758 to 0.762 (p = 0.046), improves its positive predictive value from 35% to 39% (p < 10−6) without impacting its negative predictive value, and improves its overall calibration, although calibration slightly worsens for those with carrier probability < 10%. Copyright c 2000 John Wiley & Sons, Ltd.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The aim of many genetic studies is to locate the genomic regions (called quantitative trait loci, QTLs) that contribute to variation in a quantitative trait (such as body weight). Confidence intervals for the locations of QTLs are particularly important for the design of further experiments to identify the gene or genes responsible for the effect. Likelihood support intervals are the most widely used method to obtain confidence intervals for QTL location, but the non-parametric bootstrap has also been recommended. Through extensive computer simulation, we show that bootstrap confidence intervals are poorly behaved and so should not be used in this context. The profile likelihood (or LOD curve) for QTL location has a tendency to peak at genetic markers, and so the distribution of the maximum likelihood estimate (MLE) of QTL location has the unusual feature of point masses at genetic markers; this contributes to the poor behavior of the bootstrap. Likelihood support intervals and approximate Bayes credible intervals, on the other hand, are shown to behave appropriately.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Microarray technology is a powerful tool able to measure RNA expression for thousands of genes at once. Various studies have been published comparing competing platforms with mixed results: some find agreement, others do not. As the number of researchers starting to use microarrays and the number of crossplatform meta-analysis studies rapidly increase, appropriate platform assessments become more important. Here we present results from a comparison study that offers important improvements over those previously described in the literature. In particular, we notice that none of the previously published papers consider differences between labs. For this paper, a consortium of ten labs from the Washington DC/Baltimore (USA) area was formed to compare three heavily used platforms using identical RNA samples: Appropriate statistical analysis demonstrates that relatively large differences exist between labs using the same platform, but that the results from the best performing labs agree rather well. Supplemental material is available from http://www.biostat.jhsph.edu/~ririzarr/techcomp/

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The affected sib/relative pair (ASP/ARP) design is often used with covariates to find genes that can cause a disease in pathways other than through those covariates. However, such "covariates" can themselves have genetic determinants, and the validity of existing methods has so far only been argued under implicit assumptions. We propose an explicit causal formulation of the problem using potential outcomes and principal stratification. The general role of this formulation is to identify and separate the meaning of the different assumptions that can provide valid causal inference in linkage analysis. This separation helps to (a) develop better methods under explicit assumptions, and (b) show the different ways in which these assumptions can fail, which is necessary for developing further specific designs to test these assumptions and confirm or improve the inference. Using this formulation in the specific problem above, we show that, when the "covariate" (e.g., addiction to smoking) also has genetic determinants, then existing methods, including those previously thought as valid, can declare linkage between the disease and marker loci even when no such linkage exists. We also introduce design strategies to address the problem.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

An important aspect of the QTL mapping problem is the treatment of missing genotype data. If complete genotype data were available, QTL mapping would reduce to the problem of model selection in linear regression. However, in the consideration of loci in the intervals between the available genetic markers, genotype data is inherently missing. Even at the typed genetic markers, genotype data is seldom complete, as a result of failures in the genotyping assays or for the sake of economy (for example, in the case of selective genotyping, where only individuals with extreme phenotypes are genotyped). We discuss the use of algorithms developed for hidden Markov models (HMMs) to deal with the missing genotype data problem.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We derive a new class of iterative schemes for accelerating the convergence of the EM algorithm, by exploiting the connection between fixed point iterations and extrapolation methods. First, we present a general formulation of one-step iterative schemes, which are obtained by cycling with the extrapolation methods. We, then square the one-step schemes to obtain the new class of methods, which we call SQUAREM. Squaring a one-step iterative scheme is simply applying it twice within each cycle of the extrapolation method. Here we focus on the first order or rank-one extrapolation methods for two reasons, (1) simplicity, and (2) computational efficiency. In particular, we study two first order extrapolation methods, the reduced rank extrapolation (RRE1) and minimal polynomial extrapolation (MPE1). The convergence of the new schemes, both one-step and squared, is non-monotonic with respect to the residual norm. The first order one-step and SQUAREM schemes are linearly convergent, like the EM algorithm but they have a faster rate of convergence. We demonstrate, through five different examples, the effectiveness of the first order SQUAREM schemes, SqRRE1 and SqMPE1, in accelerating the EM algorithm. The SQUAREM schemes are also shown to be vastly superior to their one-step counterparts, RRE1 and MPE1, in terms of computational efficiency. The proposed extrapolation schemes can fail due to the numerical problems of stagnation and near breakdown. We have developed a new hybrid iterative scheme that combines the RRE1 and MPE1 schemes in such a manner that it overcomes both stagnation and near breakdown. The squared first order hybrid scheme, SqHyb1, emerges as the iterative scheme of choice based on our numerical experiments. It combines the fast convergence of the SqMPE1, while avoiding near breakdowns, with the stability of SqRRE1, while avoiding stagnations. The SQUAREM methods can be incorporated very easily into an existing EM algorithm. They only require the basic EM step for their implementation and do not require any other auxiliary quantities such as the complete data log likelihood, and its gradient or hessian. They are an attractive option in problems with a very large number of parameters, and in problems where the statistical model is complex, the EM algorithm is slow and each EM step is computationally demanding.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

There are numerous statistical methods for quantitative trait linkage analysis in human studies. An ideal such method would have high power to detect genetic loci contributing to the trait, would be robust to non-normality in the phenotype distribution, would be appropriate for general pedigrees, would allow the incorporation of environmental covariates, and would be appropriate in the presence of selective sampling. We recently described a general framework for quantitative trait linkage analysis, based on generalized estimating equations, for which many current methods are special cases. This procedure is appropriate for general pedigrees and easily accommodates environmental covariates. In this paper, we use computer simulations to investigate the power robustness of a variety of linkage test statistics built upon our general framework. We also propose two novel test statistics that take account of higher moments of the phenotype distribution, in order to accommodate non-normality. These new linkage tests are shown to have high power and to be robust to non-normality. While we have not yet examined the performance of our procedures in the context of selective sampling via computer simulations, the proposed tests satisfy all of the other qualities of an ideal quantitative trait linkage analysis method.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Assessments of environmental and territorial justice are similar in that both assess whether empirical relations between the spatial arrangement of undesirable hazards (or desirable public goods and services) and socio-demographic groups are consistent with notions of social justice, evaluating the spatial distribution of benefits and burdens (outcome equity) and the process that produces observed differences (process equity. Using proximity to major highways in NYC as a case study, we review methodological issues pertinent to both fields and discuss choice and computation of exposure measures, but focus primarily on measures of inequity. We present inequity measures computed from the empirically estimated joint distribution of exposure and demographics and compare them to traditional measures such as linear regression, logistic regression and Theil’s entropy index. We find that measures computed from the full joint distribution provide more unified, transparent and intuitive operational definitions of inequity and show how the approach can be used to structure siting and decommissioning decisions.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Smoothing splines are a popular approach for non-parametric regression problems. We use periodic smoothing splines to fit a periodic signal plus noise model to data for which we assume there are underlying circadian patterns. In the smoothing spline methodology, choosing an appropriate smoothness parameter is an important step in practice. In this paper, we draw a connection between smoothing splines and REACT estimators that provides motivation for the creation of criteria for choosing the smoothness parameter. The new criteria are compared to three existing methods, namely cross-validation, generalized cross-validation, and generalization of maximum likelihood criteria, by a Monte Carlo simulation and by an application to the study of circadian patterns. For most of the situations presented in the simulations, including the practical example, the new criteria out-perform the three existing criteria.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The NMMAPS data package contains daily mortality, air pollution, and weather data originally assembled as part of the National Morbidity,Mortality, and Air Pollution Study (NMMAPS). The data have recently been updated and are available for 108 United States cities for the years 1987--2000. The package provides tools for building versions of the full database in a structured and reproducible manner. These database derivatives may be more suitable for particular analyses. We describe how to use the package to implement a multi-city time series analysis of mortality and PM(10). In addition we demonstrate how to reproduce recent findings based on the NMMAPS data.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We describe a Bayesian method for estimating the number of essential genes in a genome, on the basis of data on viable mutants for which a single transposon was inserted after a random TA site in a genome,potentially disrupting a gene. The prior distribution for the number of essential genes was taken to be uniform. A Gibbs sampler was used to estimate the posterior distribution. The method is illustrated with simulated data. Further simulations were used to study the performance of the procedure.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Recombinant inbred lines (RILs) can serve as powerful tools for genetic mapping. Recently, members of the Complex Trait Consortium have proposed the development of a large panel of eight-way RILs in the mouse, derived from eight genetically diverse parental strains. Such a panel would be a valuable community resource. The use of such eight-way RILs will require a detailed understanding of the relationship between alleles at linked loci on an RI chromosome. We extend the work of Haldane and Waddington (1931) on twoway RILs and describe the map expansion, clustering of breakpoints, and other features of the genomes of multiple-strain RILs as a function of the level of crossover interference in meiosis. In this technical report, we present all of our results, in their gory detail. We don’t intend to include such details in the final publication, but want to present them here for those who might be interested.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

High-throughput SNP arrays provide estimates of genotypes for up to one million loci, often used in genome-wide association studies. While these estimates are typically very accurate, genotyping errors do occur, which can influence in particular the most extreme test statistics and p-values. Estimates for the genotype uncertainties are also available, although typically ignored. In this manuscript, we develop a framework to incorporate these genotype uncertainties in case-control studies for any genetic model. We verify that using the assumption of a “local alternative” in the score test is very reasonable for effect sizes typically seen in SNP association studies, and show that the power of the score test is simply a function of the correlation of the genotype probabilities with the true genotypes. We demonstrate that the power to detect a true association can be substantially increased for difficult to call genotypes, resulting in improved inference in association studies.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Statistical approaches to evaluate higher order SNP-SNP and SNP-environment interactions are critical in genetic association studies, as susceptibility to complex disease is likely to be related to the interaction of multiple SNPs and environmental factors. Logic regression (Kooperberg et al., 2001; Ruczinski et al., 2003) is one such approach, where interactions between SNPs and environmental variables are assessed in a regression framework, and interactions become part of the model search space. In this manuscript we extend the logic regression methodology, originally developed for cohort and case-control studies, for studies of trios with affected probands. Trio logic regression accounts for the linkage disequilibrium (LD) structure in the genotype data, and accommodates missing genotypes via haplotype-based imputation. We also derive an efficient algorithm to simulate case-parent trios where genetic risk is determined via epistatic interactions.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The concordance probability is used to evaluate the discriminatory power and the predictive accuracy of nonlinear statistical models. We derive an analytic expression for the concordance probability in the Cox proportional hazards model. The proposed estimator is a function of the regression parameters and the covariate distribution only and does not use the observed event and censoring times. For this reason it is asymptotically unbiased, unlike Harrell's c-index based on informative pairs. The asymptotic distribution of the concordance probability estimate is derived using U-statistic theory and the methodology is applied to a predictive model in lung cancer.