Digital Library

578 results for 010402 Biostatistics

Concordance Probability and Discriminatory Power in Proportional Hazards Regression

Relevance:

10.00%

Publisher:

Abstract:

The concordance probability is used to evaluate the discriminatory power and the predictive accuracy of nonlinear statistical models. We derive an analytic expression for the concordance probability in the Cox proportional hazards model. The proposed estimator is a function of the regression parameters and the covariate distribution only and does not use the observed event and censoring times. For this reason it is asymptotically unbiased, unlike Harrell's c-index based on informative pairs. The asymptotic distribution of the concordance probability estimate is derived using U-statistic theory and the methodology is applied to a predictive model in lung cancer.
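As a rough illustration of an estimator that depends only on the regression parameters and the covariates, the sketch below uses the fact that, under proportional hazards, the subject with the larger linear predictor fails first with probability 1 / (1 + exp(-|d|)), where d is the difference in linear predictors. The function name, the argument layout, and the choice to skip tied pairs are assumptions of this sketch, not details taken from the paper.

```python
import math
from itertools import combinations

def concordance_probability(beta, covariates):
    """Model-based concordance under proportional hazards (illustrative sketch)."""
    # Linear predictor beta'x for each subject.
    scores = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    total = 0.0
    pairs = 0
    for s_i, s_j in combinations(scores, 2):
        diff = abs(s_i - s_j)
        if diff == 0.0:
            continue  # assumption of this sketch: pairs tied on the predictor are skipped
        # Probability that the higher-risk subject of the pair fails first.
        total += 1.0 / (1.0 + math.exp(-diff))
        pairs += 1
    return total / pairs if pairs else float("nan")
```

No event or censoring times enter the calculation; only `beta` and the covariate rows are used.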

Power Calculations for Preclinical Studies Using a K-Sample Rank Test and the Lehmann Alternative Hypothesis

Relevance:

10.00%

Publisher:

Abstract:

Power calculations in a small sample comparative study, with a continuous outcome measure, are typically undertaken using the asymptotic distribution of the test statistic. When the sample size is small, this asymptotic result can be a poor approximation. An alternative approach, using a rank based test statistic, is an exact power calculation. When the number of groups is greater than two, the number of calculations required to perform an exact power calculation is prohibitive. To reduce the computational burden, a Monte Carlo resampling procedure is used to approximate the exact power function of a k-sample rank test statistic under the family of Lehmann alternative hypotheses. The motivating example for this approach is the design of animal studies, where the number of animals per group is typically small.
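A Monte Carlo approximation of this kind can be sketched as follows. The sketch assumes the Kruskal-Wallis statistic as the k-sample rank test and Lehmann alternatives of the form G = F^theta; because rank tests are invariant to monotone transformations, F is taken to be Uniform(0, 1). The function names, the default simulation size, and the use of a simulated null critical value are illustrative choices, not the authors' procedure.

```python
import random

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction; continuous data assumed)."""
    pooled = sorted((value, g) for g, grp in enumerate(groups) for value in grp)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    for rank, (_, g) in enumerate(pooled, start=1):
        rank_sums[g] += rank
    h = sum(rs * rs / len(grp) for rs, grp in zip(rank_sums, groups))
    return 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)

def draw_groups(sizes, thetas, rng):
    # If U ~ Uniform(0, 1), then U ** (1 / theta) has CDF u ** theta,
    # which is the Lehmann alternative G = F^theta with F uniform.
    return [[rng.random() ** (1.0 / t) for _ in range(m)]
            for m, t in zip(sizes, thetas)]

def mc_power(sizes, thetas, alpha=0.05, n_sim=2000, seed=0):
    """Approximate power of the rank test by simulation."""
    rng = random.Random(seed)
    # Null distribution of H (all theta equal to 1) gives the critical value.
    null = sorted(kruskal_wallis_h(draw_groups(sizes, [1.0] * len(sizes), rng))
                  for _ in range(n_sim))
    critical = null[min(n_sim - 1, int((1 - alpha) * n_sim))]
    # Power is the fraction of simulated alternative data sets that reject.
    rejections = sum(kruskal_wallis_h(draw_groups(sizes, thetas, rng)) > critical
                     for _ in range(n_sim))
    return rejections / n_sim
```

For example, `mc_power([5, 5, 5], [1.0, 2.0, 4.0])` approximates the power for three groups of five animals under the stated exponents.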

Sequential Quantitative Trait Locus Mapping in Experimental Crosses

Relevance:

10.00%

Publisher:

Abstract:

The etiology of complex diseases is heterogeneous. The presence of risk alleles in one or more genetic loci affects the function of a variety of intermediate biological pathways, resulting in the overt expression of disease. Hence, there is an increasing focus on identifying the genetic basis of disease by systematically studying phenotypic traits pertaining to the underlying biological functions. In this paper we focus on identifying genetic loci linked to quantitative phenotypic traits in experimental crosses. Such genetic mapping methods often use a one-stage design by genotyping all the markers of interest on the available subjects. A genome scan based on single-locus or multi-locus models is used to identify the putative loci. Since the number of quantitative trait loci (QTLs) is very likely to be small relative to the number of markers genotyped, a one-stage selective genotyping approach is commonly used to reduce the genotyping burden, whereby markers are genotyped solely on individuals with extreme trait values. This approach is powerful in the presence of a single quantitative trait locus (QTL) but may result in substantial loss of information in the presence of multiple QTLs. Here we investigate the efficiency of sequential two-stage designs to identify QTLs in experimental populations. Our investigations for backcross and F2 crosses suggest that genotyping all the markers on 60% of the subjects in Stage 1, genotyping the chromosomes significant at the 20% level on additional subjects in Stage 2, and testing using all the subjects provides an efficient approach to identify the QTLs and utilizes only 70% of the genotyping burden relative to a one-stage design, regardless of the heritability and genotyping density. Complex traits are a consequence of multiple QTLs conferring main effects as well as epistatic interactions.
We propose a two-stage analytic approach where a single-locus genome scan is conducted in Stage 1 to identify promising chromosomes, and interactions are examined using the loci on these chromosomes in Stage 2. We examine settings under which the two-stage analytic approach provides sufficient power to detect the putative QTLs.

A Faster Circular Binary Segmentation Algorithm for the Analysis of Array CGH Data

Relevance:

10.00%

Publisher:

Abstract:

Motivation: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number (Olshen et al., 2004). The algorithm tests for change-points using a maximal $t$-statistic with a permutation reference distribution to obtain the corresponding $p$-value. The number of computations required for the maximal test statistic is $O(N^2)$, where $N$ is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers and highlights the need for a faster algorithm. Results: We present a hybrid approach to obtain the $p$-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed. We also present the analysis of array CGH data from a breast cancer cell line to show the impact of the new approaches on the analysis of real data. Availability: An R (R Development Core Team, 2006) version of the CBS algorithm has been implemented in the "DNAcopy" package of the Bioconductor project (Gentleman et al., 2004). The proposed hybrid method for the $p$-value is available in version 1.2.1 or higher and the stopping rule for declaring a change early is available in version 1.5.1 or higher.
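To make the $O(N^2)$ cost of the full permutation approach concrete, the sketch below computes a maximal arc-versus-rest statistic over all pairs of break points and a plain permutation $p$-value. It is a simplified stand-in: the statistic omits the variance estimate of a true $t$-statistic, the function names are hypothetical, and neither the hybrid $p$-value nor the early stopping rule described in the abstract is implemented. It is not the DNAcopy code.

```python
import random
from itertools import accumulate

def max_circular_stat(values):
    """Max over arcs values[i:j] of the standardized arc-vs-rest mean difference."""
    n = len(values)
    prefix = [0.0] + list(accumulate(values))  # prefix[k] = sum of values[:k]
    total = prefix[n]
    best = 0.0
    for i in range(n - 1):          # the double loop is the O(N^2) part
        for j in range(i + 1, n):
            k = j - i               # arc length; the complement has n - k points
            arc_mean = (prefix[j] - prefix[i]) / k
            rest_mean = (total - prefix[j] + prefix[i]) / (n - k)
            z = (arc_mean - rest_mean) / (1.0 / k + 1.0 / (n - k)) ** 0.5
            best = max(best, abs(z))
    return best

def permutation_p_value(values, n_perm=500, seed=0):
    """Permutation reference distribution for the maximal statistic."""
    rng = random.Random(seed)
    observed = max_circular_stat(values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_circular_stat(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Each permutation repeats the quadratic scan, so the total cost grows as the number of permutations times $N^2$, which is the burden the hybrid approach is meant to avoid.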

A Fully Bayesian Approach for Combining Multilevel Failure Information in Fault Tree Quantification and Corresponding Optimal Resource Allocation

Relevance:

10.00%

Publisher:

Abstract:

This paper presents a fully Bayesian approach that simultaneously combines basic event and statistically independent higher event-level failure data in fault tree quantification. Such higher-level data could correspond to train, sub-system or system failure events. The full Bayesian approach also allows the highest-level data that are usually available for existing facilities to be automatically propagated to lower levels. A simple example illustrates the proposed approach. The optimal allocation of resources for collecting additional data from a choice of different level events is also presented. The optimization is achieved using a genetic algorithm.

Classification and selection of biomarkers in genomic data using LASSO

Relevance:

10.00%

Publisher:

Abstract:

High-throughput gene expression technologies such as microarrays have been utilized in a variety of scientific applications. Most of the work has been on assessing univariate associations between gene expression and clinical outcome (variable selection) or on developing classification procedures with gene expression data (supervised learning). We consider a hybrid variable selection/classification approach that is based on linear combinations of the gene expression profiles that maximize an accuracy measure summarized using the receiver operating characteristic curve. Under a specific probability model, this leads to consideration of linear discriminant functions. We incorporate an automated variable selection approach using LASSO. An equivalence between LASSO estimation and support vector machines allows for model fitting using standard software. We apply the proposed method to simulated data as well as data from a recently published prostate cancer study.

Monotone Constrained Tensor-product B-spline with application to screening studies

Relevance:

10.00%

Publisher:

Abstract:

When different markers are responsive to different aspects of a disease, a combination of multiple markers could provide a better screening test for early detection. It is also reasonable to assume that the risk of disease changes smoothly as the biomarker values change and that the change in risk is monotone with respect to each biomarker. In this paper, we propose a boundary constrained tensor-product B-spline method to estimate the risk of disease by maximizing a penalized likelihood. To choose the optimal amount of smoothing, two scores are proposed which are extensions of the GCV score (O'Sullivan et al. (1986)) and the GACV score (Xiang and Wahba (1996)) to incorporate linear constraints. Simulation studies are carried out to investigate the performance of the proposed estimator and the selection scores. In addition, sensitivities and specificities based on approximate leave-one-out estimates are proposed to generate more realistic ROC curves. Data from a pancreatic cancer study are used for illustration.

Bayes Factors Based on Test Statistics

Relevance:

10.00%

Publisher:

Abstract:

Traditionally, the use of Bayes factors has required the specification of proper prior distributions on model parameters implicit to both null and alternative hypotheses. In this paper, I describe an approach to defining Bayes factors based on modeling test statistics. Because the distributions of test statistics do not depend on unknown model parameters, this approach eliminates the subjectivity normally associated with the definition of Bayes factors. For standard test statistics, including the χ², F, t and z statistics, the values of Bayes factors that result from this approach can be simply expressed in closed form.

Quantifying and Comparing the Accuracy of Binary Biomarkers When Predicting a Failure Time Outcome

Relevance:

10.00%

Publisher:

Abstract:

The positive and negative predictive value are standard measures used to quantify the predictive accuracy of binary biomarkers when the outcome being predicted is also binary. When the biomarkers are instead being used to predict a failure time outcome, there is no standard way of quantifying predictive accuracy. We propose a natural extension of the traditional predictive values to accommodate censored survival data. We discuss not only quantifying predictive accuracy using these extended predictive values, but also rigorously comparing the accuracy of two biomarkers in terms of their predictive values. Using a marginal regression framework, we describe how to estimate differences in predictive accuracy and how to test whether the observed difference is statistically significant.

Performance of the Halex in Longitudinal Studies of Older Adults

Relevance:

10.00%

Publisher:

Abstract:

Goal: The Halex is an indicator of health status that combines self-rated health and activity limitations, which has been used by NCHS to predict future years of healthy life. The scores for each health state were developed based on strong assumptions, notably that a person in excellent health with ADL disabilities is as healthy as a person in poor health with no disabilities. Our goal was to examine the performance of the Halex as a longitudinal measure of health for older adults, and to improve the scoring if necessary. Methods: We used data from the Cardiovascular Health Study (CHS) to compare the relationship of baseline health to health 2 years later. Subject ages ranged from 65 to 103 (mean age 75). A total of 40,827 transitions were available for analysis. We examined whether Halex scores at time 0 were related monotonically to scores two years later, and iterated the original scores to improve the fit over time. Findings: The original Halex scores were not consistent over time. Persons in excellent health with ADL limitations were much healthier 2 years later than people in poor health with no limitations, even though they had been assumed to have identical health. People with ADL limitations had higher scores than predicted. The assumptions made in creating the Halex were not upheld in the data. Conclusions: The new iterated scores are specific to older adults, are appropriate for longitudinal data, and are relatively assumption-free. We recommend the use of these new scores for longitudinal studies of older adults that use the Halex health states.

Tests for Comparing Mark-Specific Hazards and Cumulative Incidence Functions

Relevance:

10.00%

Publisher:

Abstract:

It is of interest in some applications to determine whether there is a relationship between a hazard rate function (or a cumulative incidence function) and a mark variable which is only observed at uncensored failure times. We develop nonparametric tests for this problem when the mark variable is continuous. Tests are developed for the null hypothesis that the mark-specific hazard rate is independent of the mark versus ordered and two-sided alternatives expressed in terms of mark-specific hazard functions and mark-specific cumulative incidence functions. The test statistics are based on functionals of a bivariate test process equal to a weighted average of differences between a Nelson-Aalen-type estimator of the mark-specific cumulative hazard function and a nonparametric estimator of this function under the null hypothesis. The weight function in the test process can be chosen so that the test statistics are asymptotically distribution-free. Asymptotically correct critical values are obtained through a simple simulation procedure. The testing procedures are shown to perform well in numerical studies, and are illustrated with an AIDS clinical trial example. Specifically, the tests are used to assess if the instantaneous or absolute risk of treatment failure depends on the amount of accumulation of drug resistance mutations in a subject's HIV virus. This assessment helps guide development of anti-HIV therapies that surmount the problem of drug resistance.

Whither PQL?

Relevance:

10.00%

Publisher:

Abstract:

Generalized linear mixed models (GLMM) are generalized linear models with normally distributed random effects in the linear predictor. Penalized quasi-likelihood (PQL), an approximate method of inference in GLMMs, involves repeated fitting of linear mixed models with “working” dependent variables and iterative weights that depend on parameter estimates from the previous cycle of iteration. The generality of PQL, and its implementation in commercially available software, has encouraged the application of GLMMs in many scientific fields. Caution is needed, however, since PQL may sometimes yield badly biased estimates of variance components, especially with binary outcomes. Recent developments in numerical integration, including adaptive Gaussian quadrature, higher order Laplace expansions, stochastic integration and Markov chain Monte Carlo (MCMC) algorithms, provide attractive alternatives to PQL for approximate likelihood inference in GLMMs. Analyses of some well known datasets, and simulations based on these analyses, suggest that PQL still performs remarkably well in comparison with more elaborate procedures in many practical situations. Adaptive Gaussian quadrature is a viable alternative for nested designs where the numerical integration is limited to a small number of dimensions. Higher order Laplace approximations hold the promise of accurate inference more generally. MCMC is likely the method of choice for the most complex problems that involve high dimensional integrals.

Modeling Breastmilk Infectivity in HIV-1 Infected Mothers

Relevance:

10.00%

Publisher:

Abstract:

Estimation of breastmilk infectivity in HIV-1 infected mothers is difficult because transmission can occur while the fetus is in-utero, during delivery, or through breastfeeding. Since transmission can only be detected through periodic testing, however, it may be impossible to determine the actual mode of transmission in any individual child. In this paper we develop a model to estimate breastmilk infectivity as well as the probabilities of in-utero and intrapartum transmission. In addition, the model allows separate estimation of early and late breastmilk infectivity and individual variation in maternal infectivity. Methods for hypothesis testing of binary risk factors and a method for assessing goodness of fit are also described. Data from a randomized trial of breastfeeding versus formula feeding among HIV-1 infected mothers in Nairobi, Kenya are used to illustrate the methods.

Selection of Matching Variables in Community Health Intervention Trials

Relevance:

10.00%

Publisher:

Abstract:

In a matched experimental design, the effectiveness of matching in reducing bias and increasing power depends on the strength of the association between the matching variable and the outcome of interest. In particular, in the design of a community health intervention trial, the effectiveness of a matched design, where communities are matched according to some community characteristic, depends on the strength of the correlation between the matching characteristic and the change in the health behavior being measured. We attempt to estimate the correlation between community characteristics and changes in health behaviors in four datasets from community intervention trials and observational studies. Community characteristics that are highly correlated with changes in health behaviors would potentially be effective matching variables in studies of health intervention programs designed to change those behaviors. Among the community characteristics considered, the urban-rural character of the community was the most highly correlated with changes in health behaviors. The correlations between Per Capita Income, Percent Low Income & Percent aged over 65 and changes in health behaviors were marginally statistically significant (p < 0.08).

Large Sample Theory for Semiparametric Regression Models with Two-Phase, Outcome Dependent Sampling

Relevance:

10.00%

Publisher:

Abstract:

Outcome-dependent, two-phase sampling designs can dramatically reduce the costs of observational studies by judicious selection of the most informative subjects for purposes of detailed covariate measurement. Here we derive asymptotic information bounds and the form of the efficient score and influence functions for the semiparametric regression models studied by Lawless, Kalbfleisch, and Wild (1999) under two-phase sampling designs. We show that the maximum likelihood estimators for both the parametric and nonparametric parts of the model are asymptotically normal and efficient. The efficient influence function for the parametric part agrees with the more general information bound calculations of Robins, Hsieh, and Newey (1995). By verifying the conditions of Murphy and Van der Vaart (2000) for a least favorable parametric submodel, we provide asymptotic justification for statistical inference based on profile likelihood.
