28 resultados para Simulation analysis


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Currently there is no general method to study the impact of population admixture within families on the assumptions of random mating and consequently, Hardy-Weinberg equilibrium (HWE) and linkage equilibrium (LE) and on the inference obtained from traditional linkage analysis. ^ First, through simulation, the effect of admixture of two populations on the log of the odds (LOD) score was assessed, using Prostate Cancer as the typical disease model. Comparisons between simulated mixed and homogeneous families were performed. LOD scores under both models of admixture (within families and within a data set of homogeneous families) were closest to the homogeneous family scores of the population having the highest mixing proportion. Random sampling of families or ascertainment of families with disease affection status did not affect this observation, nor did the mode of inheritance (dominant/recessive) or sample size. ^ Second, after establishing the effect of admixture on the LOD score and inference for linkage, the presence of induced disequilibria by population admixture within families was studied and an adjustment procedure was developed. The adjustment did not force all disequilibria to disappear but because the families were adjusted for the population admixture, those replicates where the disequilibria exist are no longer affected by the disequilibria in terms of maximization for linkage. Furthermore, the adjustment was able to exclude uninformative families or families that had such a high departure from HWE and/or LE that their LOD scores were not reliable. ^ Together these observations imply that the presence of families of mixed population ancestry impacts linkage analysis in terms of the LOD score and the estimate of the recombination fraction. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Do siblings of centenarians tend to have longer life spans? To answer this question, life spans of 184 siblings for 42 centenarians have been evaluated. Two important questions have been addressed in analyzing the sibling data. First, a standard needs to be established, to which the life spans of 184 siblings are compared. In this report, an external reference population is constructed from the U.S. life tables. Its estimated mortality rates are treated as baseline hazards from which the relative mortality of the siblings are estimated. Second, the standard survival models which assume independent observations are invalid when correlation within family exists, underestimating the true variance. Methods that allow correlations are illustrated by three different methods. First, the cumulative relative excess mortality between siblings and their comparison group is calculated and used as an effective graphic tool, along with the Product Limit estimator of the survival function. The variance estimator of the cumulative relative excess mortality is adjusted for the potential within family correlation using Taylor linearization approach. Second, approaches that adjust for the inflated variance are examined. They are adjusted one-sample log-rank test using design effect originally proposed by Rao and Scott in the correlated binomial or Poisson distribution setting and the robust variance estimator derived from the log-likelihood function of a multiplicative model. Nether of these two approaches provide correlation estimate within families, but the comparison with the comparison with the standard remains valid under dependence. Last, using the frailty model concept, the multiplicative model, where the baseline hazards are known, is extended by adding a random frailty term that is based on the positive stable or the gamma distribution. Comparisons between the two frailty distributions are performed by simulation. Based on the results from various approaches, it is concluded that the siblings of centenarians had significant lower mortality rates as compared to their cohorts. The frailty models also indicate significant correlations between the life spans of the siblings. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The discoveries of the BRCA1 and BRCA2 genes have made it possible for women of families with hereditary breast/ovarian cancer to determine if they carry cancer-predisposing genetic mutations. Women with germline mutations have significantly higher probabilities of developing both cancers than the general population. Since the presence of a BRCA1 or BRCA2 mutation does not guarantee future cancer development, the appropriate course of action remains uncertain for these women. Prophylactic mastectomy and oophorectomy remain controversial since the underlying premise for surgical intervention is based more upon reduction in the estimated risk of cancer than on actual evidence of clinical benefit. Issues that are incorporated in a woman's decision making process include quality of life without breasts, ovaries, attitudes toward possible surgical morbidity as well as a remaining risk of future development of breast/ovarian cancer despite prophylactic surgery. The incorporation of patient preferences into decision analysis models can determine the quality-adjusted survival of different prophylactic approaches to breast/ovarian cancer prevention. Monte Carlo simulation was conducted on 4 separate decision models representing prophylactic oophorectomy, prophylactic mastectomy, prophylactic oophorectomy/mastectomy and screening. The use of 3 separate preference assessment methods across different populations of women allows researchers to determine how quality adjusted survival varies according to clinical strategy, method of preference assessment and the population from which preferences are assessed. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Improvements in the analysis of microarray images are critical for accurately quantifying gene expression levels. The acquisition of accurate spot intensities directly influences the results and interpretation of statistical analyses. This dissertation discusses the implementation of a novel approach to the analysis of cDNA microarray images. We use a stellar photometric model, the Moffat function, to quantify microarray spots from nylon microarray images. The inherent flexibility of the Moffat shape model makes it ideal for quantifying microarray spots. We apply our novel approach to a Wilms' tumor microarray study and compare our results with a fixed-circle segmentation approach for spot quantification. Our results suggest that different spot feature extraction methods can have an impact on the ability of statistical methods to identify differentially expressed genes. We also used the Moffat function to simulate a series of microarray images under various experimental conditions. These simulations were used to validate the performance of various statistical methods for identifying differentially expressed genes. Our simulation results indicate that tests taking into account the dependency between mean spot intensity and variance estimation, such as the smoothened t-test, can better identify differentially expressed genes, especially when the number of replicates and mean fold change are low. The analysis of the simulations also showed that overall, a rank sum test (Mann-Whitney) performed well at identifying differentially expressed genes. Previous work has suggested the strengths of nonparametric approaches for identifying differentially expressed genes. We also show that multivariate approaches, such as hierarchical and k-means cluster analysis along with principal components analysis, are only effective at classifying samples when replicate numbers and mean fold change are high. Finally, we show how our stellar shape model approach can be extended to the analysis of 2D-gel images by adapting the Moffat function to take into account the elliptical nature of spots in such images. Our results indicate that stellar shape models offer a previously unexplored approach for the quantification of 2D-gel spots. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

With the recognition of the importance of evidence-based medicine, there is an emerging need for methods to systematically synthesize available data. Specifically, methods to provide accurate estimates of test characteristics for diagnostic tests are needed to help physicians make better clinical decisions. To provide more flexible approaches for meta-analysis of diagnostic tests, we developed three Bayesian generalized linear models. Two of these models, a bivariate normal and a binomial model, analyzed pairs of sensitivity and specificity values while incorporating the correlation between these two outcome variables. Noninformative independent uniform priors were used for the variance of sensitivity, specificity and correlation. We also applied an inverse Wishart prior to check the sensitivity of the results. The third model was a multinomial model where the test results were modeled as multinomial random variables. All three models can include specific imaging techniques as covariates in order to compare performance. Vague normal priors were assigned to the coefficients of the covariates. The computations were carried out using the 'Bayesian inference using Gibbs sampling' implementation of Markov chain Monte Carlo techniques. We investigated the properties of the three proposed models through extensive simulation studies. We also applied these models to a previously published meta-analysis dataset on cervical cancer as well as to an unpublished melanoma dataset. In general, our findings show that the point estimates of sensitivity and specificity were consistent among Bayesian and frequentist bivariate normal and binomial models. However, in the simulation studies, the estimates of the correlation coefficient from Bayesian bivariate models are not as good as those obtained from frequentist estimation regardless of which prior distribution was used for the covariance matrix. The Bayesian multinomial model consistently underestimated the sensitivity and specificity regardless of the sample size and correlation coefficient. In conclusion, the Bayesian bivariate binomial model provides the most flexible framework for future applications because of its following strengths: (1) it facilitates direct comparison between different tests; (2) it captures the variability in both sensitivity and specificity simultaneously as well as the intercorrelation between the two; and (3) it can be directly applied to sparse data without ad hoc correction. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Colorectal cancer is the forth most common diagnosed cancer in the United States. Every year about a hundred forty-seven thousand people will be diagnosed with colorectal cancer and fifty-six thousand people lose their lives due to this disease. Most of the hereditary nonpolyposis colorectal cancer (HNPCC) and 12% of the sporadic colorectal cancer show microsatellite instability. Colorectal cancer is a multistep progressive disease. It starts from a mutation in a normal colorectal cell and grows into a clone of cells that further accumulates mutations and finally develops into a malignant tumor. In terms of molecular evolution, the process of colorectal tumor progression represents the acquisition of sequential mutations. ^ Clinical studies use biomarkers such as microsatellite or single nucleotide polymorphisms (SNPs) to study mutation frequencies in colorectal cancer. Microsatellite data obtained from single genome equivalent PCR or small pool PCR can be used to infer tumor progression. Since tumor progression is similar to population evolution, we used an approach known as coalescent, which is well established in population genetics, to analyze this type of data. Coalescent theory has been known to infer the sample's evolutionary path through the analysis of microsatellite data. ^ The simulation results indicate that the constant population size pattern and the rapid tumor growth pattern have different genetic polymorphic patterns. The simulation results were compared with experimental data collected from HNPCC patients. The preliminary result shows the mutation rate in 6 HNPCC patients range from 0.001 to 0.01. The patients' polymorphic patterns are similar to the constant population size pattern which implies the tumor progression is through multilineage persistence instead of clonal sequential evolution. The results should be further verified using a larger dataset. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In geographical epidemiology, maps of disease rates and disease risk provide a spatial perspective for researching disease etiology. For rare diseases or when the population base is small, the rate and risk estimates may be unstable. Empirical Bayesian (EB) methods have been used to spatially smooth the estimates by permitting an area estimate to "borrow strength" from its neighbors. Such EB methods include the use of a Gamma model, of a James-Stein estimator, and of a conditional autoregressive (CAR) process. A fully Bayesian analysis of the CAR process is proposed. One advantage of this fully Bayesian analysis is that it can be implemented simply by using repeated sampling from the posterior densities. Use of a Markov chain Monte Carlo technique such as Gibbs sampler was not necessary. Direct resampling from the posterior densities provides exact small sample inferences instead of the approximate asymptotic analyses of maximum likelihood methods (Clayton & Kaldor, 1987). Further, the proposed CAR model provides for covariates to be included in the model. A simulation demonstrates the effect of sample size on the fully Bayesian analysis of the CAR process. The methods are applied to lip cancer data from Scotland, and the results are compared. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In regression analysis, covariate measurement error occurs in many applications. The error-prone covariates are often referred to as latent variables. In this proposed study, we extended the study of Chan et al. (2008) on recovering latent slope in a simple regression model to that in a multiple regression model. We presented an approach that applied the Monte Carlo method in the Bayesian framework to the parametric regression model with the measurement error in an explanatory variable. The proposed estimator applied the conditional expectation of latent slope given the observed outcome and surrogate variables in the multiple regression models. A simulation study was presented showing that the method produces estimator that is efficient in the multiple regression model, especially when the measurement error variance of surrogate variable is large.^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Mixture modeling is commonly used to model categorical latent variables that represent subpopulations in which population membership is unknown but can be inferred from the data. In relatively recent years, the potential of finite mixture models has been applied in time-to-event data. However, the commonly used survival mixture model assumes that the effects of the covariates involved in failure times differ across latent classes, but the covariate distribution is homogeneous. The aim of this dissertation is to develop a method to examine time-to-event data in the presence of unobserved heterogeneity under a framework of mixture modeling. A joint model is developed to incorporate the latent survival trajectory along with the observed information for the joint analysis of a time-to-event variable, its discrete and continuous covariates, and a latent class variable. It is assumed that the effects of covariates on survival times and the distribution of covariates vary across different latent classes. The unobservable survival trajectories are identified through estimating the probability that a subject belongs to a particular class based on observed information. We applied this method to a Hodgkin lymphoma study with long-term follow-up and observed four distinct latent classes in terms of long-term survival and distributions of prognostic factors. Our results from simulation studies and from the Hodgkin lymphoma study demonstrated the superiority of our joint model compared with the conventional survival model. This flexible inference method provides more accurate estimation and accommodates unobservable heterogeneity among individuals while taking involved interactions between covariates into consideration.^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Multi-center clinical trials are very common in the development of new drugs and devices. One concern in such trials, is the effect of individual investigational sites enrolling small numbers of patients on the overall result. Can the presence of small centers cause an ineffective treatment to appear effective when treatment-by-center interaction is not statistically significant?^ In this research, simulations are used to study the effect that centers enrolling few patients may have on the analysis of clinical trial data. A multi-center clinical trial with 20 sites is simulated to investigate the effect of a new treatment in comparison to a placebo treatment. Twelve of these 20 investigational sites are considered small, each enrolling less than four patients per treatment group. Three clinical trials are simulated with sample sizes of 100, 170 and 300. The simulated data is generated with various characteristics, one in which treatment should be considered effective and another where treatment is not effective. Qualitative interactions are also produced within the small sites to further investigate the effect of small centers under various conditions.^ Standard analysis of variance methods and the "sometimes-pool" testing procedure are applied to the simulated data. One model investigates treatment and center effect and treatment-by-center interaction. Another model investigates treatment effect alone. These analyses are used to determine the power to detect treatment-by-center interactions, and the probability of type I error.^ We find it is difficult to detect treatment-by-center interactions when only a few investigational sites enrolling a limited number of patients participate in the interaction. However, we find no increased risk of type I error in these situations. In a pooled analysis, when the treatment is not effective, the probability of finding a significant treatment effect in the absence of significant treatment-by-center interaction is well within standard limits of type I error. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The purpose of this research is to develop a new statistical method to determine the minimum set of rows (R) in a R x C contingency table of discrete data that explains the dependence of observations. The statistical power of the method will be empirically determined by computer simulation to judge its efficiency over the presently existing methods. The method will be applied to data on DNA fragment length variation at six VNTR loci in over 72 populations from five major racial groups of human (total sample size is over 15,000 individuals; each sample having at least 50 individuals). DNA fragment lengths grouped in bins will form the basis of studying inter-population DNA variation within the racial groups are significant, will provide a rigorous re-binning procedure for forensic computation of DNA profile frequencies that takes into account intra-racial DNA variation among populations. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Many statistical studies feature data with both exact-time and interval-censored events. While a number of methods currently exist to handle interval-censored events and multivariate exact-time events separately, few techniques exist to deal with their combination. This thesis develops a theoretical framework for analyzing a multivariate endpoint comprised of a single interval-censored event plus an arbitrary number of exact-time events. The approach fuses the exact-time events, modeled using the marginal method of Wei, Lin, and Weissfeld, with a piecewise-exponential interval-censored component. The resulting model incorporates more of the information in the data and also removes some of the biases associated with the exclusion of interval-censored events. A simulation study demonstrates that our approach produces reliable estimates for the model parameters and their variance-covariance matrix. As a real-world data example, we apply this technique to the Systolic Hypertension in the Elderly Program (SHEP) clinical trial, which features three correlated events: clinical non-fatal myocardial infarction, fatal myocardial infarction (two exact-time events), and silent myocardial infarction (one interval-censored event). ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this dissertation, we propose a continuous-time Markov chain model to examine the longitudinal data that have three categories in the outcome variable. The advantage of this model is that it permits a different number of measurements for each subject and the duration between two consecutive time points of measurements can be irregular. Using the maximum likelihood principle, we can estimate the transition probability between two time points. By using the information provided by the independent variables, this model can also estimate the transition probability for each subject. The Monte Carlo simulation method will be used to investigate the goodness of model fitting compared with that obtained from other models. A public health example will be used to demonstrate the application of this method. ^