6 results for Statistical model

in DigitalCommons@The Texas Medical Center


Relevance:

70.00%

Publisher:

Abstract:

Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Owing to the rapid development of genotyping and sequencing technologies, we are now able to assess the causal effects of many genetic and environmental factors more accurately. Genome-wide association studies have localized many causal genetic variants predisposing to certain diseases. However, these studies explain only a small portion of the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in genetics and genomics research and have demonstrated superiority over some standard approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods can fully exert their advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) deriving the Bayesian variable selection framework for hierarchical gene-environment and gene-gene interactions; (2) developing the Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods, which were developed for gene-environment interaction studies, to other related types of studies, such as the adaptive borrowing of historical data.
We propose a Bayesian hierarchical mixture model framework that allows us to investigate genetic and environmental effects, gene-by-gene interactions (epistasis) and gene-by-environment interactions in the same model. It is well known that, in many practical situations, there is a natural hierarchical structure between the main effects and interactions in a linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, so that irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious and powerful models. We evaluate both the 'strong hierarchical' and 'weak hierarchical' models, which specify that both or one, respectively, of the main effects of the interacting factors must be present for the interaction to be included in the model. Extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach for identifying the predisposing main effects and interactions in studies with complex gene-environment and gene-gene interactions. We also compare these two models with the 'independent' model, which does not impose this hierarchical constraint, and observe their superior performance in most of the situations considered. The proposed models are applied to real data on gene-environment interactions from lung cancer and cutaneous melanoma case-control studies. Bayesian statistical models have the advantage of allowing useful prior information to be incorporated into the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in parameter estimation and variable selection in most cases.
Our proposed models impose the hierarchical constraints, which further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions while successfully identifying the reported associations. This is practically appealing for studies that investigate causal factors among a moderate number of candidate genetic and environmental factors along with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects were previously developed to provide an analysis framework in which the estimates of effects for a quantitative trait are statistically orthogonal regardless of whether Hardy-Weinberg Equilibrium (HWE) holds within loci. Ma et al. (2012) recently developed a NOIA model for gene-environment interaction studies and showed the advantages of using the model for detecting the true main effects and interactions, compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power for detecting non-null effects, with higher marginal posterior probabilities. We also review two Bayesian statistical models (the Bayesian empirical shrinkage-type estimator and Bayesian model averaging) that were developed for gene-environment interaction studies. Inspired by these Bayesian models, we develop two novel statistical methods that are able to handle related problems such as borrowing data from historical studies. The proposed methods are analogous to the methods for gene-environment interactions in that they balance statistical efficiency and bias in a unified model.
Through extensive simulation studies, we compare the operating characteristics of the proposed models with those of existing models, including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow the historical data in a data-driven way. These novel models may have a broad range of statistical applications in both genetic/genomic and clinical studies.
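The 'strong', 'weak' and 'independent' rules described in the abstract above can be illustrated with a small deterministic sketch. This is not the dissertation's Bayesian mixture model: it only shows which pairwise interactions each rule would make eligible given a set of main effects already in the model. The function name, the factor labels and the reading of 'weak' as "at least one parent main effect present" are assumptions made for illustration.

```python
from itertools import combinations


def allowed_interactions(included_main, candidates, rule="strong"):
    """Return the pairwise interactions eligible under a hierarchy rule.

    included_main: set of factor names whose main effects are in the model
    candidates:    iterable of all factor names
    rule:          "strong" (both parents present), "weak" (assumed here to
                   mean at least one parent present), or "independent"
                   (no constraint)
    """
    if rule not in {"strong", "weak", "independent"}:
        raise ValueError(f"unknown rule: {rule}")
    allowed = []
    for a, b in combinations(sorted(candidates), 2):
        # Count how many of the two parent main effects are in the model
        n_parents = (a in included_main) + (b in included_main)
        if rule == "strong" and n_parents == 2:
            allowed.append((a, b))
        elif rule == "weak" and n_parents >= 1:
            allowed.append((a, b))
        elif rule == "independent":
            allowed.append((a, b))
    return allowed


# Hypothetical factors: three genetic (G1-G3) and one environmental (E1)
factors = ["G1", "G2", "G3", "E1"]
main_effects = {"G1", "E1"}

strong = allowed_interactions(main_effects, factors, "strong")
weak = allowed_interactions(main_effects, factors, "weak")
independent = allowed_interactions(main_effects, factors, "independent")
```

In this toy case the strong rule admits only the E1 x G1 interaction, the weak rule excludes only G2 x G3, and the independent rule admits all six pairs, which shows how the constraints shrink the space of candidate interactions.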

Relevance:

60.00%

Publisher:

Abstract:

The objective of this dissertation was to design and implement strategies for assessment of exposures to organic chemicals used in the production of a styrene-butadiene polymer at the Texas Plastics Company (TPC). Linear statistical retrospective exposure models, univariate and multivariate, were developed based on the validation of historical industrial hygiene monitoring data collected by industrial hygienists at TPC, and additional current industrial hygiene monitoring data collected for the purposes of this study. The current monitoring data served several purposes. First, they provided information on current exposure, in the form of unbiased estimates of mean exposure to organic chemicals for each job title included. Second, they provided information on homogeneity of exposure within each job title, through the use of a carefully designed sampling scheme that addressed variability of exposure both between and within job titles. Third, they permitted the investigation of how well current exposure data can serve as an evaluation tool for retrospective exposure estimation. Finally, this dissertation investigated the simultaneous evaluation of exposure to several chemicals, as well as the use of values below detection limits in a multivariate linear statistical model of exposures.

Relevance:

60.00%

Publisher:

Abstract:

The purpose of this study was to analyze the implementation of national family planning policy in the United States, which was embedded in four separate statutes during the period of study, Fiscal Years 1976-81. The design of the study utilized a modification of the Sabatier and Mazmanian framework for policy analysis, which defined implementation as the carrying out of statutory policy. The study was divided into two phases. The first phase compared the implementation of family planning policy by each of the pertinent statutes. The second phase identified factors that were associated with implementation of federal family planning policy within the context of block grants. Implementation was measured here by federal dollars spent for family planning, adjusted for the size of the respective state target populations. Expenditure data were collected from the Alan Guttmacher Institute and from each of the federal agencies having administrative authority for the four pertinent statutes. Data from the former were used for most of the analysis because they were more complete and more reliable. The first phase of the study tested the hypothesis that the coherence of a statute is directly related to effective implementation. Equity in the distribution of funds to the states was used to operationalize effective implementation. To a large extent, the results of the analysis supported the hypothesis. In addition to their theoretical significance, these findings were also significant for policymakers insofar as they demonstrated the effectiveness of categorical legislation in implementing desired health policy. Given the current and historically intermittent emphasis on more state and less federal decision-making in health and human services, the second phase of the study focused on state-level factors that were associated with expenditures of social service block grant funds for family planning.
Using the Sabatier-Mazmanian implementation model as a framework, many factors were tested. Those factors showing the strongest conceptual and statistical relationship to the dependent variable were used to construct a statistical model. Using multivariable regression analysis, this model was applied cross-sectionally to each of the years of the study. The most striking finding here was that the dominant determinants of state spending varied for each year of the study (Fiscal Years 1976-1981). The significance of these results was that they provided empirical support for current implementation theory, showing that the dominant determinants of implementation vary greatly over time.

Relevance:

60.00%

Publisher:

Abstract:

The importance of race as a factor in mental health status has been a topic of controversy. This study reviews the history of research in this area and examines racial differences in the relationship between selected socio-demographic variables and general well-being. The study also examines the appropriateness of an additive versus an interactive statistical model for this investigation. The sample consists of 6,913 persons who completed the General Well-Being Schedule as administered in the detailed component of the first National Health and Nutrition Examination Survey (NHANES I) conducted by the National Center for Health Statistics between April 1971 and October 1975. The sampling design is a multistage probability sample of clusters of persons in area-based segments. Of the 6,913 persons, 873 are Black. Unlike other recent community-based mental health studies, this study revealed significant differences between the general well-being of Blacks and Whites. Blacks continued to exhibit significantly lower levels of well-being even after adjustments were made for income, education, marital status, sex, age and place of residence. Statistical interaction was found between race and sex, with Black females reporting lower levels of well-being than either Black or White males or their White female counterparts. The study includes a detailed review of the NHANES I sample design. It is shown that selected aspects of the design make it difficult to render appropriate national comparisons of Black-White differences. As a result, conclusions pertaining to these differences based on NHANES I may be of questionable validity.
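The distinction between an additive and an interactive model mentioned in the abstract above can be sketched with a 2x2 table of cell means. Under an additive model the effect of one factor is the same at both levels of the other, so the difference-in-differences contrast is zero; a nonzero contrast signals interaction. The factor labels A and B and all numbers below are hypothetical and are not taken from NHANES I or from the study.

```python
def interaction_contrast(cell_means):
    """Difference-in-differences for a 2x2 table of cell means.

    cell_means[(a, b)] is the mean outcome at level a of factor A and
    level b of factor B, with a and b in {0, 1}.
    """
    effect_b_when_a0 = cell_means[(0, 1)] - cell_means[(0, 0)]
    effect_b_when_a1 = cell_means[(1, 1)] - cell_means[(1, 0)]
    # Zero under a purely additive model, nonzero when A and B interact
    return effect_b_when_a1 - effect_b_when_a0


# Hypothetical cell means in which B lowers the outcome by 3 at both levels of A
additive_table = {(0, 0): 80.0, (0, 1): 77.0, (1, 0): 76.0, (1, 1): 73.0}

# Hypothetical cell means in which B lowers the outcome by 3 when A = 0
# but by 7 when A = 1
interactive_table = {(0, 0): 80.0, (0, 1): 77.0, (1, 0): 76.0, (1, 1): 69.0}

additive_contrast = interaction_contrast(additive_table)
interactive_contrast = interaction_contrast(interactive_table)
```

In a regression setting the same contrast corresponds to the coefficient on the product term A*B, which is the term an additive model omits.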

Relevance:

60.00%

Publisher:

Abstract:

The efficacy of waste stabilization lagoons for the treatment of five priority pollutants and two widely used commercial compounds was evaluated in laboratory model ponds. Three ponds were designed to simulate a primary anaerobic lagoon, a secondary facultative lagoon, and a tertiary aerobic lagoon. Biodegradation, volatilization, and sorption losses were quantified for bis(2-chloroethyl) ether, benzene, toluene, naphthalene, phenanthrene, ethylene glycol, and ethylene glycol monoethyl ether. A statistical model using a log-normal transformation indicated that biodegradation of bis(2-chloroethyl) ether followed first-order kinetics. Additionally, multiple regression analysis indicated that biochemical oxygen demand was the water quality variable most highly correlated with bis(2-chloroethyl) ether effluent concentration.
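First-order kinetics means the concentration decays as C(t) = C0 * exp(-k t), so ln C is linear in time with slope -k. The sketch below fits that line by ordinary least squares on log-transformed concentrations. It is a generic illustration, not the study's statistical model, and the time points, rate constant and initial concentration are made-up values.

```python
import math


def fit_first_order(times, concentrations):
    """Estimate (k, c0) for C(t) = c0 * exp(-k * t).

    Fits ln C = ln c0 - k * t by ordinary least squares.
    """
    log_c = [math.log(c) for c in concentrations]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(log_c) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, log_c))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return -slope, math.exp(intercept)


# Synthetic, noise-free data generated from assumed values (illustration only)
true_k, true_c0 = 0.25, 10.0
t_days = [0.0, 1.0, 2.0, 4.0, 8.0]
conc = [true_c0 * math.exp(-true_k * t) for t in t_days]

k_hat, c0_hat = fit_first_order(t_days, conc)
half_life = math.log(2) / k_hat  # time for the concentration to halve
```

With real measurements the residuals of the log-linear fit give a check on the first-order assumption: systematic curvature in ln C versus time would argue against it.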

Relevance:

40.00%

Publisher:

Abstract:

Objectives. This paper seeks to assess the effect on statistical power of regression model misspecification in a variety of situations. Methods and results. The effect of misspecification in regression can be approximated by evaluating the correlation between the correct specification and the misspecification of the outcome variable (Harris 2010). In this paper, three misspecified models (linear, categorical and fractional polynomial) were considered. In the first section, the mathematical method of calculating the correlation between correct and misspecified models with simple mathematical forms was derived and demonstrated. In the second section, data from the National Health and Nutrition Examination Survey (NHANES 2007-2008) were used to examine such correlations. Our study shows that, compared with linear or categorical models, the fractional polynomial models, which had higher correlations, provided a better approximation of the true relationship, which was illustrated by LOESS regression. In the third section, we present the results of simulation studies that demonstrate that, overall, misspecification in regression can produce marked decreases in power with small sample sizes. However, the categorical model had the greatest power, ranging from 0.877 to 0.936 depending on the sample size and outcome variable used. The power of the fractional polynomial model was close to that of the linear model, which ranged from 0.69 to 0.83, and appeared to be affected by the increased degrees of freedom of this model. Conclusion. Correlations between alternative model specifications can be used to provide a good approximation of the effect of misspecification on statistical power when the sample size is large. When model specifications have known simple mathematical forms, such correlations can be calculated mathematically. Actual public health data from NHANES 2007-2008 were used as examples to demonstrate situations with unknown or complex correct model specification.
Simulation of power for misspecified models confirmed the results based on correlation methods but also illustrated the effect of model degrees of freedom on power.
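As an illustration of a correlation between a correct and a misspecified specification with simple mathematical forms, suppose (as an assumption, not an example from the paper) that the true relationship is quadratic, f(x) = x^2, with x uniform on [0, 1], and the analyst fits a linear term in x. The correlation then has the closed form sqrt(15)/4, about 0.968, which the sketch below checks numerically on a fine midpoint grid.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Midpoint grid approximating x ~ Uniform(0, 1)
n_points = 10_000
x = [(i + 0.5) / n_points for i in range(n_points)]

correct = [xi ** 2 for xi in x]   # assumed true specification: quadratic
misspecified = x                  # assumed fitted specification: linear

rho_numeric = pearson(correct, misspecified)

# Closed form: Cov(x, x^2) = 1/12, Var(x) = 1/12, Var(x^2) = 4/45
rho_exact = math.sqrt(15) / 4
```

A correlation this close to 1 suggests the linear term captures most of the quadratic signal over this range; a relationship that is not monotone over the range of x would give a much lower value.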