23 resultados para Multilevel linear model
Resumo:
Background Past and recent evidence shows that radionuclides in drinking water may be a public health concern. Developmental thresholds for birth defects with respect to chronic low level domestic radiation exposures, such as through drinking water, have not been definitely recognized, and there is a strong need to address this deficiency in information. In this study we examined the geographic distribution of orofacial cleft birth defects in and around uranium mining district Counties in South Texas (Atascosa, Bee, Brooks, Calhoun, Duval, Goliad, Hidalgo, Jim Hogg, Jim Wells, Karnes, Kleberg, Live Oak, McMullen, Nueces, San Patricio, Refugio, Starr, Victoria, Webb, and Zavala), from 1999 to 2007. The probable association of cleft birth defect rates by ZIP codes classified according to uranium and radium concentrations in drinking water supplies was evaluated. Similar associations between orofacial cleft birth defects and radium/radon in drinking water were reported earlier by Cech and co-investigators in another of the Gulf Coast region (Harris County, Texas).50, 55 Since substantial uranium mining activity existed and still exists in South Texas, contamination of drinking water sources with radiation and its relation to birth defects is a ground for concern. ^ Methods Residential addresses of orofacial cleft birth defect cases, as well as live births within the twenty Counties during 1999-2007 were geocoded and mapped. Prevalence rates were calculated by ZIP codes and were mapped accordingly. Locations of drinking water supplies were also geocoded and mapped. ZIP codes were stratified as having high combined uranium (≥30μg/L) vs. low combined uranium (<30μg/L). Likewise, ZIP codes having the uranium isotope, Ra-226 in drinking water, were also stratified as having elevated radium (≥3 pCi/L) vs. low radium (<3 pCi/L). A linear regression was performed using STATA® generalized linear model (GLM) program to evaluate the probable association between cleft birth defect rates by ZIP codes and concentration of uranium and radium via domestic water supply. These rates were further adjusted for potentially confounding variables such as maternal age, education, occupation, and ethnicity. ^ Results This study showed higher rates of cleft births in ZIP codes classified as having high combined uranium versus ZIP codes having low combined uranium. The model was further improved by adding radium stratified as explained above. Adjustment for maternal age and ethnicity did not substantially affect the statistical significance of uranium or radium concentrations in household water supplies. ^ Conclusion Although this study lacks individual exposure levels, the findings suggest a significant association between elevated uranium and radium concentrations in tap water and high orofacial birth defect rates by ZIP codes. Future case-control studies that can measure individual exposure levels and adjust for contending risk factors could result in a better understanding of the exposure-disease association.^
Resumo:
The infant mortality rate (IMR) is considered to be one of the most important indices of a country's well-being. Countries around the world and other health organizations like the World Health Organization are dedicating their resources, knowledge and energy to reduce the infant mortality rates. The well-known Millennium Development Goal 4 (MDG 4), whose aim is to archive a two thirds reduction of the under-five mortality rate between 1990 and 2015, is an example of the commitment. ^ In this study our goal is to model the trends of IMR between the 1950s to 2010s for selected countries. We would like to know how the IMR is changing overtime and how it differs across countries. ^ IMR data collected over time forms a time series. The repeated observations of IMR time series are not statistically independent. So in modeling the trend of IMR, it is necessary to account for these correlations. We proposed to use the generalized least squares method in general linear models setting to deal with the variance-covariance structure in our model. In order to estimate the variance-covariance matrix, we referred to the time-series models, especially the autoregressive and moving average models. Furthermore, we will compared results from general linear model with correlation structure to that from ordinary least squares method without taking into account the correlation structure to check how significantly the estimates change.^
Resumo:
Cardiovascular disease (CVD) is a threat to public health. It has been reported to be the leading cause of death in United States. The invention of next generation sequencing (NGS) technology has revolutionized the biomedical research. To investigate NGS data of CVD related quantitative traits would contribute to address the unknown etiology and disease mechanism of CVD. NHLBI's Exome Sequencing Project (ESP) contains CVD related phenotypes and their associated NGS exomes sequence data. Initially, a subset of next generation sequencing data consisting of 13 CVD-related quantitative traits was investigated. Only 6 traits, systolic blood pressure (SBP), diastolic blood pressure (DBP), height, platelet counts, waist circumference, and weight, were analyzed by functional linear model (FLM) and 7 currently existing methods. FLM outperformed all currently existing methods by identifying the highest number of significant genes and had identified 96, 139, 756, 1162, 1106, and 298 genes associated with SBP, DBP, Height, Platelet, Waist, and Weight respectively. ^
New methods for quantification and analysis of quantitative real-time polymerase chain reaction data
Resumo:
Quantitative real-time polymerase chain reaction (qPCR) is a sensitive gene quantitation method that has been widely used in the biological and biomedical fields. The currently used methods for PCR data analysis, including the threshold cycle (CT) method, linear and non-linear model fitting methods, all require subtracting background fluorescence. However, the removal of background fluorescence is usually inaccurate, and therefore can distort results. Here, we propose a new method, the taking-difference linear regression method, to overcome this limitation. Briefly, for each two consecutive PCR cycles, we subtracted the fluorescence in the former cycle from that in the later cycle, transforming the n cycle raw data into n-1 cycle data. Then linear regression was applied to the natural logarithm of the transformed data. Finally, amplification efficiencies and the initial DNA molecular numbers were calculated for each PCR run. To evaluate this new method, we compared it in terms of accuracy and precision with the original linear regression method with three background corrections, being the mean of cycles 1-3, the mean of cycles 3-7, and the minimum. Three criteria, including threshold identification, max R2, and max slope, were employed to search for target data points. Considering that PCR data are time series data, we also applied linear mixed models. Collectively, when the threshold identification criterion was applied and when the linear mixed model was adopted, the taking-difference linear regression method was superior as it gave an accurate estimation of initial DNA amount and a reasonable estimation of PCR amplification efficiencies. When the criteria of max R2 and max slope were used, the original linear regression method gave an accurate estimation of initial DNA amount. Overall, the taking-difference linear regression method avoids the error in subtracting an unknown background and thus it is theoretically more accurate and reliable. This method is easy to perform and the taking-difference strategy can be extended to all current methods for qPCR data analysis.^
Resumo:
Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Due to the rapid development of genotyping and sequencing technologies, we are now able to more accurately assess causal effects of many genetic and environmental factors. Genome-wide association studies have been able to localize many causal genetic variants predisposing to certain diseases. However, these studies only explain a small portion of variations in the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize some additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to the increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in the genetics/genomics researches and demonstrating superiority over some regular approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods may fully exert its functionalities and advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) Deriving the Bayesian variable selection framework for the hierarchical gene-environment and gene-gene interactions; (2) Developing the Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods which were developed for gene-environment interaction studies, to other related types of studies such as adaptive borrowing historical data. We propose a Bayesian hierarchical mixture model framework that allows us to investigate the genetic and environmental effects, gene by gene interactions (epistasis) and gene by environment interactions in the same model. It is well known that, in many practical situations, there exists a natural hierarchical structure between the main effects and interactions in the linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, such that the irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious and powerful models. We evaluate both of the 'strong hierarchical' and 'weak hierarchical' models, which specify that both or one of the main effects between interacting factors must be present for the interactions to be included in the model. The extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach to identify the predisposing main effects and interactions in the studies with complex gene-environment and gene-gene interactions. We also compare these two models with the 'independent' model that does not impose this hierarchical constraint and observe their superior performances in most of the considered situations. The proposed models are implemented in the real data analysis of gene and environment interactions in the cases of lung cancer and cutaneous melanoma case-control studies. The Bayesian statistical models enjoy the properties of being allowed to incorporate useful prior information in the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in terms of the performances on the parameter estimation and variable selection in most cases. Our proposed models hold the hierarchical constraints, that further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions and successfully identifying the reported associations. This is practically appealing for the study of investigating the causal factors from a moderate number of candidate genetic and environmental factors along with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects have previously been developed to provide an analysis framework, by which the estimates of effects for a quantitative trait are statistically orthogonal regardless of the existence of Hardy-Weinberg Equilibrium (HWE) within loci. Ma et al. (2012) recently developed a NOIA model for the gene-environment interaction studies and have shown the advantages of using the model for detecting the true main effects and interactions, compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power at detecting the non-null effects with higher marginal posterior probabilities. Also, we review two Bayesian statistical models (Bayesian empirical shrinkage-type estimator and Bayesian model averaging), which were developed for the gene-environment interaction studies. Inspired by these Bayesian models, we develop two novel statistical methods that are able to handle the related problems such as borrowing data from historical studies. The proposed methods are analogous to the methods for the gene-environment interactions on behalf of the success on balancing the statistical efficiency and bias in a unified model. By extensive simulation studies, we compare the operating characteristics of the proposed models with the existing models including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow the historical data in a data-driven way. These novel models may have a broad range of statistical applications in both of genetic/genomic and clinical studies.
Resumo:
Hierarchically clustered populations are often encountered in public health research, but the traditional methods used in analyzing this type of data are not always adequate. In the case of survival time data, more appropriate methods have only begun to surface in the last couple of decades. Such methods include multilevel statistical techniques which, although more complicated to implement than traditional methods, are more appropriate. ^ One population that is known to exhibit a hierarchical structure is that of patients who utilize the health care system of the Department of Veterans Affairs where patients are grouped not only by hospital, but also by geographic network (VISN). This project analyzes survival time data sets housed at the Houston Veterans Affairs Medical Center Research Department using two different Cox Proportional Hazards regression models, a traditional model and a multilevel model. VISNs that exhibit significantly higher or lower survival rates than the rest are identified separately for each model. ^ In this particular case, although there are differences in the results of the two models, it is not enough to warrant using the more complex multilevel technique. This is shown by the small estimates of variance associated with levels two and three in the multilevel Cox analysis. Much of the differences that are exhibited in identification of VISNs with high or low survival rates is attributable to computer hardware difficulties rather than to any significant improvements in the model. ^
Resumo:
This study proposed a novel statistical method that modeled the multiple outcomes and missing data process jointly using item response theory. This method follows the "intent-to-treat" principle in clinical trials and accounts for the correlation between outcomes and missing data process. This method may provide a good solution to chronic mental disorder study. ^ The simulation study demonstrated that if the true model is the proposed model with moderate or strong correlation, ignoring the within correlation may lead to overestimate of the treatment effect and result in more type I error than specified level. Even if the within correlation is small, the performance of proposed model is as good as naïve response model. Thus, the proposed model is robust for different correlation settings if the data is generated by the proposed model.^
Resumo:
Hierarchical linear growth model (HLGM), as a flexible and powerful analytic method, has played an increased important role in psychology, public health and medical sciences in recent decades. Mostly, researchers who conduct HLGM are interested in the treatment effect on individual trajectories, which can be indicated by the cross-level interaction effects. However, the statistical hypothesis test for the effect of cross-level interaction in HLGM only show us whether there is a significant group difference in the average rate of change, rate of acceleration or higher polynomial effect; it fails to convey information about the magnitude of the difference between the group trajectories at specific time point. Thus, reporting and interpreting effect sizes have been increased emphases in HLGM in recent years, due to the limitations and increased criticisms for statistical hypothesis testing. However, most researchers fail to report these model-implied effect sizes for group trajectories comparison and their corresponding confidence intervals in HLGM analysis, since lack of appropriate and standard functions to estimate effect sizes associated with the model-implied difference between grouping trajectories in HLGM, and also lack of computing packages in the popular statistical software to automatically calculate them. ^ The present project is the first to establish the appropriate computing functions to assess the standard difference between grouping trajectories in HLGM. We proposed the two functions to estimate effect sizes on model-based grouping trajectories difference at specific time, we also suggested the robust effect sizes to reduce the bias of estimated effect sizes. Then, we applied the proposed functions to estimate the population effect sizes (d ) and robust effect sizes (du) on the cross-level interaction in HLGM by using the three simulated datasets, and also we compared the three methods of constructing confidence intervals around d and du recommended the best one for application. At the end, we constructed 95% confidence intervals with the suitable method for the effect sizes what we obtained with the three simulated datasets. ^ The effect sizes between grouping trajectories for the three simulated longitudinal datasets indicated that even though the statistical hypothesis test shows no significant difference between grouping trajectories, effect sizes between these grouping trajectories can still be large at some time points. Therefore, effect sizes between grouping trajectories in HLGM analysis provide us additional and meaningful information to assess group effect on individual trajectories. In addition, we also compared the three methods to construct 95% confident intervals around corresponding effect sizes in this project, which handled with the uncertainty of effect sizes to population parameter. We suggested the noncentral t-distribution based method when the assumptions held, and the bootstrap bias-corrected and accelerated method when the assumptions are not met.^