13 results for 340402 Econometric and Statistical Methods
in DigitalCommons@The Texas Medical Center
Abstract:
Accurate quantitative estimation of exposure from retrospective data has been one of the most challenging tasks in the exposure assessment field. To improve these estimates, models have been developed from published exposure databases and their corresponding exposure determinants. These models are designed to be applied to exposure determinants reported by study subjects, or to exposure levels assigned by an industrial hygienist, so that quantitative exposure estimates can be obtained.

In an effort to improve the prediction accuracy and generalizability of these models, and considering that the limitations encountered in previous studies may stem from the limits of traditional statistical methods and concepts, this study proposed and explored data analysis methods derived from computer science, predominantly machine learning approaches.

The goal of this study was to develop a set of models using decision tree/ensemble and neural network methods to predict occupational exposure outcomes from literature-derived databases, and to compare, using cross-validation and data-splitting techniques, their prediction capacity to that of traditional regression models. Two cases were addressed: the categorical case, where the exposure level was measured as an exposure rating following the American Industrial Hygiene Association guidelines, and the continuous case, where the exposure is expressed as a concentration value. Previously developed literature-based exposure databases for 1,1,1-trichloroethane, methylene dichloride, and trichloroethylene were used.

When compared to regression estimates, the results showed better accuracy for decision tree/ensemble techniques in the categorical case, while neural networks were better for estimating continuous exposure values. Overrepresentation of classes and overfitting were the main causes of poor neural network performance and accuracy. Estimates derived from literature-based databases using machine learning techniques may offer an advantage when applied within methodologies that combine expert inputs with current exposure measurements, such as the Bayesian Decision Analysis tool. The use of machine learning techniques to estimate exposures more accurately from literature-based exposure databases may represent a starting point for independence from expert judgment.
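As a rough illustration of the comparison described in this abstract, the sketch below (not the study's code) cross-validates an ensemble classifier against a regression baseline on a synthetic stand-in for a literature-derived determinants database; the feature matrix X and the four-level AIHA-style rating y are assumptions made purely for the example.

```python
# A minimal sketch of the categorical-case comparison: cross-validated
# accuracy of a random forest versus multinomial logistic regression.
# X (exposure determinants) and y (4-level rating) are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                       # synthetic exposure determinants
y = np.clip(np.round(X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 0.5, 500)), 0, 3).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
logit = LogisticRegression(max_iter=1000)

# 5-fold cross-validation, mirroring the data-splitting comparison above
print("ensemble accuracy:   %.3f" % cross_val_score(forest, X, y, cv=5).mean())
print("regression accuracy: %.3f" % cross_val_score(logit, X, y, cv=5).mean())
```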
Abstract:
This paper defines and compares several models for describing excess influenza and pneumonia mortality in Houston. First, the methodology used by the Center for Disease Control is examined, along with several variations of it. All of the models examined highlight the difficulty of omitting epidemic weeks.

In an attempt to find a better method of describing expected and epidemic mortality, time series methods are examined. Grouping the data into four-week periods, truncating the series to adjust for epidemic periods, and seasonally adjusting the series y_t (the adjustment formula is omitted in the source abstract; see DAI) is the best method examined. The resulting series w_t is stationary, and a moving average model MA(1) gives a good fit for forecasting influenza and pneumonia mortality in Houston.

Influenza morbidity, other causes of death, sex, race, age, climate variables, environmental factors, and school absenteeism are all examined in terms of their relationship to influenza and pneumonia mortality. Both influenza morbidity and ischemic heart disease mortality show a strong relationship that persists when seasonal trends are removed from the data. However, when the three series are modeled jointly, it is clear that the simple MA(1) time series model of truncated, seasonally adjusted four-week data gives a better forecast.
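The following minimal sketch (assuming a synthetic weekly mortality series, since neither the Houston data nor the exact seasonal-adjustment formula are reproduced in the abstract) illustrates the pipeline described: four-week aggregation, a simple mean-based seasonal adjustment, and an MA(1) fit with a one-step forecast.

```python
# A minimal sketch, not the dissertation's code. The seasonal adjustment
# here (subtracting each 4-week period's across-year mean) is an assumed
# form; the source's exact transform is omitted in the abstract.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
weeks = 52 * 10                                     # ten years of weekly counts
weekly = 20 + 5 * np.sin(2 * np.pi * np.arange(weeks) / 52) + rng.poisson(3, weeks)

y = weekly.reshape(-1, 4).sum(axis=1)               # y_t: four-week totals (130 periods)
season = y.reshape(-1, 13).mean(axis=0)             # mean of each of 13 periods per year
w = y - np.tile(season, len(y) // 13)               # w_t: seasonally adjusted series

fit = ARIMA(w, order=(0, 0, 1)).fit()               # moving-average MA(1) model
print(fit.summary().tables[1])
# add back the seasonal index of the next period when forecasting
print("next-period forecast:", fit.forecast(1) + season[len(y) % 13])
```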
Abstract:
Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Thanks to the rapid development of genotyping and sequencing technologies, we are now able to assess the causal effects of many genetic and environmental factors more accurately. Genome-wide association studies have localized many causal genetic variants predisposing to certain diseases. However, these studies explain only a small portion of the heritability of disease. More advanced statistical models are urgently needed to identify and characterize additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to increasing computational capability and novel statistical developments, Bayesian methods have been widely applied in genetics/genomics research and have demonstrated superiority over some standard approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods can fully exert their advantages.

This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending existing methods for gene-environment interactions to related areas. It includes three parts: (1) deriving a Bayesian variable selection framework for hierarchical gene-environment and gene-gene interactions; (2) developing Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending two Bayesian statistical methods developed for gene-environment interaction studies to related problems, such as adaptive borrowing of historical data.

We propose a Bayesian hierarchical mixture model framework that allows us to investigate genetic and environmental effects, gene-gene interactions (epistasis), and gene-environment interactions in the same model. It is well known that, in many practical situations, a natural hierarchical structure exists between the main effects and interactions in a linear model. We propose a model that incorporates this hierarchical structure into the Bayesian mixture model, so that irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious, and powerful models. We evaluate both the 'strong hierarchical' and 'weak hierarchical' models, which require that both, or at least one, respectively, of the main effects of interacting factors be present for the interaction to be included in the model. Extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and provide a powerful approach to identifying predisposing main effects and interactions in studies with complex gene-environment and gene-gene interactions. We also compare these two models with an 'independent' model that does not impose the hierarchical constraint, and observe their superior performance in most of the situations considered. The proposed models are applied to real data analyses of gene-environment interactions in lung cancer and cutaneous melanoma case-control studies. Bayesian statistical models have the advantage of allowing useful prior information to be incorporated into the modeling process.
Moreover, the Bayesian mixture model outperforms the multivariate logistic model in parameter estimation and variable selection in most cases. Our proposed models impose hierarchical constraints that further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions, while successfully identifying the reported associations. This is practically appealing for investigating causal factors from a moderate number of candidate genetic and environmental factors together with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects were previously developed to provide an analysis framework in which the estimated effects for a quantitative trait are statistically orthogonal regardless of whether Hardy-Weinberg Equilibrium (HWE) holds within loci. Ma et al. (2012) recently developed a NOIA model for gene-environment interaction studies and showed its advantages over the usual functional model for detecting true main effects and interactions. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA model and the usual functional model. The proposed Bayesian NOIA model shows more power for detecting non-null effects, with higher marginal posterior probabilities. We also review two Bayesian statistical models (a Bayesian empirical shrinkage-type estimator and Bayesian model averaging) developed for gene-environment interaction studies. Inspired by these models, we develop two novel statistical methods that handle related problems such as borrowing data from historical studies. The proposed methods are analogous to the gene-environment interaction methods in their success at balancing statistical efficiency and bias within a unified model. Through extensive simulation studies, we compare the operating characteristics of the proposed models with existing models, including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow the historical data in a data-driven way. These novel models may have a broad range of statistical applications in both genetic/genomic and clinical studies.
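To make the hierarchical constraint concrete, here is a minimal sketch of one assumed form of the prior on interaction inclusion indicators: an interaction receives nonzero prior inclusion probability only if both (strong hierarchy) or at least one (weak hierarchy) of its parent main effects is in the model. This illustrates the constraint itself, not the dissertation's full mixture model.

```python
# Assumed form of the hierarchical inclusion prior, for illustration only.
# gamma[j] are 0/1 indicators for main effects (genes or exposures); an
# interaction (j, k) may enter the model only if its parents allow it.
import itertools
import numpy as np

def interaction_prior(gamma, pi_int=0.2, mode="strong"):
    """Prior inclusion probability of each pairwise interaction given
    main-effect indicators gamma, under strong or weak hierarchy."""
    probs = {}
    for j, k in itertools.combinations(range(len(gamma)), 2):
        if mode == "strong":
            allowed = gamma[j] == 1 and gamma[k] == 1   # both parents in
        else:                                           # weak hierarchy
            allowed = gamma[j] == 1 or gamma[k] == 1    # at least one parent in
        probs[(j, k)] = pi_int if allowed else 0.0      # spike at zero otherwise
    return probs

gamma = np.array([1, 0, 1, 1])        # e.g. genes 0 and 2 plus exposure 3 active
print(interaction_prior(gamma, mode="strong"))
print(interaction_prior(gamma, mode="weak"))
```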
Abstract:
Calcium levels in spines play a significant role in determining the sign and magnitude of synaptic plasticity. The magnitude of calcium influx into spines depends strongly on influx through N-methyl-D-aspartate (NMDA) receptors, and therefore on the number of postsynaptic NMDA receptors in each spine. We have previously calculated how the number of postsynaptic NMDA receptors determines the mean and variance of calcium transients in the postsynaptic density, and how this alters the shape of plasticity curves. However, the number of postsynaptic NMDA receptors in the postsynaptic density is not well known. Anatomical methods for estimating the number of NMDA receptors produce estimates that are very different from those produced by physiological techniques. The physiological techniques are based on the statistics of synaptic transmission, and it is difficult to estimate their precision experimentally. In this paper we use stochastic simulations to test the validity of a physiological estimation technique based on failure analysis. We find that the method is likely to underestimate the number of postsynaptic NMDA receptors, explain the source of the error, and re-derive a more precise estimation technique. We also show that the original failure analysis, as well as our improved formulas, are not robust to small estimation errors in key parameters.
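A toy Monte Carlo version of the failure-analysis estimator reads as follows (this illustrates the technique, not the paper's simulations; N_true, p_open, and the trial counts are made-up parameters): if each of N receptors opens independently with probability p_open upon release, the failure rate is F = (1 - p_open)^N, giving the estimator N_hat = log(F_hat)/log(1 - p_open), whose nonlinearity makes it sensitive to sampling error in F_hat.

```python
# Toy failure analysis: simulate receptor openings, estimate N from the
# observed failure rate. Sampling error in F_hat propagates nonlinearly
# through the log, biasing the estimate. Parameters are invented.
import numpy as np

rng = np.random.default_rng(2)
N_true, p_open, trials = 20, 0.10, 500

estimates = []
for _ in range(1000):                               # repeat the "experiment"
    opened = rng.random((trials, N_true)) < p_open  # each receptor opens or not
    failures = ~opened.any(axis=1)                  # failure: no receptor opened
    F_hat = max(failures.mean(), 1.0 / trials)      # guard against log(0)
    estimates.append(np.log(F_hat) / np.log(1 - p_open))

print("true N:", N_true, " mean estimate:", round(np.mean(estimates), 2))
```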
Abstract:
Most studies of differential gene expression have been conducted between two given conditions. The two-condition experiment (TCE) approach is simple in that all genes detected display a common differential expression pattern responsive to a common two-condition difference. Consequently, genes that are differentially expressed under conditions other than the given two are undetectable with the TCE approach. To address this problem, we propose a new approach, the multiple-condition experiment (MCE) without replication, and develop the corresponding statistical methods, including inference of condition pairs for genes, new t-statistics, and a generalization of any multiple-testing procedure via a control parameter C. We applied these methods to our real MCE data from breast cancer cell lines and found that 85 percent of gene-expression variation was caused by genotypic effects and genotype-ANAX1 overexpression interactions, which agrees well with our expectations. We also applied our methods to the adenoma dataset of Notterman et al. and identified 93 differentially expressed genes that could not be found with TCE. The MCE approach is a conceptual breakthrough in many respects: (a) many conditions of interest can be studied simultaneously; (b) studying the association between differential gene expression and conditions becomes easy; (c) it can provide more precise information for the molecular classification and diagnosis of tumors; and (d) it can save investigators a great deal of experimental resources and time.
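The authors' t-statistics are not reproduced in the abstract; as a generic illustration only, the sketch below scans all condition pairs for each gene in a replicate-free genes-by-conditions matrix and uses the variability of the remaining conditions as a crude error estimate, with a control parameter C standing in for the paper's tuning constant.

```python
# Generic pair-scanning illustration, not the authors' MCE statistics.
import itertools
import numpy as np

def mce_scan(expr, C=1.0):
    """expr: genes x conditions matrix with one observation per condition.
    Returns, per gene, the most contrasting condition pair and a crude
    t-like score scaled by the control parameter C."""
    results = []
    for g in expr:
        best = max(itertools.combinations(range(len(g)), 2),
                   key=lambda jk: abs(g[jk[0]] - g[jk[1]]))
        rest = np.delete(g, best)                   # conditions outside the pair
        score = abs(g[best[0]] - g[best[1]]) / (C * rest.std(ddof=1) + 1e-9)
        results.append((best, score))
    return results

rng = np.random.default_rng(3)
expr = rng.normal(0, 1, (5, 8))                     # 5 genes, 8 conditions
expr[0, 3] += 4.0                                   # gene 0 responds to condition 3
for i, (pair, score) in enumerate(mce_scan(expr)):
    print(f"gene {i}: pair {pair}, score {score:.2f}")
```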
Abstract:
Many studies have reported the weekly pattern of births: births occurring on weekends are consistently fewer than births occurring on weekdays. This study employed two statistical methods, two-way ANOVA and the two-way Friedman test, to analyze daily variation among the 222,735 births recorded from 2005-2007 in Harris County, Texas. The two methods were compared with respect to their assumptions, procedures, and results. Both tests gave a significant result, indicating that births are not uniformly distributed across the week. Multiple comparisons showed that births occurring on weekends differed significantly from births occurring on weekdays, with the fewest births on Sundays.
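A minimal sketch of the two tests on an assumed weeks-by-days count table (the Harris County data are not reproduced here) might look as follows; Friedman's test treats weeks as blocks and days as treatments, while the two-way ANOVA uses day and week as crossed factors.

```python
# Sketch of the two tests on a synthetic weeks x days birth-count table.
import numpy as np
import pandas as pd
from scipy.stats import friedmanchisquare
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(4)
weeks, days = 52, 7
base = np.array([120, 125, 124, 123, 122, 95, 85])  # fewer weekend births (assumed)
counts = rng.poisson(base, size=(weeks, days))

# Friedman: days are treatments, weeks are blocks
stat, p = friedmanchisquare(*[counts[:, d] for d in range(days)])
print(f"Friedman: chi2={stat:.1f}, p={p:.2g}")

# Two-way ANOVA with one observation per week x day cell (no interaction)
df = pd.DataFrame({"births": counts.ravel(),
                   "day": np.tile(np.arange(days), weeks),
                   "week": np.repeat(np.arange(weeks), days)})
print(sm.stats.anova_lm(ols("births ~ C(day) + C(week)", df).fit()))
```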
Abstract:
Studies have shown that rare genetic variants can have stronger effects in predisposing to common diseases, and several statistical methods have been developed for association studies involving rare variants. To better understand how these methods perform, we compared two recently developed rare variant statistical methods (VT and C-alpha) on 10,000 simulated re-sequencing data sets with disease status and the corresponding 10,000 simulated null data sets. The SLC1A1 gene has been suggested to be associated with diastolic blood pressure (DBP) in previous studies. In the current study, we applied the VT and C-alpha methods to empirical re-sequencing data for the SLC1A1 gene from 300 whites and 200 blacks. We found that the VT method obtains higher power and performs better than the C-alpha method on the simulated data we used. Type I error was well controlled for both methods. In addition, both the VT and C-alpha methods suggested no statistical evidence for an association between the SLC1A1 gene and DBP. Overall, our findings provide an important comparison of the two statistical methods for future reference, as well as preliminary findings on the association between the SLC1A1 gene and blood pressure.
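For reference, the C-alpha statistic as published by Neale et al. (2011) can be sketched as below; this is a re-implementation from the literature, not the study's code, and the toy counts are invented to show an over-dispersion signal.

```python
# Sketch of the C-alpha test: for variant i carried n_i times with y_i
# copies in cases and case fraction p0, it tests whether the variance of
# y_i exceeds its binomial expectation (a signature of a mixture of risk
# and protective variants).
import numpy as np
from scipy.stats import binom, norm

def c_alpha(y, n, p0):
    y, n = np.asarray(y), np.asarray(n)
    T = np.sum((y - n * p0) ** 2 - n * p0 * (1 - p0))
    c = 0.0
    for ni in n:                                    # null variance of each term
        u = np.arange(ni + 1)
        term = (u - ni * p0) ** 2 - ni * p0 * (1 - p0)
        c += np.sum(binom.pmf(u, ni, p0) * term ** 2)
    z = T / np.sqrt(c)
    return z, norm.sf(z)                            # one-sided p-value

# toy example: 10 rare variants, 5 copies each, half the sample are cases
y = [5, 0, 4, 1, 5, 0, 3, 2, 5, 0]                  # over-dispersed case counts
print("Z=%.2f, p=%.3g" % c_alpha(y, [5] * 10, 0.5))
```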
Abstract:
Macromolecular interactions, such as protein-protein and protein-DNA interactions, play important roles in executing biological functions in cells. However, the complexity of such interactions often makes it very challenging to elucidate their structural details. In this thesis, two different research strategies were applied to two different macromolecular systems: X-ray crystallography on the three tandem FF domains of the transcription regulator CA150, and electron microscopy on the STAT1-importin α5 complex. The results provide novel insights into the structure-function relationships of transcription-coupled RNA splicing mediated by CA150 and into the nuclear import step of the JAK-STAT signaling pathway.

The first project addressed the protein-protein interaction module known as the FF domain, which often occurs in tandem repeats. The crystallographic structure of the first three FF domains of human CA150 was determined to 2.7 Å resolution. This is the only crystal structure of an FF domain, and the only structure of tandem FF domains, to date. It revealed a striking connectivity between one FF domain and the next. A peptide binding assay with a potential binding ligand of FF domains was performed using fluorescence polarization. Furthermore, for the first time, FF domains were found to potentially interact with DNA. DNA binding assays were also performed, and the results supported this newly proposed functionality of the FF domain.

The second project aimed at understanding the molecular mechanism of nuclear import of the transcription factor STAT1. The first structural model of the pSTAT1-importin α5 complex in solution was built from negative-stain electron microscopy images. Two STAT1 molecules were observed to interact with one importin α5 molecule in an asymmetric manner, implying that STAT1 interacts with importin α5 through a novel mechanism distinct from canonical importin α-cargo interactions. Further in vitro binding assays were performed to obtain more details of the pSTAT1-importin α5 interaction.
Abstract:
Background. Research into methods for recovery from exercise-induced fatigue is a popular topic in sports medicine, kinesiology, and physical therapy. However, both the quantity and quality of studies are lacking, and no clear recovery solution has emerged. An analysis of the statistical methods in the existing performance-recovery literature can enhance the quality of research and provide guidance for future studies.

Methods. A literature review was performed using the SCOPUS, SPORTDiscus, MEDLINE, CINAHL, Cochrane Library, and Science Citation Index Expanded databases to extract studies on performance recovery from exercise in humans. Original studies and their statistical analyses for recovery methods including active recovery, cryotherapy/contrast therapy, massage therapy, diet/ergogenics, and rehydration were examined.

Results. The review produces a Research Design and Statistical Method Analysis Summary.

Conclusion. Research design and statistical methods can be improved by following the guidance in the Research Design and Statistical Method Analysis Summary. This summary table lists potential issues and suggested solutions, such as sample size calculation, consideration of sport-specific and research design issues, selection of populations and outcome markers, statistical methods for different analytical requirements, checking equality of variance and normality of data, post hoc analyses, and effect size calculation.
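Two of the summary-table items lend themselves to a short sketch: an a priori sample size calculation and an effect size (Cohen's d) computation. The planning effect size of 0.5 and the toy recovery scores below are assumptions for illustration, not figures from the reviewed studies.

```python
# Sketch of sample size planning and effect size calculation for a
# two-group recovery comparison; all numbers are illustrative.
import numpy as np
from statsmodels.stats.power import TTestIndPower

# required n per group for an assumed medium effect (d = 0.5)
n_per_group = TTestIndPower().solve_power(effect_size=0.5, power=0.80, alpha=0.05)
print(f"n per group for d=0.5, 80% power: {np.ceil(n_per_group):.0f}")

def cohens_d(a, b):
    """Effect size between two recovery conditions using the pooled SD."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                     / (len(a) + len(b) - 2))
    return (a.mean() - b.mean()) / pooled

print("d =", round(cohens_d([12, 14, 11, 13], [10, 11, 9, 10]), 2))
```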
Abstract:
This dissertation develops and tests a comparative effectiveness methodology based on a novel application of Data Envelopment Analysis (DEA) to health studies. The concept of performance tiers (PerT) is introduced to express a relative risk class for individuals within a peer group, and the PerT calculation is implemented with operations research (DEA) and spatial algorithms. The analysis discriminates the individual observations into relative risk classes via the DEA-PerT methodology. The performance of two distance measures, kNN (k-nearest neighbor) and Mahalanobis, was then tested for classifying new entrants into the appropriate tier. The methods were applied to subject data for the 14-year-old cohort in the Project HeartBeat! study.

The concepts presented herein represent a paradigm shift in the potential for public health applications to identify and respond to individual health status. The resulting classification scheme provides descriptive, and potentially prescriptive, guidance for assessing and implementing treatments and strategies to improve the delivery and performance of health systems.
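The tier-assignment step (but not the DEA-PerT construction itself) can be sketched as follows on synthetic data; the three tiers, their separations, and the new entrant's risk-factor vector are invented for illustration.

```python
# Sketch of classifying a new entrant into an existing tier by kNN and,
# alternatively, by Mahalanobis distance to each tier's centroid.
import numpy as np
from scipy.spatial.distance import mahalanobis
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(m, 1.0, (30, 3)) for m in (0.0, 2.0, 4.0)])  # risk factors
tier = np.repeat([1, 2, 3], 30)                     # tiers as produced upstream by DEA
x_new = np.array([1.8, 2.1, 2.4])                   # new entrant (invented)

knn = KNeighborsClassifier(n_neighbors=5).fit(X, tier)
print("kNN tier:", knn.predict([x_new])[0])

VI = np.linalg.inv(np.cov(X, rowvar=False))         # inverse covariance of the data
d = {t: mahalanobis(x_new, X[tier == t].mean(axis=0), VI) for t in (1, 2, 3)}
print("Mahalanobis tier:", min(d, key=d.get), d)
```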
Abstract:
Coalescent theory represents the most significant advance in theoretical population genetics in the past three decades. It states that all genes or alleles in a given population are ultimately inherited from a single ancestor shared by all members of the population, known as the most recent common ancestor. It is now widely recognized as a cornerstone for rigorous statistical analysis of molecular data from populations [1]. Scientists have developed a large number of coalescent models and methods [2,3,4,5,6], which are applied not only in coalescent analysis but also in modern population genetics and genome studies, and even in public health. This thesis aims to build a complete computer-based statistical framework for coalescent analysis. The framework provides a large number of coalescent models and statistical methods to assist students and researchers in coalescent analysis, with results presented in various formats such as text, graphics, and printed pages. In particular, it also supports the creation of new coalescent models and statistical methods.
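As an example of the kind of model such a framework provides, the sketch below simulates waiting times under the standard Kingman coalescent, in which k active lineages coalesce at exponential rate k(k-1)/2 in coalescent time units; it is a textbook illustration, not the thesis framework.

```python
# Kingman coalescent waiting times: with k lineages, the time to the next
# coalescence is Exp(k(k-1)/2) in coalescent time units, until the MRCA.
import numpy as np

def tmrca(sample_size, rng):
    """Total time back to the most recent common ancestor of a sample."""
    t = 0.0
    for k in range(sample_size, 1, -1):             # k lineages merge down to k-1
        t += rng.exponential(1.0 / (k * (k - 1) / 2))
        # which two lineages merge does not affect the waiting times
    return t

rng = np.random.default_rng(6)
sims = [tmrca(10, rng) for _ in range(10000)]
# theory: E[TMRCA] = 2 * (1 - 1/n), i.e. 1.8 for a sample of n = 10
print("mean TMRCA:", round(np.mean(sims), 3), " theory:", 2 * (1 - 1 / 10))
```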
Abstract:
A population-based cross-sectional survey of socio-environmental factors associated with the prevalence of Dracunculus medinensis (guinea worm disease) was conducted during 1982 in Idere, a rural agricultural community in Ibarapa, Oyo State, Nigeria.

The epidemiologic data were collected by household interviews of all 501 households. The environmental data were collected by analyzing water samples from all domestic water sources and from rainfall records.

The specific objectives of this research were to: (a) describe the prevalence of guinea worm disease in Idere during 1982 by age, sex, area of residence, drinking water source, religion, and the weekly amount of money spent by the household on potable drinking water; (b) compare the characteristics of cases and non-cases of guinea worm in order to identify factors associated with high risk of infection; (c) investigate domestic water sources for the distribution of Cyclops; (d) determine the extent of the potable water shortage with a view to identifying the factors responsible for it in the community; and (e) describe the effects of guinea worm on school attendance during the 1980-1982 school years by class and by the school's location relative to a piped water supply.

The findings indicate that during 1982, 31.8 percent of Idere's 6,527 residents experienced guinea worm infection, with higher prevalence recorded among males in their most productive years and among females in their teenage years. The contribution of sex and age to infection risk was explained in the context of water-related exposure and of water intake due to dehydration from the physical occupational activities of these subgroups.

The potable water available to residents was considerably below the minimum recommended by WHO for tropical climates, with sixty-eight percent of residents' water needs met from unprotected surface water, which harbours Cyclops, the obligatory intermediate host of Dracunculus medinensis. An association was found between rainfall and periods of relatively high density of Cyclops in domestic water.

The impact of guinea worm infection on educational activities was considerable, and its implications were discussed, including the implications of the findings for the control of guinea worm disease in Ibarapa.
Abstract:
In recent years, disaster preparedness through assessment of medical and special needs persons (MSNP) has taken center stage in the public eye, owing to frequent natural disasters such as hurricanes, storm surges, and tsunamis driven by climate change and increased human activity on our planet. Statistical methods for complex survey design and analysis have gained significance as a consequence. However, many challenges remain in extending such assessments to the target population for policy-level advocacy and implementation.

Objective. This study discusses the use of statistical methods for disaster preparedness and medical needs assessment to support local and state governments in policy-level decision making and logistic support, so as to avoid loss of life and property in future calamities.

Methods. To obtain precise and unbiased estimates of medical special needs persons (MSNP) and disaster preparedness for evacuation in the Rio Grande Valley (RGV) of Texas, a stratified, cluster-randomized, multi-stage sampling design was implemented. The UT School of Public Health, Brownsville, surveyed 3,088 households in three counties: Cameron, Hidalgo, and Willacy. Multiple statistical methods were applied, with estimates obtained taking into account the probability of selection and clustering effects. The statistical methods discussed were multivariate linear regression (MLR), survey linear regression (Svy-Reg), generalized estimating equations (GEE), and multilevel mixed models (MLM), each with and without sampling weights.

Results. The estimated population of the RGV was 1,146,796: 51.5% female, 90% Hispanic, 73% married, 56% unemployed, and 37% with personal transport. Forty percent of people attained education up to elementary school, another 42% reached high school, and only 18% went to college. Median household income was less than $15,000/year. MSNP were estimated at 44,196 (3.98%) [95% CI: 39,029; 51,123]. All statistical models were in concordance, with MSNP estimates ranging from about 44,000 to 48,000: MLR (47,707; 95% CI: 42,462; 52,999), MLR with weights (45,882; 95% CI: 39,792; 51,972), bootstrap regression (47,730; 95% CI: 41,629; 53,785), GEE (47,649; 95% CI: 41,629; 53,670), GEE with weights (45,076; 95% CI: 39,029; 51,123), Svy-Reg (44,196; 95% CI: 40,004; 48,390), and MLM (46,513; 95% CI: 39,869; 53,157).

Conclusion. The RGV is a flood zone, highly susceptible to hurricanes and other natural disasters. People in the region are mostly Hispanic and under-educated, with among the lowest income levels in the U.S. In a disaster, the population is largely incapacitated, with only 37% having personal transport available to take care of MSNP. Local and state government intervention in planning, preparation, and evacuation support is necessary in any such disaster to avoid the loss of precious human life.

Key words: complex surveys, statistical methods, multilevel models, cluster randomized, sampling weights, raking, survey regression, generalized estimating equations (GEE), random effects, intracluster correlation coefficient (ICC).
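Two of the estimators compared above can be sketched on synthetic data (this is not the study's analysis; the cluster structure, weights, and prevalence are invented, with the prevalence chosen near the reported 3.98%): a design-weighted proportion, and an intercept-only GEE with an exchangeable working correlation to absorb within-cluster dependence.

```python
# Sketch of a weighted prevalence estimate and a cluster-robust GEE fit
# on synthetic survey data with an intracluster correlation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n_clusters, per = 100, 30
cluster = np.repeat(np.arange(n_clusters), per)
u = rng.normal(0, 0.5, n_clusters)[cluster]         # shared cluster effect (ICC)
msnp = rng.random(n_clusters * per) < 1 / (1 + np.exp(-(-3.2 + u)))
w = rng.uniform(0.5, 2.0, n_clusters * per)         # sampling weights (invented)

p_wt = np.sum(w * msnp) / np.sum(w)                 # design-weighted MSNP prevalence
print(f"weighted prevalence: {p_wt:.3%}")

X = np.ones((len(msnp), 1))                         # intercept-only model
gee = sm.GEE(msnp.astype(float), X, groups=cluster,
             family=sm.families.Binomial(),
             cov_struct=sm.cov_struct.Exchangeable()).fit()
print(gee.summary().tables[1])                      # logit of prevalence, robust SE
```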