14 resultados para Analysis and statistical methods
em DigitalCommons@The Texas Medical Center
Resumo:
Accurate quantitative estimation of exposure using retrospective data has been one of the most challenging tasks in the exposure assessment field. To improve these estimates, some models have been developed using published exposure databases with their corresponding exposure determinants. These models are designed to be applied to reported exposure determinants obtained from study subjects or exposure levels assigned by an industrial hygienist, so quantitative exposure estimates can be obtained. ^ In an effort to improve the prediction accuracy and generalizability of these models, and taking into account that the limitations encountered in previous studies might be due to limitations in the applicability of traditional statistical methods and concepts, the use of computer science- derived data analysis methods, predominantly machine learning approaches, were proposed and explored in this study. ^ The goal of this study was to develop a set of models using decision trees/ensemble and neural networks methods to predict occupational outcomes based on literature-derived databases, and compare, using cross-validation and data splitting techniques, the resulting prediction capacity to that of traditional regression models. Two cases were addressed: the categorical case, where the exposure level was measured as an exposure rating following the American Industrial Hygiene Association guidelines and the continuous case, where the result of the exposure is expressed as a concentration value. Previously developed literature-based exposure databases for 1,1,1 trichloroethane, methylene dichloride and, trichloroethylene were used. ^ When compared to regression estimations, results showed better accuracy of decision trees/ensemble techniques for the categorical case while neural networks were better for estimation of continuous exposure values. Overrepresentation of classes and overfitting were the main causes for poor neural network performance and accuracy. Estimations based on literature-based databases using machine learning techniques might provide an advantage when they are applied to other methodologies that combine `expert inputs' with current exposure measurements, like the Bayesian Decision Analysis tool. The use of machine learning techniques to more accurately estimate exposures from literature-based exposure databases might represent the starting point for the independence from the expert judgment.^
Resumo:
The pattern of the births during the week has been reported by many studies. The births occurred in weekends are found consistently less then births occurred in weekdays. This study employed two statistical methods, two-way ANOVA and two-way Friedman's test to analyse the daily variations in amount of births of 222,735 births from 2005-2007 in Harris County, Texas. The two methods were compared on their assumptions, procedures and results. Both of the tests showed a significant result which indicated that the births through the week are not uniformly distributed. The result of multiple comparison demonstrated the births occurring on weekends were significantly different than the births occurring on weekdays with least amount on Sundays.^
Resumo:
Studies have shown that rare genetic variants have stronger effects in predisposing common diseases, and several statistical methods have been developed for association studies involving rare variants. In order to better understand how these statistical methods perform, we seek to compare two recently developed rare variant statistical methods (VT and C-alpha) on 10,000 simulated re-sequencing data sets with disease status and the corresponding 10,000 simulated null data sets. The SLC1A1 gene has been suggested to be associated with diastolic blood pressure (DBP) in previous studies. In the current study, we applied VT and C-alpha methods to the empirical re-sequencing data for the SLC1A1 gene from 300 whites and 200 blacks. We found that VT method obtains higher power and performs better than C-alpha method with the simulated data we used. The type I errors were well-controlled for both methods. In addition, both VT and C-alpha methods suggested no statistical evidence for the association between the SLC1A1 gene and DBP. Overall, our findings provided an important comparison of the two statistical methods for future reference and provided preliminary and pioneer findings on the association between the SLC1A1 gene and blood pressure.^
Resumo:
Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Due to the rapid development of genotyping and sequencing technologies, we are now able to more accurately assess causal effects of many genetic and environmental factors. Genome-wide association studies have been able to localize many causal genetic variants predisposing to certain diseases. However, these studies only explain a small portion of variations in the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize some additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to the increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in the genetics/genomics researches and demonstrating superiority over some regular approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods may fully exert its functionalities and advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) Deriving the Bayesian variable selection framework for the hierarchical gene-environment and gene-gene interactions; (2) Developing the Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods which were developed for gene-environment interaction studies, to other related types of studies such as adaptive borrowing historical data. We propose a Bayesian hierarchical mixture model framework that allows us to investigate the genetic and environmental effects, gene by gene interactions (epistasis) and gene by environment interactions in the same model. It is well known that, in many practical situations, there exists a natural hierarchical structure between the main effects and interactions in the linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, such that the irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious and powerful models. We evaluate both of the 'strong hierarchical' and 'weak hierarchical' models, which specify that both or one of the main effects between interacting factors must be present for the interactions to be included in the model. The extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach to identify the predisposing main effects and interactions in the studies with complex gene-environment and gene-gene interactions. We also compare these two models with the 'independent' model that does not impose this hierarchical constraint and observe their superior performances in most of the considered situations. The proposed models are implemented in the real data analysis of gene and environment interactions in the cases of lung cancer and cutaneous melanoma case-control studies. The Bayesian statistical models enjoy the properties of being allowed to incorporate useful prior information in the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in terms of the performances on the parameter estimation and variable selection in most cases. Our proposed models hold the hierarchical constraints, that further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions and successfully identifying the reported associations. This is practically appealing for the study of investigating the causal factors from a moderate number of candidate genetic and environmental factors along with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects have previously been developed to provide an analysis framework, by which the estimates of effects for a quantitative trait are statistically orthogonal regardless of the existence of Hardy-Weinberg Equilibrium (HWE) within loci. Ma et al. (2012) recently developed a NOIA model for the gene-environment interaction studies and have shown the advantages of using the model for detecting the true main effects and interactions, compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power at detecting the non-null effects with higher marginal posterior probabilities. Also, we review two Bayesian statistical models (Bayesian empirical shrinkage-type estimator and Bayesian model averaging), which were developed for the gene-environment interaction studies. Inspired by these Bayesian models, we develop two novel statistical methods that are able to handle the related problems such as borrowing data from historical studies. The proposed methods are analogous to the methods for the gene-environment interactions on behalf of the success on balancing the statistical efficiency and bias in a unified model. By extensive simulation studies, we compare the operating characteristics of the proposed models with the existing models including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow the historical data in a data-driven way. These novel models may have a broad range of statistical applications in both of genetic/genomic and clinical studies.
Resumo:
Calcium levels in spines play a significant role in determining the sign and magnitude of synaptic plasticity. The magnitude of calcium influx into spines is highly dependent on influx through N-methyl D-aspartate (NMDA) receptors, and therefore depends on the number of postsynaptic NMDA receptors in each spine. We have calculated previously how the number of postsynaptic NMDA receptors determines the mean and variance of calcium transients in the postsynaptic density, and how this alters the shape of plasticity curves. However, the number of postsynaptic NMDA receptors in the postsynaptic density is not well known. Anatomical methods for estimating the number of NMDA receptors produce estimates that are very different than those produced by physiological techniques. The physiological techniques are based on the statistics of synaptic transmission and it is difficult to experimentally estimate their precision. In this paper we use stochastic simulations in order to test the validity of a physiological estimation technique based on failure analysis. We find that the method is likely to underestimate the number of postsynaptic NMDA receptors, explain the source of the error, and re-derive a more precise estimation technique. We also show that the original failure analysis as well as our improved formulas are not robust to small estimation errors in key parameters.
Resumo:
Background and purpose: Breast cancer continues to be a health problem for women, representing 28 percent of all female cancers and remaining one of the leading causes of death for women. Breast cancer incidence rates become substantial before the age of 50. After menopause, breast cancer incidence rates continue to increase with age creating a long-lasting source of concern (Harris et al., 1992). Mammography, a technique for the detection of breast tumors in their nonpalpable stage when they are most curable, has taken on considerable importance as a public health measure. The lifetime risk of breast cancer is approximately 1 in 9 and occurs over many decades. Recommendations are that screening be periodic in order to detect cancer at early stages. These recommendations, largely, are not followed. Not only are most women not getting regular mammograms, but this circumstance is particularly the case among older women where regular mammography has been proven to reduce mortality by approximately 30 percent. The purpose of this project was to increase our understanding of factors that are associated with stage of readiness to obtain subsequent mammograms. A secondary purpose of this research was to suggest further conceptual considerations toward the extension of the Transtheoretical Model (TTM) of behavior change to repeat screening mammography. ^ Methods. A sample (n = 1,222) of women 50 years and older in a large multi-specialty clinic in Houston, Texas was surveyed by mail questionnaire regarding their previous screening experience and stage of readiness to obtain repeat screening. A computerized database, maintained on all women who undergo mammography at the clinic, was used to identify women who are eligible for the project. The major statistical technique employed to select the significant variables and to examine the man and interaction effects of independent variables on dependent variables was polychotomous stepwise, logistic regression. A prediction model for each stage of readiness definition was estimated. The expected probabilities for stage of readiness were calculated to assess the magnitude and direction of significant predictors. ^ Results. Analysis showed that both ways of defining stage of readiness for obtaining a screening mammogram were associated with specific constructs, including decisional balance and processes of the change. ^ Conclusions. The results of the present study demonstrate that the TTM appears to translate to repeat mammography screening. Findings in the current study also support finding of previous studies that suggest that stage of readiness is associated with respondent decisional balance and the processes of change. ^
Resumo:
This dissertation develops and tests a comparative effectiveness methodology utilizing a novel approach to the application of Data Envelopment Analysis (DEA) in health studies. The concept of performance tiers (PerT) is introduced as terminology to express a relative risk class for individuals within a peer group and the PerT calculation is implemented with operations research (DEA) and spatial algorithms. The analysis results in the discrimination of the individual data observations into a relative risk classification by the DEA-PerT methodology. The performance of two distance measures, kNN (k-nearest neighbor) and Mahalanobis, was subsequently tested to classify new entrants into the appropriate tier. The methods were applied to subject data for the 14 year old cohort in the Project HeartBeat! study.^ The concepts presented herein represent a paradigm shift in the potential for public health applications to identify and respond to individual health status. The resultant classification scheme provides descriptive, and potentially prescriptive, guidance to assess and implement treatments and strategies to improve the delivery and performance of health systems. ^
Resumo:
Coalescent theory represents the most significant progress in theoretical population genetics in the past three decades. The coalescent theory states that all genes or alleles in a given population are ultimately inherited from a single ancestor shared by all members of the population, known as the most recent common ancestor. It is now widely recognized as a cornerstone for rigorous statistical analyses of molecular data from population [1]. The scientists have developed a large number of coalescent models and methods[2,3,4,5,6], which are not only applied in coalescent analysis and process, but also in today’s population genetics and genome studies, even public health. The thesis aims at completing a statistical framework based on computers for coalescent analysis. This framework provides a large number of coalescent models and statistic methods to assist students and researchers in coalescent analysis, whose results are presented in various formats as texts, graphics and printed pages. In particular, it also supports to create new coalescent models and statistical methods. ^
Resumo:
This paper defines and compares several models for describing excess influenza pneumonia mortality in Houston. First, the methodology used by the Center for Disease Control is examined and several variations of this methodology are studied. All of the models examined emphasize the difficulty of omitting epidemic weeks.^ In an attempt to find a better method of describing expected and epidemic mortality, time series methods are examined. Grouping in four-week periods, truncating the data series to adjust epidemic periods, and seasonally-adjusting the series y(,t), by:^ (DIAGRAM, TABLE OR GRAPHIC OMITTED...PLEASE SEE DAI)^ is the best method examined. This new series w(,t) is stationary and a moving average model MA(1) gives a good fit for forecasting influenza and pneumonia mortality in Houston.^ Influenza morbidity, other causes of death, sex, race, age, climate variables, environmental factors, and school absenteeism are all examined in terms of their relationship to influenza and pneumonia mortality. Both influenza morbidity and ischemic heart disease mortality show a very high relationship that remains when seasonal trends are removed from the data. However, when jointly modeling the three series it is obvious that the simple time series MA(1) model of truncated, seasonally-adjusted four-week data gives a better forecast.^
Resumo:
Objective: In this secondary data analysis, three statistical methodologies were implemented to handle cases with missing data in a motivational interviewing and feedback study. The aim was to evaluate the impact that these methodologies have on the data analysis. ^ Methods: We first evaluated whether the assumption of missing completely at random held for this study. We then proceeded to conduct a secondary data analysis using a mixed linear model to handle missing data with three methodologies (a) complete case analysis, (b) multiple imputation with explicit model containing outcome variables, time, and the interaction of time and treatment, and (c) multiple imputation with explicit model containing outcome variables, time, the interaction of time and treatment, and additional covariates (e.g., age, gender, smoke, years in school, marital status, housing, race/ethnicity, and if participants play on athletic team). Several comparisons were conducted including the following ones: 1) the motivation interviewing with feedback group (MIF) vs. the assessment only group (AO), the motivation interviewing group (MIO) vs. AO, and the intervention of the feedback only group (FBO) vs. AO, 2) MIF vs. FBO, and 3) MIF vs. MIO.^ Results: We first evaluated the patterns of missingness in this study, which indicated that about 13% of participants showed monotone missing patterns, and about 3.5% showed non-monotone missing patterns. Then we evaluated the assumption of missing completely at random by Little's missing completely at random (MCAR) test, in which the Chi-Square test statistic was 167.8 with 125 degrees of freedom, and its associated p-value was p=0.006, which indicated that the data could not be assumed to be missing completely at random. After that, we compared if the three different strategies reached the same results. For the comparison between MIF and AO as well as the comparison between MIF and FBO, only the multiple imputation with additional covariates by uncongenial and congenial models reached different results. For the comparison between MIF and MIO, all the methodologies for handling missing values obtained different results. ^ Discussions: The study indicated that, first, missingness was crucial in this study. Second, to understand the assumptions of the model was important since we could not identify if the data were missing at random or missing not at random. Therefore, future researches should focus on exploring more sensitivity analyses under missing not at random assumption.^
Resumo:
Background. A community-wide outbreak of cryptosporidiosis occurred in Dallas County during the summer of 2008. A subset of cases occurring with onset of illness within a 2 week interval was epidemiologically linked to 2 neighborhood interactive water fountain parks. ^ Methods. A case control study was conducted to evaluate risk factors associated with developing illness with cryptosporidiosis from the fountain parks. Cases were selected from a line list from the epidemiological study. The selection for the controls was either healthy family members or a daycare center nearby. Cases and controls were not matched. ^ Results. Interviews were completed for 44 fountain park attendees who met case definition and 54 community controls. Twenty-seven percent (27.3%) of the cases and 13.0% of the controls were between the ages of 0–4 years. Thirty-nine percent (38.6%) of the cases and 24.1% of the controls were between the ages of 5–13 years. Fourteen percent (13.6%) of the cases and 33.3% of the controls were between the ages of 14–31 years. Twenty percent (20.5%) of the cases and 29.6% of the controls were between the ages of 32–63 years. 47.7% of the cases and 42.6% of the controls were males. Fountain park attendees who reported having been splashed in the face with water were 10 times more likely to become ill than controls (OR = 10.0, 95% CI = 2.8–35.1). Persons who reported having swallowed water from the interactive fountains were 34 times more likely to become ill than controls (OR = 34.3, 95%CI = 9.3–125.7). ^ Conclusion. Prompt reporting of cases, identification of outbreak sources, and immediate implementation of remediation measures were critical in curtailing further transmission from these particular sites through the remainder of the season. This investigation underscores the potential for cryptosporidiosis outbreaks to occur in interactive fountain parks, and the need for enhanced preventive measures in these settings. Education of the public regarding avoidance of behaviors such as drinking water from interactive fountains is also an important component of public health prevention efforts. ^
Resumo:
Background. Research into methods for recovery from fatigue due to exercise is a popular topic among sport medicine, kinesiology and physical therapy. However, both the quantity and quality of studies and a clear solution of recovery are lacking. An analysis of the statistical methods in the existing literature of performance recovery can enhance the quality of research and provide some guidance for future studies. Methods: A literature review was performed using SCOPUS, SPORTDiscus, MEDLINE, CINAHL, Cochrane Library and Science Citation Index Expanded databases to extract the studies related to performance recovery from exercise of human beings. Original studies and their statistical analysis for recovery methods including Active Recovery, Cryotherapy/Contrast Therapy, Massage Therapy, Diet/Ergogenics, and Rehydration were examined. Results: The review produces a Research Design and Statistical Method Analysis Summary. Conclusion: Research design and statistical methods can be improved by using the guideline from the Research Design and Statistical Method Analysis Summary. This summary table lists the potential issues and suggested solutions, such as, sample size calculation, sports specific and research design issues consideration, population and measure markers selection, statistical methods for different analytical requirements, equality of variance and normality of data, post hoc analyses and effect size calculation.^
Resumo:
Most studies of differential gene-expressions have been conducted between two given conditions. The two-condition experimental (TCE) approach is simple in that all genes detected display a common differential expression pattern responsive to a common two-condition difference. Therefore, the genes that are differentially expressed under the other conditions other than the given two conditions are undetectable with the TCE approach. In order to address the problem, we propose a new approach called multiple-condition experiment (MCE) without replication and develop corresponding statistical methods including inference of pairs of conditions for genes, new t-statistics, and a generalized multiple-testing method for any multiple-testing procedure via a control parameter C. We applied these statistical methods to analyze our real MCE data from breast cancer cell lines and found that 85 percent of gene-expression variations were caused by genotypic effects and genotype-ANAX1 overexpression interactions, which agrees well with our expected results. We also applied our methods to the adenoma dataset of Notterman et al. and identified 93 differentially expressed genes that could not be found in TCE. The MCE approach is a conceptual breakthrough in many aspects: (a) many conditions of interests can be conducted simultaneously; (b) study of association between differential expressions of genes and conditions becomes easy; (c) it can provide more precise information for molecular classification and diagnosis of tumors; (d) it can save lot of experimental resources and time for investigators.^
Resumo:
A population-based cross-sectional survey of socio-environmental factors associated with the prevalence of Dracunculus medinensis (guinea worm disease) was conducted in Idere, a rural agricultural community in Ibarapa, Oyo state, Nigeria, during 1982.^ The epidemiologic data were collected by household interview of all 501 households. The environmental data were collected by analysis of water samples collected from all domestic water sources and rainfall records.^ The specific objectives of this research were to: (a) Describe the prevalence of guinea worm disease in Idere during 1982 by age, sex, area of residence, drinking water source, religion and weekly amount of money spent by the household to collect potable drinking water. (b) Compare the characteristics of cases with non-cases of guinea worm in order to identify factors associated with high risk of infection. (c) Investigate domestic water sources for the distribution of Cyclops. (d) Determine the extent of potable water shortage with a view to identifying factors responsible for such shortage in the community. (e) Describe the effects of guinea worm on school attendance during 1980/1982 school years by class and location of school from piped water supply.^ The findings of this research indicate that during 1982, 31.8 percent of Idere's 6,527 residents experienced guinea worm infection, with higher prevalence of infection recorded in males in their most productive years and females in their teenage years. The role of sex and age to risk of higher infection rate was explained in the context of water related exposure and water intake due to dehydration from physical occupational actitives of subgroups.^ Potable water available to residents was considerably below the minimum recommended by WHO for tropical climates, with sixty-eight percent of water needs of the residents coming from unprotected surface water which harbour Cyclops, the obligatory intermediate host of Dracunculus medinensis. An association was found between periods of relative high density of Cyclops in domestic water and rainfall.^ Impact of guinea worm infection on educational activities was considerable and its implications were discussed, including the implications of the research findings in relation to control of guinea worm disease in Ibarapa. ^