963 resultados para Statistical models


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Quantifying the health effects associated with simultaneous exposure to many air pollutants is now a research priority of the US EPA. Bayesian hierarchical models (BHM) have been extensively used in multisite time series studies of air pollution and health to estimate health effects of a single pollutant adjusted for potential confounding of other pollutants and other time-varying factors. However, when the scientific goal is to estimate the impacts of many pollutants jointly, a straightforward application of BHM is challenged by the need to specify a random-effect distribution on a high-dimensional vector of nuisance parameters, which often do not have an easy interpretation. In this paper we introduce a new BHM formulation, which we call "reduced BHM", aimed at analyzing clustered data sets in the presence of a large number of random effects that are not of primary scientific interest. At the first stage of the reduced BHM, we calculate the integrated likelihood of the parameter of interest (e.g. excess number of deaths attributed to simultaneous exposure to high levels of many pollutants). At the second stage, we specify a flexible random-effect distribution directly on the parameter of interest. The reduced BHM overcomes many of the challenges in the specification and implementation of full BHM in the context of a large number of nuisance parameters. In simulation studies we show that the reduced BHM performs comparably to the full BHM in many scenarios, and even performs better in some cases. Methods are applied to estimate location-specific and overall relative risks of cardiovascular hospital admissions associated with simultaneous exposure to elevated levels of particulate matter and ozone in 51 US counties during the period 1999-2005.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Clustered data analysis is characterized by the need to describe both systematic variation in a mean model and cluster-dependent random variation in an association model. Marginalized multilevel models embrace the robustness and interpretations of a marginal mean model, while retaining the likelihood inference capabilities and flexible dependence structures of a conditional association model. Although there has been increasing recognition of the attractiveness of marginalized multilevel models, there has been a gap in their practical application arising from a lack of readily available estimation procedures. We extend the marginalized multilevel model to allow for nonlinear functions in both the mean and association aspects. We then formulate marginal models through conditional specifications to facilitate estimation with mixed model computational solutions already in place. We illustrate this approach on a cerebrovascular deficiency crossover trial.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In evaluating the accuracy of diagnosis tests, it is common to apply two imperfect tests jointly or sequentially to a study population. In a recent meta-analysis of the accuracy of microsatellite instability testing (MSI) and traditional mutation analysis (MUT) in predicting germline mutations of the mismatch repair (MMR) genes, a Bayesian approach (Chen, Watson, and Parmigiani 2005) was proposed to handle missing data resulting from partial testing and the lack of a gold standard. In this paper, we demonstrate an improved estimation of the sensitivities and specificities of MSI and MUT by using a nonlinear mixed model and a Bayesian hierarchical model, both of which account for the heterogeneity across studies through study-specific random effects. The methods can be used to estimate the accuracy of two imperfect diagnostic tests in other meta-analyses when the prevalence of disease, the sensitivities and/or the specificities of diagnostic tests are heterogeneous among studies. Furthermore, simulation studies have demonstrated the importance of carefully selecting appropriate random effects on the estimation of diagnostic accuracy measurements in this scenario.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Latent class analysis (LCA) and latent class regression (LCR) are widely used for modeling multivariate categorical outcomes in social sciences and biomedical studies. Standard analyses assume data of different respondents to be mutually independent, excluding application of the methods to familial and other designs in which participants are clustered. In this paper, we develop multilevel latent class model, in which subpopulation mixing probabilities are treated as random effects that vary among clusters according to a common Dirichlet distribution. We apply the Expectation-Maximization (EM) algorithm for model fitting by maximum likelihood (ML). This approach works well, but is computationally intensive when either the number of classes or the cluster size is large. We propose a maximum pairwise likelihood (MPL) approach via a modified EM algorithm for this case. We also show that a simple latent class analysis, combined with robust standard errors, provides another consistent, robust, but less efficient inferential procedure. Simulation studies suggest that the three methods work well in finite samples, and that the MPL estimates often enjoy comparable precision as the ML estimates. We apply our methods to the analysis of comorbid symptoms in the Obsessive Compulsive Disorder study. Our models' random effects structure has more straightforward interpretation than those of competing methods, thus should usefully augment tools available for latent class analysis of multilevel data.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Dr. Rossi discusses the common errors that are made when fitting statistical models to data. Focuses on the planning, data analysis, and interpretation phases of a statistical analysis, and highlights the errors that are commonly made by researchers of these phases. The implications of these commonly made errors are discussed along with a discussion of the methods that can be used to prevent these errors from occurring. A prescription for carrying out a correct statistical analysis will be discussed.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

It is system dynamics that determines the function of cells, tissues and organisms. To develop mathematical models and estimate their parameters are an essential issue for studying dynamic behaviors of biological systems which include metabolic networks, genetic regulatory networks and signal transduction pathways, under perturbation of external stimuli. In general, biological dynamic systems are partially observed. Therefore, a natural way to model dynamic biological systems is to employ nonlinear state-space equations. Although statistical methods for parameter estimation of linear models in biological dynamic systems have been developed intensively in the recent years, the estimation of both states and parameters of nonlinear dynamic systems remains a challenging task. In this report, we apply extended Kalman Filter (EKF) to the estimation of both states and parameters of nonlinear state-space models. To evaluate the performance of the EKF for parameter estimation, we apply the EKF to a simulation dataset and two real datasets: JAK-STAT signal transduction pathway and Ras/Raf/MEK/ERK signaling transduction pathways datasets. The preliminary results show that EKF can accurately estimate the parameters and predict states in nonlinear state-space equations for modeling dynamic biochemical networks.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In recent years, disaster preparedness through assessment of medical and special needs persons (MSNP) has taken a center place in public eye in effect of frequent natural disasters such as hurricanes, storm surge or tsunami due to climate change and increased human activity on our planet. Statistical methods complex survey design and analysis have equally gained significance as a consequence. However, there exist many challenges still, to infer such assessments over the target population for policy level advocacy and implementation. ^ Objective. This study discusses the use of some of the statistical methods for disaster preparedness and medical needs assessment to facilitate local and state governments for its policy level decision making and logistic support to avoid any loss of life and property in future calamities. ^ Methods. In order to obtain precise and unbiased estimates for Medical Special Needs Persons (MSNP) and disaster preparedness for evacuation in Rio Grande Valley (RGV) of Texas, a stratified and cluster-randomized multi-stage sampling design was implemented. US School of Public Health, Brownsville surveyed 3088 households in three counties namely Cameron, Hidalgo, and Willacy. Multiple statistical methods were implemented and estimates were obtained taking into count probability of selection and clustering effects. Statistical methods for data analysis discussed were Multivariate Linear Regression (MLR), Survey Linear Regression (Svy-Reg), Generalized Estimation Equation (GEE) and Multilevel Mixed Models (MLM) all with and without sampling weights. ^ Results. Estimated population for RGV was 1,146,796. There were 51.5% female, 90% Hispanic, 73% married, 56% unemployed and 37% with their personal transport. 40% people attained education up to elementary school, another 42% reaching high school and only 18% went to college. Median household income is less than $15,000/year. MSNP estimated to be 44,196 (3.98%) [95% CI: 39,029; 51,123]. All statistical models are in concordance with MSNP estimates ranging from 44,000 to 48,000. MSNP estimates for statistical methods are: MLR (47,707; 95% CI: 42,462; 52,999), MLR with weights (45,882; 95% CI: 39,792; 51,972), Bootstrap Regression (47,730; 95% CI: 41,629; 53,785), GEE (47,649; 95% CI: 41,629; 53,670), GEE with weights (45,076; 95% CI: 39,029; 51,123), Svy-Reg (44,196; 95% CI: 40,004; 48,390) and MLM (46,513; 95% CI: 39,869; 53,157). ^ Conclusion. RGV is a flood zone, most susceptible to hurricanes and other natural disasters. People in the region are mostly Hispanic, under-educated with least income levels in the U.S. In case of any disaster people in large are incapacitated with only 37% have their personal transport to take care of MSNP. Local and state government’s intervention in terms of planning, preparation and support for evacuation is necessary in any such disaster to avoid loss of precious human life. ^ Key words: Complex Surveys, statistical methods, multilevel models, cluster randomized, sampling weights, raking, survey regression, generalized estimation equations (GEE), random effects, Intracluster correlation coefficient (ICC).^

Relevância:

70.00% 70.00%

Publicador:

Resumo:

My dissertation focuses on developing methods for gene-gene/environment interactions and imprinting effect detections for human complex diseases and quantitative traits. It includes three sections: (1) generalizing the Natural and Orthogonal interaction (NOIA) model for the coding technique originally developed for gene-gene (GxG) interaction and also to reduced models; (2) developing a novel statistical approach that allows for modeling gene-environment (GxE) interactions influencing disease risk, and (3) developing a statistical approach for modeling genetic variants displaying parent-of-origin effects (POEs), such as imprinting. In the past decade, genetic researchers have identified a large number of causal variants for human genetic diseases and traits by single-locus analysis, and interaction has now become a hot topic in the effort to search for the complex network between multiple genes or environmental exposures contributing to the outcome. Epistasis, also known as gene-gene interaction is the departure from additive genetic effects from several genes to a trait, which means that the same alleles of one gene could display different genetic effects under different genetic backgrounds. In this study, we propose to implement the NOIA model for association studies along with interaction for human complex traits and diseases. We compare the performance of the new statistical models we developed and the usual functional model by both simulation study and real data analysis. Both simulation and real data analysis revealed higher power of the NOIA GxG interaction model for detecting both main genetic effects and interaction effects. Through application on a melanoma dataset, we confirmed the previously identified significant regions for melanoma risk at 15q13.1, 16q24.3 and 9p21.3. We also identified potential interactions with these significant regions that contribute to melanoma risk. Based on the NOIA model, we developed a novel statistical approach that allows us to model effects from a genetic factor and binary environmental exposure that are jointly influencing disease risk. Both simulation and real data analyses revealed higher power of the NOIA model for detecting both main genetic effects and interaction effects for both quantitative and binary traits. We also found that estimates of the parameters from logistic regression for binary traits are no longer statistically uncorrelated under the alternative model when there is an association. Applying our novel approach to a lung cancer dataset, we confirmed four SNPs in 5p15 and 15q25 region to be significantly associated with lung cancer risk in Caucasians population: rs2736100, rs402710, rs16969968 and rs8034191. We also validated that rs16969968 and rs8034191 in 15q25 region are significantly interacting with smoking in Caucasian population. Our approach identified the potential interactions of SNP rs2256543 in 6p21 with smoking on contributing to lung cancer risk. Genetic imprinting is the most well-known cause for parent-of-origin effect (POE) whereby a gene is differentially expressed depending on the parental origin of the same alleles. Genetic imprinting affects several human disorders, including diabetes, breast cancer, alcoholism, and obesity. This phenomenon has been shown to be important for normal embryonic development in mammals. Traditional association approaches ignore this important genetic phenomenon. In this study, we propose a NOIA framework for a single locus association study that estimates both main allelic effects and POEs. We develop statistical (Stat-POE) and functional (Func-POE) models, and demonstrate conditions for orthogonality of the Stat-POE model. We conducted simulations for both quantitative and qualitative traits to evaluate the performance of the statistical and functional models with different levels of POEs. Our results showed that the newly proposed Stat-POE model, which ensures orthogonality of variance components if Hardy-Weinberg Equilibrium (HWE) or equal minor and major allele frequencies is satisfied, had greater power for detecting the main allelic additive effect than a Func-POE model, which codes according to allelic substitutions, for both quantitative and qualitative traits. The power for detecting the POE was the same for the Stat-POE and Func-POE models under HWE for quantitative traits.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Due to the rapid development of genotyping and sequencing technologies, we are now able to more accurately assess causal effects of many genetic and environmental factors. Genome-wide association studies have been able to localize many causal genetic variants predisposing to certain diseases. However, these studies only explain a small portion of variations in the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize some additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to the increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in the genetics/genomics researches and demonstrating superiority over some regular approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods may fully exert its functionalities and advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) Deriving the Bayesian variable selection framework for the hierarchical gene-environment and gene-gene interactions; (2) Developing the Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods which were developed for gene-environment interaction studies, to other related types of studies such as adaptive borrowing historical data. We propose a Bayesian hierarchical mixture model framework that allows us to investigate the genetic and environmental effects, gene by gene interactions (epistasis) and gene by environment interactions in the same model. It is well known that, in many practical situations, there exists a natural hierarchical structure between the main effects and interactions in the linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, such that the irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious and powerful models. We evaluate both of the 'strong hierarchical' and 'weak hierarchical' models, which specify that both or one of the main effects between interacting factors must be present for the interactions to be included in the model. The extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach to identify the predisposing main effects and interactions in the studies with complex gene-environment and gene-gene interactions. We also compare these two models with the 'independent' model that does not impose this hierarchical constraint and observe their superior performances in most of the considered situations. The proposed models are implemented in the real data analysis of gene and environment interactions in the cases of lung cancer and cutaneous melanoma case-control studies. The Bayesian statistical models enjoy the properties of being allowed to incorporate useful prior information in the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in terms of the performances on the parameter estimation and variable selection in most cases. Our proposed models hold the hierarchical constraints, that further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions and successfully identifying the reported associations. This is practically appealing for the study of investigating the causal factors from a moderate number of candidate genetic and environmental factors along with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects have previously been developed to provide an analysis framework, by which the estimates of effects for a quantitative trait are statistically orthogonal regardless of the existence of Hardy-Weinberg Equilibrium (HWE) within loci. Ma et al. (2012) recently developed a NOIA model for the gene-environment interaction studies and have shown the advantages of using the model for detecting the true main effects and interactions, compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power at detecting the non-null effects with higher marginal posterior probabilities. Also, we review two Bayesian statistical models (Bayesian empirical shrinkage-type estimator and Bayesian model averaging), which were developed for the gene-environment interaction studies. Inspired by these Bayesian models, we develop two novel statistical methods that are able to handle the related problems such as borrowing data from historical studies. The proposed methods are analogous to the methods for the gene-environment interactions on behalf of the success on balancing the statistical efficiency and bias in a unified model. By extensive simulation studies, we compare the operating characteristics of the proposed models with the existing models including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow the historical data in a data-driven way. These novel models may have a broad range of statistical applications in both of genetic/genomic and clinical studies.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

2000 Mathematics Subject Classification: 62P10, 92C40

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Uncertainty quantification (UQ) is both an old and new concept. The current novelty lies in the interactions and synthesis of mathematical models, computer experiments, statistics, field/real experiments, and probability theory, with a particular emphasize on the large-scale simulations by computer models. The challenges not only come from the complication of scientific questions, but also from the size of the information. It is the focus in this thesis to provide statistical models that are scalable to massive data produced in computer experiments and real experiments, through fast and robust statistical inference.

Chapter 2 provides a practical approach for simultaneously emulating/approximating massive number of functions, with the application on hazard quantification of Soufri\`{e}re Hills volcano in Montserrate island. Chapter 3 discusses another problem with massive data, in which the number of observations of a function is large. An exact algorithm that is linear in time is developed for the problem of interpolation of Methylation levels. Chapter 4 and Chapter 5 are both about the robust inference of the models. Chapter 4 provides a new criteria robustness parameter estimation criteria and several ways of inference have been shown to satisfy such criteria. Chapter 5 develops a new prior that satisfies some more criteria and is thus proposed to use in practice.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Understanding how aquatic species grow is fundamental in fisheries because stock assessment often relies on growth dependent statistical models. Length-frequency-based methods become important when more applicable data for growth model estimation are either not available or very expensive. In this article, we develop a new framework for growth estimation from length-frequency data using a generalized von Bertalanffy growth model (VBGM) framework that allows for time-dependent covariates to be incorporated. A finite mixture of normal distributions is used to model the length-frequency cohorts of each month with the means constrained to follow a VBGM. The variances of the finite mixture components are constrained to be a function of mean length, reducing the number of parameters and allowing for an estimate of the variance at any length. To optimize the likelihood, we use a minorization–maximization (MM) algorithm with a Nelder–Mead sub-step. This work was motivated by the decline in catches of the blue swimmer crab (BSC) (Portunus armatus) off the east coast of Queensland, Australia. We test the method with a simulation study and then apply it to the BSC fishery data.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Excess nutrient loads carried by streams and rivers are a great concern for environmental resource managers. In agricultural regions, excess loads are transported downstream to receiving water bodies, potentially causing algal blooms, which could lead to numerous ecological problems. To better understand nutrient load transport, and to develop appropriate water management plans, it is important to have accurate estimates of annual nutrient loads. This study used a Monte Carlo sub-sampling method and error-corrected statistical models to estimate annual nitrate-N loads from two watersheds in central Illinois. The performance of three load estimation methods (the seven-parameter log-linear model, the ratio estimator, and the flow-weighted averaging estimator) applied at one-, two-, four-, six-, and eight-week sampling frequencies were compared. Five error correction techniques; the existing composite method, and four new error correction techniques developed in this study; were applied to each combination of sampling frequency and load estimation method. On average, the most accurate error reduction technique, (proportional rectangular) resulted in 15% and 30% more accurate load estimates when compared to the most accurate uncorrected load estimation method (ratio estimator) for the two watersheds. Using error correction methods, it is possible to design more cost-effective monitoring plans by achieving the same load estimation accuracy with fewer observations. Finally, the optimum combinations of monitoring threshold and sampling frequency that minimizes the number of samples required to achieve specified levels of accuracy in load estimation were determined. For one- to three-weeks sampling frequencies, combined threshold/fixed-interval monitoring approaches produced the best outcomes, while fixed-interval-only approaches produced the most accurate results for four- to eight-weeks sampling frequencies.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The objective of this study was to gain an understanding of the effects of population heterogeneity, missing data, and causal relationships on parameter estimates from statistical models when analyzing change in medication use. From a public health perspective, two timely topics were addressed: the use and effects of statins in populations in primary prevention of cardiovascular disease and polypharmacy in older population. Growth mixture models were applied to characterize the accumulation of cardiovascular and diabetes medications among apparently healthy population of statin initiators. The causal effect of statin adherence on the incidence of acute cardiovascular events was estimated using marginal structural models in comparison with discrete-time hazards models. The impact of missing data on the growth estimates of evolution of polypharmacy was examined comparing statistical models under different assumptions for missing data mechanism. The data came from Finnish administrative registers and from the population-based Geriatric Multidisciplinary Strategy for the Good Care of the Elderly study conducted in Kuopio, Finland, during 2004–07. Five distinct patterns of accumulating medications emerged among the population of apparently healthy statin initiators during two years after statin initiation. Proper accounting for time-varying dependencies between adherence to statins and confounders using marginal structural models produced comparable estimation results with those from a discrete-time hazards model. Missing data mechanism was shown to be a key component when estimating the evolution of polypharmacy among older persons. In conclusion, population heterogeneity, missing data and causal relationships are important aspects in longitudinal studies that associate with the study question and should be critically assessed when performing statistical analyses. Analyses should be supplemented with sensitivity analyses towards model assumptions.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Aiming to obtain empirical models for the estimation of Syrah leaf area a set of 210 fruiting shoots was randomly collected during the 2013 growing season in an adult experimental vineyard, located in Lisbon, Portugal. Samples of 30 fruiting shoots were taken periodically from the stage of inflorescences visible to veraison (7 sampling dates). At the lab, from each shoot, primary and lateral leaves were separated and numbered according to node insertion. For each leaf, the length of the central and lateral veins was recorded and then the leaf area was measured by a leaf area meter. For single leaf area estimation the best statistical models uses as explanatory variable the sum of the lengths of the two lateral leaf veins. For the estimation of leaf area per shoot it was followed the approach of Lopes & Pinto (2005), based on 3 explanatory variables: number of primary leaves and area of the largest and smallest leaves. The best statistical model for estimation of primary leaf area per shoot uses a calculated variable obtained from the average of the largest and smallest primary leaf area multiplied by the number of primary leaves. For lateral leaf area estimation another model using the same type of calculated variable is also presented. All models explain a very high proportion of variability in leaf area. Our results confirm the already reported strong importance of the three measured variables (number of leaves and area of the largest and smallest leaf) as predictors of the shoot leaf area. The proposed models can be used to accurately predict Syrah primary and secondary leaf area per shoot in any phase of the growing cycle. They are inexpensive, practical, non-destructive methods which do not require specialized staff or expensive equipment.