710 results for Variance Models


Relevance: 60.00%

Abstract:

We investigate methods for data-based selection of working covariance models in the analysis of correlated data with generalized estimating equations. We study two selection criteria: Gaussian pseudolikelihood and a geodesic distance based on discrepancy between model-sensitive and model-robust regression parameter covariance estimators. The Gaussian pseudolikelihood is found in simulation to be reasonably sensitive for several response distributions and noncanonical mean-variance relations for longitudinal data. Application is also made to a clinical dataset. Assessment of adequacy of both correlation and variance models for longitudinal data should be routine in applications, and we describe open-source software supporting this practice.
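The abstract does not reproduce either criterion. As a rough illustration only, here is a minimal Python sketch of the two quantities under their standard definitions: the Gaussian pseudolikelihood of a working covariance model, and the affine-invariant geodesic distance between the model-sensitive and model-robust covariance estimators (all function names are ours).

```python
import numpy as np

def gaussian_pseudolikelihood(residuals, covariances):
    """Gaussian pseudolikelihood of a working covariance model:
    sum over clusters of -0.5 * (log det V_i + r_i' V_i^{-1} r_i).
    Larger values favour the working model.

    residuals   : list of per-cluster residual vectors (y_i - mu_i)
    covariances : list of per-cluster working covariance matrices V_i
    """
    ll = 0.0
    for r, V in zip(residuals, covariances):
        _, logdet = np.linalg.slogdet(V)
        ll -= 0.5 * (logdet + r @ np.linalg.solve(V, r))
    return ll

def covariance_geodesic_distance(C_model, C_robust):
    """Geodesic distance between two SPD covariance estimators of the
    regression parameters: the 2-norm of the log-eigenvalues of
    C_model^{-1} C_robust. Small values suggest the working model
    is close to adequate."""
    lam = np.linalg.eigvals(np.linalg.solve(C_model, C_robust))
    return float(np.sqrt(np.sum(np.log(lam.real) ** 2)))
```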

Relevance: 30.00%

Abstract:

Understanding the complexities involved in the genetics of multifactorial diseases is still a monumental task. In addition to environmental factors that can influence the risk of disease, there are a number of other complicating factors: genetic variants associated with age of disease onset may differ from those associated with overall risk of disease, and variants may be located in positions that are not consistent with the traditional protein-coding genetic paradigm. Latent variable models are well suited to the analysis of genetic data. A latent variable is one that we do not directly observe, but which is believed to exist or is included for computational or analytic convenience in a model. This thesis presents a mixture of methodological developments utilising latent variables, together with results from case studies in genetic epidemiology and comparative genomics.

Epidemiological studies have identified a number of environmental risk factors for appendicitis, but the disease aetiology of this organ, oft thought a useless vestige, remains largely a mystery. The effects of smoking on other gastrointestinal disorders are well documented, and in light of this the thesis investigates the association between smoking and appendicitis through the use of latent variables. By utilising data from a large Australian twin study questionnaire as both cohort and case-control, evidence is found for an association between tobacco smoking and appendicitis. Twin and family studies have also found evidence for a role of heredity in the risk of appendicitis. Results from previous studies are extended here to estimate the heritability of age at onset and to account for the effect of smoking. The thesis presents a novel approach for performing a genome-wide variance components linkage analysis on transformed residuals from a Cox regression. This method finds evidence for a different subset of genes responsible for variation in age at onset than those associated with overall risk of appendicitis.

Motivated by increasing evidence of functional activity in regions of the genome once thought of as evolutionary graveyards, the thesis develops a generalisation of the Bayesian multiple changepoint model on aligned DNA sequences to more than two species. This sensitive technique is applied to evaluating the distributions of evolutionary rates, with the finding that they are much more complex than previously apparent. We show strong evidence for at least 9 well-resolved evolutionary rate classes in an alignment of four Drosophila species and at least 7 classes in an alignment of four mammals, including human. A pattern of enrichment and depletion of genic regions in the profiled segments suggests they are functionally significant and most likely consist of various functional classes. Furthermore, a method of incorporating alignment characteristics representative of function, such as GC content and type of mutation, into the segmentation model is developed within the thesis. Evidence of fine-structured segmental variation is presented.
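The thesis's exact residual transformation is not given in the abstract; a common choice in this setting is the deviance transform of Cox martingale residuals (Therneau et al.), sketched below under that assumption.

```python
import numpy as np

def deviance_residuals(martingale, event):
    """Transform Cox martingale residuals m_i into approximately
    normal deviance residuals, a standard preprocessing step before
    feeding censored age-at-onset data into a variance-components
    analysis. d_i = sign(m_i) * sqrt(-2 * [m_i + delta_i * log(delta_i - m_i)]).

    martingale : martingale residuals from a fitted Cox model
    event      : event indicators (1 = onset observed, 0 = censored)
    """
    m = np.asarray(martingale, dtype=float)
    d = np.asarray(event, dtype=float)
    # delta * log(delta - m) is taken as 0 for censored observations
    term = np.where(d > 0, d * np.log(np.where(d > 0, d - m, 1.0)), 0.0)
    return np.sign(m) * np.sqrt(-2.0 * (m + term))
```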

Relevance: 30.00%

Abstract:

This thesis addresses computational challenges arising from Bayesian analysis of complex real-world problems. Many of the models and algorithms designed for such analysis are 'hybrid' in nature, in that they are compositions of components whose individual properties may be easily described, while the performance of the model or algorithm as a whole is less well understood. The aim of this research is to offer a better understanding of the performance of hybrid models and algorithms; the goal of the thesis is to analyse their computational aspects in the Bayesian context.

The first objective focuses on computational aspects of hybrid models, notably a continuous finite mixture of t-distributions. In the mixture model, an inference of interest is the number of components, as this may relate both to the quality of model fit to data and to the computational workload. The analysis of t-mixtures using Markov chain Monte Carlo (MCMC) is described and the model is compared to the normal case on goodness of fit. Simulation studies demonstrate that the t-mixture model can be more flexible and more parsimonious in the number of components, particularly for skewed and heavy-tailed data. The study also reveals important computational issues associated with the use of t-mixtures which have not been adequately considered in the literature.

The second objective focuses on computational aspects of hybrid algorithms for Bayesian analysis. Two approaches are considered: a formal comparison of the performance of a range of hybrid algorithms, and a theoretical investigation of the performance of one of these algorithms in high dimensions. For the first approach, the delayed rejection algorithm, the pinball sampler, the Metropolis-adjusted Langevin algorithm, and the hybrid version of the population Monte Carlo (PMC) algorithm are selected as examples of hybrid algorithms. In the statistical literature, statistical efficiency is often the only criterion used to judge an algorithm; in this thesis the algorithms are also considered and compared from a more practical perspective. This extends to the study of how the individual component algorithms contribute to the overall efficiency of a hybrid algorithm, and highlights weaknesses that may be introduced when these components are combined into a single algorithm.

The second approach involves an investigation of the performance of PMC in high dimensions. It is well known that as a model becomes more complex, computation may become increasingly difficult in real time; in particular, importance-sampling-based algorithms, including PMC, are known to be unstable in high dimensions. The thesis examines the PMC algorithm in a simplified setting, a single step of the general sampler, and explores a fundamental problem that occurs in applying importance sampling to a high-dimensional problem. The precision of the computed estimate in this setting is measured by the asymptotic variance of the estimate under conditions on the importance function. The exponential growth of the asymptotic variance with dimension is demonstrated, and it is shown that the optimal covariance matrix for the importance function can be estimated in a special case.
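As a standalone illustration of the high-dimensional instability discussed in the final paragraph (not the thesis's own experiment), the following sketch tracks the effective sample size of a simple importance sampler as the dimension grows; with an N(0, sigma^2 I) proposal for a standard normal target, the weight variance grows exponentially in d:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
sigma = 1.5   # proposal scale; any sigma != 1 triggers the degeneracy

for d in (1, 5, 10, 20, 40):
    x = sigma * rng.standard_normal((n, d))
    # log weight = log p(x) - log q(x) for p = N(0, I), q = N(0, sigma^2 I)
    log_w = (-0.5 * np.sum(x**2, axis=1)
             + 0.5 * np.sum((x / sigma) ** 2, axis=1)
             + d * np.log(sigma))
    w = np.exp(log_w - log_w.max())        # stabilise before normalising
    ess = w.sum() ** 2 / np.sum(w**2)      # effective sample size
    print(f"d = {d:3d}   ESS/n = {ess / n:.4f}")
```

The ESS fraction collapses roughly exponentially in d, which is the same phenomenon the thesis quantifies through the asymptotic variance of the estimate.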

Relevance: 30.00%

Abstract:

Statistical modeling of traffic crashes has been of interest to researchers for decades. Over the most recent decade, many crash models have accounted for extra-variation in crash counts: variation over and above that accounted for by the Poisson density. This extra-variation, or dispersion, is theorized to capture unaccounted-for variation in crashes across sites. The majority of studies have assumed fixed dispersion parameters in over-dispersed crash models, tantamount to assuming that unaccounted-for variation is proportional to the expected crash count. Miaou and Lord [Miaou, S.P., Lord, D., 2003. Modeling traffic crash-flow relationships for intersections: dispersion parameter, functional form, and Bayes versus empirical Bayes methods. Transport. Res. Rec. 1840, 31-40] challenged the fixed dispersion parameter assumption and examined various dispersion parameter relationships when modeling urban signalized intersection accidents in Toronto. They suggested that further work was needed to determine whether their findings hold for rural and other intersection types, to corroborate the findings, and to explore alternative dispersion functions.

This study builds upon the work of Miaou and Lord by exploring additional dispersion functions and using an independent data set, providing an opportunity to corroborate their findings. Data from Georgia are used. A Bayesian modeling approach with non-informative priors is adopted, using sampling-based estimation via Markov chain Monte Carlo (MCMC) with the Gibbs sampler. A total of eight model specifications were developed: four employed traffic flows as explanatory factors in the mean structure, while the remainder included geometric factors in addition to major- and minor-road traffic flows. The models were compared and contrasted using the significance of coefficients, standard deviance, chi-square goodness-of-fit, and deviance information criterion (DIC) statistics.

The findings indicate that how the dispersion parameter, which essentially explains the extra-variance structure, should be modeled depends greatly on how the mean structure is modeled. In the presence of a well-defined mean function, the extra-variance structure generally becomes insignificant, i.e. the variance structure is a simple function of the mean. Extra-variation appears to be a function of covariates when the mean structure (expected crash count) is poorly specified and suffers from omitted variables; in contrast, when sufficient explanatory variables are used to model the mean, extra-Poisson variation is not significantly related to these variables. If these results are generalizable, they suggest that model specification may be improved by testing extra-variation functions for significance. They also suggest that known influences on expected crash counts are likely to differ from the factors that might help to explain unaccounted-for variation in crashes across sites.
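The paper's Gibbs-sampling implementation is not reproduced here; a minimal modern stand-in for its key idea (letting the negative binomial dispersion depend on covariates, rather than fixing it) might look as follows in PyMC, with synthetic data and variable names of our own choosing:

```python
import numpy as np
import pymc as pm

# Synthetic stand-in data: crash counts at 50 sites with one flow
# covariate; the Georgia intersection data are not reproduced here.
rng = np.random.default_rng(4)
X = np.column_stack([np.ones(50), rng.normal(size=50)])   # mean structure
Z = X.copy()                                              # dispersion structure
y = rng.poisson(np.exp(1.0 + 0.5 * X[:, 1]))

with pm.Model():
    beta = pm.Normal("beta", 0.0, 10.0, shape=X.shape[1])    # mean coefficients
    gamma = pm.Normal("gamma", 0.0, 10.0, shape=Z.shape[1])  # dispersion coefficients
    mu = pm.math.exp(pm.math.dot(X, beta))                   # expected crash count
    alpha = pm.math.exp(pm.math.dot(Z, gamma))               # covariate-dependent dispersion
    pm.NegativeBinomial("crashes", mu=mu, alpha=alpha, observed=y)
    idata = pm.sample()   # MCMC draws, in the spirit of the paper's Gibbs sampling
```

A constant-dispersion model corresponds to dropping gamma and giving alpha a single scalar prior; comparing the two fits by DIC mirrors the paper's comparison strategy.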

Relevance: 30.00%

Abstract:

Over the last three years, in our Early Algebra Thinking Project, we have been studying Years 3 to 5 students' ability to generalise in a variety of situations, namely, compensation principles in computation, the balance principle in equivalence and equations, change and inverse-change rules with function machines, and pattern rules with growing patterns. In these studies, we have attempted to involve a variety of models and representations and to build students' abilities to switch between them (in line with the theories of Dreyfus, 1991, and Duval, 1999). The results have shown the negative effect of closure on generalisation in symbolic representations, the predominance of single-variance generalisation over covariant generalisation in tabular representations, and a reduced ability to readily identify commonalities and relationships in enactive and iconic representations. This chapter uses the results to explore the interrelation between generalisation and verbal and visual comprehension of context. The studies evidence the importance of understanding and communicating aspects of representational forms that allow commonalities to be seen across or between representations. Finally, the chapter explores the implications of the studies for a theory that describes a growth in the integration of models and representations that leads to generalisation.

Relevance: 30.00%

Abstract:

Recent efforts in mission planning for underwater vehicles have utilised predictive models to aid navigation and optimal path planning and to drive opportunistic sampling. Although these models provide information at unprecedented resolutions and have proven to increase accuracy and effectiveness in multiple campaigns, most are deterministic in nature. Thus, predictions cannot be incorporated into probabilistic planning frameworks, nor do they provide any metric on the variance or confidence of the output variables. In this paper, we provide an initial investigation into determining the confidence of ocean model predictions based on the results of multiple field deployments of two autonomous underwater vehicles. For multiple missions conducted over a two-month period in 2011, we compare actual vehicle executions to simulations of the same missions through the Regional Ocean Modeling System in an ocean region off the coast of southern California. This comparison provides a qualitative analysis of the current velocity predictions for areas within the selected deployment region. Ultimately, we present a spatial heat-map of the correlation between the ocean model predictions and the actual mission executions. Knowing where the model provides unreliable predictions can be incorporated into planners to increase the utility and application of the deterministic estimations.
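As an illustrative sketch only (the paper's processing pipeline is not described at this level of detail), a spatial heat-map of model-versus-vehicle correlation could be assembled by binning matched current estimates onto a lat/lon grid:

```python
import numpy as np

def correlation_heatmap(lat, lon, v_pred, v_obs, n_bins=20):
    """Bin co-located model predictions and vehicle-derived current
    estimates onto a lat/lon grid and compute the correlation in each
    cell. Inputs are flat arrays of matched samples; all names are
    illustrative, not from the paper."""
    lat_edges = np.linspace(lat.min(), lat.max(), n_bins + 1)
    lon_edges = np.linspace(lon.min(), lon.max(), n_bins + 1)
    heat = np.full((n_bins, n_bins), np.nan)
    i = np.clip(np.digitize(lat, lat_edges) - 1, 0, n_bins - 1)
    j = np.clip(np.digitize(lon, lon_edges) - 1, 0, n_bins - 1)
    for a in range(n_bins):
        for b in range(n_bins):
            mask = (i == a) & (j == b)
            if mask.sum() >= 3:   # need a few samples per cell
                heat[a, b] = np.corrcoef(v_pred[mask], v_obs[mask])[0, 1]
    return heat, lat_edges, lon_edges
```

Cells with low or NaN correlation are exactly the "unreliable prediction" regions a planner would down-weight.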

Relevance: 30.00%

Abstract:

Animal models typically require a known genetic pedigree to estimate quantitative genetic parameters. Here we test whether animal models can instead be based on estimates of relatedness derived entirely from molecular marker data. Our case study is the morphology of a wild bird population, for which we report estimates of the genetic variance-covariance matrices (G) of six morphological traits using three methods: the traditional animal model; a molecular marker-based approach to estimating heritability based on Ritland's pairwise regression method; and a new approach using a molecular genealogy arranged in a relatedness matrix (R) to replace the pedigree in an animal model. Using the traditional animal model, we found significant genetic variance for all six traits and positive genetic covariance among traits. The pairwise regression method did not return reliable estimates of quantitative genetic parameters in this population, with estimates of genetic variance and covariance typically being very small or negative. In contrast, we found mixed evidence for the use of the pedigree-free animal model. Like the pairwise regression method, the pedigree-free approach performed poorly when the full-rank R matrix based on the molecular genealogy was employed. However, performance improved substantially when we reduced the dimensionality of the R matrix in order to maximize the signal-to-noise ratio. Reduced-rank R matrices generated estimates of genetic variance much closer to those from the traditional model. Nevertheless, this method was less reliable at estimating covariances, which were often estimated to be negative. Taken together, these results suggest that pedigree-free animal models can recover quantitative genetic information, although the signal remains relatively weak. It remains to be determined whether this problem can be overcome by the use of a more powerful battery of molecular markers and improved methods for reconstructing genealogies.
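The reduced-rank step can be illustrated with a simple eigendecomposition: keep the leading eigenpairs of the marker-based R, which carry most of the genealogical signal, and discard the noisy trailing dimensions. The paper's rule for choosing the rank k is not reproduced here.

```python
import numpy as np

def reduce_rank(R, k):
    """Return a rank-k approximation of a (symmetric) marker-based
    relatedness matrix R, keeping the top-k eigenpairs and discarding
    low-variance dimensions that mostly carry estimation noise."""
    eigvals, eigvecs = np.linalg.eigh(R)        # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:k]         # indices of the top-k eigenvalues
    return (eigvecs[:, idx] * eigvals[idx]) @ eigvecs[:, idx].T
```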

Relevance: 30.00%

Abstract:

In a recent paper, Gordon, Muratov, and Shvartsman studied a partial differential equation (PDE) model describing radially symmetric diffusion and degradation in two and three dimensions. They paid particular attention to the local accumulation time (LAT), also known in the literature as the mean action time, which is a spatially dependent timescale that can be used to estimate the time required for the transient solution to effectively reach steady state. They presented exact results for three-dimensional applications and approximate results for the two-dimensional analogue. Here we make two generalizations of Gordon, Muratov, and Shvartsman's work: (i) we present an exact expression for the LAT in any dimension, and (ii) we present an exact expression for the variance of the distribution. The variance provides useful information regarding the spread about the mean that is not captured by the LAT. We conclude by describing further extensions of the model that were not considered by Gordon, Muratov, and Shvartsman. We have found that exact expressions for the LAT can also be derived for these important extensions...
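For reference, in the mean-action-time literature these quantities are usually defined as the first two central moments of a transition-time density built from the transient solution C(x,t) and its steady state C_s(x); the abstract itself does not restate them:

```latex
% F(x,t) rises from 0 to 1 as the transient solution approaches steady state:
F(x,t) = 1 - \frac{C(x,t) - C_s(x)}{C(x,0) - C_s(x)},
\qquad
f(x,t) = \frac{\partial F(x,t)}{\partial t}.

% The LAT and the variance studied here are the first two central
% moments of f(x,t):
T(x) = \int_0^\infty t \, f(x,t) \, dt,
\qquad
V(x) = \int_0^\infty \bigl( t - T(x) \bigr)^2 f(x,t) \, dt.
```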

Relevance: 30.00%

Abstract:

A modeling paradigm is proposed for covariate, variance, and working correlation structure selection in longitudinal data analysis. Appropriate selection of covariates is a prerequisite for correct variance modeling, and selecting the appropriate covariates and variance function is in turn vital to correlation structure selection. This leads to a stepwise model selection procedure that deploys a combination of different model selection criteria. Although these criteria share a common theoretical root, approximating the Kullback-Leibler distance, they are designed to address different aspects of model selection and have different merits and limitations. For example, the extended quasi-likelihood information criterion (EQIC) with a covariance penalty performs well for covariate selection even when the working variance function is misspecified, but EQIC contains little information on correlation structures. The proposed model selection strategies are outlined and a Monte Carlo assessment of their finite-sample properties is reported. Two longitudinal studies are used for illustration.
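A skeleton of the stepwise procedure, with the criterion functions left abstract (the paper's exact criteria beyond EQIC are not reproduced here; all names are ours):

```python
def stepwise_select(data, covariate_sets, variance_fns, corr_structs,
                    eqic, corr_criterion):
    """Sketch of the stepwise strategy: covariates first, then the
    variance function, then the working correlation structure."""
    # Step 1: covariates by EQIC under a default working variance;
    # the abstract notes EQIC is robust to variance misspecification.
    best_cov = min(covariate_sets,
                   key=lambda c: eqic(data, c, variance_fns[0]))
    # Step 2: variance function, holding the chosen covariates fixed.
    best_var = min(variance_fns,
                   key=lambda v: eqic(data, best_cov, v))
    # Step 3: correlation structure, via a correlation-sensitive
    # criterion (EQIC "contains little information" on correlation).
    best_corr = min(corr_structs,
                    key=lambda r: corr_criterion(data, best_cov, best_var, r))
    return best_cov, best_var, best_corr
```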

Relevance: 30.00%

Abstract:

The approach of generalized estimating equations (GEE) is based on the framework of generalized linear models but allows for the specification of a working correlation matrix for modeling within-subject correlations. The variance is often assumed to be a known function of the mean. This article investigates the impact of misspecifying the variance function on estimators of the mean parameters for quantitative responses. Our numerical studies indicate that (1) correct specification of the variance function can improve estimation efficiency even if the correlation structure is misspecified; (2) misspecification of the variance function affects estimators for within-cluster covariates much more than those for cluster-level covariates; and (3) if the variance function is misspecified, the correct choice of correlation structure may not necessarily improve estimation efficiency. We illustrate the impact of different variance functions using a real data set on cow growth.
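A hedged sketch of this kind of comparison using statsmodels GEE on synthetic repeated-measures data. The cow growth data and the paper's variance functions are stood in for by a Gaussian versus a Gamma family, both with log links so that only the assumed variance function differs:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Synthetic stand-in for the cow growth data: repeated weights per cow,
# with error variance that grows with the mean.
rng = np.random.default_rng(1)
n_cows, n_times = 30, 6
age = np.tile(np.arange(n_times), n_cows)
cow = np.repeat(np.arange(n_cows), n_times)
mu = 50.0 + 8.0 * age
weight = mu * np.exp(0.05 * rng.standard_normal(mu.size))
df = pd.DataFrame({"weight": weight, "age": age, "cow_id": cow})

# Same mean model under two working variance functions:
# Gaussian -> constant variance; Gamma -> variance proportional to mu^2.
for family in (sm.families.Gaussian(link=sm.families.links.Log()),
               sm.families.Gamma(link=sm.families.links.Log())):
    fit = smf.gee("weight ~ age", groups="cow_id", data=df,
                  family=family,
                  cov_struct=sm.cov_struct.Exchangeable()).fit()
    print(type(family).__name__, "SE(age) =", float(fit.bse["age"]))
```

Comparing the reported standard errors across families is a crude proxy for the efficiency comparisons the article carries out.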

Relevance: 30.00%

Abstract:

Statistical methods are often used to analyse commercial catch and effort data to provide standardised fishing effort and/or a relative index of fish abundance for input into stock assessment models. Achieving reliable results has proved difficult in Australia's Northern Prawn Fishery (NPF), due to a combination of factors such as the biological characteristics of the animals, aspects of the fleet dynamics, and changes in fishing technology. For this set of data, we compared four modelling approaches (linear models, mixed models, generalised estimating equations, and generalised linear models) with respect to the outcomes of the standardised fishing effort or the relative index of abundance. We also varied the number and form of vessel covariates in the models. Within a subset of data from this fishery, modelling correlation structures did not alter the conclusions drawn from simpler statistical models, and the random-effects models yielded similar results. This is because the estimators are all consistent even if the correlation structure is misspecified, and the data set is very large. However, the standard errors from the different models differed, suggesting that the methods differ in statistical efficiency. We suggest that there is value in modelling the variance function and the correlation structure, to make valid and efficient statistical inferences and to gain insight into the data. We found that fishing power was separable from the indices of prawn abundance only when we offset the impact of vessel characteristics at values assumed from external sources. This may be due to the large degree of confounding within the data, and the extreme temporal changes in certain aspects of individual vessels, the fleet and the fleet dynamics.
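The distinctive device here, fixing fishing-power effects at externally assumed values, corresponds to moving those terms into a GLM offset so that only the abundance terms are estimated. A minimal sketch with synthetic data (all names and values are ours, not from the NPF analysis):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
year = rng.integers(0, 5, n)                  # 5 fishing seasons
X_year = np.eye(5)[year]                      # year indicators -> abundance index
vessel_power = rng.normal(0.0, 0.3, n)        # externally assumed log fishing power
effort = rng.uniform(1.0, 10.0, n)            # days fished
mu = np.exp(np.log(effort) + vessel_power
            + X_year @ np.array([2.0, 2.2, 1.9, 2.4, 2.1]))
catch = rng.poisson(mu)

# Fix vessel effects via the offset so only the year (abundance) terms
# are estimated, rather than trying to separate them from abundance.
fit = sm.GLM(catch, X_year, family=sm.families.Poisson(),
             offset=np.log(effort) + vessel_power).fit()
print(np.round(fit.params, 2))   # relative abundance indices by year
```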

Relevance: 30.00%

Abstract:

In this paper we consider the third-moment structure of a class of time series models. It is often argued that the marginal distribution of financial time series, such as returns, is skewed. It is therefore important to know what properties a model should possess if it is to accommodate unconditional skewness. We consider modeling the unconditional mean and variance using models that respond nonlinearly or asymmetrically to shocks, and investigate the implications of these models for the third-moment structure of the marginal distribution, as well as the conditions under which the unconditional distribution exhibits skewness and a nonzero third-order autocovariance structure. In this respect, an asymmetric or nonlinear specification of the conditional mean is found to be of greater importance than the properties of the conditional variance. Several examples are discussed and, whenever possible, explicit analytical expressions are provided for all third-order moments and cross-moments. Finally, we introduce a new tool, the shock impact curve, for investigating the impact of shocks on the conditional mean squared error of return series.
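A toy simulation (ours, not one of the paper's examples) showing how an asymmetric conditional-mean response to shocks alone generates unconditional skewness and a nonzero third-order autocovariance:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
eps = rng.standard_normal(n)

# Conditional mean responds asymmetrically to the previous shock:
# positive and negative shocks enter with different weights, while the
# conditional variance is held constant.
y = np.empty(n)
y[0] = eps[0]
for t in range(1, n):
    shock = eps[t - 1]
    y[t] = 0.3 * max(shock, 0.0) - 0.1 * min(shock, 0.0) + eps[t]

yc = y - y.mean()
skew = np.mean(yc**3) / np.mean(yc**2) ** 1.5
# Third-order autocovariance E[y_t * y_{t-1}^2], one of the cross-
# moments for which the paper derives analytic expressions.
tac = np.mean(yc[1:] * yc[:-1] ** 2)
print(f"unconditional skewness ~ {skew:.3f}, E[y_t y_(t-1)^2] ~ {tac:.3f}")
```

Both statistics come out clearly nonzero even though the shocks themselves are symmetric, matching the paper's point that the conditional mean specification, not the conditional variance, drives unconditional skewness.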