989 resultados para Statistical Theory
Resumo:
To account for the preponderance of zero counts and simultaneous correlation of observations, a class of zero-inflated Poisson mixed regression models is applicable for accommodating the within-cluster dependence. In this paper, a score test for zero-inflation is developed for assessing correlated count data with excess zeros. The sampling distribution and the power of the test statistic are evaluated by simulation studies. The results show that the test statistic performs satisfactorily under a wide range of conditions. The test procedure is further illustrated using a data set on recurrent urinary tract infections. Copyright (c) 2005 John Wiley & Sons, Ltd.
Resumo:
Standard factorial designs sometimes may be inadequate for experiments that aim to estimate a generalized linear model, for example, for describing a binary response in terms of several variables. A method is proposed for finding exact designs for such experiments that uses a criterion allowing for uncertainty in the link function, the linear predictor, or the model parameters, together with a design search. Designs are assessed and compared by simulation of the distribution of efficiencies relative to locally optimal designs over a space of possible models. Exact designs are investigated for two applications, and their advantages over factorial and central composite designs are demonstrated.
Resumo:
The estimation of P(S-n > u) by simulation, where S, is the sum of independent. identically distributed random varibles Y-1,..., Y-n, is of importance in many applications. We propose two simulation estimators based upon the identity P(S-n > u) = nP(S, > u, M-n = Y-n), where M-n = max(Y-1,..., Y-n). One estimator uses importance sampling (for Y-n only), and the other uses conditional Monte Carlo conditioning upon Y1,..., Yn-1. Properties of the relative error of the estimators are derived and a numerical study given in terms of the M/G/1 queue in which n is replaced by an independent geometric random variable N. The conclusion is that the new estimators compare extremely favorably with previous ones. In particular, the conditional Monte Carlo estimator is the first heavy-tailed example of an estimator with bounded relative error. Further improvements are obtained in the random-N case, by incorporating control variates and stratification techniques into the new estimation procedures.
Resumo:
Count data with excess zeros relative to a Poisson distribution are common in many biomedical applications. A popular approach to the analysis of such data is to use a zero-inflated Poisson (ZIP) regression model. Often, because of the hierarchical Study design or the data collection procedure, zero-inflation and lack of independence may occur simultaneously, which tender the standard ZIP model inadequate. To account for the preponderance of zero counts and the inherent correlation of observations, a class of multi-level ZIP regression model with random effects is presented. Model fitting is facilitated using an expectation-maximization algorithm, whereas variance components are estimated via residual maximum likelihood estimating equations. A score test for zero-inflation is also presented. The multi-level ZIP model is then generalized to cope with a more complex correlation structure. Application to the analysis of correlated count data from a longitudinal infant feeding study illustrates the usefulness of the approach.
Resumo:
Markov chain Monte Carlo (MCMC) is a methodology that is gaining widespread use in the phylogenetics community and is central to phylogenetic software packages such as MrBayes. An important issue for users of MCMC methods is how to select appropriate values for adjustable parameters such as the length of the Markov chain or chains, the sampling density, the proposal mechanism, and, if Metropolis-coupled MCMC is being used, the number of heated chains and their temperatures. Although some parameter settings have been examined in detail in the literature, others are frequently chosen with more regard to computational time or personal experience with other data sets. Such choices may lead to inadequate sampling of tree space or an inefficient use of computational resources. We performed a detailed study of convergence and mixing for 70 randomly selected, putatively orthologous protein sets with different sizes and taxonomic compositions. Replicated runs from multiple random starting points permit a more rigorous assessment of convergence, and we developed two novel statistics, delta and epsilon, for this purpose. Although likelihood values invariably stabilized quickly, adequate sampling of the posterior distribution of tree topologies took considerably longer. Our results suggest that multimodality is common for data sets with 30 or more taxa and that this results in slow convergence and mixing. However, we also found that the pragmatic approach of combining data from several short, replicated runs into a metachain to estimate bipartition posterior probabilities provided good approximations, and that such estimates were no worse in approximating a reference posterior distribution than those obtained using a single long run of the same length as the metachain. Precision appears to be best when heated Markov chains have low temperatures, whereas chains with high temperatures appear to sample trees with high posterior probabilities only rarely. [Bayesian phylogenetic inference; heating parameter; Markov chain Monte Carlo; replicated chains.]
Resumo:
Pharmacodynamics (PD) is the study of the biochemical and physiological effects of drugs. The construction of optimal designs for dose-ranging trials with multiple periods is considered in this paper, where the outcome of the trial (the effect of the drug) is considered to be a binary response: the success or failure of a drug to bring about a particular change in the subject after a given amount of time. The carryover effect of each dose from one period to the next is assumed to be proportional to the direct effect. It is shown for a logistic regression model that the efficiency of optimal parallel (single-period) or crossover (two-period) design is substantially greater than a balanced design. The optimal designs are also shown to be robust to misspecification of the value of the parameters. Finally, the parallel and crossover designs are combined to provide the experimenter with greater flexibility.
Resumo:
In early generation variety trials, large numbers of new breeders' lines need to be compared, and usually there is little seed available for each new line. A so-called unreplicated trial has each new line on just one plot at a site, but includes several (often around five) replicated check or control (or standard) varieties. The total proportion of check plots is usually between 10% and 20%. The aim of the trial is to choose some good performing lines (usually around 1/3 of those tested) to go on for further testing, rather than precise estimation of their mean yield. Now that spatial analyses of data from field experiments are becoming more common, there is interest in an efficient layout of an experiment given a proposed spatial analysis. Some possible design criteria are discussed, and efficient layouts under spatial dependence are considered.
Resumo:
In early generation variety trials, large numbers of new breeders' lines (varieties) may be compared, with each having little seed available. A so-called unreplicated trial has each new variety on just one plot at a site, but includes several replicated control varieties, making up around 10% and 20% of the trial. The aim of the trial is to choose some (usually around one third) good performing new varieties to go on for further testing, rather than precise estimation of their mean yields. Now that spatial analyses of data from field experiments are becoming more common, there is interest in an efficient layout of an experiment given a proposed spatial analysis and an efficiency criterion. Common optimal design criteria values depend on the usual C-matrix, which is very large, and hence it is time consuming to calculate its inverse. Since most varieties are unreplicated, the variety incidence matrix has a simple form, and some matrix manipulations can dramatically reduce the computation needed. However, there are many designs to compare, and numerical optimisation lacks insight into good design features. Some possible design criteria are discussed, and approximations to their values considered. These allow the features of efficient layouts under spatial dependence to be given and compared. (c) 2006 Elsevier Inc. All rights reserved.
Resumo:
We consider the problems of computing the power and exponential moments EXs and EetX of square Gaussian random matrices X=A+BWC for positive integer s and real t, where W is a standard normal random vector and A, B, C are appropriately dimensioned constant matrices. We solve the problems by a matrix product scalarization technique and interpret the solutions in system-theoretic terms. The results of the paper are applicable to Bayesian prediction in multivariate autoregressive time series and mean-reverting diffusion processes.
Resumo:
Finite mixture models are being increasingly used to model the distributions of a wide variety of random phenomena. While normal mixture models are often used to cluster data sets of continuous multivariate data, a more robust clustering can be obtained by considering the t mixture model-based approach. Mixtures of factor analyzers enable model-based density estimation to be undertaken for high-dimensional data where the number of observations n is very large relative to their dimension p. As the approach using the multivariate normal family of distributions is sensitive to outliers, it is more robust to adopt the multivariate t family for the component error and factor distributions. The computational aspects associated with robustness and high dimensionality in these approaches to cluster analysis are discussed and illustrated.
Resumo:
The n-tuple pattern recognition method has been tested using a selection of 11 large data sets from the European Community StatLog project, so that the results could be compared with those reported for the 23 other algorithms the project tested. The results indicate that this ultra-fast memory-based method is a viable competitor with the others, which include optimisation-based neural network algorithms, even though the theory of memory-based neural computing is less highly developed in terms of statistical theory.