992 results for Statistical Computation


Relevance: 100.00%

Abstract:

This article describes advances in statistical computation for large-scale data analysis in structured Bayesian mixture models via graphics processing unit (GPU) programming. The developments are partly motivated by computational challenges arising in fitting models of increasing heterogeneity to increasingly large datasets. An example context concerns common biological studies using high-throughput technologies that generate many very large datasets and require increasingly high-dimensional mixture models with large numbers of mixture components. We outline important strategies and processes for GPU computation in Bayesian simulation and optimization approaches, give examples of the benefits of GPU implementations in terms of processing speed and scale-up in the ability to analyze large datasets, and provide a detailed, tutorial-style exposition that will benefit readers interested in developing GPU-based approaches in other statistical models. Novel, GPU-oriented approaches to modifying existing algorithms and software design can lead to substantial speed-ups and, critically, enable statistical analyses that would otherwise not be performed because of compute-time limitations in traditional computational environments. Supplemental materials are provided with all source code, example data, and details that will enable readers to implement and explore the GPU approach in this mixture modeling context. © 2010 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
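
The speed-ups described above come from the fact that the per-observation, per-component density evaluations in mixture model fitting are mutually independent. As a minimal sketch of that structure, the R code below computes the E-step responsibilities of a univariate Gaussian mixture in vectorized form; a GPU implementation would assign these independent entries to parallel threads. The function and data are illustrative, not the authors' code.

    # E-step responsibilities for a univariate Gaussian mixture.
    # Every entry of the n x k weighted-density matrix is independent of
    # the others -- the structure that GPU implementations parallelize.
    mixture_responsibilities <- function(x, weights, means, sds) {
      dens <- sapply(seq_along(weights),
                     function(j) weights[j] * dnorm(x, means[j], sds[j]))
      dens / rowSums(dens)  # normalize rows to posterior probabilities
    }

    set.seed(1)
    x <- c(rnorm(500, 0, 1), rnorm(500, 4, 1))
    r <- mixture_responsibilities(x, weights = c(0.5, 0.5),
                                  means = c(0, 4), sds = c(1, 1))
    head(r)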

Relevance: 100.00%

Abstract:

Collecting data via a questionnaire and analyzing them while preserving respondents' privacy may increase the number of respondents and the truthfulness of their responses. It may also reduce the systematic differences between respondents and non-respondents. In this paper, we propose a privacy-preserving method for collecting and analyzing survey responses using secure multi-party computation (SMC). The method is secure under the semi-honest adversarial model. The proposed method computes a wide variety of statistics. Total and stratified statistical counts are computed using the secure protocols developed in this paper. Additional statistics, such as a contingency table, a chi-square test, an odds ratio, and logistic regression, are then computed within the R statistical environment using the statistical counts as building blocks. The method was evaluated on a questionnaire dataset of 3,158 respondents sampled for a medical study and on simulated questionnaire datasets of up to 50,000 respondents. The computation time for the statistical analyses scales linearly with the number of respondents. The results show that the method is efficient and scalable for practical use. It can also be used for other applications in which categorical data are collected.
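
To illustrate the "statistical counts as building blocks" step, the R sketch below takes a 2 x 2 table of aggregate counts, of the kind such a secure protocol might release, and computes a chi-square test and an odds ratio. The counts and the Wald-type confidence interval are illustrative assumptions, not the paper's protocol.

    # Only aggregate counts reach the analyst; individual responses do not.
    counts <- matrix(c(120, 380, 60, 440), nrow = 2, byrow = TRUE,
                     dimnames = list(exposure = c("yes", "no"),
                                     outcome  = c("case", "control")))

    chisq.test(counts)                 # chi-square test of association

    # Odds ratio with a Wald-type 95% confidence interval
    or <- (counts[1, 1] * counts[2, 2]) / (counts[1, 2] * counts[2, 1])
    se <- sqrt(sum(1 / counts))
    c(or = or,
      lower = exp(log(or) - 1.96 * se),
      upper = exp(log(or) + 1.96 * se))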

Relevance: 70.00%

Abstract:

Research has shown that applying the T² control chart with a variable-parameters (VP) scheme yields rapid detection of out-of-control states. In this paper, the economic-statistical design of the VP T² control chart is formulated as a double-objective minimization problem, with the statistical objective being the adjusted average time to signal and the economic objective being the expected cost per hour. We then find Pareto-optimal designs, in which the two objectives are met simultaneously, by using a multi-objective genetic algorithm. Through an illustrative example, we show that relatively large benefits can be achieved by applying the VP scheme compared with usual schemes, and, in addition, the multi-objective approach provides the user with designs that are flexible and adaptive.
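
For reference, the statistic monitored by such a chart is Hotelling's T². The R sketch below computes it for a single subgroup with the in-control mean and covariance assumed known; the VP scheme and the economic-statistical optimization are not reproduced here, and all values are illustrative.

    # Hotelling's T-squared for one subgroup of multivariate observations,
    # with in-control mean mu0 and covariance Sigma taken as known.
    t2_statistic <- function(X, mu0, Sigma) {
      d <- colMeans(X) - mu0
      nrow(X) * drop(t(d) %*% solve(Sigma) %*% d)
    }

    set.seed(2)
    X <- matrix(rnorm(5 * 3), nrow = 5)   # subgroup of 5 units, 3 variables
    t2_statistic(X, mu0 = rep(0, 3), Sigma = diag(3))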

Relevance: 60.00%

Abstract:

In this work, we investigate an alternative bootstrap approach based on a result of Ramsey [F.L. Ramsey, Characterization of the partial autocorrelation function, Ann. Statist. 2 (1974), pp. 1296-1301] and on the Durbin-Levinson algorithm to obtain a surrogate series from linear Gaussian processes with long-range dependence. We compare this bootstrap method with other existing procedures in an extensive Monte Carlo experiment by estimating, parametrically and semi-parametrically, the memory parameter d. We consider Gaussian and non-Gaussian processes to assess the robustness of the method to deviations from normality. The approach is also useful for estimating confidence intervals for the memory parameter d, improving the coverage level of the interval.
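
As a sketch of the generation step, the R code below simulates a zero-mean Gaussian series from a given autocovariance sequence via the Durbin-Levinson recursion. The bootstrap scheme built on Ramsey's partial-autocorrelation characterization is not reproduced, and the hyperbolically decaying autocovariance used here is purely illustrative.

    # Simulate a zero-mean Gaussian series whose autocovariance at lag h
    # is gamma[h + 1], using the Durbin-Levinson recursion.
    dl_simulate <- function(gamma) {
      n <- length(gamma) - 1
      x <- numeric(n + 1)
      v <- gamma[1]                    # one-step prediction variance
      x[1] <- rnorm(1, 0, sqrt(v))
      phi <- numeric(0)
      for (t in 1:n) {
        if (t == 1) {
          k <- gamma[2] / gamma[1]
          phi <- k
        } else {
          k <- (gamma[t + 1] - sum(phi * gamma[t:2])) / v
          phi <- c(phi - k * rev(phi), k)
        }
        v <- v * (1 - k^2)
        x[t + 1] <- sum(phi * x[t:1]) + rnorm(1, 0, sqrt(v))
      }
      x
    }

    h <- 0:499
    gamma <- (1 + h)^(2 * 0.3 - 1)     # slow hyperbolic decay (d = 0.3)
    series <- dl_simulate(gamma)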

Relevance: 60.00%

Abstract:

Computer experiments, consisting of a number of runs of a computer model with different inputs, are now commonplace in scientific research. Using a simple fire model for illustration, some guidelines are given for the size of a computer experiment. A graph is provided relating the error of prediction to the sample size, which should be of use when designing computer experiments. Methods for augmenting computer experiments with extra runs are also described and illustrated. The simplest method involves adding one point at a time, choosing the point with the maximum prediction variance. Another method that appears to work well is to choose points from a candidate set so as to maximize the determinant of the variance-covariance matrix of the predictions.
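
The simplest augmentation rule above can be sketched in a few lines of R, assuming a Gaussian-process predictor with a squared-exponential correlation and fixed, illustrative parameters; the fire model and the predictor actually fitted in the paper are not reproduced.

    # Add, one at a time, the candidate point with the largest
    # Gaussian-process prediction variance (squared-exponential correlation).
    corr_mat <- function(a, b, theta = 10) {
      exp(-theta * outer(a, b, function(u, v) (u - v)^2))
    }

    pred_var <- function(design, cand, sigma2 = 1, theta = 10, nugget = 1e-8) {
      R <- corr_mat(design, design, theta) + nugget * diag(length(design))
      r <- corr_mat(cand, design, theta)
      sigma2 * (1 - rowSums((r %*% solve(R)) * r))
    }

    design <- c(0.1, 0.5, 0.9)           # current one-dimensional design
    cand <- seq(0, 1, by = 0.01)         # candidate grid
    for (i in 1:5) {
      design <- c(design, cand[which.max(pred_var(design, cand))])
    }
    sort(design)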

Relevance: 60.00%

Abstract:

Deterministic computer simulations of physical experiments are now common techniques in science and engineering, since physical experiments are often too time-consuming, expensive, or impossible to conduct. Complex computer models or codes are used in place of physical experiments, leading to the study of computer experiments, which are used to investigate many scientific phenomena of this nature. A computer experiment consists of a number of runs of the computer code with different input choices. The design and analysis of computer experiments is a rapidly growing area of statistical experimental design. This thesis investigates some practical issues in the design and analysis of computer experiments and attempts to answer some of the questions faced by experimenters using computer experiments. In particular, the question of the number of computer experiments and how they should be augmented is studied, and attention is given to the case when the response is a function over time.

Relevance: 60.00%

Abstract:

Spatial data analysis has become increasingly important in studies of ecology and economics during the last decade. One focus of spatial data analysis is how to select predictors, variance functions, and correlation functions. In general, however, the true covariance function is unknown and the working covariance structure is often misspecified. In this paper, our target is to find a good strategy for identifying the best model from a candidate set using model selection criteria. Specifically, we evaluate the ability of several information criteria (the corrected Akaike information criterion, the Bayesian information criterion (BIC), and the residual information criterion (RIC)) to choose the optimal model when the working correlation function, the working variance function, and the working mean function are correct or misspecified. Simulations are carried out for small to moderate sample sizes. Four candidate covariance functions (exponential, Gaussian, Matérn, and rational quadratic) are used in the simulation studies. Summarizing the simulation results, we find that a misspecified working correlation structure can still capture some spatial correlation information in model fitting. When the sample size is large enough, BIC and RIC perform well even if the working covariance is misspecified. Moreover, the performance of these information criteria is related to the average level of model fit, as indicated by the average adjusted R-squared, and overall RIC performs well.
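
For concreteness, the four candidate covariance functions can be written as functions of distance, as in the R sketch below. Parameterizations vary across references, so the forms used here follow one common convention and should be read as assumptions.

    # Four isotropic covariance functions of distance h, with range phi
    # and (for the Matern) smoothness nu, in one common parameterization.
    cov_exponential <- function(h, phi) exp(-h / phi)
    cov_gaussian    <- function(h, phi) exp(-(h / phi)^2)
    cov_rational_q  <- function(h, phi) 1 / (1 + (h / phi)^2)
    cov_matern      <- function(h, phi, nu = 1.5) {
      u <- sqrt(2 * nu) * h / phi
      out <- (2^(1 - nu) / gamma(nu)) * u^nu * besselK(u, nu)
      out[h == 0] <- 1                 # limiting value at zero distance
      out
    }

    h <- seq(0, 3, by = 0.5)
    cbind(exponential = cov_exponential(h, 1),
          gaussian    = cov_gaussian(h, 1),
          matern      = cov_matern(h, 1),
          rational    = cov_rational_q(h, 1))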

Relevance: 60.00%

Abstract:

Nolan and Temple Lang argue that “the ability to express statistical computations is an essential skill.” A key related capacity is the ability to conduct and present data analysis in a way that another person can understand and replicate. The copy-and-paste workflow that is an artifact of antiquated user-interface design makes reproducibility of statistical analysis more difficult, especially as data become increasingly complex and statistical methods become increasingly sophisticated. R Markdown is a new technology that makes creating fully reproducible statistical analyses simple and painless. It provides a solution suitable not only for cutting-edge research, but also for use in an introductory statistics course. We present experiential and statistical evidence that R Markdown can be used effectively in introductory statistics courses, and discuss its role in the rapidly changing world of statistical computation.
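
A minimal R Markdown source file of the kind described might look as follows; the embedded R chunk is re-executed every time the document is rendered, so the reported results always match the code. The file contents are illustrative.

    ---
    title: "A reproducible analysis"
    output: html_document
    ---

    The estimates below are recomputed from the data each time this
    document is knit, so narrative and results cannot drift apart.

    ```{r model-fit}
    fit <- lm(mpg ~ wt, data = mtcars)
    coef(summary(fit))
    ```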

Relevance: 60.00%

Abstract:

Heterogeneity in lifetime data may be modelled by multiplying an individual's hazard by an unobserved frailty. We test for the presence of frailty of this kind in univariate and bivariate data with Weibull distributed lifetimes, using statistics based on the ordered Cox-Snell residuals from the null model of no frailty. The form of the statistics is suggested by outlier testing in the gamma distribution. We find through simulation that the sum of the k largest or k smallest order statistics, for suitably chosen k, provides a powerful test when the frailty distribution is assumed to be gamma or positive stable, respectively. We provide recommended values of k for sample sizes up to 100 and simple formulae for estimated critical values for tests at the 5% level.
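
The statistic itself is simple to compute once the null model is fitted: form the Cox-Snell residuals and sum the k largest order statistics. The R sketch below does this for uncensored Weibull data using the survival package; the recommended k and the critical values come from the paper's tables and simulations and are not reproduced, so the k used here is illustrative.

    # Cox-Snell residuals from a Weibull null model with no frailty are the
    # fitted cumulative hazards H(t_i); the test sums the k largest of them.
    library(survival)

    set.seed(3)
    time <- rweibull(50, shape = 1.5, scale = 2)

    fit <- survreg(Surv(time) ~ 1, dist = "weibull")
    shape <- 1 / fit$scale
    scale <- exp(coef(fit))
    cs <- (time / scale)^shape            # Cox-Snell residuals
    k <- 5                                # illustrative choice of k
    sum(sort(cs, decreasing = TRUE)[1:k]) # sum of the k largest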

Relevance: 60.00%

Abstract:

A supersaturated design (SSD) is an experimental plan, useful for evaluating the main effects of m factors with n experimental units when m > n - 1, each factor has two levels, and the first-order effects of only a few factors are expected to have dominant effects on the response. Use of these plans can be extremely cost-effective when it is necessary to screen hundreds or thousands of factors with a limited amount of resources. In this article we describe how to use cyclic balanced incomplete block designs and regular graph designs to construct E(s²)-optimal and near-optimal SSDs when m is a multiple of n - 1. We provide a table that can be used to construct these designs for screening thousands of factors, and we also explain how to obtain SSDs when m is not a multiple of n - 1. Using the table and the approaches given in this paper, SSDs can be developed with up to 24 runs and up to 12,190 factors.
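
The E(s²) criterion itself is straightforward to compute for any two-level design in -1/+1 coding: it is the average of the squared off-diagonal entries of X'X. The R sketch below evaluates it for a random design, which is illustrative and not one of the optimal constructions.

    # E(s^2) for an n x m design matrix X in -1/+1 coding (smaller is better).
    e_s2 <- function(X) {
      S <- crossprod(X)                    # X'X
      sum(S[upper.tri(S)]^2) / choose(ncol(X), 2)
    }

    set.seed(4)
    X <- matrix(sample(c(-1, 1), 12 * 22, replace = TRUE), nrow = 12)
    e_s2(X)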

Relevance: 60.00%

Abstract:

Kumaraswamy [Generalized probability density-function for double-bounded random-processes, J. Hydrol. 46 (1980), pp. 79-88] introduced a distribution for double-bounded random processes with hydrological applications. For the first time, based on this distribution, we describe a new family of generalized distributions (denoted with the prefix 'Kw') to extend the normal, Weibull, gamma, Gumbel, and inverse Gaussian distributions, among several other well-known distributions. Some special distributions in the new family, such as the Kw-normal, Kw-Weibull, Kw-gamma, Kw-Gumbel, and Kw-inverse Gaussian distributions, are discussed. We express the ordinary moments of any Kw generalized distribution as linear functions of probability weighted moments (PWMs) of the parent distribution. We also obtain the ordinary moments of order statistics as functions of PWMs of the baseline distribution. We use the method of maximum likelihood to fit the distributions in the new class and illustrate the potential of the new model with an application to real data.
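
The family is defined by the cdf F(x) = 1 - [1 - G(x)^a]^b, where G is the baseline cdf and a, b > 0 are the added shape parameters, which makes random generation by inversion immediate. A minimal R sketch for the Kw-Weibull case follows; the function names are illustrative.

    # Kumaraswamy generalized family: F(x) = 1 - (1 - G(x)^a)^b.
    pkw <- function(q, a, b, G, ...) 1 - (1 - G(q, ...)^a)^b
    rkw <- function(n, a, b, Ginv, ...) {
      u <- runif(n)
      # Invert F: G(x) = (1 - (1 - u)^(1/b))^(1/a)
      Ginv((1 - (1 - u)^(1 / b))^(1 / a), ...)
    }

    # Kw-Weibull: Weibull baseline cdf and quantile function
    x <- rkw(1000, a = 2, b = 3, Ginv = qweibull, shape = 1.5, scale = 1)
    pkw(1, a = 2, b = 3, G = pweibull, shape = 1.5, scale = 1)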

Relevance: 60.00%

Abstract:

The generalized Birnbaum-Saunders distribution pertains to a class of lifetime models including both lighter- and heavier-tailed distributions. This model adapts well to lifetime data, even when outliers exist, and has other good theoretical properties and application perspectives. However, statistical inference tools may not exist in closed form for this model. Hence, simulation and numerical studies are needed, which require a random number generator. Three different ways to generate observations from this model are considered here. These generators are compared by means of a goodness-of-fit procedure as well as by their effectiveness in recovering the true parameter values in Monte Carlo simulations. The goodness-of-fit procedure may also be used as an estimation method, and the quality of this estimation method is studied here. Finally, through a real data set, the generalized and classical Birnbaum-Saunders models are compared using this estimation method.
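
One standard generator for the classical model uses the relation T = beta * (alpha*Z/2 + sqrt((alpha*Z/2)^2 + 1))^2 with Z standard normal; replacing Z by another symmetric variate yields generators for the generalized, heavier- or lighter-tailed cases. The R sketch below follows that assumption and is not necessarily one of the three generators compared in the paper.

    # Birnbaum-Saunders generator via its normal (or other symmetric) kernel.
    rbs <- function(n, alpha, beta, rgen = rnorm, ...) {
      z <- rgen(n, ...)
      a <- alpha * z / 2
      beta * (a + sqrt(a^2 + 1))^2
    }

    set.seed(5)
    t_classical <- rbs(1000, alpha = 0.5, beta = 1)                    # normal kernel
    t_general   <- rbs(1000, alpha = 0.5, beta = 1, rgen = rt, df = 4) # heavier tails
    c(mean(t_classical), mean(t_general))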

Relevance: 60.00%

Abstract:

In this article, we compare three residuals based on the deviance component in generalised log-gamma regression models with censored observations. For different parameter settings, sample sizes, and censoring percentages, various simulation studies are performed, and the empirical distribution of each residual is displayed and compared with the standard normal distribution. For all cases studied, the empirical distributions of the proposed residuals are in general symmetric around zero, but only the martingale-type residual presented negligible kurtosis for the majority of the cases studied. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended to the martingale-type residual in generalised log-gamma regression models with censored data. A lifetime data set is analysed under log-gamma regression models, and model checking based on the martingale-type residual is performed.
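
For orientation, martingale-type residuals for censored lifetime data take the general form m_i = delta_i - H(t_i), where delta_i is the event indicator and H the fitted cumulative hazard. The R sketch below uses a Weibull fit for simplicity; the article itself works in the generalised log-gamma class.

    # Martingale-type residuals: event indicator minus fitted cumulative hazard.
    library(survival)

    set.seed(6)
    time  <- rweibull(100, shape = 1.2, scale = 3)
    cens  <- runif(100, 0, 6)
    obs   <- pmin(time, cens)
    delta <- as.numeric(time <= cens)

    fit <- survreg(Surv(obs, delta) ~ 1, dist = "weibull")
    shape <- 1 / fit$scale
    scale <- exp(coef(fit))
    m <- delta - (obs / scale)^shape      # martingale-type residuals
    summary(m)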

Relevance: 60.00%

Abstract:

We obtain adjustments to the profile likelihood function in Weibull regression models with and without censoring. Specifically, we consider two different modified profile likelihoods: (i) the one proposed by Cox and Reid [Cox, D.R. and Reid, N., 1987, Parameter orthogonality and approximate conditional inference. Journal of the Royal Statistical Society B, 49, 1-39], and (ii) an approximation to the one proposed by Barndorff-Nielsen [Barndorff-Nielsen, O.E., 1983, On a formula for the distribution of the maximum likelihood estimator. Biometrika, 70, 343-365], the approximation having been obtained using the results of Fraser and Reid [Fraser, D.A.S. and Reid, N., 1995, Ancillaries and third-order significance. Utilitas Mathematica, 47, 33-53] and of Fraser et al. [Fraser, D.A.S., Reid, N. and Wu, J., 1999, A simple formula for tail probabilities for frequentist and Bayesian inference. Biometrika, 86, 655-661]. We focus on point estimation and likelihood ratio tests on the shape parameter in the class of Weibull regression models. We derive some distributional properties of the different maximum likelihood estimators and likelihood ratio tests. The numerical evidence presented in the paper favors the approximation to Barndorff-Nielsen's adjustment.
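
As background for these adjustments, the unadjusted profile log-likelihood for the Weibull shape is available in closed form in the uncensored, no-covariate case, because the scale MLE for a fixed shape k is (sum(t^k)/n)^(1/k). The R sketch below profiles over the shape only; the Cox-Reid and Barndorff-Nielsen adjustments add correction terms that are not shown.

    # Unadjusted profile log-likelihood for the Weibull shape k
    # (uncensored data, no covariates).
    profile_loglik <- function(k, t) {
      n <- length(t)
      n * log(k) - n * log(sum(t^k) / n) + (k - 1) * sum(log(t)) - n
    }

    set.seed(7)
    t <- rweibull(40, shape = 2, scale = 1.5)
    k_grid <- seq(0.5, 4, by = 0.01)
    lp <- sapply(k_grid, profile_loglik, t = t)
    k_grid[which.max(lp)]                 # profile MLE of the shape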