972 results for Statistical Computation
Abstract:
In this paper, we propose a flexible cure rate survival model by assuming that the number of competing causes of the event of interest follows the Conway-Maxwell distribution and that the time to the event follows the generalized gamma distribution. This distribution can be used to model survival data when the hazard rate function is increasing, decreasing, bathtub-shaped or unimodal, and it includes some distributions commonly used in lifetime analysis as particular cases. Some appropriate matrices are derived in order to evaluate local influence on the parameter estimates under different perturbations, and some global influence measures are also investigated. Finally, a data set from the medical area is analysed.
Abstract:
For any continuous baseline G distribution, Cordeiro and de Castro [G. M. Cordeiro and M. de Castro, A new family of generalized distributions, J. Statist. Comput. Simul. 81 (2011), pp. 883-898] proposed a new generalized distribution (denoted here with the prefix 'Kw-G' (Kumaraswamy-G)) with two extra positive parameters. They studied some of its mathematical properties and presented special sub-models. We derive a simple representation for the Kw-G density function as a linear combination of exponentiated-G distributions. Some new distributions are proposed as sub-models of this family, for example, the Kw-Chen [Z.A. Chen, A new two-parameter lifetime distribution with bathtub shape or increasing failure rate function, Statist. Probab. Lett. 49 (2000), pp. 155-161], Kw-XTG [M. Xie, Y. Tang, and T.N. Goh, A modified Weibull extension with bathtub failure rate function, Reliab. Eng. System Safety 76 (2002), pp. 279-285] and Kw-Flexible Weibull [M. Bebbington, C. D. Lai, and R. Zitikis, A flexible Weibull extension, Reliab. Eng. System Safety 92 (2007), pp. 719-726]. New properties of the Kw-G distribution are derived, including asymptotes, shapes, moments, moment generating function, mean deviations, Bonferroni and Lorenz curves, reliability, Renyi entropy and Shannon entropy. New properties of the order statistics are investigated. We discuss the estimation of the parameters by maximum likelihood. We provide two applications to real data sets and discuss a bivariate extension of the Kw-G distribution.
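The linear-combination representation mentioned above can be illustrated with a minimal sketch. It assumes the usual Kw-G density f(x) = a b g(x) G(x)^(a-1) (1 - G(x)^a)^(b-1) and a binomial expansion of the last factor, which for integer b gives a finite mixture of exponentiated-G densities with powers a(k+1). The exponential baseline, the function names and the restriction to integer b are assumptions made for illustration; the paper's own representation may be stated differently.

```python
import math


def exp_cdf(x, rate=1.0):
    """Baseline cdf G(x); an exponential baseline is assumed for illustration."""
    return 1.0 - math.exp(-rate * x)


def exp_pdf(x, rate=1.0):
    """Baseline density g(x) matching exp_cdf."""
    return rate * math.exp(-rate * x)


def kw_g_pdf(x, a, b, G=exp_cdf, g=exp_pdf):
    """Closed-form Kw-G density: a*b*g(x)*G(x)^(a-1)*(1-G(x)^a)^(b-1)."""
    Gx = G(x)
    return a * b * g(x) * Gx ** (a - 1) * (1.0 - Gx ** a) ** (b - 1)


def exp_g_pdf(x, c, G=exp_cdf, g=exp_pdf):
    """Exponentiated-G density with power c: c*g(x)*G(x)^(c-1)."""
    return c * g(x) * G(x) ** (c - 1)


def kw_g_pdf_series(x, a, b, G=exp_cdf, g=exp_pdf):
    """Kw-G density as a linear combination of exponentiated-G densities.

    Valid as written only for positive integer b, where the binomial
    expansion of (1 - G^a)^(b-1) terminates after b terms.
    """
    total = 0.0
    for k in range(b):
        weight = (-1) ** k * b * math.comb(b - 1, k) / (k + 1)
        total += weight * exp_g_pdf(x, a * (k + 1), G, g)
    return total
```

Comparing `kw_g_pdf(1.0, 2.5, 3)` with `kw_g_pdf_series(1.0, 2.5, 3)` gives a quick numerical check that the two forms agree. For non-integer b the same expansion becomes an infinite series, which this sketch does not cover.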
Abstract:
Lemonte and Cordeiro [Birnbaum-Saunders nonlinear regression models, Comput. Stat. Data Anal. 53 (2009), pp. 4441-4452] introduced a class of Birnbaum-Saunders (BS) nonlinear regression models potentially useful in lifetime data analysis. We give a general matrix Bartlett correction formula to improve the likelihood ratio (LR) tests in these models. The formula is simple enough to be used analytically to obtain several closed-form expressions in special cases. Our results generalize those in Lemonte et al. [Improved likelihood inference in Birnbaum-Saunders regressions, Comput. Stat. Data Anal. 54 (2010), pp. 1307-1316], which hold only for the BS linear regression models. We use Monte Carlo simulations to show that the corrected tests work better than the usual LR tests.
Abstract:
This paper introduces a skewed log-Birnbaum-Saunders regression model based on the skewed sinh-normal distribution proposed by Leiva et al. [A skewed sinh-normal distribution and its properties and application to air pollution, Comm. Statist. Theory Methods 39 (2010), pp. 426-443]. Some influence methods, such as local influence and generalized leverage, are presented. Additionally, we derive the normal curvatures of local influence under some perturbation schemes. An empirical application to a real data set is presented in order to illustrate the usefulness of the proposed model.
Abstract:
Item response theory (IRT) comprises a set of statistical models which are useful in many fields, especially when there is an interest in studying latent variables (or latent traits). Usually such latent traits are assumed to be random variables and a convenient distribution is assigned to them. A very common choice for such a distribution has been the standard normal. Recently, Azevedo et al. [Bayesian inference for a skew-normal IRT model under the centred parameterization, Comput. Stat. Data Anal. 55 (2011), pp. 353-365] proposed a skew-normal distribution under the centred parameterization (SNCP), as studied in [R. B. Arellano-Valle and A. Azzalini, The centred parametrization for the multivariate skew-normal distribution, J. Multivariate Anal. 99(7) (2008), pp. 1362-1382], to model the latent trait distribution. This approach allows one to represent any asymmetric behaviour concerning the latent trait distribution. Also, they developed a Metropolis-Hastings within Gibbs sampling (MHWGS) algorithm based on the density of the SNCP. They showed that the algorithm recovers all parameters properly. Their results indicated that, in the presence of asymmetry, the proposed model and the estimation algorithm perform better than the usual model and estimation methods. Our main goal in this paper is to propose another type of MHWGS algorithm based on a stochastic representation (hierarchical structure) of the SNCP studied in [N. Henze, A probabilistic representation of the skew-normal distribution, Scand. J. Statist. 13 (1986), pp. 271-275]. Our algorithm has only one Metropolis-Hastings step, in contrast to the algorithm developed by Azevedo et al., which has two such steps. This not only makes the implementation easier but also reduces the number of proposal densities to be used, which can be a problem in the implementation of MHWGS algorithms, as can be seen in [R. J. Patz and B. W. Junker, A straightforward approach to Markov Chain Monte Carlo methods for item response models, J. Educ. Behav. Stat. 24(2) (1999), pp. 146-178; R. J. Patz and B. W. Junker, The applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses, J. Educ. Behav. Stat. 24(4) (1999), pp. 342-366; A. Gelman, G. O. Roberts, and W. R. Gilks, Efficient Metropolis jumping rules, Bayesian Stat. 5 (1996), pp. 599-607]. Moreover, we consider a modified beta prior (which generalizes the one considered in [3]) and a Jeffreys prior for the asymmetry parameter. Furthermore, we study the sensitivity of such priors as well as the use of different kernel densities for this parameter. Finally, we assess the impact of the number of examinees, number of items and the asymmetry level on the parameter recovery. Results of the simulation study indicated that our approach performed as well as that in [3], in terms of parameter recovery, mainly using the Jeffreys prior. Also, they indicated that the asymmetry level has the highest impact on parameter recovery, even though it is relatively small. A real data analysis is considered jointly with the development of model fitting assessment tools. The results are compared with the ones obtained by Azevedo et al. The results indicate that the hierarchical approach makes MCMC algorithms easier to implement, facilitates convergence diagnosis, and can be very useful for fitting more complex skew IRT models.
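The stochastic representation attributed to Henze can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the direct parameterization of the skew-normal with shape parameter `lam`, not the centred parameterization (SNCP) used in the paper, and the function names are hypothetical. A skew-normal draw is built as delta*|U0| + sqrt(1 - delta^2)*U1 with U0, U1 independent standard normals and delta = lam / sqrt(1 + lam^2).

```python
import math
import random


def skew_delta(lam):
    """Map the shape parameter lam to delta = lam / sqrt(1 + lam^2)."""
    return lam / math.sqrt(1.0 + lam * lam)


def sample_skew_normal(lam, rng):
    """Draw one skew-normal variate (direct parameterization) via the
    hierarchical representation: a half-normal component plus an
    independent normal component."""
    delta = skew_delta(lam)
    u0 = abs(rng.gauss(0.0, 1.0))  # half-normal latent variable
    u1 = rng.gauss(0.0, 1.0)       # independent standard normal
    return delta * u0 + math.sqrt(1.0 - delta * delta) * u1


def skew_normal_mean(lam):
    """Theoretical mean of the direct-parameterization skew-normal."""
    return skew_delta(lam) * math.sqrt(2.0 / math.pi)
```

Conditioning on the half-normal latent variable leaves a normal distribution, which is the kind of hierarchical structure that tends to simplify Gibbs-type samplers. How the paper exploits this in its MHWGS algorithm is not reproduced here.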
Abstract:
For the first time, we introduce a generalized form of the exponentiated generalized gamma distribution [Cordeiro et al. The exponentiated generalized gamma distribution with application to lifetime data, J. Statist. Comput. Simul. 81 (2011), pp. 827-842.] that is the baseline for the log-exponentiated generalized gamma regression model. The new distribution can accommodate increasing, decreasing, bathtub- and unimodal-shaped hazard functions. A second advantage is that it includes classical distributions reported in the lifetime literature as special cases. We obtain explicit expressions for the moments of the baseline distribution of the new regression model. The proposed model can be applied to censored data since it includes as sub-models several widely known regression models. It therefore can be used more effectively in the analysis of survival data. We obtain maximum likelihood estimates for the model parameters by considering censored data. We show that our extended regression model is very useful by means of two applications to real data.
Abstract:
This article proposes computing sensitivities of upper tail probabilities of random sums by the saddlepoint approximation. The sensitivity considered is the derivative of the upper tail probability with respect to the parameter of the summation index distribution. Random sums with Poisson- or geometric-distributed summation indices and gamma- or Weibull-distributed summands are considered. The score method with importance sampling is considered as an alternative approximation. Numerical studies show that both the saddlepoint approximation and the score method with importance sampling are very accurate, but the saddlepoint approximation is substantially faster. Thus, the suggested saddlepoint approximation can be conveniently used in various scientific problems.
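The sensitivity described above can be illustrated with a plain score-function Monte Carlo estimator. This sketch is an assumption-laden simplification: it covers only the Poisson index with gamma summands, omits the importance sampling used in the article, and does not implement the saddlepoint approximation. It relies on the identity d/d(lam) P(S > x) = E[1{S > x} (N/lam - 1)], which follows from differentiating the Poisson log-likelihood. All function names are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw a Poisson(lam) variate by multiplying uniforms (Knuth's method);
    adequate for small lam."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def tail_sensitivity_score(lam, shape, scale, x, n, rng):
    """Estimate d/d(lam) P(S > x) for S = X_1 + ... + X_N with
    N ~ Poisson(lam) and X_i ~ Gamma(shape, scale), using the score
    weight N/lam - 1 and plain Monte Carlo (no importance sampling)."""
    total = 0.0
    for _ in range(n):
        count = sample_poisson(lam, rng)
        s = sum(rng.gammavariate(shape, scale) for _ in range(count))
        if s > x:
            total += count / lam - 1.0
    return total / n
```

A simple sanity check is the threshold x = 0: since the summands are positive, P(S > 0) = 1 - exp(-lam), so the estimate should be close to exp(-lam). For large thresholds the event becomes rare and the plain estimator turns noisy, which is the situation importance sampling is meant to address.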
Abstract:
Advances in three related areas, namely state-space modeling, sequential Bayesian learning, and decision analysis, are addressed, together with the statistical challenges of scalability and associated dynamic sparsity. The key theme that ties the three areas together is Bayesian model emulation: solving challenging analysis/computational problems using creative model emulators. This idea defines theoretical and applied advances in non-linear, non-Gaussian state-space modeling, dynamic sparsity, decision analysis and statistical computation, across linked contexts of multivariate time series and dynamic network studies. Examples and applications in financial time series and portfolio analysis, macroeconomics and internet studies from computational advertising demonstrate the utility of the core methodological innovations.
Chapter 1 summarizes the three areas/problems and the key idea of emulation in those areas. Chapter 2 discusses the sequential analysis of latent threshold models using emulating models that allow for analytical filtering to enhance the efficiency of posterior sampling. Chapter 3 examines the emulator model in decision analysis, or the synthetic model, that is equivalent to the loss function in the original minimization problem, and shows its performance in the context of sequential portfolio optimization. Chapter 4 describes a method for modeling streaming count data observed on a large network that relies on emulating the whole, dependent network model by independent, conjugate sub-models customized to each set of flows. Chapter 5 reviews those advances and offers concluding remarks.
Abstract:
The climate belongs to the class of non-equilibrium forced and dissipative systems, for which most results of quasi-equilibrium statistical mechanics, including the fluctuation-dissipation theorem, do not apply. In this paper we show for the first time how the Ruelle linear response theory, developed for studying rigorously the impact of perturbations on general observables of non-equilibrium statistical mechanical systems, can be applied with great success to analyze the climatic response to general forcings. The crucial value of the Ruelle theory lies in the fact that it allows one to compute the response of the system in terms of expectation values of explicit and computable functions of the phase space averaged over the invariant measure of the unperturbed state. As a test bed, we choose a classical version of the Lorenz 96 model, which, in spite of its simplicity, has a well-recognized prototypical value as it is a spatially extended one-dimensional model and presents the basic ingredients, such as dissipation, advection and the presence of an external forcing, of the actual atmosphere. We recapitulate the main aspects of the general response theory and propose some new general results. We then analyze the frequency dependence of the response of both local and global observables to perturbations having localized as well as global spatial patterns. We derive analytically several properties of the corresponding susceptibilities, such as asymptotic behavior, validity of Kramers-Kronig relations, and sum rules, whose main ingredient is the causality principle. We show that all the coefficients of the leading asymptotic expansions as well as the integral constraints can be written as linear functions of parameters that describe the unperturbed properties of the system, such as its average energy.
Some newly obtained empirical closure equations for such parameters allow one to define such properties as an explicit function of the unperturbed forcing parameter alone for a general class of chaotic Lorenz 96 models. We then verify the theoretical predictions from the outputs of the simulations up to a high degree of precision. The theory is used to explain differences in the response of local and global observables, to define the intensive properties of the system, which do not depend on the spatial resolution of the Lorenz 96 model, and to generalize the concept of climate sensitivity to all time scales. We also show how to reconstruct the linear Green function, which maps perturbations of general time patterns into changes in the expectation value of the considered observable for finite as well as infinite time. Finally, we propose a simple yet general methodology to study general climate change problems on virtually any time scale by resorting to only well-selected simulations, and by taking full advantage of ensemble methods. The specific case of globally averaged surface temperature response to a general pattern of change of the CO2 concentration is discussed. We believe that the proposed approach may constitute a mathematically rigorous and practically very effective way to approach the problem of climate sensitivity, climate prediction, and climate change from a radically new perspective.
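For orientation, the test-bed model can be sketched in a few lines. This assumes the standard one-level Lorenz 96 equations dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F with periodic indices, which combine advection, linear dissipation and a constant external forcing F. The function names, the fourth-order Runge-Kutta integrator and the step size are illustrative choices, not details taken from the paper.

```python
def lorenz96_tendency(x, forcing):
    """Right-hand side of the one-level Lorenz 96 model with periodic
    boundary conditions: advection, linear damping and constant forcing."""
    n = len(x)
    return [
        (x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + forcing
        for i in range(n)
    ]


def rk4_step(x, forcing, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(state, slope, factor):
        return [s + factor * k for s, k in zip(state, slope)]

    k1 = lorenz96_tendency(x, forcing)
    k2 = lorenz96_tendency(shifted(x, k1, dt / 2.0), forcing)
    k3 = lorenz96_tendency(shifted(x, k2, dt / 2.0), forcing)
    k4 = lorenz96_tendency(shifted(x, k3, dt), forcing)
    return [
        xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]


def integrate(x0, forcing, dt, steps):
    """Integrate from x0 and return the final state."""
    x = list(x0)
    for _ in range(steps):
        x = rk4_step(x, forcing, dt)
    return x
```

The uniform state x_i = F is a fixed point of these equations, which gives a convenient correctness check. To study response in the spirit of the abstract, one would compare long-run averages of an observable between a run with forcing F and a run with a perturbed forcing, ideally over an ensemble of initial conditions.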
Abstract:
We analytically study the input-output properties of a neuron whose active dendritic tree, modeled as a Cayley tree of excitable elements, is subjected to Poisson stimulus. Both single-site and two-site mean-field approximations incorrectly predict a nonequilibrium phase transition which is not allowed in the model. We propose an excitable-wave mean-field approximation which shows good agreement with previously published simulation results [Gollo et al., PLoS Comput. Biol. 5, e1000402 (2009)] and accounts for finite-size effects. We also discuss the relevance of our results to experiments in neuroscience, emphasizing the role of active dendrites in the enhancement of dynamic range and in gain control modulation.
Abstract:
Thesis (M. S.)--University of Illinois at Urbana-Champaign.
Abstract:
Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test if two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et al. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow problem over a related graph, which can be solved using a Ford-Fulkerson algorithm in time polynomial in that number. We apply the test to 10 randomly chosen protein domain families from the seed of the Pfam-A database (high quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes.
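The max-flow step can be illustrated generically. The sketch below is a Ford-Fulkerson scheme that finds augmenting paths by breadth-first search (the Edmonds-Karp variant, chosen here because it guarantees polynomial running time). The construction of the graph from context trees is specific to the paper and is not reproduced; `toy_capacity` is a made-up textbook-style network used only to exercise the function.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Maximum flow on a dense capacity matrix via Ford-Fulkerson with
    breadth-first augmenting paths (Edmonds-Karp)."""
    n = len(capacity)
    residual = [row[:] for row in capacity]
    flow = 0
    while True:
        # Breadth-first search for a path with spare residual capacity.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            return flow  # no augmenting path left, so the flow is maximal
        # Find the bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = sink
        while v != source:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        # Push the bottleneck along the path and update reverse edges.
        v = sink
        while v != source:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical six-node network: node 0 is the source, node 5 the sink.
toy_capacity = [
    [0, 16, 13, 0, 0, 0],
    [0, 0, 0, 12, 0, 0],
    [0, 4, 0, 0, 14, 0],
    [0, 0, 9, 0, 0, 20],
    [0, 0, 0, 7, 0, 4],
    [0, 0, 0, 0, 0, 0],
]
```

Calling `max_flow(toy_capacity, 0, 5)` returns the value of a maximum flow, which by the max-flow min-cut theorem equals the capacity of a minimum cut separating source and sink.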
Abstract:
The Wigner higher order moment spectra (WHOS) are defined as extensions of the Wigner-Ville distribution (WD) to higher order moment spectra domains. A general class of time-frequency higher order moment spectra is also defined in terms of arbitrary higher order moments of the signal as generalizations of Cohen’s general class of time-frequency representations. The properties of the general class of time-frequency higher order moment spectra can be related to the properties of WHOS which are, in fact, extensions of the properties of the WD. Discrete time and frequency Wigner higher order moment spectra (DTF-WHOS) distributions are introduced for signal processing applications and are shown to be implemented with two FFT-based algorithms. One application is presented where the Wigner bispectrum (WB), which is a WHOS in the third-order moment domain, is utilized for the detection of transient signals embedded in noise. The WB is compared with the WD in terms of simulation examples and analysis of real sonar data. It is shown that better detection schemes can be derived, in low signal-to-noise ratio, when the WB is applied.