963 resultados para Multivariate Normal Distribution
Resumo:
In this article, we study further properties of a skew normal distribution, called the skew-normal-Cauchy (SNC) distribution by Nadarajah and Kotz (2003). A stochastic representation is obtained which allows alternative derivations for moments, moments generating function, and skewness and kurtosis coefficients. Issues related to singularity of the Fisher information matrix are investigated.
Resumo:
In this article, we consider local influence analysis for the skew-normal linear mixed model (SN-LMM). As the observed data log-likelihood associated with the SN-LMM is intractable, Cook`s well-known approach cannot be applied to obtain measures of local influence. Instead, we develop local influence measures following the approach of Zhu and Lee (2001). This approach is based on the use of an EM-type algorithm and is measurement invariant under reparametrizations. Four specific perturbation schemes are discussed. Results obtained for a simulated data set and a real data set are reported, illustrating the usefulness of the proposed methodology.
Resumo:
Competitive Strategy literature predicts three different mechanisms of performance generation, thus distinguishing between firms that have competitive advantage, firms that have competitive disadvantage or firms that have neither. Nonetheless, previous works in the field have fitted a single normal distribution to model firm performance. Here, we develop a new approach that distinguishes among performance generating mechanisms and allows the identification of firms with competitive advantage or disadvantage. Theorizing on the positive feedback loops by which firms with competitive advantage have facilitated access to acquire new resources, we proposed a distribution we believe data on firm performance should follow. We illustrate our model by assessing its fit to data on firm performance, addressing its theoretical implications and comparing it to previous works.
Resumo:
We study the statistical distribution of firm size for USA and Brazilian publicly traded firms through the Zipf plot technique. Sale size is used to measure firm size. The Brazilian firm size distribution is given by a log-normal distribution without any adjustable parameter. However, we also need to consider different parameters of log-normal distribution for the largest firms in the distribution, which are mostly foreign firms. The log-normal distribution has to be gradually truncated after a certain critical value for USA firms. Therefore, the original hypothesis of proportional effect proposed by Gibrat is valid with some modification for very large firms. We also consider the possible mechanisms behind this distribution. (c) 2006 Published by Elsevier B.V.
Resumo:
Item response theory (IRT) comprises a set of statistical models which are useful in many fields, especially when there is an interest in studying latent variables (or latent traits). Usually such latent traits are assumed to be random variables and a convenient distribution is assigned to them. A very common choice for such a distribution has been the standard normal. Recently, Azevedo et al. [Bayesian inference for a skew-normal IRT model under the centred parameterization, Comput. Stat. Data Anal. 55 (2011), pp. 353-365] proposed a skew-normal distribution under the centred parameterization (SNCP) as had been studied in [R. B. Arellano-Valle and A. Azzalini, The centred parametrization for the multivariate skew-normal distribution, J. Multivariate Anal. 99(7) (2008), pp. 1362-1382], to model the latent trait distribution. This approach allows one to represent any asymmetric behaviour concerning the latent trait distribution. Also, they developed a Metropolis-Hastings within the Gibbs sampling (MHWGS) algorithm based on the density of the SNCP. They showed that the algorithm recovers all parameters properly. Their results indicated that, in the presence of asymmetry, the proposed model and the estimation algorithm perform better than the usual model and estimation methods. Our main goal in this paper is to propose another type of MHWGS algorithm based on a stochastic representation (hierarchical structure) of the SNCP studied in [N. Henze, A probabilistic representation of the skew-normal distribution, Scand. J. Statist. 13 (1986), pp. 271-275]. Our algorithm has only one Metropolis-Hastings step, in opposition to the algorithm developed by Azevedo et al., which has two such steps. This not only makes the implementation easier but also reduces the number of proposal densities to be used, which can be a problem in the implementation of MHWGS algorithms, as can be seen in [R.J. Patz and B.W. Junker, A straightforward approach to Markov Chain Monte Carlo methods for item response models, J. Educ. Behav. Stat. 24(2) (1999), pp. 146-178; R. J. Patz and B. W. Junker, The applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses, J. Educ. Behav. Stat. 24(4) (1999), pp. 342-366; A. Gelman, G.O. Roberts, and W.R. Gilks, Efficient Metropolis jumping rules, Bayesian Stat. 5 (1996), pp. 599-607]. Moreover, we consider a modified beta prior (which generalizes the one considered in [3]) and a Jeffreys prior for the asymmetry parameter. Furthermore, we study the sensitivity of such priors as well as the use of different kernel densities for this parameter. Finally, we assess the impact of the number of examinees, number of items and the asymmetry level on the parameter recovery. Results of the simulation study indicated that our approach performed equally as well as that in [3], in terms of parameter recovery, mainly using the Jeffreys prior. Also, they indicated that the asymmetry level has the highest impact on parameter recovery, even though it is relatively small. A real data analysis is considered jointly with the development of model fitting assessment tools. The results are compared with the ones obtained by Azevedo et al. The results indicate that using the hierarchical approach allows us to implement MCMC algorithms more easily, it facilitates diagnosis of the convergence and also it can be very useful to fit more complex skew IRT models.
Resumo:
There is an emerging interest in modeling spatially correlated survival data in biomedical and epidemiological studies. In this paper, we propose a new class of semiparametric normal transformation models for right censored spatially correlated survival data. This class of models assumes that survival outcomes marginally follow a Cox proportional hazard model with unspecified baseline hazard, and their joint distribution is obtained by transforming survival outcomes to normal random variables, whose joint distribution is assumed to be multivariate normal with a spatial correlation structure. A key feature of the class of semiparametric normal transformation models is that it provides a rich class of spatial survival models where regression coefficients have population average interpretation and the spatial dependence of survival times is conveniently modeled using the transformed variables by flexible normal random fields. We study the relationship of the spatial correlation structure of the transformed normal variables and the dependence measures of the original survival times. Direct nonparametric maximum likelihood estimation in such models is practically prohibited due to the high dimensional intractable integration of the likelihood function and the infinite dimensional nuisance baseline hazard parameter. We hence develop a class of spatial semiparametric estimating equations, which conveniently estimate the population-level regression coefficients and the dependence parameters simultaneously. We study the asymptotic properties of the proposed estimators, and show that they are consistent and asymptotically normal. The proposed method is illustrated with an analysis of data from the East Boston Ashma Study and its performance is evaluated using simulations.
Resumo:
The discovery that the epsilon 4 allele of the apolipoprotein E (apoE) gene is a putative risk factor for Alzheimer disease (AD) in the general population has highlighted the role of genetic influences in this extremely common and disabling illness. It has long been recognized that another genetic abnormality, trisomy 21 (Down syndrome), is associated with early and severe development of AD neuropathological lesions. It remains a challenge, however, to understand how these facts relate to the pathological changes in the brains of AD patients. We used computerized image analysis to examine the size distribution of one of the characteristic neuropathological lesions in AD, deposits of A beta peptide in senile plaques (SPs). Surprisingly, we find that a log-normal distribution fits the SP size distribution quite well, motivating a porous model of SP morphogenesis. We then analyzed SP size distribution curves in genotypically defined subgroups of AD patients. The data demonstrate that both apoE epsilon 4/AD and trisomy 21/AD lead to increased amyloid deposition, but by apparently different mechanisms. The size distribution curve is shifted toward larger plaques in trisomy 21/AD, probably reflecting increased A beta production. In apoE epsilon 4/AD, the size distribution is unchanged but the number of SP is increased compared to apoE epsilon 3, suggesting increased probability of SP initiation. These results demonstrate that subgroups of AD patients defined on the basis of molecular characteristics have quantitatively different neuropathological phenotypes.
Resumo:
The size frequency distributions of diffuse, primitive and cored senile plaques (SP) were studied in single sections of the temporal lobe from 10 patients with Alzheimer’s disease (AD). The size distribution curves were unimodal and positively skewed. The size distribution curve of the diffuse plaques was shifted towards larger plaques while those of the neuritic and cored plaques were shifted towards smaller plaques. The neuritic/diffuse plaque ratio was maximal in the 11 – 30 micron size class and the cored/ diffuse plaque ratio in the 21 – 30 micron size class. The size distribution curves of the three types of plaque deviated significantly from a log-normal distribution. Distributions expressed on a logarithmic scale were ‘leptokurtic’, i.e. with excess of observations near the mean. These results suggest that SP in AD grow to within a more restricted size range than predicted from a log-normal model. In addition, there appear to be differences in the patterns of growth of diffuse, primitive and cored plaques. If neuritic and cored plaques develop from earlier diffuse plaques, then smaller diffuse plaques are more likely to be converted to mature plaques.
Resumo:
This dissertation proposes statistical methods to formulate, estimate and apply complex transportation models. Two main problems are part of the analyses conducted and presented in this dissertation. The first method solves an econometric problem and is concerned with the joint estimation of models that contain both discrete and continuous decision variables. The use of ordered models along with a regression is proposed and their effectiveness is evaluated with respect to unordered models. Procedure to calculate and optimize the log-likelihood functions of both discrete-continuous approaches are derived, and difficulties associated with the estimation of unordered models explained. Numerical approximation methods based on the Genz algortithm are implemented in order to solve the multidimensional integral associated with the unordered modeling structure. The problems deriving from the lack of smoothness of the probit model around the maximum of the log-likelihood function, which makes the optimization and the calculation of standard deviations very difficult, are carefully analyzed. A methodology to perform out-of-sample validation in the context of a joint model is proposed. Comprehensive numerical experiments have been conducted on both simulated and real data. In particular, the discrete-continuous models are estimated and applied to vehicle ownership and use models on data extracted from the 2009 National Household Travel Survey. The second part of this work offers a comprehensive statistical analysis of free-flow speed distribution; the method is applied to data collected on a sample of roads in Italy. A linear mixed model that includes speed quantiles in its predictors is estimated. Results show that there is no road effect in the analysis of free-flow speeds, which is particularly important for model transferability. A very general framework to predict random effects with few observations and incomplete access to model covariates is formulated and applied to predict the distribution of free-flow speed quantiles. The speed distribution of most road sections is successfully predicted; jack-knife estimates are calculated and used to explain why some sections are poorly predicted. Eventually, this work contributes to the literature in transportation modeling by proposing econometric model formulations for discrete-continuous variables, more efficient methods for the calculation of multivariate normal probabilities, and random effects models for free-flow speed estimation that takes into account the survey design. All methods are rigorously validated on both real and simulated data.
Resumo:
Suppose two or more variables are jointly normally distributed. If there is a common relationship between these variables it would be very important to quantify this relationship by a parameter called the correlation coefficient which measures its strength, and the use of it can develop an equation for predicting, and ultimately draw testable conclusion about the parent population. This research focused on the correlation coefficient ρ for the bivariate and trivariate normal distribution when equal variances and equal covariances are considered. Particularly, we derived the maximum Likelihood Estimators (MLE) of the distribution parameters assuming all of them are unknown, and we studied the properties and asymptotic distribution of . Showing this asymptotic normality, we were able to construct confidence intervals of the correlation coefficient ρ and test hypothesis about ρ. With a series of simulations, the performance of our new estimators were studied and were compared with those estimators that already exist in the literature. The results indicated that the MLE has a better or similar performance than the others.
Resumo:
A deterministic mathematical model for steady-state unidirectional solidification is proposed to predict the columnar-to-equiaxed transition. In the model, which is an extension to the classic model proposed by Hunt [Hunt JD. Mater Sci Eng 1984;65:75], equiaxed grains nucleate according to either a normal or a log-normal distribution of nucleation undercoolings. Growth maps are constructed, indicating either columnar or equiaxed solidification as a function of the velocity of isotherms and temperature gradient. The fields A columnar and equiaxed growth change significantly with the spread of the nucleation undercooling distribution. Increasing the spread Favors columnar solidification if the dimensionless velocity of the isotherms is larger than 1. For a velocity less than 1, however, equiaxed solidification is initially favored, but columnar solidification is enhanced for a larger increase in the spread. This behavior was confirmed by a stochastic model, which showed that an increase in the distribution spread Could change the grain structure from completely columnar to 50% columnar grains. (c) 2008 Acta Materialia Inc. Published by Elsevier Ltd. All rights reserved.
Resumo:
Over the years, crop insurance programs became the focus of agricultural policy in the USA, Spain, Mexico, and more recently in Brazil. Given the increasing interest in insurance, accurate calculation of the premium rate is of great importance. We address the crop-yield distribution issue and its implications in pricing an insurance contract considering the dynamic structure of the data and incorporating the spatial correlation in the Hierarchical Bayesian framework. Results show that empirical (insurers) rates are higher in low risk areas and lower in high risk areas. Such methodological improvement is primarily important in situations of limited data.
Resumo:
We introduce the log-beta Weibull regression model based on the beta Weibull distribution (Famoye et al., 2005; Lee et al., 2007). We derive expansions for the moment generating function which do not depend on complicated functions. The new regression model represents a parametric family of models that includes as sub-models several widely known regression models that can be applied to censored survival data. We employ a frequentist analysis, a jackknife estimator, and a parametric bootstrap for the parameters of the proposed model. We derive the appropriate matrices for assessing local influences on the parameter estimates under different perturbation schemes and present some ways to assess global influences. Further, for different parameter settings, sample sizes, and censoring percentages, several simulations are performed. In addition, the empirical distribution of some modified residuals are displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be extended to a modified deviance residual in the proposed regression model applied to censored data. We define martingale and deviance residuals to evaluate the model assumptions. The extended regression model is very useful for the analysis of real data and could give more realistic fits than other special regression models.
Resumo:
Beyond the classical statistical approaches (determination of basic statistics, regression analysis, ANOVA, etc.) a new set of applications of different statistical techniques has increasingly gained relevance in the analysis, processing and interpretation of data concerning the characteristics of forest soils. This is possible to be seen in some of the recent publications in the context of Multivariate Statistics. These new methods require additional care that is not always included or refered in some approaches. In the particular case of geostatistical data applications it is necessary, besides to geo-reference all the data acquisition, to collect the samples in regular grids and in sufficient quantity so that the variograms can reflect the spatial distribution of soil properties in a representative manner. In the case of the great majority of Multivariate Statistics techniques (Principal Component Analysis, Correspondence Analysis, Cluster Analysis, etc.) despite the fact they do not require in most cases the assumption of normal distribution, they however need a proper and rigorous strategy for its utilization. In this work, some reflections about these methodologies and, in particular, about the main constraints that often occur during the information collecting process and about the various linking possibilities of these different techniques will be presented. At the end, illustrations of some particular cases of the applications of these statistical methods will also be presented.
Resumo:
In a seminal paper, Aitchison and Lauder (1985) introduced classical kernel densityestimation techniques in the context of compositional data analysis. Indeed, they gavetwo options for the choice of the kernel to be used in the kernel estimator. One ofthese kernels is based on the use the alr transformation on the simplex SD jointly withthe normal distribution on RD-1. However, these authors themselves recognized thatthis method has some deficiencies. A method for overcoming these dificulties based onrecent developments for compositional data analysis and multivariate kernel estimationtheory, combining the ilr transformation with the use of the normal density with a fullbandwidth matrix, was recently proposed in Martín-Fernández, Chacón and Mateu-Figueras (2006). Here we present an extensive simulation study that compares bothmethods in practice, thus exploring the finite-sample behaviour of both estimators