924 resultados para Multivariate statistics
Some characterization problems associated with the bivariate exponential and geometric distributions
Resumo:
It is highly desirable that any multivariate distribution possessescharacteristic properties that are generalisation in some sense of the corresponding results in the univariate case. Therefore it is of interest to examine whether a multivariate distribution can admit such characterizations. In the exponential context, the question to be answered is, in what meaning— ful way can one extend the unique properties in the univariate case in a bivariate set up? Since the lack of memory property is the best studied and most useful property of the exponential law, our first endeavour in the present thesis, is to suitably extend this property and its equivalent forms so as to characterize the Gumbel's bivariate exponential distribution. Though there are many forms of bivariate exponential distributions, a matching interest has not been shown in developing corresponding discrete versions in the form of bivariate geometric distributions. Accordingly, attempt is also made to introduce the geometric version of the Gumbel distribution and examine several of its characteristic properties. A major area where exponential models are successfully applied being reliability theory, we also look into the role of these bivariate laws in that context. The present thesis is organised into five Chapters
Resumo:
Humans distinguish materials such as metal, plastic, and paper effortlessly at a glance. Traditional computer vision systems cannot solve this problem at all. Recognizing surface reflectance properties from a single photograph is difficult because the observed image depends heavily on the amount of light incident from every direction. A mirrored sphere, for example, produces a different image in every environment. To make matters worse, two surfaces with different reflectance properties could produce identical images. The mirrored sphere simply reflects its surroundings, so in the right artificial setting, it could mimic the appearance of a matte ping-pong ball. Yet, humans possess an intuitive sense of what materials typically "look like" in the real world. This thesis develops computational algorithms with a similar ability to recognize reflectance properties from photographs under unknown, real-world illumination conditions. Real-world illumination is complex, with light typically incident on a surface from every direction. We find, however, that real-world illumination patterns are not arbitrary. They exhibit highly predictable spatial structure, which we describe largely in the wavelet domain. Although they differ in several respects from the typical photographs, illumination patterns share much of the regularity described in the natural image statistics literature. These properties of real-world illumination lead to predictable image statistics for a surface with given reflectance properties. We construct a system that classifies a surface according to its reflectance from a single photograph under unknown illuminination. Our algorithm learns relationships between surface reflectance and certain statistics computed from the observed image. Like the human visual system, we solve the otherwise underconstrained inverse problem of reflectance estimation by taking advantage of the statistical regularity of illumination. For surfaces with homogeneous reflectance properties and known geometry, our system rivals human performance.
Resumo:
We formulate density estimation as an inverse operator problem. We then use convergence results of empirical distribution functions to true distribution functions to develop an algorithm for multivariate density estimation. The algorithm is based upon a Support Vector Machine (SVM) approach to solving inverse operator problems. The algorithm is implemented and tested on simulated data from different distributions and different dimensionalities, gaussians and laplacians in $R^2$ and $R^{12}$. A comparison in performance is made with Gaussian Mixture Models (GMMs). Our algorithm does as well or better than the GMMs for the simulations tested and has the added advantage of being automated with respect to parameters.
Resumo:
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples -- in particular the regression problem of approximating a multivariate function from sparse data. We present both formulations in a unified framework, namely in the context of Vapnik's theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics.
Resumo:
These notes have been prepared as support to a short course on compositional data analysis. Their aim is to transmit the basic concepts and skills for simple applications, thus setting the premises for more advanced projects
Resumo:
The Aitchison vector space structure for the simplex is generalized to a Hilbert space structure A2(P) for distributions and likelihoods on arbitrary spaces. Central notations of statistics, such as Information or Likelihood, can be identified in the algebraical structure of A2(P) and their corresponding notions in compositional data analysis, such as Aitchison distance or centered log ratio transform. In this way very elaborated aspects of mathematical statistics can be understood easily in the light of a simple vector space structure and of compositional data analysis. E.g. combination of statistical information such as Bayesian updating, combination of likelihood and robust M-estimation functions are simple additions/ perturbations in A2(Pprior). Weighting observations corresponds to a weighted addition of the corresponding evidence. Likelihood based statistics for general exponential families turns out to have a particularly easy interpretation in terms of A2(P). Regular exponential families form finite dimensional linear subspaces of A2(P) and they correspond to finite dimensional subspaces formed by their posterior in the dual information space A2(Pprior). The Aitchison norm can identified with mean Fisher information. The closing constant itself is identified with a generalization of the cummulant function and shown to be Kullback Leiblers directed information. Fisher information is the local geometry of the manifold induced by the A2(P) derivative of the Kullback Leibler information and the space A2(P) can therefore be seen as the tangential geometry of statistical inference at the distribution P. The discussion of A2(P) valued random variables, such as estimation functions or likelihoods, give a further interpretation of Fisher information as the expected squared norm of evidence and a scale free understanding of unbiased reasoning
Resumo:
One of the disadvantages of old age is that there is more past than future: this, however, may be turned into an advantage if the wealth of experience and, hopefully, wisdom gained in the past can be reflected upon and throw some light on possible future trends. To an extent, then, this talk is necessarily personal, certainly nostalgic, but also self critical and inquisitive about our understanding of the discipline of statistics. A number of almost philosophical themes will run through the talk: search for appropriate modelling in relation to the real problem envisaged, emphasis on sensible balances between simplicity and complexity, the relative roles of theory and practice, the nature of communication of inferential ideas to the statistical layman, the inter-related roles of teaching, consultation and research. A list of keywords might be: identification of sample space and its mathematical structure, choices between transform and stay, the role of parametric modelling, the role of a sample space metric, the underused hypothesis lattice, the nature of compositional change, particularly in relation to the modelling of processes. While the main theme will be relevance to compositional data analysis we shall point to substantial implications for general multivariate analysis arising from experience of the development of compositional data analysis…
Resumo:
The literature related to skew–normal distributions has grown rapidly in recent years but at the moment few applications concern the description of natural phenomena with this type of probability models, as well as the interpretation of their parameters. The skew–normal distributions family represents an extension of the normal family to which a parameter (λ) has been added to regulate the skewness. The development of this theoretical field has followed the general tendency in Statistics towards more flexible methods to represent features of the data, as adequately as possible, and to reduce unrealistic assumptions as the normality that underlies most methods of univariate and multivariate analysis. In this paper an investigation on the shape of the frequency distribution of the logratio ln(Cl−/Na+) whose components are related to waters composition for 26 wells, has been performed. Samples have been collected around the active center of Vulcano island (Aeolian archipelago, southern Italy) from 1977 up to now at time intervals of about six months. Data of the logratio have been tentatively modeled by evaluating the performance of the skew–normal model for each well. Values of the λ parameter have been compared by considering temperature and spatial position of the sampling points. Preliminary results indicate that changes in λ values can be related to the nature of environmental processes affecting the data
Resumo:
Precision of released figures is not only an important quality feature of official statistics, it is also essential for a good understanding of the data. In this paper we show a case study of how precision could be conveyed if the multivariate nature of data has to be taken into account. In the official release of the Swiss earnings structure survey, the total salary is broken down into several wage components. We follow Aitchison's approach for the analysis of compositional data, which is based on logratios of components. We first present diferent multivariate analyses of the compositional data whereby the wage components are broken down by economic activity classes. Then we propose a number of ways to assess precision
Resumo:
A compositional time series is obtained when a compositional data vector is observed at different points in time. Inherently, then, a compositional time series is a multivariate time series with important constraints on the variables observed at any instance in time. Although this type of data frequently occurs in situations of real practical interest, a trawl through the statistical literature reveals that research in the field is very much in its infancy and that many theoretical and empirical issues still remain to be addressed. Any appropriate statistical methodology for the analysis of compositional time series must take into account the constraints which are not allowed for by the usual statistical techniques available for analysing multivariate time series. One general approach to analyzing compositional time series consists in the application of an initial transform to break the positive and unit sum constraints, followed by the analysis of the transformed time series using multivariate ARIMA models. In this paper we discuss the use of the additive log-ratio, centred log-ratio and isometric log-ratio transforms. We also present results from an empirical study designed to explore how the selection of the initial transform affects subsequent multivariate ARIMA modelling as well as the quality of the forecasts
Resumo:
In order to obtain a high-resolution Pleistocene stratigraphy, eleven continuously cored boreholes, 100 to 220m deep were drilled in the northern part of the Po Plain by Regione Lombardia in the last five years. Quantitative provenance analysis (QPA, Weltje and von Eynatten, 2004) of Pleistocene sands was carried out by using multivariate statistical analysis (principal component analysis, PCA, and similarity analysis) on an integrated data set, including high-resolution bulk petrography and heavy-mineral analyses on Pleistocene sands and of 250 major and minor modern rivers draining the southern flank of the Alps from West to East (Garzanti et al, 2004; 2006). Prior to the onset of major Alpine glaciations, metamorphic and quartzofeldspathic detritus from the Western and Central Alps was carried from the axial belt to the Po basin longitudinally parallel to the SouthAlpine belt by a trunk river (Vezzoli and Garzanti, 2008). This scenario rapidly changed during the marine isotope stage 22 (0.87 Ma), with the onset of the first major Pleistocene glaciation in the Alps (Muttoni et al, 2003). PCA and similarity analysis from core samples show that the longitudinal trunk river at this time was shifted southward by the rapid southward and westward progradation of transverse alluvial river systems fed from the Central and Southern Alps. Sediments were transported southward by braided river systems as well as glacial sediments transported by Alpine valley glaciers invaded the alluvial plain. Kew words: Detrital modes; Modern sands; Provenance; Principal Components Analysis; Similarity, Canberra Distance; palaeodrainage
Resumo:
Many multivariate methods that are apparently distinct can be linked by introducing one or more parameters in their definition. Methods that can be linked in this way are correspondence analysis, unweighted or weighted logratio analysis (the latter also known as "spectral mapping"), nonsymmetric correspondence analysis, principal component analysis (with and without logarithmic transformation of the data) and multidimensional scaling. In this presentation I will show how several of these methods, which are frequently used in compositional data analysis, may be linked through parametrizations such as power transformations, linear transformations and convex linear combinations. Since the methods of interest here all lead to visual maps of data, a "movie" can be made where where the linking parameter is allowed to vary in small steps: the results are recalculated "frame by frame" and one can see the smooth change from one method to another. Several of these "movies" will be shown, giving a deeper insight into the similarities and differences between these methods
Resumo:
What test for your data? SPSS Statistics Coach can help
Resumo:
Exercises and solutions for an introductory statistics course for MSc students. Diagrams for the questions are all together in the support.zip file, as .eps files
Resumo:
A causa de los conflictos armados, como el de Colombia, se han desplazado por la fuerza a millones de personas, entre ellas una importante parte de la población infantil. Este estudio tuvo como objetivo evaluar la salud mental de los niños desplazados internos en edad preescolar en Bogotá Colombia, e identificar los determinantes de la salud mental en estos niños. Métodos: Estudio transversal realizado entre 279 niños que asisten a cuatro jardines infantiles en un barrio marginal de Bogotá. La salud mental infantil se evaluó con el instrumento validado de Comportamiento Infantil (CBCL) 1,5-5 años, aplicados a padres y cuidadores. Se realizo un análisis univariado y multivariado de regresión logística para evaluar la asociación entre el desplazamiento y la salud mental de los niños y para identificar las relaciones con la salud mental en los niños desplazados. Resultados: los Niños desplazados (n = 90) se identificaron con más frecuencia sobre los puntos de corte límite para las escalas CBCL que los no desplazados (n = 189) (por ejemplo, problemas totales 46,7 vs 22,8%;p \ 0,001). La asociación entre el desplazamiento y la presencia de problemas CBCL totales se mantuvo después del ajuste por factores socio-demográficos (OR Ajustado 3.3 del 95%: 1,5; 6,9). Donde la salud mental del cuidador explica en parte la asociación. En los niños desplazados, la salud mental del cuidador (p \ 0,01) y el funcionamiento familiar (p \ 0,01) se asociaron independientemente con la salud mental de los niños. La exposición a eventos traumáticos y el apoyo social también se asociaron con la salud mental del niño, sin embargo, las asociaciones no fueron independientes. Conclusión: En este barrio marginal de Bogotá, los niños en edad preescolar registrados como desplazados internos presentan peor salud mental que los no desplazados. El funcionamiento familiar y la salud mental del cuidador fueron fuerte e independientemente asociados con la salud mental de los niños y niñas desplazados.