38 resultados para multivariate binary data
em Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Resumo:
The use of simple and multiple correspondence analysis is well-established in socialscience research for understanding relationships between two or more categorical variables.By contrast, canonical correspondence analysis, which is a correspondence analysis with linearrestrictions on the solution, has become one of the most popular multivariate techniques inecological research. Multivariate ecological data typically consist of frequencies of observedspecies across a set of sampling locations, as well as a set of observed environmental variablesat the same locations. In this context the principal dimensions of the biological variables aresought in a space that is constrained to be related to the environmental variables. Thisrestricted form of correspondence analysis has many uses in social science research as well,as is demonstrated in this paper. We first illustrate the result that canonical correspondenceanalysis of an indicator matrix, restricted to be related an external categorical variable, reducesto a simple correspondence analysis of a set of concatenated (or stacked ) tables. Then weshow how canonical correspondence analysis can be used to focus on, or partial out, aparticular set of response categories in sample survey data. For example, the method can beused to partial out the influence of missing responses, which usually dominate the results of amultiple correspondence analysis.
Resumo:
In the analysis of multivariate categorical data, typically the analysis of questionnaire data, it is often advantageous, for substantive and technical reasons, to analyse a subset of response categories. In multiple correspondence analysis, where each category is coded as a column of an indicator matrix or row and column of Burt matrix, it is not correct to simply analyse the corresponding submatrix of data, since the whole geometric structure is different for the submatrix . A simple modification of the correspondence analysis algorithm allows the overall geometric structure of the complete data set to be retained while calculating the solution for the selected subset of points. This strategy is useful for analysing patterns of response amongst any subset of categories and relating these patterns to demographic factors, especially for studying patterns of particular responses such as missing and neutral responses. The methodology is illustrated using data from the International Social Survey Program on Family and Changing Gender Roles in 1994.
Resumo:
The generalization of simple (two-variable) correspondence analysis to more than two categorical variables, commonly referred to as multiple correspondence analysis, is neither obvious nor well-defined. We present two alternative ways of generalizing correspondence analysis, one based on the quantification of the variables and intercorrelation relationships, and the other based on the geometric ideas of simple correspondence analysis. We propose a version of multiple correspondence analysis, with adjusted principal inertias, as the method of choice for the geometric definition, since it contains simple correspondence analysis as an exact special case, which is not the situation of the standard generalizations. We also clarify the issue of supplementary point representation and the properties of joint correspondence analysis, a method that visualizes all two-way relationships between the variables. The methodology is illustrated using data on attitudes to science from the International Social Survey Program on Environment in 1993.
Resumo:
Many multivariate methods that are apparently distinct can be linked by introducing oneor more parameters in their definition. Methods that can be linked in this way arecorrespondence analysis, unweighted or weighted logratio analysis (the latter alsoknown as "spectral mapping"), nonsymmetric correspondence analysis, principalcomponent analysis (with and without logarithmic transformation of the data) andmultidimensional scaling. In this presentation I will show how several of thesemethods, which are frequently used in compositional data analysis, may be linkedthrough parametrizations such as power transformations, linear transformations andconvex linear combinations. Since the methods of interest here all lead to visual mapsof data, a "movie" can be made where where the linking parameter is allowed to vary insmall steps: the results are recalculated "frame by frame" and one can see the smoothchange from one method to another. Several of these "movies" will be shown, giving adeeper insight into the similarities and differences between these methods
Resumo:
We consider the application of normal theory methods to the estimation and testing of a general type of multivariate regressionmodels with errors--in--variables, in the case where various data setsare merged into a single analysis and the observable variables deviatepossibly from normality. The various samples to be merged can differ on the set of observable variables available. We show that there is a convenient way to parameterize the model so that, despite the possiblenon--normality of the data, normal--theory methods yield correct inferencesfor the parameters of interest and for the goodness--of--fit test. Thetheory described encompasses both the functional and structural modelcases, and can be implemented using standard software for structuralequations models, such as LISREL, EQS, LISCOMP, among others. An illustration with Monte Carlo data is presented.
Resumo:
Many multivariate methods that are apparently distinct can be linked by introducing oneor more parameters in their definition. Methods that can be linked in this way arecorrespondence analysis, unweighted or weighted logratio analysis (the latter alsoknown as "spectral mapping"), nonsymmetric correspondence analysis, principalcomponent analysis (with and without logarithmic transformation of the data) andmultidimensional scaling. In this presentation I will show how several of thesemethods, which are frequently used in compositional data analysis, may be linkedthrough parametrizations such as power transformations, linear transformations andconvex linear combinations. Since the methods of interest here all lead to visual mapsof data, a "movie" can be made where where the linking parameter is allowed to vary insmall steps: the results are recalculated "frame by frame" and one can see the smoothchange from one method to another. Several of these "movies" will be shown, giving adeeper insight into the similarities and differences between these methods.
Resumo:
Panel data can be arranged into a matrix in two ways, called 'long' and 'wide' formats (LFand WF). The two formats suggest two alternative model approaches for analyzing paneldata: (i) univariate regression with varying intercept; and (ii) multivariate regression withlatent variables (a particular case of structural equation model, SEM). The present papercompares the two approaches showing in which circumstances they yield equivalent?insome cases, even numerically equal?results. We show that the univariate approach givesresults equivalent to the multivariate approach when restrictions of time invariance (inthe paper, the TI assumption) are imposed on the parameters of the multivariate model.It is shown that the restrictions implicit in the univariate approach can be assessed bychi-square difference testing of two nested multivariate models. In addition, commontests encountered in the econometric analysis of panel data, such as the Hausman test, areshown to have an equivalent representation as chi-square difference tests. Commonalitiesand differences between the univariate and multivariate approaches are illustrated usingan empirical panel data set of firms' profitability as well as a simulated panel data.
Resumo:
Within the special geometry of the simplex, the sample space of compositional data, compositional orthonormal coordinates allow the application of any multivariate statistical approach. The search for meaningful coordinates has suggested balances (between two groups of parts)—based on a sequential binary partition of a D-part composition—and a representation in form of a CoDa-dendrogram. Projected samples are represented in a dendrogram-like graph showing: (a) the way of grouping parts; (b) the explanatory role of subcompositions generated in the partition process; (c) the decomposition of the variance; (d) the center and quantiles of each balance. The representation is useful for the interpretation of balances and to describe the sample in a single diagram independently of the number of parts. Also, samples of two or more populations, as well as several samples from the same population, can be represented in the same graph, as long as they have the same parts registered. The approach is illustrated with an example of food consumption in Europe
Resumo:
This paper proposes a contemporaneous-threshold multivariate smooth transition autoregressive (C-MSTAR) model in which the regime weights depend on the ex ante probabilities that latent regime-specific variables exceed certain threshold values. A key feature of the model is that the transition function depends on all the parameters of the model as well as on the data. Since the mixing weights are also a function of the regime-specific innovation covariance matrix, the model can account for contemporaneous regime-specific co-movements of the variables. The stability and distributional properties of the proposed model are discussed, as well as issues of estimation, testing and forecasting. The practical usefulness of the C-MSTAR model is illustrated by examining the relationship between US stock prices and interest rates.
Resumo:
In order to obtain a high-resolution Pleistocene stratigraphy, eleven continuouslycored boreholes, 100 to 220m deep were drilled in the northern part of the PoPlain by Regione Lombardia in the last five years. Quantitative provenanceanalysis (QPA, Weltje and von Eynatten, 2004) of Pleistocene sands was carriedout by using multivariate statistical analysis (principal component analysis, PCA,and similarity analysis) on an integrated data set, including high-resolution bulkpetrography and heavy-mineral analyses on Pleistocene sands and of 250 majorand minor modern rivers draining the southern flank of the Alps from West toEast (Garzanti et al, 2004; 2006). Prior to the onset of major Alpine glaciations,metamorphic and quartzofeldspathic detritus from the Western and Central Alpswas carried from the axial belt to the Po basin longitudinally parallel to theSouthAlpine belt by a trunk river (Vezzoli and Garzanti, 2008). This scenariorapidly changed during the marine isotope stage 22 (0.87 Ma), with the onset ofthe first major Pleistocene glaciation in the Alps (Muttoni et al, 2003). PCA andsimilarity analysis from core samples show that the longitudinal trunk river at thistime was shifted southward by the rapid southward and westward progradation oftransverse alluvial river systems fed from the Central and Southern Alps.Sediments were transported southward by braided river systems as well as glacialsediments transported by Alpine valley glaciers invaded the alluvial plain.Kew words: Detrital modes; Modern sands; Provenance; Principal ComponentsAnalysis; Similarity, Canberra Distance; palaeodrainage
Resumo:
Planners in public and private institutions would like coherent forecasts of the components of age-specic mortality, such as causes of death. This has been di cult toachieve because the relative values of the forecast components often fail to behave ina way that is coherent with historical experience. In addition, when the group forecasts are combined the result is often incompatible with an all-groups forecast. It hasbeen shown that cause-specic mortality forecasts are pessimistic when compared withall-cause forecasts (Wilmoth, 1995). This paper abandons the conventional approachof using log mortality rates and forecasts the density of deaths in the life table. Sincethese values obey a unit sum constraint for both conventional single-decrement life tables (only one absorbing state) and multiple-decrement tables (more than one absorbingstate), they are intrinsically relative rather than absolute values across decrements aswell as ages. Using the methods of Compositional Data Analysis pioneered by Aitchison(1986), death densities are transformed into the real space so that the full range of multivariate statistics can be applied, then back-transformed to positive values so that theunit sum constraint is honoured. The structure of the best-known, single-decrementmortality-rate forecasting model, devised by Lee and Carter (1992), is expressed incompositional form and the results from the two models are compared. The compositional model is extended to a multiple-decrement form and used to forecast mortalityby cause of death for Japan
Resumo:
Self-organizing maps (Kohonen 1997) is a type of artificial neural network developedto explore patterns in high-dimensional multivariate data. The conventional versionof the algorithm involves the use of Euclidean metric in the process of adaptation ofthe model vectors, thus rendering in theory a whole methodology incompatible withnon-Euclidean geometries.In this contribution we explore the two main aspects of the problem:1. Whether the conventional approach using Euclidean metric can shed valid resultswith compositional data.2. If a modification of the conventional approach replacing vectorial sum and scalarmultiplication by the canonical operators in the simplex (i.e. perturbation andpowering) can converge to an adequate solution.Preliminary tests showed that both methodologies can be used on compositional data.However, the modified version of the algorithm performs poorer than the conventionalversion, in particular, when the data is pathological. Moreover, the conventional ap-proach converges faster to a solution, when data is \well-behaved".Key words: Self Organizing Map; Artificial Neural networks; Compositional data
Resumo:
Factor analysis as frequent technique for multivariate data inspection is widely used also for compositional data analysis. The usual way is to use a centered logratio (clr)transformation to obtain the random vector y of dimension D. The factor model istheny = Λf + e (1)with the factors f of dimension k & D, the error term e, and the loadings matrix Λ.Using the usual model assumptions (see, e.g., Basilevsky, 1994), the factor analysismodel (1) can be written asCov(y) = ΛΛT + ψ (2)where ψ = Cov(e) has a diagonal form. The diagonal elements of ψ as well as theloadings matrix Λ are estimated from an estimation of Cov(y).Given observed clr transformed data Y as realizations of the random vectory. Outliers or deviations from the idealized model assumptions of factor analysiscan severely effect the parameter estimation. As a way out, robust estimation ofthe covariance matrix of Y will lead to robust estimates of Λ and ψ in (2), seePison et al. (2003). Well known robust covariance estimators with good statisticalproperties, like the MCD or the S-estimators (see, e.g. Maronna et al., 2006), relyon a full-rank data matrix Y which is not the case for clr transformed data (see,e.g., Aitchison, 1986).The isometric logratio (ilr) transformation (Egozcue et al., 2003) solves thissingularity problem. The data matrix Y is transformed to a matrix Z by usingan orthonormal basis of lower dimension. Using the ilr transformed data, a robustcovariance matrix C(Z) can be estimated. The result can be back-transformed tothe clr space byC(Y ) = V C(Z)V Twhere the matrix V with orthonormal columns comes from the relation betweenthe clr and the ilr transformation. Now the parameters in the model (2) can beestimated (Basilevsky, 1994) and the results have a direct interpretation since thelinks to the original variables are still preserved.The above procedure will be applied to data from geochemistry. Our specialinterest is on comparing the results with those of Reimann et al. (2002) for the Kolaproject data
Resumo:
Compositional data naturally arises from the scientific analysis of the chemicalcomposition of archaeological material such as ceramic and glass artefacts. Data of thistype can be explored using a variety of techniques, from standard multivariate methodssuch as principal components analysis and cluster analysis, to methods based upon theuse of log-ratios. The general aim is to identify groups of chemically similar artefactsthat could potentially be used to answer questions of provenance.This paper will demonstrate work in progress on the development of a documentedlibrary of methods, implemented using the statistical package R, for the analysis ofcompositional data. R is an open source package that makes available very powerfulstatistical facilities at no cost. We aim to show how, with the aid of statistical softwaresuch as R, traditional exploratory multivariate analysis can easily be used alongside, orin combination with, specialist techniques of compositional data analysis.The library has been developed from a core of basic R functionality, together withpurpose-written routines arising from our own research (for example that reported atCoDaWork'03). In addition, we have included other appropriate publicly availabletechniques and libraries that have been implemented in R by other authors. Availablefunctions range from standard multivariate techniques through to various approaches tolog-ratio analysis and zero replacement. We also discuss and demonstrate a smallselection of relatively new techniques that have hitherto been little-used inarchaeometric applications involving compositional data. The application of the libraryto the analysis of data arising in archaeometry will be demonstrated; results fromdifferent analyses will be compared; and the utility of the various methods discussed
Resumo:
”compositions” is a new R-package for the analysis of compositional and positive data.It contains four classes corresponding to the four different types of compositional andpositive geometry (including the Aitchison geometry). It provides means for computation,plotting and high-level multivariate statistical analysis in all four geometries.These geometries are treated in an fully analogous way, based on the principle of workingin coordinates, and the object-oriented programming paradigm of R. In this way,called functions automatically select the most appropriate type of analysis as a functionof the geometry. The graphical capabilities include ternary diagrams and tetrahedrons,various compositional plots (boxplots, barplots, piecharts) and extensive graphical toolsfor principal components. Afterwards, ortion and proportion lines, straight lines andellipses in all geometries can be added to plots. The package is accompanied by ahands-on-introduction, documentation for every function, demos of the graphical capabilitiesand plenty of usage examples. It allows direct and parallel computation inall four vector spaces and provides the beginner with a copy-and-paste style of dataanalysis, while letting advanced users keep the functionality and customizability theydemand of R, as well as all necessary tools to add own analysis routines. A completeexample is included in the appendix