957 results for joint correspondence analysis


Relevance: 100.00%

Abstract:

The generalization of simple (two-variable) correspondence analysis to more than two categorical variables, commonly referred to as multiple correspondence analysis, is neither obvious nor well-defined. We present two alternative ways of generalizing correspondence analysis, one based on the quantification of the variables and intercorrelation relationships, and the other based on the geometric ideas of simple correspondence analysis. We propose a version of multiple correspondence analysis, with adjusted principal inertias, as the method of choice for the geometric definition, since it contains simple correspondence analysis as an exact special case, which is not the case for the standard generalizations. We also clarify the issue of supplementary point representation and the properties of joint correspondence analysis, a method that visualizes all two-way relationships between the variables. The methodology is illustrated using data on attitudes to science from the International Social Survey Program on Environment in 1993.
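
As a worked check of the special-case claim, the adjustment of principal inertias usually associated with this approach can be written as below. The notation is an assumption made here and does not come from the abstract: Q is the number of variables, \lambda_k is the k-th principal inertia of the Burt matrix, and \sigma_k is the k-th singular value of the simple correspondence analysis of the two-way table.

```latex
% Adjusted principal inertias (only for dimensions with \sqrt{\lambda_k} > 1/Q)
\lambda_k^{\mathrm{adj}}
  = \left(\frac{Q}{Q-1}\right)^{2}
    \left(\sqrt{\lambda_k}-\frac{1}{Q}\right)^{2},
  \qquad \sqrt{\lambda_k} > \frac{1}{Q}.

% Special case Q = 2, where \sqrt{\lambda_k} = (1+\sigma_k)/2:
\lambda_k^{\mathrm{adj}}
  = 4\left(\frac{1+\sigma_k}{2}-\frac{1}{2}\right)^{2}
  = \sigma_k^{2}.
```

With two variables the adjusted values therefore coincide with the principal inertias of the simple analysis, which is the sense in which the simple case is recovered exactly.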

Relevance: 100.00%

Abstract:

When continuous data are coded to categorical variables, two types of coding are possible: crisp coding in the form of indicator, or dummy, variables with values either 0 or 1; or fuzzy coding where each observation is transformed to a set of "degrees of membership" between 0 and 1, using so-called membership functions. It is well known that the correspondence analysis of crisp coded data, namely multiple correspondence analysis, yields principal inertias (eigenvalues) that considerably underestimate the quality of the solution in a low-dimensional space. Since the crisp data only code the categories to which each individual case belongs, an alternative measure of fit is simply to count how well these categories are predicted by the solution. Another approach is to consider multiple correspondence analysis equivalently as the analysis of the Burt matrix (i.e., the matrix of all two-way cross-tabulations of the categorical variables), and then perform a joint correspondence analysis to fit just the off-diagonal tables of the Burt matrix; the measure of fit is then computed as the quality of explaining these tables only. The correspondence analysis of fuzzy coded data, called "fuzzy multiple correspondence analysis", suffers from the same problem, albeit attenuated. Again, one can count how many correct predictions are made of the categories which have the highest degree of membership. But here one can also defuzzify the results of the analysis to obtain estimated values of the original data, and then calculate a measure of fit in the familiar percentage form, thanks to the resultant orthogonal decomposition of variance. Furthermore, if one thinks of fuzzy multiple correspondence analysis as explaining the two-way associations between variables, a fuzzy Burt matrix can be computed and the same strategy as in the crisp case can be applied to analyse the off-diagonal part of this matrix.
In this paper these alternative measures of fit are defined and applied to a data set of continuous meteorological variables, which are coded crisply and fuzzily into three categories. Measuring the fit is further discussed when the data set consists of a mixture of discrete and continuous variables.
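
The difference between crisp and fuzzy coding can be sketched as follows. This is a minimal illustration and not the paper's code: it assumes triangular membership functions with three hinge points (`lo`, `mid`, `hi`), and the function names are hypothetical. The crisp coding shown here simply picks the category with the largest membership, which is also an assumption.

```python
import numpy as np

def fuzzy_code_3(x, lo, mid, hi):
    """Fuzzy-code x into (low, medium, high) memberships.

    Assumes triangular membership functions hinged at lo < mid < hi;
    values outside [lo, hi] are clipped to the nearest hinge.
    """
    x = np.clip(np.asarray(x, dtype=float), lo, hi)
    low = np.where(x <= mid, (mid - x) / (mid - lo), 0.0)
    high = np.where(x >= mid, (x - mid) / (hi - mid), 0.0)
    medium = 1.0 - low - high  # memberships sum to 1 in each row
    return np.column_stack([low, medium, high])

def crisp_code_3(x, lo, mid, hi):
    """Crisp (0/1) coding: 1 for the category with the largest membership."""
    fuzzy = fuzzy_code_3(x, lo, mid, hi)
    crisp = np.zeros_like(fuzzy)
    crisp[np.arange(len(fuzzy)), fuzzy.argmax(axis=1)] = 1.0
    return crisp

def defuzzify(fuzzy, lo, mid, hi):
    """Map memberships back to the original scale via the hinge points."""
    return fuzzy @ np.array([lo, mid, hi], dtype=float)
```

With triangular memberships, defuzzifying the coded data returns the original (clipped) values exactly, which is what makes a variance-based measure of fit possible for fuzzy-coded data.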

Relevance: 100.00%

Abstract:

The generalization of simple correspondence analysis, for two categorical variables, to multiple correspondence analysis, where there may be three or more variables, is not straightforward from either a mathematical or a computational point of view. In this paper we detail the exact computational steps involved in performing a multiple correspondence analysis, including the special aspects of adjusting the principal inertias to correct the percentages of inertia, supplementary points and subset analysis. Furthermore, we give the algorithm for joint correspondence analysis where the cross-tabulations of all unique pairs of variables are analysed jointly. The code in the R language for every step of the computations is given, as well as the results of each computation.
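
The first computational steps can be sketched as follows. The paper gives its code in R; this is an independent NumPy illustration with hypothetical function names, assuming the data arrive as integer category codes. It builds the indicator matrix, forms the Burt matrix, and computes principal inertias from the singular value decomposition of the standardized residuals.

```python
import numpy as np

def indicator_matrix(data):
    """Turn an (n, Q) array of category codes into an n x J indicator matrix."""
    data = np.asarray(data)
    blocks = []
    for q in range(data.shape[1]):
        cats = np.unique(data[:, q])  # categories observed for variable q
        blocks.append((data[:, [q]] == cats[None, :]).astype(float))
    return np.hstack(blocks)

def principal_inertias(table):
    """Principal inertias of a correspondence analysis of `table`."""
    P = np.asarray(table, dtype=float)
    P = P / P.sum()
    r = P.sum(axis=1)  # row masses
    c = P.sum(axis=0)  # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    return np.linalg.svd(S, compute_uv=False) ** 2

# Small made-up data set: 8 respondents, 3 categorical variables.
data = np.array([[0, 0, 0], [0, 1, 1], [1, 2, 0], [1, 0, 1],
                 [0, 2, 1], [1, 1, 0], [0, 0, 1], [1, 2, 1]])
Z = indicator_matrix(data)
B = Z.T @ Z  # Burt matrix: all two-way cross-tabulations, diagonal blocks included
inertias_Z = principal_inertias(Z)
inertias_B = principal_inertias(B)
```

In this sketch the principal inertias of the Burt matrix are the squares of those of the indicator matrix, so the two forms of the analysis differ in scale but not in the axes they produce.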

Relevance: 100.00%

Abstract:

Understanding the genetic architecture of quantitative traits can greatly assist the design of strategies for their manipulation in plant-breeding programs. For a number of traits, genetic variation can be the result of segregation of a few major genes and many polygenes (minor genes). The joint segregation analysis (JSA) is a maximum-likelihood approach for fitting segregation models through the simultaneous use of phenotypic information from multiple generations. Our objective in this paper was to use computer simulation to quantify the power of the JSA method for testing the mixed-inheritance model for quantitative traits when it was applied to the six basic generations: both parents (P-1 and P-2), F-1, F-2, and both backcross generations (B-1 and B-2) derived from crossing the F-1 to each parent. A total of 1968 genetic model-experiment scenarios were considered in the simulation study to quantify the power of the method. Factors that interacted to influence the power of the JSA method to correctly detect genetic models were: (1) whether there were one or two major genes in combination with polygenes, (2) the heritability of the major genes and polygenes, (3) the level of dispersion of the major genes and polygenes between the two parents, and (4) the number of individuals examined in each generation (population size). The greatest levels of power were observed for the genetic models defined with simple inheritance; e.g., the power was greater than 90% for the one major gene model, regardless of the population size and major-gene heritability. Lower levels of power were observed for the genetic models with complex inheritance (major genes and polygenes), low heritability, small population sizes and a large dispersion of favourable genes among the two parents; e.g., the power was less than 5% for the two major-gene model with a heritability value of 0.3 and population sizes of 100 individuals. 
The JSA methodology was then applied to a previously studied sorghum data-set to investigate the genetic control of the putative drought resistance-trait osmotic adjustment in three crosses. The previous study concluded that there were two major genes segregating for osmotic adjustment in the three crosses. Application of the JSA method resulted in a change in the proposed genetic model. The presence of the two major genes was confirmed with the addition of an unspecified number of polygenes.

Relevance: 100.00%

Abstract:

This article jointly examines the differences of laboratory versions of the Dutch clock open auction, a sealed-bid auction to represent book building, and a two-stage sealed bid auction to proxy for the “competitive IPO”, a recent innovation used in a few European equity initial public offerings. We investigate pricing, seller allocation, and buyer welfare allocation efficiency and conclude that the book building emulation seems to be as price efficient as the Dutch auction, even after investor learning, whereas the competitive IPO is not price efficient, regardless of learning. The competitive IPO is the most seller allocative efficient method because it maximizes offer proceeds. The Dutch auction emerges as the most buyer welfare allocative efficient method. Underwriters are probably seeking pricing efficiency rather than seller or buyer welfare allocative efficiency and their discretionary pricing and allocation must be important since book building is prominent worldwide.

Relevance: 100.00%

Abstract:

We compare correspondence analysis to the logratio approach based on compositional data. We also compare correspondence analysis and an alternative approach using Hellinger distance, for representing categorical data in a contingency table. We propose a coefficient which globally measures the similarity between these approaches. This coefficient can be decomposed into several components, one component for each principal dimension, indicating the contribution of the dimensions to the difference between the two representations. These three methods of representation can produce quite similar results. One illustrative example is given.
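
To make the comparison concrete, the sketch below computes the two distances between row profiles of a contingency table: the chi-square distance that underlies correspondence analysis and the Hellinger distance. The function names are hypothetical, and the Hellinger distance is written without the 1/sqrt(2) scaling factor that some definitions include.

```python
import numpy as np

def row_profiles(table):
    """Rows of the table divided by their totals."""
    table = np.asarray(table, dtype=float)
    return table / table.sum(axis=1, keepdims=True)

def chi_square_distance(table, i, k):
    """Chi-square distance between row profiles i and k."""
    table = np.asarray(table, dtype=float)
    prof = row_profiles(table)
    c = table.sum(axis=0) / table.sum()  # column masses (average profile)
    return float(np.sqrt(np.sum((prof[i] - prof[k]) ** 2 / c)))

def hellinger_distance(table, i, k):
    """Hellinger distance between row profiles i and k (unscaled form)."""
    prof = row_profiles(table)
    diff = np.sqrt(prof[i]) - np.sqrt(prof[k])
    return float(np.sqrt(np.sum(diff ** 2)))

# Made-up table: rows 0 and 1 have identical profiles, row 2 differs.
table = np.array([[10, 20, 30], [20, 40, 60], [30, 10, 20]])
```

Both distances are zero for rows with identical profiles. They differ in how they treat columns: the chi-square distance divides by the column masses, while the Hellinger distance uses a square-root transformation and no weights.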

Relevance: 100.00%

Abstract:

Starting with logratio biplots for compositional data, which are based on the principle of subcompositional coherence, and then adding weights, as in correspondence analysis, we rediscover Lewi's spectral map and many connections to analyses of two-way tables of non-negative data. Thanks to the weighting, the method also achieves the property of distributional equivalence.

Relevance: 100.00%

Abstract:

The use of simple and multiple correspondence analysis is well-established in social science research for understanding relationships between two or more categorical variables. By contrast, canonical correspondence analysis, which is a correspondence analysis with linear restrictions on the solution, has become one of the most popular multivariate techniques in ecological research. Multivariate ecological data typically consist of frequencies of observed species across a set of sampling locations, as well as a set of observed environmental variables at the same locations. In this context the principal dimensions of the biological variables are sought in a space that is constrained to be related to the environmental variables. This restricted form of correspondence analysis has many uses in social science research as well, as is demonstrated in this paper. We first illustrate the result that canonical correspondence analysis of an indicator matrix, restricted to be related to an external categorical variable, reduces to a simple correspondence analysis of a set of concatenated (or stacked) tables. Then we show how canonical correspondence analysis can be used to focus on, or partial out, a particular set of response categories in sample survey data. For example, the method can be used to partial out the influence of missing responses, which usually dominate the results of a multiple correspondence analysis.
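
The set of concatenated tables mentioned above can be built directly from indicator matrices. The sketch below is an illustration under assumed inputs (integer category codes, hypothetical function names): it cross-tabulates an external categorical variable with each response variable and places the resulting tables side by side.

```python
import numpy as np

def one_hot(codes):
    """Indicator (dummy) matrix for a vector of category codes."""
    codes = np.asarray(codes)
    cats = np.unique(codes)
    return (codes[:, None] == cats[None, :]).astype(int)

def stacked_table(external, responses):
    """Cross-tabulate `external` with every column of `responses`.

    Returns the tables concatenated column-wise, computed as X^T Z where
    X and Z are the indicator matrices of the external variable and of
    the response variables respectively.
    """
    responses = np.asarray(responses)
    X = one_hot(external)
    Z = np.hstack([one_hot(responses[:, q]) for q in range(responses.shape[1])])
    return X.T @ Z

# Made-up data: 5 respondents, one external variable, two questions.
external = np.array([0, 0, 1, 1, 1])
responses = np.array([[0, 1], [1, 0], [0, 0], [1, 1], [0, 1]])
T = stacked_table(external, responses)
```

A simple correspondence analysis of `T` is then the analysis the abstract refers to; the sketch only covers the construction of the table, not the analysis itself.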

Relevance: 100.00%

Abstract:

We compare two methods for visualising contingency tables and develop a method called the ratio map which combines the good properties of both. The first is a biplot based on the logratio approach to compositional data analysis. This approach is founded on the principle of subcompositional coherence, which assures that results are invariant to considering subsets of the composition. The second approach, correspondence analysis, is based on the chi-square approach to contingency table analysis. A cornerstone of correspondence analysis is the principle of distributional equivalence, which assures invariance in the results when rows or columns with identical conditional proportions are merged. Both methods may be described as singular value decompositions of appropriately transformed matrices. Correspondence analysis includes a weighting of the rows and columns proportional to the margins of the table. If this idea of row and column weights is introduced into the logratio biplot, we obtain a method which obeys both principles of subcompositional coherence and distributional equivalence.
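
A weighted logratio analysis of this kind can be sketched as a singular value decomposition of the double-centred log table, with rows and columns weighted by the table margins. This is a minimal illustration with a hypothetical function name, assuming a strictly positive table; it is not the paper's implementation.

```python
import numpy as np

def weighted_logratio_analysis(table):
    """Weighted logratio analysis of a strictly positive two-way table.

    Returns row principal coordinates, column principal coordinates and
    singular values.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()
    r = P.sum(axis=1)  # row weights (masses)
    c = P.sum(axis=0)  # column weights (masses)
    L = np.log(N)
    L = L - (L @ c)[:, None]  # subtract weighted row means
    L = L - (r @ L)[None, :]  # subtract weighted column means
    S = np.sqrt(r)[:, None] * L * np.sqrt(c)[None, :]
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U / np.sqrt(r)[:, None]) * sv
    cols = (Vt.T / np.sqrt(c)[:, None]) * sv
    return rows, cols, sv

# Made-up table, and a copy in which column 0 is split into two
# proportional columns (30% and 70% of the original counts).
N1 = np.array([[12.0, 5.0, 8.0, 20.0],
               [7.0, 14.0, 3.0, 9.0],
               [4.0, 6.0, 15.0, 10.0]])
N2 = np.column_stack([0.3 * N1[:, 0], 0.7 * N1[:, 0], N1[:, 1:]])
sv1 = weighted_logratio_analysis(N1)[2]
sv2 = weighted_logratio_analysis(N2)[2]
```

Splitting a column into proportional parts leaves the singular values unchanged, which is distributional equivalence in action. Without the column weights the split column would count twice and the result would change.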

Relevance: 100.00%

Abstract:

Correspondence analysis, when used to visualize relationships in a table of counts (for example, abundance data in ecology), has been frequently criticized as being too sensitive to objects (for example, species) that occur with very low frequency or in very few samples. In this statistical report we show that this criticism is generally unfounded. We demonstrate this in several data sets by calculating the actual contributions of rare objects to the results of correspondence analysis and canonical correspondence analysis, both to the determination of the principal axes and to the chi-square distance. It is a fact that rare objects are often positioned as outliers in correspondence analysis maps, which gives the impression that they are highly influential, but their low weight offsets their distant positions and reduces their effect on the results. An alternative scaling of the correspondence analysis solution, the contribution biplot, is proposed as a way of mapping the results in order to avoid the problem of outlying and low contributing rare objects.
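
The offsetting effect of low weight can be checked numerically. The sketch below, with a hypothetical function name and a made-up table, computes each row's contribution to each principal axis as mass times squared principal coordinate, divided by the principal inertia of the axis.

```python
import numpy as np

def ca_row_contributions(table):
    """Row masses, row principal coordinates and contributions to each axis."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()
    r = P.sum(axis=1)  # row masses
    c = P.sum(axis=0)  # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    k = int(np.sum(sv > 1e-12))  # keep only non-trivial axes
    F = (U[:, :k] / np.sqrt(r)[:, None]) * sv[:k]  # principal coordinates
    contrib = r[:, None] * F ** 2 / sv[:k] ** 2  # each column sums to 1
    return r, F, contrib

# Made-up abundance table; the last row is a rare object.
table = np.array([[50, 30, 20], [40, 35, 25], [1, 0, 4]])
masses, coords, contrib = ca_row_contributions(table)
```

Comparing `coords` with `contrib` for the rare row shows how a point can lie far from the origin in the map while its contribution to an axis stays limited by its small mass.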

Relevance: 100.00%

Abstract:

Power transformations of positive data tables, prior to applying the correspondence analysis algorithm, are shown to open up a family of methods with direct connections to the analysis of log-ratios. Two variations of this idea are illustrated. The first approach is simply to power the original data and perform a correspondence analysis; this method is shown to converge to unweighted log-ratio analysis as the power parameter tends to zero. The second approach is to apply the power transformation to the contingency ratios, that is, the values in the table relative to expected values based on the marginals; this method converges to weighted log-ratio analysis, or the spectral map. Two applications are described: first, a matrix of population genetic data which is inherently two-dimensional, and second, a larger cross-tabulation with higher dimensionality, from a linguistic analysis of several books.
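
The link between small powers and logarithms rests on a standard limit, shown here as a heuristic and not as the paper's derivation. The symbols x (a positive table entry) and \alpha (the power parameter) are notation assumed for this illustration.

```latex
% First-order expansion of a power for small \alpha
x^{\alpha} = e^{\alpha \ln x} = 1 + \alpha \ln x + O(\alpha^{2}),
\qquad x > 0.

% Hence the Box-Cox form of the power transformation tends to the logarithm
\lim_{\alpha \to 0} \frac{x^{\alpha} - 1}{\alpha} = \ln x .
```

For small \alpha the powered data therefore behave like a constant plus a multiple of the logged data, and the centering in correspondence analysis removes the constant part.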

Relevance: 100.00%

Abstract:

The application of correspondence analysis to square asymmetric tables is often unsuccessful because of the strong role played by the diagonal entries of the matrix, obscuring the data off the diagonal. A simple modification of the centering of the matrix, coupled with the corresponding change in row and column masses and row and column metrics, allows the table to be decomposed into symmetric and skew-symmetric components, which can then be analyzed separately. The symmetric and skew-symmetric analyses can be performed using a simple correspondence analysis program if the data are set up in a special block format.
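
The decomposition itself is plain linear algebra and can be sketched as follows. The function names are hypothetical, and the particular block layout used here (the table and its transpose arranged in a 2 x 2 block matrix) is an assumption about what the special block format looks like. The sketch works on the raw table and omits the masses and metrics of a full correspondence analysis.

```python
import numpy as np

def symmetric_skew_parts(N):
    """Split a square matrix into symmetric and skew-symmetric parts."""
    N = np.asarray(N, dtype=float)
    return (N + N.T) / 2.0, (N - N.T) / 2.0

def block_format(N):
    """Assumed block layout [[N, N^T], [N^T, N]] (twice the original size)."""
    N = np.asarray(N, dtype=float)
    return np.block([[N, N.T], [N.T, N]])

# Made-up square asymmetric table.
N = np.array([[10.0, 5.0, 2.0], [3.0, 8.0, 6.0], [7.0, 1.0, 9.0]])
S, K = symmetric_skew_parts(N)

sv_block = np.linalg.svd(block_format(N), compute_uv=False)
sv_parts = np.concatenate([np.linalg.svd(2 * S, compute_uv=False),
                           np.linalg.svd(2 * K, compute_uv=False)])
sv_parts = np.sort(sv_parts)[::-1]
```

The singular values of the block matrix are exactly those of the symmetric part and of the skew-symmetric part (each doubled), which is why a single decomposition of the block matrix can deliver both analyses at once.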

Relevance: 100.00%

Abstract:

Correspondence analysis has found extensive use in ecology, archeology, linguistics and the social sciences as a method for visualizing the patterns of association in a table of frequencies or nonnegative ratio-scale data. Inherent to the method is the expression of the data in each row or each column relative to their respective totals, and it is these sets of relative values (called profiles) that are visualized. This relativization of the data makes perfect sense when the margins of the table represent samples from sub-populations of inherently different sizes. But in some ecological applications sampling is performed on equal areas or equal volumes so that the absolute levels of the observed occurrences may be of relevance, in which case relativization may not be required. In this paper we define the correspondence analysis of the raw unrelativized data and discuss its properties, comparing this new method to regular correspondence analysis and to a related variant of non-symmetric correspondence analysis.

Relevance: 100.00%

Abstract:

In the analysis of multivariate categorical data, typically the analysis of questionnaire data, it is often advantageous, for substantive and technical reasons, to analyse a subset of response categories. In multiple correspondence analysis, where each category is coded as a column of an indicator matrix or as a row and column of the Burt matrix, it is not correct to simply analyse the corresponding submatrix of data, since the whole geometric structure is different for the submatrix. A simple modification of the correspondence analysis algorithm allows the overall geometric structure of the complete data set to be retained while calculating the solution for the selected subset of points. This strategy is useful for analysing patterns of response amongst any subset of categories and relating these patterns to demographic factors, especially for studying patterns of particular responses such as missing and neutral responses. The methodology is illustrated using data from the International Social Survey Program on Family and Changing Gender Roles in 1994.
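
One way to read this modification is sketched below, under the assumption that it amounts to computing masses and centering from the complete table and only then selecting the subset of columns. The function names and the example table are hypothetical.

```python
import numpy as np

def standardized_residuals(table):
    """Standardized residuals using the masses of the complete table."""
    P = np.asarray(table, dtype=float)
    P = P / P.sum()
    r = P.sum(axis=1)
    c = P.sum(axis=0)
    return (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

def subset_inertias(table, cols):
    """Principal inertias for a subset of columns.

    The columns are selected after centering and weighting with the full
    table's margins, so the geometry of the complete data is retained.
    """
    S = standardized_residuals(table)
    return np.linalg.svd(S[:, cols], compute_uv=False) ** 2

# Made-up table with five columns, split into two subsets.
table = np.array([[20, 10, 5, 8, 7],
                  [6, 15, 9, 12, 3],
                  [4, 7, 18, 5, 11],
                  [9, 3, 6, 14, 10]])
total = np.sum(standardized_residuals(table) ** 2)
part_a = subset_inertias(table, [0, 1]).sum()
part_b = subset_inertias(table, [2, 3, 4]).sum()
```

Because the margins of the full table are kept, the inertias of the subsets add up to the total inertia. That would not hold if each submatrix were analysed on its own with its own margins.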

Relevance: 100.00%

Abstract:

Correspondence analysis is introduced in the brand association literature as an alternative tool to measure dominance, for the particular case of free choice data. The method is also used to analyse differences, or asymmetries, between brand-attribute associations where attributes are associated with evoked brands, and brand-attribute associations where brands are associated with the attributes. An application to a sample of deodorants is used to illustrate the proposed methodology.