44 resultados para Statistics analysis
em Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Resumo:
The present study discusses retention criteria for principal components analysis (PCA) applied to Likert scale items typical in psychological questionnaires. The main aim is to recommend applied researchers to restrain from relying only on the eigenvalue-than-one criterion; alternative procedures are suggested for adjusting for sampling error. An additional objective is to add evidence on the consequences of applying this rule when PCA is used with discrete variables. The experimental conditions were studied by means of Monte Carlo sampling including several sample sizes, different number of variables and answer alternatives, and four non-normal distributions. The results suggest that even when all the items and thus the underlying dimensions are independent, eigenvalues greater than one are frequent and they can explain up to 80% of the variance in data, meeting the empirical criterion. The consequences of using Kaiser"s rule are illustrated with a clinical psychology example. The size of the eigenvalues resulted to be a function of the sample size and the number of variables, which is also the case for parallel analysis as previous research shows. To enhance the application of alternative criteria, an R package was developed for deciding the number of principal components to retain by means of confidence intervals constructed about the eigenvalues corresponding to lack of relationship between discrete variables.
Resumo:
Elite perceptions about Europe are a very important point in order to understand the current European integration process, as well as the future perspectives for the continent. This study makes a comparison between the perceptions which political and economical elites in some European countries have about the European Union process and its mechanisms. The main goal is to identify the differences in positions of each type of elites, as well as the variations among three key countries. The database built thanks to the INTUNE (Integrated and United? A quest for Citizenship in an ¨ever closer Europe¨) Project Survey on European Elites and Masses, funded by the Sixth Framework Programme of the EU [Contract CIT 3-CT-2005-513421] have being used. The questionnaire was applied between February and May 2007, in a total of 18 European countries. The national teams got a total of almost 2000 valid responses at European level. In the analysis we have showed some general descriptive statistics about the perception of Europe taking as a reference two dimensions of the INTUNE project: identity (attachment to the national level, the meaning of being a truly national, and the threats from Turkey that EU is facing at this moment) and representation (trust in European and national institutions, preferences for a national or an European army). The results are presented distinguishing between political (national MP’s in low chambers) and economical elites (presidents of corporations, general managers…) and, at the same time, among three countries: Germany as an original member of the European Union; Spain, incorporated in 1986; and a new member, Poland, joining the EU in 2004.
Resumo:
This paper presents an initial challenge to tackle the every so "tricky" points encountered when dealing with energy accounting, and thereafter illustrates how such a system of accounting can be used when assessing for the metabolic changes in societies. The paper is divided in four main sections. The first three, present a general discussion on the main issues encountered when conducting energy analyses. The last section, subsequently, combines this heuristic approach to the actual formalization of it, in quantitative terms, for the analysis of possible energy scenarios. Section one covers the broader issue of how to account for the relevant categories used when accounting for Joules of energy; emphasizing on the clear distinction between Primary Energy Sources (PES) (which are the physical exploited entities that are used to derive useable energy forms (energy carriers)) and Energy Carriers (EC) (the actual useful energy that is transmitted for the appropriate end uses within a society). Section two sheds light on the concept of Energy Return on Investment (EROI). Here, it is emphasized that, there must already be a certain amount of energy carriers available to be able to extract/exploit Primary Energy Sources to thereafter generate a net supply of energy carriers. It is pointed out that this current trend of intense energy supply has only been possible to the great use and dependence on fossil energy. Section three follows up on the discussion of EROI, indicating that a single numeric indicator such as an output/input ratio is not sufficient in assessing for the performance of energetic systems. Rather an integrated approach that incorporates (i) how big the net supply of Joules of EC can be, given an amount of extracted PES (the external constraints); (ii) how much EC needs to be invested to extract an amount of PES; and (iii) the power level that it takes for both processes to succeed, is underlined. Section four, ultimately, puts the theoretical concepts at play, assessing for how the metabolic performances of societies can be accounted for within this analytical framework.
Resumo:
Report for the scientific sojourn at the University of Reading, United Kingdom, from January until May 2008. The main objectives have been firstly to infer population structure and parameters in demographic models using a total of 13 microsatellite loci for genotyping approximately 30 individuals per population in 10 Palinurus elephas populations both from Mediterranean and Atlantic waters. Secondly, developing statistical methods to identify discrepant loci, possibly under selection and implement those methods using the R software environment. It is important to consider that the calculation of the probability distribution of the demographic and mutational parameters for a full genetic data set is numerically difficult for complex demographic history (Stephens 2003). The Approximate Bayesian Computation (ABC), based on summary statistics to infer posterior distributions of variable parameters without explicit likelihood calculations, can surmount this difficulty. This would allow to gather information on different demographic prior values (i.e. effective population sizes, migration rate, microsatellite mutation rate, mutational processes) and assay the sensitivity of inferences to demographic priors by assuming different priors.
Resumo:
Isotopic data are currently becoming an important source of information regardingsources, evolution and mixing processes of water in hydrogeologic systems. However, itis not clear how to treat with statistics the geochemical data and the isotopic datatogether. We propose to introduce the isotopic information as new parts, and applycompositional data analysis with the resulting increased composition. Results areequivalent to downscale the classical isotopic delta variables, because they are alreadyrelative (as needed in the compositional framework) and isotopic variations are almostalways very small. This methodology is illustrated and tested with the study of theLlobregat River Basin (Barcelona, NE Spain), where it is shown that, though verysmall, isotopic variations comp lement geochemical principal components, and help inthe better identification of pollution sources
Resumo:
The Aitchison vector space structure for the simplex is generalized to a Hilbert space structure A2(P) for distributions and likelihoods on arbitrary spaces. Centralnotations of statistics, such as Information or Likelihood, can be identified in the algebraical structure of A2(P) and their corresponding notions in compositional data analysis, such as Aitchison distance or centered log ratio transform.In this way very elaborated aspects of mathematical statistics can be understoodeasily in the light of a simple vector space structure and of compositional data analysis. E.g. combination of statistical information such as Bayesian updating,combination of likelihood and robust M-estimation functions are simple additions/perturbations in A2(Pprior). Weighting observations corresponds to a weightedaddition of the corresponding evidence.Likelihood based statistics for general exponential families turns out to have aparticularly easy interpretation in terms of A2(P). Regular exponential families formfinite dimensional linear subspaces of A2(P) and they correspond to finite dimensionalsubspaces formed by their posterior in the dual information space A2(Pprior).The Aitchison norm can identified with mean Fisher information. The closing constant itself is identified with a generalization of the cummulant function and shown to be Kullback Leiblers directed information. Fisher information is the local geometry of the manifold induced by the A2(P) derivative of the Kullback Leibler information and the space A2(P) can therefore be seen as the tangential geometry of statistical inference at the distribution P.The discussion of A2(P) valued random variables, such as estimation functionsor likelihoods, give a further interpretation of Fisher information as the expected squared norm of evidence and a scale free understanding of unbiased reasoning
Resumo:
The use of simple and multiple correspondence analysis is well-established in socialscience research for understanding relationships between two or more categorical variables.By contrast, canonical correspondence analysis, which is a correspondence analysis with linearrestrictions on the solution, has become one of the most popular multivariate techniques inecological research. Multivariate ecological data typically consist of frequencies of observedspecies across a set of sampling locations, as well as a set of observed environmental variablesat the same locations. In this context the principal dimensions of the biological variables aresought in a space that is constrained to be related to the environmental variables. Thisrestricted form of correspondence analysis has many uses in social science research as well,as is demonstrated in this paper. We first illustrate the result that canonical correspondenceanalysis of an indicator matrix, restricted to be related an external categorical variable, reducesto a simple correspondence analysis of a set of concatenated (or stacked ) tables. Then weshow how canonical correspondence analysis can be used to focus on, or partial out, aparticular set of response categories in sample survey data. For example, the method can beused to partial out the influence of missing responses, which usually dominate the results of amultiple correspondence analysis.
Resumo:
We compare two methods for visualising contingency tables and developa method called the ratio map which combines the good properties of both.The first is a biplot based on the logratio approach to compositional dataanalysis. This approach is founded on the principle of subcompositionalcoherence, which assures that results are invariant to considering subsetsof the composition. The second approach, correspondence analysis, isbased on the chi-square approach to contingency table analysis. Acornerstone of correspondence analysis is the principle of distributionalequivalence, which assures invariance in the results when rows or columnswith identical conditional proportions are merged. Both methods may bedescribed as singular value decompositions of appropriately transformedmatrices. Correspondence analysis includes a weighting of the rows andcolumns proportional to the margins of the table. If this idea of row andcolumn weights is introduced into the logratio biplot, we obtain a methodwhich obeys both principles of subcompositional coherence and distributionalequivalence.
Resumo:
Correspondence analysis, when used to visualize relationships in a table of counts(for example, abundance data in ecology), has been frequently criticized as being too sensitiveto objects (for example, species) that occur with very low frequency or in very few samples. Inthis statistical report we show that this criticism is generally unfounded. We demonstrate this inseveral data sets by calculating the actual contributions of rare objects to the results ofcorrespondence analysis and canonical correspondence analysis, both to the determination ofthe principal axes and to the chi-square distance. It is a fact that rare objects are oftenpositioned as outliers in correspondence analysis maps, which gives the impression that theyare highly influential, but their low weight offsets their distant positions and reduces their effecton the results. An alternative scaling of the correspondence analysis solution, the contributionbiplot, is proposed as a way of mapping the results in order to avoid the problem of outlying andlow contributing rare objects.
Resumo:
Standard methods for the analysis of linear latent variable models oftenrely on the assumption that the vector of observed variables is normallydistributed. This normality assumption (NA) plays a crucial role inassessingoptimality of estimates, in computing standard errors, and in designinganasymptotic chi-square goodness-of-fit test. The asymptotic validity of NAinferences when the data deviates from normality has been calledasymptoticrobustness. In the present paper we extend previous work on asymptoticrobustnessto a general context of multi-sample analysis of linear latent variablemodels,with a latent component of the model allowed to be fixed across(hypothetical)sample replications, and with the asymptotic covariance matrix of thesamplemoments not necessarily finite. We will show that, under certainconditions,the matrix $\Gamma$ of asymptotic variances of the analyzed samplemomentscan be substituted by a matrix $\Omega$ that is a function only of thecross-product moments of the observed variables. The main advantage of thisis thatinferences based on $\Omega$ are readily available in standard softwareforcovariance structure analysis, and do not require to compute samplefourth-order moments. An illustration with simulated data in the context ofregressionwith errors in variables will be presented.
Resumo:
Structural equation models are widely used in economic, socialand behavioral studies to analyze linear interrelationships amongvariables, some of which may be unobservable or subject to measurementerror. Alternative estimation methods that exploit different distributionalassumptions are now available. The present paper deals with issues ofasymptotic statistical inferences, such as the evaluation of standarderrors of estimates and chi--square goodness--of--fit statistics,in the general context of mean and covariance structures. The emphasisis on drawing correct statistical inferences regardless of thedistribution of the data and the method of estimation employed. A(distribution--free) consistent estimate of $\Gamma$, the matrix ofasymptotic variances of the vector of sample second--order moments,will be used to compute robust standard errors and a robust chi--squaregoodness--of--fit squares. Simple modifications of the usual estimateof $\Gamma$ will also permit correct inferences in the case of multi--stage complex samples. We will also discuss the conditions under which,regardless of the distribution of the data, one can rely on the usual(non--robust) inferential statistics. Finally, a multivariate regressionmodel with errors--in--variables will be used to illustrate, by meansof simulated data, various theoretical aspects of the paper.
Resumo:
In moment structure analysis with nonnormal data, asymptotic valid inferences require the computation of a consistent (under general distributional assumptions) estimate of the matrix $\Gamma$ of asymptotic variances of sample second--order moments. Such a consistent estimate involves the fourth--order sample moments of the data. In practice, the use of fourth--order moments leads to computational burden and lack of robustness against small samples. In this paper we show that, under certain assumptions, correct asymptotic inferences can be attained when $\Gamma$ is replaced by a matrix $\Omega$ that involves only the second--order moments of the data. The present paper extends to the context of multi--sample analysis of second--order moment structures, results derived in the context of (simple--sample) covariance structure analysis (Satorra and Bentler, 1990). The results apply to a variety of estimation methods and general type of statistics. An example involving a test of equality of means under covariance restrictions illustrates theoretical aspects of the paper.
Resumo:
We extend to score, Wald and difference test statistics the scaled and adjusted corrections to goodness-of-fit test statistics developed in Satorra and Bentler (1988a,b). The theory is framed in the general context of multisample analysis of moment structures, under general conditions on the distribution of observable variables. Computational issues, as well as the relation of the scaled and corrected statistics to the asymptotic robust ones, is discussed. A Monte Carlo study illustrates thecomparative performance in finite samples of corrected score test statistics.
Resumo:
Power transformations of positive data tables, prior to applying the correspondence analysis algorithm, are shown to open up a family of methods with direct connections to the analysis of log-ratios. Two variations of this idea are illustrated. The first approach is simply to power the original data and perform a correspondence analysis this method is shown to converge to unweighted log-ratio analysis as the power parameter tends to zero. The second approach is to apply the power transformation to thecontingency ratios, that is the values in the table relative to expected values based on the marginals this method converges to weighted log-ratio analysis, or the spectral map. Two applications are described: first, a matrix of population genetic data which is inherently two-dimensional, and second, a larger cross-tabulation with higher dimensionality, from a linguistic analysis of several books.
Resumo:
The application of correspondence analysis to square asymmetrictables is often unsuccessful because of the strong role played by thediagonal entries of the matrix, obscuring the data off the diagonal. A simplemodification of the centering of the matrix, coupled with the correspondingchange in row and column masses and row and column metrics, allows the tableto be decomposed into symmetric and skew--symmetric components, which canthen be analyzed separately. The symmetric and skew--symmetric analyses canbe performed using a simple correspondence analysis program if the data areset up in a special block format.