80 results for Multivariate Statistics
Abstract:
Many multivariate methods that are apparently distinct can be linked by introducing one or more parameters in their definition. Methods that can be linked in this way are correspondence analysis, unweighted or weighted logratio analysis (the latter also known as "spectral mapping"), nonsymmetric correspondence analysis, principal component analysis (with and without logarithmic transformation of the data) and multidimensional scaling. In this presentation I will show how several of these methods, which are frequently used in compositional data analysis, may be linked through parametrizations such as power transformations, linear transformations and convex linear combinations. Since the methods of interest here all lead to visual maps of data, a "movie" can be made where the linking parameter is allowed to vary in small steps: the results are recalculated "frame by frame" and one can see the smooth change from one method to another. Several of these "movies" will be shown, giving a deeper insight into the similarities and differences between these methods.
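A minimal sketch of the kind of parametric link described above, assuming a Box-Cox-style power family (the data and function names are illustrative, not taken from the presentation): the transform tends to the logarithm as the parameter goes to zero, so recomputing the map over a grid of parameter values slides a PCA-style analysis of powered data toward logratio analysis, frame by frame.

```python
import numpy as np

def power_transform(X, alpha):
    """Box-Cox-style family: (x**alpha - 1)/alpha -> log(x) as alpha -> 0."""
    return np.log(X) if alpha == 0 else (X**alpha - 1.0) / alpha

def map_coordinates(X, alpha, ndim=2):
    """Principal coordinates of the double-centred, power-transformed data."""
    Z = power_transform(X, alpha)
    Z = Z - Z.mean(axis=1, keepdims=True) - Z.mean(axis=0) + Z.mean()
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :ndim] * s[:ndim]

# "movie": recompute the map frame by frame as alpha shrinks to 0, sliding
# from a PCA-like analysis of powered data toward logratio analysis
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(5), size=30)          # toy compositional data
frames = [map_coordinates(X, a) for a in np.linspace(1.0, 0.0, 11)]
```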
Abstract:
Precision of released figures is not only an important quality feature of official statistics, it is also essential for a good understanding of the data. In this paper we show a case study of how precision could be conveyed if the multivariate nature of data has to be taken into account. In the official release of the Swiss earnings structure survey, the total salary is broken down into several wage components. We follow Aitchison's approach for the analysis of compositional data, which is based on logratios of components. We first present different multivariate analyses of the compositional data whereby the wage components are broken down by economic activity classes. Then we propose a number of ways to assess precision.
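As a concrete illustration of the logratio approach mentioned above, a minimal sketch of the centred log-ratio transform applied to toy wage shares (the components and numbers are invented for illustration):

```python
import numpy as np

def clr(X):
    """Centred log-ratio transform: log of each part relative to the
    geometric mean of its row."""
    L = np.log(X)
    return L - L.mean(axis=1, keepdims=True)

# toy wage decomposition: base salary, bonus, allowances as shares of the total
wages = np.array([[0.80, 0.15, 0.05],
                  [0.70, 0.20, 0.10],
                  [0.85, 0.10, 0.05]])
Z = clr(wages)   # unconstrained coordinates, usable with standard multivariate tools
```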
Abstract:
A compositional time series is obtained when a compositional data vector is observed at different points in time. Inherently, then, a compositional time series is a multivariate time series with important constraints on the variables observed at any instance in time. Although this type of data frequently occurs in situations of real practical interest, a trawl through the statistical literature reveals that research in the field is very much in its infancy and that many theoretical and empirical issues still remain to be addressed. Any appropriate statistical methodology for the analysis of compositional time series must take into account the constraints which are not allowed for by the usual statistical techniques available for analysing multivariate time series. One general approach to analysing compositional time series consists in the application of an initial transform to break the positive and unit sum constraints, followed by the analysis of the transformed time series using multivariate ARIMA models. In this paper we discuss the use of the additive log-ratio, centred log-ratio and isometric log-ratio transforms. We also present results from an empirical study designed to explore how the selection of the initial transform affects subsequent multivariate ARIMA modelling as well as the quality of the forecasts.
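A minimal sketch of the three transforms discussed in the abstract, under their usual definitions (the ilr basis chosen here is one arbitrary but valid orthonormal basis among many):

```python
import numpy as np

def alr(X):
    """Additive log-ratio: each part against the last part."""
    return np.log(X[:, :-1]) - np.log(X[:, -1:])

def clr(X):
    """Centred log-ratio: each part against the row geometric mean."""
    L = np.log(X)
    return L - L.mean(axis=1, keepdims=True)

def ilr(X):
    """Isometric log-ratio: clr projected onto an orthonormal basis of the
    hyperplane orthogonal to the vector of ones (one valid basis of many)."""
    D = X.shape[1]
    Q = np.linalg.qr(np.eye(D) - 1.0 / D)[0]   # first D-1 columns span that plane
    return clr(X) @ Q[:, :D - 1]

# after any of these transforms the series is unconstrained and can be fit
# with a multivariate ARMA model (e.g. statsmodels' VARMAX), with forecasts
# mapped back to the simplex by inverting the transform
```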
Abstract:
This project focuses mainly on the non-coherent detector of a GPS receiver. To characterize a receiver's detection process, the statistics involved must be known. For conventional non-coherent detectors, second-order statistics play the central role. The performance provided by second-order statistics, summarized in the ROC, is reasonably good, although in certain situations it may not be the best. This project attempts to reproduce the detection process using first-order statistics as an alternative to the already known and implemented second-order statistics. To achieve this, expressions based on the Central Limit Theorem and on Edgeworth series are used as good approximations. Finally, both the conventional and the proposed statistics are compared, in terms of the ROC, to determine which non-coherent detector offers better performance in each situation.
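For illustration, a minimal sketch of the one-term Edgeworth refinement of the plain CLT Gaussian approximation that this kind of first-order analysis relies on (the sample size, skewness and threshold below are invented):

```python
import numpy as np
from scipy.stats import norm

def edgeworth_cdf(x, n, skew):
    """One-term Edgeworth expansion for the CDF of a standardized sum of n
    i.i.d. terms with skewness `skew`; refines the plain CLT approximation."""
    return norm.cdf(x) - norm.pdf(x) * (skew / (6.0 * np.sqrt(n))) * (x**2 - 1.0)

# illustrative false-alarm probability for a detection threshold t
n, skew, t = 50, 1.2, 2.0
p_fa = 1.0 - edgeworth_cdf(t, n, skew)
```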
Abstract:
Given an observed test statistic and its degrees of freedom, one may compute the observed P value with most statistical packages. It is unknown to what extent test statistics and P values are congruent in published medical papers. Methods: We checked the congruence of statistical results reported in all the papers of volumes 409–412 of Nature (2001) and a random sample of 63 results from volumes 322–323 of BMJ (2001). We also tested whether the frequencies of the last digit of a sample of 610 test statistics deviated from a uniform distribution (i.e., equally probable digits). Results: 11.6% (21 of 181) and 11.1% (7 of 63) of the statistical results published in Nature and BMJ respectively during 2001 were incongruent, probably mostly due to rounding, transcription, or typesetting errors. At least one such error appeared in 38% and 25% of the papers of Nature and BMJ, respectively. In 12% of the cases, the significance level might change by one or more orders of magnitude. The frequencies of the last digit of statistics deviated from the uniform distribution and suggested digit preference in rounding and reporting. Conclusions: This incongruence of test statistics and P values is another example that statistical practice is generally poor, even in the most renowned scientific journals, and that the quality of papers should be more controlled and valued.
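The congruence check described above can be reproduced with standard tools; a minimal sketch for a two-sided t test (the tolerance and the example numbers are illustrative):

```python
from scipy import stats

def congruent(t_stat, df, reported_p, tol=0.005):
    """Recompute the two-sided P value implied by a t statistic and its
    degrees of freedom, then compare against the published value."""
    p = 2.0 * stats.t.sf(abs(t_stat), df)
    return abs(p - reported_p) <= tol, p

# e.g. a result reported as t = 2.1 on 28 df with P = 0.04
ok, p = congruent(2.1, 28, 0.04)
```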
Abstract:
To detect directional couplings from time series, various measures based on distances in reconstructed state spaces have been introduced. These measures can, however, be biased by asymmetries in the dynamics' structure, noise color, or noise level, which are ubiquitous in experimental signals. Using theoretical reasoning and results from model systems, we identify the various sources of bias and show that most of them can be eliminated by an appropriate normalization. We furthermore diminish the remaining biases by introducing a measure based on ranks of distances. This rank-based measure outperforms existing distance-based measures in both sensitivity and specificity for directional couplings. Our findings are therefore relevant for the reliable detection of directional couplings from experimental signals.
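A minimal sketch in the spirit of a rank-of-distances coupling measure, under standard delay-embedding assumptions; the exact normalization used in the paper may differ, so treat this as illustrative only:

```python
import numpy as np
from scipy.spatial.distance import cdist

def delay_embed(x, dim=3, tau=1):
    """Delay-coordinate reconstruction of a scalar time series."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])

def rank_coupling(x, y, dim=3, tau=1, k=5):
    """If the k nearest neighbours in the Y reconstruction also have low
    distance ranks in the X reconstruction, neighbourhoods map across,
    indicating a coupling Y -> X."""
    X, Y = delay_embed(x, dim, tau), delay_embed(y, dim, tau)
    dx, dy = cdist(X, X), cdist(Y, Y)
    np.fill_diagonal(dx, np.inf); np.fill_diagonal(dy, np.inf)
    rank_x = dx.argsort(axis=1).argsort(axis=1)   # 0-based distance ranks in X
    nbrs_y = dy.argsort(axis=1)[:, :k]            # k nearest neighbours in Y
    g = rank_x[np.arange(len(X))[:, None], nbrs_y].mean()
    n = len(X)
    return ((n - 2) / 2.0 - g) / ((n - 2) / 2.0 - (k - 1) / 2.0)  # ~0 none, ~1 strong
```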
Abstract:
The Aitchison vector space structure for the simplex is generalized to a Hilbert space structure A2(P) for distributions and likelihoods on arbitrary spaces. Central notions of statistics, such as information or likelihood, can be identified in the algebraic structure of A2(P), together with their corresponding notions in compositional data analysis, such as the Aitchison distance or the centered log-ratio transform. In this way very elaborate aspects of mathematical statistics can be understood easily in the light of a simple vector space structure and of compositional data analysis. For example, combinations of statistical information such as Bayesian updating, combination of likelihoods and robust M-estimation functions are simple additions/perturbations in A2(P_prior). Weighting observations corresponds to a weighted addition of the corresponding evidence. Likelihood-based statistics for general exponential families turn out to have a particularly easy interpretation in terms of A2(P). Regular exponential families form finite-dimensional linear subspaces of A2(P), and they correspond to finite-dimensional subspaces formed by their posteriors in the dual information space A2(P_prior). The Aitchison norm can be identified with mean Fisher information. The closing constant itself is identified with a generalization of the cumulant function and shown to be the Kullback-Leibler directed information. Fisher information is the local geometry of the manifold induced by the A2(P) derivative of the Kullback-Leibler information, and the space A2(P) can therefore be seen as the tangential geometry of statistical inference at the distribution P. The discussion of A2(P)-valued random variables, such as estimation functions or likelihoods, gives a further interpretation of Fisher information as the expected squared norm of evidence and a scale-free understanding of unbiased reasoning.
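For reference, the compositional counterparts mentioned above have standard textbook definitions (not taken from the paper itself): the centered log-ratio transform is $\mathrm{clr}(x)_i = \ln(x_i/g(x))$ with $g(x) = (\prod_{j=1}^{D} x_j)^{1/D}$, and the Aitchison inner product it induces is $\langle x,y\rangle_a = \frac{1}{2D}\sum_{i=1}^{D}\sum_{j=1}^{D} \ln\frac{x_i}{x_j}\,\ln\frac{y_i}{y_j}$.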
Abstract:
In this paper we propose a subsampling estimator for the distribution of statistics diverging at either known or unknown rates when the underlying time series is strictly stationary and strong mixing. Based on our results we provide a detailed discussion of how to estimate extreme order statistics with dependent data and present two applications to assessing financial market risk. Our method performs well in estimating Value at Risk and provides a superior alternative to Hill's estimator in operationalizing Safety First portfolio selection.
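Two of the ingredients named above are easy to sketch: Hill's estimator of the tail index, and a block-subsampling approximation to a statistic's distribution for mixing series (the block length, subsample count and quantile below are illustrative choices):

```python
import numpy as np

def hill(x, k):
    """Hill estimator of the tail index from the k largest observations."""
    xs = np.sort(x)[::-1]                        # descending order statistics
    return 1.0 / np.mean(np.log(xs[:k] / xs[k]))

def subsample_quantile(x, stat, b, q=0.95, n_sub=500, rng=None):
    """Approximate the q-quantile of a statistic's distribution by evaluating
    it on contiguous blocks of length b (suitable for mixing series)."""
    rng = rng or np.random.default_rng()
    starts = rng.integers(0, len(x) - b, size=n_sub)
    return np.quantile([stat(x[s:s + b]) for s in starts], q)
```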
Abstract:
The use of simple and multiple correspondence analysis is well established in social science research for understanding relationships between two or more categorical variables. By contrast, canonical correspondence analysis, which is a correspondence analysis with linear restrictions on the solution, has become one of the most popular multivariate techniques in ecological research. Multivariate ecological data typically consist of frequencies of observed species across a set of sampling locations, as well as a set of observed environmental variables at the same locations. In this context the principal dimensions of the biological variables are sought in a space that is constrained to be related to the environmental variables. This restricted form of correspondence analysis has many uses in social science research as well, as is demonstrated in this paper. We first illustrate the result that canonical correspondence analysis of an indicator matrix, restricted to be related to an external categorical variable, reduces to a simple correspondence analysis of a set of concatenated (or stacked) tables. Then we show how canonical correspondence analysis can be used to focus on, or partial out, a particular set of response categories in sample survey data. For example, the method can be used to partial out the influence of missing responses, which usually dominate the results of a multiple correspondence analysis.
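For concreteness, a minimal sketch of plain correspondence analysis via the SVD of standardized residuals; the restricted (canonical) analysis of an indicator matrix described above reduces to applying exactly this to a set of stacked tables:

```python
import numpy as np

def correspondence_analysis(N, ndim=2):
    """Simple CA of a contingency table via the SVD of standardized residuals."""
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U[:, :ndim] * s[:ndim]) / np.sqrt(r)[:, None]   # principal coordinates
    cols = (Vt.T[:, :ndim] * s[:ndim]) / np.sqrt(c)[:, None]
    return rows, cols
```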
Abstract:
Hierarchical clustering is a popular method for finding structure in multivariate data, resulting in a binary tree constructed on the particular objects of the study, usually sampling units. The user faces the decision of where to cut the binary tree in order to determine the number of clusters to interpret, and there are various ad hoc rules for arriving at a decision. A simple permutation test is presented that diagnoses whether non-random levels of clustering are present in the set of objects and, if so, indicates the specific level at which the tree can be cut. The test is validated against random matrices to verify the type I error probability, and a power study is performed on data sets with known clusteredness to study the type II error.
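A minimal sketch in the spirit of the permutation test described above (the choice of Ward linkage and of independent column permutations as the null model are illustrative assumptions, not necessarily the paper's exact design):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def cluster_level_test(X, n_perm=999, rng=None):
    """Compare each fusion level of the observed tree with the levels of
    trees built on column-permuted data, which destroys multivariate
    structure while preserving the marginal distributions."""
    rng = rng or np.random.default_rng()
    obs = linkage(X, method='ward')[:, 2]            # observed fusion heights
    perm = np.empty((n_perm, len(obs)))
    for b in range(n_perm):
        Xp = np.column_stack([rng.permutation(col) for col in X.T])
        perm[b] = linkage(Xp, method='ward')[:, 2]
    # one-sided p-value per level: is the observed height unusually large?
    return (1 + (perm >= obs).sum(axis=0)) / (n_perm + 1)
```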
Abstract:
This paper deals with the impact of "early" nineteenth-century globalization (c.1815-1860) on foreign trade in the Southern Cone (SC). Most of the evidence is drawn from bilateral trade between Britain and the SC, at a time when Britain was the main commercial partner of the new republics. The main conclusion is that early globalization had a positive impact on foreign trade in the SC, for three reasons: the SC's terms of trade improved during this period; the SC's per capita consumption of textiles (the main manufacture traded on world markets at that time) increased substantially, at a time when clothing was one of the main items of SC household budgets; and British merchants brought with them capital, shipping and insurance, and also facilitated the formation of vast global networks, which further promoted the SC's exports to a wider range of outlets.
Abstract:
Structural equation models are widely used in economic, social and behavioral studies to analyze linear interrelationships among variables, some of which may be unobservable or subject to measurement error. Alternative estimation methods that exploit different distributional assumptions are now available. The present paper deals with issues of asymptotic statistical inference, such as the evaluation of standard errors of estimates and chi--square goodness--of--fit statistics, in the general context of mean and covariance structures. The emphasis is on drawing correct statistical inferences regardless of the distribution of the data and the method of estimation employed. A (distribution--free) consistent estimate of $\Gamma$, the matrix of asymptotic variances of the vector of sample second--order moments, will be used to compute robust standard errors and a robust chi--square goodness--of--fit statistic. Simple modifications of the usual estimate of $\Gamma$ will also permit correct inferences in the case of multi--stage complex samples. We will also discuss the conditions under which, regardless of the distribution of the data, one can rely on the usual (non--robust) inferential statistics. Finally, a multivariate regression model with errors--in--variables will be used to illustrate, by means of simulated data, various theoretical aspects of the paper.
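The distribution-free estimate of $\Gamma$ has a standard form: the sample covariance of the vectorized second-order cross products. A minimal sketch (ignoring higher-order finite-sample corrections):

```python
import numpy as np

def gamma_hat(X):
    """Sample covariance of vech((x - xbar)(x - xbar)') across cases: a
    distribution-free consistent estimate of Gamma, the asymptotic
    covariance matrix of the sample second-order moments."""
    Xc = X - X.mean(axis=0)
    idx = np.tril_indices(X.shape[1])
    D = np.array([np.outer(x, x)[idx] for x in Xc])   # vech per observation
    return np.cov(D, rowvar=False)
```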
Abstract:
This paper extends multivariate Granger causality to take into account the subspaces along which Granger causality occurs, as well as long-run Granger causality. The properties of these new notions of Granger causality, along with the requisite restrictions, are derived and extensively studied for a wide variety of time series processes, including linear invertible processes and VARMA. Using the proposed extensions, the paper demonstrates that: (i) mean reversion in L2 is an instance of long-run Granger non-causality, (ii) cointegration is a special case of long-run Granger non-causality along a subspace, (iii) controllability is a special case of Granger causality, and finally (iv) linear rational expectations entail (possibly testable) Granger causality restrictions along subspaces.
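The short-run, whole-space notion that the paper generalizes can be tested with standard tools; a minimal sketch using statsmodels on simulated data where y drives x with one lag:

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

# toy check of ordinary short-run Granger causality, the building block the
# paper extends to subspaces and to the long run
rng = np.random.default_rng(0)
y = rng.standard_normal(200)
x = np.empty(200)
x[0] = rng.standard_normal()
x[1:] = 0.8 * y[:-1] + 0.5 * rng.standard_normal(199)

# tests whether the second column Granger-causes the first
res = grangercausalitytests(np.column_stack([x, y]), maxlag=2)
```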
Abstract:
Graphical displays which show inter--sample distances are important for the interpretation and presentation of multivariate data. Except when the displays are two--dimensional, however, they are often difficult to visualize as a whole. A device, based on multidimensional unfolding, is described for presenting some intrinsically high--dimensional displays in fewer, usually two, dimensions. This goal is achieved by representing each sample by a pair of points, say $R_i$ and $r_i$, so that a theoretical distance between the $i$-th and $j$-th samples is represented twice, once by the distance between $R_i$ and $r_j$ and once by the distance between $R_j$ and $r_i$. Self--distances between $R_i$ and $r_i$ need not be zero. The mathematical conditions for unfolding to exhibit symmetry are established. Algorithms for finding approximate fits, not constrained to be symmetric, are discussed and some examples are given.
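A minimal sketch of the unfolding fit described above: least-squares matching of $d(R_i, r_j)$ to the theoretical distances, not constrained to be symmetric (the optimizer and starting values are arbitrary choices):

```python
import numpy as np
from scipy.optimize import minimize

def unfold(Delta, ndim=2, rng=None):
    """Each sample i gets two points R_i and r_i; delta_ij is fitted by
    d(R_i, r_j) for every ordered pair, so a symmetric distance is
    represented twice, and self-distances need not be zero."""
    n = Delta.shape[0]
    rng = rng or np.random.default_rng(0)

    def stress(z):
        R = z[:n * ndim].reshape(n, ndim)
        r = z[n * ndim:].reshape(n, ndim)
        D = np.linalg.norm(R[:, None, :] - r[None, :, :], axis=2)
        return ((D - Delta) ** 2).sum()

    z = minimize(stress, rng.standard_normal(2 * n * ndim), method='L-BFGS-B').x
    return z[:n * ndim].reshape(n, ndim), z[n * ndim:].reshape(n, ndim)
```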
Abstract:
This paper combines multivariate density forecasts of output growth, inflation and interest rates from a suite of models. An out-of-sample weighting scheme based on the predictive likelihood, as proposed by Eklund and Karlsson (2005) and Andersson and Karlsson (2007), is used to combine the models. Three classes of models are considered: a Bayesian vector autoregression (BVAR), a factor-augmented vector autoregression (FAVAR) and a medium-scale dynamic stochastic general equilibrium (DSGE) model. Using Australian data, we find that, at short forecast horizons, the Bayesian VAR model is assigned the most weight, while at intermediate and longer horizons the factor model is preferred. The DSGE model is assigned little weight at all horizons, a result that can be attributed to the DSGE model producing density forecasts that are very wide when compared with the actual distribution of observations. While a density forecast evaluation exercise reveals little formal evidence that the optimally combined densities are superior to those from the best-performing individual model, or a simple equal-weighting scheme, this may be a result of the short sample available.
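A minimal sketch of predictive-likelihood weighting and the resulting mixture (linear pool) of forecast densities; this follows the generic log-score recipe, and details of the cited scheme may differ:

```python
import numpy as np

def predictive_likelihood_weights(log_scores):
    """Weights proportional to each model's out-of-sample predictive
    likelihood, supplied as summed log predictive likelihoods."""
    s = np.asarray(log_scores, dtype=float)
    w = np.exp(s - s.max())              # stabilized exponentiation
    return w / w.sum()

def combined_density(weights, densities, y):
    """Linear pool: weighted mixture of the individual forecast densities."""
    return sum(w * f(y) for w, f in zip(weights, densities))
```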