34 results for Covariance
Abstract:
In human Population Genetics, routine applications of principal component techniques are often required. Population biologists make widespread use of certain discrete classifications of human samples into haplotypes, the monophyletic units of phylogenetic trees constructed from several single nucleotide bimorphisms hierarchically ordered. Compositional frequencies of the haplotypes are recorded within the different samples. Principal component techniques are then required as a dimension-reducing strategy to bring the dimension of the problem to a manageable level, say two, to allow for graphical analysis. Population biologists at large are not aware of the special features of compositional data and normally make use of the crude covariance of compositional relative frequencies to construct principal components. In this short note we present our experience with using traditional linear principal components or compositional principal components based on logratios, with reference to a specific dataset.
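As an illustration of the compositional alternative, a logratio principal component analysis can be obtained by applying ordinary PCA to centred-logratio (clr) transformed frequencies. The sketch below assumes a matrix X of strictly positive relative frequencies (rows are samples, columns are haplotype parts); the function names are illustrative, not the paper's.

```python
import numpy as np

def clr(X):
    """Centred logratio transform: log of each part minus the row mean of the logs."""
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)

def compositional_pca(X, n_components=2):
    """PCA of the clr-transformed compositions via SVD of the column-centred matrix."""
    Z = clr(X)
    Z = Z - Z.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates for plotting
    loadings = Vt[:n_components].T                     # clr-part loadings
    explained = (s**2 / (s**2).sum())[:n_components]
    return scores, loadings, explained
```

Plotting the first two columns of `scores` gives the two-dimensional graphical display the abstract refers to, but computed on logratios rather than on the crude relative-frequency covariance.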
Abstract:
The R package “compositions” is a tool for advanced compositional analysis. Its basic functionality has seen some conceptual improvement, containing now some facilities to work with and represent ilr bases built from balances, and an elaborated subsystem for dealing with several kinds of irregular data: (rounded or structural) zeroes, incomplete observations and outliers. The general approach to these irregularities is based on subcompositions: for an irregular datum, one can distinguish a “regular” subcomposition (where all parts are actually observed and the datum behaves typically) and a “problematic” subcomposition (with those unobserved, zero or rounded parts, or else where the datum shows an erratic or atypical behaviour). Systematic classification schemes are proposed for both outliers and missing values (including zeros), focusing on the nature of irregularities in the datum subcomposition(s).
To compute statistics with values missing at random and structural zeros, a projection approach is implemented: a given datum contributes to the estimation of the desired parameters only on the subcomposition where it was observed. For data sets with values below the detection limit, two different approaches are provided: the well-known imputation technique, and also the projection approach.
To compute statistics in the presence of outliers, robust statistics are adapted to the characteristics of compositional data, based on the minimum covariance determinant approach. The outlier classification is based on four different models of outlier occurrence and Monte-Carlo-based tests for their characterization. Furthermore, the package provides special plots helping to understand the nature of outliers in the dataset.
Keywords: coda-dendrogram, lost values, MAR, missing data, MCD estimator, robustness, rounded zeros
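For orientation, an ilr coordinate built from a balance contrasts two disjoint groups of parts. In the standard construction (quoted here as a reminder, not as the package's exact code), a balance between a group R of r parts and a group S of s parts is

```latex
b_{R|S} \;=\; \sqrt{\frac{rs}{r+s}}\,
\ln\frac{g(x_R)}{g(x_S)},
\qquad
g(x_R)=\Bigl(\prod_{i\in R} x_i\Bigr)^{1/r},\quad
g(x_S)=\Bigl(\prod_{j\in S} x_j\Bigr)^{1/s},
```

where g(·) denotes the geometric mean of the parts in a group; a sequential binary partition of all parts yields a full ilr basis of such balances.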
Abstract:
As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completely absent (essential zeros) or because it is below the detection limit (rounded zeros). Because the second kind of zeros is usually understood as “a trace too small to measure”, it seems reasonable to replace them by a suitable small value, and this has been the traditional approach. As stated, e.g., by Tauber (1999) and by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involved parts (and thus the metric properties) should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method is introduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003), where it is shown that the theoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approach has reasonable properties from a compositional point of view. In particular, it is “natural” in the sense that it recovers the “true” composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved. As a generalization of the multiplicative replacement, in the same paper a substitution method for missing values on compositional data sets is introduced.
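A minimal sketch of a multiplicative rounded-zero replacement of this kind, assuming each row of X sums to a constant c and that delta holds the small replacement value chosen for every part (the names are illustrative, not the papers' notation): non-zero parts are rescaled by a common row-wise factor, so their ratios, and hence the covariance structure of zero-free subcompositions, are left untouched.

```python
import numpy as np

def multiplicative_replacement(X, delta, c=1.0):
    """Replace zeros by delta and rescale the remaining parts multiplicatively."""
    X = np.asarray(X, dtype=float)
    delta = np.broadcast_to(np.asarray(delta, dtype=float), X.shape)
    zeros = (X == 0)
    # Fraction of each row's total left over for the observed (non-zero) parts.
    adjustment = 1.0 - (delta * zeros).sum(axis=1, keepdims=True) / c
    return np.where(zeros, delta, X * adjustment)
```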
Abstract:
We present a new method for constructing exact distribution-free tests (and confidence intervals) for variables that can generate more than two possible outcomes. This method separates the search for an exact test from the goal of creating a non-randomized test. Randomization is used to extend any exact test relating to means of variables with finitely many outcomes to variables with outcomes belonging to a given bounded set. Tests in terms of variance and covariance are reduced to tests relating to means. Randomness is then eliminated in a separate step. This method is used to create confidence intervals for the difference between two means (or variances) and tests of stochastic inequality and correlation.
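The reduction of variance and covariance tests to tests about means presumably rests on elementary identities of the following kind (stated here only for orientation): for bounded variables,

```latex
\operatorname{Var}(X)=\mathbb{E}[X^{2}]-\bigl(\mathbb{E}[X]\bigr)^{2},
\qquad
\operatorname{Cov}(X,Y)=\mathbb{E}[XY]-\mathbb{E}[X]\,\mathbb{E}[Y],
```

so a hypothesis about a variance or a covariance becomes a hypothesis about the means of $X$, $X^{2}$, $Y$ and $XY$, which are again bounded variables.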
Abstract:
We introduce a variation of the proof for weak approximations that is suitable for studying the densities of stochastic processes which are evaluations of the flow generated by a stochastic differential equation on a random variable that may be anticipating. Our main assumption is that the process and the initial random variable have to be smooth in the Malliavin sense. Furthermore, if the inverse of the Malliavin covariance matrix associated with the process under consideration is sufficiently integrable, then approximations for densities and distributions can also be achieved. We apply these ideas to the case of stochastic differential equations with boundary conditions and the composition of two diffusions.
Abstract:
Standard methods for the analysis of linear latent variable models often rely on the assumption that the vector of observed variables is normally distributed. This normality assumption (NA) plays a crucial role in assessing optimality of estimates, in computing standard errors, and in designing an asymptotic chi-square goodness-of-fit test. The asymptotic validity of NA inferences when the data deviates from normality has been called asymptotic robustness. In the present paper we extend previous work on asymptotic robustness to a general context of multi-sample analysis of linear latent variable models, with a latent component of the model allowed to be fixed across (hypothetical) sample replications, and with the asymptotic covariance matrix of the sample moments not necessarily finite. We will show that, under certain conditions, the matrix $\Gamma$ of asymptotic variances of the analyzed sample moments can be substituted by a matrix $\Omega$ that is a function only of the cross-product moments of the observed variables. The main advantage of this is that inferences based on $\Omega$ are readily available in standard software for covariance structure analysis and do not require the computation of sample fourth-order moments. An illustration with simulated data in the context of regression with errors in variables will be presented.
Abstract:
In moment structure analysis with nonnormal data, asymptotically valid inferences require the computation of a consistent (under general distributional assumptions) estimate of the matrix $\Gamma$ of asymptotic variances of sample second-order moments. Such a consistent estimate involves the fourth-order sample moments of the data. In practice, the use of fourth-order moments leads to computational burden and lack of robustness against small samples. In this paper we show that, under certain assumptions, correct asymptotic inferences can be attained when $\Gamma$ is replaced by a matrix $\Omega$ that involves only the second-order moments of the data. The present paper extends, to the context of multi-sample analysis of second-order moment structures, results derived in the context of (simple-sample) covariance structure analysis (Satorra and Bentler, 1990). The results apply to a variety of estimation methods and general types of statistics. An example involving a test of equality of means under covariance restrictions illustrates theoretical aspects of the paper.
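For reference, the elements of $\Gamma$ for sample covariances of i.i.d. data take the familiar form (a standard result quoted here for orientation, not a formula taken from the paper):

```latex
\Gamma_{ij,kl}\;=\;\sigma_{ijkl}-\sigma_{ij}\,\sigma_{kl},
\qquad
\sigma_{ijkl}\;=\;\mathbb{E}\bigl[(X_i-\mu_i)(X_j-\mu_j)(X_k-\mu_k)(X_l-\mu_l)\bigr],
```

which shows why a general consistent estimate requires fourth-order sample moments; under normality the expression collapses to $\sigma_{ik}\sigma_{jl}+\sigma_{il}\sigma_{jk}$, i.e. to second-order moments only.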
Spanning tests in return and stochastic discount factor mean-variance frontiers: A unifying approach
Abstract:
We propose new spanning tests that assess if the initial and additional assets share the economically meaningful cost and mean representing portfolios. We prove their asymptotic equivalence to existing tests under local alternatives. We also show that, unlike two-step or iterated procedures, single-step methods such as continuously updated GMM yield numerically identical overidentifying restrictions tests, so there is arguably a single spanning test. To prove these results, we extend optimal GMM inference to deal with singularities in the long-run second moment matrix of the influence functions. Finally, we test for spanning using size and book-to-market sorted US stock portfolios.
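For orientation, the continuously updated GMM estimator mentioned above minimizes a criterion of the standard form (a generic statement, not the paper's specific test construction):

```latex
\hat{\theta}_{\mathrm{CU}}
\;=\;\arg\min_{\theta}\;
n\,\bar{g}_n(\theta)'\,\hat{S}_n(\theta)^{-1}\,\bar{g}_n(\theta),
```

where $\bar{g}_n(\theta)$ is the sample mean of the influence functions and $\hat{S}_n(\theta)$ their long-run second moment matrix evaluated at the same $\theta$; the minimized value serves as the overidentifying restrictions test statistic. When $\hat{S}_n(\theta)$ is singular, as in the situations the paper addresses, the criterion needs a suitably modified (e.g. generalized) inverse.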
Abstract:
In this paper we analyse the observed systematic differences in costs for teaching hospitals (TH henceforth) in Spain. Concern has been voiced regarding the existence of a bias in the financing of THs once prospective budgets are in the arena for hospital finance, and claims for adjusting to take into account the legitimate extra costs of teaching on hospital expenditure are well grounded. We focus on the estimation of the impact of teaching status on average cost. We used a version of a multiproduct hospital cost function taking into account some relevant factors from which to derive the observed differences. We assume that the relationship between the explanatory and the dependent variables follows a flexible form for each of the explanatory variables. We also model the underlying covariance structure of the data. We assumed two qualitatively different sources of variation: random effects and serial correlation. Random variation refers both to general level variation (through the random intercept) and to the variation specifically related to teaching status. We postulate that the impact of the random effects is predominant over the impact of the serial correlation effects. The model is estimated by restricted maximum likelihood. Our results show that costs are 9% higher (15% in the case of median costs) in teaching than in non-teaching hospitals. That is, teaching status legitimately explains no more than half of the observed difference in actual costs. The impact on costs of the teaching factor depends on the number of residents, with an increase of 51.11% per resident for hospitals with fewer than 204 residents (third quartile of the number of residents) and 41.84% for hospitals with more than 204 residents. In addition, the estimated dispersion is higher among teaching hospitals. As a result, due to the considerable observed heterogeneity, results should be interpreted with caution. From a policy-making point of view, we conclude that since a higher relative burden for medical training is under public hospital command, an explicit adjustment for the extra costs that the teaching factor imposes on hospital finance is needed before hospital competition for inpatient services takes place.
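A minimal sketch of a random-intercept cost regression in this spirit, fitted by restricted maximum likelihood with statsmodels; the data file and column names (log_cost, teaching, residents, casemix, hospital) are hypothetical, and the serial-correlation component of the paper's covariance structure is not reproduced here.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("hospital_costs.csv")  # hypothetical panel of hospital-year records

model = smf.mixedlm(
    "log_cost ~ teaching + residents + casemix",  # fixed effects on average cost
    data=df,
    groups=df["hospital"],       # random intercept: general level variation by hospital
    re_formula="~teaching",      # extra random variation tied to teaching status
)
result = model.fit(reml=True)    # restricted maximum likelihood
print(result.summary())
```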
Abstract:
This paper investigates what has caused output and inflation volatility to fall in the US, using a small-scale structural model estimated with Bayesian techniques and rolling samples. There are instabilities in the posterior of the parameters describing the private sector, the policy rule and the standard deviation of the shocks. Results are robust to the specification of the policy rule. Changes in the parameters describing the private sector are the largest, but those of the policy rule and the covariance matrix of the shocks explain the changes most.
Abstract:
The goal of this paper is to estimate time-varying covariance matrices. Since the covariance matrix of financial returns is known to change through time and is an essential ingredient in risk measurement, portfolio selection, and tests of asset pricing models, this is a very important problem in practice. Our model of choice is the Diagonal-Vech version of the Multivariate GARCH(1,1) model. The problem is that the estimation of the general Diagonal-Vech model is numerically infeasible in dimensions higher than 5. The common approach is to estimate more restrictive models which are tractable but may not conform to the data. Our contribution is to propose an alternative estimation method that is numerically feasible, produces positive semi-definite conditional covariance matrices, and does not impose unrealistic a priori restrictions. We provide an empirical application in the context of international stock markets, comparing the new estimator to a number of existing ones.
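For reference, the Diagonal-Vech GARCH(1,1) recursion can be written compactly in Hadamard form (a standard formulation quoted for orientation; the authors' exact parameterization may differ):

```latex
H_t \;=\; C \;+\; A \odot \bigl(\varepsilon_{t-1}\varepsilon_{t-1}'\bigr)
          \;+\; B \odot H_{t-1},
```

where $H_t$ is the conditional covariance matrix, $\varepsilon_{t-1}$ the vector of return innovations, $\odot$ the element-wise product, and $C$, $A$, $B$ are symmetric parameter matrices, so that each covariance element follows its own GARCH(1,1)-type equation. With $N$ assets this already involves $3N(N+1)/2$ free parameters, which is why unrestricted estimation becomes infeasible beyond about five dimensions.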
Abstract:
This work proposes novel network analysis techniques for multivariate time series. We define the network of a multivariate time series as a graph where vertices denote the components of the process and edges denote non-zero long-run partial correlations. We then introduce a two-step LASSO procedure, called NETS, to estimate high-dimensional sparse long-run partial correlation networks. This approach is based on a VAR approximation of the process and allows the long-run linkages to be decomposed into the contributions of the dynamic and contemporaneous dependence relations of the system. The large sample properties of the estimator are analysed and we establish conditions for consistent selection and estimation of the non-zero long-run partial correlations. The methodology is illustrated with an application to a panel of U.S. blue chips.
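To fix ideas, under a VAR($p$) approximation with coefficient matrices $A_1,\dots,A_p$ and innovation covariance $\Sigma_u$, the long-run covariance and the long-run partial correlations that define the network edges can be written as follows (standard definitions, stated here only for orientation; they need not match the paper's notation exactly):

```latex
\Sigma_{\mathrm{LR}} \;=\; A(1)^{-1}\,\Sigma_u\,\bigl(A(1)^{-1}\bigr)',
\qquad
A(1)\;=\;I-\sum_{k=1}^{p}A_k,
\qquad
K=\Sigma_{\mathrm{LR}}^{-1},
\qquad
\rho_{ij}\;=\;-\,\frac{K_{ij}}{\sqrt{K_{ii}\,K_{jj}}},
```

so an edge between components $i$ and $j$ is present whenever $\rho_{ij}\neq 0$, and sparsity in $K$ translates directly into a sparse network.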
Abstract:
The statistical theory of signal detection and the estimation of its parameters are reviewed and applied to the case of detection of the gravitational-wave signal from a coalescing binary by a laser interferometer. The correlation integral and the covariance matrix for all possible static configurations are investigated numerically. Approximate analytic formulas are derived for the case of a narrow-band sensitivity configuration of the detector.
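For orientation, in this matched-filtering setting the covariance matrix of the parameter estimators is customarily approximated, at high signal-to-noise ratio, by the inverse of the Fisher information matrix built from the noise-weighted inner product (a standard relation, not a formula specific to the paper):

```latex
\Gamma_{ij} \;=\; \Bigl(\tfrac{\partial h}{\partial\theta_i}\,\Big|\,
                        \tfrac{\partial h}{\partial\theta_j}\Bigr),
\qquad
\operatorname{Cov}\bigl(\hat{\theta}\bigr)\;\approx\;\Gamma^{-1},
```

where $h(t;\theta)$ is the template waveform and $(\,\cdot\,|\,\cdot\,)$ denotes the inner product weighted by the detector's noise spectral density.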
Abstract:
BACKGROUND: Previous cross-sectional studies report that cognitive impairment is associated with poor psychosocial functioning in euthymic bipolar patients. There is a lack of long-term studies to determine the course of cognitive impairment and its impact on functional outcome. METHOD: A total of 54 subjects were assessed at baseline and 6 years later; 28 had DSM-IV-TR bipolar I or II disorder (recruited, at baseline, from a Lithium Clinic Program) and 26 were healthy matched controls. They were all assessed with a cognitive battery tapping into the main cognitive domains (executive function, attention, processing speed, verbal memory and visual memory) twice over a 6-year follow-up period. All patients were euthymic (Hamilton Rating Scale for Depression score lower than 8 and Young Mania Rating Scale score lower than 6) for at least 3 months before both evaluations. At the end of follow-up, psychosocial functioning was also evaluated by means of the Functioning Assessment Short Test. RESULTS: Repeated-measures multivariate analysis of covariance showed that there were main effects of group in the executive domain, in the inhibition domain, in the processing speed domain, and in the verbal memory domain (p<0.04). Among the clinical factors, only longer illness duration was significantly related to slow processing (p=0.01), whereas strong relationships were observed between impoverished cognition over time and poorer psychosocial functioning (p<0.05). CONCLUSIONS: Executive functioning, inhibition, processing speed and verbal memory were impaired in euthymic bipolar out-patients. Although cognitive deficits remained stable on average throughout the follow-up, they had enduring negative effects on the psychosocial adaptation of patients.
Abstract:
A comparative performance analysis of four geolocation methods in terms of their theoretical root mean square positioning errors is provided. Comparison is established in two different ways: strict and average. In the strict type, methods are examined for a particular geometric configuration of base stations (BSs) with respect to mobile position, which determines a given noise profile affecting the respective time-of-arrival (TOA) or time-difference-of-arrival (TDOA) estimates. In the average type, methods are evaluated in terms of the expected covariance matrix of the position error over an ensemble of random geometries, so that comparison is geometry independent. Exact semianalytical equations and associated lower bounds (depending solely on the noise profile) are obtained for the average covariance matrix of the position error in terms of the so-called information matrix specific to each geolocation method. Statistical channel models inferred from field trials are used to define realistic prior probabilities for the random geometries. A final evaluation provides extensive results relating the expected position error to channel model parameters and the number of base stations.
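To make the role of the information matrix concrete: for a given geometry and noise profile, the position-error covariance is bounded below, in the positive semi-definite sense, by the inverse of the information matrix, which in turn bounds the RMS positioning error (a standard Cramér-Rao-type relation, quoted here for orientation rather than as the paper's exact bound):

```latex
\mathbf{C}_{\hat{\mathbf{p}}} \;\succeq\; \mathbf{J}^{-1},
\qquad
\mathrm{RMSE}\;=\;\sqrt{\operatorname{tr}\,\mathbf{C}_{\hat{\mathbf{p}}}}
\;\geq\;\sqrt{\operatorname{tr}\,\mathbf{J}^{-1}},
```

with averaging over the ensemble of random geometries then yielding geometry-independent versions of these quantities.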