989 resultados para sample covariance matrix
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
Most biological systems are formed by component parts that are to some degree interrelated. Groups of parts that are more associated among themselves and are relatively autonomous from others are called modules. One of the consequences of modularity is that biological systems usually present an unequal distribution of the genetic variation among traits. Estimating the covariance matrix that describes these systems is a difficult problem due to a number of factors such as poor sample sizes and measurement errors. We show that this problem will be exacerbated whenever matrix inversion is required, as in directional selection reconstruction analysis. We explore the consequences of varying degrees of modularity and signal-to-noise ratio on selection reconstruction. We then present and test the efficiency of available methods for controlling noise in matrix estimates. In our simulations, controlling matrices for noise vastly improves the reconstruction of selection gradients. We also perform an analysis of selection gradients reconstruction over a New World Monkeys skull database to illustrate the impact of noise on such analyses. Noise-controlled estimates render far more plausible interpretations that are in full agreement with previous results.
Resumo:
This paper introduces a novel approach to making inference about the regression parameters in the accelerated failure time (AFT) model for current status and interval censored data. The estimator is constructed by inverting a Wald type test for testing a null proportional hazards model. A numerically efficient Markov chain Monte Carlo (MCMC) based resampling method is proposed to simultaneously obtain the point estimator and a consistent estimator of its variance-covariance matrix. We illustrate our approach with interval censored data sets from two clinical studies. Extensive numerical studies are conducted to evaluate the finite sample performance of the new estimators.
Resumo:
With the recognition of the importance of evidence-based medicine, there is an emerging need for methods to systematically synthesize available data. Specifically, methods to provide accurate estimates of test characteristics for diagnostic tests are needed to help physicians make better clinical decisions. To provide more flexible approaches for meta-analysis of diagnostic tests, we developed three Bayesian generalized linear models. Two of these models, a bivariate normal and a binomial model, analyzed pairs of sensitivity and specificity values while incorporating the correlation between these two outcome variables. Noninformative independent uniform priors were used for the variance of sensitivity, specificity and correlation. We also applied an inverse Wishart prior to check the sensitivity of the results. The third model was a multinomial model where the test results were modeled as multinomial random variables. All three models can include specific imaging techniques as covariates in order to compare performance. Vague normal priors were assigned to the coefficients of the covariates. The computations were carried out using the 'Bayesian inference using Gibbs sampling' implementation of Markov chain Monte Carlo techniques. We investigated the properties of the three proposed models through extensive simulation studies. We also applied these models to a previously published meta-analysis dataset on cervical cancer as well as to an unpublished melanoma dataset. In general, our findings show that the point estimates of sensitivity and specificity were consistent among Bayesian and frequentist bivariate normal and binomial models. However, in the simulation studies, the estimates of the correlation coefficient from Bayesian bivariate models are not as good as those obtained from frequentist estimation regardless of which prior distribution was used for the covariance matrix. The Bayesian multinomial model consistently underestimated the sensitivity and specificity regardless of the sample size and correlation coefficient. In conclusion, the Bayesian bivariate binomial model provides the most flexible framework for future applications because of its following strengths: (1) it facilitates direct comparison between different tests; (2) it captures the variability in both sensitivity and specificity simultaneously as well as the intercorrelation between the two; and (3) it can be directly applied to sparse data without ad hoc correction. ^
Resumo:
Geostrophic surface velocities can be derived from the gradients of the mean dynamic topography-the difference between the mean sea surface and the geoid. Therefore, independently observed mean dynamic topography data are valuable input parameters and constraints for ocean circulation models. For a successful fit to observational dynamic topography data, not only the mean dynamic topography on the particular ocean model grid is required, but also information about its inverse covariance matrix. The calculation of the mean dynamic topography from satellite-based gravity field models and altimetric sea surface height measurements, however, is not straightforward. For this purpose, we previously developed an integrated approach to combining these two different observation groups in a consistent way without using the common filter approaches (Becker et al. in J Geodyn 59(60):99-110, 2012, doi:10.1016/j.jog.2011.07.0069; Becker in Konsistente Kombination von Schwerefeld, Altimetrie und hydrographischen Daten zur Modellierung der dynamischen Ozeantopographie, 2012, http://nbn-resolving.de/nbn:de:hbz:5n-29199). Within this combination method, the full spectral range of the observations is considered. Further, it allows the direct determination of the normal equations (i.e., the inverse of the error covariance matrix) of the mean dynamic topography on arbitrary grids, which is one of the requirements for ocean data assimilation. In this paper, we report progress through selection and improved processing of altimetric data sets. We focus on the preprocessing steps of along-track altimetry data from Jason-1 and Envisat to obtain a mean sea surface profile. During this procedure, a rigorous variance propagation is accomplished, so that, for the first time, the full covariance matrix of the mean sea surface is available. The combination of the mean profile and a combined GRACE/GOCE gravity field model yields a mean dynamic topography model for the North Atlantic Ocean that is characterized by a defined set of assumptions. We show that including the geodetically derived mean dynamic topography with the full error structure in a 3D stationary inverse ocean model improves modeled oceanographic features over previous estimates.
Resumo:
La plupart des modèles en statistique classique repose sur une hypothèse sur la distribution des données ou sur une distribution sous-jacente aux données. La validité de cette hypothèse permet de faire de l’inférence, de construire des intervalles de confiance ou encore de tester la fiabilité du modèle. La problématique des tests d’ajustement vise à s’assurer de la conformité ou de la cohérence de l’hypothèse avec les données disponibles. Dans la présente thèse, nous proposons des tests d’ajustement à la loi normale dans le cadre des séries chronologiques univariées et vectorielles. Nous nous sommes limités à une classe de séries chronologiques linéaires, à savoir les modèles autorégressifs à moyenne mobile (ARMA ou VARMA dans le cas vectoriel). Dans un premier temps, au cas univarié, nous proposons une généralisation du travail de Ducharme et Lafaye de Micheaux (2004) dans le cas où la moyenne est inconnue et estimée. Nous avons estimé les paramètres par une méthode rarement utilisée dans la littérature et pourtant asymptotiquement efficace. En effet, nous avons rigoureusement montré que l’estimateur proposé par Brockwell et Davis (1991, section 10.8) converge presque sûrement vers la vraie valeur inconnue du paramètre. De plus, nous fournissons une preuve rigoureuse de l’inversibilité de la matrice des variances et des covariances de la statistique de test à partir de certaines propriétés d’algèbre linéaire. Le résultat s’applique aussi au cas où la moyenne est supposée connue et égale à zéro. Enfin, nous proposons une méthode de sélection de la dimension de la famille d’alternatives de type AIC, et nous étudions les propriétés asymptotiques de cette méthode. L’outil proposé ici est basé sur une famille spécifique de polynômes orthogonaux, à savoir les polynômes de Legendre. Dans un second temps, dans le cas vectoriel, nous proposons un test d’ajustement pour les modèles autorégressifs à moyenne mobile avec une paramétrisation structurée. La paramétrisation structurée permet de réduire le nombre élevé de paramètres dans ces modèles ou encore de tenir compte de certaines contraintes particulières. Ce projet inclut le cas standard d’absence de paramétrisation. Le test que nous proposons s’applique à une famille quelconque de fonctions orthogonales. Nous illustrons cela dans le cas particulier des polynômes de Legendre et d’Hermite. Dans le cas particulier des polynômes d’Hermite, nous montrons que le test obtenu est invariant aux transformations affines et qu’il est en fait une généralisation de nombreux tests existants dans la littérature. Ce projet peut être vu comme une généralisation du premier dans trois directions, notamment le passage de l’univarié au multivarié ; le choix d’une famille quelconque de fonctions orthogonales ; et enfin la possibilité de spécifier des relations ou des contraintes dans la formulation VARMA. Nous avons procédé dans chacun des projets à une étude de simulation afin d’évaluer le niveau et la puissance des tests proposés ainsi que de les comparer aux tests existants. De plus des applications aux données réelles sont fournies. Nous avons appliqué les tests à la prévision de la température moyenne annuelle du globe terrestre (univarié), ainsi qu’aux données relatives au marché du travail canadien (bivarié). Ces travaux ont été exposés à plusieurs congrès (voir par exemple Tagne, Duchesne et Lafaye de Micheaux (2013a, 2013b, 2014) pour plus de détails). Un article basé sur le premier projet est également soumis dans une revue avec comité de lecture (Voir Duchesne, Lafaye de Micheaux et Tagne (2016)).
Resumo:
La plupart des modèles en statistique classique repose sur une hypothèse sur la distribution des données ou sur une distribution sous-jacente aux données. La validité de cette hypothèse permet de faire de l’inférence, de construire des intervalles de confiance ou encore de tester la fiabilité du modèle. La problématique des tests d’ajustement vise à s’assurer de la conformité ou de la cohérence de l’hypothèse avec les données disponibles. Dans la présente thèse, nous proposons des tests d’ajustement à la loi normale dans le cadre des séries chronologiques univariées et vectorielles. Nous nous sommes limités à une classe de séries chronologiques linéaires, à savoir les modèles autorégressifs à moyenne mobile (ARMA ou VARMA dans le cas vectoriel). Dans un premier temps, au cas univarié, nous proposons une généralisation du travail de Ducharme et Lafaye de Micheaux (2004) dans le cas où la moyenne est inconnue et estimée. Nous avons estimé les paramètres par une méthode rarement utilisée dans la littérature et pourtant asymptotiquement efficace. En effet, nous avons rigoureusement montré que l’estimateur proposé par Brockwell et Davis (1991, section 10.8) converge presque sûrement vers la vraie valeur inconnue du paramètre. De plus, nous fournissons une preuve rigoureuse de l’inversibilité de la matrice des variances et des covariances de la statistique de test à partir de certaines propriétés d’algèbre linéaire. Le résultat s’applique aussi au cas où la moyenne est supposée connue et égale à zéro. Enfin, nous proposons une méthode de sélection de la dimension de la famille d’alternatives de type AIC, et nous étudions les propriétés asymptotiques de cette méthode. L’outil proposé ici est basé sur une famille spécifique de polynômes orthogonaux, à savoir les polynômes de Legendre. Dans un second temps, dans le cas vectoriel, nous proposons un test d’ajustement pour les modèles autorégressifs à moyenne mobile avec une paramétrisation structurée. La paramétrisation structurée permet de réduire le nombre élevé de paramètres dans ces modèles ou encore de tenir compte de certaines contraintes particulières. Ce projet inclut le cas standard d’absence de paramétrisation. Le test que nous proposons s’applique à une famille quelconque de fonctions orthogonales. Nous illustrons cela dans le cas particulier des polynômes de Legendre et d’Hermite. Dans le cas particulier des polynômes d’Hermite, nous montrons que le test obtenu est invariant aux transformations affines et qu’il est en fait une généralisation de nombreux tests existants dans la littérature. Ce projet peut être vu comme une généralisation du premier dans trois directions, notamment le passage de l’univarié au multivarié ; le choix d’une famille quelconque de fonctions orthogonales ; et enfin la possibilité de spécifier des relations ou des contraintes dans la formulation VARMA. Nous avons procédé dans chacun des projets à une étude de simulation afin d’évaluer le niveau et la puissance des tests proposés ainsi que de les comparer aux tests existants. De plus des applications aux données réelles sont fournies. Nous avons appliqué les tests à la prévision de la température moyenne annuelle du globe terrestre (univarié), ainsi qu’aux données relatives au marché du travail canadien (bivarié). Ces travaux ont été exposés à plusieurs congrès (voir par exemple Tagne, Duchesne et Lafaye de Micheaux (2013a, 2013b, 2014) pour plus de détails). Un article basé sur le premier projet est également soumis dans une revue avec comité de lecture (Voir Duchesne, Lafaye de Micheaux et Tagne (2016)).
Resumo:
We examined the genetic basis of clinal adaptation by determining the evolutionary response of life-history traits to laboratory natural selection along a gradient of thermal stress in Drosophila serrata. A gradient of heat stress was created by exposing larvae to a heat stress of 36degrees for 4 hr for 0, 1, 2, 3, 4, or 5 days of larval development, with the remainder of development taking place at 25degrees. Replicated lines were exposed to each level of this stress every second generation for 30 generations. At the end of selection, we conducted a complete reciprocal transfer experiment where all populations were raised in all environments, to estimate the realized additive genetic covariance matrix among clinal environments in three life-history traits. Visualization of the genetic covariance functions of the life-history traits revealed that the genetic correlation between environments generally declined as environments became more different and even became negative between the most different environments in some cases. One exception to this general pattern was a life-history trait representing the classic trade-off between development time and body size, which responded to selection in a similar genetic fashion across all environments. Adaptation to clinal environments may involve a number of distinct genetic effects along the length of the cline, the complexity of which may not be fully revealed by focusing primarily on populations at the ends of the cline.
Resumo:
Subsequent to the influential paper of [Chan, K.C., Karolyi, G.A., Longstaff, F.A., Sanders, A.B., 1992. An empirical comparison of alternative models of the short-term interest rate. Journal of Finance 47, 1209-1227], the generalised method of moments (GMM) has been a popular technique for estimation and inference relating to continuous-time models of the short-term interest rate. GMM has been widely employed to estimate model parameters and to assess the goodness-of-fit of competing short-rate specifications. The current paper conducts a series of simulation experiments to document the bias and precision of GMM estimates of short-rate parameters, as well as the size and power of [Hansen, L.P., 1982. Large sample properties of generalised method of moments estimators. Econometrica 50, 1029-1054], J-test of over-identifying restrictions. While the J-test appears to have appropriate size and good power in sample sizes commonly encountered in the short-rate literature, GMM estimates of the speed of mean reversion are shown to be severely biased. Consequently, it is dangerous to draw strong conclusions about the strength of mean reversion using GMM. In contrast, the parameter capturing the levels effect, which is important in differentiating between competing short-rate specifications, is estimated with little bias. (c) 2006 Elsevier B.V. All rights reserved.
Resumo:
2000 Mathematics Subject Classification: 62H15, 62H12.
Resumo:
Prior research has established that idiosyncratic volatility of the securities prices exhibits a positive trend. This trend and other factors have made the merits of investment diversification and portfolio construction more compelling. ^ A new optimization technique, a greedy algorithm, is proposed to optimize the weights of assets in a portfolio. The main benefits of using this algorithm are to: (a) increase the efficiency of the portfolio optimization process, (b) implement large-scale optimizations, and (c) improve the resulting optimal weights. In addition, the technique utilizes a novel approach in the construction of a time-varying covariance matrix. This involves the application of a modified integrated dynamic conditional correlation GARCH (IDCC - GARCH) model to account for the dynamics of the conditional covariance matrices that are employed. ^ The stochastic aspects of the expected return of the securities are integrated into the technique through Monte Carlo simulations. Instead of representing the expected returns as deterministic values, they are assigned simulated values based on their historical measures. The time-series of the securities are fitted into a probability distribution that matches the time-series characteristics using the Anderson-Darling goodness-of-fit criterion. Simulated and actual data sets are used to further generalize the results. Employing the S&P500 securities as the base, 2000 simulated data sets are created using Monte Carlo simulation. In addition, the Russell 1000 securities are used to generate 50 sample data sets. ^ The results indicate an increase in risk-return performance. Choosing the Value-at-Risk (VaR) as the criterion and the Crystal Ball portfolio optimizer, a commercial product currently available on the market, as the comparison for benchmarking, the new greedy technique clearly outperforms others using a sample of the S&P500 and the Russell 1000 securities. The resulting improvements in performance are consistent among five securities selection methods (maximum, minimum, random, absolute minimum, and absolute maximum) and three covariance structures (unconditional, orthogonal GARCH, and integrated dynamic conditional GARCH). ^
Resumo:
Prior research has established that idiosyncratic volatility of the securities prices exhibits a positive trend. This trend and other factors have made the merits of investment diversification and portfolio construction more compelling. A new optimization technique, a greedy algorithm, is proposed to optimize the weights of assets in a portfolio. The main benefits of using this algorithm are to: a) increase the efficiency of the portfolio optimization process, b) implement large-scale optimizations, and c) improve the resulting optimal weights. In addition, the technique utilizes a novel approach in the construction of a time-varying covariance matrix. This involves the application of a modified integrated dynamic conditional correlation GARCH (IDCC - GARCH) model to account for the dynamics of the conditional covariance matrices that are employed. The stochastic aspects of the expected return of the securities are integrated into the technique through Monte Carlo simulations. Instead of representing the expected returns as deterministic values, they are assigned simulated values based on their historical measures. The time-series of the securities are fitted into a probability distribution that matches the time-series characteristics using the Anderson-Darling goodness-of-fit criterion. Simulated and actual data sets are used to further generalize the results. Employing the S&P500 securities as the base, 2000 simulated data sets are created using Monte Carlo simulation. In addition, the Russell 1000 securities are used to generate 50 sample data sets. The results indicate an increase in risk-return performance. Choosing the Value-at-Risk (VaR) as the criterion and the Crystal Ball portfolio optimizer, a commercial product currently available on the market, as the comparison for benchmarking, the new greedy technique clearly outperforms others using a sample of the S&P500 and the Russell 1000 securities. The resulting improvements in performance are consistent among five securities selection methods (maximum, minimum, random, absolute minimum, and absolute maximum) and three covariance structures (unconditional, orthogonal GARCH, and integrated dynamic conditional GARCH).
Resumo:
This dissertation contains four essays that all share a common purpose: developing new methodologies to exploit the potential of high-frequency data for the measurement, modeling and forecasting of financial assets volatility and correlations. The first two chapters provide useful tools for univariate applications while the last two chapters develop multivariate methodologies. In chapter 1, we introduce a new class of univariate volatility models named FloGARCH models. FloGARCH models provide a parsimonious joint model for low frequency returns and realized measures, and are sufficiently flexible to capture long memory as well as asymmetries related to leverage effects. We analyze the performances of the models in a realistic numerical study and on the basis of a data set composed of 65 equities. Using more than 10 years of high-frequency transactions, we document significant statistical gains related to the FloGARCH models in terms of in-sample fit, out-of-sample fit and forecasting accuracy compared to classical and Realized GARCH models. In chapter 2, using 12 years of high-frequency transactions for 55 U.S. stocks, we argue that combining low-frequency exogenous economic indicators with high-frequency financial data improves the ability of conditionally heteroskedastic models to forecast the volatility of returns, their full multi-step ahead conditional distribution and the multi-period Value-at-Risk. Using a refined version of the Realized LGARCH model allowing for time-varying intercept and implemented with realized kernels, we document that nominal corporate profits and term spreads have strong long-run predictive ability and generate accurate risk measures forecasts over long-horizon. The results are based on several loss functions and tests, including the Model Confidence Set. Chapter 3 is a joint work with David Veredas. We study the class of disentangled realized estimators for the integrated covariance matrix of Brownian semimartingales with finite activity jumps. These estimators separate correlations and volatilities. We analyze different combinations of quantile- and median-based realized volatilities, and four estimators of realized correlations with three synchronization schemes. Their finite sample properties are studied under four data generating processes, in presence, or not, of microstructure noise, and under synchronous and asynchronous trading. The main finding is that the pre-averaged version of disentangled estimators based on Gaussian ranks (for the correlations) and median deviations (for the volatilities) provide a precise, computationally efficient, and easy alternative to measure integrated covariances on the basis of noisy and asynchronous prices. Along these lines, a minimum variance portfolio application shows the superiority of this disentangled realized estimator in terms of numerous performance metrics. Chapter 4 is co-authored with Niels S. Hansen, Asger Lunde and Kasper V. Olesen, all affiliated with CREATES at Aarhus University. We propose to use the Realized Beta GARCH model to exploit the potential of high-frequency data in commodity markets. The model produces high quality forecasts of pairwise correlations between commodities which can be used to construct a composite covariance matrix. We evaluate the quality of this matrix in a portfolio context and compare it to models used in the industry. We demonstrate significant economic gains in a realistic setting including short selling constraints and transaction costs.
Resumo:
Compressed covariance sensing using quadratic samplers is gaining increasing interest in recent literature. Covariance matrix often plays the role of a sufficient statistic in many signal and information processing tasks. However, owing to the large dimension of the data, it may become necessary to obtain a compressed sketch of the high dimensional covariance matrix to reduce the associated storage and communication costs. Nested sampling has been proposed in the past as an efficient sub-Nyquist sampling strategy that enables perfect reconstruction of the autocorrelation sequence of Wide-Sense Stationary (WSS) signals, as though it was sampled at the Nyquist rate. The key idea behind nested sampling is to exploit properties of the difference set that naturally arises in quadratic measurement model associated with covariance compression. In this thesis, we will focus on developing novel versions of nested sampling for low rank Toeplitz covariance estimation, and phase retrieval, where the latter problem finds many applications in high resolution optical imaging, X-ray crystallography and molecular imaging. The problem of low rank compressive Toeplitz covariance estimation is first shown to be fundamentally related to that of line spectrum recovery. In absence if noise, this connection can be exploited to develop a particular kind of sampler called the Generalized Nested Sampler (GNS), that can achieve optimal compression rates. In presence of bounded noise, we develop a regularization-free algorithm that provably leads to stable recovery of the high dimensional Toeplitz matrix from its order-wise minimal sketch acquired using a GNS. Contrary to existing TV-norm and nuclear norm based reconstruction algorithms, our technique does not use any tuning parameters, which can be of great practical value. The idea of nested sampling idea also finds a surprising use in the problem of phase retrieval, which has been of great interest in recent times for its convex formulation via PhaseLift, By using another modified version of nested sampling, namely the Partial Nested Fourier Sampler (PNFS), we show that with probability one, it is possible to achieve a certain conjectured lower bound on the necessary measurement size. Moreover, for sparse data, an l1 minimization based algorithm is proposed that can lead to stable phase retrieval using order-wise minimal number of measurements.
Resumo:
This paper studies a nonlinear, discrete-time matrix system arising in the stability analysis of Kalman filters. These systems present an internal coupling between the state components that gives rise to complex dynamic behavior. The problem of partial stability, which requires that a specific component of the state of the system converge exponentially, is studied and solved. The convergent state component is strongly linked with the behavior of Kalman filters, since it can be used to provide bounds for the error covariance matrix under uncertainties in the noise measurements. We exploit the special features of the system-mainly the connections with linear systems-to obtain an algebraic test for partial stability. Finally, motivated by applications in which polynomial divergence of the estimates is acceptable, we study and solve a partial semistability problem.