970 results for Genetic Variance-covariance Matrix
Abstract:
In this letter, we derive continuum equations for the generalization error of the Bayesian online algorithm (BOnA) for the one-layer perceptron with a spherical covariance matrix using the Rosenblatt potential and show, by numerical calculations, that the asymptotic performance of the algorithm is the same as the one for the optimal algorithm found by means of variational methods with the added advantage that the BOnA does not use any inaccessible information during learning. © 2007 IEEE.
Abstract:
Evelina Ilieva Veleva - The Wishart distribution arises in practice as the distribution of the sample covariance matrix for observations from a multivariate normal distribution. Some marginal densities are derived by integrating the density of the Wishart distribution. Necessary and sufficient conditions for the positive definiteness of a matrix are proved, which give the required limits of integration.
Abstract:
2000 Mathematics Subject Classification: 62H10.
Abstract:
2010 Mathematics Subject Classification: 62H10.
Abstract:
Heterogeneous datasets arise naturally in most applications due to the use of a variety of sensors and measuring platforms. Such datasets can be heterogeneous in terms of the error characteristics and sensor models. Treating such data is most naturally accomplished using a Bayesian or model-based geostatistical approach; however, such methods generally scale rather badly with the size of the dataset, and require computationally expensive Monte Carlo based inference. Recently, within the machine learning and spatial statistics communities, many papers have explored the potential of reduced rank representations of the covariance matrix, often referred to as projected or fixed rank approaches. In such methods the covariance function of the posterior process is represented by a reduced rank approximation which is chosen such that there is minimal information loss. In this paper a sequential Bayesian framework for inference in such projected processes is presented. The observations are considered one at a time, which avoids the need for the high dimensional integrals typically required in a Bayesian approach. A C++ library, gptk, which is part of the INTAMAP web service, is introduced; it implements projected, sequential estimation and adds several novel features. In particular the library includes the ability to use a generic observation operator, or sensor model, to permit data fusion. It is also possible to cope with a range of observation error characteristics, including non-Gaussian observation errors. Inference for the covariance parameters is explored, including the impact of the projected process approximation on likelihood profiles. We illustrate the projected sequential method in application to synthetic and real datasets. Limitations and extensions are discussed. © 2010 Elsevier Ltd.
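To make the idea of processing observations one at a time concrete, here is a minimal Python sketch. It is not the gptk library and does not implement the projected-process approximation; it assumes a finite-dimensional Gaussian state, a linear observation operator (the rows of a hypothetical matrix `H`), and Gaussian errors with a different variance per observation. The function name `sequential_update` and all numbers are illustrative assumptions. Under these assumptions the sequential result should match the batch posterior.

```python
import numpy as np

def sequential_update(mean, cov, h, y, noise_var):
    """Condition a Gaussian state (mean, cov) on one scalar observation y = h @ x + noise."""
    s = h @ cov @ h + noise_var            # predictive variance of this observation
    gain = cov @ h / s                     # gain vector for the rank-one update
    mean = mean + gain * (y - h @ mean)    # shift the mean towards the observation
    cov = cov - np.outer(gain, h @ cov)    # shrink the covariance
    return mean, cov

# Synthetic data (assumed): 30 observations of a 4-dimensional state,
# each with its own error variance to mimic heterogeneous sensors.
rng = np.random.default_rng(3)
dim, n_obs = 4, 30
H = rng.normal(size=(n_obs, dim))
noise_vars = rng.uniform(0.05, 0.5, size=n_obs)
x_true = rng.normal(size=dim)
y = H @ x_true + rng.normal(size=n_obs) * np.sqrt(noise_vars)

# Sequential pass: start from a standard normal prior, absorb one observation at a time.
mean, cov = np.zeros(dim), np.eye(dim)
for h_i, y_i, r_i in zip(H, y, noise_vars):
    mean, cov = sequential_update(mean, cov, h_i, y_i, r_i)

# Batch posterior for the same prior and data, for comparison.
R_inv = np.diag(1.0 / noise_vars)
cov_batch = np.linalg.inv(np.eye(dim) + H.T @ R_inv @ H)
mean_batch = cov_batch @ (H.T @ R_inv @ y)
```

Each step touches only one scalar observation, so no large matrix inverse over all observations is needed; non-Gaussian errors, which the abstract mentions, would require an approximate update that this sketch does not attempt.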
Abstract:
Prices of U.S. Treasury securities vary over time and across maturities. When the market in Treasurys is sufficiently complete and frictionless, these prices may be modeled by a function of time and maturity. A cross-section of this function for time held fixed is called the yield curve; the aggregate of these sections is the evolution of the yield curve. This dissertation studies aspects of this evolution. There are two complementary approaches to the study of yield curve evolution here. The first is principal components analysis; the second is wavelet analysis. In both approaches both the time and maturity variables are discretized. In principal components analysis the vectors of yield curve shifts are viewed as observations of a multivariate normal distribution. The resulting covariance matrix is diagonalized; the resulting eigenvalues and eigenvectors (the principal components) are used to draw inferences about the yield curve evolution. In wavelet analysis, the vectors of shifts are resolved into hierarchies of localized fundamental shifts (wavelets) that leave specified global properties invariant (average change and duration change). The hierarchies relate to the degree of localization, with movements restricted to a single maturity at the base and general movements at the apex. Second generation wavelet techniques allow better adaptation of the model to economic observables. Statistically, the wavelet approach is inherently nonparametric while the wavelets themselves are better adapted to describing a complete market. Principal components analysis provides information on the dimension of the yield curve process. While there is no clear demarcation between operative factors and noise, the top six principal components pick up 99% of total interest rate variation 95% of the time.
An economically justified basis of this process is hard to find; for example, a simple linear model will not suffice for the first principal component, and the shape of this component is nonstationary. Wavelet analysis works more directly with yield curve observations than principal components analysis. In fact the complete process from bond data to multiresolution is presented, including the dedicated Perl programs and the details of the portfolio metrics and specially adapted wavelet construction. The result is more robust statistics which provide balance to the more fragile principal components analysis.
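As a rough illustration of the principal components step described in this abstract, the following Python sketch diagonalizes the sample covariance matrix of yield curve shifts. The data are synthetic (an assumed level factor and slope factor plus noise), and the function name `pca_of_shifts`, the maturity grid and all magnitudes are assumptions made for illustration; this is not the dissertation's code or data.

```python
import numpy as np

def pca_of_shifts(shifts):
    """PCA of yield curve shifts.

    shifts: array of shape (n_days, n_maturities), one shift vector per row.
    Returns eigenvalues (descending), matching eigenvectors (columns),
    and the fraction of total variance explained by each component.
    """
    cov = np.cov(shifts, rowvar=False)        # sample covariance across maturities
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    return eigvals, eigvecs, explained

# Synthetic shifts (assumed): a parallel "level" move and a "slope" move plus small noise.
rng = np.random.default_rng(0)
maturities = np.linspace(0.25, 30.0, 12)
level = np.ones_like(maturities)
slope = (maturities - maturities.mean()) / maturities.std()
factors = rng.normal(size=(500, 2)) * np.array([0.10, 0.04])
shifts = factors @ np.vstack([level, slope]) + 0.005 * rng.normal(size=(500, 12))

eigvals, eigvecs, explained = pca_of_shifts(shifts)
cumulative = np.cumsum(explained)             # cumulative share of variance by component
```

Inspecting `cumulative` shows how many components are needed to reach a given share of total variation, which is the kind of dimension statement the abstract makes for real Treasury data.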
Abstract:
Prior research has established that idiosyncratic volatility of securities prices exhibits a positive trend. This trend and other factors have made the merits of investment diversification and portfolio construction more compelling. A new optimization technique, a greedy algorithm, is proposed to optimize the weights of assets in a portfolio. The main benefits of using this algorithm are to: (a) increase the efficiency of the portfolio optimization process, (b) implement large-scale optimizations, and (c) improve the resulting optimal weights. In addition, the technique utilizes a novel approach in the construction of a time-varying covariance matrix. This involves the application of a modified integrated dynamic conditional correlation GARCH (IDCC-GARCH) model to account for the dynamics of the conditional covariance matrices that are employed. The stochastic aspects of the expected return of the securities are integrated into the technique through Monte Carlo simulations. Instead of representing the expected returns as deterministic values, they are assigned simulated values based on their historical measures. The time series of the securities are fitted to a probability distribution that matches the time-series characteristics using the Anderson-Darling goodness-of-fit criterion. Simulated and actual data sets are used to further generalize the results. Employing the S&P500 securities as the base, 2000 simulated data sets are created using Monte Carlo simulation. In addition, the Russell 1000 securities are used to generate 50 sample data sets. The results indicate an increase in risk-return performance. Choosing the Value-at-Risk (VaR) as the criterion and the Crystal Ball portfolio optimizer, a commercial product currently available on the market, as the comparison for benchmarking, the new greedy technique clearly outperforms others using a sample of the S&P500 and the Russell 1000 securities.
The resulting improvements in performance are consistent among five securities selection methods (maximum, minimum, random, absolute minimum, and absolute maximum) and three covariance structures (unconditional, orthogonal GARCH, and integrated dynamic conditional GARCH).
Abstract:
The complexity of modern geochemical data sets is increasing in several aspects (number of available samples, number of elements measured, number of matrices analysed, geological-environmental variability covered, etc.), hence it is becoming increasingly necessary to apply statistical methods to elucidate their structure. This paper presents an exploratory analysis of one such complex data set, the Tellus geochemical soil survey of Northern Ireland (NI). This exploratory analysis is based on one of the most fundamental exploratory tools, principal component analysis (PCA) and its graphical representation as a biplot, albeit in several variations: the set of elements included (only major oxides vs. all observed elements), the prior transformation applied to the data (none, a standardization or a log-ratio transformation) and the way the covariance matrix between components is estimated (classical estimation vs. robust estimation). Results show that a log-ratio PCA (robust or classical) of all available elements is the most powerful exploratory setting, providing the following insights: the first two processes controlling the whole geochemical variation in NI soils are peat coverage and a contrast between “mafic” and “felsic” background lithologies; peat-covered areas are detected as outliers by a robust analysis, and can then be filtered out if required for further modelling; and peat coverage intensity can be quantified with the %Br in the subcomposition (Br, Rb, Ni).
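A log-ratio PCA of the kind mentioned in this abstract can be sketched in Python as follows. The abstract does not say which log-ratio transformation was used, so the centred log-ratio (clr) here is an assumption, as are the function names `clr` and `logratio_pca` and the synthetic five-part compositions. Only the classical covariance estimate is shown; the robust variant is not reproduced.

```python
import numpy as np

def clr(compositions):
    """Centred log-ratio transform of strictly positive compositions (one per row)."""
    logs = np.log(compositions)
    return logs - logs.mean(axis=1, keepdims=True)

def logratio_pca(compositions):
    """Classical PCA of clr-transformed data via the singular value decomposition."""
    z = clr(compositions)
    z = z - z.mean(axis=0)                      # centre each clr variable
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u * s                              # sample coordinates (biplot points)
    loadings = vt.T                             # variable directions (biplot rays)
    explained = s**2 / np.sum(s**2)             # share of variance per component
    return scores, loadings, explained

# Synthetic compositions (assumed): 200 samples of 5 parts, closed to sum to 1.
rng = np.random.default_rng(1)
raw = rng.lognormal(mean=0.0, sigma=0.5, size=(200, 5))
comps = raw / raw.sum(axis=1, keepdims=True)

scores, loadings, explained = logratio_pca(comps)
```

Because clr rows sum to zero, the last component carries essentially no variance; plotting the first two columns of `scores` with the rows of `loadings` gives the biplot the abstract refers to.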
Abstract:
This thesis presents two algorithms that aim to improve the accuracy of direction-of-arrival estimation for sound sources and their echoes. The first algorithm, called the source elimination method, improves the direction-of-arrival estimation of echoes that are buried in noise. The second, called phase-focusing Multiple Signal Classification, uses the phase information at each frequency to determine the direction of arrival of wideband sources. Combining these two algorithms makes it possible to localize echoes whose power is -17 dB relative to the main source, down to an echo-to-noise ratio of -15 dB. This thesis also presents experimental measurements that confirm the results obtained in simulations.
Abstract:
Monitoring genetic diversity is fundamental in a restocking program. The genetic diversity of pacu Piaractus mesopotamicus (Holmberg, 1887) was evaluated at two fish farms in Andirá, Paraná, Brazil, used in the restocking program of the Paranapanema River. Six microsatellite loci were amplified to evaluate 60 fin samples. Broodstock B showed a higher number of alleles and higher heterozygosity (alleles: 22 and H_O: 0.628) than broodstock A (alleles: 21 and H_O: 0.600). Alleles with low frequencies were observed in both stocks. The positive inbreeding coefficients at loci Pme2 (stock A: F_IS = 0.30 and stock B: F_IS = 0.20), Pme5 (stock B: F_IS = 0.15), Pme14 (stock A: F_IS = 0.07) and Pme28 (stock A: F_IS = 0.24 and stock B: F_IS = 0.20) indicated a heterozygote deficiency. The presence of a null allele was detected at locus Pme2. The negative estimates at loci Pme4 (stock A: F_IS = -0.43 and stock B: F_IS = -0.37), Pme5 (stock A: F_IS = -0.11), Pme14 (stock B: F_IS = -0.15) and Pme32 (stock A: F_IS = -0.93 and stock B: F_IS = -0.60) were indicative of a heterozygote excess. Linkage disequilibrium and low allelic richness were found only in stock A. Nei's genetic diversity was high in both stocks. The genetic distance (0.085) and identity (0.918) showed similarity between the stocks, which reflects a possible common origin. Of the total genetic variance, 6.05% was due to differences between the stocks. A recent bottleneck effect was observed in both stocks. The results indicated high genetic diversity in the broodstocks and low genetic differentiation between them, which was caused by the reproductive management of the fish farms, the reduction in population size and genetic exchange between the fish farms.
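For readers unfamiliar with the F_IS values quoted above, the sketch below computes the inbreeding coefficient at one locus using the standard definition F_IS = 1 - H_O / H_E, where H_E is the heterozygosity expected from the allele frequencies. The abstract does not state which estimator or software was used, so this uncorrected version (no small-sample adjustment), the function name `f_is` and the toy genotypes are assumptions for illustration only.

```python
from collections import Counter

def f_is(genotypes):
    """Inbreeding coefficient F_IS = 1 - H_O / H_E for one locus.

    genotypes: list of (allele, allele) tuples, one per diploid individual.
    """
    n = len(genotypes)
    h_obs = sum(a != b for a, b in genotypes) / n               # observed heterozygosity
    counts = Counter(allele for g in genotypes for allele in g)
    total = 2 * n                                               # number of allele copies
    h_exp = 1.0 - sum((c / total) ** 2 for c in counts.values())  # expected heterozygosity
    return 1.0 - h_obs / h_exp

# Toy genotypes (hypothetical): both samples have allele frequencies of 0.5 / 0.5.
deficit = [("A", "A")] * 4 + [("B", "B")] * 4 + [("A", "B")] * 2   # few heterozygotes
excess = [("A", "B")] * 8 + [("A", "A")] + [("B", "B")]            # many heterozygotes

f_deficit = f_is(deficit)   # positive: heterozygote deficiency
f_excess = f_is(excess)     # negative: heterozygote excess
```

The sign convention matches the abstract: positive values point to a heterozygote deficiency and negative values to a heterozygote excess.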
Abstract:
Several factors have recently converged, elevating the need for highly parallel diagnostic platforms that have the ability to detect many known, novel, and emerging pathogenic agents simultaneously. Panviral DNA microarrays represent the most robust approach for massively parallel viral surveillance and detection. The Virochip is a panviral DNA microarray that is capable of detecting all known viruses, as well as novel viruses related to known viral families, in a single assay and has been used to successfully identify known and novel viral agents in clinical human specimens. However, the usefulness and the sensitivity of the Virochip platform have not been tested on a set of clinical veterinary specimens with the high degree of genetic variance that is frequently observed with swine virus field isolates. In this report, we investigate the utility and sensitivity of the Virochip to positively detect swine viruses in both cell culture-derived samples and clinical swine samples. The Virochip successfully detected porcine reproductive and respiratory syndrome virus (PRRSV) in serum containing 6.10 × 10^2 viral copies per microliter and influenza A virus in lung lavage fluid containing 2.08 × 10^6 viral copies per microliter. The Virochip also successfully detected porcine circovirus type 2 (PCV2) in serum containing 2.50 × 10^8 viral copies per microliter and porcine respiratory coronavirus (PRCV) in turbinate tissue homogenate. Collectively, the data in this report demonstrate that the Virochip can successfully detect pathogenic viruses frequently found in swine in a variety of solid and liquid specimens, such as turbinate tissue homogenate and lung lavage fluid, as well as antemortem samples, such as serum.
Abstract:
The current approach to data analysis for the Laser Interferometer Space Antenna (LISA) depends on the time delay interferometry (TDI) observables, which have to be generated before any weak signal detection can be performed. These are linear combinations of the raw data with appropriate time shifts that lead to the cancellation of the laser frequency noises. This is possible because of the multiple occurrences of the same noises in the different raw data. Originally, these observables were manually generated starting with LISA as a simple stationary array and then adjusted to incorporate the antenna's motions. However, none of the observables survived the flexing of the arms in that they did not lead to cancellation with the same structure. The principal component approach, presented by Romano and Woan, is another way of handling these noises; it simplifies the data analysis by removing the need to create the observables before the analysis. This method also depends on the multiple occurrences of the same noises but, instead of using them for cancellation, it takes advantage of the correlations that they produce between the different readings. These correlations can be expressed in a noise (data) covariance matrix which occurs in the Bayesian likelihood function when the noises are assumed to be Gaussian. Romano and Woan showed that performing an eigendecomposition of this matrix produced two distinct sets of eigenvalues that can be distinguished by the absence of laser frequency noise from one set. The transformation of the raw data using the corresponding eigenvectors also produced data that was free from the laser frequency noises. This result led to the idea that the principal components may actually be time delay interferometry observables since they produced the same outcome, that is, data that are free from laser frequency noise.
The aims here were (i) to investigate the connection between the principal components and these observables, (ii) to prove that the data analysis using them is equivalent to that using the traditional observables and (iii) to determine how this method adapts to the real LISA, especially the flexing of the antenna. For testing the connection between the principal components and the TDI observables, a 10 × 10 covariance matrix containing integer values was used in order to obtain an algebraic solution for the eigendecomposition. The matrix was generated using fixed unequal arm lengths and stationary noises with equal variances for each noise type. Results confirm that all four Sagnac observables can be generated from the eigenvectors of the principal components. The observables obtained from this method, however, are tied to the length of the data and are not general expressions like the traditional observables; for example, the Sagnac observables for two different time stamps were generated from different sets of eigenvectors. It was also possible to generate the frequency domain optimal AET observables from the principal components obtained from the power spectral density matrix. These results indicate that this method is another way of producing the observables; therefore analysis using principal components should give the same results as that using the traditional observables. This was proven by the fact that the same relative likelihoods (within 0.3%) were obtained from the Bayesian estimates of the signal amplitude of a simple sinusoidal gravitational wave using the principal components and the optimal AET observables. This method fails if the eigenvalues that are free from laser frequency noises are not generated. These are obtained from the covariance matrix, and the properties of LISA that are required for its computation are the phase-locking, arm lengths and noise variances.
Preliminary results of the effects of these properties on the principal components indicate that only the absence of phase-locking prevented their production. The flexing of the antenna results in time-varying arm lengths which will appear in the covariance matrix and, from our toy model investigations, this did not prevent the occurrence of the principal components. The difficulty with flexing, and also non-stationary noises, is that the Toeplitz structure of the matrix will be destroyed, which will affect any computation methods that take advantage of this structure. In terms of separating the two sets of data for the analysis, this was not necessary because the laser frequency noises are very large compared to the photodetector noises, which resulted in a significant reduction in the data containing them after the matrix inversion. In the frequency domain the power spectral density matrices were block diagonal, which simplified the computation of the eigenvalues by allowing it to be done separately for each block. The results in general showed a lack of principal components in the absence of phase-locking except for the zero bin. The major difference with the power spectral density matrix is that the time-varying arm lengths and non-stationarity do not show up because of the summation in the Fourier transform.
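The separation of eigenvalues described in this abstract can be seen in a deliberately tiny toy model, sketched below in Python. It is not the LISA model and not the 10 × 10 matrix used in the thesis: it assumes just two readings that share one large "laser" noise and each carry an independent small "photodetector" noise, with no time delays. The variances and variable names are assumptions chosen for illustration.

```python
import numpy as np

sigma_p2 = 1.0e6   # variance of the shared laser frequency noise (assumed)
sigma_n2 = 1.0     # variance of each independent photodetector noise (assumed)

# Covariance of the two readings d1 = p + n1 and d2 = p + n2.
cov = sigma_p2 * np.ones((2, 2)) + sigma_n2 * np.eye(2)

# Eigendecomposition; eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)
small_vec = eigvecs[:, 0]   # eigenvector of the small eigenvalue

# Simulate the readings and project them onto that eigenvector.
rng = np.random.default_rng(2)
p = rng.normal(scale=np.sqrt(sigma_p2), size=10000)
n = rng.normal(scale=np.sqrt(sigma_n2), size=(10000, 2))
data = p[:, None] + n
projected = data @ small_vec                          # combination with laser noise removed
laser_only = np.column_stack([p, p]) @ small_vec      # laser contribution to that combination
```

In this toy case the small eigenvalue equals the photodetector variance and its eigenvector is proportional to (1, -1), so the projection is simply a scaled difference of the two readings in which the shared noise cancels.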
Abstract:
The coastal ocean is a complex environment with extremely dynamic processes that require a high-resolution and cross-scale modeling approach in which all hydrodynamic fields and scales are considered integral parts of the overall system. In the last decade, unstructured-grid models have been used to advance seamless modeling between scales. On the other hand, the data assimilation methodologies to improve the unstructured-grid models in the coastal seas have been developed only recently and need significant advancements. Here, we link unstructured-grid ocean modeling to variational data assimilation methods. In particular, we show results from the modeling system SANIFS, based on the SHYFEM fully baroclinic unstructured-grid model, interfaced with OceanVar, a state-of-the-art variational data assimilation scheme adopted for several systems based on a structured grid. OceanVar implements a 3DVar DA scheme. The combination of three linear operators models the background error covariance matrix. The vertical part is represented using multivariate EOFs for temperature, salinity, and sea level anomaly. The horizontal part is assumed to be Gaussian isotropic and is modeled using a first-order recursive filter algorithm designed for structured and regular grids. Here we introduce a novel recursive filter algorithm for unstructured grids. A local hydrostatic adjustment scheme models the rapidly evolving part of the background error covariance. We designed two data assimilation experiments using the SANIFS implementation interfaced with OceanVar over the period 2017-2018, one with only temperature and salinity assimilation from Argo profiles and the second also including sea level anomaly. The results showed a successful implementation of the approach and the added value of the assimilation for the active tracer fields. Over the broad basin, no significant improvements were found for the sea level, which requires future investigation.
Furthermore, a Machine Learning methodology based on an LSTM network has been used to predict the model SST increments.
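The first-order recursive filter mentioned for the horizontal covariances can be sketched on a one-dimensional regular grid as below. This is only the textbook structured-grid idea (a forward and a backward smoothing sweep, repeated a few times so that the response to a point impulse becomes roughly bell-shaped); it is not the OceanVar code and not the novel unstructured-grid filter of the abstract. The function name `recursive_filter_1d`, the coefficient `alpha`, the number of passes and the simple boundary treatment are all assumptions.

```python
import numpy as np

def recursive_filter_1d(field, alpha, n_passes=4):
    """Apply forward and backward first-order recursive sweeps n_passes times.

    field: 1-D array on a regular grid; alpha in (0, 1) sets the smoothing strength.
    The end points are left unchanged at the start of each sweep (simplest boundary choice).
    """
    out = np.asarray(field, dtype=float).copy()
    n = out.size
    for _ in range(n_passes):
        # Forward sweep: each point blends with its already-filtered left neighbour.
        for i in range(1, n):
            out[i] = (1.0 - alpha) * out[i] + alpha * out[i - 1]
        # Backward sweep: same blend with the right neighbour, which restores symmetry.
        for i in range(n - 2, -1, -1):
            out[i] = (1.0 - alpha) * out[i] + alpha * out[i + 1]
    return out

# Response to a unit impulse in the middle of the grid.
impulse = np.zeros(201)
impulse[100] = 1.0
response = recursive_filter_1d(impulse, alpha=0.5, n_passes=4)
```

Plotting `response` shows a smooth, symmetric bump centred on the impulse; larger `alpha` or more passes widen it, which is how such filters are tuned to a target correlation length.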
Abstract:
The P3(00) event-related potential (ERP) component is widely used as a measure of cognitive functioning and provides a sensitive electrophysiological index of the attentional and working memory demands of a task. This study investigated what proportion of the variance in the amplitude and latency of the P3, elicited in a delayed response working memory task, could be attributed to genetic factors. In 335 adolescent twin pairs and 48 siblings, the amplitude and latency of the P3 were examined at frontal, central, and parietal sites. Additive genetic factors accounted for 48% to 61% of the variance in P3 amplitude. Approximately one-third of the genetic variation at frontal sites was mediated by a common genetic factor that also influenced the genetic variation at parietal and central sites. Familial resemblance in P3 latency was due to genetic influence that accounted for 44% to 50% of the variance. Genetic covariance in P3 latency across sites was substantial, with a large part of the variance found at parietal, central, and frontal sites attributed to a common genetic factor. The findings provide further evidence that the P3 is a promising phenotype of neural activity of the brain and has the potential to be used in linkage and association analysis in the search for quantitative trait loci (QTLs) influencing cognition.