974 resultados para Covariance matrix
Resumo:
Matrix function approximation is a current focus of worldwide interest and finds application in a variety of areas of applied mathematics and statistics. In this thesis we focus on the approximation of A^(-α/2)b, where A ∈ ℝ^(n×n) is a large, sparse symmetric positive definite matrix and b ∈ ℝ^n is a vector. In particular, we will focus on matrix function techniques for sampling from Gaussian Markov random fields in applied statistics and the solution of fractional-in-space partial differential equations. Gaussian Markov random fields (GMRFs) are multivariate normal random variables characterised by a sparse precision (inverse covariance) matrix. GMRFs are popular models in computational spatial statistics as the sparse structure can be exploited, typically through the use of the sparse Cholesky decomposition, to construct fast sampling methods. It is well known, however, that for sufficiently large problems, iterative methods for solving linear systems outperform direct methods. Fractional-in-space partial differential equations arise in models of processes undergoing anomalous diffusion. Unfortunately, as the fractional Laplacian is a non-local operator, numerical methods based on the direct discretisation of these equations typically requires the solution of dense linear systems, which is impractical for fine discretisations. In this thesis, novel applications of Krylov subspace approximations to matrix functions for both of these problems are investigated. Matrix functions arise when sampling from a GMRF by noting that the Cholesky decomposition A = LL^T is, essentially, a `square root' of the precision matrix A. Therefore, we can replace the usual sampling method, which forms x = L^(-T)z, with x = A^(-1/2)z, where z is a vector of independent and identically distributed standard normal random variables. Similarly, the matrix transfer technique can be used to build solutions to the fractional Poisson equation of the form ϕn = A^(-α/2)b, where A is the finite difference approximation to the Laplacian. Hence both applications require the approximation of f(A)b, where f(t) = t^(-α/2) and A is sparse. In this thesis we will compare the Lanczos approximation, the shift-and-invert Lanczos approximation, the extended Krylov subspace method, rational approximations and the restarted Lanczos approximation for approximating matrix functions of this form. A number of new and novel results are presented in this thesis. Firstly, we prove the convergence of the matrix transfer technique for the solution of the fractional Poisson equation and we give conditions by which the finite difference discretisation can be replaced by other methods for discretising the Laplacian. We then investigate a number of methods for approximating matrix functions of the form A^(-α/2)b and investigate stopping criteria for these methods. In particular, we derive a new method for restarting the Lanczos approximation to f(A)b. We then apply these techniques to the problem of sampling from a GMRF and construct a full suite of methods for sampling conditioned on linear constraints and approximating the likelihood. Finally, we consider the problem of sampling from a generalised Matern random field, which combines our techniques for solving fractional-in-space partial differential equations with our method for sampling from GMRFs.
Resumo:
The population Monte Carlo algorithm is an iterative importance sampling scheme for solving static problems. We examine the population Monte Carlo algorithm in a simplified setting, a single step of the general algorithm, and study a fundamental problem that occurs in applying importance sampling to high-dimensional problem. The precision of the computed estimate from the simplified setting is measured by the asymptotic variance of estimate under conditions on the importance function. We demonstrate the exponential growth of the asymptotic variance with the dimension and show that the optimal covariance matrix for the importance function can be estimated in special cases.
Resumo:
This thesis addresses computational challenges arising from Bayesian analysis of complex real-world problems. Many of the models and algorithms designed for such analysis are ‘hybrid’ in nature, in that they are a composition of components for which their individual properties may be easily described but the performance of the model or algorithm as a whole is less well understood. The aim of this research project is to after a better understanding of the performance of hybrid models and algorithms. The goal of this thesis is to analyse the computational aspects of hybrid models and hybrid algorithms in the Bayesian context. The first objective of the research focuses on computational aspects of hybrid models, notably a continuous finite mixture of t-distributions. In the mixture model, an inference of interest is the number of components, as this may relate to both the quality of model fit to data and the computational workload. The analysis of t-mixtures using Markov chain Monte Carlo (MCMC) is described and the model is compared to the Normal case based on the goodness of fit. Through simulation studies, it is demonstrated that the t-mixture model can be more flexible and more parsimonious in terms of number of components, particularly for skewed and heavytailed data. The study also reveals important computational issues associated with the use of t-mixtures, which have not been adequately considered in the literature. The second objective of the research focuses on computational aspects of hybrid algorithms for Bayesian analysis. Two approaches will be considered: a formal comparison of the performance of a range of hybrid algorithms and a theoretical investigation of the performance of one of these algorithms in high dimensions. For the first approach, the delayed rejection algorithm, the pinball sampler, the Metropolis adjusted Langevin algorithm, and the hybrid version of the population Monte Carlo (PMC) algorithm are selected as a set of examples of hybrid algorithms. Statistical literature shows how statistical efficiency is often the only criteria for an efficient algorithm. In this thesis the algorithms are also considered and compared from a more practical perspective. This extends to the study of how individual algorithms contribute to the overall efficiency of hybrid algorithms, and highlights weaknesses that may be introduced by the combination process of these components in a single algorithm. The second approach to considering computational aspects of hybrid algorithms involves an investigation of the performance of the PMC in high dimensions. It is well known that as a model becomes more complex, computation may become increasingly difficult in real time. In particular the importance sampling based algorithms, including the PMC, are known to be unstable in high dimensions. This thesis examines the PMC algorithm in a simplified setting, a single step of the general sampling, and explores a fundamental problem that occurs in applying importance sampling to a high-dimensional problem. The precision of the computed estimate from the simplified setting is measured by the asymptotic variance of the estimate under conditions on the importance function. Additionally, the exponential growth of the asymptotic variance with the dimension is demonstrated and we illustrates that the optimal covariance matrix for the importance function can be estimated in a special case.
Resumo:
Real‐time kinematic (RTK) GPS techniques have been extensively developed for applications including surveying, structural monitoring, and machine automation. Limitations of the existing RTK techniques that hinder their applications for geodynamics purposes are twofold: (1) the achievable RTK accuracy is on the level of a few centimeters and the uncertainty of vertical component is 1.5–2 times worse than those of horizontal components and (2) the RTK position uncertainty grows in proportional to the base‐torover distances. The key limiting factor behind the problems is the significant effect of residual tropospheric errors on the positioning solutions, especially on the highly correlated height component. This paper develops the geometry‐specified troposphere decorrelation strategy to achieve the subcentimeter kinematic positioning accuracy in all three components. The key is to set up a relative zenith tropospheric delay (RZTD) parameter to absorb the residual tropospheric effects and to solve the established model as an ill‐posed problem using the regularization method. In order to compute a reasonable regularization parameter to obtain an optimal regularized solution, the covariance matrix of positional parameters estimated without the RZTD parameter, which is characterized by observation geometry, is used to replace the quadratic matrix of their “true” values. As a result, the regularization parameter is adaptively computed with variation of observation geometry. The experiment results show that new method can efficiently alleviate the model’s ill condition and stabilize the solution from a single data epoch. Compared to the results from the conventional least squares method, the new method can improve the longrange RTK solution precision from several centimeters to the subcentimeter in all components. More significantly, the precision of the height component is even higher. Several geosciences applications that require subcentimeter real‐time solutions can largely benefit from the proposed approach, such as monitoring of earthquakes and large dams in real‐time, high‐precision GPS leveling and refinement of the vertical datum. In addition, the high‐resolution RZTD solutions can contribute to effective recovery of tropospheric slant path delays in order to establish a 4‐D troposphere tomography.
Resumo:
This paper presents a general, global approach to the problem of robot exploration, utilizing a topological data structure to guide an underlying Simultaneous Localization and Mapping (SLAM) process. A Gap Navigation Tree (GNT) is used to motivate global target selection and occluded regions of the environment (called “gaps”) are tracked probabilistically. The process of map construction and the motion of the vehicle alters both the shape and location of these regions. The use of online mapping is shown to reduce the difficulties in implementing the GNT.
Resumo:
We consider quantile regression models and investigate the induced smoothing method for obtaining the covariance matrix of the regression parameter estimates. We show that the difference between the smoothed and unsmoothed estimating functions in quantile regression is negligible. The detailed and simple computational algorithms for calculating the asymptotic covariance are provided. Intensive simulation studies indicate that the proposed method performs very well. We also illustrate the algorithm by analyzing the rainfall–runoff data from Murray Upland, Australia.
Resumo:
In the context of ambiguity resolution (AR) of Global Navigation Satellite Systems (GNSS), decorrelation among entries of an ambiguity vector, integer ambiguity search and ambiguity validations are three standard procedures for solving integer least-squares problems. This paper contributes to AR issues from three aspects. Firstly, the orthogonality defect is introduced as a new measure of the performance of ambiguity decorrelation methods, and compared with the decorrelation number and with the condition number which are currently used as the judging criterion to measure the correlation of ambiguity variance-covariance matrix. Numerically, the orthogonality defect demonstrates slightly better performance as a measure of the correlation between decorrelation impact and computational efficiency than the condition number measure. Secondly, the paper examines the relationship of the decorrelation number, the condition number, the orthogonality defect and the size of the ambiguity search space with the ambiguity search candidates and search nodes. The size of the ambiguity search space can be properly estimated if the ambiguity matrix is decorrelated well, which is shown to be a significant parameter in the ambiguity search progress. Thirdly, a new ambiguity resolution scheme is proposed to improve ambiguity search efficiency through the control of the size of the ambiguity search space. The new AR scheme combines the LAMBDA search and validation procedures together, which results in a much smaller size of the search space and higher computational efficiency while retaining the same AR validation outcomes. In fact, the new scheme can deal with the case there are only one candidate, while the existing search methods require at least two candidates. If there are more than one candidate, the new scheme turns to the usual ratio-test procedure. Experimental results indicate that this combined method can indeed improve ambiguity search efficiency for both the single constellation and dual constellations respectively, showing the potential for processing high dimension integer parameters in multi-GNSS environment.
Resumo:
Computer Experiments, consisting of a number of runs of a computer model with different inputs, are now common-place in scientific research. Using a simple fire model for illustration some guidelines are given for the size of a computer experiment. A graph is provided relating the error of prediction to the sample size which should be of use when designing computer experiments. Methods for augmenting computer experiments with extra runs are also described and illustrated. The simplest method involves adding one point at a time choosing that point with the maximum prediction variance. Another method that appears to work well is to choose points from a candidate set with maximum determinant of the variance covariance matrix of predictions.
Resumo:
Spectroscopic studies of complex clinical fluids have led to the application of a more holistic approach to their chemical analysis becoming more popular and widely employed. The efficient and effective interpretation of multidimensional spectroscopic data relies on many chemometric techniques and one such group of tools is represented by so-called correlation analysis methods. Typical of these techniques are two-dimensional correlation analysis and statistical total correlation spectroscopy (STOCSY). Whilst the former has largely been applied to optical spectroscopic analysis, STOCSY was developed and has been applied almost exclusively to NMR metabonomic studies. Using a 1H NMR study of human blood plasma, from subjects recovering from exhaustive exercise trials, the basic concepts and applications of these techniques are examined. Typical information from their application to NMR-based metabonomics is presented and their value in aiding interpretation of NMR data obtained from biological systems is illustrated. Major energy metabolites are identified in the NMR spectra and the dynamics of their appearance and removal from plasma during exercise recovery are illustrated and discussed. The complementary nature of two-dimensional correlation analysis and statistical total correlation spectroscopy are highlighted.
Resumo:
This paper investigates how best to forecast optimal portfolio weights in the context of a volatility timing strategy. It measures the economic value of a number of methods for forming optimal portfolios on the basis of realized volatility. These include the traditional econometric approach of forming portfolios from forecasts of the covariance matrix, and a novel method, where a time series of optimal portfolio weights are constructed from observed realized volatility and directly forecast. The approach proposed here of directly forecasting portfolio weights shows a great deal of merit. Resulting portfolios are of equivalent economic benefit to a number of competing approaches and are more stable across time. These findings have obvious implications for the manner in which volatility timing is undertaken in a portfolio allocation context.
Resumo:
Techniques for evaluating and selecting multivariate volatility forecasts are not yet understood as well as their univariate counterparts. This paper considers the ability of different loss functions to discriminate between a set of competing forecasting models which are subsequently applied in a portfolio allocation context. It is found that a likelihood-based loss function outperforms its competitors, including those based on the given portfolio application. This result indicates that considering the particular application of forecasts is not necessarily the most effective basis on which to select models.
Resumo:
To enhance the efficiency of regression parameter estimation by modeling the correlation structure of correlated binary error terms in quantile regression with repeated measurements, we propose a Gaussian pseudolikelihood approach for estimating correlation parameters and selecting the most appropriate working correlation matrix simultaneously. The induced smoothing method is applied to estimate the covariance of the regression parameter estimates, which can bypass density estimation of the errors. Extensive numerical studies indicate that the proposed method performs well in selecting an accurate correlation structure and improving regression parameter estimation efficiency. The proposed method is further illustrated by analyzing a dental dataset.
Resumo:
Some statistical procedures already available in literature are employed in developing the water quality index, WQI. The nature of complexity and interdependency that occur in physical and chemical processes of water could be easier explained if statistical approaches were applied to water quality indexing. The most popular statistical method used in developing WQI is the principal component analysis (PCA). In literature, the WQI development based on the classical PCA mostly used water quality data that have been transformed and normalized. Outliers may be considered in or eliminated from the analysis. However, the classical mean and sample covariance matrix used in classical PCA methodology is not reliable if the outliers exist in the data. Since the presence of outliers may affect the computation of the principal component, robust principal component analysis, RPCA should be used. Focusing in Langat River, the RPCA-WQI was introduced for the first time in this study to re-calculate the DOE-WQI. Results show that the RPCA-WQI is capable to capture similar distribution in the existing DOE-WQI.
Resumo:
A smoothed rank-based procedure is developed for the accelerated failure time model to overcome computational issues. The proposed estimator is based on an EM-type procedure coupled with the induced smoothing. "The proposed iterative approach converges provided the initial value is based on a consistent estimator, and the limiting covariance matrix can be obtained from a sandwich-type formula. The consistency and asymptotic normality of the proposed estimator are also established. Extensive simulations show that the new estimator is not only computationally less demanding but also more reliable than the other existing estimators.
Resumo:
Rank-based inference is widely used because of its robustness. This article provides optimal rank-based estimating functions in analysis of clustered data with random cluster effects. The extensive simulation studies carried out to evaluate the performance of the proposed method demonstrate that it is robust to outliers and is highly efficient given the existence of strong cluster correlations. The performance of the proposed method is satisfactory even when the correlation structure is misspecified, or when heteroscedasticity in variance is present. Finally, a real dataset is analyzed for illustration.