175 resultados para Covariance matrix
em Queensland University of Technology - ePrints Archive
Resumo:
Objective To explore the characteristics of regional distribution of cancer deaths in Shandong Province with the principle components analysis. Methods The principle components analysis with co-variance matrix for age-adjusted mortality rates and percentages of 20 types of cancer in 22 counties (cities) were carried out using SAS Software. Results Over 90% of the total information could be reflected by the top 3 principle components and the first principle component alone represented more than half of the overall regional variances. The first component mainly reflected the area differences of esophageal cancer. The second component mainly reflected the area differences of lung cancer, stomach cancer and liver cancer. The value of the first principal component scores showed a clear trend that the west areas possessed higher values and the east the lower values. Based on the top two components,the 22 counties (cities) could be divided into several geographical clusters. Conclusion The overall difference of regional distribution of cancers in Shandong is dominated by several major cancers including esophageal cancer, lung cancer, stomach cancer and liver cancer. Among them,esophageal cancer makes the largest contribution. If the range of counties (cities) analyzed could be further widened, the characteristics of regional distribution of cancer mortality would be better examined. Abstract in Chinese 目的 利用主成分分析探讨山东省恶性肿瘤死亡的地区分布特征. 方法 利用SAS软件对山东省22个县市区2004~2006午的20种恶性肿瘤标化死亡率和构成比分别进行协方差矩阵主成分分析. 结果 前3个主成分就反映了总体差异90%以上的信息,其中仅第1主成分就提供了总体差异一半以上的信息.第1主成分主要反映了食管癌的地区差异,第2主成分主要反映肺癌的地区差异,兼顾胃癌和肝癌.各地区第1主成分得分呈现西高东低的趋势,根据第1和第2主成分可以将调查地区分为若干类别,表现为明显的地理聚集性. 结论 山东省各地区恶性肿瘤死亡的总体差异主要取决于少数高发肿瘤,包括食管癌、肺癌、胃癌、肝癌等,其中以食管癌地位最为突出.如能进一步扩大分析范围,可更好地查明恶性肿瘤死亡的地区特征.
Resumo:
A 'pseudo-Bayesian' interpretation of standard errors yields a natural induced smoothing of statistical estimating functions. When applied to rank estimation, the lack of smoothness which prevents standard error estimation is remedied. Efficiency and robustness are preserved, while the smoothed estimation has excellent computational properties. In particular, convergence of the iterative equation for standard error is fast, and standard error calculation becomes asymptotically a one-step procedure. This property also extends to covariance matrix calculation for rank estimates in multi-parameter problems. Examples, and some simple explanations, are given.
Resumo:
Matrix function approximation is a current focus of worldwide interest and finds application in a variety of areas of applied mathematics and statistics. In this thesis we focus on the approximation of A^(-α/2)b, where A ∈ ℝ^(n×n) is a large, sparse symmetric positive definite matrix and b ∈ ℝ^n is a vector. In particular, we will focus on matrix function techniques for sampling from Gaussian Markov random fields in applied statistics and the solution of fractional-in-space partial differential equations. Gaussian Markov random fields (GMRFs) are multivariate normal random variables characterised by a sparse precision (inverse covariance) matrix. GMRFs are popular models in computational spatial statistics as the sparse structure can be exploited, typically through the use of the sparse Cholesky decomposition, to construct fast sampling methods. It is well known, however, that for sufficiently large problems, iterative methods for solving linear systems outperform direct methods. Fractional-in-space partial differential equations arise in models of processes undergoing anomalous diffusion. Unfortunately, as the fractional Laplacian is a non-local operator, numerical methods based on the direct discretisation of these equations typically requires the solution of dense linear systems, which is impractical for fine discretisations. In this thesis, novel applications of Krylov subspace approximations to matrix functions for both of these problems are investigated. Matrix functions arise when sampling from a GMRF by noting that the Cholesky decomposition A = LL^T is, essentially, a `square root' of the precision matrix A. Therefore, we can replace the usual sampling method, which forms x = L^(-T)z, with x = A^(-1/2)z, where z is a vector of independent and identically distributed standard normal random variables. Similarly, the matrix transfer technique can be used to build solutions to the fractional Poisson equation of the form ϕn = A^(-α/2)b, where A is the finite difference approximation to the Laplacian. Hence both applications require the approximation of f(A)b, where f(t) = t^(-α/2) and A is sparse. In this thesis we will compare the Lanczos approximation, the shift-and-invert Lanczos approximation, the extended Krylov subspace method, rational approximations and the restarted Lanczos approximation for approximating matrix functions of this form. A number of new and novel results are presented in this thesis. Firstly, we prove the convergence of the matrix transfer technique for the solution of the fractional Poisson equation and we give conditions by which the finite difference discretisation can be replaced by other methods for discretising the Laplacian. We then investigate a number of methods for approximating matrix functions of the form A^(-α/2)b and investigate stopping criteria for these methods. In particular, we derive a new method for restarting the Lanczos approximation to f(A)b. We then apply these techniques to the problem of sampling from a GMRF and construct a full suite of methods for sampling conditioned on linear constraints and approximating the likelihood. Finally, we consider the problem of sampling from a generalised Matern random field, which combines our techniques for solving fractional-in-space partial differential equations with our method for sampling from GMRFs.
Resumo:
The population Monte Carlo algorithm is an iterative importance sampling scheme for solving static problems. We examine the population Monte Carlo algorithm in a simplified setting, a single step of the general algorithm, and study a fundamental problem that occurs in applying importance sampling to high-dimensional problem. The precision of the computed estimate from the simplified setting is measured by the asymptotic variance of estimate under conditions on the importance function. We demonstrate the exponential growth of the asymptotic variance with the dimension and show that the optimal covariance matrix for the importance function can be estimated in special cases.
Resumo:
This thesis addresses computational challenges arising from Bayesian analysis of complex real-world problems. Many of the models and algorithms designed for such analysis are ‘hybrid’ in nature, in that they are a composition of components for which their individual properties may be easily described but the performance of the model or algorithm as a whole is less well understood. The aim of this research project is to after a better understanding of the performance of hybrid models and algorithms. The goal of this thesis is to analyse the computational aspects of hybrid models and hybrid algorithms in the Bayesian context. The first objective of the research focuses on computational aspects of hybrid models, notably a continuous finite mixture of t-distributions. In the mixture model, an inference of interest is the number of components, as this may relate to both the quality of model fit to data and the computational workload. The analysis of t-mixtures using Markov chain Monte Carlo (MCMC) is described and the model is compared to the Normal case based on the goodness of fit. Through simulation studies, it is demonstrated that the t-mixture model can be more flexible and more parsimonious in terms of number of components, particularly for skewed and heavytailed data. The study also reveals important computational issues associated with the use of t-mixtures, which have not been adequately considered in the literature. The second objective of the research focuses on computational aspects of hybrid algorithms for Bayesian analysis. Two approaches will be considered: a formal comparison of the performance of a range of hybrid algorithms and a theoretical investigation of the performance of one of these algorithms in high dimensions. For the first approach, the delayed rejection algorithm, the pinball sampler, the Metropolis adjusted Langevin algorithm, and the hybrid version of the population Monte Carlo (PMC) algorithm are selected as a set of examples of hybrid algorithms. Statistical literature shows how statistical efficiency is often the only criteria for an efficient algorithm. In this thesis the algorithms are also considered and compared from a more practical perspective. This extends to the study of how individual algorithms contribute to the overall efficiency of hybrid algorithms, and highlights weaknesses that may be introduced by the combination process of these components in a single algorithm. The second approach to considering computational aspects of hybrid algorithms involves an investigation of the performance of the PMC in high dimensions. It is well known that as a model becomes more complex, computation may become increasingly difficult in real time. In particular the importance sampling based algorithms, including the PMC, are known to be unstable in high dimensions. This thesis examines the PMC algorithm in a simplified setting, a single step of the general sampling, and explores a fundamental problem that occurs in applying importance sampling to a high-dimensional problem. The precision of the computed estimate from the simplified setting is measured by the asymptotic variance of the estimate under conditions on the importance function. Additionally, the exponential growth of the asymptotic variance with the dimension is demonstrated and we illustrates that the optimal covariance matrix for the importance function can be estimated in a special case.
Resumo:
Real‐time kinematic (RTK) GPS techniques have been extensively developed for applications including surveying, structural monitoring, and machine automation. Limitations of the existing RTK techniques that hinder their applications for geodynamics purposes are twofold: (1) the achievable RTK accuracy is on the level of a few centimeters and the uncertainty of vertical component is 1.5–2 times worse than those of horizontal components and (2) the RTK position uncertainty grows in proportional to the base‐torover distances. The key limiting factor behind the problems is the significant effect of residual tropospheric errors on the positioning solutions, especially on the highly correlated height component. This paper develops the geometry‐specified troposphere decorrelation strategy to achieve the subcentimeter kinematic positioning accuracy in all three components. The key is to set up a relative zenith tropospheric delay (RZTD) parameter to absorb the residual tropospheric effects and to solve the established model as an ill‐posed problem using the regularization method. In order to compute a reasonable regularization parameter to obtain an optimal regularized solution, the covariance matrix of positional parameters estimated without the RZTD parameter, which is characterized by observation geometry, is used to replace the quadratic matrix of their “true” values. As a result, the regularization parameter is adaptively computed with variation of observation geometry. The experiment results show that new method can efficiently alleviate the model’s ill condition and stabilize the solution from a single data epoch. Compared to the results from the conventional least squares method, the new method can improve the longrange RTK solution precision from several centimeters to the subcentimeter in all components. More significantly, the precision of the height component is even higher. Several geosciences applications that require subcentimeter real‐time solutions can largely benefit from the proposed approach, such as monitoring of earthquakes and large dams in real‐time, high‐precision GPS leveling and refinement of the vertical datum. In addition, the high‐resolution RZTD solutions can contribute to effective recovery of tropospheric slant path delays in order to establish a 4‐D troposphere tomography.
Resumo:
This paper presents a general, global approach to the problem of robot exploration, utilizing a topological data structure to guide an underlying Simultaneous Localization and Mapping (SLAM) process. A Gap Navigation Tree (GNT) is used to motivate global target selection and occluded regions of the environment (called “gaps”) are tracked probabilistically. The process of map construction and the motion of the vehicle alters both the shape and location of these regions. The use of online mapping is shown to reduce the difficulties in implementing the GNT.
Resumo:
We consider quantile regression models and investigate the induced smoothing method for obtaining the covariance matrix of the regression parameter estimates. We show that the difference between the smoothed and unsmoothed estimating functions in quantile regression is negligible. The detailed and simple computational algorithms for calculating the asymptotic covariance are provided. Intensive simulation studies indicate that the proposed method performs very well. We also illustrate the algorithm by analyzing the rainfall–runoff data from Murray Upland, Australia.
Resumo:
In the context of ambiguity resolution (AR) of Global Navigation Satellite Systems (GNSS), decorrelation among entries of an ambiguity vector, integer ambiguity search and ambiguity validations are three standard procedures for solving integer least-squares problems. This paper contributes to AR issues from three aspects. Firstly, the orthogonality defect is introduced as a new measure of the performance of ambiguity decorrelation methods, and compared with the decorrelation number and with the condition number which are currently used as the judging criterion to measure the correlation of ambiguity variance-covariance matrix. Numerically, the orthogonality defect demonstrates slightly better performance as a measure of the correlation between decorrelation impact and computational efficiency than the condition number measure. Secondly, the paper examines the relationship of the decorrelation number, the condition number, the orthogonality defect and the size of the ambiguity search space with the ambiguity search candidates and search nodes. The size of the ambiguity search space can be properly estimated if the ambiguity matrix is decorrelated well, which is shown to be a significant parameter in the ambiguity search progress. Thirdly, a new ambiguity resolution scheme is proposed to improve ambiguity search efficiency through the control of the size of the ambiguity search space. The new AR scheme combines the LAMBDA search and validation procedures together, which results in a much smaller size of the search space and higher computational efficiency while retaining the same AR validation outcomes. In fact, the new scheme can deal with the case there are only one candidate, while the existing search methods require at least two candidates. If there are more than one candidate, the new scheme turns to the usual ratio-test procedure. Experimental results indicate that this combined method can indeed improve ambiguity search efficiency for both the single constellation and dual constellations respectively, showing the potential for processing high dimension integer parameters in multi-GNSS environment.
Resumo:
Computer Experiments, consisting of a number of runs of a computer model with different inputs, are now common-place in scientific research. Using a simple fire model for illustration some guidelines are given for the size of a computer experiment. A graph is provided relating the error of prediction to the sample size which should be of use when designing computer experiments. Methods for augmenting computer experiments with extra runs are also described and illustrated. The simplest method involves adding one point at a time choosing that point with the maximum prediction variance. Another method that appears to work well is to choose points from a candidate set with maximum determinant of the variance covariance matrix of predictions.
Resumo:
Spectroscopic studies of complex clinical fluids have led to the application of a more holistic approach to their chemical analysis becoming more popular and widely employed. The efficient and effective interpretation of multidimensional spectroscopic data relies on many chemometric techniques and one such group of tools is represented by so-called correlation analysis methods. Typical of these techniques are two-dimensional correlation analysis and statistical total correlation spectroscopy (STOCSY). Whilst the former has largely been applied to optical spectroscopic analysis, STOCSY was developed and has been applied almost exclusively to NMR metabonomic studies. Using a 1H NMR study of human blood plasma, from subjects recovering from exhaustive exercise trials, the basic concepts and applications of these techniques are examined. Typical information from their application to NMR-based metabonomics is presented and their value in aiding interpretation of NMR data obtained from biological systems is illustrated. Major energy metabolites are identified in the NMR spectra and the dynamics of their appearance and removal from plasma during exercise recovery are illustrated and discussed. The complementary nature of two-dimensional correlation analysis and statistical total correlation spectroscopy are highlighted.
Resumo:
This paper investigates how best to forecast optimal portfolio weights in the context of a volatility timing strategy. It measures the economic value of a number of methods for forming optimal portfolios on the basis of realized volatility. These include the traditional econometric approach of forming portfolios from forecasts of the covariance matrix, and a novel method, where a time series of optimal portfolio weights are constructed from observed realized volatility and directly forecast. The approach proposed here of directly forecasting portfolio weights shows a great deal of merit. Resulting portfolios are of equivalent economic benefit to a number of competing approaches and are more stable across time. These findings have obvious implications for the manner in which volatility timing is undertaken in a portfolio allocation context.
Resumo:
Techniques for evaluating and selecting multivariate volatility forecasts are not yet understood as well as their univariate counterparts. This paper considers the ability of different loss functions to discriminate between a set of competing forecasting models which are subsequently applied in a portfolio allocation context. It is found that a likelihood-based loss function outperforms its competitors, including those based on the given portfolio application. This result indicates that considering the particular application of forecasts is not necessarily the most effective basis on which to select models.
Resumo:
To enhance the efficiency of regression parameter estimation by modeling the correlation structure of correlated binary error terms in quantile regression with repeated measurements, we propose a Gaussian pseudolikelihood approach for estimating correlation parameters and selecting the most appropriate working correlation matrix simultaneously. The induced smoothing method is applied to estimate the covariance of the regression parameter estimates, which can bypass density estimation of the errors. Extensive numerical studies indicate that the proposed method performs well in selecting an accurate correlation structure and improving regression parameter estimation efficiency. The proposed method is further illustrated by analyzing a dental dataset.
Resumo:
Some statistical procedures already available in literature are employed in developing the water quality index, WQI. The nature of complexity and interdependency that occur in physical and chemical processes of water could be easier explained if statistical approaches were applied to water quality indexing. The most popular statistical method used in developing WQI is the principal component analysis (PCA). In literature, the WQI development based on the classical PCA mostly used water quality data that have been transformed and normalized. Outliers may be considered in or eliminated from the analysis. However, the classical mean and sample covariance matrix used in classical PCA methodology is not reliable if the outliers exist in the data. Since the presence of outliers may affect the computation of the principal component, robust principal component analysis, RPCA should be used. Focusing in Langat River, the RPCA-WQI was introduced for the first time in this study to re-calculate the DOE-WQI. Results show that the RPCA-WQI is capable to capture similar distribution in the existing DOE-WQI.