96 results for Covariance.
at Queensland University of Technology - ePrints Archive
Abstract:
Objective: To explore the characteristics of the regional distribution of cancer deaths in Shandong Province using principal components analysis. Methods: Principal components analysis with the covariance matrix was carried out in SAS on the age-adjusted mortality rates and percentages of 20 types of cancer in 22 counties (cities) during 2004-2006. Results: The top 3 principal components reflected over 90% of the total information, and the first principal component alone represented more than half of the overall regional variance. The first component mainly reflected area differences in esophageal cancer; the second mainly reflected area differences in lung cancer, stomach cancer and liver cancer. The first principal component scores showed a clear trend: western areas had higher values and eastern areas lower values. Based on the top two components, the 22 counties (cities) could be divided into several geographical clusters. Conclusion: The overall difference in the regional distribution of cancers in Shandong is dominated by several major cancers, including esophageal, lung, stomach and liver cancer; among them, esophageal cancer makes the largest contribution. If the range of counties (cities) analyzed could be further widened, the characteristics of the regional distribution of cancer mortality could be better examined.
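A minimal sketch of covariance-matrix PCA as described, with NumPy standing in for the SAS workflow (the rates matrix is simulated, since the study's data are not shown):

import numpy as np

# Hypothetical input: rows = 22 counties (cities), columns = 20 cancer types,
# entries = age-adjusted mortality rates (simulated placeholder data).
rng = np.random.default_rng(0)
rates = rng.gamma(shape=2.0, scale=10.0, size=(22, 20))

# PCA on the covariance matrix (no scaling to unit variance), matching the
# paper's use of the covariance rather than the correlation matrix.
centered = rates - rates.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
order = np.argsort(eigvals)[::-1]           # sort components by variance explained
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("share of variance in top 3 components:", eigvals[:3].sum() / eigvals.sum())
scores = centered @ eigvecs[:, :3]          # county scores; the paper maps the first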
Abstract:
In this paper we introduce a novel domain-invariant covariance normalization (DICN) technique to relocate both in-domain and out-domain i-vectors into a third, dataset-invariant space, providing an improvement for out-domain PLDA speaker verification with a very small number of unlabelled in-domain adaptation i-vectors. By capturing the dataset variance from a global mean using both development out-domain i-vectors and limited unlabelled in-domain i-vectors, we can obtain domain-invariant representations of PLDA training data. The DICN-compensated out-domain PLDA system is shown to perform as well as in-domain PLDA training with as few as 500 unlabelled in-domain i-vectors for NIST-2010 SRE and 2000 unlabelled in-domain i-vectors for NIST-2008 SRE, and to deliver considerable relative improvement over both out-domain and in-domain PLDA development if more are available.
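The paper's exact algebra is not reproduced here; as a hedged sketch of the idea of relocating both domains into a shared space, the following uses a nuisance-attribute-style projection of the dataset-shift directions, standing in for the paper's covariance normalization (all names are ours):

import numpy as np

# Hedged sketch in the spirit of DICN, not the paper's actual normalization:
# estimate each domain mean's shift from the global mean and project those
# directions out, so out-domain and in-domain i-vectors share one space.
def dataset_invariant_map(out_dev, in_adapt):
    """out_dev: (N, d) out-domain i-vectors; in_adapt: (M, d) unlabelled in-domain."""
    g = np.vstack([out_dev, in_adapt]).mean(axis=0)
    shifts = np.stack([out_dev.mean(axis=0) - g, in_adapt.mean(axis=0) - g])
    q, _ = np.linalg.qr(shifts.T)          # basis of the dataset-shift subspace
    proj = np.eye(len(g)) - q @ q.T        # projection removing that subspace
    return lambda x: (x - g) @ proj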
Abstract:
Spatial data analysis has become increasingly important in ecology and economics over the last decade. One focus of spatial data analysis is how to select predictors, variance functions and correlation functions. In general, however, the true covariance function is unknown and the working covariance structure is often misspecified. In this paper, our aim is to find a good strategy for identifying the best model from a candidate set using model selection criteria. We evaluate the ability of several information criteria (the corrected Akaike information criterion, the Bayesian information criterion (BIC) and the residual information criterion (RIC)) to choose the optimal model when the working correlation function, the working variance function and the working mean function are correct or misspecified. Simulations are carried out for small to moderate sample sizes. Four candidate covariance functions (exponential, Gaussian, Matérn and rational quadratic) are used in the simulation studies. Summarizing the simulation results, we find that a misspecified working correlation structure can still capture some spatial correlation information in model fitting. When the sample size is large enough, BIC and RIC perform well even if the working covariance is misspecified. Moreover, the performance of these information criteria is related to the average level of model fit, as indicated by the average adjusted R-squared, and overall RIC performs well.
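A hedged, two-candidate illustration of this kind of selection (simulated data with an exponential truth, compared by BIC; the paper's full design with AICc, RIC and mean/variance misspecification is much richer):

import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

# Simulate a Gaussian spatial field with an exponential covariance, then fit
# two candidate covariance functions by maximum likelihood and compare BIC.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(60, 2))
dist = cdist(coords, coords)
true_cov = np.exp(-dist / 2.0)                       # exponential, range 2
y = np.linalg.cholesky(true_cov + 1e-6 * np.eye(60)) @ rng.standard_normal(60)

def neg_loglik(params, kernel):
    sigma2, rho = np.exp(params)                     # log-parameterised for positivity
    K = sigma2 * kernel(dist, rho) + 1e-6 * np.eye(len(y))
    try:
        L = np.linalg.cholesky(K)
    except np.linalg.LinAlgError:                    # guard the optimizer's wilder steps
        return 1e10
    alpha = np.linalg.solve(L, y)
    return 0.5 * alpha @ alpha + np.log(np.diag(L)).sum()

kernels = {"exponential": lambda h, r: np.exp(-h / r),
           "gaussian":    lambda h, r: np.exp(-(h / r) ** 2)}
for name, kern in kernels.items():
    fit = minimize(neg_loglik, x0=[0.0, 0.0], args=(kern,), method="Nelder-Mead")
    print(name, "BIC:", round(2 * fit.fun + 2 * np.log(len(y)), 1))  # 2 parameters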
Abstract:
We investigate methods for data-based selection of working covariance models in the analysis of correlated data with generalized estimating equations. We study two selection criteria: Gaussian pseudolikelihood and a geodesic distance based on discrepancy between model-sensitive and model-robust regression parameter covariance estimators. The Gaussian pseudolikelihood is found in simulation to be reasonably sensitive for several response distributions and noncanonical mean-variance relations for longitudinal data. Application is also made to a clinical dataset. Assessment of adequacy of both correlation and variance models for longitudinal data should be routine in applications, and we describe open-source software supporting this practice.
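As a hedged sketch (the function name and clustered interface are ours, and additive constants are dropped), the Gaussian pseudolikelihood criterion can be computed from fitted residuals and working covariances like this:

import numpy as np

# Gaussian pseudolikelihood for a working covariance model: the Gaussian
# log-density of each cluster's residual vector under its fitted working
# covariance, summed over clusters (larger is better; constants dropped).
def gaussian_pseudolikelihood(residuals_by_cluster, working_covs):
    total = 0.0
    for r, V in zip(residuals_by_cluster, working_covs):
        _, logdet = np.linalg.slogdet(V)
        total -= 0.5 * (logdet + r @ np.linalg.solve(V, r))
    return total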
Abstract:
A 'pseudo-Bayesian' interpretation of standard errors yields a natural induced smoothing of statistical estimating functions. When applied to rank estimation, this remedies the lack of smoothness that prevents standard error estimation. Efficiency and robustness are preserved, while the smoothed estimation has excellent computational properties. In particular, convergence of the iterative equation for the standard error is fast, and standard error calculation becomes asymptotically a one-step procedure. This property also extends to covariance matrix calculation for rank estimates in multi-parameter problems. Examples, and some simple explanations, are given.
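In symbols (one common formulation of induced smoothing; the notation is ours, not necessarily the paper's), a nonsmooth estimating function U(\beta) is replaced by

    \tilde{U}(\beta) = \mathbb{E}_Z\left[ U\left(\beta + \Omega^{1/2} Z\right) \right], \qquad Z \sim N(0, I_p),

where \Omega, of order 1/n, plays the role of the covariance of the estimator. \tilde{U} is smooth, so its Jacobian exists and standard errors can be obtained by iterating a sandwich-type formula, the iteration the abstract describes as fast and asymptotically one-step.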
Abstract:
Despite the best intentions of service providers and organisations, service delivery is rarely error-free. While numerous studies have investigated specific cognitive, emotional or behavioural responses to service failure and recovery, these studies do not fully capture the complexity of the service encounter. Consequently, this research develops a more holistic understanding of how specific service recovery strategies affect customer responses by combining two existing models, Smith and Bolton's (2002) model of emotional responses to service performance and Fullerton and Punj's (1993) structural model of aberrant consumer behaviour, into a single conceptual framework. Specific service recovery strategies are proposed to influence consumer cognition, emotion and behaviour. The research used a 2x2 between-subjects quasi-experimental design administered via written survey, manipulating two levels of each of two service recovery strategies: compensation and apology. The effects of the four recovery strategies were investigated by collecting data from 18-25 year olds and were analysed using multivariate analysis of covariance and multiple regression analysis. The results suggest that different service recovery strategies are associated with varying levels of satisfaction, perceived distributive justice, positive emotions, negative emotions and negative functional behaviour, but not dysfunctional behaviour. These findings have significant implications for the theory and practice of managing service recovery.
Abstract:
Purpose: Waiting for service is an important problem for many financial services marketers. Two new approaches are proposed. First, customer evaluation of the service is increased with an ambient scent. Second, a cognitive variable is identified which differentiates customers by the way they value time, so that they can be segmented. Methodology: Pretests, including focus groups which highlighted financial services, and a pilot test were followed by a main sample of 607 subjects. Structural equation modelling and multivariate analysis of covariance were used for analysis. Findings: A cognitive variable, the need for time management, can be used together with demographic and customer net worth data to segment a customer base. Two environmental interventions, music and scent, can increase satisfaction among customers kept waiting in a line. Research implications: Two original approaches to a rapidly growing services marketing problem are identified. Practical implications: Service contact points can reduce the incidence of "queue rage" and enhance customer satisfaction by either or both of two simple modifications to the service environment, or by a preventive strategy of offering targeted customers an alternative. Originality: A new method of segmentation and a new environmental intervention are proposed.
Abstract:
1. Ecological data sets often use clustered measurements or repeated sampling in a longitudinal design. Choosing the correct covariance structure is an important step in the analysis of such data, as the covariance describes the degree of similarity among the repeated observations. 2. Three methods for choosing the covariance are the Akaike information criterion (AIC), the quasi-information criterion (QIC), and the deviance information criterion (DIC). We compared the methods using a simulation study and a data set that explored effects of forest fragmentation on avian species richness over 15 years. 3. The overall success rate was 80.6% for the AIC, 29.4% for the QIC and 81.6% for the DIC. For the forest fragmentation study, the AIC and DIC selected the unstructured covariance, whereas the QIC selected the simpler autoregressive covariance. Graphical diagnostics suggested that the unstructured covariance was probably correct. 4. We recommend using the DIC for selecting the correct covariance structure.
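A hedged sketch of one of these criteria, QIC, computed for competing working covariance structures with statsmodels GEE (the qic() method is available in recent statsmodels releases; a DIC comparison would require a Bayesian fit and is omitted):

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.genmod.cov_struct import Autoregressive, Exchangeable, Independence

# Simulated clustered data: 50 clusters of 6 repeated measures with a shared
# cluster effect, so an exchangeable working covariance is roughly correct.
rng = np.random.default_rng(2)
n_groups, n_obs = 50, 6
time = np.tile(np.arange(n_obs), n_groups)
group = np.repeat(np.arange(n_groups), n_obs)
y = 1.0 + 0.3 * time + np.repeat(rng.normal(0, 1, n_groups), n_obs) \
    + rng.normal(0, 1, n_groups * n_obs)
data = pd.DataFrame({"y": y, "time": time, "group": group})

for cov in (Independence(), Exchangeable(), Autoregressive()):
    result = sm.GEE.from_formula("y ~ time", groups="group", data=data,
                                 time=data["time"], cov_struct=cov).fit()
    qic, qicu = result.qic()                 # smaller QIC is preferred
    print(type(cov).__name__, "QIC:", round(qic, 1))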
Abstract:
Matrix function approximation is a current focus of worldwide interest and finds application in a variety of areas of applied mathematics and statistics. In this thesis we focus on the approximation of A^(-α/2)b, where A ∈ ℝ^(n×n) is a large, sparse symmetric positive definite matrix and b ∈ ℝ^n is a vector. In particular, we focus on matrix function techniques for sampling from Gaussian Markov random fields in applied statistics and for the solution of fractional-in-space partial differential equations. Gaussian Markov random fields (GMRFs) are multivariate normal random variables characterised by a sparse precision (inverse covariance) matrix. GMRFs are popular models in computational spatial statistics, as the sparse structure can be exploited, typically through the sparse Cholesky decomposition, to construct fast sampling methods. It is well known, however, that for sufficiently large problems, iterative methods for solving linear systems outperform direct methods. Fractional-in-space partial differential equations arise in models of processes undergoing anomalous diffusion. Unfortunately, as the fractional Laplacian is a non-local operator, numerical methods based on the direct discretisation of these equations typically require the solution of dense linear systems, which is impractical for fine discretisations. In this thesis, novel applications of Krylov subspace approximations to matrix functions are investigated for both of these problems. Matrix functions arise when sampling from a GMRF by noting that the Cholesky decomposition A = LL^T is, essentially, a 'square root' of the precision matrix A. Therefore, we can replace the usual sampling method, which forms x = L^(-T)z, with x = A^(-1/2)z, where z is a vector of independent and identically distributed standard normal random variables. Similarly, the matrix transfer technique can be used to build solutions to the fractional Poisson equation of the form ϕ_n = A^(-α/2)b, where A is the finite difference approximation to the Laplacian. Hence both applications require the approximation of f(A)b, where f(t) = t^(-α/2) and A is sparse. In this thesis we compare the Lanczos approximation, the shift-and-invert Lanczos approximation, the extended Krylov subspace method, rational approximations and the restarted Lanczos approximation for approximating matrix functions of this form. A number of novel results are presented. Firstly, we prove the convergence of the matrix transfer technique for the solution of the fractional Poisson equation and give conditions under which the finite difference discretisation can be replaced by other methods for discretising the Laplacian. We then investigate a number of methods for approximating matrix functions of the form A^(-α/2)b and investigate stopping criteria for these methods. In particular, we derive a new method for restarting the Lanczos approximation to f(A)b. We then apply these techniques to the problem of sampling from a GMRF and construct a full suite of methods for sampling conditioned on linear constraints and for approximating the likelihood. Finally, we consider the problem of sampling from a generalised Matérn random field, which combines our techniques for solving fractional-in-space partial differential equations with our method for sampling from GMRFs.
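A hedged sketch of the plain Lanczos approximation to f(A)b with f(t) = t^(-1/2), the baseline among the methods compared above (the shift-and-invert, extended Krylov, rational and restarted variants are omitted; the precision matrix is an illustrative 1-D second-difference operator, not the thesis's test problems):

import numpy as np
import scipy.sparse as sp

# Lanczos approximation to f(A)b for f(t) = t**(-1/2): build an orthonormal
# Krylov basis Q_m and tridiagonal T_m, then use f(A)b ~ ||b|| Q_m f(T_m) e_1.
def lanczos_inv_sqrt(A, b, m=50):
    n = len(b)
    Q = np.zeros((n, m + 1))
    alpha, beta = np.zeros(m), np.zeros(m + 1)
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        w = A @ Q[:, j] - (beta[j] * Q[:, j - 1] if j > 0 else 0.0)
        alpha[j] = w @ Q[:, j]
        w -= alpha[j] * Q[:, j]
        beta[j + 1] = np.linalg.norm(w)
        Q[:, j + 1] = w / beta[j + 1]
    T = np.diag(alpha) + np.diag(beta[1:m], 1) + np.diag(beta[1:m], -1)
    evals, evecs = np.linalg.eigh(T)                 # T is small: m x m
    return np.linalg.norm(b) * Q[:, :m] @ (evecs @ (evals ** -0.5 * evecs[0]))

# Illustrative sparse SPD precision matrix (second difference plus a ridge);
# x is then an approximate draw from the GMRF with precision A.
n = 500
A = sp.diags([-1.0, 2.01, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
z = np.random.default_rng(3).standard_normal(n)
x = lanczos_inv_sqrt(A, z)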
Abstract:
The population Monte Carlo algorithm is an iterative importance sampling scheme for solving static problems. We examine the population Monte Carlo algorithm in a simplified setting, a single step of the general algorithm, and study a fundamental problem that occurs in applying importance sampling to high-dimensional problems. The precision of the computed estimate from the simplified setting is measured by the asymptotic variance of the estimate under conditions on the importance function. We demonstrate the exponential growth of the asymptotic variance with the dimension and show that the optimal covariance matrix for the importance function can be estimated in special cases.
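A hedged numerical illustration of the phenomenon studied here (a single importance sampling step with a Gaussian target and a slightly over-dispersed Gaussian proposal; the collapsing effective sample size reflects the growth of the weight variance with dimension):

import numpy as np
from scipy.special import logsumexp

# Target N(0, I_d), proposal N(0, sigma^2 I_d) with sigma = 1.2.  The
# effective sample size (sum w)^2 / sum w^2 shrinks rapidly as d grows.
rng = np.random.default_rng(4)
sigma, n = 1.2, 100_000
for d in (1, 5, 10, 20, 40):
    x = sigma * rng.standard_normal((n, d))
    sq = (x ** 2).sum(axis=1)
    # log importance weight: log N(x; 0, I) - log N(x; 0, sigma^2 I)
    logw = -0.5 * sq + 0.5 * sq / sigma ** 2 + d * np.log(sigma)
    ess = np.exp(2 * logsumexp(logw) - logsumexp(2 * logw))
    print(f"d = {d:2d}: effective sample size = {ess:9.1f} of {n}")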
Abstract:
Purpose: To investigate associations between the diurnal variation in a range of corneal parameters, including anterior and posterior corneal topography and regional corneal thickness. ----- Methods: Fifteen subjects had their corneas measured using a rotating Scheimpflug camera (Pentacam) every 3-7 hours over a 24-hour period. Anterior and posterior corneal axial curvature, pachymetry and anterior chamber depth were analysed. The best-fitting corneal sphero-cylinder from the axial curvature and the average corneal thickness for a series of corneal regions were calculated. Intraocular pressure and axial length were also measured at each session. Repeated measures ANOVA was used to investigate diurnal change in these parameters, and analysis of covariance was used to examine associations between the measured ocular parameters. ----- Results: Significant diurnal variation was found in both the anterior and posterior corneal curvature and in the regional corneal thickness. Flattening of the anterior corneal best sphere was observed at the early morning measurement (p < 0.0001). The posterior cornea also underwent a significant steepening (p < 0.0001) and a change in astigmatism 90/180° at this time. Significant swelling of the cornea (p < 0.0001) was also found immediately after waking. Highly significant associations were found between the diurnal variation in corneal thickness and the changes in corneal curvature. ----- Conclusions: Significant diurnal variation occurs in the regional thickness and the shape of the anterior and posterior cornea. The largest changes were typically evident upon waking, when the observed non-uniform regional thickness changes resulted in a steepening of the posterior cornea and a flattening of the anterior cornea.
Abstract:
An experimental investigation has been made of a round, non-buoyant plume of nitric oxide, NO, in a turbulent grid flow of ozone, O3, using the Turbulent Smog Chamber at the University of Sydney. The measurements have been made at a resolution not previously reported in the literature. The reaction is conducted at non-equilibrium, so there is significant interaction between turbulent mixing and chemical reaction. The plume has been characterized by a set of constant initial reactant concentration measurements consisting of radial profiles at various axial locations. Whole-plume behaviour can thus be characterized, and parameters are selected for a second set of fixed physical location measurements in which the effects of varying the initial reactant concentrations are investigated. Careful experiment design and specially developed chemiluminescent analysers, which measure fluctuating concentrations of reactive scalars, ensure that spatial and temporal resolutions are adequate to measure the quantities of interest. Conserved scalar theory is used to define a conserved scalar from the measured reactive scalars and to define frozen, equilibrium and reaction-dominated cases for the reactive scalars. Reactive scalar means and the mean reaction rate are bounded by the frozen and equilibrium limits, but this is not always the case for the reactant variances and covariances. The plume reactant statistics are closer to the equilibrium limit than those for the ambient reactant. The covariance term in the mean reaction rate is found to be negative and significant for all measurements made. The Toor closure was found to overestimate the mean reaction rate by 15 to 65%. Gradient model turbulent diffusivities had significant scatter and were not observed to be affected by reaction. The ratio of turbulent diffusivities for the conserved scalar mean and that for the r.m.s. was found to be approximately 1. Estimates of the ratio of the dissipation timescales of around 2 were found downstream. Estimates of the correlation coefficient between the conserved scalar and its dissipation (parallel to the mean flow) were found to be between 0.25 and the significant value of 0.5. Scalar dissipations for non-reactive and reactive scalars were found to be significantly different. Conditional statistics are found to be a useful way of investigating the reactive behaviour of the plume, effectively decoupling the interaction of chemical reaction and turbulent mixing. It is found that conditional reactive scalar means lack significant transverse dependence, as has previously been found theoretically by Klimenko (1995). It is also found that the conditional variance around the conditional reactive scalar means is relatively small, simplifying the closure for the conditional reaction rate. These properties are important for the Conditional Moment Closure (CMC) model for turbulent reacting flows recently proposed by Klimenko (1990) and Bilger (1993). Preliminary CMC model calculations are carried out for this flow using a simple model for the conditional scalar dissipation. Model predictions and measured conditional reactive scalar means compare favorably. The reaction-dominated limit is found to indicate the maximum reactedness of a reactive scalar and is a limiting case of the CMC model. Conventional (unconditional) reactive scalar means obtained from the preliminary CMC predictions using the conserved scalar p.d.f. compare favorably with those found from experiment, except where the measuring position is relatively far upstream of the stoichiometric distance. Recommendations include applying a full CMC model to the flow and investigating both the less significant terms in the conditional mean species equation and the small variation of the conditional mean with radius. Forms for the p.d.f.s, in addition to those found from experiments, could be useful for extending the CMC model to reactive flows in the atmosphere.
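For reference, the covariance term discussed above enters the mean rate of a second-order reaction A + B -> products through the standard decomposition (notation ours, not the thesis's):

    \overline{w} = k \left( \overline{c}_A \, \overline{c}_B + \overline{c_A' c_B'} \right),

where k is the rate constant and \overline{c_A' c_B'} is the covariance of the reactant concentration fluctuations. This is the term found to be negative and significant throughout, and the one the Toor closure models from conserved-scalar statistics, here overestimating the mean reaction rate by 15 to 65%.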
Abstract:
This thesis addresses computational challenges arising from Bayesian analysis of complex real-world problems. Many of the models and algorithms designed for such analysis are 'hybrid' in nature, in that they are a composition of components whose individual properties may be easily described, while the performance of the model or algorithm as a whole is less well understood. The aim of this research project is to offer a better understanding of the performance of hybrid models and algorithms. The goal of this thesis is to analyse the computational aspects of hybrid models and hybrid algorithms in the Bayesian context. The first objective of the research focuses on computational aspects of hybrid models, notably a continuous finite mixture of t-distributions. In the mixture model, an inference of interest is the number of components, as this may relate both to the quality of model fit to data and to the computational workload. The analysis of t-mixtures using Markov chain Monte Carlo (MCMC) is described and the model is compared to the Normal case based on goodness of fit. Through simulation studies, it is demonstrated that the t-mixture model can be more flexible and more parsimonious in terms of the number of components, particularly for skewed and heavy-tailed data. The study also reveals important computational issues associated with the use of t-mixtures, which have not been adequately considered in the literature. The second objective of the research focuses on computational aspects of hybrid algorithms for Bayesian analysis. Two approaches are considered: a formal comparison of the performance of a range of hybrid algorithms, and a theoretical investigation of the performance of one of these algorithms in high dimensions. For the first approach, the delayed rejection algorithm, the pinball sampler, the Metropolis-adjusted Langevin algorithm and the hybrid version of the population Monte Carlo (PMC) algorithm are selected as a set of examples of hybrid algorithms. The statistical literature often treats statistical efficiency as the only criterion for an efficient algorithm; in this thesis the algorithms are also considered and compared from a more practical perspective. This extends to the study of how individual components contribute to the overall efficiency of hybrid algorithms, and highlights weaknesses that may be introduced by combining these components in a single algorithm. The second approach to considering computational aspects of hybrid algorithms involves an investigation of the performance of the PMC in high dimensions. It is well known that as a model becomes more complex, computation may become increasingly difficult in real time. In particular, importance sampling based algorithms, including the PMC, are known to be unstable in high dimensions. This thesis examines the PMC algorithm in a simplified setting, a single step of the general algorithm, and explores a fundamental problem that occurs in applying importance sampling to a high-dimensional problem. The precision of the computed estimate from the simplified setting is measured by the asymptotic variance of the estimate under conditions on the importance function. Additionally, the exponential growth of the asymptotic variance with the dimension is demonstrated, and we illustrate that the optimal covariance matrix for the importance function can be estimated in a special case.
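Of the hybrid algorithms named above, the Metropolis-adjusted Langevin algorithm is compact enough to sketch; here is a hedged, self-contained version targeting a standard normal (the target, step size and dimension are illustrative choices, not the thesis's experiments):

import numpy as np

# Metropolis-adjusted Langevin algorithm (MALA): a Langevin drift proposal
# corrected by a Metropolis step using the asymmetric proposal densities.
def mala(logp, grad_logp, x0, step=0.5, n_iter=5000, seed=5):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_iter):
        mean_x = x + 0.5 * step * grad_logp(x)           # drift from current point
        prop = mean_x + np.sqrt(step) * rng.standard_normal(x.shape)
        mean_p = prop + 0.5 * step * grad_logp(prop)     # drift from proposal
        log_alpha = (logp(prop) - logp(x)
                     - ((x - mean_p) ** 2).sum() / (2 * step)
                     + ((prop - mean_x) ** 2).sum() / (2 * step))
        if np.log(rng.uniform()) < log_alpha:
            x = prop
        samples.append(x.copy())
    return np.array(samples)

# Standard normal target in 3 dimensions.
draws = mala(lambda v: -0.5 * (v ** 2).sum(), lambda v: -v, x0=np.zeros(3))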