242 results for Correlation matrix
in Queensland University of Technology - ePrints Archive
Abstract:
The generation of a correlation matrix from a large set of long gene sequences is a common requirement in many bioinformatics problems such as phylogenetic analysis. The generation is not only computationally intensive but also requires significant memory resources as, typically, few gene sequences can be simultaneously stored in primary memory. The standard practice in such computation is to use frequent input/output (I/O) operations. Therefore, minimizing the number of these operations will yield much faster run-times. This paper develops an approach for the faster and scalable computing of large-size correlation matrices through the full use of available memory and a reduced number of I/O operations. The approach is scalable in the sense that the same algorithms can be executed on different computing platforms with different amounts of memory and can be applied to different problems with different correlation matrix sizes. The significant performance improvement of the approach over the existing approaches is demonstrated through benchmark examples.
Abstract:
The generation of a correlation matrix for a set of genomic sequences is a common requirement in many bioinformatics problems such as phylogenetic analysis. Each sequence may be millions of bases long and there may be thousands of such sequences which we wish to compare, so not all sequences may fit into main memory at the same time. Each sequence needs to be compared with every other sequence, so we will generally need to page some sequences in and out more than once. In order to minimize execution time we need to minimize this I/O. This paper develops an approach for faster and scalable computing of large-size correlation matrices through the maximal exploitation of available memory and a reduction in the number of I/O operations. The approach is scalable in the sense that the same algorithms can be executed on different computing platforms with different amounts of memory and can be applied to different bioinformatics problems with different correlation matrix sizes. The significant performance improvement of the approach over previous work is demonstrated through benchmark examples.
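As a rough illustration of why using more of the available memory reduces I/O, the sketch below fills a pairwise similarity matrix block by block and counts how many sequence loads are needed. It is not the algorithm from the paper: the function names (`blockwise_matrix`, `match_fraction`), the even split of memory between a resident block and a streamed block, and the toy similarity score are all assumptions made for this example.

```python
def match_fraction(x, y):
    """Toy similarity: fraction of positions at which two equal-length strings agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def blockwise_matrix(load, n, capacity, score):
    """Fill a symmetric n x n matrix while holding at most `capacity` items in memory.

    `load(i)` stands in for reading sequence i from disk (hypothetical helper).
    Returns the matrix and the number of loads performed.
    """
    if capacity < 2:
        raise ValueError("need room for at least two sequences")
    half = capacity // 2  # half for the resident block, half for the streamed block
    blocks = [range(s, min(s + half, n)) for s in range(0, n, half)]
    matrix = [[None] * n for _ in range(n)]
    loads = 0
    for b, resident_ids in enumerate(blocks):
        resident = {i: load(i) for i in resident_ids}
        loads += len(resident)
        # Pairs inside the resident block, including the diagonal.
        for i in resident_ids:
            for j in resident_ids:
                if j >= i:
                    matrix[i][j] = matrix[j][i] = score(resident[i], resident[j])
        # Stream each later block past the resident block.
        for other_ids in blocks[b + 1:]:
            streamed = {j: load(j) for j in other_ids}
            loads += len(streamed)
            for i in resident_ids:
                for j in other_ids:
                    matrix[i][j] = matrix[j][i] = score(resident[i], streamed[j])
    return matrix, loads


sequences = ["ACGT", "ACGA", "TCGA", "TTTT", "ACGT", "GGGT"]  # toy data
small, loads_small = blockwise_matrix(sequences.__getitem__, 6, 2, match_fraction)
large, loads_large = blockwise_matrix(sequences.__getitem__, 6, 4, match_fraction)
print(loads_small, loads_large)
```

With these toy inputs, room for two sequences costs 21 loads while room for four costs 12, and both runs produce the same matrix.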
Abstract:
Objective: To discuss generalized estimating equations as an extension of generalized linear models by commenting on the paper of Ziegler and Vens "Generalized Estimating Equations. Notes on the Choice of the Working Correlation Matrix". Methods: Inviting an international group of experts to comment on this paper. Results: Several perspectives have been taken by the discussants. Econometricians have established parallels to the generalized method of moments (GMM). Statisticians discussed model assumptions and the aspect of missing data. Applied statisticians commented on practical aspects in data analysis. Conclusions: In general, careful modeling of correlation is encouraged when considering estimation efficiency and other implications, and a comparison of choosing instruments in GMM and generalized estimating equations (GEE) would be worthwhile. Some theoretical drawbacks of GEE need to be further addressed and require careful analysis of data. This particularly applies to the situation when data are missing at random.
Abstract:
Genetic correlation (rg) analysis determines how much of the correlation between two measures is due to common genetic influences. In an analysis of 4 Tesla diffusion tensor images (DTI) from 531 healthy young adult twins and their siblings, we generalized the concept of genetic correlation to determine common genetic influences on white matter integrity, measured by fractional anisotropy (FA), at all points of the brain, yielding an NxN genetic correlation matrix rg(x,y) between FA values at all pairs of voxels in the brain. With hierarchical clustering, we identified brain regions with relatively homogeneous genetic determinants, to boost the power to identify causal single nucleotide polymorphisms (SNP). We applied genome-wide association (GWA) to assess associations between 529,497 SNPs and FA in clusters defined by hubs of the clustered genetic correlation matrix. We identified a network of genes, with a scale-free topology, that influences white matter integrity over multiple brain regions.
Abstract:
The method of generalised estimating equations for regression modelling of clustered outcomes allows for specification of a working matrix that is intended to approximate the true correlation matrix of the observations. We investigate the asymptotic relative efficiency of the generalised estimating equation for the mean parameters when the correlation parameters are estimated by various methods. The asymptotic relative efficiency depends on three features of the analysis, namely (i) the discrepancy between the working correlation structure and the unobservable true correlation structure, (ii) the method by which the correlation parameters are estimated and (iii) the 'design', by which we refer to both the structures of the predictor matrices within clusters and the distribution of cluster sizes. Analytical and numerical studies of realistic data-analysis scenarios show that the choice of working covariance model has a substantial impact on regression estimator efficiency. Protection against avoidable loss of efficiency associated with covariance misspecification is obtained when a 'Gaussian estimation' pseudolikelihood procedure is used with an AR(1) structure.
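For readers unfamiliar with working correlation structures, the sketch below builds two common ones for a cluster of size n: AR(1), where the correlation between observations j and k is rho raised to |j - k|, and exchangeable, where every pair shares the same correlation. The function names are illustrative only, and the sketch does not reproduce the estimation procedures studied in the paper.

```python
def ar1_corr(n, rho):
    """AR(1) working correlation: entry (j, k) equals rho ** |j - k|."""
    return [[rho ** abs(j - k) for k in range(n)] for j in range(n)]


def exchangeable_corr(n, rho):
    """Exchangeable working correlation: 1 on the diagonal, rho elsewhere."""
    return [[1.0 if j == k else rho for k in range(n)] for j in range(n)]


for row in ar1_corr(3, 0.5):
    print(row)
```

Under AR(1) the correlation decays with the distance between observations in a cluster, whereas the exchangeable structure treats all pairs alike; a mismatch between either structure and the true correlation is the kind of discrepancy referred to in point (i).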
Abstract:
This paper investigates the business cycle co-movement across countries and regions since 1950 as a measure for quantifying the economic interdependence in the ongoing globalisation process. Our methodological approach is based on analysis of a correlation matrix and the networks it contains. Such an approach summarises the interaction and interdependence of all elements, and it represents a more accurate measure of the global interdependence involved in an economic system. Our results show (1) the dynamics of interdependence has been driven more by synchronisation in regional growth patterns than by the synchronisation of the world economy, and (2) world crisis periods dramatically increase the global co-movement in the world economy.
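One simple way to read a network out of a correlation matrix is to link two series whenever their correlation reaches a threshold. The abstract does not say how the paper extracts its networks, so the thresholding rule, the function names and the made-up growth figures below are assumptions for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_network(series, threshold):
    """Return edges (u, v, r) for every pair whose correlation r >= threshold."""
    names = sorted(series)
    edges = []
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            r = pearson(series[u], series[v])
            if r >= threshold:
                edges.append((u, v, r))
    return edges


growth = {  # made-up growth rates for four hypothetical economies
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [4.0, 3.0, 2.0, 1.0],
    "D": [1.0, 3.0, 2.0, 4.0],
}
edges = correlation_network(growth, threshold=0.75)
for u, v, r in edges:
    print(u, v, round(r, 2))
```

Here A, B and D end up linked to one another, while C, which moves against the others, stays isolated. Raising or lowering the threshold changes how dense the resulting network is.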
Abstract:
Lean strategies have been developed to eliminate or reduce waste and thus improve operational efficiency in a manufacturing environment. However, in practice, manufacturers have difficulty selecting appropriate lean strategies within their resource constraints and quantitatively evaluating the perceived value of manufacturing waste reduction. This paper presents a methodology developed to quantitatively evaluate the contribution of lean strategies selected to reduce manufacturing wastes within the manufacturers’ resource (time) constraints. A mathematical model has been developed for evaluating the perceived value of lean strategies to manufacturing waste reduction and a step-by-step methodology is provided for selecting appropriate lean strategies to improve the manufacturing performance within their resource constraints. A computer program is developed in MATLAB for finding the optimum solution. With the help of a case study, the proposed methodology and developed model have been validated. A ‘lean strategy-wastes’ correlation matrix has been proposed to establish the relationship between the manufacturing wastes and lean strategies. Using the correlation matrix and applying the proposed methodology and developed mathematical model, the authors obtained the optimised perceived value of reducing a manufacturer's wastes by implementing appropriate lean strategies within the manufacturer's resource constraints. Results also demonstrate that the perceived value of reducing manufacturing wastes can change significantly depending on the policies and product strategy adopted by a manufacturer. The proposed methodology can also be used in dynamic situations by changing the input to the program developed in MATLAB. By identifying appropriate lean strategies for specific manufacturing wastes, a manufacturer can better prioritise implementation efforts and resources to maximise the success of implementing lean strategies in their organisation.
Abstract:
This research aimed to develop a framework for performance evaluation of public hospitals in Vietnam that is culturally, socially, and politically appropriate. The research included both qualitative and quantitative methods and identified and validated novel instruments to measure patient satisfaction and job satisfaction of hospital staff and to determine a set of hospital indicators that reflect the quality of hospital performance. New models for understanding the determinants of patient and staff satisfaction were developed along with a new performance indicator framework for hospital performance. These instruments will now be applied to the evaluation of hospital services in Khanh Hoa Province, permitting longer term evaluation of their effectiveness in changing system wide performance and satisfaction.
Abstract:
This article analyses co-movements in a wide group of commodity prices during the time period 1992–2010. Our methodological approach is based on the correlation matrix and the networks inside. Through this approach we are able to summarize global interaction and interdependence, capturing the existing heterogeneity in the degrees of synchronization between commodity prices. Our results produce two main findings: (a) we do not observe a persistent increase in the degree of co-movement of the commodity prices in our time sample; however, from mid-2008 to the end of 2009 co-movements almost doubled when compared with the average correlation; (b) we observe three groups of commodities which have exhibited similar price dynamics (metals, oil and grains, and oilseeds) and which have increased their degree of co-movement during the sampled period.
Abstract:
To enhance the efficiency of regression parameter estimation by modeling the correlation structure of correlated binary error terms in quantile regression with repeated measurements, we propose a Gaussian pseudolikelihood approach for estimating correlation parameters and selecting the most appropriate working correlation matrix simultaneously. The induced smoothing method is applied to estimate the covariance of the regression parameter estimates, which can bypass density estimation of the errors. Extensive numerical studies indicate that the proposed method performs well in selecting an accurate correlation structure and improving regression parameter estimation efficiency. The proposed method is further illustrated by analyzing a dental dataset.
Abstract:
The importance of modelling correlation has long been recognised in the field of portfolio management, with large-dimensional multivariate problems increasingly becoming the focus of research. This paper provides a straightforward and commonsense approach toward investigating a number of models used to generate forecasts of the correlation matrix for large-dimensional problems. We find evidence in favour of assuming equicorrelation across various portfolio sizes, particularly during times of crisis. During periods of market calm, however, the suitability of the constant conditional correlation model cannot be discounted, especially for large portfolios. A portfolio allocation problem is used to compare forecasting methods. The global minimum variance portfolio and Model Confidence Set are used to compare methods, while portfolio weight stability and relative economic value are also considered.
Abstract:
In this paper we analyse two variants of the SIMON family of light-weight block ciphers against variants of linear cryptanalysis and present the best linear cryptanalytic results on these variants of reduced-round SIMON to date. We propose a time-memory trade-off method that finds differential/linear trails for any permutation allowing low Hamming weight differential/linear trails. Our method combines low Hamming weight trails found by the correlation matrix representing the target permutation with heavy Hamming weight trails found using a Mixed Integer Programming model representing the target differential/linear trail. Our method enables us to find a 17-round linear approximation for SIMON-48 which is the best current linear approximation for SIMON-48. Using only the correlation matrix method, we are able to find a 14-round linear approximation for SIMON-32 which is also the current best linear approximation for SIMON-32. The presented linear approximations allow us to mount a 23-round key recovery attack on SIMON-32 and a 24-round key recovery attack on SIMON-48/96 which are the current best results on SIMON-32 and SIMON-48. In addition we have an attack on 24 rounds of SIMON-32 with marginal complexity.
Abstract:
We consider the analysis of longitudinal data when the covariance function is modeled by additional parameters to the mean parameters. In general, inconsistent estimators of the covariance (variance/correlation) parameters will be produced when the "working" correlation matrix is misspecified, which may result in great loss of efficiency of the mean parameter estimators (although consistency is preserved). We consider using different "working" correlation models for the variance and the mean parameters. In particular, we find that an independence working model should be used for estimating the variance parameters to ensure their consistency in case the correlation structure is misspecified. The designated "working" correlation matrices should be used for estimating the mean and the correlation parameters to attain high efficiency for estimating the mean parameters. Simulation studies indicate that the proposed algorithm performs very well. We also applied different estimation procedures to a data set from a clinical trial for illustration.
Abstract:
The main objective of this PhD was to further develop Bayesian spatio-temporal models (specifically the Conditional Autoregressive (CAR) class of models), for the analysis of sparse disease outcomes such as birth defects. The motivation for the thesis arose from problems encountered when analyzing a large birth defect registry in New South Wales. The specific components and related research objectives of the thesis were developed from gaps in the literature on current formulations of the CAR model, and health service planning requirements. Data from a large probabilistically-linked database from 1990 to 2004, consisting of fields from two separate registries: the Birth Defect Registry (BDR) and Midwives Data Collection (MDC) were used in the analyses in this thesis. The main objective was split into smaller goals. The first goal was to determine how the specification of the neighbourhood weight matrix will affect the smoothing properties of the CAR model, and this is the focus of chapter 6. Secondly, I hoped to evaluate the usefulness of incorporating a zero-inflated Poisson (ZIP) component as well as a shared-component model in terms of modeling a sparse outcome, and this is carried out in chapter 7. The third goal was to identify optimal sampling and sample size schemes designed to select individual level data for a hybrid ecological spatial model, and this is done in chapter 8. Finally, I wanted to put together the earlier improvements to the CAR model, and along with demographic projections, provide forecasts for birth defects at the SLA level. Chapter 9 describes how this is done. For the first objective, I examined a series of neighbourhood weight matrices, and showed how smoothing the relative risk estimates according to similarity by an important covariate (i.e. maternal age) helped improve the model’s ability to recover the underlying risk, as compared to the traditional adjacency (specifically the Queen) method of applying weights. 
Next, to address the sparseness and excess zeros commonly encountered in the analysis of rare outcomes such as birth defects, I compared a few models, including an extension of the usual Poisson model to encompass excess zeros in the data. This was achieved via a mixture model, which also encompassed the shared component model to improve on the estimation of sparse counts through borrowing strength across a shared component (e.g. latent risk factor/s) with the referent outcome (caesarean section was used in this example). Using the Deviance Information Criteria (DIC), I showed how the proposed model performed better than the usual models, but only when both outcomes shared a strong spatial correlation. The next objective involved identifying the optimal sampling and sample size strategy for incorporating individual-level data with areal covariates in a hybrid study design. I performed extensive simulation studies, evaluating thirteen different sampling schemes along with variations in sample size. This was done in the context of an ecological regression model that incorporated spatial correlation in the outcomes, as well as accommodating both individual and areal measures of covariates. Using the Average Mean Squared Error (AMSE), I showed how a simple random sample of 20% of the SLAs, followed by selecting all cases in the SLAs chosen, along with an equal number of controls, provided the lowest AMSE. The final objective involved combining the improved spatio-temporal CAR model with population (i.e. women) forecasts, to provide 30-year annual estimates of birth defects at the Statistical Local Area (SLA) level in New South Wales, Australia. The projections were illustrated using sixteen different SLAs, representing the various areal measures of socio-economic status and remoteness. A sensitivity analysis of the assumptions used in the projection was also undertaken. 
By the end of the thesis, I will show how challenges in the spatial analysis of rare diseases such as birth defects can be addressed, by specifically formulating the neighbourhood weight matrix to smooth according to a key covariate (i.e. maternal age), incorporating a ZIP component to model excess zeros in outcomes and borrowing strength from a referent outcome (i.e. caesarean counts). An efficient strategy to sample individual-level data and sample size considerations for rare disease will also be presented. Finally, projections in birth defect categories at the SLA level will be made.
Abstract:
Previous investigations have shown that the modal strain energy correlation (MSEC) method can successfully identify damage in truss bridge structures. However, it has to incorporate the sensitivity matrix to estimate damage and is not reliable in certain damage detection cases. This paper presents an improved MSEC method in which the prediction of the modal strain energy change vector is obtained differently, by running the eigensolutions on-line in the optimisation iterations. The particular trial damage treatment group that maximises the fitness function, bringing it close to unity, is identified as the detected damage location. This improvement is then compared with the original MSEC method along with other typical correlation-based methods on the finite element model of a simple truss bridge. The contribution of each considered mode to damage detection accuracy is also weighed and discussed. The iterative search is carried out using a genetic algorithm. The results demonstrate that the improved MSEC method meets the demands of detecting damage in truss bridge structures, even when noisy measurements are considered.