38 resultados para Principle Component Analysis (PCA)

em Aston University Research Archive


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based upon a probability model. In this paper we demonstrate how the principal axes of a set of observed data vectors may be determined through maximum-likelihood estimation of parameters in a latent variable model closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss the advantages conveyed by the definition of a probability density function for PCA.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based upon a probability model. In this paper we demonstrate how the principal axes of a set of observed data vectors may be determined through maximum-likelihood estimation of parameters in a latent variable model closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss the advantages conveyed by the definition of a probability density function for PCA.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we use the quantum Jensen-Shannon divergence as a means to establish the similarity between a pair of graphs and to develop a novel graph kernel. In quantum theory, the quantum Jensen-Shannon divergence is defined as a distance measure between quantum states. In order to compute the quantum Jensen-Shannon divergence between a pair of graphs, we first need to associate a density operator with each of them. Hence, we decide to simulate the evolution of a continuous-time quantum walk on each graph and we propose a way to associate a suitable quantum state with it. With the density operator of this quantum state to hand, the graph kernel is defined as a function of the quantum Jensen-Shannon divergence between the graph density operators. We evaluate the performance of our kernel on several standard graph datasets from bioinformatics. We use the Principle Component Analysis (PCA) on the kernel matrix to embed the graphs into a feature space for classification. The experimental results demonstrate the effectiveness of the proposed approach. © 2013 Springer-Verlag.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In Statnotes 24 and 25, multiple linear regression, a statistical method that examines the relationship between a single dependent variable (Y) and two or more independent variables (X), was described. The principle objective of such an analysis was to determine which of the X variables had a significant influence on Y and to construct an equation that predicts Y from the X variables. ‘Principal components analysis’ (PCA) and ‘factor analysis’ (FA) are also methods of examining the relationships between different variables but they differ from multiple regression in that no distinction is made between the dependent and independent variables, all variables being essentially treated the same. Originally, PCA and FA were regarded as distinct methods but in recent times they have been combined into a single analysis, PCA often being the first stage of a FA. The basic objective of a PCA/FA is to examine the relationships between the variables or the ‘structure’ of the variables and to determine whether these relationships can be explained by a smaller number of ‘factors’. This statnote describes the use of PCA/FA in the analysis of the differences between the DNA profiles of different MRSA strains introduced in Statnote 26.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Rhizome of cassava plants (Manihot esculenta Crantz) was catalytically pyrolysed at 500 °C using analytical pyrolysis–gas chromatography/mass spectrometry (Py–GC/MS) method in order to investigate the relative effect of various catalysts on pyrolysis products. Selected catalysts expected to affect bio-oil properties were used in this study. These include zeolites and related materials (ZSM-5, Al-MCM-41 and Al-MSU-F type), metal oxides (zinc oxide, zirconium (IV) oxide, cerium (IV) oxide and copper chromite) catalysts, proprietary commercial catalysts (Criterion-534 and alumina-stabilised ceria-MI-575) and natural catalysts (slate, char and ashes derived from char and biomass). The pyrolysis product distributions were monitored using models in principal components analysis (PCA) technique. The results showed that the zeolites, proprietary commercial catalysts, copper chromite and biomass-derived ash were selective to the reduction of most oxygenated lignin derivatives. The use of ZSM-5, Criterion-534 and Al-MSU-F catalysts enhanced the formation of aromatic hydrocarbons and phenols. No single catalyst was found to selectively reduce all carbonyl products. Instead, most of the carbonyl compounds containing hydroxyl group were reduced by zeolite and related materials, proprietary catalysts and copper chromite. The PCA model for carboxylic acids showed that zeolite ZSM-5 and Al-MSU-F tend to produce significant amounts of acetic and formic acids.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Principal component analysis (PCA) is one of the most popular techniques for processing, compressing and visualising data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Previous attempts to formulate mixture models for PCA have therefore to some extent been ad hoc. In this paper, PCA is formulated within a maximum-likelihood framework, based on a specific form of Gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analysers, whose parameters can be determined using an EM algorithm. We discuss the advantages of this model in the context of clustering, density modelling and local dimensionality reduction, and we demonstrate its application to image compression and handwritten digit recognition.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A new principled domain independent watermarking framework is presented. The new approach is based on embedding the message in statistically independent sources of the covertext to mimimise covertext distortion, maximise the information embedding rate and improve the method's robustness against various attacks. Experiments comparing the performance of the new approach, on several standard attacks show the current proposed approach to be competitive with other state of the art domain-specific methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A novel approach to watermarking of audio signals using Independent Component Analysis (ICA) is proposed. It exploits the statistical independence of components obtained by practical ICA algorithms to provide a robust watermarking scheme with high information rate and low distortion. Numerical simulations have been performed on audio signals, showing good robustness of the watermark against common attacks with unnoticeable distortion, even for high information rates. An important aspect of the method is its domain independence: it can be used to hide information in other types of data, with minor technical adaptations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Principal components analysis (PCA) has been described for over 50 years; however, it is rarely applied to the analysis of epidemiological data. In this study PCA was critically appraised in its ability to reveal relationships between pulsed-field gel electrophoresis (PFGE) profiles of methicillin- resistant Staphylococcus aureus (MRSA) in comparison to the more commonly employed cluster analysis and representation by dendrograms. The PFGE type following SmaI chromosomal digest was determined for 44 multidrug-resistant hospital-acquired methicillin-resistant S. aureus (MR-HA-MRSA) isolates, two multidrug-resistant community-acquired MRSA (MR-CA-MRSA), 50 hospital-acquired MRSA (HA-MRSA) isolates (from the University Hospital Birmingham, NHS Trust, UK) and 34 community-acquired MRSA (CA-MRSA) isolates (from general practitioners in Birmingham, UK). Strain relatedness was determined using Dice band-matching with UPGMA clustering and PCA. The results indicated that PCA revealed relationships between MRSA strains, which were more strongly correlated with known epidemiology, most likely because, unlike cluster analysis, PCA does not have the constraint of generating a hierarchic classification. In addition, PCA provides the opportunity for further analysis to identify key polymorphic bands within complex genotypic profiles, which is not always possible with dendrograms. Here we provide a detailed description of a PCA method for the analysis of PFGE profiles to complement further the epidemiological study of infectious disease. © 2005 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Ten cases of neuronal intermediate filament inclusion disease (NIFID) were studied quantitatively. The α-internexin positive neurofilament inclusions (NI) were most abundant in the motor cortex and CA sectors of the hippocampus. The densities of the NI and the swollen achromatic neurons (SN) were similar in laminae II/III and V/VI but glial cell density was greater in V/VI. The density of the NI was positively correlated with the SN and the glial cells. Principal components analysis (PCA) suggested that PC1 was associated with variation in neuronal loss in the frontal/temporal lobes and PC2 with neuronal loss in the frontal lobe and NI density in the parahippocampal gyrus. The data suggest: 1) frontal and temporal lobe degeneration in NIFID is associated with the widespread formation of NI and SN, 2) NI and SN affect cortical laminae II/III and V/VI, 3) the NI and SN affect closely related neuronal populations, and 4) variations in neuronal loss and in the density of NI were the most important sources of pathological heterogeneity. © Springer-Verlag 2005.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

PCA/FA is a method of analyzing complex data sets in which there are no clearly defined X or Y variables. It has multiple uses including the study of the pattern of variation between individual entities such as patients with particular disorders and the detailed study of descriptive variables. In most applications, variables are related to a smaller number of ‘factors’ or PCs that account for the maximum variance in the data and hence, may explain important trends among the variables. An increasingly important application of the method is in the ‘validation’ of questionnaires that attempt to relate subjective aspects of a patients experience with more objective measures of vision.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We analyze a Big Data set of geo-tagged tweets for a year (Oct. 2013–Oct. 2014) to understand the regional linguistic variation in the U.S. Prior work on regional linguistic variations usually took a long time to collect data and focused on either rural or urban areas. Geo-tagged Twitter data offers an unprecedented database with rich linguistic representation of fine spatiotemporal resolution and continuity. From the one-year Twitter corpus, we extract lexical characteristics for twitter users by summarizing the frequencies of a set of lexical alternations that each user has used. We spatially aggregate and smooth each lexical characteristic to derive county-based linguistic variables, from which orthogonal dimensions are extracted using the principal component analysis (PCA). Finally a regionalization method is used to discover hierarchical dialect regions using the PCA components. The regionalization results reveal interesting linguistic regional variations in the U.S. The discovered regions not only confirm past research findings in the literature but also provide new insights and a more detailed understanding of very recent linguistic patterns in the U.S.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

DUE TO COPYRIGHT RESTRICTIONS ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY AND INFORMATION SERVICES WITH PRIOR ARRANGEMENT

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Using a Markov switching unobserved component model we decompose the term premium of the North American CDX index into a permanent and a stationary component. We establish that the inversion of the CDX term premium is induced by sudden changes in the unobserved stationary component, which represents the evolution of the fundamentals underpinning the probability of default in the economy. We find evidence that the monetary policy response from the Fed during the crisis period was effective in reducing the volatility of the term premium. We also show that equity returns make a substantial contribution to the term premium over the entire sample period.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Multidimensional compound optimization is a new paradigm in the drug discovery process, yielding efficiencies during early stages and reducing attrition in the later stages of drug development. The success of this strategy relies heavily on understanding this multidimensional data and extracting useful information from it. This paper demonstrates how principled visualization algorithms can be used to understand and explore a large data set created in the early stages of drug discovery. The experiments presented are performed on a real-world data set comprising biological activity data and some whole-molecular physicochemical properties. Data visualization is a popular way of presenting complex data in a simpler form. We have applied powerful principled visualization methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), to help the domain experts (screening scientists, chemists, biologists, etc.) understand and draw meaningful decisions. We also benchmark these principled methods against relatively better known visualization approaches, principal component analysis (PCA), Sammon's mapping, and self-organizing maps (SOMs), to demonstrate their enhanced power to help the user visualize the large multidimensional data sets one has to deal with during the early stages of the drug discovery process. The results reported clearly show that the GTM and HGTM algorithms allow the user to cluster active compounds for different targets and understand them better than the benchmarks. An interactive software tool supporting these visualization algorithms was provided to the domain experts. The tool facilitates the domain experts by exploration of the projection obtained from the visualization algorithms providing facilities such as parallel coordinate plots, magnification factors, directional curvatures, and integration with industry standard software. © 2006 American Chemical Society.