15 results for two-dimensional principal component analysis (2DPCA)
in Aston University Research Archive
Abstract:
In this paper, we first present a simple but effective L1-norm-based two-dimensional principal component analysis (2DPCA). The traditional L2-norm-based least-squares criterion is sensitive to outliers, whereas the newly proposed L1-norm 2DPCA is robust to them. Experimental results demonstrate its advantages. © 2006 IEEE.
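As a rough sketch of how such an L1-norm criterion is typically optimised (a greedy sign-flipping fixed point in the style of PCA-L1; details assumed for illustration, not taken from the paper), the first 2DPCA projection axis could be computed along these lines:

import numpy as np

def l1_2dpca_first_axis(images, n_iter=100, seed=0):
    """Find one projection axis w that locally maximises
    sum_i ||A_i w||_1 via a sign-flipping fixed-point iteration."""
    mean = images.mean(axis=0)
    rows = (images - mean).reshape(-1, images.shape[2])  # stack image rows
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(rows.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        signs = np.sign(rows @ w)
        signs[signs == 0] = 1.0            # avoid a degenerate update
        w_new = rows.T @ signs
        w_new /= np.linalg.norm(w_new)
        if np.allclose(w_new, w):
            break
        w = w_new
    return w

images = np.random.rand(20, 32, 24)        # 20 synthetic 32x24 "images"
w = l1_2dpca_first_axis(images)
features = images @ w                       # one feature vector per image

Each image A is projected as A @ w to give a feature vector; further axes are usually found greedily after deflating the rows against w.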
Abstract:
Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based upon a probability model. In this paper we demonstrate how the principal axes of a set of observed data vectors may be determined through maximum-likelihood estimation of parameters in a latent variable model closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss the advantages conveyed by the definition of a probability density function for PCA.
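The EM iteration mentioned here has a simple closed form in the standard probabilistic PCA model (x = Wz + μ + ε with isotropic noise); a minimal sketch, with the update equations assumed from the usual formulation rather than quoted from the paper:

import numpy as np

def ppca_em(X, q, n_iter=200, seed=0):
    """EM for probabilistic PCA; returns the ML weight matrix W
    (spanning the principal subspace), the mean and the noise variance."""
    N, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((d, q))
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables z_n.
        M = W.T @ W + sigma2 * np.eye(q)
        Minv = np.linalg.inv(M)
        Ez = Xc @ W @ Minv                      # E[z_n], one row per point
        Ezz = N * sigma2 * Minv + Ez.T @ Ez     # sum_n E[z_n z_n^T]
        # M-step: re-estimate W and the isotropic noise variance.
        W = Xc.T @ Ez @ np.linalg.inv(Ezz)
        sigma2 = (np.sum(Xc**2)
                  - 2.0 * np.sum((Xc @ W) * Ez)
                  + np.trace(Ezz @ W.T @ W)) / (N * d)
    return W, mu, sigma2

At convergence the columns of W span the same subspace as the leading eigenvectors of the sample covariance, which is the sense in which the principal axes are recovered by maximum likelihood.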
Abstract:
Rhizome of cassava plants (Manihot esculenta Crantz) was catalytically pyrolysed at 500 °C using the analytical pyrolysis–gas chromatography/mass spectrometry (Py–GC/MS) method in order to investigate the relative effect of various catalysts on the pyrolysis products. Selected catalysts expected to affect bio-oil properties were used in this study. These included zeolites and related materials (ZSM-5, Al-MCM-41 and Al-MSU-F types), metal oxide catalysts (zinc oxide, zirconium(IV) oxide, cerium(IV) oxide and copper chromite), proprietary commercial catalysts (Criterion-534 and alumina-stabilised ceria-MI-575) and natural catalysts (slate, char, and ashes derived from char and biomass). The pyrolysis product distributions were monitored using principal components analysis (PCA) models. The results showed that the zeolites, the proprietary commercial catalysts, copper chromite and biomass-derived ash selectively reduced most oxygenated lignin derivatives. The use of the ZSM-5, Criterion-534 and Al-MSU-F catalysts enhanced the formation of aromatic hydrocarbons and phenols. No single catalyst was found to reduce all carbonyl products selectively. Instead, most of the carbonyl compounds containing a hydroxyl group were reduced by the zeolites and related materials, the proprietary catalysts and copper chromite. The PCA model for carboxylic acids showed that zeolite ZSM-5 and Al-MSU-F tend to produce significant amounts of acetic and formic acids.
Abstract:
Principal component analysis (PCA) is one of the most popular techniques for processing, compressing and visualising data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Previous attempts to formulate mixture models for PCA have therefore to some extent been ad hoc. In this paper, PCA is formulated within a maximum-likelihood framework, based on a specific form of Gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analysers, whose parameters can be determined using an EM algorithm. We discuss the advantages of this model in the context of clustering, density modelling and local dimensionality reduction, and we demonstrate its application to image compression and handwritten digit recognition.
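Because each component of such a mixture defines a proper Gaussian density N(μ_k, W_k W_kᵀ + σ_k²I), the E-step reduces to ordinary mixture-model responsibilities; a minimal sketch (shapes and names assumed for illustration):

import numpy as np
from scipy.stats import multivariate_normal

def mppca_responsibilities(X, pis, mus, Ws, sigma2s):
    """E-step of a mixture of probabilistic principal component
    analysers: standard responsibilities under the per-component
    Gaussian densities implied by each PPCA model."""
    N, d = X.shape
    K = len(pis)
    log_r = np.empty((N, K))
    for k in range(K):
        C = Ws[k] @ Ws[k].T + sigma2s[k] * np.eye(d)
        log_r[:, k] = np.log(pis[k]) + multivariate_normal.logpdf(X, mus[k], C)
    log_r -= log_r.max(axis=1, keepdims=True)   # numerical stability
    R = np.exp(log_r)
    return R / R.sum(axis=1, keepdims=True)

The M-step then refits each component's mean, weight matrix and noise variance from responsibility-weighted statistics, which per component reduces to the single-model PPCA updates.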
Abstract:
The problem of regression under Gaussian assumptions is treated generally. The relationship between Bayesian prediction, regularization and smoothing is elucidated. The ideal regression is the posterior mean and its computation scales as O(n³), where n is the sample size. We show that the optimal m-dimensional linear model under a given prior is spanned by the first m eigenfunctions of a covariance operator, which is a trace-class operator. This is an infinite-dimensional analogue of principal component analysis. The importance of Hilbert space methods to practical statistics is also discussed.
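In the standard Gaussian-process form (notation assumed here, not quoted from the paper), the posterior mean at a test input x_* is

\bar{f}(x_*) = \mathbf{k}(x_*)^{\top}\,(K + \sigma^2 I)^{-1}\,\mathbf{y},
\qquad K_{ij} = k(x_i, x_j), \quad \big(\mathbf{k}(x_*)\big)_i = k(x_i, x_*),

and solving with the n × n matrix K + σ²I is what gives the O(n³) scaling. By Mercer's theorem, k(x, x') = \sum_{i \ge 1} \lambda_i \varphi_i(x)\varphi_i(x'), and the optimal m-dimensional linear model is the one spanned by \varphi_1, \dots, \varphi_m, the eigenfunctions with the largest eigenvalues.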
Abstract:
This thesis describes the Generative Topographic Mapping (GTM) --- a non-linear latent variable model, intended for modelling continuous, intrinsically low-dimensional probability distributions, embedded in high-dimensional spaces. It can be seen as a non-linear form of principal component analysis or factor analysis. It also provides a principled alternative to the self-organizing map --- a widely established neural network model for unsupervised learning --- resolving many of its associated theoretical problems. An important potential application of the GTM is visualization of high-dimensional data. Since the GTM is non-linear, the relationship between data and its visual representation may be far from trivial, but a better understanding of this relationship can be gained by computing the so-called magnification factor. In essence, the magnification factor relates the distances between data points, as they appear when visualized, to the actual distances between those data points. There are two principal limitations of the basic GTM model. The computational effort required will grow exponentially with the intrinsic dimensionality of the density model. However, if the intended application is visualization, this will typically not be a problem. The other limitation is the inherent structure of the GTM, which makes it most suitable for modelling moderately curved probability distributions of approximately rectangular shape. When the target distribution is very different from that, the aim of maintaining an 'interpretable' structure, suitable for visualizing data, may come into conflict with the aim of providing a good density model. The fact that the GTM is a probabilistic model means that results from probability theory and statistics can be used to address problems such as model complexity. Furthermore, this framework provides solid ground for extending the GTM to wider contexts than that of this thesis.
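In the usual GTM formulation (a standard result, not specific to this thesis), the magnification factor is the local area scaling of the mapping y = f(u) from the latent space to the data space:

\frac{\mathrm{d}A'}{\mathrm{d}A} = \sqrt{\det\!\left(J^{\top} J\right)},
\qquad J_{kj} = \frac{\partial y_k}{\partial u_j},

so regions where this factor is large correspond to points that appear close on the visualization plot but are in fact far apart in the data space.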
Abstract:
Principal components analysis (PCA) has been described for over 50 years; however, it is rarely applied to the analysis of epidemiological data. In this study PCA was critically appraised in its ability to reveal relationships between pulsed-field gel electrophoresis (PFGE) profiles of methicillin-resistant Staphylococcus aureus (MRSA) in comparison to the more commonly employed cluster analysis and representation by dendrograms. The PFGE type following SmaI chromosomal digest was determined for 44 multidrug-resistant hospital-acquired methicillin-resistant S. aureus (MR-HA-MRSA) isolates, two multidrug-resistant community-acquired MRSA (MR-CA-MRSA) isolates, 50 hospital-acquired MRSA (HA-MRSA) isolates (from the University Hospital Birmingham, NHS Trust, UK) and 34 community-acquired MRSA (CA-MRSA) isolates (from general practitioners in Birmingham, UK). Strain relatedness was determined using Dice band-matching with UPGMA clustering and PCA. The results indicated that PCA revealed relationships between MRSA strains which were more strongly correlated with known epidemiology, most likely because, unlike cluster analysis, PCA does not have the constraint of generating a hierarchic classification. In addition, PCA provides the opportunity for further analysis to identify key polymorphic bands within complex genotypic profiles, which is not always possible with dendrograms. Here we provide a detailed description of a PCA method for the analysis of PFGE profiles to complement further the epidemiological study of infectious disease. © 2005 Elsevier B.V. All rights reserved.
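In outline, the PCA operates on the binary band-presence matrix produced by band matching; a minimal sketch with a synthetic stand-in matrix (the 130 isolates match the counts above, but the band data and its dimensionality are hypothetical):

import numpy as np
from sklearn.decomposition import PCA

# Synthetic band-matching matrix: one row per isolate, one column per
# gel band position, 1/0 for band presence/absence after Dice matching.
rng = np.random.default_rng(0)
bands = rng.integers(0, 2, size=(130, 40)).astype(float)

pca = PCA(n_components=2)
scores = pca.fit_transform(bands)      # isolate coordinates on PC1, PC2
loadings = pca.components_             # band weights: the 'key polymorphic
                                       # bands' are those with large loadings
print(pca.explained_variance_ratio_)   # variance explained by PC1 and PC2

Plotting the rows of scores, coloured by known epidemiology (HA- versus CA-MRSA), gives the non-hierarchic strain map described above.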
Abstract:
In Statnotes 24 and 25, multiple linear regression, a statistical method that examines the relationship between a single dependent variable (Y) and two or more independent variables (X), was described. The principal objective of such an analysis was to determine which of the X variables had a significant influence on Y and to construct an equation that predicts Y from the X variables. ‘Principal components analysis’ (PCA) and ‘factor analysis’ (FA) are also methods of examining the relationships between different variables, but they differ from multiple regression in that no distinction is made between the dependent and independent variables, all variables being treated essentially the same. Originally, PCA and FA were regarded as distinct methods, but in recent times they have been combined into a single analysis, PCA often being the first stage of an FA. The basic objective of a PCA/FA is to examine the relationships between the variables, or the ‘structure’ of the variables, and to determine whether these relationships can be explained by a smaller number of ‘factors’. This Statnote describes the use of PCA/FA in the analysis of the differences between the DNA profiles of different MRSA strains introduced in Statnote 26.
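A minimal sketch of PCA as the first stage of an FA (the retention rule shown is the common Kaiser eigenvalue-greater-than-one convention; the data here are a hypothetical stand-in):

import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))          # hypothetical multivariate data

# Stage 1: PCA to decide how many factors to retain.
pca = PCA().fit(X)
n_factors = max(int(np.sum(pca.explained_variance_ > 1.0)), 1)

# Stage 2: factor analysis with that number of factors.
fa = FactorAnalysis(n_components=n_factors).fit(X)
print(fa.components_)                      # loadings of each variable on
                                           # each extracted factor

Variables that load heavily on the same factor are the ones whose interrelationships the smaller set of ‘factors’ is explaining.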
Abstract:
A Principal Components Analysis of neuropathological data from 79 Alzheimer’s disease (AD) cases was performed to determine whether there was evidence for subtypes of the disease. Two principal components were extracted from the data, which accounted for 72% and 12% of the total variance respectively. The results suggested that 1) AD was heterogeneous, but subtypes could not be clearly defined; 2) the heterogeneity, in part, reflected disease onset; 3) familial cases did not constitute a distinct subtype of AD; and 4) there were two forms of late-onset AD, one of which was associated with less senile plaque and neurofibrillary tangle development but with a greater degree of brain atherosclerosis.
Abstract:
A principal components analysis was carried out on neuropathological data collected from 79 cases of Alzheimer's disease (AD) diagnosed in a single centre. The purpose of the study was to determine whether, on neuropathological criteria, there was evidence for clearly defined subtypes of the disease. Two principal components (PC1 and PC2) were extracted from the data. PC1 was considerably more important than PC2, accounting for 72% of the total variance. When plotted in relation to the first two principal components, the majority of cases (65/79) were distributed in a single cluster within which subgroupings were not clearly evident. In addition, there were a number of individual, mainly early-onset cases which were neither related to each other nor to the main cluster. The distribution of each neuropathological feature was examined in relation to PC1 and PC2. Disease onset, the degree of gross brain atrophy, neuronal loss and the development of senile plaques (SP) and neurofibrillary tangles (NFT) were negatively correlated with PC1. The development of SP and NFT and the degree of brain atherosclerosis were positively correlated with PC2. These results suggested: 1) that there were different forms of AD, but no clear division of the cases into subclasses could be made based on the neuropathological criteria used, the cases showing a more continuous distribution from one form to another; 2) that disease onset was an important variable and was associated with a greater development of pathological changes; 3) that familial cases were not a distinct subclass of AD, the cases being widely distributed in relation to PC1 and PC2; and 4) that there may be two forms of late-onset AD which grade into each other, one of which was associated with less SP and NFT development but with a greater degree of brain atherosclerosis.
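As a rough sketch of the computation described (with a synthetic stand-in for the measured neuropathological variables):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.standard_normal((79, 10))   # 79 cases x neuropathological features

pca = PCA(n_components=2).fit(data)
scores = pca.transform(data)           # case coordinates on PC1 and PC2
print(pca.explained_variance_ratio_)   # proportions of total variance,
                                       # e.g. 0.72 and 0.12 in the study

Correlating each original feature with the PC1 and PC2 scores yields the positive and negative associations reported above.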
Abstract:
Exploratory analysis of data seeks to find common patterns in order to gain insights into the structure and distribution of the data. In geochemistry it is a valuable means of gaining insight into the complicated processes making up a petroleum system. Typically, linear visualisation methods such as principal components analysis, linked plots or brushing are used. These methods cannot be employed directly when dealing with missing data, and they struggle to capture global non-linear structures in the data, although they can do so locally. This thesis discusses a complementary approach based on a non-linear probabilistic model. The generative topographic mapping (GTM) enables the visualisation of the effects of very many variables on a single plot, which is able to incorporate more structure than a two-dimensional principal components plot. The model can deal with uncertainty and missing data, and allows for the exploration of the non-linear structure in the data. In this thesis a novel approach to initialise the GTM with arbitrary projections is developed. This makes it possible to combine the GTM with algorithms such as Isomap and to fit complex non-linear structures such as the Swiss roll. Another novel extension is the incorporation of prior knowledge about the structure of the covariance matrix. This extension greatly enhances the modelling capabilities of the algorithm, resulting in a better fit to the data and better imputation capabilities for missing data. Additionally, an extensive benchmark study of the missing-data imputation capabilities of the GTM is performed. Furthermore, a novel approach based on missing data is introduced to benchmark the fit of probabilistic visualisation algorithms on unlabelled data. Finally, the work is complemented by evaluating the algorithms on real-life datasets from geochemical projects.
Abstract:
Potential applications of high-damping and high-stiffness composites have motivated extensive research on the effects of negative-stiffness inclusions on the overall properties of composites. Recent theoretical advances have been based on the Hashin-Shtrikman composite models, one-dimensional discrete viscoelastic systems and a two-dimensional nested triangular viscoelastic network. In this paper, we further analyze the two-dimensional triangular structure containing pre-selected negative-stiffness components to study its underlying deformation mechanisms and stability. Major new findings are structure-deformation evolution with respect to the magnitude of negative stiffness under shear loading and the phenomena related to dissipation-induced destabilization and inertia-induced stabilization, according to Lyapunov stability analysis. The evolution shows strong correlations between stiffness anomalies and deformation modes. Our stability results reveal that stable damping peaks, i.e. stably extreme effective damping properties, are achievable under hydrostatic loading when the inertia is greater than a critical value. Moreover, destabilization induced by elemental damping is observed with the critical inertia. Regardless of elemental damping, when the inertia is less than the critical value, a weaker system instability is identified.
Abstract:
Long-lived light bullets, fully localized in both space and time, can be generated in novel photonic media such as multicore optical fiber or waveguide arrays. In this paper we present a detailed theoretical analysis of the existence and stability of discrete-continuous light bullets using a very generic model that occurs in a number of applications.
Abstract:
An inverse turbulent cascade in a restricted two-dimensional periodic domain creates a condensate—a pair of coherent system-size vortices. We perform extensive numerical simulations of this system and carry out theoretical analysis based on momentum and energy exchanges between the turbulence and the vortices. We show that the vortices have a universal internal structure independent of the type of small-scale dissipation, small-scale forcing, and boundary conditions. The theory predicts not only the vortex inner region profile, but also the amplitude, which both perfectly agree with the numerical data.