37 results for principal component analysis (PCA)

in Aston University Research Archive


Abstract:

Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based upon a probability model. In this paper we demonstrate how the principal axes of a set of observed data vectors may be determined through maximum-likelihood estimation of parameters in a latent variable model closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss the advantages conveyed by the definition of a probability density function for PCA.
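The iterative EM estimation of the principal subspace mentioned here can be sketched in Python with NumPy. This is a minimal, unoptimised sketch assuming the standard probabilistic PCA model t = Wx + μ + ε with x ~ N(0, I) and isotropic noise ε ~ N(0, σ²I); the function name `ppca_em`, the random initialisation and the synthetic data are illustrative choices, not taken from the paper.

```python
import numpy as np

def ppca_em(T, q, n_iter=500, seed=0):
    """EM for probabilistic PCA (illustrative sketch).

    T: (N, d) data matrix, one observation per row.
    q: dimension of the latent (principal) subspace.
    Returns the mean mu, the (d, q) loading matrix W and the noise variance sigma2.
    """
    rng = np.random.default_rng(seed)
    N, d = T.shape
    mu = T.mean(axis=0)                  # ML estimate of the mean is the sample mean
    Tc = T - mu                          # centred data
    W = rng.standard_normal((d, q))      # random initial loadings
    sigma2 = 1.0                         # initial isotropic noise variance
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables x_n given t_n.
        M = W.T @ W + sigma2 * np.eye(q)
        Minv = np.linalg.inv(M)
        Ex = Tc @ W @ Minv               # row n is <x_n> = M^{-1} W^T (t_n - mu)
        sum_Exx = N * sigma2 * Minv + Ex.T @ Ex   # sum over n of <x_n x_n^T>
        # M-step: update W, then sigma2 using the new W.
        W_new = (Tc.T @ Ex) @ np.linalg.inv(sum_Exx)
        sigma2 = (np.sum(Tc ** 2)
                  - 2.0 * np.sum(Ex * (Tc @ W_new))
                  + np.trace(sum_Exx @ W_new.T @ W_new)) / (N * d)
        W = W_new
    return mu, W, sigma2


# Synthetic check: 2-D latent structure embedded in 5-D with small noise.
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((500, 2)) * 3.0
A = data_rng.standard_normal((2, 5))
T = X @ A + 0.1 * data_rng.standard_normal((500, 5))
mu, W, sigma2 = ppca_em(T, q=2)
```

At convergence the columns of W span the same subspace as the leading q eigenvectors of the sample covariance, and σ² approaches the mean of the discarded eigenvalues. W itself is only determined up to a rotation, so subspaces, not individual columns, are what should be compared.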

Abstract:

Rhizome of cassava plants (Manihot esculenta Crantz) was catalytically pyrolysed at 500 °C using the analytical pyrolysis–gas chromatography/mass spectrometry (Py–GC/MS) method in order to investigate the relative effects of various catalysts on the pyrolysis products. The catalysts selected for this study were expected to affect bio-oil properties. They include zeolites and related materials (ZSM-5, Al-MCM-41 and Al-MSU-F type), metal oxides (zinc oxide, zirconium (IV) oxide, cerium (IV) oxide and copper chromite), proprietary commercial catalysts (Criterion-534 and alumina-stabilised ceria-MI-575) and natural catalysts (slate, char, and ashes derived from char and biomass). The pyrolysis product distributions were monitored using principal components analysis (PCA) models. The results showed that the zeolites, proprietary commercial catalysts, copper chromite and biomass-derived ash were selective for the reduction of most oxygenated lignin derivatives. The use of the ZSM-5, Criterion-534 and Al-MSU-F catalysts enhanced the formation of aromatic hydrocarbons and phenols. No single catalyst was found to selectively reduce all carbonyl products; instead, most of the carbonyl compounds containing hydroxyl groups were reduced by the zeolites and related materials, the proprietary catalysts and copper chromite. The PCA model for carboxylic acids showed that zeolite ZSM-5 and Al-MSU-F tend to produce significant amounts of acetic and formic acids.

Abstract:

Principal component analysis (PCA) is one of the most popular techniques for processing, compressing and visualising data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Previous attempts to formulate mixture models for PCA have therefore to some extent been ad hoc. In this paper, PCA is formulated within a maximum-likelihood framework, based on a specific form of Gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analysers, whose parameters can be determined using an EM algorithm. We discuss the advantages of this model in the context of clustering, density modelling and local dimensionality reduction, and we demonstrate its application to image compression and handwritten digit recognition.
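For reference, the mixture density in this framework can be summarised as below. This assumes the usual mixture-of-PPCA notation (M analysers with mixing proportions π_i, means μ_i, loading matrices W_i and noise variances σ_i²); the symbols are chosen here for illustration rather than copied from the paper. The E-step of the EM algorithm computes the responsibilities R_{ni}, and the updated mixing proportions are their averages.

```latex
p(\mathbf{t}) = \sum_{i=1}^{M} \pi_i \, \mathcal{N}\!\left(\mathbf{t} \mid \boldsymbol{\mu}_i,\ \sigma_i^2 \mathbf{I} + \mathbf{W}_i \mathbf{W}_i^{\top}\right),
\qquad \pi_i \ge 0, \quad \sum_{i=1}^{M} \pi_i = 1,

R_{ni} = \frac{\pi_i \, \mathcal{N}\!\left(\mathbf{t}_n \mid \boldsymbol{\mu}_i,\ \sigma_i^2 \mathbf{I} + \mathbf{W}_i \mathbf{W}_i^{\top}\right)}{p(\mathbf{t}_n)},
\qquad \pi_i^{\text{new}} = \frac{1}{N} \sum_{n=1}^{N} R_{ni}.
```

Each component is a single probabilistic PCA model, so setting M = 1 recovers one PPCA density.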

Abstract:

Principal components analysis (PCA) has been described for over 50 years; however, it is rarely applied to the analysis of epidemiological data. In this study, PCA was critically appraised for its ability to reveal relationships between pulsed-field gel electrophoresis (PFGE) profiles of methicillin-resistant Staphylococcus aureus (MRSA), in comparison with the more commonly employed cluster analysis and representation by dendrograms. The PFGE type following SmaI chromosomal digestion was determined for 44 multidrug-resistant hospital-acquired methicillin-resistant S. aureus (MR-HA-MRSA) isolates, two multidrug-resistant community-acquired MRSA (MR-CA-MRSA) isolates, 50 hospital-acquired MRSA (HA-MRSA) isolates (from the University Hospital Birmingham, NHS Trust, UK) and 34 community-acquired MRSA (CA-MRSA) isolates (from general practitioners in Birmingham, UK). Strain relatedness was determined using Dice band-matching with UPGMA clustering and using PCA. The results indicated that PCA revealed relationships between MRSA strains that were more strongly correlated with known epidemiology, most likely because, unlike cluster analysis, PCA is not constrained to generate a hierarchical classification. In addition, PCA provides the opportunity for further analysis to identify key polymorphic bands within complex genotypic profiles, which is not always possible with dendrograms. Here we provide a detailed description of a PCA method for the analysis of PFGE profiles to further complement the epidemiological study of infectious disease. © 2005 Elsevier B.V. All rights reserved.
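The Dice band-matching step can be illustrated with a short Python sketch. It assumes each PFGE profile has already been scored as presence (1) or absence (0) of a band at a common set of band positions; the scoring, any band-matching tolerance and the example profiles are assumptions, not data from the study. UPGMA would typically cluster on 1 minus the Dice similarity, whereas PCA would operate on the same 0/1 matrix directly.

```python
def dice_similarity(a, b):
    """Dice coefficient 2*|A & B| / (|A| + |B|) for two band-presence profiles.

    a, b: equal-length sequences of 0/1 flags, one flag per (hypothetical) band position.
    """
    if len(a) != len(b):
        raise ValueError("profiles must be scored at the same band positions")
    shared = sum(1 for x, y in zip(a, b) if x and y)   # bands present in both
    total = sum(1 for x in a if x) + sum(1 for y in b if y)
    return 2.0 * shared / total if total else 1.0


def dice_matrix(profiles):
    """Symmetric matrix of pairwise Dice similarities (1.0 on the diagonal)."""
    n = len(profiles)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = dice_similarity(profiles[i], profiles[j])
    return sim


# Hypothetical presence/absence scores for four isolates at five band positions.
profiles = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 1, 0, 0],
]
sim = dice_matrix(profiles)
print(round(sim[0][1], 3))  # 2*2 / (3 + 2) -> 0.8
```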

Abstract:

Ten cases of neuronal intermediate filament inclusion disease (NIFID) were studied quantitatively. The α-internexin-positive neurofilament inclusions (NI) were most abundant in the motor cortex and CA sectors of the hippocampus. The densities of the NI and the swollen achromatic neurons (SN) were similar in laminae II/III and V/VI, but glial cell density was greater in V/VI. The density of the NI was positively correlated with the SN and the glial cells. Principal components analysis (PCA) suggested that PC1 was associated with variation in neuronal loss in the frontal/temporal lobes and PC2 with neuronal loss in the frontal lobe and NI density in the parahippocampal gyrus. The data suggest: 1) frontal and temporal lobe degeneration in NIFID is associated with the widespread formation of NI and SN, 2) NI and SN affect cortical laminae II/III and V/VI, 3) the NI and SN affect closely related neuronal populations, and 4) variations in neuronal loss and in the density of NI were the most important sources of pathological heterogeneity. © Springer-Verlag 2005.

Abstract:

In Statnotes 24 and 25, multiple linear regression, a statistical method that examines the relationship between a single dependent variable (Y) and two or more independent variables (X), was described. The principal objective of such an analysis was to determine which of the X variables had a significant influence on Y and to construct an equation that predicts Y from the X variables. ‘Principal components analysis’ (PCA) and ‘factor analysis’ (FA) are also methods of examining the relationships between different variables, but they differ from multiple regression in that no distinction is made between the dependent and independent variables, all variables being treated essentially the same. Originally, PCA and FA were regarded as distinct methods, but in recent times they have been combined into a single analysis, PCA often being the first stage of an FA. The basic objective of a PCA/FA is to examine the relationships between the variables, or the ‘structure’ of the variables, and to determine whether these relationships can be explained by a smaller number of ‘factors’. This Statnote describes the use of PCA/FA in the analysis of the differences between the DNA profiles of different MRSA strains introduced in Statnote 26.
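The PCA stage of a PCA/FA can be sketched in Python with NumPy: components are extracted from the correlation matrix (so all variables are treated alike, with no X/Y distinction), eigenvectors are scaled into loadings, and the variance explained is reported. The retention rule used here (eigenvalue > 1, the Kaiser-Guttman rule) is one common convention assumed for illustration, not necessarily the one used in the Statnote, and the data are synthetic. A subsequent FA step, such as rotation of the retained loadings, is not shown.

```python
import numpy as np

def pca_on_correlations(X):
    """PCA of the correlation matrix of X (rows = cases, columns = variables).

    Returns eigenvalues (largest first), loadings (variable-component
    correlations) and the proportion of total variance per component.
    """
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)             # p x p correlation matrix
    evals, evecs = np.linalg.eigh(R)             # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]              # reorder: largest first
    evals, evecs = evals[order], evecs[:, order]
    loadings = evecs * np.sqrt(np.clip(evals, 0.0, None))
    explained = evals / evals.sum()              # eigenvalues sum to p for a correlation matrix
    return evals, loadings, explained


# Hypothetical data: 100 cases, 4 variables forming two correlated pairs.
rng = np.random.default_rng(0)
f1, f2 = rng.standard_normal((2, 100))
X = np.column_stack([
    f1 + 0.3 * rng.standard_normal(100),
    f1 + 0.3 * rng.standard_normal(100),
    f2 + 0.3 * rng.standard_normal(100),
    f2 + 0.3 * rng.standard_normal(100),
])
evals, loadings, explained = pca_on_correlations(X)
retained = int(np.sum(evals > 1.0))   # Kaiser-Guttman rule (assumed convention)
```

Because the loadings are variable-component correlations, the squared loadings of each variable summed over all components equal 1; the retained subset is what an FA would then try to interpret as ‘factors’.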

Abstract:

PCA/FA is a method of analyzing complex data sets in which there are no clearly defined X or Y variables. It has multiple uses, including the study of the pattern of variation between individual entities, such as patients with particular disorders, and the detailed study of descriptive variables. In most applications, variables are related to a smaller number of ‘factors’ or PCs that account for the maximum variance in the data and hence may explain important trends among the variables. An increasingly important application of the method is in the ‘validation’ of questionnaires that attempt to relate subjective aspects of a patient's experience to more objective measures of vision.

Abstract:

Three hypotheses have been proposed to explain neuropathological heterogeneity in Alzheimer's disease (AD): the presence of distinct subtypes ('subtype hypothesis'), variation in the stage of the disease ('phase hypothesis') and variation in the origin and progression of the disease ('compensation hypothesis'). To test these hypotheses, variation in the distribution and severity of senile plaques (SP) and neurofibrillary tangles (NFT) was studied in 80 cases of AD using principal components analysis (PCA). Principal components analysis using the cases as variables (Q-type analysis) suggested that individual differences between patients were continuously distributed rather than the cases being clustered into distinct subtypes. In addition, PCA using the abundances of SP and NFT as variables (R-type analysis) suggested that variations in the presence and abundance of lesions in the frontal and occipital lobes, the cingulate gyrus and the posterior parahippocampal gyrus were the most important sources of heterogeneity, consistent with the presence of different stages of the disease. Furthermore, in a subgroup of patients, individual differences were related to apolipoprotein E (ApoE) genotype, the presence and severity of SP in the frontal and occipital cortex being significantly increased in patients expressing the ApoE ε4 allele. It was concluded that some of the neuropathological heterogeneity in our AD cases may be consistent with the 'phase hypothesis'. A major factor determining this variation in late-onset cases was ApoE genotype, with accelerated rates of spread of the pathology in patients expressing allele ε4.
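The difference between the Q-type and R-type analyses is essentially which way round the data matrix is read. A minimal NumPy sketch, assuming a cases × lesion-measures matrix of densities (the values here are random placeholders) and plain covariance-based PCA; the original study may have standardised the variables or used a correlation matrix instead.

```python
import numpy as np

def pca(X, k=2):
    """Covariance-based PCA via SVD; the columns of X are treated as the variables.

    Returns (scores, loadings): scores has one row per row of X,
    loadings has one row per column of X.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                        # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k].T


# Hypothetical matrix: 80 cases (rows) x 12 lesion-density measures (columns).
rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=1.0, size=(80, 12))

# R-type: lesion measures are the variables; loadings show which measures covary.
r_scores, r_loadings = pca(X)      # (80, 2) case scores, (12, 2) measure loadings

# Q-type: transpose so the cases become the variables; each case gets a loading,
# and plotting these shows whether cases form clusters or a continuum.
q_scores, q_loadings = pca(X.T)    # (12, 2) measure scores, (80, 2) case loadings
```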

Abstract:

Aeromonas genomes were investigated by restriction digestion of chromosomal DNA with the endonuclease XbaI, separation of the restriction fragments by pulsed-field gel electrophoresis (PFGE) and principal components analysis (PCA) of the resulting separation patterns. A. salmonicida salmonicida isolates were unique amongst the isolates investigated: their separation profiles were similar, and all were characterised by a distinct absence of bands in the 250 kb region. Principal components analysis represented these strains as a clearly defined, homogeneous group separated by insignificant Euclidean distances. However, A. salmonicida achromogenes isolates, in common with those of A. hydrophila and A. sobria, were shown by principal components analysis to be more heterogeneous in nature. Fragments from these isolates were more uniform in size distribution but, as demonstrated by the Euclidean distances obtained through PCA, were potentially characteristic of each strain. Furthermore, passaging of Aeromonas isolates through an appropriate host did not greatly modify fragment separation profiles, indicating the genomic stability of the test aeromonads and the potential of restriction digestion/PFGE/PCA for Aeromonas typing.
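Euclidean distances of the kind reported here can be computed from PC scores. The sketch below assumes each isolate's separation pattern has been scored as band presence/absence at common fragment-size positions (the matrix is hypothetical) and measures the distance between isolates in the space of the first two components; the number of components retained is an assumption.

```python
import numpy as np

def pc_distances(B, k=2):
    """Pairwise Euclidean distances between rows of B in the space of the first k PCs."""
    X = np.asarray(B, dtype=float)
    Xc = X - X.mean(axis=0)                          # centre each band position
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]                        # isolate coordinates on PC1..PCk
    diff = scores[:, None, :] - scores[None, :, :]   # all pairwise coordinate differences
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical profiles: isolates 0 and 1 share a pattern; 2 and 3 differ from them.
B = [
    [1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 1],
]
D = pc_distances(B)
```

Identical patterns sit at distance zero, mirroring the tight A. salmonicida salmonicida group. Because only three distinct patterns occur in this toy matrix, two components carry all of the variance and the PC-space distances equal the full Euclidean distances; with real profiles, truncating to k components gives an approximation.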

Abstract:

A Principal Components Analysis (PCA) was carried out on the density of lesions revealed by different stains in a total of 47 brain regions from six elderly patients with Alzheimer’s disease (AD). The aim was to determine the relationships between the density of senile plaques (SP) revealed by the Glees and Gallyas stains and A4 deposits and between the plaques and neurofibrillary tangles (NFT) in the same brain region. The analysis indicated that the populations of plaques revealed by the Glees and Gallyas stains were closely related to the A4 protein deposits but none of the lesions were related to NFT. The data suggest: 1) that neocortical regions differ from the hippocampus in the relative development of A4 and NFT; the former having more A4 deposits and the latter more NFT and 2) that the processes that lead to the formation of SP and NFT occur independently of each other in the same brain region.

Abstract:

Studies suggest that frontotemporal lobar degeneration with transactive response (TAR) DNA-binding protein of 43 kDa (TDP-43) proteinopathy (FTLD-TDP) is heterogeneous and can be divided into four or five subtypes. To determine the degree of heterogeneity and the validity of the subtypes, we studied neuropathological variation within the frontal and temporal lobes of 94 cases of FTLD-TDP using quantitative estimates of density and principal components analysis (PCA). A PCA based on the density of TDP-43 immunoreactive neuronal cytoplasmic inclusions (NCI), oligodendroglial inclusions (GI), neuronal intranuclear inclusions (NII), and dystrophic neurites (DN), surviving neurons, enlarged neurons (EN), and vacuolation suggested that cases were not segregated into distinct subtypes. Variation in the density of the vacuoles was the greatest source of variation between cases. A PCA based on TDP-43 pathology alone suggested that cases of FTLD-TDP with progranulin (GRN) mutation segregated to some degree. The pathological phenotypes of all four subtypes overlapped, but subtypes 1 and 4 were the most distinctive. Cases with coexisting motor neuron disease (MND) or hippocampal sclerosis (HS) also appeared to segregate to some extent. We suggest: 1) pathological variation in FTLD-TDP is best described as a ‘continuum’ without clearly distinct subtypes, 2) vacuolation was the single greatest source of variation and reflects the ‘stage’ of the disease, and 3) within the FTLD-TDP ‘continuum’, cases with GRN mutation and with coexisting MND or HS may have a more distinctive pathology.

Abstract:

The densities of diffuse, primitive, and classic β-amyloid (Aβ) deposits were studied in the temporal lobe in cognitively normal brain, dementia with Lewy bodies (DLB), familial Alzheimer’s disease (FAD), and sporadic AD (SAD). Principal components analysis (PCA) was used to determine whether there were distinct differences between groups or whether Aβ pathology was more continuously distributed from group to group. Three principal components (PCs) were extracted from the data, accounting for 56% of the total variance. Plots of cases in relation to the PCs did not result in distinct groups but suggested overlap in Aβ deposition between the groups. In addition, there were linear correlations between the densities of Aβ deposits and the distribution of the cases along the PCs in specific brain regions, suggesting continuous variation from group to group. PC1 was associated with the degree of maturation of Aβ deposits, PC2 with differences between FAD and SAD, and PC3 with the degree of spread of Aβ pathology into the hippocampus. Apolipoprotein E (APOE) genotype was not associated with variation in Aβ deposition between cases. PCA may be a useful method of studying the pathological interface between closely related neurodegenerative disorders.
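The two quantities discussed here, the share of variance carried by the retained components and the correlation between deposit densities and case positions along each PC, can be computed as in the following NumPy sketch. It assumes a correlation-based PCA of a cases × variables matrix; the data are random placeholders, and the 56% figure from the study is not reproduced.

```python
import numpy as np

def pc_correlations(X, k):
    """Correlate each standardised variable with case scores on the first k PCs.

    X: rows = cases, columns = variables (e.g. deposit densities by region).
    Returns a (variables x k) matrix of Pearson r and the cumulative
    proportion of variance explained by the k components.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)        # standardise: PCA of correlations
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :k] * s[:k]                        # case positions along PC1..PCk
    r = np.array([[np.corrcoef(Z[:, j], scores[:, c])[0, 1] for c in range(k)]
                  for j in range(Z.shape[1])])
    cumulative = float(np.sum(s[:k] ** 2) / np.sum(s ** 2))
    return r, cumulative


# Hypothetical densities: 30 cases x 6 variables.
rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=1.0, size=(30, 6))
r3, cum3 = pc_correlations(X, k=3)   # analogous to reporting 3 PCs and their share of variance
```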

Abstract:

We analyze a big data set of geo-tagged tweets collected over one year (Oct. 2013–Oct. 2014) to understand regional linguistic variation in the U.S. Prior work on regional linguistic variation usually took a long time to collect data and focused on either rural or urban areas. Geo-tagged Twitter data offer an unprecedented database with rich linguistic representation at fine spatiotemporal resolution and with continuity. From the one-year Twitter corpus, we extract lexical characteristics for Twitter users by summarizing the frequencies of a set of lexical alternations that each user has used. We spatially aggregate and smooth each lexical characteristic to derive county-based linguistic variables, from which orthogonal dimensions are extracted using principal component analysis (PCA). Finally, a regionalization method is used to discover hierarchical dialect regions using the PCA components. The regionalization results reveal interesting regional linguistic variation in the U.S. The discovered regions not only confirm past findings in the literature but also provide new insights and a more detailed understanding of very recent linguistic patterns in the U.S.

Abstract:

Multidimensional compound optimization is a new paradigm in the drug discovery process, yielding efficiencies during early stages and reducing attrition in the later stages of drug development. The success of this strategy relies heavily on understanding this multidimensional data and extracting useful information from it. This paper demonstrates how principled visualization algorithms can be used to understand and explore a large data set created in the early stages of drug discovery. The experiments presented are performed on a real-world data set comprising biological activity data and some whole-molecular physicochemical properties. Data visualization is a popular way of presenting complex data in a simpler form. We have applied powerful principled visualization methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), to help the domain experts (screening scientists, chemists, biologists, etc.) understand the data and make meaningful decisions. We also benchmark these principled methods against better-known visualization approaches, namely principal component analysis (PCA), Sammon's mapping, and self-organizing maps (SOMs), to demonstrate their enhanced power to help the user visualize the large multidimensional data sets encountered during the early stages of the drug discovery process. The results reported clearly show that the GTM and HGTM algorithms allow the user to cluster active compounds for different targets and understand them better than the benchmarks do. An interactive software tool supporting these visualization algorithms was provided to the domain experts. The tool helps the domain experts explore the projections obtained from the visualization algorithms by providing facilities such as parallel coordinate plots, magnification factors, directional curvatures, and integration with industry-standard software. © 2006 American Chemical Society.
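Sammon's mapping, one of the benchmarks named here, places points in a low-dimensional space so as to minimize Sammon's stress, and the same quantity can serve as a rough score for comparing projections such as PCA's. The Python function below only evaluates the stress for a given projection; it is not the optimization itself, nor GTM/HGTM, nor the paper's evaluation protocol, and the example points are made up.

```python
import math
from itertools import combinations

def sammon_stress(high, low):
    """Sammon's stress between original points `high` and their projected images `low`.

    E = (1 / sum d*_ij) * sum (d*_ij - d_ij)**2 / d*_ij over pairs i < j,
    where d* are distances in the original space and d in the projection.
    Pairs with zero original distance are skipped to avoid division by zero.
    """
    if len(high) != len(low):
        raise ValueError("need one projected point per original point")
    total, weighted = 0.0, 0.0
    for i, j in combinations(range(len(high)), 2):
        d_star = math.dist(high[i], high[j])
        if d_star == 0.0:
            continue
        d = math.dist(low[i], low[j])
        total += d_star
        weighted += (d_star - d) ** 2 / d_star
    return weighted / total if total else 0.0


# Tiny hand-checkable example: three 1-D points and a distorted 1-D map.
high = [(0.0,), (1.0,), (3.0,)]
low = [(0.0,), (2.0,), (3.0,)]
print(sammon_stress(high, low))   # (1 + 0 + 0.5) / 6 = 0.25
```

Dividing each squared error by the original distance weights small distances more heavily, which is why Sammon's mapping tends to preserve local structure better than a linear PCA projection.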