935 results for principal components analysis
Abstract:
Normal mixture models are often used to cluster continuous data. However, conventional approaches for fitting these models will have problems in producing nonsingular estimates of the component-covariance matrices when the dimension of the observations is large relative to the number of observations. In this case, methods such as principal components analysis (PCA) and the mixture of factor analyzers model can be adopted to avoid these estimation problems. We examine these approaches applied to the Cabernet wine data set of Ashenfelter (1999), considering the clustering of both the wines and the judges, and comparing our results with another analysis. The mixture of factor analyzers model proves particularly effective in clustering the wines, accurately classifying many of the wines by location.
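The estimation problem described in this abstract can be illustrated with a minimal sketch. It assumes synthetic Gaussian data and arbitrary sizes (`n_obs`, `n_dim`, `q` are illustrative names, not taken from the paper), and it looks at a single sample covariance matrix rather than the component-covariance matrices of a fitted mixture. When there are fewer observations than dimensions the sample covariance is rank-deficient, and projecting onto a few principal components gives a low-dimensional representation whose covariance is nonsingular.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_dim = 10, 25            # assumed sizes: fewer observations than dimensions
X = rng.normal(size=(n_obs, n_dim))

# Sample covariance in the original space: 25 x 25, rank at most n_obs - 1
S = np.cov(X, rowvar=False)
rank_S = np.linalg.matrix_rank(S)

# PCA via SVD of the centered data; keep the first q components
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
q = 3
Z = Xc @ Vt[:q].T                # scores in the reduced q-dimensional space

# Covariance of the scores is q x q and full rank
S_reduced = np.cov(Z, rowvar=False)
rank_reduced = np.linalg.matrix_rank(S_reduced)

print(rank_S, "of", n_dim)       # singular in the original space
print(rank_reduced, "of", q)     # nonsingular after reduction
```

A clustering model would then be fitted to `Z` instead of `X`; that step is omitted here.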
Abstract:
Molecular interactions between microcrystalline cellulose (MCC) and water were investigated by attenuated total reflection infrared (ATR/IR) spectroscopy. Moisture-content-dependent IR spectra were measured during the drying of wet MCC. To distinguish overlapping O–H stretching bands arising from both cellulose and water, principal component analysis (PCA), generalized two-dimensional correlation spectroscopy (2DCOS), and second derivative analysis were applied to the obtained spectra. Four typical drying stages were clearly separated by PCA, and spectral variations in each stage were analyzed by 2DCOS. In the drying time range of 0–41 min, a decrease in the broad band around 3390 cm⁻¹ was observed, indicating that bulk water was evaporated. In the drying time range of 49–195 min, decreases were observed in the bands at 3412, 3344 and 3286 cm⁻¹, assigned respectively to the O6H6···O3′ interchain hydrogen bonds (H-bonds), the O3H3···O5 intrachain H-bonds and the H-bonds in the Iβ phase of MCC. The result of the second derivative analysis suggests that water molecules mainly interact with the O6H6···O3′ interchain H-bonds. Thus, the H-bonding network in MCC is stabilized by H-bonds between water and the OH groups that form the O6H6···O3′ interchain H-bonds, and the removal of the water molecules induces changes in the H-bonding network in MCC.
Abstract:
Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based upon a probability model. In this paper we demonstrate how the principal axes of a set of observed data vectors may be determined through maximum-likelihood estimation of parameters in a latent variable model closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss the advantages conveyed by the definition of a probability density function for PCA.
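The EM idea mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the standard probabilistic PCA model x = W z + mu + noise with isotropic noise variance `sigma2`, and it uses the covariance-based form of the EM updates. The function name `ppca_em`, the synthetic data and the iteration count are all assumptions made for the example.

```python
import numpy as np

def ppca_em(X, q, n_iter=300, seed=0):
    """EM iterations for probabilistic PCA (sketch). Returns W, sigma2, S."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                       # sample covariance (ML form)
    W = rng.normal(size=(d, q))             # random initial loading matrix
    sigma2 = 1.0                            # initial noise variance
    for _ in range(n_iter):
        M = W.T @ W + sigma2 * np.eye(q)
        Minv = np.linalg.inv(M)
        SW = S @ W
        # Combined E- and M-step update of the loadings
        W_new = SW @ np.linalg.inv(sigma2 * np.eye(q) + Minv @ W.T @ SW)
        # Noise variance update uses the old W and the new W
        sigma2 = np.trace(S - SW @ Minv @ W_new.T) / d
        W = W_new
    return W, sigma2, S

# Synthetic data (assumed): 2 latent factors in 5 dimensions plus unit noise
rng = np.random.default_rng(1)
A = np.array([[2.0, 0.0], [0.0, 1.5], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
X = rng.normal(size=(2000, 2)) @ A.T + rng.normal(size=(2000, 5))

q = 2
W, sigma2, S = ppca_em(X, q)

# Compare the EM subspace with the leading eigenvectors of S
evals, evecs = np.linalg.eigh(S)            # eigenvalues in ascending order
P_pca = evecs[:, -q:] @ evecs[:, -q:].T     # projector onto top-q eigenvectors
Q, _ = np.linalg.qr(W)
P_em = Q @ Q.T                              # projector onto span of W
```

The columns of `W` need not equal the eigenvectors themselves, since the likelihood is invariant to rotations of the latent space, so the comparison is made between projectors onto the two subspaces.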
Abstract:
The use of quantitative methods has become increasingly important in the study of neurodegenerative disease. Disorders such as Alzheimer's disease (AD) are characterized by the formation of discrete, microscopic, pathological lesions which play an important role in pathological diagnosis. This article reviews the advantages and limitations of the different methods of quantifying the abundance of pathological lesions in histological sections, including estimates of density, frequency, coverage, and the use of semiquantitative scores. The major sampling methods by which these quantitative measures can be obtained from histological sections, including plot or quadrat sampling, transect sampling, and point-quarter sampling, are also described. In addition, the data analysis methods commonly used to analyse quantitative data in neuropathology, including analyses of variance (ANOVA) and principal components analysis (PCA), are discussed. These methods are illustrated with reference to particular problems in the pathological diagnosis of AD and dementia with Lewy bodies (DLB).
Abstract:
Plasmid constitutions of Aeromonas salmonicida isolates were characterised by flat-bed and pulsed field gel electrophoresis. Resolution of plasmids by pulsed field gel electrophoresis was greater and more consistent than that achieved by flat-bed gel electrophoresis. The number of plasmids separated by pulsed field gel electrophoresis varied between A. salmonicida isolates, with five being the most common number present in the isolates used in this study. Plasmid profiles were diverse, and the reproducibility of the distances migrated facilitated the use of principal components analysis for the characterisation of the isolates. Isolates were grouped according to the number of plasmids supported. Further principal components analysis of groups of isolates supporting five and seven plasmids showed a spatial separation of plasmids based upon distance migrated. Principal components analysis of plasmid profiles and antimicrobial minimum inhibitory concentrations could not be correlated, suggesting that resistance to antimicrobial agents is not associated with either one plasmid or a particular plasmid constitution.
Abstract:
This book is aimed primarily at microbiologists who are undertaking research and who require a basic knowledge of statistics to analyse their experimental data. Computer software employing a wide range of data analysis methods is widely available to experimental scientists. The availability of this software, however, makes it essential that investigators understand the basic principles of statistics. Statistical analysis of data can be complex with many different methods of approach, each of which applies in a particular experimental circumstance. Hence, it is possible to apply an incorrect statistical method to data and to draw the wrong conclusions from an experiment. The purpose of this book, which has its origin in a series of articles published in the Society for Applied Microbiology journal ‘The Microbiologist’, is an attempt to present the basic logic of statistics as clearly as possible and therefore, to dispel some of the myths that often surround the subject. The 28 ‘Statnotes’ deal with various topics that are likely to be encountered, including the nature of variables, the comparison of means of two or more groups, non-parametric statistics, analysis of variance, correlating variables, and more complex methods such as multiple linear regression and principal components analysis. In each case, the relevant statistical method is illustrated with examples drawn from experiments in microbiological research. The text incorporates a glossary of the most commonly used statistical terms and there are two appendices designed to aid the investigator in the selection of the most appropriate test.
Abstract:
The pattern of correlation between two sets of variables can be tested using canonical variate analysis (CVA). CVA, like principal components analysis (PCA) and factor analysis (FA) (Statnote 27, Hilton & Armstrong, 2011b), is a multivariate analysis method. Essentially, as in PCA/FA, the objective is to determine whether the correlations between two sets of variables can be explained by a smaller number of ‘axes of correlation’ or ‘canonical roots’.
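A minimal sketch of how such "canonical roots" can be computed is given below. It assumes that the analysis meant here corresponds to standard canonical correlation between two sets of variables, and it uses the QR-plus-SVD route rather than any procedure taken from the Statnote. The function name `canonical_correlations` and the synthetic data are illustrative assumptions, and both data matrices are assumed to have full column rank.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between column sets X and Y (rows = cases)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations, largest first
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)

# Synthetic example (assumed): one variable in Y is an exact linear
# combination of the X variables, the other is independent noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = np.column_stack([X @ np.array([1.0, -2.0, 0.5]),
                     rng.normal(size=200)])

rho = canonical_correlations(X, Y)
print(rho)   # first value close to 1, second value small
```

The number of canonical correlations equals the smaller of the two variable counts, here two. Whether only the first few are large is what a formal significance test on the roots would assess.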
Abstract:
Exploratory analysis of data seeks to find common patterns to gain insights into the structure and distribution of the data. In geochemistry it is a valuable means to gain insights into the complicated processes making up a petroleum system. Typically, linear visualisation methods like principal components analysis, linked plots, or brushing are used. These methods cannot be employed directly when dealing with missing data, and they struggle to capture global non-linear structures in the data, although they can capture such structure locally. This thesis discusses a complementary approach based on a non-linear probabilistic model. The generative topographic mapping (GTM) enables the visualisation of the effects of very many variables on a single plot, which is able to incorporate more structure than a two-dimensional principal components plot. The model can deal with uncertainty and missing data, and allows for the exploration of the non-linear structure in the data. In this thesis a novel approach to initialise the GTM with arbitrary projections is developed. This makes it possible to combine GTM with algorithms like Isomap and fit complex non-linear structure like the Swiss-roll. Another novel extension is the incorporation of prior knowledge about the structure of the covariance matrix. This extension greatly enhances the modelling capabilities of the algorithm, resulting in a better fit to the data and better imputation capabilities for missing data. Additionally, an extensive benchmark study of the missing data imputation capabilities of GTM is performed. Further, a novel approach based on missing data is introduced to benchmark the fit of probabilistic visualisation algorithms on unlabelled data. Finally, the work is complemented by evaluating the algorithms on real-life datasets from geochemical projects.
Abstract:
The use of quantitative methods has become increasingly important in the study of neuropathology and especially in neurodegenerative disease. Disorders such as Alzheimer's disease (AD) and the frontotemporal dementias (FTD) are characterized by the formation of discrete, microscopic, pathological lesions which play an important role in pathological diagnosis. This chapter reviews the advantages and limitations of the different methods of quantifying pathological lesions in histological sections, including estimates of density, frequency, coverage, and the use of semi-quantitative scores. The sampling strategies by which these quantitative measures can be obtained from histological sections, including plot or quadrat sampling, transect sampling, and point-quarter sampling, are described. In addition, data analysis methods commonly used to analyse quantitative data in neuropathology, including analysis of variance (ANOVA), polynomial curve fitting, multiple regression, classification trees, and principal components analysis (PCA), are discussed. These methods are illustrated with reference to quantitative studies of a variety of neurodegenerative disorders.
Abstract:
This dissertation establishes a novel data-driven method to identify language network activation patterns in pediatric epilepsy through the use of Principal Component Analysis (PCA) on functional magnetic resonance imaging (fMRI). A total of 122 subjects’ data sets from five different hospitals were included in the study through a web-based repository site designed at FIU. Research was conducted to evaluate different classification and clustering techniques in identifying hidden activation patterns and their associations with meaningful clinical variables. The results were assessed through agreement analysis with the conventional methods of lateralization index (LI) and visual rating. What is unique in this approach is the new mechanism designed for projecting language network patterns in the PCA-based decisional space. Synthetic activation maps were randomly generated from real data sets to uniquely establish nonlinear decision functions (NDF), which are then used to classify any new fMRI activation map as typical or atypical. The best nonlinear classifier was obtained on a 4D space with a complexity (nonlinearity) degree of 7. Based on the significant association of language dominance and intensities with the top eigenvectors of the PCA decisional space, a new algorithm was deployed to delineate primary cluster members without intensity normalization. In this case, three distinct activation patterns (groups) were identified (averaged kappa with rating 0.65, with LI 0.76) and were characterized by the regions of: (1) the left inferior frontal gyrus (IFG) and left superior temporal gyrus (STG), considered typical for the language task; (2) the IFG, left mesial frontal lobe, right cerebellum regions, representing a variant left dominant pattern by higher activation; and (3) the right homologues of the first pattern in Broca's and Wernicke's language areas.
Interestingly, group 2 was found to reflect a different language compensation mechanism than reorganization. Its high intensity activation suggests a possible remote effect on the right hemisphere focus on traditionally left-lateralized functions. In retrospect, this data-driven method provides new insights into mechanisms for brain compensation/reorganization and neural plasticity in pediatric epilepsy.
Abstract:
The elemental analysis of soil is useful in forensic and environmental sciences. Methods were developed and optimized for two laser-based multi-element analysis techniques: laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS) and laser-induced breakdown spectroscopy (LIBS). This work represents the first use of a 266 nm laser for forensic soil analysis by LIBS. Sample preparation methods were developed and optimized for a variety of sample types, including pellets for large bulk soil specimens (470 mg) and sediment-laden filters (47 mg), and tape-mounting for small transfer evidence specimens (10 mg). Analytical performance for sediment filter pellets and tape-mounted soils was similar to that achieved with bulk pellets. An inter-laboratory comparison exercise was designed to evaluate the performance of the LA-ICP-MS and LIBS methods, as well as for micro X-ray fluorescence (μXRF), across multiple laboratories. Limits of detection (LODs) were 0.01-23 ppm for LA-ICP-MS, 0.25-574 ppm for LIBS, 16-4400 ppm for μXRF, and well below the levels normally seen in soils. Good intra-laboratory precision (≤ 6 % relative standard deviation (RSD) for LA-ICP-MS; ≤ 8 % for μXRF; ≤ 17 % for LIBS) and inter-laboratory precision (≤ 19 % for LA-ICP-MS; ≤ 25 % for μXRF) were achieved for most elements, which is encouraging for a first inter-laboratory exercise. While LIBS generally has higher LODs and RSDs than LA-ICP-MS, both were capable of generating good quality multi-element data sufficient for discrimination purposes. Multivariate methods using principal components analysis (PCA) and linear discriminant analysis (LDA) were developed for discriminations of soils from different sources. Specimens from different sites that were indistinguishable by color alone were discriminated by elemental analysis. Correct classification rates of 94.5 % or better were achieved in a simulated forensic discrimination of three similar sites for both LIBS and LA-ICP-MS. 
Results for tape-mounted specimens were nearly identical to those achieved with pellets. Methods were tested on soils from USA, Canada and Tanzania. Within-site heterogeneity was site-specific. Elemental differences were greatest for specimens separated by large distances, even within the same lithology. Elemental profiles can be used to discriminate soils from different locations and narrow down locations even when mineralogy is similar.
Abstract:
Finite-Difference Time-Domain (FDTD) algorithms are well-established tools of computational electromagnetism. Because of their practical implementation as computer codes, they are affected by many numerical artefacts and noise. In order to obtain better results we propose using Principal Component Analysis (PCA) based on multivariate statistical techniques. PCA has been successfully used for the analysis of noise and spatiotemporal structure in a sequence of images. It allows a straightforward discrimination between the numerical noise and the actual electromagnetic variables, and the quantitative estimation of their respective contributions. In addition, the FDTD results can be filtered to clean the effect of the noise. In this contribution we will show how the method can be applied to several FDTD simulations, including the propagation of a pulse in vacuum and the analysis of two-dimensional photonic crystals. In this last case, PCA has revealed hidden electromagnetic structures related to actual modes of the photonic crystal.
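The filtering step described in this abstract can be sketched in a few lines. The example below is an assumption-laden illustration, not the authors' procedure: it treats a stack of frames as a time-by-pixel matrix, performs PCA through an SVD, and rebuilds the sequence from the leading components only. The function name `pca_filter`, the synthetic standing-wave field and the noise level are all hypothetical, and no actual FDTD simulation is run.

```python
import numpy as np

def pca_filter(frames, k):
    """Keep the k leading principal components of a (T, H, W) frame stack."""
    T = frames.shape[0]
    D = frames.reshape(T, -1)                 # one row per frame
    mean = D.mean(axis=0)
    U, s, Vt = np.linalg.svd(D - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)           # variance fraction per component
    D_k = (U[:, :k] * s[:k]) @ Vt[:k] + mean  # rank-k reconstruction
    return D_k.reshape(frames.shape), explained

# Synthetic stand-in for simulation output (assumed): one oscillating
# spatial mode plus additive Gaussian noise
rng = np.random.default_rng(0)
T, H, Wd = 50, 20, 20
x = np.linspace(0.0, 2.0 * np.pi, Wd)
y = np.linspace(0.0, 2.0 * np.pi, H)
mode = np.outer(np.sin(y), np.sin(x))
clean = np.cos(0.3 * np.arange(T))[:, None, None] * mode[None, :, :]
noisy = clean + 0.1 * rng.normal(size=clean.shape)

filtered, explained = pca_filter(noisy, k=1)

err_noisy = np.linalg.norm(noisy - clean)
err_filtered = np.linalg.norm(filtered - clean)
print(err_noisy, err_filtered)                # filtered error is smaller
```

The `explained` array gives the share of variance carried by each component, which is one way to quantify the respective contributions of structure and noise. How many components to keep is a judgment call that depends on the data.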