937 resultados para principal components analysis (PCA) algorithm
Resumo:
The application of the Electro-Mechanical Impedance (EMI) method for damage detection in Structural Health Monitoring has noticeable increased in recent years. EMI method utilizes piezoelectric transducers for directly measuring the mechanical properties of the host structure, obtaining the so called impedance measurement, highly influenced by the variations of dynamic parameters of the structure. These measurements usually contain a large number of frequency points, as well as a high number of dimensions, since each frequency range swept can be considered as an independent variable. That makes this kind of data hard to handle, increasing the computational costs and being substantially time-consuming. In that sense, the Principal Component Analysis (PCA)-based data compression has been employed in this work, in order to enhance the analysis capability of the raw data. Furthermore, a Support Vector Machine (SVM), which has been widespread used in machine learning and pattern recognition fields, has been applied in this study in order to model any possible existing pattern in the PCAcompress data, using for that just the first two Principal Components. Different known non-damaged and damaged measurements of an experimental tested beam were used as training input data for the SVM algorithm, using as test input data the same amount of cases measured in beams with unknown structural health conditions. Thus, the purpose of this work is to demonstrate how, with a few impedance measurements of a beam as raw data, its healthy status can be determined based on pattern recognition procedures.
Resumo:
A method is given for determining the time course and spatial extent of consistently and transiently task-related activations from other physiological and artifactual components that contribute to functional MRI (fMRI) recordings. Independent component analysis (ICA) was used to analyze two fMRI data sets from a subject performing 6-min trials composed of alternating 40-sec Stroop color-naming and control task blocks. Each component consisted of a fixed three-dimensional spatial distribution of brain voxel values (a “map”) and an associated time course of activation. For each trial, the algorithm detected, without a priori knowledge of their spatial or temporal structure, one consistently task-related component activated during each Stroop task block, plus several transiently task-related components activated at the onset of one or two of the Stroop task blocks only. Activation patterns occurring during only part of the fMRI trial are not observed with other techniques, because their time courses cannot easily be known in advance. Other ICA components were related to physiological pulsations, head movements, or machine noise. By using higher-order statistics to specify stricter criteria for spatial independence between component maps, ICA produced improved estimates of the temporal and spatial extent of task-related activation in our data compared with principal component analysis (PCA). ICA appears to be a promising tool for exploratory analysis of fMRI data, particularly when the time courses of activation are not known in advance.
Resumo:
Principal component analysis (PCA) is one of the most popular techniques for processing, compressing and visualising data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Previous attempts to formulate mixture models for PCA have therefore to some extent been ad hoc. In this paper, PCA is formulated within a maximum-likelihood framework, based on a specific form of Gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analysers, whose parameters can be determined using an EM algorithm. We discuss the advantages of this model in the context of clustering, density modelling and local dimensionality reduction, and we demonstrate its application to image compression and handwritten digit recognition.
Resumo:
Exploratory analysis of data seeks to find common patterns to gain insights into the structure and distribution of the data. In geochemistry it is a valuable means to gain insights into the complicated processes making up a petroleum system. Typically linear visualisation methods like principal components analysis, linked plots, or brushing are used. These methods can not directly be employed when dealing with missing data and they struggle to capture global non-linear structures in the data, however they can do so locally. This thesis discusses a complementary approach based on a non-linear probabilistic model. The generative topographic mapping (GTM) enables the visualisation of the effects of very many variables on a single plot, which is able to incorporate more structure than a two dimensional principal components plot. The model can deal with uncertainty, missing data and allows for the exploration of the non-linear structure in the data. In this thesis a novel approach to initialise the GTM with arbitrary projections is developed. This makes it possible to combine GTM with algorithms like Isomap and fit complex non-linear structure like the Swiss-roll. Another novel extension is the incorporation of prior knowledge about the structure of the covariance matrix. This extension greatly enhances the modelling capabilities of the algorithm resulting in better fit to the data and better imputation capabilities for missing data. Additionally an extensive benchmark study of the missing data imputation capabilities of GTM is performed. Further a novel approach, based on missing data, will be introduced to benchmark the fit of probabilistic visualisation algorithms on unlabelled data. Finally the work is complemented by evaluating the algorithms on real-life datasets from geochemical projects.
Resumo:
We analyze a Big Data set of geo-tagged tweets for a year (Oct. 2013–Oct. 2014) to understand the regional linguistic variation in the U.S. Prior work on regional linguistic variations usually took a long time to collect data and focused on either rural or urban areas. Geo-tagged Twitter data offers an unprecedented database with rich linguistic representation of fine spatiotemporal resolution and continuity. From the one-year Twitter corpus, we extract lexical characteristics for twitter users by summarizing the frequencies of a set of lexical alternations that each user has used. We spatially aggregate and smooth each lexical characteristic to derive county-based linguistic variables, from which orthogonal dimensions are extracted using the principal component analysis (PCA). Finally a regionalization method is used to discover hierarchical dialect regions using the PCA components. The regionalization results reveal interesting linguistic regional variations in the U.S. The discovered regions not only confirm past research findings in the literature but also provide new insights and a more detailed understanding of very recent linguistic patterns in the U.S.
Resumo:
Relationships among quality factors in retailed free-range, corn-fed, organic, and conventional chicken breasts (9) were modeled using chemometric approaches. Use of principal component analysis (PCA) to neutral lipid composition data explained the majority (93%) of variability (variance) in fatty acid contents in 2 significant multivariate factors. PCA explained 88 and 75% variance in 3 factors for, respectively, flame ionization detection (FID) and nitrogen phosphorus (NPD) components in chromatographic flavor data from cooked chicken after simultaneous distillation extraction. Relationships to tissue antioxidant contents were modeled. Partial least square regression (PLS2), interrelating total data matrices, provided no useful models. By using single antioxidants as Y variables in PLS (1), good models (r2 values > 0.9) were obtained for alpha-tocopherol, glutathione, catalase, glutathione peroxidase, and reductase and FID flavor components and among the variables total mono and polyunsaturated fatty acids and subsets of FID, and saturated fatty acid and NPD components. Alpha-tocopherol had a modest (r2 = 0.63) relationship with neutral lipid n-3 fatty acid content. Such factors thus relate to flavor development and quality in chicken breast meat.
Resumo:
Prices of U.S. Treasury securities vary over time and across maturities. When the market in Treasurys is sufficiently complete and frictionless, these prices may be modeled by a function time and maturity. A cross-section of this function for time held fixed is called the yield curve; the aggregate of these sections is the evolution of the yield curve. This dissertation studies aspects of this evolution. ^ There are two complementary approaches to the study of yield curve evolution here. The first is principal components analysis; the second is wavelet analysis. In both approaches both the time and maturity variables are discretized. In principal components analysis the vectors of yield curve shifts are viewed as observations of a multivariate normal distribution. The resulting covariance matrix is diagonalized; the resulting eigenvalues and eigenvectors (the principal components) are used to draw inferences about the yield curve evolution. ^ In wavelet analysis, the vectors of shifts are resolved into hierarchies of localized fundamental shifts (wavelets) that leave specified global properties invariant (average change and duration change). The hierarchies relate to the degree of localization with movements restricted to a single maturity at the base and general movements at the apex. Second generation wavelet techniques allow better adaptation of the model to economic observables. Statistically, the wavelet approach is inherently nonparametric while the wavelets themselves are better adapted to describing a complete market. ^ Principal components analysis provides information on the dimension of the yield curve process. While there is no clear demarkation between operative factors and noise, the top six principal components pick up 99% of total interest rate variation 95% of the time. An economically justified basis of this process is hard to find; for example a simple linear model will not suffice for the first principal component and the shape of this component is nonstationary. ^ Wavelet analysis works more directly with yield curve observations than principal components analysis. In fact the complete process from bond data to multiresolution is presented, including the dedicated Perl programs and the details of the portfolio metrics and specially adapted wavelet construction. The result is more robust statistics which provide balance to the more fragile principal components analysis. ^
Resumo:
The complexity of modern geochemical data sets is increasing in several aspects (number of available samples, number of elements measured, number of matrices analysed, geological-environmental variability covered, etc), hence it is becoming increasingly necessary to apply statistical methods to elucidate their structure. This paper presents an exploratory analysis of one such complex data set, the Tellus geochemical soil survey of Northern Ireland (NI). This exploratory analysis is based on one of the most fundamental exploratory tools, principal component analysis (PCA) and its graphical representation as a biplot, albeit in several variations: the set of elements included (only major oxides vs. all observed elements), the prior transformation applied to the data (none, a standardization or a logratio transformation) and the way the covariance matrix between components is estimated (classical estimation vs. robust estimation). Results show that a log-ratio PCA (robust or classical) of all available elements is the most powerful exploratory setting, providing the following insights: the first two processes controlling the whole geochemical variation in NI soils are peat coverage and a contrast between “mafic” and “felsic” background lithologies; peat covered areas are detected as outliers by a robust analysis, and can be then filtered out if required for further modelling; and peat coverage intensity can be quantified with the %Br in the subcomposition (Br, Rb, Ni).
Resumo:
A compositional multivariate approach is used to analyse regional scale soil geochemical data obtained as part of the Tellus Project generated by the Geological Survey Northern Ireland (GSNI). The multi-element total concentration data presented comprise XRF analyses of 6862 rural soil samples collected at 20cm depths on a non-aligned grid at one site per 2 km2. Censored data were imputed using published detection limits. Using these imputed values for 46 elements (including LOI), each soil sample site was assigned to the regional geology map provided by GSNI initially using the dominant lithology for the map polygon. Northern Ireland includes a diversity of geology representing a stratigraphic record from the Mesoproterozoic, up to and including the Palaeogene. However, the advance of ice sheets and their meltwaters over the last 100,000 years has left at least 80% of the bedrock covered by superficial deposits, including glacial till and post-glacial alluvium and peat. The question is to what extent the soil geochemistry reflects the underlying geology or superficial deposits. To address this, the geochemical data were transformed using centered log ratios (clr) to observe the requirements of compositional data analysis and avoid closure issues. Following this, compositional multivariate techniques including compositional Principal Component Analysis (PCA) and minimum/maximum autocorrelation factor (MAF) analysis method were used to determine the influence of underlying geology on the soil geochemistry signature. PCA showed that 72% of the variation was determined by the first four principal components (PC’s) implying “significant” structure in the data. Analysis of variance showed that only 10 PC’s were necessary to classify the soil geochemical data. To consider an improvement over PCA that uses the spatial relationships of the data, a classification based on MAF analysis was undertaken using the first 6 dominant factors. Understanding the relationship between soil geochemistry and superficial deposits is important for environmental monitoring of fragile ecosystems such as peat. To explore whether peat cover could be predicted from the classification, the lithology designation was adapted to include the presence of peat, based on GSNI superficial deposit polygons and linear discriminant analysis (LDA) undertaken. Prediction accuracy for LDA classification improved from 60.98% based on PCA using 10 principal components to 64.73% using MAF based on the 6 most dominant factors. The misclassification of peat may reflect degradation of peat covered areas since the creation of superficial deposit classification. Further work will examine the influence of underlying lithologies on elemental concentrations in peat composition and the effect of this in classification analysis.
Resumo:
LOPES-DOS-SANTOS, V. , CONDE-OCAZIONEZ, S. ; NICOLELIS, M. A. L. , RIBEIRO, S. T. , TORT, A. B. L. . Neuronal assembly detection and cell membership specification by principal component analysis. Plos One, v. 6, p. e20996, 2011.
Resumo:
LOPES-DOS-SANTOS, V. , CONDE-OCAZIONEZ, S. ; NICOLELIS, M. A. L. , RIBEIRO, S. T. , TORT, A. B. L. . Neuronal assembly detection and cell membership specification by principal component analysis. Plos One, v. 6, p. e20996, 2011.
Resumo:
Phenotypic variation in plants can be evaluated by morphological characterization using visual attributes. Fruits have been the major descriptors for identification of different varieties of fruit crops. However, even in their absence, farmers, breeders and interested stakeholders require to distinguish between different mango varieties. This study aimed at determining diversity in mango germplasm from the Upper Athi River (UAR) and providing useful alternative descriptors for the identification of different mango varieties in the absence of fruits. A total of 20 International Plant Genetic Resources Institute (IPGRI) descriptors for mango were selected for use in the visual assessment of 98 mango accessions from 15 sites of the UAR region of eastern Kenya. Purposive sampling was used to identify farmers growing diverse varieties of mangoes. Evaluation of the descriptors was performed on-site and the data collected were then subjected to multivariate analysis including Principal Component Analysis (PCA) and Cluster analysis, one- way analysis of variance (ANOVA) and Chi square tests. Results classified the accessions into two major groups corresponding to indigenous and exotic varieties. The PCA showed the first six principal components accounting for 75.12% of the total variance. A strong and highly significant correlation was observed between the color of fully grown leaves, leaf blade width, leaf blade length and petiole length and also between the leaf attitude, color of young leaf, stem circumference, tree height, leaf margin, growth habit and fragrance. Useful descriptors for morphological evaluation were 14 out of the selected 20; however, ANOVA and Chi square test revealed that diversity in the accessions was majorly as a result of variations in color of young leaves, leaf attitude, leaf texture, growth habit, leaf blade length, leaf blade width and petiole length traits. These results reveal that mango germplasm in the UAR has significant diversity and that other morphological traits apart from fruits can be useful in morphological characterization of mango.
Resumo:
A Flood Vulnerability Index (FloodVI) was developed using Principal Component Analysis (PCA) and a new aggregation method based on Cluster Analysis (CA). PCA simplifies a large number of variables into a few uncorrelated factors representing the social, economic, physical and environmental dimensions of vulnerability. CA groups areas that have the same characteristics in terms of vulnerability into vulnerability classes. The grouping of the areas determines their classification contrary to other aggregation methods in which the areas' classification determines their grouping. While other aggregation methods distribute the areas into classes, in an artificial manner, by imposing a certain probability for an area to belong to a certain class, as determined by the assumption that the aggregation measure used is normally distributed, CA does not constrain the distribution of the areas by the classes. FloodVI was designed at the neighbourhood level and was applied to the Portuguese municipality of Vila Nova de Gaia where several flood events have taken place in the recent past. The FloodVI sensitivity was assessed using three different aggregation methods: the sum of component scores, the first component score and the weighted sum of component scores. The results highlight the sensitivity of the FloodVI to different aggregation methods. Both sum of component scores and weighted sum of component scores have shown similar results. The first component score aggregation method classifies almost all areas as having medium vulnerability and finally the results obtained using the CA show a distinct differentiation of the vulnerability where hot spots can be clearly identified. The information provided by records of previous flood events corroborate the results obtained with CA, because the inundated areas with greater damages are those that are identified as high and very high vulnerability areas by CA. This supports the fact that CA provides a reliable FloodVI.