925 resultados para probabilistic principal component analysis (probabilistic PCA)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Rapid development in industry have contributed to more complex systems that are prone to failure. In applications where the presence of faults may lead to premature failure, fault detection and diagnostics tools are often implemented. The goal of this research is to improve the diagnostic ability of existing FDD methods. Kernel Principal Component Analysis has good fault detection capability, however it can only detect the fault and identify few variables that have contribution on occurrence of fault and thus not precise in diagnosing. Hence, KPCA was used to detect abnormal events and the most contributed variables were taken out for more analysis in diagnosis phase. The diagnosis phase was done in both qualitative and quantitative manner. In qualitative mode, a networked-base causality analysis method was developed to show the causal effect between the most contributing variables in occurrence of the fault. In order to have more quantitative diagnosis, a Bayesian network was constructed to analyze the problem in probabilistic perspective.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work outlines the theoretical advantages of multivariate methods in biomechanical data, validates the proposed methods and outlines new clinical findings relating to knee osteoarthritis that were made possible by this approach. New techniques were based on existing multivariate approaches, Partial Least Squares (PLS) and Non-negative Matrix Factorization (NMF) and validated using existing data sets. The new techniques developed, PCA-PLS-LDA (Principal Component Analysis – Partial Least Squares – Linear Discriminant Analysis), PCA-PLS-MLR (Principal Component Analysis – Partial Least Squares –Multiple Linear Regression) and Waveform Similarity (based on NMF) were developed to address the challenging characteristics of biomechanical data, variability and correlation. As a result, these new structure-seeking technique revealed new clinical findings. The first new clinical finding relates to the relationship between pain, radiographic severity and mechanics. Simultaneous analysis of pain and radiographic severity outcomes, a first in biomechanics, revealed that the knee adduction moment’s relationship to radiographic features is mediated by pain in subjects with moderate osteoarthritis. The second clinical finding was quantifying the importance of neuromuscular patterns in brace effectiveness for patients with knee osteoarthritis. I found that brace effectiveness was more related to the patient’s unbraced neuromuscular patterns than it was to mechanics, and that these neuromuscular patterns were more complicated than simply increased overall muscle activity, as previously thought.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A compositional multivariate approach is used to analyse regional scale soil geochemical data obtained as part of the Tellus Project generated by the Geological Survey Northern Ireland (GSNI). The multi-element total concentration data presented comprise XRF analyses of 6862 rural soil samples collected at 20cm depths on a non-aligned grid at one site per 2 km2. Censored data were imputed using published detection limits. Using these imputed values for 46 elements (including LOI), each soil sample site was assigned to the regional geology map provided by GSNI initially using the dominant lithology for the map polygon. Northern Ireland includes a diversity of geology representing a stratigraphic record from the Mesoproterozoic, up to and including the Palaeogene. However, the advance of ice sheets and their meltwaters over the last 100,000 years has left at least 80% of the bedrock covered by superficial deposits, including glacial till and post-glacial alluvium and peat. The question is to what extent the soil geochemistry reflects the underlying geology or superficial deposits. To address this, the geochemical data were transformed using centered log ratios (clr) to observe the requirements of compositional data analysis and avoid closure issues. Following this, compositional multivariate techniques including compositional Principal Component Analysis (PCA) and minimum/maximum autocorrelation factor (MAF) analysis method were used to determine the influence of underlying geology on the soil geochemistry signature. PCA showed that 72% of the variation was determined by the first four principal components (PC’s) implying “significant” structure in the data. Analysis of variance showed that only 10 PC’s were necessary to classify the soil geochemical data. To consider an improvement over PCA that uses the spatial relationships of the data, a classification based on MAF analysis was undertaken using the first 6 dominant factors. Understanding the relationship between soil geochemistry and superficial deposits is important for environmental monitoring of fragile ecosystems such as peat. To explore whether peat cover could be predicted from the classification, the lithology designation was adapted to include the presence of peat, based on GSNI superficial deposit polygons and linear discriminant analysis (LDA) undertaken. Prediction accuracy for LDA classification improved from 60.98% based on PCA using 10 principal components to 64.73% using MAF based on the 6 most dominant factors. The misclassification of peat may reflect degradation of peat covered areas since the creation of superficial deposit classification. Further work will examine the influence of underlying lithologies on elemental concentrations in peat composition and the effect of this in classification analysis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Microsecond long Molecular Dynamics (MD) trajectories of biomolecular processes are now possible due to advances in computer technology. Soon, trajectories long enough to probe dynamics over many milliseconds will become available. Since these timescales match the physiological timescales over which many small proteins fold, all atom MD simulations of protein folding are now becoming popular. To distill features of such large folding trajectories, we must develop methods that can both compress trajectory data to enable visualization, and that can yield themselves to further analysis, such as the finding of collective coordinates and reduction of the dynamics. Conventionally, clustering has been the most popular MD trajectory analysis technique, followed by principal component analysis (PCA). Simple clustering used in MD trajectory analysis suffers from various serious drawbacks, namely, (i) it is not data driven, (ii) it is unstable to noise and change in cutoff parameters, and (iii) since it does not take into account interrelationships amongst data points, the separation of data into clusters can often be artificial. Usually, partitions generated by clustering techniques are validated visually, but such validation is not possible for MD trajectories of protein folding, as the underlying structural transitions are not well understood. Rigorous cluster validation techniques may be adapted, but it is more crucial to reduce the dimensions in which MD trajectories reside, while still preserving their salient features. PCA has often been used for dimension reduction and while it is computationally inexpensive, being a linear method, it does not achieve good data compression. In this thesis, I propose a different method, a nonmetric multidimensional scaling (nMDS) technique, which achieves superior data compression by virtue of being nonlinear, and also provides a clear insight into the structural processes underlying MD trajectories. I illustrate the capabilities of nMDS by analyzing three complete villin headpiece folding and six norleucine mutant (NLE) folding trajectories simulated by Freddolino and Schulten [1]. Using these trajectories, I make comparisons between nMDS, PCA and clustering to demonstrate the superiority of nMDS. The three villin headpiece trajectories showed great structural heterogeneity. Apart from a few trivial features like early formation of secondary structure, no commonalities between trajectories were found. There were no units of residues or atoms found moving in concert across the trajectories. A flipping transition, corresponding to the flipping of helix 1 relative to the plane formed by helices 2 and 3 was observed towards the end of the folding process in all trajectories, when nearly all native contacts had been formed. However, the transition occurred through a different series of steps in all trajectories, indicating that it may not be a common transition in villin folding. The trajectories showed competition between local structure formation/hydrophobic collapse and global structure formation in all trajectories. Our analysis on the NLE trajectories confirms the notion that a tight hydrophobic core inhibits correct 3-D rearrangement. Only one of the six NLE trajectories folded, and it showed no flipping transition. All the other trajectories get trapped in hydrophobically collapsed states. The NLE residues were found to be buried deeply into the core, compared to the corresponding lysines in the villin headpiece, thereby making the core tighter and harder to undo for 3-D rearrangement. Our results suggest that the NLE may not be a fast folder as experiments suggest. The tightness of the hydrophobic core may be a very important factor in the folding of larger proteins. It is likely that chaperones like GroEL act to undo the tight hydrophobic core of proteins, after most secondary structure elements have been formed, so that global rearrangement is easier. I conclude by presenting facts about chaperone-protein complexes and propose further directions for the study of protein folding.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Phenotypic variation in plants can be evaluated by morphological characterization using visual attributes. Fruits have been the major descriptors for identification of different varieties of fruit crops. However, even in their absence, farmers, breeders and interested stakeholders require to distinguish between different mango varieties. This study aimed at determining diversity in mango germplasm from the Upper Athi River (UAR) and providing useful alternative descriptors for the identification of different mango varieties in the absence of fruits. A total of 20 International Plant Genetic Resources Institute (IPGRI) descriptors for mango were selected for use in the visual assessment of 98 mango accessions from 15 sites of the UAR region of eastern Kenya. Purposive sampling was used to identify farmers growing diverse varieties of mangoes. Evaluation of the descriptors was performed on-site and the data collected were then subjected to multivariate analysis including Principal Component Analysis (PCA) and Cluster analysis, one- way analysis of variance (ANOVA) and Chi square tests. Results classified the accessions into two major groups corresponding to indigenous and exotic varieties. The PCA showed the first six principal components accounting for 75.12% of the total variance. A strong and highly significant correlation was observed between the color of fully grown leaves, leaf blade width, leaf blade length and petiole length and also between the leaf attitude, color of young leaf, stem circumference, tree height, leaf margin, growth habit and fragrance. Useful descriptors for morphological evaluation were 14 out of the selected 20; however, ANOVA and Chi square test revealed that diversity in the accessions was majorly as a result of variations in color of young leaves, leaf attitude, leaf texture, growth habit, leaf blade length, leaf blade width and petiole length traits. These results reveal that mango germplasm in the UAR has significant diversity and that other morphological traits apart from fruits can be useful in morphological characterization of mango.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The current approach to data analysis for the Laser Interferometry Space Antenna (LISA) depends on the time delay interferometry observables (TDI) which have to be generated before any weak signal detection can be performed. These are linear combinations of the raw data with appropriate time shifts that lead to the cancellation of the laser frequency noises. This is possible because of the multiple occurrences of the same noises in the different raw data. Originally, these observables were manually generated starting with LISA as a simple stationary array and then adjusted to incorporate the antenna's motions. However, none of the observables survived the flexing of the arms in that they did not lead to cancellation with the same structure. The principal component approach is another way of handling these noises that was presented by Romano and Woan which simplified the data analysis by removing the need to create them before the analysis. This method also depends on the multiple occurrences of the same noises but, instead of using them for cancellation, it takes advantage of the correlations that they produce between the different readings. These correlations can be expressed in a noise (data) covariance matrix which occurs in the Bayesian likelihood function when the noises are assumed be Gaussian. Romano and Woan showed that performing an eigendecomposition of this matrix produced two distinct sets of eigenvalues that can be distinguished by the absence of laser frequency noise from one set. The transformation of the raw data using the corresponding eigenvectors also produced data that was free from the laser frequency noises. This result led to the idea that the principal components may actually be time delay interferometry observables since they produced the same outcome, that is, data that are free from laser frequency noise. The aims here were (i) to investigate the connection between the principal components and these observables, (ii) to prove that the data analysis using them is equivalent to that using the traditional observables and (ii) to determine how this method adapts to real LISA especially the flexing of the antenna. For testing the connection between the principal components and the TDI observables a 10x 10 covariance matrix containing integer values was used in order to obtain an algebraic solution for the eigendecomposition. The matrix was generated using fixed unequal arm lengths and stationary noises with equal variances for each noise type. Results confirm that all four Sagnac observables can be generated from the eigenvectors of the principal components. The observables obtained from this method however, are tied to the length of the data and are not general expressions like the traditional observables, for example, the Sagnac observables for two different time stamps were generated from different sets of eigenvectors. It was also possible to generate the frequency domain optimal AET observables from the principal components obtained from the power spectral density matrix. These results indicate that this method is another way of producing the observables therefore analysis using principal components should give the same results as that using the traditional observables. This was proven by fact that the same relative likelihoods (within 0.3%) were obtained from the Bayesian estimates of the signal amplitude of a simple sinusoidal gravitational wave using the principal components and the optimal AET observables. This method fails if the eigenvalues that are free from laser frequency noises are not generated. These are obtained from the covariance matrix and the properties of LISA that are required for its computation are the phase-locking, arm lengths and noise variances. Preliminary results of the effects of these properties on the principal components indicate that only the absence of phase-locking prevented their production. The flexing of the antenna results in time varying arm lengths which will appear in the covariance matrix and, from our toy model investigations, this did not prevent the occurrence of the principal components. The difficulty with flexing, and also non-stationary noises, is that the Toeplitz structure of the matrix will be destroyed which will affect any computation methods that take advantage of this structure. In terms of separating the two sets of data for the analysis, this was not necessary because the laser frequency noises are very large compared to the photodetector noises which resulted in a significant reduction in the data containing them after the matrix inversion. In the frequency domain the power spectral density matrices were block diagonals which simplified the computation of the eigenvalues by allowing them to be done separately for each block. The results in general showed a lack of principal components in the absence of phase-locking except for the zero bin. The major difference with the power spectral density matrix is that the time varying arm lengths and non-stationarity do not show up because of the summation in the Fourier transform.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A Flood Vulnerability Index (FloodVI) was developed using Principal Component Analysis (PCA) and a new aggregation method based on Cluster Analysis (CA). PCA simplifies a large number of variables into a few uncorrelated factors representing the social, economic, physical and environmental dimensions of vulnerability. CA groups areas that have the same characteristics in terms of vulnerability into vulnerability classes. The grouping of the areas determines their classification contrary to other aggregation methods in which the areas' classification determines their grouping. While other aggregation methods distribute the areas into classes, in an artificial manner, by imposing a certain probability for an area to belong to a certain class, as determined by the assumption that the aggregation measure used is normally distributed, CA does not constrain the distribution of the areas by the classes. FloodVI was designed at the neighbourhood level and was applied to the Portuguese municipality of Vila Nova de Gaia where several flood events have taken place in the recent past. The FloodVI sensitivity was assessed using three different aggregation methods: the sum of component scores, the first component score and the weighted sum of component scores. The results highlight the sensitivity of the FloodVI to different aggregation methods. Both sum of component scores and weighted sum of component scores have shown similar results. The first component score aggregation method classifies almost all areas as having medium vulnerability and finally the results obtained using the CA show a distinct differentiation of the vulnerability where hot spots can be clearly identified. The information provided by records of previous flood events corroborate the results obtained with CA, because the inundated areas with greater damages are those that are identified as high and very high vulnerability areas by CA. This supports the fact that CA provides a reliable FloodVI.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Flavanones (hesperidin, naringenin, naringin, and poncirin) in industrial, hand-squeezed orange juices and from fresh-in-squeeze machines orange juices were determined by HPLC/DAD analysis using a previously described liquid-liquid extraction method. Method validation including the accuracy was performed by using recovery tests. Samples (36) collected from different Brazilian locations and brands were analyzed. Concentrations were determined using an external standard curve. The limits of detection (LOD) and the limits of quantification (LOQ) calculated were 0.0037, 1.87, 0.0147, and 0.0066 mg 100 g(-1) and 0.0089, 7.84, 0.0302, and 0.0200 mg 100 g(-1) for naringin, hesperidin, poncirin, and naringenin, respectively. The results demonstrated that hesperidin was present at the highest concentration levels, especially in the industrial orange juices. Its average content and concentration range were 69.85 and 18.80-139.00 mg 100 g(-1). The other flavanones showed the lowest concentration levels. The average contents and concentration ranges found were 0.019, 0.01-0.30, and 0.12 and 0.1-0.17, 0.13, and 0.01-0.36 mg 100 g(-1), respectively. The results were also evaluated using the principal component analysis (PCA) multivariate analysis technique which showed that poncirin, naringenin, and naringin were the principal elements that contributed to the variability in the sample concentrations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The aim of this study was to develop a methodology using Raman hyperspectral imaging and chemometric methods for identification of pre- and post-blast explosive residues on banknote surfaces. The explosives studied were of military, commercial and propellant uses. After the acquisition of the hyperspectral imaging, independent component analysis (ICA) was applied to extract the pure spectra and the distribution of the corresponding image constituents. The performance of the methodology was evaluated by the explained variance and the lack of fit of the models, by comparing the ICA recovered spectra with the reference spectra using correlation coefficients and by the presence of rotational ambiguity in the ICA solutions. The methodology was applied to forensic samples to solve an automated teller machine explosion case. Independent component analysis proved to be a suitable method of resolving curves, achieving equivalent performance with the multivariate curve resolution with alternating least squares (MCR-ALS) method. At low concentrations, MCR-ALS presents some limitations, as it did not provide the correct solution. The detection limit of the methodology presented in this study was 50μgcm(-2).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This manuscript aims to show the basic concepts and practical application of Principal Component Analysis (PCA) as a tutorial, using Matlab or Octave computing environment for beginners, undergraduate and graduate students. As a practical example it is shown the exploratory analysis of edible vegetable oils by mid infrared spectroscopy.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Descriptive terminology and sensory profile of three varieties of brazilian varietal white wines (cultivars Riesling, Gewürztraminer and Chardonnay) were developed by a methodology based on the Quantitative Descriptive Analysis (QDA). The sensory panel consensually defined the sensory descriptors, their respective reference materials and the descriptive evaluation ballot. Ten individuals were selected as judges based on their discrimination, reproducibility and individual consensus with the sensory panel. Twelve descriptors were generated showing similarities and differences among the wine samples. Each descriptor was evaluated using a nine-centimeters non-structured scale with the intensity terms anchored at its ends. The collected data were analysed by ANOVA, Tukey test and Principal Component Analysis (PCA). The results showed a great difference within the sensory profile of Riesling and Gewürztraminer wines, whereas Chardonnay wines showed a lesser variation. PCA separated samples into two groups: a first group formed by wines higher in sweetness and fruitty flavor and aroma; and a second group of wines higher in sourness, adstringency, bitterness, alcoholic and fermented flavors.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A modified version of the intruder-resident paradigm was used to investigate if social recognition memory lasts at least 24 h. One hundred and forty-six adult male Wistar rats were used. Independent groups of rats were exposed to an intruder for 0.083, 0.5, 2, 24, or 168 h and tested 24 h after the first encounter with the familiar or a different conspecific. Factor analysis was employed to identify associations between behaviors and treatments. Resident rats exhibited a 24-h social recognition memory, as indicated by a 3- to 5-fold decrease in social behaviors in the second encounter with the same conspecific compared to those observed for a different conspecific, when the duration of the first encounter was 2 h or longer. It was possible to distinguish between two different categories of social behaviors and their expression depended on the duration of the first encounter. Sniffing the anogenital area (49.9% of the social behaviors), sniffing the body (17.9%), sniffing the head (3%), and following the conspecific (3.1%), exhibited mostly by resident rats, characterized social investigation and revealed long-term social recognition memory. However, dominance (23.8%) and mild aggression (2.3%), exhibited by both resident and intruders, characterized social agonistic behaviors and were not affected by memory. Differently, sniffing the environment (76.8% of the non-social behaviors) and rearing (14.3%), both exhibited mostly by adult intruder rats, characterized non-social behaviors. Together, these results show that social recognition memory in rats may last at least 24 h after a 2-h or longer exposure to the conspecific.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Medium density fiberboard (MDF) is an engineered wood product formed by breaking down selected lignin-cellulosic material residuals into fibers, combining it with wax and a resin binder, and then forming panels by applying high temperature and pressure. Because the raw material in the industrial process is ever-changing, the panel industry requires methods for monitoring the composition of their products. The aim of this study was to estimate the ratio of sugarcane (SC) bagasse to Eucalyptus wood in MDF panels using near infrared (NIR) spectroscopy. Principal component analysis (PCA) and partial least square (PLS) regressions were performed. MDF panels having different bagasse contents were easily distinguished from each other by the PCA of their NIR spectra with clearly different patterns of response. The PLS-R models for SC content of these MDF samples presented a strong coefficient of determination (0.96) between the NIR-predicted and Lab-determined values and a low standard error of prediction (similar to 1.5%) in the cross-validations. A key role of resins (adhesives), cellulose, and lignin for such PLS-R calibrations was shown. PLS-DA model correctly classified ninety-four percent of MDF samples by cross-validations and ninety-eight percent of the panels by independent test set. These NIR-based models can be useful to quickly estimate sugarcane bagasse vs. Eucalyptus wood content ratio in unknown MDF samples and to verify the quality of these engineered wood products in an online process.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Natural products have widespread biological activities, including inhibition of mitochondrial enzyme systems. Some of these activities, for example cytotoxicity, may be the result of alteration of cellular bioenergetics. Based on previous computer-aided drug design (CADD) studies and considering reported data on structure-activity relationships (SAR), an assumption regarding the mechanism of action of natural products against parasitic infections involves the NADH-oxidase inhibition. In this study, chemometric tools, such as: Principal Component Analysis (PCA), Consensus PCA (CPCA), and partial least squares regression (PLS), were applied to a set of forty natural compounds, acting as NADH-oxidase inhibitors. The calculations were performed using the VolSurf+ program. The formalisms employed generated good exploratory and predictive results. The independent variables or descriptors having a hydrophobic profile were strongly correlated to the biological data.