929 resultados para Principal component analysis discriminant analysis
Resumo:
In the current context of serious climate changes, where the increase of the frequency of some extreme events occurrence can enhance the rate of periods prone to high intensity forest fires, the National Forest Authority often implements, in several Portuguese forest areas, a regular set of measures in order to control the amount of fuel mass availability (PNDFCI, 2008). In the present work we’ll present a preliminary analysis concerning the assessment of the consequences given by the implementation of prescribed fire measures to control the amount of fuel mass in soil recovery, in particular in terms of its water retention capacity, its organic matter content, pH and content of iron. This work is included in a larger study (Meira-Castro, 2009(a); Meira-Castro, 2009(b)). According to the established praxis on the data collection, embodied in multidimensional matrices of n columns (variables in analysis) by p lines (sampled areas at different depths), and also considering the quantitative data nature present in this study, we’ve chosen a methodological approach that considers the multivariate statistical analysis, in particular, the Principal Component Analysis (PCA ) (Góis, 2004). The experiments were carried out in a soil cover over a natural site of Andaluzitic schist, in Gramelas, Caminha, NW Portugal, who was able to maintain itself intact from prescribed burnings from four years and was submit to prescribed fire in March 2008. The soils samples were collected from five different plots at six different time periods. The methodological option that was adopted have allowed us to identify the most relevant relational structures inside the n variables, the p samples and in two sets at the same time (Garcia-Pereira, 1990). Consequently, and in addition to the traditional outputs produced from the PCA, we have analyzed the influence of both sampling depths and geomorphological environments in the behavior of all variables involved.
Resumo:
The development of high spatial resolution airborne and spaceborne sensors has improved the capability of ground-based data collection in the fields of agriculture, geography, geology, mineral identification, detection [2, 3], and classification [4–8]. The signal read by the sensor from a given spatial element of resolution and at a given spectral band is a mixing of components originated by the constituent substances, termed endmembers, located at that element of resolution. This chapter addresses hyperspectral unmixing, which is the decomposition of the pixel spectra into a collection of constituent spectra, or spectral signatures, and their corresponding fractional abundances indicating the proportion of each endmember present in the pixel [9, 10]. Depending on the mixing scales at each pixel, the observed mixture is either linear or nonlinear [11, 12]. The linear mixing model holds when the mixing scale is macroscopic [13]. The nonlinear model holds when the mixing scale is microscopic (i.e., intimate mixtures) [14, 15]. The linear model assumes negligible interaction among distinct endmembers [16, 17]. The nonlinear model assumes that incident solar radiation is scattered by the scene through multiple bounces involving several endmembers [18]. Under the linear mixing model and assuming that the number of endmembers and their spectral signatures are known, hyperspectral unmixing is a linear problem, which can be addressed, for example, under the maximum likelihood setup [19], the constrained least-squares approach [20], the spectral signature matching [21], the spectral angle mapper [22], and the subspace projection methods [20, 23, 24]. Orthogonal subspace projection [23] reduces the data dimensionality, suppresses undesired spectral signatures, and detects the presence of a spectral signature of interest. The basic concept is to project each pixel onto a subspace that is orthogonal to the undesired signatures. As shown in Settle [19], the orthogonal subspace projection technique is equivalent to the maximum likelihood estimator. This projection technique was extended by three unconstrained least-squares approaches [24] (signature space orthogonal projection, oblique subspace projection, target signature space orthogonal projection). Other works using maximum a posteriori probability (MAP) framework [25] and projection pursuit [26, 27] have also been applied to hyperspectral data. In most cases the number of endmembers and their signatures are not known. Independent component analysis (ICA) is an unsupervised source separation process that has been applied with success to blind source separation, to feature extraction, and to unsupervised recognition [28, 29]. ICA consists in finding a linear decomposition of observed data yielding statistically independent components. Given that hyperspectral data are, in given circumstances, linear mixtures, ICA comes to mind as a possible tool to unmix this class of data. In fact, the application of ICA to hyperspectral data has been proposed in reference 30, where endmember signatures are treated as sources and the mixing matrix is composed by the abundance fractions, and in references 9, 25, and 31–38, where sources are the abundance fractions of each endmember. In the first approach, we face two problems: (1) The number of samples are limited to the number of channels and (2) the process of pixel selection, playing the role of mixed sources, is not straightforward. In the second approach, ICA is based on the assumption of mutually independent sources, which is not the case of hyperspectral data, since the sum of the abundance fractions is constant, implying dependence among abundances. This dependence compromises ICA applicability to hyperspectral images. In addition, hyperspectral data are immersed in noise, which degrades the ICA performance. IFA [39] was introduced as a method for recovering independent hidden sources from their observed noisy mixtures. IFA implements two steps. First, source densities and noise covariance are estimated from the observed data by maximum likelihood. Second, sources are reconstructed by an optimal nonlinear estimator. Although IFA is a well-suited technique to unmix independent sources under noisy observations, the dependence among abundance fractions in hyperspectral imagery compromises, as in the ICA case, the IFA performance. Considering the linear mixing model, hyperspectral observations are in a simplex whose vertices correspond to the endmembers. Several approaches [40–43] have exploited this geometric feature of hyperspectral mixtures [42]. Minimum volume transform (MVT) algorithm [43] determines the simplex of minimum volume containing the data. The MVT-type approaches are complex from the computational point of view. Usually, these algorithms first find the convex hull defined by the observed data and then fit a minimum volume simplex to it. Aiming at a lower computational complexity, some algorithms such as the vertex component analysis (VCA) [44], the pixel purity index (PPI) [42], and the N-FINDR [45] still find the minimum volume simplex containing the data cloud, but they assume the presence in the data of at least one pure pixel of each endmember. This is a strong requisite that may not hold in some data sets. In any case, these algorithms find the set of most pure pixels in the data. Hyperspectral sensors collects spatial images over many narrow contiguous bands, yielding large amounts of data. For this reason, very often, the processing of hyperspectral data, included unmixing, is preceded by a dimensionality reduction step to reduce computational complexity and to improve the signal-to-noise ratio (SNR). Principal component analysis (PCA) [46], maximum noise fraction (MNF) [47], and singular value decomposition (SVD) [48] are three well-known projection techniques widely used in remote sensing in general and in unmixing in particular. The newly introduced method [49] exploits the structure of hyperspectral mixtures, namely the fact that spectral vectors are nonnegative. The computational complexity associated with these techniques is an obstacle to real-time implementations. To overcome this problem, band selection [50] and non-statistical [51] algorithms have been introduced. This chapter addresses hyperspectral data source dependence and its impact on ICA and IFA performances. The study consider simulated and real data and is based on mutual information minimization. Hyperspectral observations are described by a generative model. This model takes into account the degradation mechanisms normally found in hyperspectral applications—namely, signature variability [52–54], abundance constraints, topography modulation, and system noise. The computation of mutual information is based on fitting mixtures of Gaussians (MOG) to data. The MOG parameters (number of components, means, covariances, and weights) are inferred using the minimum description length (MDL) based algorithm [55]. We study the behavior of the mutual information as a function of the unmixing matrix. The conclusion is that the unmixing matrix minimizing the mutual information might be very far from the true one. Nevertheless, some abundance fractions might be well separated, mainly in the presence of strong signature variability, a large number of endmembers, and high SNR. We end this chapter by sketching a new methodology to blindly unmix hyperspectral data, where abundance fractions are modeled as a mixture of Dirichlet sources. This model enforces positivity and constant sum sources (full additivity) constraints. The mixing matrix is inferred by an expectation-maximization (EM)-type algorithm. This approach is in the vein of references 39 and 56, replacing independent sources represented by MOG with mixture of Dirichlet sources. Compared with the geometric-based approaches, the advantage of this model is that there is no need to have pure pixels in the observations. The chapter is organized as follows. Section 6.2 presents a spectral radiance model and formulates the spectral unmixing as a linear problem accounting for abundance constraints, signature variability, topography modulation, and system noise. Section 6.3 presents a brief resume of ICA and IFA algorithms. Section 6.4 illustrates the performance of IFA and of some well-known ICA algorithms with experimental data. Section 6.5 studies the ICA and IFA limitations in unmixing hyperspectral data. Section 6.6 presents results of ICA based on real data. Section 6.7 describes the new blind unmixing scheme and some illustrative examples. Section 6.8 concludes with some remarks.
Resumo:
Diagnosis of several neurological disorders is based on the detection of typical pathological patterns in the electroencephalogram (EEG). This is a time-consuming task requiring significant training and experience. Automatic detection of these EEG patterns would greatly assist in quantitative analysis and interpretation. We present a method, which allows automatic detection of epileptiform events and discrimination of them from eye blinks, and is based on features derived using a novel application of independent component analysis. The algorithm was trained and cross validated using seven EEGs with epileptiform activity. For epileptiform events with compensation for eyeblinks, the sensitivity was 65 +/- 22% at a specificity of 86 +/- 7% (mean +/- SD). With feature extraction by PCA or classification of raw data, specificity reduced to 76 and 74%, respectively, for the same sensitivity. On exactly the same data, the commercially available software Reveal had a maximum sensitivity of 30% and concurrent specificity of 77%. Our algorithm performed well at detecting epileptiform events in this preliminary test and offers a flexible tool that is intended to be generalized to the simultaneous classification of many waveforms in the EEG.
Resumo:
Statistical shape analysis techniques commonly employed in the medical imaging community, such as active shape models or active appearance models, rely on principal component analysis (PCA) to decompose shape variability into a reduced set of interpretable components. In this paper we propose principal factor analysis (PFA) as an alternative and complementary tool to PCA providing a decomposition into modes of variation that can be more easily interpretable, while still being a linear efficient technique that performs dimensionality reduction (as opposed to independent component analysis, ICA). The key difference between PFA and PCA is that PFA models covariance between variables, rather than the total variance in the data. The added value of PFA is illustrated on 2D landmark data of corpora callosa outlines. Then, a study of the 3D shape variability of the human left femur is performed. Finally, we report results on vector-valued 3D deformation fields resulting from non-rigid registration of ventricles in MRI of the brain.
Resumo:
The main purpose of this article is to gain an insight into the relationships between variables describing the environmental conditions of the Far Northern section of the Great Barrier Reef, Australia, Several of the variables describing these conditions had different measurement levels and often they had non-linear relationships. Using non-linear principal component analysis, it was possible to acquire an insight into these relationships. Furthermore. three geographical areas with unique environmental characteristics could be identified. Copyright (c) 2005 John Wiley & Sons, Ltd.
Resumo:
Principal component analysis (PCA) is one of the most popular techniques for processing, compressing and visualising data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Previous attempts to formulate mixture models for PCA have therefore to some extent been ad hoc. In this paper, PCA is formulated within a maximum-likelihood framework, based on a specific form of Gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analysers, whose parameters can be determined using an EM algorithm. We discuss the advantages of this model in the context of clustering, density modelling and local dimensionality reduction, and we demonstrate its application to image compression and handwritten digit recognition.
Resumo:
The component structure of a 34-item scale measuring different aspects of job satisfaction was investigated among extension officers in North West Province, South Africa. A simple random sampling technique was used to select 40 extension officers from which data were collected. A structured questionnaire consisting of 34 job satisfaction and 10 personal characteristic items was administered to the extension officers. Items on job satisfaction were measured at interval level and analyzedwith Principal ComponentAnalysis. Most of the respondents (82.5%) weremales, between 40 to 45 years, 85% were married and 87.5% had a diploma as their educational qualification. Furthermore, 54% of the households size between 4 to 6 persons, whereas 75% were Christians. The majority of the extension officers lived in their job area (82.5), while 80% covered at least 3 communities and 3 farmer groups. In terms of number of farmers covered, only 40% of the extension officers covered more than 500 farmers and 45% travelled more than 40 km to reach their farmers. From the job satisfaction items 9 components were extracted to show areas for job satisfaction among extension officers. These were in-service training, research policies, communicating recommended practices, financial support for self and family, quality of technical help, opportunity to advance education, management and control of operations, rewarding system and sanctions. The results have several implications for motivating extension officers for high job performance especially with large number of clients and small number of extension agents.
Resumo:
Prices of U.S. Treasury securities vary over time and across maturities. When the market in Treasurys is sufficiently complete and frictionless, these prices may be modeled by a function time and maturity. A cross-section of this function for time held fixed is called the yield curve; the aggregate of these sections is the evolution of the yield curve. This dissertation studies aspects of this evolution. ^ There are two complementary approaches to the study of yield curve evolution here. The first is principal components analysis; the second is wavelet analysis. In both approaches both the time and maturity variables are discretized. In principal components analysis the vectors of yield curve shifts are viewed as observations of a multivariate normal distribution. The resulting covariance matrix is diagonalized; the resulting eigenvalues and eigenvectors (the principal components) are used to draw inferences about the yield curve evolution. ^ In wavelet analysis, the vectors of shifts are resolved into hierarchies of localized fundamental shifts (wavelets) that leave specified global properties invariant (average change and duration change). The hierarchies relate to the degree of localization with movements restricted to a single maturity at the base and general movements at the apex. Second generation wavelet techniques allow better adaptation of the model to economic observables. Statistically, the wavelet approach is inherently nonparametric while the wavelets themselves are better adapted to describing a complete market. ^ Principal components analysis provides information on the dimension of the yield curve process. While there is no clear demarkation between operative factors and noise, the top six principal components pick up 99% of total interest rate variation 95% of the time. An economically justified basis of this process is hard to find; for example a simple linear model will not suffice for the first principal component and the shape of this component is nonstationary. ^ Wavelet analysis works more directly with yield curve observations than principal components analysis. In fact the complete process from bond data to multiresolution is presented, including the dedicated Perl programs and the details of the portfolio metrics and specially adapted wavelet construction. The result is more robust statistics which provide balance to the more fragile principal components analysis. ^
Resumo:
Objectives: The aim of this work was to verify the differentiation between normal and pathological human carotid artery tissues by using fluorescence and reflectance spectroscopy in the 400- to 700-nm range and the spectral characterization by means of principal components analysis. Background Data: Atherosclerosis is the most common and serious pathology of the cardiovascular system. Principal components represent the main spectral characteristics that occur within the spectral data and could be used for tissue classification. Materials and Methods: Sixty postmortem carotid artery fragments (26 non-atherosclerotic and 34 atherosclerotic with non-calcified plaques) were studied. The excitation radiation consisted of a 488-nm argon laser. Two 600-mu m core optical fibers were used, one for excitation and one to collect the fluorescence radiation from the samples. The reflectance system was composed of a halogen lamp coupled to an excitation fiber positioned in one of the ports of an integrating sphere that delivered 5 mW to the sample. The photo-reflectance signal was coupled to a 1/4-m spectrograph via an optical fiber. Euclidean distance was then used to classify each principal component score into one of two classes, normal and atherosclerotic tissue, for both fluorescence and reflectance. Results: The principal components analysis allowed classification of the samples with 81% sensitivity and 88% specificity for fluorescence, and 81% sensitivity and 91% specificity for reflectance. Conclusions: Our results showed that principal components analysis could be applied to differentiate between normal and atherosclerotic tissue with high sensitivity and specificity.
Resumo:
Multi-factor approaches to analysis of real estate returns have, since the pioneering work of Chan, Hendershott and Sanders (1990), emphasised a macro-variables approach in preference to the latent factor approach that formed the original basis of the arbitrage pricing theory. With increasing use of high frequency data and trading strategies and with a growing emphasis on the risks of extreme events, the macro-variable procedure has some deficiencies. This paper explores a third way, with the use of an alternative to the standard principal components approach – independent components analysis (ICA). ICA seeks higher moment independence and maximises in relation to a chosen risk parameter. We apply an ICA based on kurtosis maximisation to weekly US REIT data using a kurtosis maximising algorithm. The results show that ICA is successful in capturing the kurtosis characteristics of REIT returns, offering possibilities for the development of risk management strategies that are sensitive to extreme events and tail distributions.
Resumo:
In order to differentiate and characterize Madeira wines according to main grape varieties, the volatile composition (higher alcohols, fatty acids, ethyl esters and carbonyl compounds) was determined for 36 monovarietal Madeira wine samples elaborated from Boal, Malvazia, Sercial and Verdelho white grape varieties. The study was carried out by headspace solid-phase microextraction technique (HS-SPME), in dynamic mode, coupled with gas chromatography–mass spectrometry (GC–MS). Corrected peak area data for 42 analytes from the above mentioned chemical groups was used for statistical purposes. Principal component analysis (PCA) was applied in order to determine the main sources of variability present in the data sets and to establish the relation between samples (objects) and volatile compounds (variables). The data obtained by GC–MS shows that the most important contributions to the differentiation of Boal wines are benzyl alcohol and (E)-hex-3-en-1-ol. Ethyl octadecanoate, (Z)-hex-3-en-1-ol and benzoic acid are the major contributions in Malvazia wines and 2-methylpropan-1-ol is associated to Sercial wines. Verdelho wines are most correlated with 5-(ethoxymethyl)-furfural, nonanone and cis-9-ethyldecenoate. A 96.4% of prediction ability was obtained by the application of stepwise linear discriminant analysis (SLDA) using the 19 variables that maximise the variance of the initial data set.
Resumo:
Concentrations of 39 organic compounds were determined in three fractions (head, heart and tail) obtained from the pot still distillation of fermented sugarcane juice. The results were evaluated using analysis of variance (ANOVA), Tukey's test, principal component analysis (PCA), hierarchical cluster analysis (HCA) and linear discriminant analysis (LDA). According to PCA and HCA, the experimental data lead to the formation of three clusters. The head fractions give rise to a more defined group. The heart and tail fractions showed some overlap consistent with its acid composition. The predictive ability of calibration and validation of the model generated by LDA for the three fractions classification were 90.5 and 100%, respectively. This model recognized as the heart twelve of the thirteen commercial cachacas (92.3%) with good sensory characteristics, thus showing potential for guiding the process of cuts.
Resumo:
Abstract Background Prostate cancer is a leading cause of death in the male population, therefore, a comprehensive study about the genes and the molecular networks involved in the tumoral prostate process becomes necessary. In order to understand the biological process behind potential biomarkers, we have analyzed a set of 57 cDNA microarrays containing ~25,000 genes. Results Principal Component Analysis (PCA) combined with the Maximum-entropy Linear Discriminant Analysis (MLDA) were applied in order to identify genes with the most discriminative information between normal and tumoral prostatic tissues. Data analysis was carried out using three different approaches, namely: (i) differences in gene expression levels between normal and tumoral conditions from an univariate point of view; (ii) in a multivariate fashion using MLDA; and (iii) with a dependence network approach. Our results show that malignant transformation in the prostatic tissue is more related to functional connectivity changes in their dependence networks than to differential gene expression. The MYLK, KLK2, KLK3, HAN11, LTF, CSRP1 and TGM4 genes presented significant changes in their functional connectivity between normal and tumoral conditions and were also classified as the top seven most informative genes for the prostate cancer genesis process by our discriminant analysis. Moreover, among the identified genes we found classically known biomarkers and genes which are closely related to tumoral prostate, such as KLK3 and KLK2 and several other potential ones. Conclusion We have demonstrated that changes in functional connectivity may be implicit in the biological process which renders some genes more informative to discriminate between normal and tumoral conditions. Using the proposed method, namely, MLDA, in order to analyze the multivariate characteristic of genes, it was possible to capture the changes in dependence networks which are related to cell transformation.
Resumo:
Dahl salt-sensitive (DS) and salt-resistant (DR) inbred rat strains represent a well established animal model for cardiovascular research. Upon prolonged administration of high-salt-containing diet, DS rats develop systemic hypertension, and as a consequence they develop left ventricular hypertrophy, followed by heart failure. The aim of this work was to explore whether this animal model is suitable to identify biomarkers that characterize defined stages of cardiac pathophysiological conditions. The work had to be performed in two stages: in the first part proteomic differences that are attributable to the two separate rat lines (DS and DR) had to be established, and in the second part the process of development of heart failure due to feeding the rats with high-salt-containing diet has to be monitored. This work describes the results of the first stage, with the outcome of protein expression profiles of left ventricular tissues of DS and DR rats kept under low salt diet. Substantial extent of quantitative and qualitative expression differences between both strains of Dahl rats in heart tissue was detected. Using Principal Component Analysis, Linear Discriminant Analysis and other statistical means we have established sets of differentially expressed proteins, candidates for further molecular analysis of the heart failure mechanisms.
Resumo:
Classical liquid-state high-resolution (HR) NMR spectroscopy has proved a powerful tool in the metabonomic analysis of liquid food samples like fruit juices. In this paper the application of (1)H high-resolution magic angle spinning (HR-MAS) NMR spectroscopy to apple tissue is presented probing its potential for metabonomic studies. The (1)H HR-MAS NMR spectra are discussed in terms of the chemical composition of apple tissue and compared to liquid-state NMR spectra of apple juice. Differences indicate that specific metabolic changes are induced by juice preparation. The feasibility of HR-MAS NMR-based multivariate analysis is demonstrated by a study distinguishing three different apple cultivars by principal component analysis (PCA). Preliminary results are shown from subsequent studies comparing three different cultivation methods by means of PCA and partial least squares discriminant analysis (PLS-DA) of the HR-MAS NMR data. The compounds responsible for discriminating organically grown apples are discussed. Finally, an outlook of our ongoing work is given including a longitudinal study on apples.