912 resultados para forward selection component analysis
Resumo:
Vigna unguiculata (L.) Walp (cowpea) is a food crop with high nutritional value that is cultivated throughout tropical and subtropical regions of the world. The main constraint on high productivity of cowpea is water deficit, caused by the long periods of drought that occur in these regions. The aim of the present study was to select elite cowpea genotypes with enhanced drought tolerance, by applying principal component analysis to 219 first-cycle progenies obtained in a recurrent selection program. The experimental design comprised a simple 15 x 15 lattice with 450 plots, each of two rows of 10 plants. Plants were grown under water-deficit conditions by applying a water depth of 205 mm representing one-half of that required by cowpea. Variables assessed were flowering, maturation, pod length, number and mass of beans/pod, mass of 100 beans, and productivity/plot. Ten elite cowpea genotypes were selected, in which principal components 1 and 2 encompassed variables related to yield (pod length, beans/pod, and productivity/plot) and life precocity (flowering and maturation), respectively.
Resumo:
The development of high spatial resolution airborne and spaceborne sensors has improved the capability of ground-based data collection in the fields of agriculture, geography, geology, mineral identification, detection [2, 3], and classification [4–8]. The signal read by the sensor from a given spatial element of resolution and at a given spectral band is a mixing of components originated by the constituent substances, termed endmembers, located at that element of resolution. This chapter addresses hyperspectral unmixing, which is the decomposition of the pixel spectra into a collection of constituent spectra, or spectral signatures, and their corresponding fractional abundances indicating the proportion of each endmember present in the pixel [9, 10]. Depending on the mixing scales at each pixel, the observed mixture is either linear or nonlinear [11, 12]. The linear mixing model holds when the mixing scale is macroscopic [13]. The nonlinear model holds when the mixing scale is microscopic (i.e., intimate mixtures) [14, 15]. The linear model assumes negligible interaction among distinct endmembers [16, 17]. The nonlinear model assumes that incident solar radiation is scattered by the scene through multiple bounces involving several endmembers [18]. Under the linear mixing model and assuming that the number of endmembers and their spectral signatures are known, hyperspectral unmixing is a linear problem, which can be addressed, for example, under the maximum likelihood setup [19], the constrained least-squares approach [20], the spectral signature matching [21], the spectral angle mapper [22], and the subspace projection methods [20, 23, 24]. Orthogonal subspace projection [23] reduces the data dimensionality, suppresses undesired spectral signatures, and detects the presence of a spectral signature of interest. The basic concept is to project each pixel onto a subspace that is orthogonal to the undesired signatures. As shown in Settle [19], the orthogonal subspace projection technique is equivalent to the maximum likelihood estimator. This projection technique was extended by three unconstrained least-squares approaches [24] (signature space orthogonal projection, oblique subspace projection, target signature space orthogonal projection). Other works using maximum a posteriori probability (MAP) framework [25] and projection pursuit [26, 27] have also been applied to hyperspectral data. In most cases the number of endmembers and their signatures are not known. Independent component analysis (ICA) is an unsupervised source separation process that has been applied with success to blind source separation, to feature extraction, and to unsupervised recognition [28, 29]. ICA consists in finding a linear decomposition of observed data yielding statistically independent components. Given that hyperspectral data are, in given circumstances, linear mixtures, ICA comes to mind as a possible tool to unmix this class of data. In fact, the application of ICA to hyperspectral data has been proposed in reference 30, where endmember signatures are treated as sources and the mixing matrix is composed by the abundance fractions, and in references 9, 25, and 31–38, where sources are the abundance fractions of each endmember. In the first approach, we face two problems: (1) The number of samples are limited to the number of channels and (2) the process of pixel selection, playing the role of mixed sources, is not straightforward. In the second approach, ICA is based on the assumption of mutually independent sources, which is not the case of hyperspectral data, since the sum of the abundance fractions is constant, implying dependence among abundances. This dependence compromises ICA applicability to hyperspectral images. In addition, hyperspectral data are immersed in noise, which degrades the ICA performance. IFA [39] was introduced as a method for recovering independent hidden sources from their observed noisy mixtures. IFA implements two steps. First, source densities and noise covariance are estimated from the observed data by maximum likelihood. Second, sources are reconstructed by an optimal nonlinear estimator. Although IFA is a well-suited technique to unmix independent sources under noisy observations, the dependence among abundance fractions in hyperspectral imagery compromises, as in the ICA case, the IFA performance. Considering the linear mixing model, hyperspectral observations are in a simplex whose vertices correspond to the endmembers. Several approaches [40–43] have exploited this geometric feature of hyperspectral mixtures [42]. Minimum volume transform (MVT) algorithm [43] determines the simplex of minimum volume containing the data. The MVT-type approaches are complex from the computational point of view. Usually, these algorithms first find the convex hull defined by the observed data and then fit a minimum volume simplex to it. Aiming at a lower computational complexity, some algorithms such as the vertex component analysis (VCA) [44], the pixel purity index (PPI) [42], and the N-FINDR [45] still find the minimum volume simplex containing the data cloud, but they assume the presence in the data of at least one pure pixel of each endmember. This is a strong requisite that may not hold in some data sets. In any case, these algorithms find the set of most pure pixels in the data. Hyperspectral sensors collects spatial images over many narrow contiguous bands, yielding large amounts of data. For this reason, very often, the processing of hyperspectral data, included unmixing, is preceded by a dimensionality reduction step to reduce computational complexity and to improve the signal-to-noise ratio (SNR). Principal component analysis (PCA) [46], maximum noise fraction (MNF) [47], and singular value decomposition (SVD) [48] are three well-known projection techniques widely used in remote sensing in general and in unmixing in particular. The newly introduced method [49] exploits the structure of hyperspectral mixtures, namely the fact that spectral vectors are nonnegative. The computational complexity associated with these techniques is an obstacle to real-time implementations. To overcome this problem, band selection [50] and non-statistical [51] algorithms have been introduced. This chapter addresses hyperspectral data source dependence and its impact on ICA and IFA performances. The study consider simulated and real data and is based on mutual information minimization. Hyperspectral observations are described by a generative model. This model takes into account the degradation mechanisms normally found in hyperspectral applications—namely, signature variability [52–54], abundance constraints, topography modulation, and system noise. The computation of mutual information is based on fitting mixtures of Gaussians (MOG) to data. The MOG parameters (number of components, means, covariances, and weights) are inferred using the minimum description length (MDL) based algorithm [55]. We study the behavior of the mutual information as a function of the unmixing matrix. The conclusion is that the unmixing matrix minimizing the mutual information might be very far from the true one. Nevertheless, some abundance fractions might be well separated, mainly in the presence of strong signature variability, a large number of endmembers, and high SNR. We end this chapter by sketching a new methodology to blindly unmix hyperspectral data, where abundance fractions are modeled as a mixture of Dirichlet sources. This model enforces positivity and constant sum sources (full additivity) constraints. The mixing matrix is inferred by an expectation-maximization (EM)-type algorithm. This approach is in the vein of references 39 and 56, replacing independent sources represented by MOG with mixture of Dirichlet sources. Compared with the geometric-based approaches, the advantage of this model is that there is no need to have pure pixels in the observations. The chapter is organized as follows. Section 6.2 presents a spectral radiance model and formulates the spectral unmixing as a linear problem accounting for abundance constraints, signature variability, topography modulation, and system noise. Section 6.3 presents a brief resume of ICA and IFA algorithms. Section 6.4 illustrates the performance of IFA and of some well-known ICA algorithms with experimental data. Section 6.5 studies the ICA and IFA limitations in unmixing hyperspectral data. Section 6.6 presents results of ICA based on real data. Section 6.7 describes the new blind unmixing scheme and some illustrative examples. Section 6.8 concludes with some remarks.
Resumo:
Hyperspectral remote sensing exploits the electromagnetic scattering patterns of the different materials at specific wavelengths [2, 3]. Hyperspectral sensors have been developed to sample the scattered portion of the electromagnetic spectrum extending from the visible region through the near-infrared and mid-infrared, in hundreds of narrow contiguous bands [4, 5]. The number and variety of potential civilian and military applications of hyperspectral remote sensing is enormous [6, 7]. Very often, the resolution cell corresponding to a single pixel in an image contains several substances (endmembers) [4]. In this situation, the scattered energy is a mixing of the endmember spectra. A challenging task underlying many hyperspectral imagery applications is then decomposing a mixed pixel into a collection of reflectance spectra, called endmember signatures, and the corresponding abundance fractions [8–10]. Depending on the mixing scales at each pixel, the observed mixture is either linear or nonlinear [11, 12]. Linear mixing model holds approximately when the mixing scale is macroscopic [13] and there is negligible interaction among distinct endmembers [3, 14]. If, however, the mixing scale is microscopic (or intimate mixtures) [15, 16] and the incident solar radiation is scattered by the scene through multiple bounces involving several endmembers [17], the linear model is no longer accurate. Linear spectral unmixing has been intensively researched in the last years [9, 10, 12, 18–21]. It considers that a mixed pixel is a linear combination of endmember signatures weighted by the correspondent abundance fractions. Under this model, and assuming that the number of substances and their reflectance spectra are known, hyperspectral unmixing is a linear problem for which many solutions have been proposed (e.g., maximum likelihood estimation [8], spectral signature matching [22], spectral angle mapper [23], subspace projection methods [24,25], and constrained least squares [26]). In most cases, the number of substances and their reflectances are not known and, then, hyperspectral unmixing falls into the class of blind source separation problems [27]. Independent component analysis (ICA) has recently been proposed as a tool to blindly unmix hyperspectral data [28–31]. ICA is based on the assumption of mutually independent sources (abundance fractions), which is not the case of hyperspectral data, since the sum of abundance fractions is constant, implying statistical dependence among them. This dependence compromises ICA applicability to hyperspectral images as shown in Refs. [21, 32]. In fact, ICA finds the endmember signatures by multiplying the spectral vectors with an unmixing matrix, which minimizes the mutual information among sources. If sources are independent, ICA provides the correct unmixing, since the minimum of the mutual information is obtained only when sources are independent. This is no longer true for dependent abundance fractions. Nevertheless, some endmembers may be approximately unmixed. These aspects are addressed in Ref. [33]. Under the linear mixing model, the observations from a scene are in a simplex whose vertices correspond to the endmembers. Several approaches [34–36] have exploited this geometric feature of hyperspectral mixtures [35]. Minimum volume transform (MVT) algorithm [36] determines the simplex of minimum volume containing the data. The method presented in Ref. [37] is also of MVT type but, by introducing the notion of bundles, it takes into account the endmember variability usually present in hyperspectral mixtures. The MVT type approaches are complex from the computational point of view. Usually, these algorithms find in the first place the convex hull defined by the observed data and then fit a minimum volume simplex to it. For example, the gift wrapping algorithm [38] computes the convex hull of n data points in a d-dimensional space with a computational complexity of O(nbd=2cþ1), where bxc is the highest integer lower or equal than x and n is the number of samples. The complexity of the method presented in Ref. [37] is even higher, since the temperature of the simulated annealing algorithm used shall follow a log( ) law [39] to assure convergence (in probability) to the desired solution. Aiming at a lower computational complexity, some algorithms such as the pixel purity index (PPI) [35] and the N-FINDR [40] still find the minimum volume simplex containing the data cloud, but they assume the presence of at least one pure pixel of each endmember in the data. This is a strong requisite that may not hold in some data sets. In any case, these algorithms find the set of most pure pixels in the data. PPI algorithm uses the minimum noise fraction (MNF) [41] as a preprocessing step to reduce dimensionality and to improve the signal-to-noise ratio (SNR). The algorithm then projects every spectral vector onto skewers (large number of random vectors) [35, 42,43]. The points corresponding to extremes, for each skewer direction, are stored. A cumulative account records the number of times each pixel (i.e., a given spectral vector) is found to be an extreme. The pixels with the highest scores are the purest ones. N-FINDR algorithm [40] is based on the fact that in p spectral dimensions, the p-volume defined by a simplex formed by the purest pixels is larger than any other volume defined by any other combination of pixels. This algorithm finds the set of pixels defining the largest volume by inflating a simplex inside the data. ORA SIS [44, 45] is a hyperspectral framework developed by the U.S. Naval Research Laboratory consisting of several algorithms organized in six modules: exemplar selector, adaptative learner, demixer, knowledge base or spectral library, and spatial postrocessor. The first step consists in flat-fielding the spectra. Next, the exemplar selection module is used to select spectral vectors that best represent the smaller convex cone containing the data. The other pixels are rejected when the spectral angle distance (SAD) is less than a given thresh old. The procedure finds the basis for a subspace of a lower dimension using a modified Gram–Schmidt orthogonalizati on. The selected vectors are then projected onto this subspace and a simplex is found by an MV T pro cess. ORA SIS is oriented to real-time target detection from uncrewed air vehicles using hyperspectral data [46]. In this chapter we develop a new algorithm to unmix linear mixtures of endmember spectra. First, the algorithm determines the number of endmembers and the signal subspace using a newly developed concept [47, 48]. Second, the algorithm extracts the most pure pixels present in the data. Unlike other methods, this algorithm is completely automatic and unsupervised. To estimate the number of endmembers and the signal subspace in hyperspectral linear mixtures, the proposed scheme begins by estimating sign al and noise correlation matrices. The latter is based on multiple regression theory. The signal subspace is then identified by selectin g the set of signal eigenvalue s that best represents the data, in the least-square sense [48,49 ], we note, however, that VCA works with projected and with unprojected data. The extraction of the end members exploits two facts: (1) the endmembers are the vertices of a simplex and (2) the affine transformation of a simplex is also a simplex. As PPI and N-FIND R algorithms, VCA also assumes the presence of pure pixels in the data. The algorithm iteratively projects data on to a direction orthogonal to the subspace spanned by the endmembers already determined. The new end member signature corresponds to the extreme of the projection. The algorithm iterates until all end members are exhausted. VCA performs much better than PPI and better than or comparable to N-FI NDR; yet it has a computational complexity between on e and two orders of magnitude lower than N-FINDR. The chapter is structure d as follows. Section 19.2 describes the fundamentals of the proposed method. Section 19.3 and Section 19.4 evaluate the proposed algorithm using simulated and real data, respectively. Section 19.5 presents some concluding remarks.
Resumo:
Phenotypic data from female Canchim beef cattle were used to obtain estimates of genetic parameters for reproduction and growth traits using a linear animal mixed model. In addition, relationships among animal estimated breeding values (EBVs) for these traits were explored using principal component analysis. The traits studied in female Canchim cattle were age at first calving (AFC), age at second calving (ASC), calving interval (CI), and bodyweight at 420 days of age (BW420). The heritability estimates for AFC, ASC, CI and BW420 were 0.03±0.01, 0.07±0.01, 0.06±0.02, and 0.24±0.02, respectively. The genetic correlations for AFC with ASC, AFC with CI, AFC with BW420, ASC with CI, ASC with BW420, and CI with BW420 were 0.87±0.07, 0.23±0.02, -0.15±0.01, 0.67±0.13, -0.07±0.13, and 0.02±0.14, respectively. Standardised EBVs for AFC, ASC and CI exhibited a high association with the first principal component, whereas the standardised EBV for BW420 was closely associated with the second principal component. The heritability estimates for AFC, ASC and CI suggest that these traits would respond slowly to selection. However, selection response could be enhanced by constructing selection indices based on the principal components. © CSIRO 2013.
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
A Near Infrared Spectroscopy (NIRS) industrial application was developed by the LPF-Tagralia team, and transferred to a Spanish dehydrator company (Agrotécnica Extremeña S.L.) for the classification of dehydrator onion bulbs for breeding purposes. The automated operation of the system has allowed the classification of more than one million onion bulbs during seasons 2004 to 2008 (Table 1). The performance achieved by the original model (R2=0,65; SEC=2,28ºBrix) was enough for qualitative classification thanks to the broad range of variation of the initial population (18ºBrix). Nevertheless, a reduction of the classification performance of the model has been observed with the passing of seasons. One of the reasons put forward is the reduction of the range of variation that naturally occurs during a breeding process, the other is the variations in other parameters than the variable of interest but whose effects would probably be affecting the measurements [1]. This study points to the application of Independent Component Analysis (ICA) on this highly variable dataset coming from a NIRS industrial application for the identification of the different sources of variation present through seasons.
Resumo:
The aim of our paper is to examine whether Exchange Traded Funds (ETFs) diversify away the private information of informed traders. We apply the spread decomposition models of Glosten and Harris (1998) and Madhavan, Richardson and Roomans (1997) to a sample of ETFs and their control securities. Our results indicate that ETFs have significantly lower adverse selection costs than their control securities. This suggests that private information is diversified away for these securities. Our results therefore offer one explanation for the rapid growth in the ETF market.
Resumo:
The aim of this study was to develop a methodology using Raman hyperspectral imaging and chemometric methods for identification of pre- and post-blast explosive residues on banknote surfaces. The explosives studied were of military, commercial and propellant uses. After the acquisition of the hyperspectral imaging, independent component analysis (ICA) was applied to extract the pure spectra and the distribution of the corresponding image constituents. The performance of the methodology was evaluated by the explained variance and the lack of fit of the models, by comparing the ICA recovered spectra with the reference spectra using correlation coefficients and by the presence of rotational ambiguity in the ICA solutions. The methodology was applied to forensic samples to solve an automated teller machine explosion case. Independent component analysis proved to be a suitable method of resolving curves, achieving equivalent performance with the multivariate curve resolution with alternating least squares (MCR-ALS) method. At low concentrations, MCR-ALS presents some limitations, as it did not provide the correct solution. The detection limit of the methodology presented in this study was 50μgcm(-2).
Resumo:
Three-dimensional spectroscopy techniques are becoming more and more popular, producing an increasing number of large data cubes. The challenge of extracting information from these cubes requires the development of new techniques for data processing and analysis. We apply the recently developed technique of principal component analysis (PCA) tomography to a data cube from the center of the elliptical galaxy NGC 7097 and show that this technique is effective in decomposing the data into physically interpretable information. We find that the first five principal components of our data are associated with distinct physical characteristics. In particular, we detect a low-ionization nuclear-emitting region (LINER) with a weak broad component in the Balmer lines. Two images of the LINER are present in our data, one seen through a disk of gas and dust, and the other after scattering by free electrons and/or dust particles in the ionization cone. Furthermore, we extract the spectrum of the LINER, decontaminated from stellar and extended nebular emission, using only the technique of PCA tomography. We anticipate that the scattered image has polarized light due to its scattered nature.
Resumo:
Aims. A model-independent reconstruction of the cosmic expansion rate is essential to a robust analysis of cosmological observations. Our goal is to demonstrate that current data are able to provide reasonable constraints on the behavior of the Hubble parameter with redshift, independently of any cosmological model or underlying gravity theory. Methods. Using type Ia supernova data, we show that it is possible to analytically calculate the Fisher matrix components in a Hubble parameter analysis without assumptions about the energy content of the Universe. We used a principal component analysis to reconstruct the Hubble parameter as a linear combination of the Fisher matrix eigenvectors (principal components). To suppress the bias introduced by the high redshift behavior of the components, we considered the value of the Hubble parameter at high redshift as a free parameter. We first tested our procedure using a mock sample of type Ia supernova observations, we then applied it to the real data compiled by the Sloan Digital Sky Survey (SDSS) group. Results. In the mock sample analysis, we demonstrate that it is possible to drastically suppress the bias introduced by the high redshift behavior of the principal components. Applying our procedure to the real data, we show that it allows us to determine the behavior of the Hubble parameter with reasonable uncertainty, without introducing any ad-hoc parameterizations. Beyond that, our reconstruction agrees with completely independent measurements of the Hubble parameter obtained from red-envelope galaxies.
Resumo:
This paper is concerned with the use of scientific visualization methods for the analysis of feedforward neural networks (NNs). Inevitably, the kinds of data associated with the design and implementation of neural networks are of very high dimensionality, presenting a major challenge for visualization. A method is described using the well-known statistical technique of principal component analysis (PCA). This is found to be an effective and useful method of visualizing the learning trajectories of many learning algorithms such as back-propagation and can also be used to provide insight into the learning process and the nature of the error surface.
Resumo:
Optical diagnostic methods, such as near-infrared Raman spectroscopy allow quantification and evaluation of human affecting diseases, which could be useful in identifying and diagnosing atherosclerosis in coronary arteries. The goal of the present work is to apply Independent Component Analysis (ICA) for data reduction and feature extraction of Raman spectra and to perform the Mahalanobis distance for group classification according to histopathology, obtaining feasible diagnostic information to detect atheromatous plaque. An 830nm Ti:sapphire laser pumped by an argon laser provides near-infrared excitation. A spectrograph disperses light scattered from arterial tissues over a liquid-nitrogen cooled CCD to detect the Raman spectra. A total of 111 spectra from arterial fragments were utilized.
Resumo:
Functional MRI (fMRI) data often have low signal-to-noise-ratio (SNR) and are contaminated by strong interference from other physiological sources. A promising tool for extracting signals, even under low SNR conditions, is blind source separation (BSS), or independent component analysis (ICA). BSS is based on the assumption that the detected signals are a mixture of a number of independent source signals that are linearly combined via an unknown mixing matrix. BSS seeks to determine the mixing matrix to recover the source signals based on principles of statistical independence. In most cases, extraction of all sources is unnecessary; instead, a priori information can be applied to extract only the signal of interest. Herein we propose an algorithm based on a variation of ICA, called Dependent Component Analysis (DCA), where the signal of interest is extracted using a time delay obtained from an autocorrelation analysis. We applied such method to inspect functional Magnetic Resonance Imaging (fMRI) data, aiming to find the hemodynamic response that follows neuronal activation from an auditory stimulation, in human subjects. The method localized a significant signal modulation in cortical regions corresponding to the primary auditory cortex. The results obtained by DCA were also compared to those of the General Linear Model (GLM), which is the most widely used method to analyze fMRI datasets.
Resumo:
In this paper a methodology for integrated multivariate monitoring and control of biological wastewater treatment plants during extreme events is presented. To monitor the process, on-line dynamic principal component analysis (PCA) is performed on the process data to extract the principal components that represent the underlying mechanisms of the process. Fuzzy c-means (FCM) clustering is used to classify the operational state. Performing clustering on scores from PCA solves computational problems as well as increases robustness due to noise attenuation. The class-membership information from FCM is used to derive adequate control set points for the local control loops. The methodology is illustrated by a simulation study of a biological wastewater treatment plant, on which disturbances of various types are imposed. The results show that the methodology can be used to determine and co-ordinate control actions in order to shift the control objective and improve the effluent quality.
Resumo:
Independent component analysis (ICA) has recently been proposed as a tool to unmix hyperspectral data. ICA is founded on two assumptions: 1) the observed spectrum vector is a linear mixture of the constituent spectra (endmember spectra) weighted by the correspondent abundance fractions (sources); 2)sources are statistically independent. Independent factor analysis (IFA) extends ICA to linear mixtures of independent sources immersed in noise. Concerning hyperspectral data, the first assumption is valid whenever the multiple scattering among the distinct constituent substances (endmembers) is negligible, and the surface is partitioned according to the fractional abundances. The second assumption, however, is violated, since the sum of abundance fractions associated to each pixel is constant due to physical constraints in the data acquisition process. Thus, sources cannot be statistically independent, this compromising the performance of ICA/IFA algorithms in hyperspectral unmixing. This paper studies the impact of hyperspectral source statistical dependence on ICA and IFA performances. We conclude that the accuracy of these methods tends to improve with the increase of the signature variability, of the number of endmembers, and of the signal-to-noise ratio. In any case, there are always endmembers incorrectly unmixed. We arrive to this conclusion by minimizing the mutual information of simulated and real hyperspectral mixtures. The computation of mutual information is based on fitting mixtures of Gaussians to the observed data. A method to sort ICA and IFA estimates in terms of the likelihood of being correctly unmixed is proposed.