922 results for principal component regression


Relevance:

80.00%

Abstract:

The fatty acids of olive oils of distinct quality grades from the most important European Union (EU) producer countries were chemically and isotopically characterized. The analytical approach combined capillary column gas chromatography-mass spectrometry (GC/MS) with the novel technique of compound-specific isotope analysis (CSIA), performed by gas chromatography coupled to a stable isotope ratio mass spectrometer (IRMS) via a combustion (C) interface (GC/C/IRMS). This approach provides further insights into the control of the purity and geographical origin of oils sold as cold-pressed extra virgin olive oil with certified origin appellation. The results indicate that substantial enrichment in the heavy carbon isotope (C-13) of the bulk oil and of individual fatty acids is related to (1) thermally induced degradation due to deodorization or steam washing of the olive oils and (2) potential blending with refined olive oil or other vegetable oils. The interpretation of the data is based on principal component analysis of the fatty acid concentrations and isotopic data (delta(13)C(oil), delta(13)C(16:0), delta(13)C(18:1)) and on the delta(13)C(16:0) vs delta(13)C(18:1) covariations. The differences in the delta(13)C values of palmitic and oleic acids are discussed in terms of the biosynthesis of these acids in the plant tissue and the admixture of distinct oils.

Relevance:

80.00%

Abstract:

Biplots are graphical displays of data matrices based on the decomposition of a matrix as the product of two matrices. Elements of these two matrices are used as coordinates for the rows and columns of the data matrix, with an interpretation of the joint presentation that relies on the properties of the scalar product. Because the decomposition is not unique, there are several alternative ways to scale the row and column points of the biplot, which can cause confusion amongst users, especially when software packages are not united in their approach to this issue. We propose a new scaling of the solution, called the standard biplot, which applies equally well to a wide variety of analyses such as correspondence analysis, principal component analysis, log-ratio analysis and the graphical results of a discriminant analysis/MANOVA, in fact to any method based on the singular-value decomposition. The standard biplot also handles data matrices with widely different levels of inherent variance. Two concepts taken from correspondence analysis are important to this idea: the weighting of row and column points, and the contributions made by the points to the solution. In the standard biplot one set of points, usually the rows of the data matrix, optimally represents the positions of the cases or sample units, which are weighted and usually standardized in some way unless the matrix contains values that are comparable in their raw form. The other set of points, usually the columns, is represented in accordance with their contributions to the low-dimensional solution. As for any biplot, the projections of the row points onto vectors defined by the column points approximate the centred and (optionally) standardized data.
The method is illustrated with several examples to demonstrate how the standard biplot copes in different situations to give a joint map which needs only one common scale on the principal axes, thus avoiding the problem of enlarging or contracting the scale of one set of points to make the biplot readable. The proposal also solves the problem in correspondence analysis of low-frequency categories that are located on the periphery of the map, giving the false impression that they are important.
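The non-uniqueness of the biplot decomposition mentioned in this abstract can be illustrated with a minimal sketch. It does not implement the proposed standard biplot; it only shows, assuming NumPy and a hypothetical random data matrix, that different splits of the singular values between rows and columns reproduce the same centred data through scalar products. The function name `biplot_coordinates` and the parameter `alpha` are illustrative choices, not taken from the abstract.

```python
import numpy as np

def biplot_coordinates(X, alpha=1.0, ndim=2):
    """Row and column coordinates from the SVD of the column-centred matrix.

    alpha = 1 assigns the singular values to the rows,
    alpha = 0 assigns them to the columns.
    """
    Xc = X - X.mean(axis=0)                      # centre each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    rows = U[:, :ndim] * s[:ndim] ** alpha       # row points
    cols = Vt[:ndim].T * s[:ndim] ** (1.0 - alpha)  # column points
    return Xc, rows, cols

# Hypothetical data with two columns, so a 2-D biplot is exact.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))

Xc, rows_a, cols_a = biplot_coordinates(X, alpha=1.0)
_, rows_b, cols_b = biplot_coordinates(X, alpha=0.0)

# Both scalings reconstruct the same centred data via scalar products.
same_fit = np.allclose(rows_a @ cols_a.T, rows_b @ cols_b.T)
```

Both coordinate sets are valid biplots of the same matrix, which is why the choice of scaling has to be stated explicitly when a biplot is reported.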

Relevance:

80.00%

Abstract:

Cape Verde is a tropical oceanic ecosystem, highly fragmented and dispersed, with islands physically isolated by distance and depth. To understand how isolation affects the ecological variability in this archipelago, we conducted a research project on the community structure of the 18 commercially most important demersal fishes. An index of ecological distance based on species relative dominance (Di) is developed from Catch Per Unit Effort, derived from an extensive database of artisanal fisheries. Two ecological measures of distance between islands are calculated: at the species level, DDi, and at the community level, DD (sum of DDi). A physical isolation factor (Idb) combining distance (d) and bathymetry (b) is proposed. Covariance analysis shows that the isolation factor is positively correlated with both DDi and DD, suggesting that Idb can be considered an ecological isolation factor. The effect of Idb varies with season and species. This effect is stronger in summer (May to November) than in winter (December to April), which appears to be more unstable. Species react differently to Idb, independently of season. A principal component analysis of the monthly DDi for the 12 islands and the 18 species, complemented by an agglomerative hierarchical clustering, shows a geographic pattern of island organization, according to Idb. Results indicate that the ecological structure of demersal fish communities of the Cape Verde archipelago, both in time and space, can be explained by a geographic isolation factor. The analytical approach used here is promising and could be tested in other archipelago systems.

Relevance:

80.00%

Abstract:

The location and biogeographical features of the Cape Verde Archipelago are of special interest for marine ecology. However, little is known about the composition of the coastal ecosystems in this region, especially the subtidal benthic macroinvertebrate communities. Between August and October 2007, eight locations around the island of São Vicente were sampled. At each location, fragments of substratum were collected; processing of the samples yielded a total of 4032 individuals belonging to 81 different species. Shannon's entropy and the Gini-Simpson diversity index were calculated, as was the real number of species that each one represents. Comparing the results revealed differences between sampling stations and between indices within the same sampling station. To cluster the sampled locations according to the number of organisms collected per species, a dendrogram was constructed and a principal component analysis was carried out. The sampling stations did not differ significantly in the composition of their subtidal benthic macroinvertebrate communities in terms of major taxonomic groups or functional groups. It is assumed that they differ only in minor traits.
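The two diversity indices named in this abstract, and their conversion to a number of species, can be sketched as follows. This assumes that the "real number of species" refers to the usual effective number of species (exp(H) for Shannon entropy and 1/(1 - D) for Gini-Simpson), which the abstract does not state explicitly. The counts are hypothetical and are not the study's data.

```python
import math

def diversity(counts):
    """Shannon entropy, Gini-Simpson index and their effective species numbers."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]      # relative abundances
    shannon = -sum(pi * math.log(pi) for pi in p)
    gini_simpson = 1.0 - sum(pi * pi for pi in p)
    return {
        "shannon": shannon,
        "gini_simpson": gini_simpson,
        # Number of equally abundant species giving the same index value.
        "effective_shannon": math.exp(shannon),
        "effective_gini_simpson": 1.0 / (1.0 - gini_simpson),
    }

# Hypothetical stations: four equally abundant species vs. one dominant species.
even = diversity([10, 10, 10, 10])
uneven = diversity([37, 1, 1, 1])
```

For the even station both effective numbers equal the actual species count of four, while for the uneven station they fall well below four and differ from each other, which is one way indices can disagree within the same station.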

Relevance:

80.00%

Abstract:

Understanding the genetic structure of human populations is of fundamental interest to medical, forensic and anthropological sciences. Advances in high-throughput genotyping technology have markedly improved our understanding of global patterns of human genetic variation and suggest the potential to use large samples to uncover variation among closely spaced populations. Here we characterize genetic variation in a sample of 3,000 European individuals genotyped at over half a million variable DNA sites in the human genome. Despite low average levels of genetic differentiation among Europeans, we find a close correspondence between genetic and geographic distances; indeed, a geographical map of Europe arises naturally as an efficient two-dimensional summary of genetic variation in Europeans. The results emphasize that when mapping the genetic basis of a disease phenotype, spurious associations can arise if genetic structure is not properly accounted for. In addition, the results are relevant to the prospects of genetic ancestry testing; an individual's DNA can be used to infer their geographic origin with surprising accuracy, often to within a few hundred kilometres.

Relevance:

80.00%

Abstract:

Using one male-inherited and eight biparentally inherited microsatellite markers, we investigate the population genetic structure of the Valais chromosome race of the common shrew (Sorex araneus) in the Central Alps of Europe. Unexpectedly, the Y-chromosome microsatellite suggests nearly complete absence of male gene flow among populations from the St-Bernard and Simplon regions (Switzerland). Autosomal markers also show significant genetic structuring between these two geographical areas. Isolation by distance is significant and possible barriers to gene flow exist in the study area. Two different approaches are used to better understand the geographical patterns and the causes of this structuring. Using a principal component analysis for which a testing procedure exists, and partial Mantel tests, we show that the St-Bernard pass does not represent a significant barrier to gene flow although it culminates at 2469 m, close to the highest altitudinal record for this species. Similar results are found for the Simplon pass, indicating that both passes represented potential postglacial recolonization routes into Switzerland from Italian refugia after the last Pleistocene glaciations. In contrast with the weak effect of these mountain passes, the Rhône valley lowlands significantly reduce gene flow in this species. Natural obstacles (the large Rhône river) and unsuitable habitats (dry slopes) are both present in the valley. Moreover, anthropogenic changes to landscape structures are likely to have strongly reduced available habitats for this shrew in the lowlands, thereby promoting genetic differentiation of populations found on opposite sides of the Rhône valley.

Relevance:

80.00%

Abstract:

In order to interpret the biplot it is necessary to know which points (usually variables) are the important contributors to the solution, and this information is available separately as part of the biplot's numerical results. We propose a new scaling of the display, called the contribution biplot, which incorporates this diagnostic directly into the graphical display, showing visually the important contributors and thus facilitating the biplot interpretation and often simplifying the graphical representation considerably. The contribution biplot can be applied to a wide variety of analyses such as correspondence analysis, principal component analysis, log-ratio analysis and the graphical results of a discriminant analysis/MANOVA, in fact to any method based on the singular-value decomposition. In the contribution biplot one set of points, usually the rows of the data matrix, optimally represents the spatial positions of the cases or sample units, according to some distance measure that usually incorporates some form of standardization unless all data are comparable in scale. The other set of points, usually the columns, is represented by vectors that are related to their contributions to the low-dimensional solution. A fringe benefit is that usually only one common scale for row and column points is needed on the principal axes, thus avoiding the problem of enlarging or contracting the scale of one set of points to make the biplot legible. Furthermore, this version of the biplot also solves the problem in correspondence analysis of low-frequency categories that are located on the periphery of the map, giving the false impression that they are important, when they are in fact contributing minimally to the solution.
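For the principal component analysis case, the idea of plotting columns according to their contributions can be sketched as below. This is a simplified illustration, assuming equal weights for all rows and columns and standardized variables; under that assumption the column coordinates are the right singular vectors, whose squared entries sum to 1 on each axis and can be read as contributions. The function name `contribution_biplot` and the random data are hypothetical, and the weighted cases covered by the abstract (for example correspondence analysis) are not handled.

```python
import numpy as np

def contribution_biplot(X, ndim=2):
    """Row principal coordinates and column contribution coordinates (PCA case)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize columns
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    row_coords = U[:, :ndim] * s[:ndim]   # rows carry the singular values
    col_coords = Vt[:ndim].T              # unit-length axes for the columns
    contributions = col_coords ** 2       # share of each variable in each axis
    return row_coords, col_coords, contributions

# Hypothetical data: 20 cases, 5 variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
rows, cols, contrib = contribution_biplot(X)
```

A variable whose squared coordinate on an axis exceeds the average 1/p (here 1/5) contributes more than average to that axis, which gives a simple visual threshold for the important contributors.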

Relevance:

80.00%

Abstract:

This study represents the most extensive analysis of batch-to-batch variations in spray paint samples to date. The survey was performed as a collaborative project of the ENFSI (European Network of Forensic Science Institutes) Paint and Glass Working Group (EPG) and involved 11 laboratories. Several studies have already shown that paint samples of similar color but from different manufacturers can usually be differentiated using an appropriate analytical sequence. The discrimination of paints from the same manufacturer and color (batch-to-batch variations) is of great interest and these data are seldom found in the literature. This survey concerns the analysis of batches from different color groups (white, papaya (special shade of orange), red and black) with a wide range of analytical techniques and leads to the following conclusions. Colored batch samples are more likely to be differentiated since their pigment composition is more complex (pigment mixtures, added pigments) and therefore subject to variations. These variations may occur during the paint production but may also occur when checking the paint shade in quality control processes. For these samples, techniques aimed at color/pigment(s) characterization (optical microscopy, microspectrophotometry (MSP), Raman spectroscopy) provide better discrimination than techniques aimed at the organic (binder) or inorganic composition (Fourier transform infrared spectroscopy (FTIR) or elemental analysis (SEM - scanning electron microscopy and XRF - X-ray fluorescence)). White samples contain mainly titanium dioxide as a pigment and the main differentiation is based on the binder composition (C-H stretches) detected either by FTIR or Raman. The inorganic composition (elemental analysis) also provides some discrimination. Black samples contain mainly carbon black as a pigment and are problematic with most of the spectroscopic techniques.
In this case, pyrolysis-GC/MS represents the best technique to detect differences. Globally, Py-GC/MS may show a high potential of discrimination on all samples but the results are highly dependent on the specific instrumental conditions used. Finally, the discrimination of samples when data was interpreted visually as compared to statistically using principal component analysis (PCA) yielded very similar results. PCA increases sensitivity and could perform better on specific samples, but one first has to ensure that all non-informative variation (baseline deviation) is eliminated by applying correct pre-treatments. Statistical treatments can be used on a large data set and, when combined with an expert's opinion, will provide more objective criteria for decision making.

Relevance:

80.00%

Abstract:

We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyse the ratios of the data values. The usual approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property. This weighted log-ratio analysis is theoretically equivalent to spectral mapping, a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modelling. The weighted log-ratio methodology is applied here to frequency data in linguistics and to chemical compositional data in archaeology.
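A rough sketch of a weighted log-ratio analysis, and a numerical check of distributional equivalence, is given below. It assumes that the row and column weights are the marginal proportions of the table (as in correspondence analysis) and that the analysis is the SVD of the weighted double-centred log matrix; the abstract does not spell out these details, so both are assumptions. The function name `weighted_lra` and the small table are hypothetical.

```python
import numpy as np

def weighted_lra(N):
    """Weighted log-ratio analysis of a strictly positive table N (sketch)."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()
    r = P.sum(axis=1)                      # row weights (assumed: margins)
    c = P.sum(axis=0)                      # column weights (assumed: margins)
    L = np.log(N)
    # Weighted double-centring of the log-transformed table.
    Y = L - (L @ c)[:, None] - (r @ L)[None, :] + (r @ L @ c)
    S = np.sqrt(r)[:, None] * Y * np.sqrt(c)[None, :]
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    row_principal = (U * s) / np.sqrt(r)[:, None]
    col_principal = (Vt.T * s) / np.sqrt(c)[:, None]
    return s, row_principal, col_principal

# Hypothetical positive table.
N = np.array([[10.0, 20.0, 30.0],
              [5.0, 40.0, 15.0],
              [25.0, 10.0, 20.0],
              [8.0, 12.0, 50.0]])

# Split the first column into two proportional columns (30% and 70%).
N_split = np.column_stack([0.3 * N[:, 0], 0.7 * N[:, 0], N[:, 1:]])

s, rows, cols = weighted_lra(N)
s_split, rows_split, cols_split = weighted_lra(N_split)
```

With these weights the non-zero singular values of the original and the split table coincide, which is the behaviour that distributional equivalence requires: splitting or merging proportional columns leaves the solution unchanged.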

Relevance:

80.00%

Abstract:

The Stages of Change Readiness and Treatment Eagerness Scale (SOCRATES), a 19-item instrument developed to assess readiness to change alcohol use among individuals presenting for specialized alcohol treatment, has been used in various populations and settings. Its factor structure and concurrent validity have been described for specialized alcohol treatment settings and primary care. The purpose of this study was to determine the factor structure and concurrent validity of the SOCRATES among medical inpatients with unhealthy alcohol use not seeking help for specialized alcohol treatment. The subjects were 337 medical inpatients with unhealthy alcohol use, identified during their hospital stay. Most of them had alcohol dependence (76%). We performed an Alpha Factor Analysis (AFA) and Principal Component Analysis (PCA) of the 19 SOCRATES items, and forced 3 factors and 2 components, in order to replicate findings from Miller and Tonigan (Miller, W. R., & Tonigan, J. S., (1996). Assessing drinkers' motivations for change: The Stages of Change Readiness and Treatment Eagerness Scale (SOCRATES). Psychology of Addictive Behavior, 10, 81-89.) and Maisto et al. (Maisto, S. A., Conigliaro, J., McNeil, M., Kraemer, K., O'Connor, M., & Kelley, M. E., (1999). Factor structure of the SOCRATES in a sample of primary care patients. Addictive Behavior, 24(6), 879-892.). Our analysis supported the view that the 2 component solution proposed by Maisto et al. (1999) is more appropriate for our data than the 3 factor solution proposed by Miller and Tonigan (1996).
The first component measured Perception of Problems and was more strongly correlated with severity of alcohol-related consequences, presence of alcohol dependence, and alcohol consumption levels (average number of drinks per day and total number of binge drinking days over the past 30 days) compared to the second component measuring Taking Action. Our findings support the view that the SOCRATES is comprised of two important readiness constructs in general medical patients identified by screening.

Relevance:

80.00%

Abstract:

A biplot, which is the multivariate generalization of the two-variable scatterplot, can be used to visualize the results of many multivariate techniques, especially those that are based on the singular value decomposition. We consider data sets consisting of continuous-scale measurements, their fuzzy coding and the biplots that visualize them, using a fuzzy version of multiple correspondence analysis. Of special interest is the way quality of fit of the biplot is measured, since it is well known that regular (i.e., crisp) multiple correspondence analysis seriously underestimates this measure. We show how the results of fuzzy multiple correspondence analysis can be defuzzified to obtain estimated values of the original data, and prove that this implies an orthogonal decomposition of variance. This permits a measure of fit to be calculated in the familiar form of a percentage of explained variance, which is directly comparable to the corresponding fit measure used in principal component analysis of the original data. The approach is motivated initially by its application to a simulated data set, showing how the fuzzy approach can lead to diagnosing nonlinear relationships, and finally it is applied to a real set of meteorological data.
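Fuzzy coding and defuzzification of a single continuous value can be sketched as follows. The sketch assumes triangular membership functions on a set of hinge points, a common choice that the abstract does not confirm, and it covers only the coding and decoding steps, not the fuzzy multiple correspondence analysis itself. The function names and hinge values are hypothetical.

```python
def fuzzy_code(x, hinges):
    """Triangular fuzzy coding of x on sorted hinge points; memberships sum to 1."""
    memberships = [0.0] * len(hinges)
    if x <= hinges[0]:
        memberships[0] = 1.0
    elif x >= hinges[-1]:
        memberships[-1] = 1.0
    else:
        for k in range(len(hinges) - 1):
            lo, hi = hinges[k], hinges[k + 1]
            if lo <= x <= hi:
                # Linear interpolation between the two neighbouring hinges.
                memberships[k + 1] = (x - lo) / (hi - lo)
                memberships[k] = 1.0 - memberships[k + 1]
                break
    return memberships

def defuzzify(memberships, hinges):
    """Recover a value as the membership-weighted average of the hinge points."""
    return sum(m * h for m, h in zip(memberships, hinges))

# Hypothetical hinge points, for example the minimum, median and maximum.
hinges = [0.0, 10.0, 30.0]
coded = fuzzy_code(15.0, hinges)
recovered = defuzzify(coded, hinges)
```

Under this coding, a value inside the hinge range is recovered exactly by defuzzification, which is what makes it possible to compare fitted fuzzy values with the original data on the original scale.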

Relevance:

80.00%

Abstract:

We consider the joint visualization of two matrices which have common rows and columns, for example multivariate data observed at two time points or split according to a dichotomous variable. Methods of interest include principal components analysis for interval-scaled data, or correspondence analysis for frequency data or ratio-scaled variables on commensurate scales. A simple result in matrix algebra shows that by setting up the matrices in a particular block format, matrix sum and difference components can be visualized. The case when we have more than two matrices is also discussed and the methodology is applied to data from the International Social Survey Program.
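One block arrangement with this property can be checked numerically. The sketch below assumes the block format [[A, B], [B, A]], which the abstract does not specify, and uses hypothetical random matrices. For that arrangement, the singular values of the block matrix are the singular values of A + B together with those of A - B.

```python
import numpy as np

# Hypothetical matrices with common rows and columns.
rng = np.random.default_rng(2)
A = rng.normal(size=(5, 3))
B = rng.normal(size=(5, 3))

# Assumed block format: A and B on the diagonal and off-diagonal respectively.
block = np.block([[A, B], [B, A]])

sv_block = np.linalg.svd(block, compute_uv=False)
sv_sum = np.linalg.svd(A + B, compute_uv=False)
sv_diff = np.linalg.svd(A - B, compute_uv=False)

# Pool the singular values of the sum and difference, sorted descending.
sv_parts = np.sort(np.concatenate([sv_sum, sv_diff]))[::-1]
matches = np.allclose(sv_block, sv_parts)
```

The match follows because an orthogonal rotation of the block matrix turns it into a block-diagonal matrix with A + B and A - B on the diagonal, so a single SVD of the block matrix separates the sum and difference components.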

Relevance:

80.00%

Abstract:

The singular value decomposition and its interpretation as a linear biplot has proved to be a powerful tool for analysing many forms of multivariate data. Here we adapt biplot methodology to the specific case of compositional data consisting of positive vectors each of which is constrained to have unit sum. These relative variation biplots have properties relating to special features of compositional data: the study of ratios, subcompositions and models of compositional relationships. The methodology is demonstrated on a data set consisting of six-part colour compositions in 22 abstract paintings, showing how the singular value decomposition can achieve an accurate biplot of the colour ratios and how possible models interrelating the colours can be diagnosed.

Relevance:

80.00%

Abstract:

Dual scaling of a subjects-by-objects table of dominance data (preferences, paired comparisons and successive categories data) has been contrasted with correspondence analysis, as if the two techniques were somehow different. In this note we show that dual scaling of dominance data is equivalent to the correspondence analysis of a table which is doubled with respect to subjects. We also show that the results of both methods can be recovered from a principal components analysis of the undoubled dominance table which is centred with respect to subject means.

Relevance:

80.00%

Abstract:

We construct a weighted Euclidean distance that approximates any distance or dissimilarity measure between individuals that is based on a rectangular cases-by-variables data matrix. In contrast to regular multidimensional scaling methods for dissimilarity data, the method leads to biplots of individuals and variables while preserving all the good properties of dimension-reduction methods that are based on the singular-value decomposition. The main benefits are the decomposition of variance into components along principal axes, which provide the numerical diagnostics known as contributions, and the estimation of nonnegative weights for each variable. The idea is inspired by the distance functions used in correspondence analysis and in principal component analysis of standardized data, where the normalizations inherent in the distances can be considered as differential weighting of the variables. In weighted Euclidean biplots we allow these weights to be unknown parameters, which are estimated from the data to maximize the fit to the chosen distances or dissimilarities. These weights are estimated using a majorization algorithm. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing the matrix and displaying its rows and columns in biplots.