51 resultados para PRINCIPAL COMPONENTS-ANALYSIS
Resumo:
We presented a bird-monitoring database inMediterranean landscapes (Catalonia, NE Spain) affected by wildfires and we evaluated: 1) the spatial and temporal variability in the bird community composition and 2) the influence of pre-fire habitat configuration in the composition of bird communities. The DINDIS database results fromthemonitoring of bird communities occupying all areas affected by large wildfires in Catalonia since 2000.We used bird surveys conducted from 2006 to 2009 and performed a principal components analysis to describe two main gradients of variation in the composition of bird communities, which were used as descriptors of bird communities in subsequent analyses. We then analysed the relationships of these community descriptors with bioclimatic regions within Catalonia, time since fire and pre-fire vegetation (forest or shrubland).We have conducted 1,918 bird surveys in 567 transects distributed in 56 burnt areas. Eight out of the twenty most common detected species have an unfavourable conservation status, most of them being associated to open-habitats. Both bird communities’ descriptors had a strong regional component and were related to pre-fire vegetation, and to a lesser extent to the time since fire.We came to the conclusion that the responses of bird communities to wildfires are heterogeneous, complex and context dependent. Large-scale monitoring datasets, such as DINDIS, might allow identifying factors acting at different spatial and temporal scales that affect the dynamics of species and communities, giving additional information on the causes under general trends observed using other monitoring systems
Resumo:
In an earlier investigation (Burger et al., 2000) five sediment cores near the RodriguesTriple Junction in the Indian Ocean were studied applying classical statistical methods(fuzzy c-means clustering, linear mixing model, principal component analysis) for theextraction of endmembers and evaluating the spatial and temporal variation ofgeochemical signals. Three main factors of sedimentation were expected by the marinegeologists: a volcano-genetic, a hydro-hydrothermal and an ultra-basic factor. Thedisplay of fuzzy membership values and/or factor scores versus depth providedconsistent results for two factors only; the ultra-basic component could not beidentified. The reason for this may be that only traditional statistical methods wereapplied, i.e. the untransformed components were used and the cosine-theta coefficient assimilarity measure.During the last decade considerable progress in compositional data analysis was madeand many case studies were published using new tools for exploratory analysis of thesedata. Therefore it makes sense to check if the application of suitable data transformations,reduction of the D-part simplex to two or three factors and visualinterpretation of the factor scores would lead to a revision of earlier results and toanswers to open questions . In this paper we follow the lines of a paper of R. Tolosana-Delgado et al. (2005) starting with a problem-oriented interpretation of the biplotscattergram, extracting compositional factors, ilr-transformation of the components andvisualization of the factor scores in a spatial context: The compositional factors will beplotted versus depth (time) of the core samples in order to facilitate the identification ofthe expected sources of the sedimentary process.Kew words: compositional data analysis, biplot, deep sea sediments
Resumo:
”compositions” is a new R-package for the analysis of compositional and positive data.It contains four classes corresponding to the four different types of compositional andpositive geometry (including the Aitchison geometry). It provides means for computation,plotting and high-level multivariate statistical analysis in all four geometries.These geometries are treated in an fully analogous way, based on the principle of workingin coordinates, and the object-oriented programming paradigm of R. In this way,called functions automatically select the most appropriate type of analysis as a functionof the geometry. The graphical capabilities include ternary diagrams and tetrahedrons,various compositional plots (boxplots, barplots, piecharts) and extensive graphical toolsfor principal components. Afterwards, ortion and proportion lines, straight lines andellipses in all geometries can be added to plots. The package is accompanied by ahands-on-introduction, documentation for every function, demos of the graphical capabilitiesand plenty of usage examples. It allows direct and parallel computation inall four vector spaces and provides the beginner with a copy-and-paste style of dataanalysis, while letting advanced users keep the functionality and customizability theydemand of R, as well as all necessary tools to add own analysis routines. A completeexample is included in the appendix
Resumo:
Isotopic data are currently becoming an important source of information regardingsources, evolution and mixing processes of water in hydrogeologic systems. However, itis not clear how to treat with statistics the geochemical data and the isotopic datatogether. We propose to introduce the isotopic information as new parts, and applycompositional data analysis with the resulting increased composition. Results areequivalent to downscale the classical isotopic delta variables, because they are alreadyrelative (as needed in the compositional framework) and isotopic variations are almostalways very small. This methodology is illustrated and tested with the study of theLlobregat River Basin (Barcelona, NE Spain), where it is shown that, though verysmall, isotopic variations comp lement geochemical principal components, and help inthe better identification of pollution sources
Resumo:
Principal curves have been defined Hastie and Stuetzle (JASA, 1989) assmooth curves passing through the middle of a multidimensional dataset. They are nonlinear generalizations of the first principalcomponent, a characterization of which is the basis for the principalcurves definition.In this paper we propose an alternative approach based on a differentproperty of principal components. Consider a point in the space wherea multivariate normal is defined and, for each hyperplane containingthat point, compute the total variance of the normal distributionconditioned to belong to that hyperplane. Choose now the hyperplaneminimizing this conditional total variance and look for thecorresponding conditional mean. The first principal component of theoriginal distribution passes by this conditional mean and it isorthogonal to that hyperplane. This property is easily generalized todata sets with nonlinear structure. Repeating the search from differentstarting points, many points analogous to conditional means are found.We call them principal oriented points. When a one-dimensional curveruns the set of these special points it is called principal curve oforiented points. Successive principal curves are recursively definedfrom a generalization of the total variance.
Resumo:
We consider the joint visualization of two matrices which have common rowsand columns, for example multivariate data observed at two time pointsor split accord-ing to a dichotomous variable. Methods of interest includeprincipal components analysis for interval-scaled data, or correspondenceanalysis for frequency data or ratio-scaled variables on commensuratescales. A simple result in matrix algebra shows that by setting up thematrices in a particular block format, matrix sum and difference componentscan be visualized. The case when we have more than two matrices is alsodiscussed and the methodology is applied to data from the InternationalSocial Survey Program.
Resumo:
Projecte de recerca elaborat a partir d’una estada a la Universidad Politécnica de Madrid, Espanya, entre setembre i o desembre del 2007. Actualment la indústria aeroespacial i aeronàutica té com prioritat millorar la fiabilitat de las seves estructures a través del desenvolupament de nous sistemes per a la monitorització i detecció d’impactes. Hi ha diverses tècniques potencialment útils, i la seva aplicabilitat en una situació particular depèn críticament de la mida del defecte que permet l’estructura. Qualsevol defecte canviarà la resposta vibratòria de l’element estructural, així com el transitori de l’ona que es propaga per l’estructura elàstica. Correlacionar aquests canvis, que poden ser detectats experimentalment amb l’ocurrència del defecte, la seva localització i quantificació, és un problema molt complex. Aquest treball explora l’ús de l'Anàlisis de Components Principals (Principal Component Analysis - PCA-) basat en la formulació dels estadístics T2 i Q per tal de detectar i distingir els defectes a l'estructura, tot correlacionant els seus canvis a la resposta vibratòria. L’estructura utilitzada per l’estudi és l’ala d’una turbina d’un avió comercial. Aquesta ala s’excita en un extrem utilitzant un vibrador, i a la qual s'han adherit set sensors PZT a la superfície. S'aplica un senyal conegut i s'analitzen les respostes. Es construeix un model PCA utilitzant dades de l’estructura sense defecte. Per tal de provar el model, s'adhereix un tros d’alumini en quatre posicions diferents. Les dades dels assajos de l'estructura amb defecte es projecten sobre el model. Les components principals i les distàncies de Q-residual i T2-Hotelling s'utilitzaran per a l'anàlisi de les incidències. Q-residual indica com de bé s'adiu cadascuna de les mostres al model PCA, ja que és una mesura de la diferència, o residu, entre la mostra i la seva projecció sobre les components principals retingudes en el model. La distància T2-Hotelling és una mesura de la variació de cada mostra dins del model PCA, o el que vindria a ser el mateix, la distància al centre del model PCA.
Resumo:
In order to obtain a high-resolution Pleistocene stratigraphy, eleven continuouslycored boreholes, 100 to 220m deep were drilled in the northern part of the PoPlain by Regione Lombardia in the last five years. Quantitative provenanceanalysis (QPA, Weltje and von Eynatten, 2004) of Pleistocene sands was carriedout by using multivariate statistical analysis (principal component analysis, PCA,and similarity analysis) on an integrated data set, including high-resolution bulkpetrography and heavy-mineral analyses on Pleistocene sands and of 250 majorand minor modern rivers draining the southern flank of the Alps from West toEast (Garzanti et al, 2004; 2006). Prior to the onset of major Alpine glaciations,metamorphic and quartzofeldspathic detritus from the Western and Central Alpswas carried from the axial belt to the Po basin longitudinally parallel to theSouthAlpine belt by a trunk river (Vezzoli and Garzanti, 2008). This scenariorapidly changed during the marine isotope stage 22 (0.87 Ma), with the onset ofthe first major Pleistocene glaciation in the Alps (Muttoni et al, 2003). PCA andsimilarity analysis from core samples show that the longitudinal trunk river at thistime was shifted southward by the rapid southward and westward progradation oftransverse alluvial river systems fed from the Central and Southern Alps.Sediments were transported southward by braided river systems as well as glacialsediments transported by Alpine valley glaciers invaded the alluvial plain.Kew words: Detrital modes; Modern sands; Provenance; Principal ComponentsAnalysis; Similarity, Canberra Distance; palaeodrainage
Resumo:
En aquest treball, es proposa un nou mètode per estimar en temps real la qualitat del producte final en processos per lot. Aquest mètode permet reduir el temps necessari per obtenir els resultats de qualitat de les anàlisi de laboratori. S'utiliza un model de anàlisi de componentes principals (PCA) construït amb dades històriques en condicions normals de funcionament per discernir si un lot finalizat és normal o no. Es calcula una signatura de falla pels lots anormals i es passa a través d'un model de classificació per la seva estimació. L'estudi proposa un mètode per utilitzar la informació de les gràfiques de contribució basat en les signatures de falla, on els indicadors representen el comportament de les variables al llarg del procés en les diferentes etapes. Un conjunt de dades compost per la signatura de falla dels lots anormals històrics es construeix per cercar els patrons i entrenar els models de classifcació per estimar els resultas dels lots futurs. La metodologia proposada s'ha aplicat a un reactor seqüencial per lots (SBR). Diversos algoritmes de classificació es proven per demostrar les possibilitats de la metodologia proposada.
Resumo:
First discussion on compositional data analysis is attributable to Karl Pearson, in 1897. However, notwithstanding the recent developments on algebraic structure of the simplex, more than twenty years after Aitchison’s idea of log-transformations of closed data, scientific literature is again full of statistical treatments of this type of data by using traditional methodologies. This is particularly true in environmental geochemistry where besides the problem of the closure, the spatial structure (dependence) of the data have to be considered. In this work we propose the use of log-contrast values, obtained by asimplicial principal component analysis, as LQGLFDWRUV of given environmental conditions. The investigation of the log-constrast frequency distributions allows pointing out the statistical laws able togenerate the values and to govern their variability. The changes, if compared, for example, with the mean values of the random variables assumed as models, or other reference parameters, allow definingmonitors to be used to assess the extent of possible environmental contamination. Case study on running and ground waters from Chiavenna Valley (Northern Italy) by using Na+, K+, Ca2+, Mg2+, HCO3-, SO4 2- and Cl- concentrations will be illustrated
Resumo:
Els avenços en tècniques de genotipat de polimorfismes genètics a gran escala estan liderant una revolució en el camp de l’epidemiologia genètica i la genètica de poblacions humanes. La informació aportada per aquestes tècniques ha evidenciat l’existència d’estructuracions poblacionals que poden augmentar l’error en els estudis d’associació a escala genòmica (GWAS, genome-wide association studies). Estudis recents han demostrat la presència d’aquestes estructuracions a nivell interregional i intrarregional a Europa. El present projecte ha avaluat el grau d’estructuració genètica en poblacions de la Península Ibèrica i altres regions del sudoest europeu (Itàlia i França) per quantificar l’impacte que aquesta potencial estructuració pot tenir en el disseny d’estudis d’associació GWAS i reconstruir la història demogràfica de les poblacions de la Mediterrània. Per aconseguir aquests objectius, s’han analitzat mostres de DNA de 770 individus de 26 poblacions de la Península Ibèrica, França, Itàlia i d’altres països de la Mediterrània. Aquestes mostres van ser genotipades per 240000 SNPs utilitzant l’array 250K StyI d’Affymetrix en el marc d’aquest projecte o mitjançant altres arrays d’Affymetrix en els projectes internacionals HapMap i POPRES. S’han realitzat anàlisis estadístiques incloent anàlisis de components principals, Fst, identitat per descendència, desequilibri de lligament, barreres genètiques, etc. Aquests resultats han permés construir un marc de referència de la variabilitat en aquesta regió, avaluar el seu impacte en estudis d’associació i proposar mesures per evitar l’increment de qualsevol tipus d’error (tipus I i II) en estudis nacionals i internacionals. A més, també han permés reconstruir la història de les poblacions humanes de la Mediterrània així com analitzar les seves relacions demogràfiques. Donada la duració limitada d’aquesta acció (24 mesos, d’octubre de 2010 a setembre de 2012), els resultats d’aquest projecte es troben actualment en fase de redacció i conduiran a diverses publicacions en revistes internacionals i a la preparació de comunicacions a congressos.
Resumo:
The information provided by the alignment-independent GRid Independent Descriptors (GRIND) can be condensed by the application of principal component analysis, obtaining a small number of principal properties (GRIND-PP), which is more suitable for describing molecular similarity. The objective of the present study is to optimize diverse parameters involved in the obtention of the GRIND-PP and validate their suitability for applications, requiring a biologically relevant description of the molecular similarity. With this aim, GRIND-PP computed with a collection of diverse settings were used to carry out ligand-based virtual screening (LBVS) on standard conditions. The quality of the results obtained was remarkable and comparable with other LBVS methods, and their detailed statistical analysis allowed to identify the method settings more determinant for the quality of the results and their optimum. Remarkably, some of these optimum settings differ significantly from those used in previously published applications, revealing their unexplored potential. Their applicability in large compound database was also explored by comparing the equivalence of the results obtained using either computed or projected principal properties. In general, the results of the study confirm the suitability of the GRIND-PP for practical applications and provide useful hints about how they should be computed for obtaining optimum results.
Resumo:
Background: Peach fruit undergoes a rapid softening process that involves a number of metabolic changes. Storing fruit at low temperatures has been widely used to extend its postharvest life. However, this leads to undesired changes, such as mealiness and browning, which affect the quality of the fruit. In this study, a 2-D DIGE approach was designed to screen for differentially accumulated proteins in peach fruit during normal softening as well as under conditions that led to fruit chilling injury. Results:The analysis allowed us to identify 43 spots -representing about 18% of the total number analyzed- that show statistically significant changes. Thirty-nine of the proteins could be identified by mass spectrometry. Some of the proteins that changed during postharvest had been related to peach fruit ripening and cold stress in the past. However, we identified other proteins that had not been linked to these processes. A graphical display of the relationship between the differentially accumulated proteins was obtained using pairwise average-linkage cluster analysis and principal component analysis. Proteins such as endopolygalacturonase, catalase, NADP-dependent isocitrate dehydrogenase, pectin methylesterase and dehydrins were found to be very important for distinguishing between healthy and chill injured fruit. A categorization of the differentially accumulated proteins was performed using Gene Ontology annotation. The results showed that the 'response to stress', 'cellular homeostasis', 'metabolism of carbohydrates' and 'amino acid metabolism' biological processes were affected the most during the postharvest. Conclusions: Using a comparative proteomic approach with 2-D DIGE allowed us to identify proteins that showed stage-specific changes in their accumulation pattern. Several proteins that are related to response to stress, cellular homeostasis, cellular component organization and carbohydrate metabolism were detected as being differentially accumulated. Finally, a significant proportion of the proteins identified had not been associated with softening, cold storage or chilling injury-altered fruit before; thus, comparative proteomics has proven to be a valuable tool for understanding fruit softening and postharvest.
Resumo:
We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyse the ratios of the data values. The usual approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property. This weighted log-ratio analysis is theoretically equivalent to spectral mapping , a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modelling. The weighted log-ratio methodology is applied here to frequency data in linguistics and to chemical compositional data in archaeology.
Resumo:
This paper establishes a general framework for metric scaling of any distance measure between individuals based on a rectangular individuals-by-variables data matrix. The method allows visualization of both individuals and variables as well as preserving all the good properties of principal axis methods such as principal components and correspondence analysis, based on the singular-value decomposition, including the decomposition of variance into components along principal axes which provide the numerical diagnostics known as contributions. The idea is inspired from the chi-square distance in correspondence analysis which weights each coordinate by an amount calculated from the margins of the data table. In weighted metric multidimensional scaling (WMDS) we allow these weights to be unknown parameters which are estimated from the data to maximize the fit to the original distances. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing a matrix and displaying its rows and columns in biplots.