42 results for probabilistic principal component analysis (probabilistic PCA)


Relevance:

100.00% 100.00%

Abstract:

Biplots are graphical displays of data matrices based on the decomposition of a matrix as the product of two matrices. Elements of these two matrices are used as coordinates for the rows and columns of the data matrix, with an interpretation of the joint presentation that relies on the properties of the scalar product. Because the decomposition is not unique, there are several alternative ways to scale the row and column points of the biplot, which can cause confusion amongst users, especially when software packages are not united in their approach to this issue. We propose a new scaling of the solution, called the standard biplot, which applies equally well to a wide variety of analyses such as correspondence analysis, principal component analysis, log-ratio analysis and the graphical results of a discriminant analysis/MANOVA, in fact to any method based on the singular-value decomposition. The standard biplot also handles data matrices with widely different levels of inherent variance. Two concepts taken from correspondence analysis are important to this idea: the weighting of row and column points, and the contributions made by the points to the solution. In the standard biplot one set of points, usually the rows of the data matrix, optimally represent the positions of the cases or sample units, which are weighted and usually standardized in some way unless the matrix contains values that are comparable in their raw form. The other set of points, usually the columns, is represented in accordance with their contributions to the low-dimensional solution. As for any biplot, the projections of the row points onto vectors defined by the column points approximate the centred and (optionally) standardized data. 
The method is illustrated with several examples to demonstrate how the standard biplot copes in different situations to give a joint map which needs only one common scale on the principal axes, thus avoiding the problem of enlarging or contracting the scale of one set of points to make the biplot readable. The proposal also solves the problem in correspondence analysis of low-frequency categories that are located on the periphery of the map, giving the false impression that they are important.
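The non-uniqueness of the decomposition mentioned in this abstract can be illustrated with a minimal NumPy sketch. It does not implement the standard biplot scaling the authors propose. The function name `biplot_coordinates`, the scaling exponent `alpha` and the random data are assumptions made for illustration only. The sketch shows that two different splits of the singular values give different row and column points but the same scalar products.

```python
import numpy as np

def biplot_coordinates(X, alpha, rank=2):
    """Row and column coordinates from the SVD of the column-centred matrix.

    alpha is a hypothetical scaling exponent: the singular values are split
    as s**alpha (rows) and s**(1 - alpha) (columns).
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    rows = U[:, :rank] * s[:rank] ** alpha
    cols = Vt[:rank].T * s[:rank] ** (1.0 - alpha)
    return rows, cols

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))            # made-up data, 8 cases by 3 variables
Xc = X - X.mean(axis=0)

# Two different scalings of the same full-rank decomposition.
rows_a, cols_a = biplot_coordinates(X, alpha=1.0, rank=3)
rows_b, cols_b = biplot_coordinates(X, alpha=0.0, rank=3)

# The scalar products agree and reproduce the centred data,
# even though the point coordinates themselves differ.
same_product = np.allclose(rows_a @ cols_a.T, rows_b @ cols_b.T)
reproduces_data = np.allclose(rows_a @ cols_a.T, Xc)
```

With `rank=2` the same products would only approximate the centred data, which is the usual situation in a two-dimensional display.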

Relevance:

100.00% 100.00%

Abstract:

In order to interpret the biplot it is necessary to know which points, usually variables, are the important contributors to the solution, and this information is available separately as part of the biplot's numerical results. We propose a new scaling of the display, called the contribution biplot, which incorporates this diagnostic directly into the graphical display, showing visually the important contributors and thus facilitating the biplot interpretation and often simplifying the graphical representation considerably. The contribution biplot can be applied to a wide variety of analyses such as correspondence analysis, principal component analysis, log-ratio analysis and the graphical results of a discriminant analysis/MANOVA, in fact to any method based on the singular-value decomposition. In the contribution biplot one set of points, usually the rows of the data matrix, optimally represents the spatial positions of the cases or sample units, according to some distance measure that usually incorporates some form of standardization unless all data are comparable in scale. The other set of points, usually the columns, is represented by vectors that are related to their contributions to the low-dimensional solution. A fringe benefit is that usually only one common scale for row and column points is needed on the principal axes, thus avoiding the problem of enlarging or contracting the scale of one set of points to make the biplot legible. Furthermore, this version of the biplot also solves the problem in correspondence analysis of low-frequency categories that are located on the periphery of the map, giving the false impression that they are important, when they are in fact contributing minimally to the solution.
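As a rough illustration of the scaling described in this abstract, the following NumPy sketch treats the simplest case: an unweighted principal component analysis with equal row weights 1/n, unit column weights and no standardization. These choices, the function name `contribution_biplot` and the random data are assumptions, not details taken from the abstract. Rows are placed in principal coordinates, and columns are placed at the right singular vectors, whose squared elements sum to one on each axis and can therefore be read as contributions to that axis.

```python
import numpy as np

def contribution_biplot(X):
    """Sketch of contribution-style coordinates for an unweighted PCA.

    Assumes equal row weights 1/n and unit column weights (illustrative).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                 # centre each variable
    S = Xc / np.sqrt(n)                     # apply row weights 1/n
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    row_principal = np.sqrt(n) * U * s      # positions of the cases
    col_contribution = Vt.T                 # squared entries sum to 1 per axis
    return row_principal, col_contribution, s

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))                # made-up data
rows, cols, s = contribution_biplot(X)

axis_totals = (cols ** 2).sum(axis=0)       # one total per principal axis
reconstruction = rows @ cols.T              # projections of rows onto columns
```

Because `rows @ cols.T` reproduces the centred data when all axes are kept, both sets of points can share one scale on the principal axes in this sketch.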

Relevance:

100.00% 100.00%

Abstract:

A biplot, which is the multivariate generalization of the two-variable scatterplot, can be used to visualize the results of many multivariate techniques, especially those that are based on the singular value decomposition. We consider data sets consisting of continuous-scale measurements, their fuzzy coding and the biplots that visualize them, using a fuzzy version of multiple correspondence analysis. Of special interest is the way quality of fit of the biplot is measured, since it is well-known that regular (i.e., crisp) multiple correspondence analysis seriously under-estimates this measure. We show how the results of fuzzy multiple correspondence analysis can be defuzzified to obtain estimated values of the original data, and prove that this implies an orthogonal decomposition of variance. This permits a measure of fit to be calculated in the familiar form of a percentage of explained variance, which is directly comparable to the corresponding fit measure used in principal component analysis of the original data. The approach is motivated initially by its application to a simulated data set, showing how the fuzzy approach can lead to diagnosing nonlinear relationships, and finally it is applied to a real set of meteorological data.
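Fuzzy coding and defuzzification can be sketched in a few lines of plain Python. The version below assumes triangular membership functions defined by a list of increasing hinge points, which is one common form of fuzzy coding and is not confirmed by the abstract. The function names and the hinge values are hypothetical. It shows only that a value coded this way is recovered exactly as the membership-weighted average of the hinge points; it does not reproduce the fuzzy multiple correspondence analysis itself.

```python
def fuzzy_code(x, hinges):
    """Triangular fuzzy coding of x relative to increasing hinge points."""
    memberships = [0.0] * len(hinges)
    if x <= hinges[0]:                      # values below the range are clipped
        memberships[0] = 1.0
        return memberships
    if x >= hinges[-1]:                     # values above the range are clipped
        memberships[-1] = 1.0
        return memberships
    for k in range(len(hinges) - 1):
        lo, hi = hinges[k], hinges[k + 1]
        if lo <= x <= hi:
            w = (x - lo) / (hi - lo)        # linear interpolation weight
            memberships[k] = 1.0 - w
            memberships[k + 1] = w
            break
    return memberships

def defuzzify(memberships, hinges):
    """Membership-weighted average of the hinge points."""
    return sum(m * h for m, h in zip(memberships, hinges))

hinges = [0.0, 5.0, 10.0]                   # hypothetical: minimum, median, maximum
coded = fuzzy_code(2.5, hinges)             # memberships in three fuzzy categories
recovered = defuzzify(coded, hinges)
```

Values outside the hinge range are clipped to the nearest end category, so only values inside the range are recovered exactly.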

Relevance:

100.00% 100.00%

Abstract:

The singular value decomposition and its interpretation as a linear biplot has proved to be a powerful tool for analysing many forms of multivariate data. Here we adapt biplot methodology to the specific case of compositional data consisting of positive vectors each of which is constrained to have unit sum. These relative variation biplots have properties relating to special features of compositional data: the study of ratios, subcompositions and models of compositional relationships. The methodology is demonstrated on a data set consisting of six-part colour compositions in 22 abstract paintings, showing how the singular value decomposition can achieve an accurate biplot of the colour ratios and how possible models interrelating the colours can be diagnosed.
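One way to make the idea concrete is a NumPy sketch that assumes the biplot is built from the centred log-ratio (clr) transform followed by column centring and an SVD. The abstract does not spell out these steps, so the transform, the row-principal scaling, the function names and the random four-part compositions are all assumptions. The sketch illustrates why such a display is about ratios: the difference between two column points corresponds to the centred log-ratio of the two parts.

```python
import numpy as np

def clr(compositions):
    """Centred log-ratio transform: log parts minus the row mean of the logs."""
    logs = np.log(compositions)
    return logs - logs.mean(axis=1, keepdims=True)

def relative_variation_biplot(compositions, rank=2):
    """Row and column coordinates from the SVD of the double-centred logs."""
    Z = clr(compositions)
    Z = Z - Z.mean(axis=0)                  # centre each part across samples
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    rows = U[:, :rank] * s[:rank]           # samples in principal coordinates
    cols = Vt[:rank].T                      # parts; links between them are log-ratios
    return rows, cols

rng = np.random.default_rng(1)
raw = rng.uniform(0.5, 2.0, size=(6, 4))    # made-up positive data
compositions = raw / raw.sum(axis=1, keepdims=True)   # each row sums to 1

rows, cols = relative_variation_biplot(compositions, rank=2)

# With all three non-trivial dimensions kept, the link between parts 0 and 1
# reproduces the centred log-ratio log(x0 / x1) exactly.
rows3, cols3 = relative_variation_biplot(compositions, rank=3)
log_ratio = np.log(compositions[:, 0] / compositions[:, 1])
centred_log_ratio = log_ratio - log_ratio.mean()
link_projection = rows3 @ (cols3[0] - cols3[1])
```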

Relevance:

100.00% 100.00%

Abstract:

We construct a weighted Euclidean distance that approximates any distance or dissimilarity measure between individuals that is based on a rectangular cases-by-variables data matrix. In contrast to regular multidimensional scaling methods for dissimilarity data, the method leads to biplots of individuals and variables while preserving all the good properties of dimension-reduction methods that are based on the singular-value decomposition. The main benefits are the decomposition of variance into components along principal axes, which provide the numerical diagnostics known as contributions, and the estimation of nonnegative weights for each variable. The idea is inspired by the distance functions used in correspondence analysis and in principal component analysis of standardized data, where the normalizations inherent in the distances can be considered as differential weighting of the variables. In weighted Euclidean biplots we allow these weights to be unknown parameters, which are estimated from the data to maximize the fit to the chosen distances or dissimilarities. These weights are estimated using a majorization algorithm. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing the matrix and displaying its rows and columns in biplots.
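The remark that standardization can be seen as differential weighting of the variables can be checked numerically. The NumPy sketch below uses fixed weights equal to the inverse variances and compares the resulting weighted Euclidean distance with the ordinary distance between standardized observations. It does not estimate weights by majorization as the abstract describes. The function name `weighted_euclidean` and the simulated data are illustrative assumptions.

```python
import numpy as np

def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance sqrt(sum_j w_j * (x_j - y_j)**2)."""
    diff = np.asarray(x) - np.asarray(y)
    return float(np.sqrt(np.sum(weights * diff ** 2)))

rng = np.random.default_rng(2)
# Made-up data: three variables on very different scales.
X = rng.normal(loc=[0.0, 10.0, 100.0], scale=[1.0, 5.0, 50.0], size=(12, 3))

sd = X.std(axis=0, ddof=1)
weights = 1.0 / sd ** 2                     # inverse-variance weights

# Distance between the first two cases, computed two ways.
d_weighted = weighted_euclidean(X[0], X[1], weights)
Z = (X - X.mean(axis=0)) / sd               # standardized data
d_standardized = float(np.linalg.norm(Z[0] - Z[1]))
```

The two distances agree, so replacing the inverse variances by weights estimated from the data generalizes the standardized case.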

Relevance:

100.00% 100.00%

Abstract:

A five-year program of systematic multi-element geochemical exploration of the Catalonian Coastal Ranges has been initiated by the Geological Survey of the Autonomous Government of Catalonia (Generalitat de Catalunya) and the Department of Geological and Geophysical Exploration (University of Barcelona). This paper reports the first-stage results of this regional survey, covering an area of 530 km² in the Montseny Mountains, NE of Barcelona (Spain). Stream sediments for metals and stream waters for fluoride were chosen because of the regional characteristics. Four target areas for a future tactical survey were recognized after prospecting. The most important is a 40 km² zone in the Canoves-Vilamajor area, with high base metal values accompanied by Cd, Ni, Co, As and Sb anomalies. Keywords: Catalanides. Geochemical exploration. Stream sediments. Base metal anomalies. Principal Component Analysis.

Relevance:

100.00% 100.00%

Abstract:

The performance of high-resolution accurate mass spectrometry (HRMS) operating in full-scan MS mode was investigated for the quantitative determination of amoxicillin (AMX) as well as qualitative analysis of metabolomic profiles in tissues of medicated chickens. The metabolomic approach was exploited to compile analytical information on changes in the metabolome of muscle, kidney and liver from chickens subjected to a pharmacological program with AMX. Data consisting of m/z features taken throughout the entire chromatogram were extracted and filtered to be treated by Principal Component Analysis. As a result, it was found that medicated and non-treated animals were clearly clustered in distinct groups. In addition, the multivariate analysis revealed some relevant mass features contributing to this separation. In this context, recognizing those potential markers of each chicken class was a research priority for both metabolite identification and the evaluation of food quality and health effects associated with food consumption.

Relevance:

100.00% 100.00%

Abstract:

Background: Differences in the distribution of genotypes between individuals of the same ethnicity are an important confounding factor commonly undervalued in typical association studies conducted in radiogenomics. Objective: To evaluate the genotypic distribution of SNPs in a wide set of Spanish prostate cancer patients in order to determine the homogeneity of the population and to disclose potential bias. Design, Setting, and Participants: A total of 601 prostate cancer patients from Andalusia, Basque Country, Canary and Catalonia were genotyped for 10 SNPs located in 6 different genes associated with DNA repair: XRCC1 (rs25487, rs25489, rs1799782), ERCC2 (rs13181), ERCC1 (rs11615), LIG4 (rs1805388, rs1805386), ATM (rs17503908, rs1800057) and P53 (rs1042522). The SNP genotyping was performed in a Biotrove OpenArray NT Cycler. Outcome Measurements and Statistical Analysis: Comparisons of genotypic and allelic frequencies among populations, as well as haplotype analyses, were performed using the web-based environment SNPator. Principal component analysis was performed using the SnpMatrix and XSnpMatrix classes and methods implemented as an R package. Non-supervised hierarchical clustering of SNPs was performed using MultiExperiment Viewer. Results and Limitations: We observed that the genotype distribution of 4 out of 10 SNPs was statistically different among the studied populations, showing the greatest differences between Andalusia and Catalonia. These observations were confirmed in cluster analysis, principal component analysis and in the differential distribution of haplotypes among the populations. Because tumor characteristics have not been taken into account, it is possible that some polymorphisms may influence tumor characteristics in the same way that they may pose a risk factor for other disease characteristics.
Conclusion: Differences in the distribution of genotypes within different populations of the same ethnicity could be an important confounding factor responsible for the lack of validation of SNPs associated with radiation-induced toxicity, especially when extensive meta-analyses with subjects from different countries are carried out.

Relevance:

100.00% 100.00%

Abstract:

Web 2.0 services such as social bookmarking allow users to manage and share the links they find interesting, adding their own tags for describing them. This is especially interesting in the field of open educational resources, as delicious is a simple way to bridge the institutional point of view (i.e. learning object repositories) with the individual one (i.e. personal collections), thus promoting the discovering and sharing of such resources by other users. In this paper we propose a methodology for analyzing such tags in order to discover hidden semantics (i.e. taxonomies and vocabularies) that can be used to improve descriptions of learning objects and make learning object repositories more visible and discoverable. We propose the use of a simple statistical analysis tool such as principal component analysis to discover which tags create clusters that can be semantically interpreted. We will compare the obtained results with a collection of resources related to open educational resources, in order to better understand the real needs of people searching for open educational resources.

Relevance:

100.00% 100.00%

Abstract:

Macrofossil analysis of a composite 19 m long sediment core from Rano Raraku Lake (Easter Island) was related to litho-sedimentary and geochemical features of the sediment. Strong stratigraphical patterns are shown by indirect gradient analyses of the data. The good correspondence between the stratigraphical patterns derived from macrofossil (Correspondence Analysis) and sedimentary and geochemical data (Principal Component Analysis) shows that macrofossil associations provide sound palaeolimnological information in conjunction with sedimentary data. The main taphonomic factors influencing the macrofossil assemblages are run-off from the catchment, the littoral plant belt, and the depositional environment within the basin. Five main stages during the last 34,000 calibrated years BP (cal yr BP) are characterised from the lithological, geochemical, and macrofossil data. From 34 to 14.6 cal kyr BP (last glacial period) the sediments were largely derived from the catchment, indicating a high energy lake environment with much erosion and run-off bringing abundant plant trichomes, lichens, and mosses into the centre of Raraku Lake.

Relevance:

100.00% 100.00%

Abstract:

This work analyzes sunshine duration variability in the western part of Europe (WEU) over the 1938–2004 period. A principal component analysis is applied to cluster the original series from 79 sites into 6 regions, and annual and seasonal mean series are then constructed at both the regional and the whole-WEU scale. Over the entire period studied here, the linear trend of annual sunshine duration is found to be nonsignificant. However, annual sunshine duration shows an overall decrease from the 1950s until the early 1980s, followed by a subsequent recovery during the last two decades. This behavior is in good agreement with the dimming and brightening phenomena described in previous literature. From the seasonal analysis, the most remarkable results are the similarity between the spring and annual series, although the spring series has a negative trend, and the clear, significant increase found for the whole-WEU winter series, which is especially large since the 1970s. The behavior of the major synoptic patterns for two seasons is investigated, resulting in some indications that sunshine duration evolution may be partially explained by changes in the frequency of some of them.