925 results for probabilistic principal component analysis (probabilistic PCA)
Abstract:
This work presents a tutorial on multivariate calibration, a tool that is now necessary in most laboratories but is very often misused. The basic concepts of preprocessing, principal component analysis (PCA), principal component regression (PCR) and partial least squares (PLS) are given. The two basic steps of any calibration procedure, model building and validation, are fully discussed. For the validation step, the concepts of cross-validation (to determine the number of factors to be used in the model) and of leverage and studentized residuals (to detect outliers) are given. The whole calibration procedure is illustrated using spectra recorded for ternary mixtures of 2,4,6-trinitrophenolate, 2,4-dinitrophenolate and 2,5-dinitrophenolate, followed by the concentration prediction of these three chemical species during a diffusion experiment through a hydrophobic liquid membrane. MATLAB software is used for the numerical calculations. Most of the commands for the analysis are provided so that a non-specialist can follow the analysis step by step.
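The cross-validation step described in this abstract can be sketched as follows. This is a minimal illustration in Python with NumPy, not the tutorial's MATLAB commands: the function names (`pcr_fit`, `pcr_predict`, `loo_press`) and the synthetic three-component spectra are assumptions made for the example. It fits a PCR model for each candidate number of factors and compares the leave-one-out prediction error sum of squares (PRESS).

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression: PCA on centered X, then least squares on the scores."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Rows of Vt are the loading vectors of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    scores = Xc @ V
    q, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    # Return the regression vector expressed in the original variable space.
    return x_mean, y_mean, V @ q

def pcr_predict(model, X):
    x_mean, y_mean, b = model
    return (X - x_mean) @ b + y_mean

def loo_press(X, y, n_components):
    """Leave-one-out cross-validation: refit without sample i, then predict sample i."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        model = pcr_fit(X[keep], y[keep], n_components)
        press += float((pcr_predict(model, X[i:i + 1])[0] - y[i]) ** 2)
    return press

# Synthetic ternary-mixture spectra (illustrative data only, not from the paper).
rng = np.random.default_rng(0)
wavelengths = np.linspace(0.0, 1.0, 60)
pure = np.stack([np.exp(-((wavelengths - c) / 0.12) ** 2) for c in (0.3, 0.5, 0.7)])
conc = rng.uniform(0.1, 1.0, size=(20, 3))
spectra = conc @ pure + rng.normal(scale=1e-3, size=(20, 60))
target = conc[:, 0]  # concentration of the first species

press = {k: loo_press(spectra, target, k) for k in range(1, 6)}
best_k = min(press, key=press.get)
```

With three absorbing species, PRESS is expected to drop sharply once three factors are included and to change little afterwards, which is the usual visual cue for choosing the number of factors.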
Abstract:
One of the major interests in soil analysis is the evaluation of its chemical, physical and biological parameters, which are indicators of soil quality (the most important being organic matter). There is also great interest in the study of humic substances and in the assessment of pollutants, such as pesticides and heavy metals, in soils. Chemometrics is a powerful tool for dealing with these problems and can help soil researchers extract much more information from their data. In spite of this, such strategies have gained prominence in the literature only recently. The use of chemometric methods in soil analysis is evaluated in this article. The applications are divided into four parts (with emphasis on the first two): (i) descriptive and exploratory methods based on Principal Component Analysis (PCA); (ii) multivariate calibration methods (MLR, PCR and PLS); (iii) methods such as Evolving Factor Analysis and SIMPLISMA; and (iv) artificial intelligence methods, such as Artificial Neural Networks.
Abstract:
The ¹H NMR data set of a series of 3-aryl (1,2,4)-oxadiazol-5-carbohydrazide benzylidene derivatives synthesized in our group was analyzed using the chemometric technique of principal component analysis (PCA). Applied to the original ¹H NMR data, PCA identified some misassignments of the aromatic proton chemical shifts. As a consequence of this multivariate analysis, nuclear Overhauser difference experiments were performed to investigate the ambiguity of other assignments of the ortho and meta aromatic hydrogens for the compound with the bromine substituent. The effect of the 1,2,4-oxadiazol group as an electron acceptor, mainly on hydrogens 12 and 13, has been highlighted.
Abstract:
The concentrations of Cu, Pb, Zn, Cr, Ni, Al, Mn and Fe were measured by atomic absorption spectrometry in 19 topsoil samples collected in the urban area of Teresina to discriminate between natural and anthropic contributions and to identify possible sources of pollution. The average concentrations of Cu, Zn, Pb and Cr in the urban soils were 6.11, 8.56, 32.12 and 7.17 mg kg⁻¹, respectively. Statistical analysis techniques, such as principal component analysis (PCA) and hierarchical cluster analysis (HCA), were used to analyze the data. Mn, Ni and Cr levels were interpreted as natural contributions, whereas Pb, Zn and, in part, Cu were accounted for mainly by anthropic activities. High Pb levels were observed in the older avenues.
Abstract:
This study presents a catalogue of synoptic patterns of torrential rainfall in the northeast of the Iberian Peninsula (IP). These circulation patterns were obtained by applying a T-mode Principal Component Analysis (PCA) to a daily grid of sea level pressure (SLP) data from the NCEP/NCAR reanalysis. The analysis used 304 days on which more than 100 mm was recorded at one or more stations in the provinces of Barcelona, Girona and Tarragona (the coastal area of Catalonia) during the 1950-2005 period. The catalogue comprises 7 circulation patterns showing a great variety of atmospheric conditions and of seasonal or monthly distribution. Likewise, we computed the mean value of the Western Mediterranean Oscillation index (WeMOi) for each synoptic pattern by averaging over all days grouped in that pattern. The results showed a clear association between negative values of this teleconnection index and torrential rainfall in the northeast of the IP. We therefore put forward the WeMO as an essential tool for forecasting heavy rainfall in the northeast of Spain.
Abstract:
An activity for introducing hierarchical cluster analysis (HCA) and principal component analysis (PCA) in the Instrumental Analytical Chemistry course is presented. The problem posed involves the discrimination of mineral water samples according to their geographical origin. Thirty-seven samples of 9 different brands were considered, and the results from the determination of Na, K, Mg, Ca, Sr and Ba were taken into account. Unsupervised pattern recognition methods were explored to construct a dendrogram and score and loading plots. The activity can be adopted for introducing Chemometrics applied to data handling, stressing its importance in the context of modern Analytical Chemistry.
Abstract:
In this study, polycyclic aromatic hydrocarbons (PAHs) in atmospheric particulate matter were measured in Araraquara, Piracicaba and São Paulo in July 2003 (sugarcane harvest season in Araraquara and Piracicaba) and in Araraquara in March 2003. The results were normalized to the total PAH concentrations. Comparison among the sites and principal component analysis (PCA) were used to investigate possible emission tracers. Fluoranthene and pyrene concentrations were higher in the Piracicaba and Araraquara samples. These PAHs were also responsible for the largest negative loadings on the second principal component and accounted for the negative scores and for the formation of the Araraquara and Piracicaba group.
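The two processing steps named in this abstract, normalization to the total concentration followed by PCA, can be sketched as below. This is a hedged illustration in Python with NumPy: the helper names (`normalize_to_total`, `pca`) and the random 12 × 5 concentration table are assumptions for the example and do not reproduce the study's data or software.

```python
import numpy as np

def normalize_to_total(conc):
    """Divide each sample (row) by its total so that every profile sums to 1."""
    return conc / conc.sum(axis=1, keepdims=True)

def pca(X, n_components=2):
    """PCA by singular value decomposition of the column-centered data."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                    # variable contributions
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained

# Hypothetical table: 12 samples x 5 compounds, strictly positive concentrations.
rng = np.random.default_rng(1)
conc = rng.lognormal(mean=0.0, sigma=0.5, size=(12, 5))

profiles = normalize_to_total(conc)
scores, loadings, explained = pca(profiles)
```

The sign of each principal component is arbitrary, so a "negative loading" is only meaningful together with the sign of the corresponding scores: samples with negative scores on a component are enriched in the variables with negative loadings on it.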
Abstract:
Energy dispersive X-ray fluorescence (EDXRF) was used to determine Al, Ba, Ca, Cr, Fe, K, Mn, Pb, Rb, S, Si, Sr, Ti, V and Zn in pottery sherds from seven archaeological sites in the central region of Rio Grande do Sul State, Brazil. The chemical fingerprints of the pottery from the Ijuí River, Ibicuí Mirim River, Vacacaí Mirim River and Jacuí River were identified. Interactions between sites on the Jacuí River, Vacacaí Mirim River and Ibicuí Mirim River could have occurred, because some samples from these sites overlap in a principal component analysis (PCA) plot. The pottery provenance could be the same.
Abstract:
Recent years have produced great advances in instrumentation technology. The amount of available data has been increasing due to the simplicity, speed and accuracy of current spectroscopic instruments. Most of these data are, however, meaningless without a proper analysis. This has been one of the reasons for the ever-growing success of multivariate handling of such data. Industrial data are commonly not designed data; in other words, there is no exact experimental design, but rather the data have been collected as a routine procedure during an industrial process. This places certain demands on the multivariate modeling, as the selection of samples and variables can have an enormous effect. Common approaches in the modeling of industrial data are PCA (principal component analysis) and PLS (projection to latent structures or partial least squares), but there are also other methods that should be considered. The more advanced methods include multiblock modeling and nonlinear modeling. In this thesis it is shown that the results of data analysis vary according to the modeling approach used, thus making the selection of the modeling approach dependent on the purpose of the model. If the model is intended to provide accurate predictions, the approach should be different from the case where the purpose of modeling is mostly to obtain information about the variables and the process. For industrial applicability it is essential that the methods are robust and sufficiently simple to apply. In this way the methods and the results can be compared and an approach selected that is suitable for the intended purpose. Differences in data analysis methods are compared with data from different fields of industry in this thesis. In the first two papers, the multiblock method is considered for data originating from the oil and fertilizer industries. The results are compared to those from PLS and priority PLS.
The third paper considers applicability of multivariate models to process control for a reactive crystallization process. In the fourth paper, nonlinear modeling is examined with a data set from the oil industry. The response has a nonlinear relation to the descriptor matrix, and the results are compared between linear modeling, polynomial PLS and nonlinear modeling using nonlinear score vectors.
Abstract:
Psychometric analysis of the AF5 multidimensional scale of self-concept in a sample of adolescents and adults in Catalonia. The aim of this study is to carry out a psychometric study of the AF5 scale in a sample of 4,825 Catalan subjects aged 11 to 63. They are students in compulsory secondary education (ESO), high school, middle-level vocational training (CFGM) and university. Using a principal component analysis (PCA), the theoretical validity of the components is established, and the reliability of the instrument is also analyzed. Differential analyses are performed by gender and normative group using a 2 × 6 factorial design. The normative group variable includes the different levels classified into 6 sub-groups: university, post-compulsory secondary education (high school and CFGM), 4th of ESO, 3rd of ESO, 2nd of ESO and 1st of ESO. The results indicate that the reliability of the Catalan version of the scale is similar to that of the original scale. The factorial structure also fits the original model established beforehand. Significant differences by normative group are observed in the four components of self-concept explored (social, family, academic/occupational and physical). By gender, significant differences appear in the physical, academic and social components of self-concept but not in the family component.
Abstract:
Bulk and supported molybdenum-based catalysts, modified by nickel, phosphorus or tungsten, were studied by NEXAFS spectroscopy at the Mo L III and L II edges. Principal component analysis (PCA) together with linear combination analysis (LCA) allowed the detection and quantification of molybdenum atoms in two different coordination states in the oxide form of the catalysts, namely tetrahedral and octahedral coordination.
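The idea behind a linear combination analysis can be sketched as a least-squares fit of a measured spectrum to a weighted sum of reference spectra. The Python/NumPy example below is an illustration under stated assumptions, not the authors' procedure: the function name `linear_combination_fit`, the Gaussian-shaped reference curves and the 0.3/0.7 mixture are all invented for the example and are not real Mo L-edge spectra.

```python
import numpy as np

def linear_combination_fit(sample, references):
    """Least-squares weights for sample ~ sum_i w_i * references[i], rescaled to sum to 1.

    Non-negativity of the weights is not enforced in this sketch.
    """
    weights, *_ = np.linalg.lstsq(references.T, sample, rcond=None)
    return weights / weights.sum()

# Synthetic reference curves standing in for two coordination states (illustrative only).
energy = np.linspace(-5.0, 15.0, 200)
ref_single = np.exp(-((energy - 1.0) / 1.0) ** 2)
ref_double = 0.6 * np.exp(-((energy - 3.0) / 1.0) ** 2) + 0.6 * np.exp(-((energy - 6.0) / 1.0) ** 2)
references = np.stack([ref_single, ref_double])

# Synthetic "measured" spectrum: a 30/70 mixture plus a little noise.
rng = np.random.default_rng(2)
sample = 0.3 * ref_single + 0.7 * ref_double + rng.normal(scale=1e-3, size=energy.size)

fractions = linear_combination_fit(sample, references)
```

In such a scheme PCA is typically used first to estimate how many independent components the set of spectra contains, and the linear combination fit then quantifies each one.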
Abstract:
This work studies the spatial and seasonal variability of some physicochemical parameters in the Turvo/Grande watershed, São Paulo State, Brazil. Water samples were taken monthly, from July 2007 to November 2008, at fourteen sampling stations located along the Turvo, Preto and Grande Rivers and their main tributaries. Principal component analysis and hierarchical cluster analysis showed two distinct groups in this watershed, the first associated with the sites most affected by domestic effluent (the lowest dissolved oxygen levels in the studied region). The downstream sampling sites (Turvo and Grande Rivers) formed a second group, discriminated by diffuse sources of pollutants from flooding and agricultural runoff.
Abstract:
In this work, the organic compounds of cigar samples from different brands were analyzed. The compounds were extracted using the matrix solid-phase dispersion (MSPD) technique, followed by gas chromatography with identification by mass spectrometry (GC-MS) and by standards, when available. Thirty-eight organic compounds were found in seven different brands. Finally, to characterize and discriminate the cigar samples, multivariate statistical analyses such as principal component analysis (PCA) and hierarchical cluster analysis (HCA) were applied to the data. These analyses made it possible to discriminate three main groups corresponding to three quality levels.
Abstract:
Knowledge of the structural characteristics of organic matter is important for understanding natural processes. In this context, aquatic humic substances (the principal fraction) were isolated from water samples collected from two distinct rivers, using the procedure recommended by the International Humic Substances Society, and characterized by elemental analysis, electron paramagnetic resonance and nuclear magnetic resonance (¹³C NMR). The results were interpreted using principal component analysis (PCA), and the statistical analyses showed differences in the structural characteristics of the aquatic humic substances studied.
Abstract:
Raw measurement data do not always immediately convey useful information, but applying statistical analysis tools to measurement data can improve the situation. Data analysis can offer benefits such as gaining meaningful insight from the dataset, basing critical decisions on the findings, and ruling out human bias through proper statistical treatment. In this thesis we analyze data from an industrial mineral processing plant with the aim of studying the possibility of forecasting the quality of the final product, given by one variable, with a model based on the other variables. For the study, tools such as Qlucore Omics Explorer (QOE) and Sparse Bayesian regression (SB) are used. Linear regression is then used to build a model based on the subset of variables that appear to have the most significant weights in the SB model. The results obtained from QOE show that the variable representing the desired final product does not correlate with the other variables. The results for SB and linear regression show that both models built on 1-day averaged data seriously underestimate the variance of the true data, whereas the two models built on 1-month averaged data are reliable and able to explain a larger proportion of the variability in the available data, making them suitable for prediction purposes. However, it is concluded that no single model fits the whole available dataset well. For future work it is therefore proposed either to build piecewise nonlinear regression models if the same dataset is used, or for the plant to provide another dataset, collected in a more systematic fashion than the present data, for further analysis.