911 results for Regression (PCR)
Abstract:
Singular Value Decomposition (SVD), Principal Component Analysis (PCA) and Multiple Linear Regression (MLR) are some of the mathematical preliminaries that are discussed prior to explaining PLS and PCR models. Both PLS and PCR are applied to real spectral data and their differences and similarities are discussed in this thesis. The challenge lies in establishing the optimum number of components to be included in either of the models, but this has been overcome by using various diagnostic tools suggested in this thesis. Correspondence analysis (CA) and PLS were applied to ecological data. The idea of CA was to correlate the macrophyte species and lakes. The differences between the PLS model for ecological data and the PLS model for spectral data are noted and explained in this thesis.
Abstract:
A quantitative structure-activity relationship (QSAR) study of 19 quinone compounds with trypanocidal activity was performed by Partial Least Squares (PLS) and Principal Component Regression (PCR) methods, using a leave-one-out cross-validation procedure to build the regression models. The trypanocidal activity of the compounds is related to their first cathodic potential (Ep(c1)). The PLS and PCR regression models built in this study were also used to predict the Ep(c1) of six new quinone compounds. The PLS model was built with three principal components that described 96.50% of the total variance and presented Q² = 0.83 and R² = 0.90. The results obtained with the PCR model were similar to those obtained with the PLS model. The PCR model was also built with three principal components, which described 96.67% of the total variance, with Q² = 0.83 and R² = 0.90. The most important descriptors for our PLS and PCR models were HOMO-1 (energy of the molecular orbital below HOMO), Q4 (atomic charge at position 4), MAXDN (maximal electrotopological negative difference), and HYF (hydrophilicity index).
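The leave-one-out Q² reported above can be illustrated with a minimal sketch. It assumes the common definition Q² = 1 - PRESS / SS_tot, an SVD-based PCR on mean-centred descriptors, and synthetic data. The function names `pcr_fit_predict` and `q2_leave_one_out` are hypothetical, and nothing here reproduces the authors' software or data.

```python
import numpy as np

def pcr_fit_predict(X_train, y_train, X_test, n_components):
    """Fit PCR on the training set and predict y for X_test."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    # Rows of Vt are the principal directions (loadings) of the centred data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    scores = Xc @ V
    # Ordinary least squares of centred y on the retained scores.
    coef, *_ = np.linalg.lstsq(scores, y_train - y_mean, rcond=None)
    return (X_test - x_mean) @ V @ coef + y_mean

def q2_leave_one_out(X, y, n_components):
    """Q2 = 1 - PRESS / SS_tot, each sample predicted by a model built without it."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        pred = pcr_fit_predict(X[keep], y[keep], X[i:i + 1], n_components)[0]
        press += (y[i] - pred) ** 2
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_tot

# Synthetic stand-in for a descriptor matrix (19 compounds, 8 descriptors).
rng = np.random.default_rng(0)
latent = rng.normal(size=(19, 3))
X_demo = latent @ rng.normal(size=(3, 8)) + 0.01 * rng.normal(size=(19, 8))
y_demo = latent @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=19)
print("Q2 with 3 components:", q2_leave_one_out(X_demo, y_demo, 3))
```

Because each left-out compound is predicted by a model that never saw it, Q² is normally lower than the fitted R², which matches the pattern in the values quoted above (0.83 versus 0.90).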
Abstract:
This paper studied two regression techniques for pelvic shape prediction: partial least squares regression (PLSR) and principal component regression (PCR). Three different predictors (surface landmarks, morphological parameters, and surface models of neighboring structures) were used in a cross-validation study to predict the pelvic shape. Results obtained with the two regression techniques were compared to the population mean model. In almost all the prediction experiments, both regression techniques generated better results than the population mean model, while the difference in prediction accuracy between the two regression methods was not statistically significant (α=0.01).
Abstract:
Water is a limited resource for which demand is growing. Contaminated water from inadequate wastewater treatment provides one of the greatest health challenges as it restricts development and increases poverty in emerging and developing countries. Therefore, the connection between wastewater and human health is linked to access to sanitation and to human waste disposal. Adequate sanitation is expected to create a barrier between disposed human excreta and sources of drinking water. Different approaches to wastewater management are required for different geographical regions and different stages of economic governance depending on the capacity to manage wastewater. Effective wastewater management can contribute to overcome the challenges of water scarcity. Separate collection of human urine at its source is one promising approach that strongly reduces the economic and load demands on wastewater treatment plants (WWTP). Treatment of source-separated urine appears as a sanitation system that is affordable, produces a valuable fertiliser, reduces pollution of water resources and promotes health. However, the technical realisation of urine separation still faces challenges. Biological hydrolysis of urea causes a strong increase of ammonia and pH. Under these conditions ammonia volatilises which can cause odour problems and significant nitrogen losses. The above problems can be avoided by urine stabilisation. Biological nitrification is a suitable process for stabilisation of urine. Urine is a highly concentrated nutrient solution which can lead to strong inhibition effects during bacterial nitrification. This can further lead to process instabilities. The major cause of instability is accumulation of the inhibitory intermediate compound nitrite, which could lead to process breakdown. 
Enhanced on-line nitrite monitoring can be applied in biological source-separated urine nitrification reactors as a sustainable and efficient way to improve reactor performance, avoiding reactor failures and eventual loss of biological activity. Spectrophotometry appears to be a promising candidate for the development and application of on-line nitrite monitoring. Spectroscopic methods together with chemometrics are presented in this work as a powerful tool for the estimation of nitrite concentrations. Principal component regression (PCR) is applied to estimate nitrite concentrations using an immersible UV sensor and off-line spectra acquisition. The effects of particles and of saturation on the UV absorbance spectra are investigated. The analysis leads to the conclusion that (i) saturation has a substantial effect on nitrite estimation; (ii) particles appear to have less impact on nitrite estimation. In addition, improper mixing together with instabilities in the urine nitrification process appears to significantly reduce the performance of the estimation model.
Abstract:
The aim of this work is to present a tutorial on multivariate calibration, a tool that is nowadays necessary in most laboratories but very often misused. The basic concepts of preprocessing, principal component analysis (PCA), principal component regression (PCR) and partial least squares (PLS) are given. The two basic steps of any calibration procedure, model building and validation, are fully discussed. The concepts of cross-validation (to determine the number of factors to be used in the model) and of leverage and studentized residuals (to detect outliers) are given for the validation step. The whole calibration procedure is illustrated using spectra recorded for ternary mixtures of 2,4,6-trinitrophenolate, 2,4-dinitrophenolate and 2,5-dinitrophenolate, followed by the concentration prediction of these three chemical species during a diffusion experiment through a hydrophobic liquid membrane. MATLAB software is used for numerical calculations. Most of the commands for the analysis are provided so that a non-specialist can follow the analysis step by step.
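The outlier diagnostics named above (leverage and studentized residuals) can be sketched as follows. The tutorial itself uses MATLAB; this is a Python illustration under stated assumptions: a PCR model with mean-centred data, leverage defined as 1/n plus the hat-matrix diagonal of the retained scores, and internally studentized residuals. The function name `leverage_and_studentized_residuals` is hypothetical and the data are synthetic.

```python
import numpy as np

def leverage_and_studentized_residuals(X, y, n_components):
    """Return (leverage, studentized residual) per calibration sample for a PCR model."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    Uk = U[:, :n_components]
    scores = Uk * s[:n_components]          # PCA scores T = U_k S_k
    coef, *_ = np.linalg.lstsq(scores, yc, rcond=None)
    resid = yc - scores @ coef
    # Columns of U_k are orthonormal, so T (T'T)^-1 T' = U_k U_k'.
    # The 1/n term accounts for the estimated mean (intercept).
    leverage = 1.0 / n + np.sum(Uk ** 2, axis=1)
    dof = n - n_components - 1
    s_res = np.sqrt(np.sum(resid ** 2) / dof)
    studentized = resid / (s_res * np.sqrt(1.0 - leverage))
    return leverage, studentized

# Synthetic calibration set with one deliberately corrupted reference value.
rng = np.random.default_rng(1)
latent = rng.normal(size=(30, 2))
X_demo = latent @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(30, 6))
y_demo = latent @ np.array([1.0, -1.0]) + 0.05 * rng.normal(size=30)
y_demo[7] += 5.0
h, r = leverage_and_studentized_residuals(X_demo, y_demo, 2)
print("most suspicious sample:", int(np.argmax(np.abs(r))))
```

A sample with high leverage is unusual in its spectrum, while a large studentized residual points to an unusual reference value, so the two are usually inspected together in a single plot.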
Abstract:
Two spectrophotometric methods are described for the simultaneous determination of ezetimibe (EZE) and simvastatin (SIM) in pharmaceutical preparations. The data obtained were evaluated using two different chemometric techniques, Principal Component Regression (PCR) and Partial Least Squares (PLS-1). In these techniques, the concentration data matrix was prepared using mixtures containing these drugs in methanol. The corresponding absorbance data matrix was obtained by measuring absorbances in the zero-order spectra over the range 240 - 300 nm at intervals of Δλ = 1 nm (61 wavelengths). Calibration, or regression, was then obtained from the absorbance data matrix and the concentration data matrix for the prediction of the unknown concentrations of EZE and SIM in their mixture. The procedure did not require any separation step. The linear range was found to be 5 - 20 µg mL⁻¹ for EZE and SIM in both methods. The accuracy and precision of the methods were assessed. These methods were successfully applied to a pharmaceutical tablet preparation, and the results were compared with each other.
Abstract:
In this work we used chemometric tools to classify milk powder samples and quantify their protein content. We applied NIR diffuse reflectance spectroscopy combined with multivariate techniques. First, we carried out an exploratory analysis of the samples by principal component analysis (PCA), followed by classification using soft independent modeling of class analogy (SIMCA). It thus became possible to classify the samples, which were grouped by similarities in their composition. Finally, partial least squares regression (PLS) and principal component regression (PCR) allowed the quantification of protein content in the milk powder samples, compared with the Kjeldahl reference method. A total of 53 samples of milk powder sold in the metropolitan areas of Natal, Salvador and Rio de Janeiro were acquired for analysis. After data pre-treatment, four models were built and employed for classification and quantification of the samples. After being assessed and validated, the methods showed good performance, accuracy and reliability, showing that NIR can serve as a non-invasive technique, since it produces no waste and saves time in analyzing the samples.
Abstract:
Graduate Program in Chemistry - IQ
Abstract:
This article presents an application of the method for simultaneous spectrophotometric determination of divalent copper, manganese and zinc ions to the analysis of a multivitamin/multimineral pharmaceutical product. The method uses 4-(2-pyridylazo)resorcinol (PAR), multivariate calibration and variable selection techniques, and was optimized using the successive projections algorithm (SPA) and the genetic algorithm (GA) to choose the most informative wavelengths for the analysis. With these techniques, it was possible to build calibration models by multiple linear regression (MLR-SPA and MLR-GA). The results obtained were compared with principal component regression (PCR) and partial least squares (PLS) models. The root mean square error of prediction (RMSEP) shows that the models perform similarly when predicting the concentrations of the three analytes in the product. However, the MLR models are simpler, because they require a much smaller number of wavelengths, and are easier to interpret than those based on latent variables.
Abstract:
Over the past few decades, the advantages of the visible-near infrared (VisNIR) diffuse reflectance spectrometer (DRS) method have enabled prediction of soil organic carbon (SOC). In this study, SOC was predicted using regression models for samples taken from three sites (Gununo, Maybar and Anjeni) in Ethiopia. SOC was characterized in the laboratory using conventional wet chemistry and VisNIR-DRS methods. Principal component analysis (PCA), principal component regression (PCR) and partial least squares regression (PLS) models were developed using Unscrambler X 10.2. PCA results show that the first two components accounted for a minimum of 96% of the variation, which increased for individual sites and with data treatments. Correlation (r), coefficient of determination (R²) and residual prediction deviation (RPD) were used to rate the four models built. PLS model (r, R², RPD) values for Anjeni were 0.9, 0.9 and 3.6; for Gununo 0.6, 0.3 and 1.2; for Maybar 0.6, 0.3 and 0.9; and for the three sites combined 0.7, 0.6 and 1.5, respectively. PCR model (r, R², RPD) values for Anjeni were 0.9, 0.8 and 2.7; for Gununo 0.5, 0.3 and 1; for Maybar 0.5, 0.1 and 0.7; and for the three sites combined 0.7, 0.5 and 1.2, respectively. Comparison and testing of the models shows superior performance of PLS over PCR. Models were rated as very poor (Maybar), poor (Gununo and three sites) and excellent (Anjeni). The robust Anjeni model is recommended for prediction of SOC in Ethiopia.
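The three figures of merit used above can be computed as in the following minimal sketch. It assumes the common definitions RPD = SD of the reference values / RMSEP and R² = 1 - SS_res / SS_tot; the abstract does not state which definitions or rating thresholds were used, and the function name `prediction_metrics` and the numbers in the example are hypothetical.

```python
import numpy as np

def prediction_metrics(y_ref, y_pred):
    """Return r, R2, RMSEP and RPD for reference vs. predicted values."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_ref - y_pred
    rmsep = np.sqrt(np.mean(resid ** 2))
    r = np.corrcoef(y_ref, y_pred)[0, 1]
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_ref - y_ref.mean()) ** 2)
    # Assumed definition: sample standard deviation of the reference values
    # divided by the prediction error.
    rpd = np.std(y_ref, ddof=1) / rmsep
    return {"r": r, "R2": r2, "RMSEP": rmsep, "RPD": rpd}

# Made-up reference and predicted values, for illustration only.
metrics = prediction_metrics([1.0, 2.0, 3.0, 4.0, 5.0],
                             [1.1, 1.9, 3.1, 3.9, 5.1])
print(metrics)
```

Under this definition, an RPD near 1 means the prediction error is about as large as the natural spread of the reference values, so the model does little better than predicting the mean. This is consistent with the poor ratings given above to the models with RPD of 1.2 or below.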
Abstract:
Quantitative Structure-Activity Relationship (QSAR) has been applied extensively in predicting the toxicity of Disinfection By-Products (DBPs) in drinking water. Among many toxicological properties, acute and chronic toxicities of DBPs have been widely used in health risk assessment of DBPs. These toxicities are correlated with molecular properties, which are usually correlated with molecular descriptors. The primary goals of this thesis are: (1) to investigate the effects of molecular descriptors (e.g., chlorine number) on molecular properties such as the energy of the lowest unoccupied molecular orbital (ELUMO) via QSAR modelling and analysis; (2) to validate the models by using internal and external cross-validation techniques; (3) to quantify the model uncertainties through Taylor and Monte Carlo Simulation. QSAR analysis is one important way to predict molecular properties such as ELUMO. In this study, the number of chlorine atoms (NCl) and the number of carbon atoms (NC), as well as the energy of the highest occupied molecular orbital (EHOMO), are used as molecular descriptors. There are typically three approaches used in QSAR model development: (1) Linear or Multi-linear Regression (MLR); (2) Partial Least Squares (PLS); and (3) Principal Component Regression (PCR). In QSAR analysis, a critical step is model validation, after QSAR models are established and before they are applied to toxicity prediction. The DBPs to be studied include five chemical classes: chlorinated alkanes, alkenes, and aromatics. In addition, validated QSARs are developed to describe the toxicity of selected groups (i.e., chloro-alkane and aromatic compounds with a nitro- or cyano group) of DBP chemicals to three types of organisms (e.g., Fish, T. pyriformis, and P. pyosphoreum) based on experimental toxicity data from the literature.
The results show that: (1) QSAR models built by MLR, PLS or PCR to predict molecular properties can be used either to select valid data points or to eliminate outliers; (2) the Leave-One-Out Cross-Validation procedure by itself is not enough to give a reliable representation of the predictive ability of the QSAR models; however, Leave-Many-Out/K-fold cross-validation and external validation can be applied together to achieve more reliable results; (3) ELUMO is shown to correlate highly with NCl for several classes of DBPs; and (4) according to uncertainty analysis using the Taylor method, the uncertainty of the QSAR models comes mostly from NCl for all DBP classes.
Abstract:
Despite the central role of quantitative PCR (qPCR) in the quantification of mRNA transcripts, most analyses of qPCR data are still delegated to the software that comes with the qPCR apparatus. This is especially true for the handling of the fluorescence baseline. This article shows that baseline estimation errors are directly reflected in the observed PCR efficiency values and are thus propagated exponentially in the estimated starting concentrations as well as 'fold-difference' results. Because of the unknown origin and kinetics of the baseline fluorescence, the fluorescence values monitored in the initial cycles of the PCR reaction cannot be used to estimate a useful baseline value. An algorithm that estimates the baseline by reconstructing the log-linear phase downward from the early plateau phase of the PCR reaction was developed and shown to lead to very reproducible PCR efficiency values. PCR efficiency values were determined per sample by fitting a regression line to a subset of data points in the log-linear phase. The variability, as well as the bias, in qPCR results was significantly reduced when the mean of these PCR efficiencies per amplicon was used in the calculation of an estimate of the starting concentration per sample.
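The per-sample efficiency fit described above can be sketched as follows. This is not the baseline-estimation algorithm of the article. It assumes the fluorescence has already been baseline-corrected and that the log-linear window has been chosen by hand, and it uses the standard exponential model F_c = N0 · E^c. The function names are hypothetical and the data are simulated.

```python
import numpy as np

def efficiency_from_log_linear(cycles, fluorescence):
    """Fit log10(F) = log10(N0) + cycle * log10(E) over a log-linear window."""
    cycles = np.asarray(cycles, dtype=float)
    log_f = np.log10(np.asarray(fluorescence, dtype=float))
    slope, intercept = np.polyfit(cycles, log_f, 1)
    efficiency = 10.0 ** slope   # fold increase per cycle (2.0 = perfect doubling)
    n0 = 10.0 ** intercept       # fluorescence extrapolated back to cycle 0
    return efficiency, n0

def starting_concentration(threshold, cq, mean_efficiency):
    """N0 = Nq / E^Cq, with E taken as the mean efficiency of the amplicon."""
    return threshold / mean_efficiency ** cq

# Simulated, baseline-corrected readings in the log-linear phase.
cycles = np.arange(15, 21)
fluor = 1e-6 * 1.9 ** cycles
eff, n0 = efficiency_from_log_linear(cycles, fluor)
print("efficiency:", eff, "N0:", n0)
```

Because N0 depends on E raised to the power Cq, a small error in the efficiency is amplified exponentially. This is the reason the abstract gives for averaging efficiencies per amplicon before computing starting concentrations.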
Abstract:
Colorectal cancer (CRC) is the second leading cause of cancer-related death in developed countries. Early detection of CRC leads to decreased CRC mortality. A blood-based CRC screening test is highly desirable due to limited invasiveness and high acceptance rate among patients compared to currently used fecal occult blood testing and colonoscopy. Here we describe the discovery and validation of a 29-gene panel in peripheral blood mononuclear cells (PBMC) for the detection of CRC and adenomatous polyps (AP). Blood samples were prospectively collected from a multicenter, case-control clinical study. First, we profiled 93 samples with 667 candidate and 3 reference genes by high throughput real-time PCR (OpenArray system). After analysis, 160 genes were retained and tested again on 51 additional samples. Low expressed and unstable genes were discarded resulting in a final dataset of 144 samples profiled with 140 genes. To define which genes, alone or in combinations had the highest potential to discriminate AP and/or CRC from controls, data were analyzed by a combination of univariate and multivariate methods. A list of 29 potentially discriminant genes was compiled and evaluated for its predictive accuracy by penalized logistic regression and bootstrap. This method discriminated AP >1cm and CRC from controls with a sensitivity of 59% and 75%, respectively, with 91% specificity. The behavior of the 29-gene panel was validated with a LightCycler 480 real-time PCR platform, commonly adopted by clinical laboratories. In this work we identified a 29-gene panel expressed in PBMC that can be used for developing a novel minimally-invasive test for accurate detection of AP and CRC using a standard real-time PCR platform.
Abstract:
DNA extraction is a critical step in the analysis of Genetically Modified Organisms based on real-time PCR. In this study, the CTAB and DNeasy methods provided DNA of good quality and quantity from the texturized soy protein, infant formula, and soy milk samples. For the Certified Reference Material consisting of 5% Roundup Ready® soybean, neither method yielded DNA of good quality. However, the dilution test applied to the CTAB extracts showed no interference from inhibitory substances. The PCR efficiencies of lectin target amplification were not statistically different, and the coefficients of correlation (R²) demonstrated a high degree of correlation between the copy numbers and the threshold cycle (Ct) values. ANOVA showed suitable adjustment of the regression and absence of significant linear deviations. The efficiencies of the p35S amplification were not statistically different, and all R² values using DNeasy extracts were above 0.98 with no significant linear deviations. Two out of three R² values using CTAB extracts were lower than 0.98, corresponding to a lower degree of correlation, and the lack-of-fit test showed significant linear deviation in one run. The comparative analysis of the Ct values for the p35S and lectin targets demonstrated no statistically significant differences between the analytical curves of each target.
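The analytical (standard) curves mentioned above relate Ct to the logarithm of the copy number. A minimal sketch of how the slope, R² and amplification efficiency are commonly derived from such a curve is given below. It assumes the widely used relation E = 10^(-1/slope) - 1, which the abstract does not state explicitly. The function name `standard_curve` is hypothetical and the dilution series is simulated.

```python
import numpy as np

def standard_curve(copy_numbers, ct_values):
    """Linear regression of Ct on log10(copy number); returns slope, intercept, R2, efficiency."""
    x = np.log10(np.asarray(copy_numbers, dtype=float))
    ct = np.asarray(ct_values, dtype=float)
    slope, intercept = np.polyfit(x, ct, 1)
    fitted = slope * x + intercept
    r2 = 1.0 - np.sum((ct - fitted) ** 2) / np.sum((ct - ct.mean()) ** 2)
    # Assumed relation: a slope of about -3.32 corresponds to doubling
    # every cycle, i.e. an efficiency of 1.0 (100%).
    efficiency = 10.0 ** (-1.0 / slope) - 1.0
    return slope, intercept, r2, efficiency

# Simulated ten-fold dilution series with ideal doubling per cycle.
copies = np.array([1e1, 1e2, 1e3, 1e4, 1e5])
ct = 40.0 - np.log10(copies) / np.log10(2.0)
print(standard_curve(copies, ct))
```

An R² threshold such as the 0.98 quoted above is then a check on how well the dilution series follows this straight line.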