144 resultados para Partial least square regression

em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Medium density fiberboard (MDF) is an engineered wood product formed by breaking down selected lignin-cellulosic material residuals into fibers, combining it with wax and a resin binder, and then forming panels by applying high temperature and pressure. Because the raw material in the industrial process is ever-changing, the panel industry requires methods for monitoring the composition of their products. The aim of this study was to estimate the ratio of sugarcane (SC) bagasse to Eucalyptus wood in MDF panels using near infrared (NIR) spectroscopy. Principal component analysis (PCA) and partial least square (PLS) regressions were performed. MDF panels having different bagasse contents were easily distinguished from each other by the PCA of their NIR spectra with clearly different patterns of response. The PLS-R models for SC content of these MDF samples presented a strong coefficient of determination (0.96) between the NIR-predicted and Lab-determined values and a low standard error of prediction (similar to 1.5%) in the cross-validations. A key role of resins (adhesives), cellulose, and lignin for such PLS-R calibrations was shown. PLS-DA model correctly classified ninety-four percent of MDF samples by cross-validations and ninety-eight percent of the panels by independent test set. These NIR-based models can be useful to quickly estimate sugarcane bagasse vs. Eucalyptus wood content ratio in unknown MDF samples and to verify the quality of these engineered wood products in an online process.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Chlorpheniramine maleate (CLOR) enantiomers were quantified by ultraviolet spectroscopy and partial least squares regression. The CLOR enantiomers were prepared as inclusion complexes with beta-cyclodextrin and 1-butanol with mole fractions in the range from 50 to 100%. For the multivariate calibration the outliers were detected and excluded and variable selection was performed by interval partial least squares and a genetic algorithm. Figures of merit showed results for accuracy of 3.63 and 2.83% (S)-CLOR for root mean square errors of calibration and prediction, respectively. The ellipse confidence region included the point for the intercept and the slope of 1 and 0, respectively. Precision and analytical sensitivity were 0.57 and 0.50% (S)-CLOR, respectively. The sensitivity, selectivity, adjustment, and signal-to-noise ratio were also determined. The model was validated by a paired t test with the results obtained by high-performance liquid chromatography proposed by the European pharmacopoeia and circular dichroism spectroscopy. The results showed there was no significant difference between the methods at the 95% confidence level, indicating that the proposed method can be used as an alternative to standard procedures for chiral analysis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The problem of projecting multidimensional data into lower dimensions has been pursued by many researchers due to its potential application to data analyses of various kinds. This paper presents a novel multidimensional projection technique based on least square approximations. The approximations compute the coordinates of a set of projected points based on the coordinates of a reduced number of control points with defined geometry. We name the technique Least Square Projections ( LSP). From an initial projection of the control points, LSP defines the positioning of their neighboring points through a numerical solution that aims at preserving a similarity relationship between the points given by a metric in mD. In order to perform the projection, a small number of distance calculations are necessary, and no repositioning of the points is required to obtain a final solution with satisfactory precision. The results show the capability of the technique to form groups of points by degree of similarity in 2D. We illustrate that capability through its application to mapping collections of textual documents from varied sources, a strategic yet difficult application. LSP is faster and more accurate than other existing high-quality methods, particularly where it was mostly tested, that is, for mapping text sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes a chemotaxonomic analysis of a database of triterpenoid compounds from the Celastraceae family using principal component analysis (PCA). The numbers of occurrences of thirty types of triterpene skeleton in different tribes of the family were used as variables. The study shows that PCA applied to chemical data can contribute to an intrafamilial classification of Celastraceae, once some questionable taxa affinity was observed, from chemotaxonomic inferences about genera and they are in agreement with the phylogeny previously proposed. The inclusion of Hippocrateaceae within Celastraceae is supported by the triterpene chemistry.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The concentration of 15 polycyclic aromatic hydrocarbons (PAHs) in 57 samples of distillates (cachaça, rum, whiskey, and alcohol fuel) has been determined by HPLC-Fluorescence detection. The quantitative analytical profile of PAHs treated by Partial Least Square - Discriminant Analysis (PLS-DA) provided a good classification of the studied spirits based on their PAHs content. Additionally, the classification of the sugar cane derivatives according to the harvest practice was obtained treating the analytical data by Linear Discriminant Analysis (LDA), using naphthalene, acenaphthene, fluorene, phenanthrene, anthracene, fluoranthene, pyrene, benz[a]anthracene, benz[b]fluoranthene, and benz[g,h,i]perylene, as a chemical descriptors.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

One hundred fifteen cachaça samples derived from distillation in copper stills (73) or in stainless steels (42) were analyzed for thirty five itens by chromatography and inductively coupled plasma optical emission spectrometry. The analytical data were treated through Factor Analysis (FA), Partial Least Square Discriminant Analysis (PLS-DA) and Quadratic Discriminant Analysis (QDA). The FA explained 66.0% of the database variance. PLS-DA showed that it is possible to distinguish between the two groups of cachaças with 52.8% of the database variance. QDA was used to build up a classification model using acetaldehyde, ethyl carbamate, isobutyl alcohol, benzaldehyde, acetic acid and formaldehyde as chemical descriptors. The model presented 91.7% of accuracy on predicting the apparatus in which unknown samples were distilled.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Natural products have widespread biological activities, including inhibition of mitochondrial enzyme systems. Some of these activities, for example cytotoxicity, may be the result of alteration of cellular bioenergetics. Based on previous computer-aided drug design (CADD) studies and considering reported data on structure-activity relationships (SAR), an assumption regarding the mechanism of action of natural products against parasitic infections involves the NADH-oxidase inhibition. In this study, chemometric tools, such as: Principal Component Analysis (PCA), Consensus PCA (CPCA), and partial least squares regression (PLS), were applied to a set of forty natural compounds, acting as NADH-oxidase inhibitors. The calculations were performed using the VolSurf+ program. The formalisms employed generated good exploratory and predictive results. The independent variables or descriptors having a hydrophobic profile were strongly correlated to the biological data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

P>Soil bulk density values are needed to convert organic carbon content to mass of organic carbon per unit area. However, field sampling and measurement of soil bulk density are labour-intensive, costly and tedious. Near-infrared reflectance spectroscopy (NIRS) is a physically non-destructive, rapid, reproducible and low-cost method that characterizes materials according to their reflectance in the near-infrared spectral region. The aim of this paper was to investigate the ability of NIRS to predict soil bulk density and to compare its performance with published pedotransfer functions. The study was carried out on a dataset of 1184 soil samples originating from a reforestation area in the Brazilian Amazon basin, and conventional soil bulk density values were obtained with metallic ""core cylinders"". The results indicate that the modified partial least squares regression used on spectral data is an alternative method for soil bulk density predictions to the published pedotransfer functions tested in this study. The NIRS method presented the closest-to-zero accuracy error (-0.002 g cm-3) and the lowest prediction error (0.13 g cm-3) and the coefficient of variation of the validation sets ranged from 8.1 to 8.9% of the mean reference values. Nevertheless, further research is required to assess the limits and specificities of the NIRS method, but it may have advantages for soil bulk density predictions, especially in environments such as the Amazon forest.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The application of laser induced breakdown spectrometry (LIBS) aiming the direct analysis of plant materials is a great challenge that still needs efforts for its development and validation. In this way, a series of experimental approaches has been carried out in order to show that LIBS can be used as an alternative method to wet acid digestions based methods for analysis of agricultural and environmental samples. The large amount of information provided by LIBS spectra for these complex samples increases the difficulties for selecting the most appropriated wavelengths for each analyte. Some applications have suggested that improvements in both accuracy and precision can be achieved by the application of multivariate calibration in LIBS data when compared to the univariate regression developed with line emission intensities. In the present work, the performance of univariate and multivariate calibration, based on partial least squares regression (PLSR), was compared for analysis of pellets of plant materials made from an appropriate mixture of cryogenically ground samples with cellulose as the binding agent. The development of a specific PLSR model for each analyte and the selection of spectral regions containing only lines of the analyte of interest were the best conditions for the analysis. In this particular application, these models showed a similar performance. but PLSR seemed to be more robust due to a lower occurrence of outliers in comparison to the univariate method. Data suggests that efforts dealing with sample presentation and fitness of standards for LIBS analysis must be done in order to fulfill the boundary conditions for matrix independent development and validation. (C) 2009 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Tuberculosis is an infection caused mainly by Mycobacterium tuberculosis. A first-line antimycobacterial drug is pyrazinamide (PZA), which acts partially as a prodrug activated by a pyrazinamidase releasing the active agent, pyrazinoic acid (POA). As pyrazinoic acid presents some difficulty to cross the mycobacterial cell wall, and also the pyrazinamide-resistant strains do not express the pyrazinamidase, a set of pyrazinoic acid esters have been evaluated as antimycobacterial agents. In this work, a QSAR approach was applied to a set of forty-three pyrazinoates against M. tuberculosis ATCC 27294, using genetic algorithm function and partial least squares regression (WOLF 5.5 program). The independent variables selected were the Balaban index (I), calculated n-octanol/water partition coefficient (ClogP), van-der-Waals surface area, dipole moment, and stretching-energy contribution. The final QSAR model (N = 32, r(2) = 0.68, q(2) = 0.59, LOF = 0.25, and LSE = 0.19) was fully validated employing leave-N-out cross-validation and y-scrambling techniques. The test set (N = 11) presented an external prediction power of 73%. In conclusion, the QSAR model generated can be used as a valuable tool to optimize the activity of future pyrazinoic acid esters in the designing of new antituberculosis agents.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Histamine is an important biogenic amine, which acts with a group of four G-protein coupled receptors (GPCRs), namely H(1) to H(4) (H(1)R - H(4)R) receptors. The actions of histamine at H(4)R are related to immunological and inflammatory processes, particularly in pathophysiology of asthma, and H(4)R ligands having antagonistic properties could be helpful as antiinflammatory agents. In this work, molecular modeling and QSAR studies of a set of 30 compounds, indole and benzimidazole derivatives, as H(4)R antagonists were performed. The QSAR models were built and optimized using a genetic algorithm function and partial least squares regression (WOLF 5.5 program). The best QSAR model constructed with training set (N = 25) presented the following statistical measures: r (2) = 0.76, q (2) = 0.62, LOF = 0.15, and LSE = 0.07, and was validated using the LNO and y-randomization techniques. Four of five compounds of test set were well predicted by the selected QSAR model, which presented an external prediction power of 80%. These findings can be quite useful to aid the designing of new anti-H(4) compounds with improved biological response.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

To identify chemical descriptors to distinguish Cuban from non-Cuban rums, analyses of 44 samples of rum from 15 different countries are described. To provide the chemical descriptors, analyses of the the mineral fraction, phenolic compounds, caramel, alcohols, acetic acid, ethyl acetate, ketones, and aldehydes were carried out. The analytical data were treated through the following chemometric methods: principal component analysis (PCA), partial least square-discriminate analysis (PLS-DA), and linear discriminate analysis (LDA). These analyses indicated 23 analytes as relevant chemical descriptors for the separation of rums into two distinct groups. The possibility of clustering the rum samples investigated through PCA analysis led to an accumulative percentage of 70.4% in the first three principal components, and isoamyl alcohol, n-propyl alcohol, copper, iron, 2-furfuraldehyde (furfuraldehyde), phenylmethanal (benzaldehyde), epicatechin, and vanillin were used as chemical descriptors. By applying the PLS-DA technique to the whole set of analytical data, the following analytes have been selected as descriptors: acetone, sec-butyl alcohol, isobutyl alcohol, ethyl acetate, methanol, isoamyl alcohol, magnesium, sodium, lead, iron, manganese, copper, zinc, 4-hydroxy3,5-dimethoxybenzaldehyde (syringaldehyde), methaldehyde (formaldehyde), 5-hydroxymethyl-2furfuraldehyde (5-HMF), acetalclehyde, 2-furfuraldehyde, 2-butenal (crotonaldehyde), n-pentanal (valeraldehyde), iso-pentanal (isovaleraldehyde), benzaldehyde, 2,3-butanodione monoxime, acetylacetone, epicatechin, and vanillin. By applying the LIDA technique, a model was developed, and the following analytes were selected as descriptors: ethyl acetate, sec-butyl alcohol, n-propyl alcohol, n-butyl alcohol, isoamyl alcohol, isobutyl alcohol, caramel, catechin, vanillin, epicatechin, manganese, acetalclehyde, 4-hydroxy-3-methoxybenzoic acid, 2-butenal, 4-hydroxy-3,5-dimethoxybenzoic acid, cyclopentanone, acetone, lead, zinc, calcium, barium, strontium, and sodium. This model allowed the discrimination of Cuban rums from the others with 88.2% accuracy.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nesse artigo, tem-se o interesse em avaliar diferentes estratégias de estimação de parâmetros para um modelo de regressão linear múltipla. Para a estimação dos parâmetros do modelo foram utilizados dados de um ensaio clínico em que o interesse foi verificar se o ensaio mecânico da propriedade de força máxima (EM-FM) está associada com a massa femoral, com o diâmetro femoral e com o grupo experimental de ratas ovariectomizadas da raça Rattus norvegicus albinus, variedade Wistar. Para a estimação dos parâmetros do modelo serão comparadas três metodologias: a metodologia clássica, baseada no método dos mínimos quadrados; a metodologia Bayesiana, baseada no teorema de Bayes; e o método Bootstrap, baseado em processos de reamostragem.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The aim of this study was to compare REML/BLUP and Least Square procedures in the prediction and estimation of genetic parameters and breeding values in soybean progenies. F(2:3) and F(4:5) progenies were evaluated in the 2005/06 growing season and the F(2:4) and F(4:6) generations derived thereof were evaluated in 2006/07. These progenies were originated from two semi-early, experimental lines that differ in grain yield. The experiments were conducted in a lattice design and plots consisted of a 2 m row, spaced 0.5 m apart. The trait grain yield per plot was evaluated. It was observed that early selection is more efficient for the discrimination of the best lines from the F(4) generation onwards. No practical differences were observed between the least square and REML/BLUP procedures in the case of the models and simplifications for REML/BLUP used here.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: There are several studies in the literature depicting measurement error in gene expression data and also, several others about regulatory network models. However, only a little fraction describes a combination of measurement error in mathematical regulatory networks and shows how to identify these networks under different rates of noise. Results: This article investigates the effects of measurement error on the estimation of the parameters in regulatory networks. Simulation studies indicate that, in both time series (dependent) and non-time series (independent) data, the measurement error strongly affects the estimated parameters of the regulatory network models, biasing them as predicted by the theory. Moreover, when testing the parameters of the regulatory network models, p-values computed by ignoring the measurement error are not reliable, since the rate of false positives are not controlled under the null hypothesis. In order to overcome these problems, we present an improved version of the Ordinary Least Square estimator in independent (regression models) and dependent (autoregressive models) data when the variables are subject to noises. Moreover, measurement error estimation procedures for microarrays are also described. Simulation results also show that both corrected methods perform better than the standard ones (i.e., ignoring measurement error). The proposed methodologies are illustrated using microarray data from lung cancer patients and mouse liver time series data. Conclusions: Measurement error dangerously affects the identification of regulatory network models, thus, they must be reduced or taken into account in order to avoid erroneous conclusions. This could be one of the reasons for high biological false positive rates identified in actual regulatory network models.