920 resultados para Least-Squares prediction
Resumo:
Negative-ion mode electrospray ionization, ESI(-), with Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) was coupled to a Partial Least Squares (PLS) regression and variable selection methods to estimate the total acid number (TAN) of Brazilian crude oil samples. Generally, ESI(-)-FT-ICR mass spectra present a power of resolution of ca. 500,000 and a mass accuracy less than 1 ppm, producing a data matrix containing over 5700 variables per sample. These variables correspond to heteroatom-containing species detected as deprotonated molecules, [M - H](-) ions, which are identified primarily as naphthenic acids, phenols and carbazole analog species. The TAN values for all samples ranged from 0.06 to 3.61 mg of KOH g(-1). To facilitate the spectral interpretation, three methods of variable selection were studied: variable importance in the projection (VIP), interval partial least squares (iPLS) and elimination of uninformative variables (UVE). The UVE method seems to be more appropriate for selecting important variables, reducing the dimension of the variables to 183 and producing a root mean square error of prediction of 0.32 mg of KOH g(-1). By reducing the size of the data, it was possible to relate the selected variables with their corresponding molecular formulas, thus identifying the main chemical species responsible for the TAN values.
Resumo:
The aim of this study was to compare REML/BLUP and Least Square procedures in the prediction and estimation of genetic parameters and breeding values in soybean progenies. F(2:3) and F(4:5) progenies were evaluated in the 2005/06 growing season and the F(2:4) and F(4:6) generations derived thereof were evaluated in 2006/07. These progenies were originated from two semi-early, experimental lines that differ in grain yield. The experiments were conducted in a lattice design and plots consisted of a 2 m row, spaced 0.5 m apart. The trait grain yield per plot was evaluated. It was observed that early selection is more efficient for the discrimination of the best lines from the F(4) generation onwards. No practical differences were observed between the least square and REML/BLUP procedures in the case of the models and simplifications for REML/BLUP used here.
Resumo:
Visible and near infrared (vis-NIR) spectroscopy is widely used to detect soil properties. The objective of this study is to evaluate the combined effect of moisture content (MC) and the modeling algorithm on prediction of soil organic carbon (SOC) and pH. Partial least squares (PLS) and the Artificial neural network (ANN) for modeling of SOC and pH at different MC levels were compared in terms of efficiency in prediction of regression. A total of 270 soil samples were used. Before spectral measurement, dry soil samples were weighed to determine the amount of water to be added by weight to achieve the specified gravimetric MC levels of 5, 10, 15, 20, and 25 %. A fiber-optic vis-NIR spectrophotometer (350-2500 nm) was used to measure spectra of soil samples in the diffuse reflectance mode. Spectra preprocessing and PLS regression were carried using Unscrambler® software. Statistica® software was used for ANN modeling. The best prediction result for SOC was obtained using the ANN (RMSEP = 0.82 % and RPD = 4.23) for soil samples with 25 % MC. The best prediction results for pH were obtained with PLS for dry soil samples (RMSEP = 0.65 % and RPD = 1.68) and soil samples with 10 % MC (RMSEP = 0.61 % and RPD = 1.71). Whereas the ANN showed better performance for SOC prediction at all MC levels, PLS showed better predictive accuracy of pH at all MC levels except for 25 % MC. Therefore, based on the data set used in the current study, the ANN is recommended for the analyses of SOC at all MC levels, whereas PLS is recommended for the analysis of pH at MC levels below 20 %.
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
This paper studied two different regression techniques for pelvic shape prediction, i.e., the partial least square regression (PLSR) and the principal component regression (PCR). Three different predictors such as surface landmarks, morphological parameters, or surface models of neighboring structures were used in a cross-validation study to predict the pelvic shape. Results obtained from applying these two different regression techniques were compared to the population mean model. In almost all the prediction experiments, both regression techniques unanimously generated better results than the population mean model, while the difference on prediction accuracy between these two regression methods is not statistically significant (α=0.01).
Resumo:
The accurate identification of T-cell epitopes remains a principal goal of bioinformatics within immunology. As the immunogenicity of peptide epitopes is dependent on their binding to major histocompatibility complex (MHC) molecules, the prediction of binding affinity is a prerequisite to the reliable prediction of epitopes. The iterative self-consistent (ISC) partial-least-squares (PLS)-based additive method is a recently developed bioinformatic approach for predicting class II peptide−MHC binding affinity. The ISC−PLS method overcomes many of the conceptual difficulties inherent in the prediction of class II peptide−MHC affinity, such as the binding of a mixed population of peptide lengths due to the open-ended class II binding site. The method has applications in both the accurate prediction of class II epitopes and the manipulation of affinity for heteroclitic and competitor peptides. The method is applied here to six class II mouse alleles (I-Ab, I-Ad, I-Ak, I-As, I-Ed, and I-Ek) and included peptides up to 25 amino acids in length. A series of regression equations highlighting the quantitative contributions of individual amino acids at each peptide position was established. The initial model for each allele exhibited only moderate predictivity. Once the set of selected peptide subsequences had converged, the final models exhibited a satisfactory predictive power. Convergence was reached between the 4th and 17th iterations, and the leave-one-out cross-validation statistical terms - q2, SEP, and NC - ranged between 0.732 and 0.925, 0.418 and 0.816, and 1 and 6, respectively. The non-cross-validated statistical terms r2 and SEE ranged between 0.98 and 0.995 and 0.089 and 0.180, respectively. The peptides used in this study are available from the AntiJen database (http://www.jenner.ac.uk/AntiJen). The PLS method is available commercially in the SYBYL molecular modeling software package. The resulting models, which can be used for accurate T-cell epitope prediction, will be made freely available online (http://www.jenner.ac.uk/MHCPred).
Resumo:
Motivation: The immunogenicity of peptides depends on their ability to bind to MHC molecules. MHC binding affinity prediction methods can save significant amounts of experimental work. The class II MHC binding site is open at both ends, making epitope prediction difficult because of the multiple binding ability of long peptides. Results: An iterative self-consistent partial least squares (PLS)-based additive method was applied to a set of 66 pep- tides no longer than 16 amino acids, binding to DRB1*0401. A regression equation containing the quantitative contributions of the amino acids at each of the nine positions was generated. Its predictability was tested using two external test sets which gave r pred =0.593 and r pred=0.655, respectively. Furthermore, it was benchmarked using 25 known T-cell epitopes restricted by DRB1*0401 and we compared our results with four other online predictive methods. The additive method showed the best result finding 24 of the 25 T-cell epitopes. Availability: Peptides used in the study are available from http://www.jenner.ac.uk/JenPep. The PLS method is available commercially in the SYBYL molecular modelling software package. The final model for affinity prediction of peptides binding to DRB1*0401 molecule is available at http://www.jenner.ac.uk/MHCPred. Models developed for DRB1*0101 and DRB1*0701 also are available in MHC- Pred
Resumo:
High-throughput techniques are necessary to efficiently screen potential lignocellulosic feedstocks for the production of renewable fuels, chemicals, and bio-based materials, thereby reducing experimental time and expense while supplanting tedious, destructive methods. The ratio of lignin syringyl (S) to guaiacyl (G) monomers has been routinely quantified as a way to probe biomass recalcitrance. Mid-infrared and Raman spectroscopy have been demonstrated to produce robust partial least squares models for the prediction of lignin S/G ratios in a diverse group of Acacia and eucalypt trees. The most accurate Raman model has now been used to predict the S/G ratio from 269 unknown Acacia and eucalypt feedstocks. This study demonstrates the application of a partial least squares model composed of Raman spectral data and lignin S/G ratios measured using pyrolysis/molecular beam mass spectrometry (pyMBMS) for the prediction of S/G ratios in an unknown data set. The predicted S/G ratios calculated by the model were averaged according to plant species, and the means were not found to differ from the pyMBMS ratios when evaluating the mean values of each method within the 95 % confidence interval. Pairwise comparisons within each data set were employed to assess statistical differences between each biomass species. While some pairwise appraisals failed to differentiate between species, Acacias, in both data sets, clearly display significant differences in their S/G composition which distinguish them from eucalypts. This research shows the power of using Raman spectroscopy to supplant tedious, destructive methods for the evaluation of the lignin S/G ratio of diverse plant biomass materials. © 2015, The Author(s).
Resumo:
The aim of this study was to investigate the effects of numerous milk compositional factors on milk coagulation properties using Partial Least Squares (PLS). Milk from herds of Jersey and Holstein-Friesian cattle was collected across the year and blended (n=55), to maximize variation in composition and coagulation. The milk was analysed for casein, protein, fat, titratable acidity, lactose, Ca2+, urea content, micelles size, fat globule size, somatic cell count and pH. Milk coagulation properties were defined as coagulation time, curd firmness and curd firmness rate measured by a controlled strain rheometer. The models derived from PLS had higher predictive power than previous models demonstrating the value of measuring more milk components. In addition to the well-established relationships with casein and protein levels, CMS and fat globule size were found to have as strong impact on all of the three models. The study also found a positive impact of fat on milk coagulation properties and a strong relationship between lactose and curd firmness, and urea and curd firmness rate, all of which warrant further investigation due to current lack of knowledge of the underlying mechanism. These findings demonstrate the importance of using a wider range of milk compositional variable for the prediction of the milk coagulation properties, and hence as indicators of milk suitability for cheese making.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Data fluctuation in multiple measurements of Laser Induced Breakdown Spectroscopy (LIBS) greatly affects the accuracy of quantitative analysis. A new LIBS quantitative analysis method based on the Robust Least Squares Support Vector Machine (RLS-SVM) regression model is proposed. The usual way to enhance the analysis accuracy is to improve the quality and consistency of the emission signal, such as by averaging the spectral signals or spectrum standardization over a number of laser shots. The proposed method focuses more on how to enhance the robustness of the quantitative analysis regression model. The proposed RLS-SVM regression model originates from the Weighted Least Squares Support Vector Machine (WLS-SVM) but has an improved segmented weighting function and residual error calculation according to the statistical distribution of measured spectral data. Through the improved segmented weighting function, the information on the spectral data in the normal distribution will be retained in the regression model while the information on the outliers will be restrained or removed. Copper elemental concentration analysis experiments of 16 certified standard brass samples were carried out. The average value of relative standard deviation obtained from the RLS-SVM model was 3.06% and the root mean square error was 1.537%. The experimental results showed that the proposed method achieved better prediction accuracy and better modeling robustness compared with the quantitative analysis methods based on Partial Least Squares (PLS) regression, standard Support Vector Machine (SVM) and WLS-SVM. It was also demonstrated that the improved weighting function had better comprehensive performance in model robustness and convergence speed, compared with the four known weighting functions.
Resumo:
State of Sao Paulo Research Foundation (FAPESP)
Resumo:
We present a novel array RLS algorithm with forgetting factor that circumvents the problem of fading regularization, inherent to the standard exponentially-weighted RLS, by allowing for time-varying regularization matrices with generic structure. Simulations in finite precision show the algorithm`s superiority as compared to alternative algorithms in the context of adaptive beamforming.
Resumo:
Signal Processing, Vol. 86, nº 10
Resumo:
In this paper we propose the use of the least-squares based methods for obtaining digital rational approximations (IIR filters) to fractional-order integrators and differentiators of type sα, α∈R. Adoption of the Padé, Prony and Shanks techniques is suggested. These techniques are usually applied in the signal modeling of deterministic signals. These methods yield suboptimal solutions to the problem which only requires finding the solution of a set of linear equations. The results reveal that the least-squares approach gives similar or superior approximations in comparison with other widely used methods. Their effectiveness is illustrated, both in the time and frequency domains, as well in the fractional differintegration of some standard time domain functions.