28 results for partial least squares regression

in Aston University Research Archive


Abstract:

Levels of lignin and hydroxycinnamic acid wall components in three genera of forage grasses (Lolium, Festuca and Dactylis) have been accurately predicted by Fourier-transform infrared spectroscopy using partial least squares models correlated to analytical measurements. Different models were derived that predicted the concentrations of acid detergent lignin, total hydroxycinnamic acids, total ferulate monomers plus dimers, p-coumarate and ferulate dimers in independent spectral test data from methanol-extracted samples of perennial forage grass with accuracies of 92.8%, 86.5%, 86.1%, 59.7% and 84.7% respectively, and analysis of model projection scores showed that the models relied generally on spectral features that are known absorptions of these compounds. Acid detergent lignin was predicted in samples of two species of energy grass (Phalaris arundinacea and Panicum virgatum) with an accuracy of 84.5%.
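The abstract above calibrates partial least squares models that map infrared spectra to measured concentrations. As a rough illustration of the idea only, the following is a minimal one-component PLS1 sketch in plain Python. The function name `fit_pls1_one_component` and the toy three-channel "spectra" are assumptions made for this example; they are not taken from the study, which would have used many more channels and components.

```python
import math

def fit_pls1_one_component(X, y):
    """Fit a one-component PLS1 model and return a predict(x) function.

    X is a list of spectra (rows), y the matching reference values.
    """
    n, p = len(X), len(X[0])
    # Centre the spectra and the response.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: the spectral direction with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: projection of each centred spectrum onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the response on the scores.
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)

    def predict(x):
        score = sum((x[j] - x_mean[j]) * w[j] for j in range(p))
        return y_mean + q * score

    return predict

# Hypothetical toy data: one absorption band scaled by concentration,
# sitting on a constant baseline.
band = [1.0, 2.0, 3.0]
baseline = [0.5, 0.1, 0.2]
concentrations = [1.0, 2.0, 4.0, 7.0]
spectra = [[baseline[j] + c * band[j] for j in range(3)] for c in concentrations]

predict = fit_pls1_one_component(spectra, concentrations)
test_spectrum = [baseline[j] + 5.0 * band[j] for j in range(3)]
predicted = predict(test_spectrum)
```

On this noise-free toy set a single component recovers the held-out concentration; real spectra would need several components chosen by validation.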

Abstract:

The accurate identification of T-cell epitopes remains a principal goal of bioinformatics within immunology. As the immunogenicity of peptide epitopes is dependent on their binding to major histocompatibility complex (MHC) molecules, the prediction of binding affinity is a prerequisite to the reliable prediction of epitopes. The iterative self-consistent (ISC) partial-least-squares (PLS)-based additive method is a recently developed bioinformatic approach for predicting class II peptide-MHC binding affinity. The ISC-PLS method overcomes many of the conceptual difficulties inherent in the prediction of class II peptide-MHC affinity, such as the binding of a mixed population of peptide lengths due to the open-ended class II binding site. The method has applications in both the accurate prediction of class II epitopes and the manipulation of affinity for heteroclitic and competitor peptides. The method is applied here to six class II mouse alleles (I-Ab, I-Ad, I-Ak, I-As, I-Ed, and I-Ek) and included peptides up to 25 amino acids in length. A series of regression equations highlighting the quantitative contributions of individual amino acids at each peptide position was established. The initial model for each allele exhibited only moderate predictivity. Once the set of selected peptide subsequences had converged, the final models exhibited satisfactory predictive power. Convergence was reached between the 4th and 17th iterations, and the leave-one-out cross-validation statistical terms - q2, SEP, and NC - ranged between 0.732 and 0.925, 0.418 and 0.816, and 1 and 6, respectively. The non-cross-validated statistical terms r2 and SEE ranged between 0.98 and 0.995 and 0.089 and 0.180, respectively. The peptides used in this study are available from the AntiJen database (http://www.jenner.ac.uk/AntiJen). The PLS method is available commercially in the SYBYL molecular modeling software package. The resulting models, which can be used for accurate T-cell epitope prediction, will be made freely available online (http://www.jenner.ac.uk/MHCPred).
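The leave-one-out statistic q2 reported above can be illustrated with a short sketch. It assumes one common definition, q2 = 1 - PRESS / SS, where PRESS sums the squared errors of predictions made for each held-out sample and SS is the total sum of squares of the response. A one-variable least-squares line stands in for the PLS model here, purely to keep the example short; the helper names and toy numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; a stand-in for the real PLS model."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def loo_q2(xs, ys):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without sample i, then predict sample i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total

# Hypothetical toy data, not taken from the study.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
q2_exact = loo_q2(xs, [2.0 * x + 1.0 for x in xs])
q2_noisy = loo_q2(xs, [3.1, 4.9, 7.2, 8.8, 11.1])
```

Because every prediction is made for a sample the model has not seen, q2 is normally lower than the non-cross-validated r2, which matches the pattern of the ranges quoted in the abstract.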

Abstract:

Motivation: The immunogenicity of peptides depends on their ability to bind to MHC molecules. MHC binding affinity prediction methods can save significant amounts of experimental work. The class II MHC binding site is open at both ends, making epitope prediction difficult because of the multiple binding ability of long peptides. Results: An iterative self-consistent partial least squares (PLS)-based additive method was applied to a set of 66 peptides no longer than 16 amino acids, binding to DRB1*0401. A regression equation containing the quantitative contributions of the amino acids at each of the nine positions was generated. Its predictability was tested using two external test sets, which gave r_pred = 0.593 and r_pred = 0.655, respectively. Furthermore, it was benchmarked using 25 known T-cell epitopes restricted by DRB1*0401 and we compared our results with four other online predictive methods. The additive method showed the best result, finding 24 of the 25 T-cell epitopes. Availability: Peptides used in the study are available from http://www.jenner.ac.uk/JenPep. The PLS method is available commercially in the SYBYL molecular modelling software package. The final model for affinity prediction of peptides binding to the DRB1*0401 molecule is available at http://www.jenner.ac.uk/MHCPred. Models developed for DRB1*0101 and DRB1*0701 are also available in MHCPred.
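The additive regression equation described above assigns a contribution to each amino acid at each of the nine positions. A minimal sketch of the scoring step might look as follows. The coefficient table is entirely hypothetical (it is not the published DRB1*0401 model), and the sketch covers only how a long peptide could be scanned for its best-scoring nine-residue subsequence, not the iterative refitting of the model.

```python
CORE_LENGTH = 9

def score_core(core, contributions, constant=0.0):
    """Sum the position-specific amino acid contributions for one 9-mer."""
    return constant + sum(
        contributions.get((position, residue), 0.0)
        for position, residue in enumerate(core, start=1)
    )

def best_core(peptide, contributions, constant=0.0):
    """Return (score, subsequence) for the highest-scoring 9-mer in a peptide."""
    cores = [
        peptide[i:i + CORE_LENGTH]
        for i in range(len(peptide) - CORE_LENGTH + 1)
    ]
    return max((score_core(core, contributions, constant), core) for core in cores)

# Hypothetical coefficients keyed by (position, amino acid); unlisted
# pairs are treated as contributing nothing.
toy_contributions = {
    (1, "Y"): 0.8,
    (4, "L"): 0.3,
    (6, "T"): 0.2,
    (9, "A"): 0.4,
}

score, core = best_core("AAYGGLGTGGAKK", toy_contributions)
```

Scanning every nine-residue window is one way to handle peptides longer than the binding groove, which is the difficulty the abstract attributes to the open-ended class II site.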

Abstract:

When applying multivariate analysis techniques in information systems and social science disciplines, such as management information systems (MIS) and marketing, the assumption that the empirical data originate from a single homogeneous population is often unrealistic. When applying a causal modeling approach, such as partial least squares (PLS) path modeling, segmentation is a key issue in coping with the problem of heterogeneity in estimated cause-and-effect relationships. This chapter presents a new PLS path modeling approach which classifies units on the basis of the heterogeneity of the estimates in the inner model. If unobserved heterogeneity significantly affects the estimated path model relationships on the aggregate data level, the methodology will allow homogeneous groups of observations to be created that exhibit distinctive path model estimates. The approach will thus provide differentiated analytical outcomes that permit more precise interpretations of each segment formed. An application to a large data set from the American customer satisfaction index (ACSI) substantiates the methodology’s effectiveness in evaluating PLS path modeling results.

Abstract:

Data fluctuation in multiple measurements of Laser Induced Breakdown Spectroscopy (LIBS) greatly affects the accuracy of quantitative analysis. A new LIBS quantitative analysis method based on the Robust Least Squares Support Vector Machine (RLS-SVM) regression model is proposed. The usual way to enhance the analysis accuracy is to improve the quality and consistency of the emission signal, such as by averaging the spectral signals or spectrum standardization over a number of laser shots. The proposed method focuses more on how to enhance the robustness of the quantitative analysis regression model. The proposed RLS-SVM regression model originates from the Weighted Least Squares Support Vector Machine (WLS-SVM) but has an improved segmented weighting function and residual error calculation according to the statistical distribution of measured spectral data. Through the improved segmented weighting function, the information on the spectral data in the normal distribution will be retained in the regression model while the information on the outliers will be restrained or removed. Copper concentration analysis experiments were carried out on 16 certified standard brass samples. The average value of relative standard deviation obtained from the RLS-SVM model was 3.06% and the root mean square error was 1.537%. The experimental results showed that the proposed method achieved better prediction accuracy and better modeling robustness compared with the quantitative analysis methods based on Partial Least Squares (PLS) regression, standard Support Vector Machine (SVM) and WLS-SVM. It was also demonstrated that the improved weighting function had better comprehensive performance in model robustness and convergence speed, compared with four known weighting functions.
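The abstract does not spell out the improved weighting function, so the sketch below shows only a commonly cited three-segment weighting of the kind used in weighted least squares SVMs, assumed here for illustration. Residuals are scaled by a robust estimate (1.483 times the median absolute deviation); the thresholds `c1 = 2.5` and `c2 = 3.0` and the floor weight `1e-4` are conventional defaults, not values taken from the paper.

```python
import statistics

def robust_scale(residuals):
    """Robust spread estimate: 1.483 times the median absolute deviation."""
    centre = statistics.median(residuals)
    mad = statistics.median([abs(r - centre) for r in residuals])
    return 1.483 * mad

def segmented_weight(residual, scale, c1=2.5, c2=3.0):
    """Three-segment weight: keep, taper, or almost discard a sample."""
    z = abs(residual / scale)
    if z <= c1:
        return 1.0                      # consistent with the bulk of the data
    if z <= c2:
        return (c2 - z) / (c2 - c1)     # linear taper for borderline samples
    return 1e-4                         # outlier: almost removed from the fit

# Hypothetical residuals from a first unweighted fit; the last one is an outlier.
residuals = [0.1, -0.2, 0.05, 0.15, -0.1, 3.0]
scale = robust_scale(residuals)
weights = [segmented_weight(r, scale) for r in residuals]
```

The weights would then be used to refit the regression so that well-behaved spectra dominate and outlying shots are suppressed, which is the behaviour the abstract describes.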

Abstract:

A new LIBS quantitative analysis method based on adaptive selection of analytical lines and a Relevance Vector Machine (RVM) regression model is proposed. First, a scheme for adaptively selecting analytical lines is put forward in order to overcome the drawback of high dependency on a priori knowledge. The candidate analytical lines are automatically selected based on the built-in characteristics of spectral lines, such as spectral intensity, wavelength and width at half height. The analytical lines which will be used as input variables of the regression model are determined adaptively according to the samples for both training and testing. Second, a LIBS quantitative analysis method based on RVM is presented. The intensities of analytical lines and the elemental concentrations of certified standard samples are used to train the RVM regression model. The predicted elemental concentrations are given in the form of a confidence interval of a probabilistic distribution, which is helpful for evaluating the uncertainty contained in the measured spectra. Chromium concentration analysis experiments on 23 certified standard high-alloy steel samples have been carried out. The multiple correlation coefficient of the prediction was up to 98.85%, and the average relative error of the prediction was 4.01%. The experimental results showed that the proposed LIBS quantitative analysis method achieved better prediction accuracy and better modeling robustness compared with the methods based on partial least squares regression, artificial neural network and standard support vector machine.

Abstract:

Chicken breast from nine products and from the following production regimes: conventional (chilled and frozen), organic and free range, were analysed for fatty acid composition of total lipids, preventative and chain-breaking antioxidant contents and lipid oxidation during 5 days of sub-ambient storage following purchase. Total lipids were extracted with an optimal amount of a cold chloroform-methanol solvent. Lipid compositions varied, but there were differences between conventional and organic products in their contents of total polyunsaturated fatty acids and n-3 and n-6 fatty acids and n-6:n-3 ratio. Of the antioxidants, α-tocopherol content was inversely correlated with lipid oxidation. The antioxidant enzyme activities of catalase, glutathione peroxidase and glutathione reductase varied between products. Modelling with partial least squares regression showed no overall relationship between total antioxidants and lipid data, but certain individual antioxidants showed a relationship with specific lipid fractions.

Abstract:

Consumers expect organic, free-range and corn-fed chicken to be nutritionally wholesome and have premium flavour characters. Interrelationships between flavour, fatty acids and antioxidants of retailed breasts were explored using simple correlations and chemometrics. Saturated fatty acid C16:0, and n-6 polyunsaturated C20:4 and C22:4 contents were correlated with lipid oxidation products (thiobarbituric acid reactive substances) and in partial least-squares regression (PLS1) with 32 high-resolution gas chromatography (flame ionization) flavour components (r2>0.90), and also linked (r2>0.80) to antioxidants (α-tocopherol, glutathione and catalase). A further 10 high-resolution gas chromatography nitrogen phosphorus detector flavour components were correlated (r2>0.85) with C18:3(n-3) content. Chicken character was correlated with C18:3(n-3), and C18:3(n-6) inversely with oily, off-flavour and lipid oxidation. Sweet, fruity and oily aromas were linked in PLS1 with 13 specific fatty acids (r2>0.6), and bland taste with total summed (six) fatty acid fractions (r2>0.81). Specific antioxidants were correlated with sweet, fruity and chicken aromas, and α-tocopherol inversely with lipid oxidation. PLS2 confirmed relationships between fatty acid composition, antioxidants and the subsets of 32 and 10 flavour components. Clear relationships were thus observed between lipid and antioxidant compositions and flavour in chicken breast meat.

Abstract:

We consider the problem of illusory or artefactual structure from the visualisation of high-dimensional structureless data. In particular we examine the role of the distance metric in the use of topographic mappings based on the statistical field of multidimensional scaling. We show that the use of a squared Euclidean metric (i.e. the SSTRESS measure) gives rise to an annular structure when the input data is drawn from a high-dimensional isotropic distribution, and we provide a theoretical justification for this observation.
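The following sketch does not reproduce the paper's analysis. It only illustrates a well-known property of the input data the abstract refers to, which may help build intuition: for points drawn from an isotropic Gaussian, squared Euclidean distances between pairs become relatively more uniform as the dimension grows. The function name, sample size and seed are arbitrary choices made for this example.

```python
import math
import random

def squared_distance_spread(dim, n_points=40, seed=0):
    """Coefficient of variation of pairwise squared Euclidean distances
    for points sampled from a standard isotropic Gaussian."""
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_points)]
    squared = []
    for i in range(n_points):
        for j in range(i + 1, n_points):
            squared.append(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
    mean = sum(squared) / len(squared)
    variance = sum((v - mean) ** 2 for v in squared) / len(squared)
    return math.sqrt(variance) / mean

spread_low = squared_distance_spread(2)
spread_high = squared_distance_spread(200)
```

In theory the ratio of standard deviation to mean for a single pair falls as sqrt(2 / dim), so structureless high-dimensional data offers a mapping little distance contrast to work with.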

Abstract:

Quantitative structure-activity relationship (QSAR) analysis is a cornerstone of modern informatics. Predictive computational models of peptide-major histocompatibility complex (MHC)-binding affinity based on QSAR technology have now become important components of modern computational immunovaccinology. Historically, such approaches have been built around semiqualitative, classification methods, but these are now giving way to quantitative regression methods. We review three methods. Two of them, a 2D-QSAR additive-partial least squares (PLS) method and a 3D-QSAR comparative molecular similarity index analysis (CoMSIA) method, can identify the sequence dependence of peptide-binding specificity for various class I MHC alleles from the reported binding affinities (IC50) of peptide sets. The third method is an iterative self-consistent (ISC) PLS-based additive method, which is a recently developed extension to the additive method for the affinity prediction of class II peptides. The QSAR methods presented here have established themselves as immunoinformatic techniques complementary to existing methodology, useful in the quantitative prediction of binding affinity: current methods for the in silico identification of T-cell epitopes (which form the basis of many vaccines, diagnostics, and reagents) rely on the accurate computational prediction of peptide-MHC affinity. We have reviewed various human and mouse class I and class II allele models. Studied alleles comprise HLA-A*0101, HLA-A*0201, HLA-A*0202, HLA-A*0203, HLA-A*0206, HLA-A*0301, HLA-A*1101, HLA-A*3101, HLA-A*6801, HLA-A*6802, HLA-B*3501, H2-K(k), H2-K(b), H2-D(b), HLA-DRB1*0101, HLA-DRB1*0401, HLA-DRB1*0701, I-A(b), I-A(d), I-A(k), I-A(s), I-E(d), and I-E(k). In this chapter we give a step-by-step guide to prediction and to assessing reliability, and show that the resulting models represent an advance on existing methods. The peptides used in this study are available from the AntiJen database (http://www.jenner.ac.uk/AntiJen). The PLS method is available commercially in the SYBYL molecular modeling software package. The resulting models, which can be used for accurate T-cell epitope prediction, are freely available online at http://www.jenner.ac.uk/MHCPred.

Abstract:

Background: Allergy is a form of hypersensitivity to normally innocuous substances, such as dust, pollen, foods or drugs. Allergens are small antigens that commonly provoke an IgE antibody response. There are two types of bioinformatics-based allergen prediction. The first approach follows FAO/WHO Codex Alimentarius guidelines and searches for sequence similarity. The second approach is based on identifying conserved allergenicity-related linear motifs. Both approaches assume that allergenicity is a linearly coded property. In the present study, we applied ACC pre-processing to sets of known allergens, developing alignment-independent models for allergen recognition based on the main chemical properties of amino acid sequences. Results: A set of 684 food, 1,156 inhalant and 555 toxin allergens was collected from several databases. A set of non-allergens from the same species was selected to mirror the allergen set. The amino acids in the protein sequences were described by three z-descriptors (z1, z2 and z3) and converted into uniform vectors by auto- and cross-covariance (ACC) transformation. Each protein was presented as a vector of 45 variables. Five machine learning methods for classification were applied in the study to derive models for allergen prediction. The methods were: discriminant analysis by partial least squares (DA-PLS), logistic regression (LR), decision tree (DT), naïve Bayes (NB) and k nearest neighbours (kNN). The best performing model was derived by kNN at k = 3. It was optimized, cross-validated and implemented in a server named AllerTOP, freely accessible at http://www.pharmfac.net/allertop. AllerTOP also predicts the most probable route of exposure. In comparison to other servers for allergen prediction, AllerTOP outperforms them with 94% sensitivity. Conclusions: AllerTOP is the first alignment-free server for in silico prediction of allergens based on the main physicochemical properties of proteins. Significantly, as well as allergenicity, AllerTOP is able to predict the route of allergen exposure: food, inhalant or toxin. © 2013 Dimitrov et al.; licensee BioMed Central Ltd.
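The ACC step described above turns sequences of different lengths into vectors of equal length. A minimal sketch, assuming the usual definition (the average product of descriptor j at position i and descriptor k at position i + lag), is shown below. With three descriptors, 45 variables is consistent with 3 x 3 x 5, so a maximum lag of 5 is assumed. The descriptor values in `TOY_Z` are placeholders for illustration, not the published z-scale values.

```python
def acc_transform(sequence, descriptors, max_lag=5):
    """Auto- and cross-covariance transform of a protein sequence.

    Returns one value per (lag, descriptor j, descriptor k) combination,
    so the output length does not depend on the sequence length.
    The sequence must be longer than max_lag.
    """
    n_desc = len(next(iter(descriptors.values())))
    z = [descriptors[residue] for residue in sequence]
    n = len(z)
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_desc):
            for k in range(n_desc):
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                features.append(total / (n - lag))
    return features

# Placeholder descriptor triples (z1, z2, z3) for four residues only.
TOY_Z = {
    "A": (0.1, -1.0, 0.2),
    "K": (2.0, 1.0, -1.5),
    "L": (-4.0, -1.0, -1.0),
    "E": (3.0, 0.5, -0.5),
}

short_vector = acc_transform("AKLEAKLE", TOY_Z)
long_vector = acc_transform("AKLEAKLEKKLAEEL", TOY_Z)
```

Both calls return vectors of the same length, which is what allows an alignment-free classifier such as kNN to compare proteins directly.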

Abstract:

Two energy grass species, switch grass, a North American tuft grass, and reed canary grass, a European native, are likely to be important sources of biomass in Western Europe for the production of biorenewable energy. Matching chemical composition to conversion efficiency is a primary goal for improvement programmes and for determining the quality of biomass feedstocks prior to use, and there is a need for methods which allow cost-effective characterisation of chemical composition at high rates of sample throughput. In this paper we demonstrate that nitrogen content and alkali index, parameters greatly influencing thermal conversion efficiency, can be accurately predicted in dried samples of these species grown under a range of agronomic conditions by partial least squares regression of Fourier transform infrared spectra (R2 values for plots of predicted vs. measured values of 0.938 and 0.937, respectively). We also discuss the prediction of carbon and ash content in these samples and the application of infrared-based predictive methods for the breeding improvement of energy grasses.
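The R2 values quoted above come from plots of predicted against measured values. One plausible reading, assumed here, is the squared Pearson correlation between the two series; the abstract does not state which definition was used. The numbers below are made up for illustration and are not measurements from the study.

```python
def r_squared(measured, predicted):
    """Squared Pearson correlation between measured and predicted values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_m * var_p)

# Hypothetical nitrogen contents (% dry weight), for illustration only.
measured = [0.8, 1.1, 1.5, 1.9, 2.4]
predicted = [0.9, 1.0, 1.6, 1.8, 2.5]
r2 = r_squared(measured, predicted)
```

Note that this statistic measures how well the points follow a straight line, so it would stay high even if the predictions were systematically offset from the measurements.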

Abstract:

This paper provides the most comprehensive evidence to date on whether or not monetary aggregates are valuable for forecasting US inflation in the early to mid 2000s. We explore a wide range of different definitions of money, including different methods of aggregation and different collections of included monetary assets. In our forecasting experiment we use two non-linear techniques, namely, recurrent neural networks and kernel recursive least squares regression - techniques that are new to macroeconomics. Recurrent neural networks operate with potentially unbounded input memory, while the kernel regression technique is a finite memory predictor. The two methodologies compete to find the best fitting US inflation forecasting models and are then compared to forecasts from a naive random walk model. The best models were non-linear autoregressive models based on kernel methods. Our findings do not provide much support for the usefulness of monetary aggregates in forecasting inflation.
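The naive random walk benchmark mentioned above simply forecasts that the next value equals the current one. A minimal sketch of that baseline and of a root mean square error comparison follows; the inflation figures are invented for illustration and the function names are hypothetical.

```python
import math

def random_walk_forecasts(series):
    """One-step-ahead random walk forecasts: each forecast is the previous value."""
    return series[:-1]

def rmse(actual, forecast):
    """Root mean square error between two equally long series."""
    return math.sqrt(
        sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    )

# Made-up annual inflation rates in percent, not real data.
inflation = [2.1, 2.4, 2.3, 2.8, 3.0, 2.7]
naive = random_walk_forecasts(inflation)
naive_error = rmse(inflation[1:], naive)
```

A candidate model is only interesting if its out-of-sample error is lower than `naive_error` on the same evaluation period.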

Abstract:

This study critically discusses findings from a research project involving four European countries. The project had two main aims. The first was to develop a systematic procedure for assessing the balance between knowledge and competencies acquired in higher, further and vocational education and the specific needs of the labor market. The second aim was to develop and test a set of meta-level quality indicators aimed at evaluating the linkages between education and employment. The project was designed to address the lack of employer input concerning the requirements of business graduates for successful workplace performance and the need for more specific industry-driven feedback to guide administrative heads at universities and personnel at quality assurance agencies in curriculum development and revision. Approach: The project was distinctive in that it combined different partners from higher education, vocational training, industry and quality assurance. Project partners designed and implemented an innovative approach, based on literature review, qualitative interviews and surveys in the four countries, in order to identify and confirm key knowledge and competency requirements. This study presents this step-by-step approach, as well as survey findings from a sample of 900 business graduates and employers. In addition, it introduces two Partial Least Squares (PLS) path models for predicting satisfaction with work performance and satisfaction with business education. Results: Survey findings revealed that employers were not very confident regarding business graduates’ abilities in key knowledge areas and in key generic competencies. In subsequent analysis, these graduate abilities were tested and identified as important predictors of employers’ satisfaction with graduates’ work performance. Conclusion: The industry-driven approach introduced in this study can serve as a guide to assist different types of educational institutions to better align study programs with changing labor market requirements. Recommendations for curriculum improvement are discussed.

Abstract:

Grounded in Vroom’s motivational framework of performance, we examine the interactive influence of collective human capital (ability) and aggregated service orientation (motivation) on the cross-level relationship between high-performance work systems (HPWS) and individual-level service quality. Results of hierarchical linear modeling (HLM) revealed that HPWS related to collective human capital and aggregated service orientation, which in turn related to individual-level service quality. Furthermore, both HLM and ordinary least squares regression analyses revealed a cross-level interaction effect of collective human capital and aggregated service orientation such that high levels of collective human capital and aggregated service orientation influence individual-level service quality.