34 results for partial least square
in Aston University Research Archive
Abstract:
When applying multivariate analysis techniques in information systems and social science disciplines, such as management information systems (MIS) and marketing, the assumption that the empirical data originate from a single homogeneous population is often unrealistic. When applying a causal modeling approach, such as partial least squares (PLS) path modeling, segmentation is a key issue in coping with heterogeneity in estimated cause-and-effect relationships. This chapter presents a new PLS path modeling approach which classifies units on the basis of the heterogeneity of the estimates in the inner model. If unobserved heterogeneity significantly affects the estimated path model relationships at the aggregate data level, the methodology allows homogeneous groups of observations to be formed that exhibit distinctive path model estimates. The approach thus provides differentiated analytical outcomes that permit more precise interpretations of each segment formed. An application to a large data set based on the American customer satisfaction index (ACSI) substantiates the methodology's effectiveness in evaluating PLS path modeling results.
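The segmentation idea can be illustrated with a heavily simplified sketch: starting from an arbitrary partition, segment-specific inner-model path coefficients are re-estimated and each unit is reassigned to the segment whose coefficients fit it best. This is a generic clusterwise-regression illustration on simulated construct scores, not the chapter's algorithm; all data and variable names are hypothetical.

```python
# Illustrative sketch only: clusterwise re-estimation of one inner-model
# equation (construct scores assumed already computed), in the spirit of
# response-based PLS segmentation. Not the chapter's method.
import numpy as np

rng = np.random.default_rng(0)

# Simulated construct scores: two latent segments with different path coefficients
n = 400
X = rng.normal(size=(n, 2))                      # exogenous construct scores
true_seg = rng.integers(0, 2, size=n)
betas_true = np.array([[0.8, 0.1], [0.1, 0.8]])  # distinct inner-model paths per segment
y = np.einsum("ij,ij->i", X, betas_true[true_seg]) + 0.2 * rng.normal(size=n)

def fit_ols(Xs, ys):
    coef, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return coef

# Start from a random partition, then alternate estimation and reassignment
labels = rng.integers(0, 2, size=n)
for _ in range(50):
    coefs = np.array([fit_ols(X[labels == k], y[labels == k]) for k in (0, 1)])
    resid = (y[:, None] - X @ coefs.T) ** 2      # squared residual of each unit under each segment
    new_labels = resid.argmin(axis=1)            # assign each unit to its best-fitting segment
    if np.array_equal(new_labels, labels):
        break
    labels = new_labels

print("segment-specific path estimates:\n", coefs)
```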
Abstract:
Levels of lignin and hydroxycinnamic acid wall components in three genera of forage grasses (Lolium, Festuca and Dactylis) have been accurately predicted by Fourier-transform infrared spectroscopy using partial least squares models correlated to analytical measurements. Different models were derived that predicted the concentrations of acid detergent lignin, total hydroxycinnamic acids, total ferulate monomers plus dimers, p-coumarate and ferulate dimers in independent spectral test data from methanol-extracted samples of perennial forage grass with accuracies of 92.8%, 86.5%, 86.1%, 59.7% and 84.7%, respectively. Analysis of model projection scores showed that the models relied generally on spectral features that are known absorptions of these compounds. Acid detergent lignin was predicted in samples of two species of energy grass (Phalaris arundinacea and Panicum virgatum) with an accuracy of 84.5%.
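A minimal sketch of the modelling step described above, using scikit-learn's PLSRegression on placeholder arrays standing in for FTIR spectra and acid detergent lignin reference values; the study's data, preprocessing and choice of latent variables are not reproduced.

```python
# Hypothetical sketch: predicting a wet-chemistry trait (e.g. acid detergent
# lignin) from FTIR spectra with PLS regression. All arrays are placeholders.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import r2_score

spectra = np.random.default_rng(0).random((60, 800))   # placeholder absorbance spectra (samples x wavenumbers)
lignin = np.random.default_rng(1).random(60) * 10 + 2  # placeholder reference ADL values (%)

pls = PLSRegression(n_components=10)      # number of latent variables would be tuned by CV in practice
pred = cross_val_predict(pls, spectra, lignin, cv=10)
print("cross-validated R2:", r2_score(lignin, pred))

# Large regression coefficients indicate which wavenumbers drive the prediction,
# analogous to inspecting model projection scores for known absorptions.
pls.fit(spectra, lignin)
important = np.argsort(np.abs(pls.coef_.ravel()))[-10:]
print("indices of the 10 most influential wavenumbers:", important)
```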
Abstract:
The accurate identification of T-cell epitopes remains a principal goal of bioinformatics within immunology. As the immunogenicity of peptide epitopes depends on their binding to major histocompatibility complex (MHC) molecules, the prediction of binding affinity is a prerequisite to the reliable prediction of epitopes. The iterative self-consistent (ISC) partial-least-squares (PLS)-based additive method is a recently developed bioinformatic approach for predicting class II peptide-MHC binding affinity. The ISC-PLS method overcomes many of the conceptual difficulties inherent in the prediction of class II peptide-MHC affinity, such as the binding of a mixed population of peptide lengths due to the open-ended class II binding site. The method has applications in both the accurate prediction of class II epitopes and the manipulation of affinity for heteroclitic and competitor peptides. The method is applied here to six class II mouse alleles (I-Ab, I-Ad, I-Ak, I-As, I-Ed, and I-Ek) and included peptides up to 25 amino acids in length. A series of regression equations highlighting the quantitative contributions of individual amino acids at each peptide position was established. The initial model for each allele exhibited only moderate predictivity. Once the set of selected peptide subsequences had converged, the final models exhibited satisfactory predictive power. Convergence was reached between the 4th and 17th iterations, and the leave-one-out cross-validation statistical terms (q2, SEP, and NC) ranged between 0.732 and 0.925, 0.418 and 0.816, and 1 and 6, respectively. The non-cross-validated statistical terms r2 and SEE ranged between 0.98 and 0.995 and between 0.089 and 0.180, respectively. The peptides used in this study are available from the AntiJen database (http://www.jenner.ac.uk/AntiJen). The PLS method is available commercially in the SYBYL molecular modeling software package. The resulting models, which can be used for accurate T-cell epitope prediction, will be made freely available online (http://www.jenner.ac.uk/MHCPred).
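The additive idea, affinity expressed as a sum of position-specific amino-acid contributions fitted by PLS and assessed by leave-one-out q2, can be sketched as below. The iterative self-consistent selection of binding registers is omitted, and the peptide cores and pIC50 values shown are placeholders rather than AntiJen data.

```python
# Sketch of the additive idea only: affinity modelled as a sum of
# position-specific amino-acid contributions, fitted by PLS and assessed by
# leave-one-out q2. Peptides and affinities below are placeholders.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CORE_LENGTH = 9  # class II binding core

def encode(core):
    """Binary position/amino-acid indicator vector for one 9-mer core."""
    x = np.zeros(CORE_LENGTH * len(AMINO_ACIDS))
    for pos, aa in enumerate(core):
        x[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return x

# Hypothetical 9-mer cores and log-affinities; a real study would use many more peptides
cores = ["FVKQNAAAL", "YLKDQQLLG", "AAAKAAAAV", "LLFGYPVYV",
         "KVVRFDKLA", "GILGFVFTL", "NMLSTVLGV", "QPRAPIRPI"]
pic50 = np.array([6.1, 5.4, 7.2, 6.8, 5.9, 7.5, 6.3, 5.1])

X = np.vstack([encode(c) for c in cores])
pls = PLSRegression(n_components=2)
pred = cross_val_predict(pls, X, pic50, cv=LeaveOneOut())

press = np.sum((pic50 - pred.ravel()) ** 2)
q2 = 1 - press / np.sum((pic50 - pic50.mean()) ** 2)
print("leave-one-out q2:", q2)

# The fitted regression coefficients give the quantitative contribution of
# each amino acid at each core position (here: position 1).
pls.fit(X, pic50)
position1_contributions = pls.coef_.reshape(-1)[:len(AMINO_ACIDS)]
```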
Abstract:
Motivation: The immunogenicity of peptides depends on their ability to bind to MHC molecules. MHC binding affinity prediction methods can save significant amounts of experimental work. The class II MHC binding site is open at both ends, making epitope prediction difficult because of the multiple binding ability of long peptides. Results: An iterative self-consistent partial least squares (PLS)-based additive method was applied to a set of 66 peptides no longer than 16 amino acids, binding to DRB1*0401. A regression equation containing the quantitative contributions of the amino acids at each of the nine positions was generated. Its predictability was tested using two external test sets, which gave r_pred = 0.593 and r_pred = 0.655, respectively. Furthermore, it was benchmarked using 25 known T-cell epitopes restricted by DRB1*0401, and we compared our results with four other online predictive methods. The additive method showed the best result, finding 24 of the 25 T-cell epitopes. Availability: Peptides used in the study are available from http://www.jenner.ac.uk/JenPep. The PLS method is available commercially in the SYBYL molecular modelling software package. The final model for affinity prediction of peptides binding to the DRB1*0401 molecule is available at http://www.jenner.ac.uk/MHCPred. Models developed for DRB1*0101 and DRB1*0701 are also available in MHCPred.
Abstract:
Two energy grass species, switchgrass, a North American tuft grass, and reed canary grass, a European native, are likely to be important sources of biomass in Western Europe for the production of biorenewable energy. Matching chemical composition to conversion efficiency is a primary goal for improvement programmes and for determining the quality of biomass feedstocks prior to use, and there is a need for methods that allow cost-effective characterisation of chemical composition at high rates of sample throughput. In this paper we demonstrate that nitrogen content and alkali index, parameters greatly influencing thermal conversion efficiency, can be accurately predicted in dried samples of these species grown under a range of agronomic conditions by partial least squares regression of Fourier transform infrared spectra (R2 values for plots of predicted vs. measured values of 0.938 and 0.937, respectively). We also discuss the prediction of carbon and ash content in these samples and the application of infrared-based predictive methods for the breeding improvement of energy grasses.
Abstract:
A rapid method for the analysis of biomass feedstocks was established to identify the quality of the pyrolysis products likely to impact on bio-oil production. A total of 15 Lolium and Festuca grasses known to exhibit a range of Klason lignin contents were analysed by pyroprobe-GC/MS (Py-GC/MS) to determine the composition of the thermal degradation products of lignin. The identification of key marker compounds which are the derivatives of the three major lignin subunits (G, H, and S) allowed pyroprobe-GC/MS to be statistically correlated to the Klason lignin content of the biomass using the partial least squares method to produce a calibration model. Data from this multivariate modelling procedure were then applied to identify likely "key marker" ions representative of the lignin subunits from the mass spectral data. The combined total abundance of the identified key markers for the lignin subunits exhibited a linear relationship with the Klason lignin content. In addition, the effect of alkali metal concentration on optimum pyrolysis characteristics was also examined. Washing of the grass samples removed approximately 70% of the metals and changed the characteristics of the thermal degradation process and products. Overall, the data indicate that both the organic and inorganic specification of the biofuel impacts on the pyrolysis process and that pyroprobe-GC/MS is a suitable analytical technique to assess lignin composition.
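One common way to rank ions by their contribution to such a calibration model is the variable importance in projection (VIP) score. The sketch below shows this on placeholder Py-GC/MS intensities and Klason lignin values; the paper's exact marker-selection procedure may differ.

```python
# Hedged sketch: ranking pyrolysis-GC/MS ions by VIP scores from a PLS model
# calibrated against Klason lignin. VIP is a common variable-importance
# measure; inputs below are random placeholders, not the study's data.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def vip_scores(pls):
    """Variable importance in projection for a fitted PLSRegression model."""
    t = pls.x_scores_          # (n_samples, n_components)
    w = pls.x_weights_         # (n_features, n_components)
    q = pls.y_loadings_        # (n_targets, n_components)
    p, _ = w.shape
    ssy = np.sum(t ** 2, axis=0) * np.sum(q ** 2, axis=0)   # y-variance explained per component
    weights = (w / np.linalg.norm(w, axis=0)) ** 2
    return np.sqrt(p * (weights @ ssy) / ssy.sum())

# Placeholder inputs: ion intensity matrix (samples x m/z channels) and
# Klason lignin reference values for 15 grasses.
intensities = np.random.default_rng(1).random((15, 200))
klason = np.random.default_rng(2).random(15) * 10 + 15

pls = PLSRegression(n_components=3).fit(intensities, klason)
vip = vip_scores(pls)
marker_ions = np.argsort(vip)[-10:]      # candidate "key marker" m/z channels
print("highest-VIP channels:", marker_ions)
```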
Abstract:
Relationships among quality factors in retailed free-range, corn-fed, organic, and conventional chicken breasts (9) were modeled using chemometric approaches. Application of principal component analysis (PCA) to neutral lipid composition data explained the majority (93%) of the variability (variance) in fatty acid contents in 2 significant multivariate factors. PCA explained 88 and 75% of the variance in 3 factors for, respectively, flame ionization detection (FID) and nitrogen-phosphorus detection (NPD) components in chromatographic flavor data from cooked chicken after simultaneous distillation extraction. Relationships to tissue antioxidant contents were modeled. Partial least squares regression (PLS2), interrelating the total data matrices, provided no useful models. By using single antioxidants as Y variables in PLS1, good models (r2 values > 0.9) were obtained for alpha-tocopherol, glutathione, catalase, glutathione peroxidase, and reductase and FID flavor components, and among the variables total mono- and polyunsaturated fatty acids and subsets of FID, and saturated fatty acid and NPD components. Alpha-tocopherol had a modest (r2 = 0.63) relationship with neutral lipid n-3 fatty acid content. Such factors thus relate to flavor development and quality in chicken breast meat.
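A compact sketch of the two modelling strategies mentioned, PCA explained variance on one data block and PLS2 versus per-antioxidant PLS1 models, using hypothetical matrices in place of the study's composition and antioxidant data.

```python
# Illustrative sketch (hypothetical matrices): variance explained by PCA on a
# fatty-acid composition block, and PLS2 (full Y block) versus PLS1 (single Y).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
fatty_acids = rng.normal(size=(9, 20))     # 9 retail samples x fatty-acid variables (placeholder)
antioxidants = rng.normal(size=(9, 5))     # e.g. alpha-tocopherol, glutathione, ... (placeholder)

pca = PCA(n_components=3).fit(StandardScaler().fit_transform(fatty_acids))
print("variance explained per component:", pca.explained_variance_ratio_)

# PLS2: all antioxidants modelled at once; PLS1: one Y variable at a time
pls2 = PLSRegression(n_components=2).fit(fatty_acids, antioxidants)
for j in range(antioxidants.shape[1]):
    pls1 = PLSRegression(n_components=2).fit(fatty_acids, antioxidants[:, j])
    r2 = pls1.score(fatty_acids, antioxidants[:, j])
    print(f"PLS1 model for antioxidant {j}: r2 = {r2:.2f}")
```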
Abstract:
Biological experiments often produce enormous amounts of data, which are usually analyzed by data clustering. Cluster analysis refers to statistical methods that are used to assign data with similar properties into several smaller, more meaningful groups. Two commonly used clustering techniques are introduced in the following section: principal component analysis (PCA) and hierarchical clustering. PCA calculates the variance between variables and groups them into a few uncorrelated groups or principal components (PCs) that are orthogonal to each other. Hierarchical clustering is carried out by separating data into many clusters and merging similar clusters together. Here, we use an example of human leukocyte antigen (HLA) supertype classification to demonstrate the use of the two methods. Two programs, Generating Optimal Linear Partial Least Square Estimations (GOLPE) and Sybyl, are used for PCA and hierarchical clustering, respectively. However, the reader should bear in mind that these methods have been incorporated into other software as well, such as SIMCA, statistiXL, and R.
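Since the abstract notes that the same methods are available in open software such as R, an equivalent open-source sketch (with a random placeholder descriptor matrix standing in for HLA data) using scikit-learn and SciPy looks like this:

```python
# Open-source sketch of the two methods described above: PCA to project
# descriptor data onto orthogonal components, then agglomerative (hierarchical)
# clustering that merges similar clusters. The descriptor matrix is a placeholder.
import numpy as np
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(7)
descriptors = rng.normal(size=(30, 12))   # e.g. 30 HLA alleles x physicochemical descriptors

# PCA: uncorrelated, orthogonal principal components ordered by explained variance
scores = PCA(n_components=3).fit_transform(descriptors)

# Hierarchical clustering on the PC scores; Ward linkage merges similar clusters
tree = linkage(scores, method="ward")
supertypes = fcluster(tree, t=4, criterion="maxclust")   # cut the tree into 4 groups
print("cluster assignment per allele:", supertypes)
# scipy.cluster.hierarchy.dendrogram(tree) would draw the merge tree for inspection
```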
Abstract:
Purpose - The paper aims to examine the role of market orientation (MO) and innovation capability in determining business performance during an economic upturn and downturn. Design/methodology/approach - The data comprise two national-level surveys conducted in Finland in 2008, representing an economic boom, and in 2010, when the global economic crisis had hit the Finnish market. Partial least squares path analysis is used to test the potential mediating effect of innovation capability on the relationship between MO and business performance during economic boom and bust. Findings - The results show that innovation capability fully mediates the performance effects of MO during an economic upturn, whereas the mediation is only partial during a downturn. Innovation capability also mediates the relationship between customer orientation and business performance during an upturn, whereas the mediating effect culminates in competitor orientation during a downturn. Thus, the role of innovation capability as a mediator between the individual market-orientation components and performance varies along the business cycle. Originality/value - This paper is one of the first studies to empirically examine the impact of the economic cycle on the relationship between strategic marketing concepts, such as MO or innovation capability, and the firm's business performance.
Abstract:
Objective: In this study, we have used a chemometrics-based method to correlate key liposomal adjuvant attributes with in-vivo immune responses based on multivariate analysis. Methods: The liposomal adjuvant composed of the cationic lipid dimethyldioctadecylammonium bromide (DDA) and trehalose 6,6-dibehenate (TDB) was modified with 1,2-distearoyl-sn-glycero-3-phosphocholine at a range of mol% ratios, and the main liposomal characteristics (liposome size and zeta potential) were measured along with their immunological performance as an adjuvant for the novel, postexposure fusion tuberculosis vaccine, Ag85B-ESAT-6-Rv2660c (H56 vaccine). Partial least squares regression analysis was applied to correlate and cluster liposomal adjuvant particle characteristics with in-vivo-derived immunological performance (IgG, IgG1, IgG2b, spleen proliferation, IL-2, IL-5, IL-6, IL-10, IFN-γ). Key findings: While a range of factors varied in the formulations, decreasing the 1,2-distearoyl-sn-glycero-3-phosphocholine content (and the resulting zeta potential) constituted the strongest variables in the model. Enhanced DDA and TDB content (and the resulting zeta potential) stimulated a response skewed towards cell-mediated immunity, with the model identifying correlations with IFN-γ, IL-2 and IL-6. Conclusion: This study demonstrates the application of chemometrics-based correlations and clustering, which can inform liposomal adjuvant design.
Abstract:
Circulating low density lipoproteins (LDL) are thought to play a crucial role in the onset and development of atherosclerosis, though the detailed molecular mechanisms responsible for their biological effects remain controversial. The complexity of biomolecules (lipids, glycans and protein) and structural features (isoforms and chemical modifications) found in LDL particles hampers a complete understanding of the mechanism underlying its atherogenicity. For this reason, screening LDL for features discriminative of a particular pathology in search of biomarkers is of high importance. Three major biomolecule classes (lipids, protein and glycans) in LDL particles were screened using mass spectrometry coupled to liquid chromatography. Dual-polarity screening resulted in good lipidome coverage, identifying over 300 lipid species from 12 lipid sub-classes. Multivariate analysis was used to investigate potential discriminators in the individual lipid sub-classes for different study groups (age, gender, pathology). Additionally, the high protein sequence coverage of ApoB-100 routinely achieved (≥70%) assisted in the search for protein modifications correlating with aging and pathology. The large size and complexity of the datasets required the use of chemometric methods (Partial Least Squares Discriminant Analysis, PLS-DA) for their analysis and for the identification of ions that discriminate between study groups. The peptide profile from enzymatically digested ApoB-100 can be correlated with the high structural complexity of lipids associated with ApoB-100 using exploratory data analysis. In addition, using targeted scanning modes, glycosylation sites within neutral and acidic sugar residues in ApoB-100 are also being explored. Together or individually, knowledge of the profiles and modifications of the major biomolecules in LDL particles will contribute towards an in-depth understanding, will help to map the structural features that contribute to the atherogenicity of LDL, and may allow identification of reliable, pathology-specific biomarkers. This research was supported by a Marie Curie Intra-European Fellowship within the 7th European Community Framework Program (IEF 255076). Work of A. Rudnitskaya was supported by the Portuguese Science and Technology Foundation, through the European Social Fund (ESF) and "Programa Operacional Potencial Humano - POPH".
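PLS-DA is commonly implemented as PLS regression against dummy-coded group membership. The sketch below illustrates that construction on a placeholder lipid-feature matrix and hypothetical group labels; it is not the study's workflow.

```python
# Sketch of PLS-DA for exploratory group discrimination: PLS regression against
# dummy-coded class membership. Data and labels are placeholders.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(11)
lipid_features = rng.normal(size=(40, 300))        # placeholder LC-MS lipid intensities per LDL sample
groups = np.array(["control", "pathology"] * 20)   # placeholder study groups (e.g. pathology vs control)

X = StandardScaler().fit_transform(lipid_features)
Y = (groups.reshape(-1, 1) == np.unique(groups)).astype(float)   # dummy-coded class membership

plsda = PLSRegression(n_components=2)
pred = cross_val_predict(plsda, X, Y, cv=5)
accuracy = (pred.argmax(axis=1) == Y.argmax(axis=1)).mean()
print("cross-validated classification accuracy:", accuracy)

# Large loadings on the discriminant components point to lipid species that
# separate the study groups and are candidate discriminators.
plsda.fit(X, Y)
candidate_lipids = np.argsort(np.abs(plsda.x_loadings_[:, 0]))[-15:]
print("indices of candidate discriminating features:", candidate_lipids)
```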
Abstract:
Data fluctuation in multiple measurements of Laser Induced Breakdown Spectroscopy (LIBS) greatly affects the accuracy of quantitative analysis. A new LIBS quantitative analysis method based on the Robust Least Squares Support Vector Machine (RLS-SVM) regression model is proposed. The usual way to enhance analysis accuracy is to improve the quality and consistency of the emission signal, for example by averaging the spectral signals over a number of laser shots or by spectrum standardization. The proposed method focuses instead on enhancing the robustness of the quantitative analysis regression model. The proposed RLS-SVM regression model originates from the Weighted Least Squares Support Vector Machine (WLS-SVM) but has an improved segmented weighting function and residual error calculation according to the statistical distribution of the measured spectral data. Through the improved segmented weighting function, the information on spectral data in the normal distribution is retained in the regression model while the information on outliers is restrained or removed. Copper concentration analysis experiments on 16 certified standard brass samples were carried out. The average relative standard deviation obtained from the RLS-SVM model was 3.06% and the root mean square error was 1.537%. The experimental results showed that the proposed method achieved better prediction accuracy and better modeling robustness than quantitative analysis methods based on Partial Least Squares (PLS) regression, the standard Support Vector Machine (SVM) and WLS-SVM. It was also demonstrated that the improved weighting function had better overall performance in model robustness and convergence speed than the four known weighting functions.
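The weighted LS-SVM backbone that the RLS-SVM model builds on can be sketched as follows: fit an unweighted LS-SVM by solving its linear KKT system, derive per-sample weights from the residual distribution, and refit. The paper's segmented weighting function is not reproduced; a simple Huber-like down-weighting is assumed instead, and all data are simulated.

```python
# Hedged sketch of weighted LS-SVM regression: unweighted fit, residual-based
# weights, robust refit. The RLS-SVM segmented weighting function is not shown.
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0, weights=None):
    """Solve the (weighted) LS-SVM regression KKT system for alpha and b."""
    n = len(y)
    v = np.ones(n) if weights is None else weights
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.diag(1.0 / (gamma * v))
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]                       # alpha, b

def lssvm_predict(X_train, alpha, b, X_new, sigma=1.0):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Simulated calibration data with a few outlying measurements
rng = np.random.default_rng(5)
X = rng.normal(size=(60, 8))                     # e.g. selected emission-line intensities
y = X @ rng.normal(size=8) + 0.05 * rng.normal(size=60)
y[:3] += 2.0                                     # simulated outlier shots

gamma = 10.0
alpha, b = lssvm_fit(X, y, gamma)                # first pass: unweighted LS-SVM
resid = alpha / gamma                            # LS-SVM residuals: e_i = alpha_i / gamma
s = 1.483 * np.median(np.abs(resid - np.median(resid)))          # robust scale (MAD)
weights = np.clip(2.5 / (np.abs(resid) / s + 1e-12), None, 1.0)  # Huber-like down-weighting
alpha_w, b_w = lssvm_fit(X, y, gamma, weights=weights)           # second pass: robust refit
predictions = lssvm_predict(X, alpha_w, b_w, X)                  # predictions from the robust model
```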
Abstract:
Fe Mössbauer spectroscopy of mononuclear [Fe(II)(isoxazole)](ClO) has been studied to reveal the thermal spin crossover of Fe(II) between low-spin (S = 0) and high-spin (S = 2) states. Temperature-dependent spin transition curves have been constructed from least-squares fits of the Mössbauer spectra measured at various temperatures between 84 and 270 K during a cooling and heating cycle. This compound exhibits an unusual temperature-dependent spin transition behaviour, with T(?) = 223 and T(?) = 213 K occurring in the reverse order compared with those observed by SQUID measurements and in many other spin transition compounds. The compound has three high-spin Fe(II) sites at the highest temperature of study, of which two undergo spin transitions. The compound appears to undergo a structural phase transition around the spin transition temperature, which plays a significant role in the spin crossover behaviour as well as the magnetic properties of the compound at temperatures below T. The present study reveals an increase in the high-spin fraction upon heating in the temperature range below T, and an explanation is provided.
Abstract:
1. Pearson's correlation coefficient only tests whether the data fit a linear model. With large numbers of observations, quite small values of r become significant and the X variable may account for only a minute proportion of the variance in Y. Hence, the value of r squared should always be calculated and included in a discussion of the significance of r. 2. The use of r assumes that a bivariate normal distribution is present, and this assumption should be examined prior to the study. If Pearson's r is not appropriate, then a non-parametric correlation coefficient such as Spearman's rs may be used. 3. A significant correlation should not be interpreted as indicating causation, especially in observational studies in which there is a high probability that the two variables are correlated because of their mutual correlations with other variables. 4. In studies of measurement error, there are problems in using r as a test of reliability, and the ‘intra-class correlation coefficient’ should be used as an alternative. A correlation test provides only limited information as to the relationship between two variables. Fitting a regression line to the data using the method known as ‘least squares’ provides much more information, and the methods of regression and their application in optometry will be discussed in the next article.
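A short numerical illustration of points 1 and 2 above, on hypothetical data: with thousands of observations even a very weak correlation is statistically significant, yet r squared shows that it explains almost none of the variance in Y.

```python
# Numerical illustration: a "significant" r can still explain almost nothing.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 5000
x = rng.normal(size=n)
y = 0.05 * x + rng.normal(size=n)      # deliberately very weak linear relationship

r, p = stats.pearsonr(x, y)
print(f"Pearson r = {r:.3f}, p = {p:.3g}, r^2 = {r**2:.4f}")   # p is small, r^2 is negligible

rho, p_s = stats.spearmanr(x, y)       # non-parametric alternative when bivariate normality fails
print(f"Spearman rs = {rho:.3f}, p = {p_s:.3g}")

# Least-squares regression, the subject of the next article in the series
slope, intercept, r_value, p_value, stderr = stats.linregress(x, y)
```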
Using interior point algorithms for the solution of linear programs with special structural features
Abstract:
Linear Programming (LP) is a powerful decision-making tool extensively used in various economic and engineering activities. In the early stages the success of LP was mainly due to the efficiency of the simplex method. After the appearance of Karmarkar's paper, the focus of most research shifted to the field of interior point methods. The present work is concerned with investigating and efficiently implementing the latest techniques in this field, taking sparsity into account. The performance of these implementations on different classes of LP problems is reported here. The preconditioned conjugate gradient method is one of the most powerful tools for the solution of the least squares problem that arises in every iteration of all interior point methods. The effect of using different preconditioners on a range of problems with various condition numbers is presented. Decomposition algorithms have been one of the main fields of research in linear programming over the last few years. After reviewing the latest decomposition techniques, three promising methods were chosen and implemented. Sparsity is again a consideration, and suggestions have been included to allow improvements when solving problems with these methods. Finally, experimental results on randomly generated data are reported and compared with an interior point method. The efficient implementation of the decomposition methods considered in this study requires the solution of quadratic subproblems. A review of recent work on algorithms for convex quadratic programming was performed. The most promising algorithms are discussed and implemented taking sparsity into account. The performance of these algorithms on randomly generated separable and non-separable problems is also reported.
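The least-squares subproblem mentioned above is often handled through the normal equations (A D A^T) dy = r that arise at each interior-point iteration. The sketch below applies a preconditioned conjugate gradient with a simple Jacobi (diagonal) preconditioner to a small sparse example; the specific preconditioners and sparsity techniques studied in the thesis are not reproduced.

```python
# Sketch: preconditioned conjugate gradient on the normal-equations system
# (A D A^T) dy = r of an interior-point least-squares subproblem, with a
# Jacobi preconditioner. Data are random placeholders.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(9)
m, n = 50, 120
A = sp.random(m, n, density=0.1, random_state=9, format="csr")   # sparse constraint matrix
d = rng.uniform(0.1, 10.0, size=n)                               # diagonal scaling from the current iterate
r = rng.normal(size=m)                                           # right-hand side

# A D A^T kept sparse; a tiny diagonal regularisation keeps the toy system positive definite
ADAt = (A @ sp.diags(d) @ A.T + 1e-8 * sp.identity(m)).tocsr()

# Jacobi preconditioner: apply the inverse of the diagonal of A D A^T
diag = ADAt.diagonal()
M = LinearOperator((m, m), matvec=lambda v: v / diag)

dy, info = cg(ADAt, r, M=M)
print("CG converged:", info == 0, "residual norm:", np.linalg.norm(ADAt @ dy - r))
```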