18 results for cross validation
Abstract:
It is known theoretically that an algorithm cannot be good for an arbitrary prior. We show that in practical terms this also applies to the technique of "cross validation", which has been widely regarded as defying this general rule. Numerical examples are analysed in detail. Their implications for research on learning algorithms are discussed.
Abstract:
The thrust of this report concerns spline theory and some of the background to spline theory and follows the development in (Wahba, 1991). We also review methods for determining hyper-parameters, such as the smoothing parameter, by Generalised Cross Validation. Splines have an advantage over Gaussian Process based procedures in that we can readily impose atmospherically sensible smoothness constraints and maintain computational efficiency. Vector splines enable us to penalise gradients of vorticity and divergence in wind fields. Two similar techniques are summarised and improvements based on robust error functions and restricted numbers of basis functions given. A final, brief discussion of the application of vector splines to the problem of scatterometer data assimilation highlights the problems of ambiguous solutions.
Abstract:
This Letter addresses image segmentation via a generative model approach. A Bayesian network (BNT) in the space of dyadic wavelet transform coefficients is introduced to model texture images. The model is similar to a Hidden Markov model (HMM), but with non-stationary transitive conditional probability distributions. It is composed of discrete hidden variables and observable Gaussian outputs for wavelet coefficients. In particular, the Gabor wavelet transform is considered. The introduced model is compared with the simplest joint Gaussian probabilistic model for Gabor wavelet coefficients for several textures from the Brodatz album [1]. The comparison is based on cross-validation and includes probabilistic model ensembles instead of single models. In addition, the robustness of the models to cope with additive Gaussian noise is investigated. We further study the feasibility of the introduced generative model for image segmentation in the novelty detection framework [2]. Two examples are considered: (i) sea surface pollution detection from intensity images and (ii) image segmentation of the still images with varying illumination across the scene.
Abstract:
An interoperable web processing service (WPS) for the automatic interpolation of environmental data has been developed in the frame of the INTAMAP project. In order to assess the performance of the interpolation method implemented, a validation WPS has also been developed. This validation WPS can be used to perform leave one out and K-fold cross validation: a full dataset is submitted and a range of validation statistics and diagnostic plots (e.g. histograms, variogram of residuals, mean errors) is received in return. This paper presents the architecture of the validation WPS and a case study is used to briefly illustrate its use in practice. We conclude with a discussion on the current limitations of the system and make proposals for further developments.
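The two validation modes named above differ only in how many observations are held out at a time. A minimal sketch in Python follows; it is not the INTAMAP service itself. The function names, the made-up measurements, and the mean predictor standing in for a real interpolator are all assumptions for illustration.

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    if not 1 < k <= n:
        raise ValueError("k must satisfy 1 < k <= n")
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(values, k):
    """Return held-out residuals (observed - predicted), one per observation."""
    residuals = []
    for fold in kfold_indices(len(values), k):
        held_out = set(fold)
        train = [v for i, v in enumerate(values) if i not in held_out]
        prediction = sum(train) / len(train)  # hypothetical stand-in for an interpolator
        residuals.extend(values[i] - prediction for i in fold)
    return residuals


def mean_error(residuals):
    return sum(residuals) / len(residuals)


def rmse(residuals):
    return (sum(r * r for r in residuals) / len(residuals)) ** 0.5


observations = [2.0, 4.0, 6.0, 8.0]                      # made-up measurements
loo = cross_validate(observations, k=len(observations))  # leave-one-out: k equals n
two_fold = cross_validate(observations, k=2)             # K-fold with K = 2
print(mean_error(loo), rmse(two_fold))
```

Leave-one-out is the special case where the number of folds equals the number of observations. For real spatial data the indices would normally be shuffled before splitting, which this sketch omits.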
Abstract:
This project explored how consumers in emerging economies evaluate brand extensions, using China as a case. Two separate but related studies were conducted, with university students as respondents in both. Study one, the replication study, tested Aaker and Keller's brand extension model in China. Following methods similar to Aaker and Keller's, six well-recognised brands were chosen as parent brands and each was extended to three product categories. In total, 469 respondents completed the survey questionnaire. As each evaluated six extensions, this yielded 2,814 cases. The data were analysed using the Ordinary Least Squares regression approach and the "residual centred" approach respectively. The results confirmed most of the findings observed in developed countries. Specifically, consumers' attitude towards the extension is primarily driven by the brand affect, the fit between the two product categories, and the difficulty of making the extension, and is moderated via the interactions between the brand affect and the fit variables. Study two refined and extended Aaker and Keller's model by adding new variables and making methodological adjustments. The same stimuli and data analysis techniques as those in the replication were employed. 252 respondents participated in the survey and each evaluated six extensions, yielding 1,512 cases. In addition to re-verifying the findings of the replication and providing cross validation of these findings, the extended study found that the image consistency between the parent brand and the extension and the competition intensity of the extension product market were important in determining the success of the extension. Further, consumers differed in evaluating durable extensions and non-durable extensions. The thesis detailed the two studies above, and discussed the findings and their implications by relating them to the branding literature, to the general situation of the emerging economies, and to the reality of China.
It also presented the limitations of the research and the future research directions.
Abstract:
The work presented in this thesis was aimed at assessing the efficacy of lithium in the acute treatment of mania and for the prophylaxis of bipolar disorder, and investigating the value of plasma haloperidol concentration for predicting response to treatment in schizophrenia. The pharmacogenetics of psychotropic drugs is critically appraised to provide insights into interindividual variability in response to pharmacotherapy. In clinical trials of acute mania, a number of measures have been used to assess the severity of illness and its response to treatment. Rating instruments need to be validated in order for a clinical study to provide reliable and meaningful estimates of treatment effects. Eight symptom-rating scales were identified and critically assessed. The Mania Rating Scale (MRS) was the most commonly used for assessing treatment response. The advantage of the MRS is that there is a relatively extensive database of studies based on it and this will no doubt ensure that it remains a gold standard for the foreseeable future. Other useful rating scales are available for measuring mania but further cross-validation and validation against clinically meaningful global changes are required. A total of 658 patients from 12 trials were included in an evaluation of the efficacy of lithium in the treatment of acute mania. Treatment periods ranged from 3 to 4 weeks. Efficacy was estimated using (i) the differences in the reduction in mania severity scores, and (ii) the ratio and difference in improvement response rates. The response rate ratio for lithium against placebo was 1.95 (95% CI 1.17 to 3.23). The mean number needed to treat was 5 (95% CI 3 to 20). Patients were twice as likely to obtain remission with lithium as with chlorpromazine (rate ratio = 1.96, 95% CI 1.02 to 3.77). The mean number needed to treat (NNT) was 4 (95% CI 3 to 9). Neither carbamazepine nor valproate was more effective than lithium.
The response rate ratios were 1.01 (95% CI 0.54 to 1.88) for lithium compared to carbamazepine and 1.22 (95% CI 0.91 to 1.64) for lithium against valproate. Haloperidol was no better than lithium on the basis of improvement based on assessment of global severity. The differences in effects between lithium and risperidone were -2.79 (95% CI -4.22 to -1.36) in favour of risperidone with respect to symptom severity improvement and -0.76 (95% CI -1.11 to -0.41) on the basis of reduction in global severity of disease. Symptom and global severity were at least as well controlled with lithium as with verapamil. Lithium caused more side-effects than placebo and verapamil, but no more than carbamazepine or valproate. A total of 554 patients from 13 trials were included in the statistical analysis of lithium's efficacy in the prophylaxis of bipolar disorder. The mean follow-up period was 5-34 months. The relapse risk ratio for lithium versus placebo was 0.47 (95% CI 0.26 to 0.86) and the NNT was 3 (95% CI 2 to 7). The relapse risk ratio for lithium versus imipramine was 0.62 (95% CI 0.46 to 0.84) and the NNT was 4 (95% CI 3 to 7). The combination of lithium and imipramine was no more effective than lithium alone. The risk of relapse was greater with lithium alone than with the lithium-divalproate combination. A risk difference of 0.60 (95% CI 0.21 to 0.99) and an NNT of 2 (95% CI 1 to 5) were obtained. Lithium was as effective as carbamazepine. Based on individual data concerning plasma haloperidol concentration and percent improvement in psychotic symptoms, our results suggest an acceptable concentration range of 11.20-30.30 ng/mL. A minimum of 2 weeks should be allowed before evaluating therapeutic response. Monitoring of drug plasma levels seems not to be necessary unless behavioural toxicity or noncompliance is suspected.
Pharmacokinetics and pharmacodynamics, which are mainly determined by genetic factors, contribute to interindividual and interethnic variations in clinical response to drugs. These variations are primarily due to differences in drug metabolism. Variability in pharmacokinetics of a number of drugs is associated with oxidation polymorphism. Debrisoquine/sparteine hydroxylase (CYP2D6) and the S-mephenytoin hydroxylase (CYP2C19) are polymorphic P450 enzymes with particular importance in psychopharmacotherapy. The enzymes are responsible for the metabolism of many commonly used antipsychotic and antidepressant drugs. The incidence of poor metabolisers of debrisoquine and S-mephenytoin varies widely among populations. Ethnic variations in polymorphic isoenzymes may, at least in part, explain ethnic differences in response to pharmacotherapy of antipsychotics and antidepressant drugs.
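The number needed to treat quoted in this abstract is conventionally the reciprocal of the absolute risk difference. As a small worked check in Python, using only the risk difference and interval reported for lithium alone versus the lithium-divalproate combination (the function name is illustrative, and rounding to whole patients is an assumption about how the figures were presented):

```python
def number_needed_to_treat(risk_difference):
    """NNT as the reciprocal of the absolute risk difference."""
    if risk_difference == 0:
        raise ValueError("NNT is undefined when the risk difference is zero")
    return 1.0 / abs(risk_difference)


# Risk difference and 95% CI limits as quoted in the abstract.
point = number_needed_to_treat(0.60)
lower = number_needed_to_treat(0.99)  # largest difference gives the smallest NNT
upper = number_needed_to_treat(0.21)  # smallest difference gives the largest NNT
print(f"NNT = {point:.2f} (range {lower:.2f} to {upper:.2f})")
# prints "NNT = 1.67 (range 1.01 to 4.76)"
```

Rounded to whole patients, these values agree with the reported NNT of 2 (95% CI 1 to 5).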
Abstract:
Fifteen Miscanthus genotypes grown in five locations across Europe were analysed to investigate the influence of genetic and environmental factors on cell wall composition. Chemometric techniques combining near infrared reflectance spectroscopy (NIRS) and conventional chemical analyses were used to construct calibration models for determination of acid detergent lignin (ADL), acid detergent fibre (ADF), and neutral detergent fibre (NDF) from sample spectra. Results generated were subsequently converted to lignin, cellulose and hemicellulose content and used to assess the genetic and environmental variation in cell wall composition of Miscanthus and to identify genotypes which display quality traits suitable for exploitation in a range of energy conversion systems. The NIRS calibration models developed were found to predict concentrations with a good degree of accuracy based on the coefficient of determination (R2), standard error of calibration (SEC), and standard error of cross-validation (SECV) values. Across all sites mean lignin, cellulose and hemicellulose values in the winter harvest ranged from 76–115 g kg-1, 412–529 g kg-1, and 235–338 g kg-1 respectively. Overall, of the 15 genotypes Miscanthus x giganteus and Miscanthus sacchariflorus contained higher lignin and cellulose concentrations in the winter harvest. The degree of observed genotypic variation in cell wall composition indicates good potential for plant breeding and matching feedstocks to be optimised to different energy conversion processes.
Abstract:
We investigate two numerical procedures for the Cauchy problem in linear elasticity, involving the relaxation of either the given boundary displacements (Dirichlet data) or the prescribed boundary tractions (Neumann data) on the over-specified boundary, in the alternating iterative algorithm of Kozlov et al. (1991). The two mixed direct (well-posed) problems associated with each iteration are solved using the method of fundamental solutions (MFS), in conjunction with the Tikhonov regularization method, while the optimal value of the regularization parameter is chosen via the generalized cross-validation (GCV) criterion. An efficient regularizing stopping criterion which ceases the iterative procedure at the point where the accumulation of noise becomes dominant and the errors in predicting the exact solutions increase, is also presented. The MFS-based iterative algorithms with relaxation are tested for Cauchy problems for isotropic linear elastic materials in various geometries to confirm the numerical convergence, stability, accuracy and computational efficiency of the proposed method.
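To illustrate how a GCV criterion can select a Tikhonov regularization parameter, here is a minimal Python sketch. It assumes a square system whose singular values and data coefficients (the data projected onto the left singular vectors) are already available. The numbers are made up, the function names are hypothetical, and nothing here reproduces the MFS discretization or the iterative algorithm of the paper.

```python
def gcv_score(singular_values, coeffs, lam):
    """GCV function for Tikhonov regularization in the SVD basis (square system).

    With filter factors f_i = s_i^2 / (s_i^2 + lam), the residual components are
    (1 - f_i) * b_i and trace(I - A) is the sum of (1 - f_i).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    n = len(singular_values)
    damp = [lam / (s * s + lam) for s in singular_values]  # equals 1 - f_i
    residual_sq = sum((d * b) ** 2 for d, b in zip(damp, coeffs))
    trace = sum(damp)
    return n * residual_sq / trace ** 2


def pick_lambda(singular_values, coeffs, candidates):
    """Return the candidate with the smallest GCV score (simple grid search)."""
    return min(candidates, key=lambda lam: gcv_score(singular_values, coeffs, lam))


# Made-up spectrum: decaying singular values, coefficients that level off at a noise floor.
s = [1.0, 0.5, 0.1, 0.01]
b = [2.0, 1.0, 0.05, 0.05]
grid = [10.0 ** e for e in range(-6, 1)]
best = pick_lambda(s, b, grid)
print(best, gcv_score(s, b, best))
```

A grid search is the simplest option; in practice the minimum is often located with a one-dimensional optimizer over a logarithmic scale of the parameter.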
Abstract:
Background - Vaccine development in the post-genomic era often begins with the in silico screening of genome information, with the most probable protective antigens being predicted rather than requiring causative microorganisms to be grown. Despite the obvious advantages of this approach – such as speed and cost efficiency – its success remains dependent on the accuracy of antigen prediction. Most approaches use sequence alignment to identify antigens. This is problematic for several reasons. Some proteins lack obvious sequence similarity, although they may share similar structures and biological properties. The antigenicity of a sequence may be encoded in a subtle and recondite manner not amenable to direct identification by sequence alignment. The discovery of truly novel antigens will be frustrated by their lack of similarity to antigens of known provenance. To overcome the limitations of alignment-dependent methods, we propose a new alignment-free approach for antigen prediction, which is based on auto cross covariance (ACC) transformation of protein sequences into uniform vectors of principal amino acid properties. Results - Bacterial, viral and tumour protein datasets were used to derive models for prediction of whole protein antigenicity. Every set consisted of 100 known antigens and 100 non-antigens. The derived models were tested by internal leave-one-out cross-validation and external validation using test sets. An additional five training sets for each class of antigens were used to test the stability of the discrimination between antigens and non-antigens. The models performed well in both validations showing prediction accuracy of 70% to 89%. The models were implemented in a server, which we call VaxiJen. Conclusion - VaxiJen is the first server for alignment-independent prediction of protective antigens. It was developed to allow antigen classification solely based on the physicochemical properties of proteins without recourse to sequence alignment.
The server can be used on its own or in combination with alignment-based prediction methods.
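The point of an auto cross covariance transform is that sequences of different lengths map to vectors of the same length. A minimal Python sketch of one common form of the transform follows. The two-property descriptor table is a toy invented for illustration, not the amino acid descriptors used by VaxiJen, and the lag range and normalization are assumptions.

```python
def acc_transform(sequence, descriptors, max_lag):
    """Map a sequence of any length to a fixed-length vector of auto- and
    cross-covariance terms between descriptor pairs at lags 1..max_lag."""
    n = len(sequence)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    z = [descriptors[residue] for residue in sequence]
    n_props = len(z[0])
    vector = []
    for lag in range(1, max_lag + 1):
        for j in range(n_props):
            for k in range(n_props):  # j == k gives the auto-covariance terms
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                vector.append(total / (n - lag))
    return vector


# Hypothetical two-property descriptors for four residues (values are made up).
TOY_DESCRIPTORS = {
    "A": (1.0, -1.0),
    "G": (0.5, 2.0),
    "K": (-2.0, 0.5),
    "L": (1.5, -0.5),
}

short = acc_transform("AGKL", TOY_DESCRIPTORS, max_lag=2)
longer = acc_transform("AGKLLKGAAG", TOY_DESCRIPTORS, max_lag=2)
print(len(short), len(longer))  # both 8: max_lag * n_props ** 2
```

The output length depends only on the number of properties and the maximum lag, so the resulting vectors can be fed to any classifier that expects fixed-size inputs.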
Abstract:
Background - Modelling the interaction between potentially antigenic peptides and Major Histocompatibility Complex (MHC) molecules is a key step in identifying potential T-cell epitopes. For Class II MHC alleles, the binding groove is open at both ends, causing ambiguity in the positional alignment between the groove and peptide, as well as creating uncertainty as to what parts of the peptide interact with the MHC. Moreover, the antigenic peptides have variable lengths, making naive modelling methods difficult to apply. This paper introduces a kernel method that can handle variable length peptides effectively by quantifying similarities between peptide sequences and integrating these into the kernel. Results - The kernel approach presented here shows increased prediction accuracy with a significantly higher number of true positives and negatives on multiple MHC class II alleles, when testing data sets from MHCPEP [1], MHCBN [2], and MHCBench [3]. Evaluation by cross validation, when segregating binders and non-binders, produced an average of 0.824 AROC for the MHCBench data sets (up from 0.756), and an average of 0.96 AROC for multiple alleles of the MHCPEP database. Conclusion - The method improves performance over existing state-of-the-art methods of MHC class II peptide binding predictions by using a custom, knowledge-based representation of peptides. Similarity scores, in contrast to a fixed-length, pocket-specific representation of amino acids, provide a flexible and powerful way of modelling MHC binding, and can easily be applied to other dynamic sequence problems.
Abstract:
Subunit vaccine discovery is an accepted clinical priority. The empirical approach is time- and labor-consuming and can often end in failure. Rational information-driven approaches can overcome these limitations in a fast and efficient manner. However, informatics solutions require reliable algorithms for antigen identification. All known algorithms use sequence similarity to identify antigens. However, antigenicity may be encoded subtly in a sequence and may not be directly identifiable by sequence alignment. We propose a new alignment-independent method for antigen recognition based on the principal chemical properties of protein amino acid sequences. The method is tested by cross-validation on a training set of bacterial antigens and external validation on a test set of known antigens. The prediction accuracy is 83% for the cross-validation and 80% for the external test set. Our approach is accurate and robust, and provides a potent tool for the in silico discovery of medically relevant subunit vaccines.
Abstract:
The accurate in silico identification of T-cell epitopes is a critical step in the development of peptide-based vaccines, reagents, and diagnostics. It has a direct impact on the success of subsequent experimental work. Epitopes arise as a consequence of complex proteolytic processing within the cell. Prior to being recognized by T cells, an epitope is presented on the cell surface as a complex with a major histocompatibility complex (MHC) protein. A prerequisite therefore for T-cell recognition is that an epitope is also a good MHC binder. Thus, T-cell epitope prediction overlaps strongly with the prediction of MHC binding. In the present study, we compare discriminant analysis and multiple linear regression as algorithmic engines for the definition of quantitative matrices for binding affinity prediction. We apply these methods to peptides which bind the well-studied human MHC allele HLA-A*0201. A matrix which results from combining results of the two methods proved powerfully predictive under cross-validation. The new matrix was also tested on an external set of 160 binders to HLA-A*0201; it was able to recognize 135 (84%) of them.
Abstract:
The accurate identification of T-cell epitopes remains a principal goal of bioinformatics within immunology. As the immunogenicity of peptide epitopes is dependent on their binding to major histocompatibility complex (MHC) molecules, the prediction of binding affinity is a prerequisite to the reliable prediction of epitopes. The iterative self-consistent (ISC) partial-least-squares (PLS)-based additive method is a recently developed bioinformatic approach for predicting class II peptide−MHC binding affinity. The ISC−PLS method overcomes many of the conceptual difficulties inherent in the prediction of class II peptide−MHC affinity, such as the binding of a mixed population of peptide lengths due to the open-ended class II binding site. The method has applications in both the accurate prediction of class II epitopes and the manipulation of affinity for heteroclitic and competitor peptides. The method is applied here to six class II mouse alleles (I-Ab, I-Ad, I-Ak, I-As, I-Ed, and I-Ek) and included peptides up to 25 amino acids in length. A series of regression equations highlighting the quantitative contributions of individual amino acids at each peptide position was established. The initial model for each allele exhibited only moderate predictivity. Once the set of selected peptide subsequences had converged, the final models exhibited a satisfactory predictive power. Convergence was reached between the 4th and 17th iterations, and the leave-one-out cross-validation statistical terms - q2, SEP, and NC - ranged between 0.732 and 0.925, 0.418 and 0.816, and 1 and 6, respectively. The non-cross-validated statistical terms r2 and SEE ranged between 0.98 and 0.995 and 0.089 and 0.180, respectively. The peptides used in this study are available from the AntiJen database (http://www.jenner.ac.uk/AntiJen). The PLS method is available commercially in the SYBYL molecular modeling software package. 
The resulting models, which can be used for accurate T-cell epitope prediction, will be made freely available online (http://www.jenner.ac.uk/MHCPred).
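The gap between the cross-validated q2 and the non-cross-validated r2 reported above reflects how the two statistics are computed: r2 uses residuals from a model fitted to all the data, while q2 uses predictions for each observation from a model fitted without it. The Python sketch below shows the distinction with a one-variable least-squares line swapped in for PLS. The data are made up, and q2 is taken as 1 - PRESS/SS, which is the usual definition but is an assumption as far as this abstract is concerned.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def total_sum_of_squares(ys):
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def r_squared(xs, ys):
    """Non-cross-validated r2: residuals come from the model fitted to all data."""
    slope, intercept = fit_line(xs, ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / total_sum_of_squares(ys)


def q_squared_loo(xs, ys):
    """Leave-one-out q2: each point is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return 1.0 - press / total_sum_of_squares(ys)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up predictor values
ys = [1.1, 1.9, 3.2, 3.9, 5.1]   # made-up responses
print(r_squared(xs, ys), q_squared_loo(xs, ys))
```

For least-squares fits, q2 computed this way cannot exceed r2, which is why the cross-validated figure is the more conservative measure of predictive power.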
Abstract:
A set of 38 epitopes and 183 non-epitopes, which bind to alleles of the HLA-A3 supertype, was subjected to a combination of comparative molecular similarity indices analysis (CoMSIA) and soft independent modeling of class analogy (SIMCA). During the process of T cell recognition, T cell receptors (TCR) interact with the central section of the bound nonamer peptide; thus only positions 4−8 were considered in the study. The derived model distinguished 82% of the epitopes and 73% of the non-epitopes after cross-validation in five groups. The overall preference from the model is for polar amino acids with high electron density and the ability to form hydrogen bonds. These so-called “aggressive” amino acids are flanked by small-sized residues, which enable such residues to protrude from the binding cleft and take an active role in TCR-mediated T cell recognition. Combinations of “aggressive” and “passive” amino acids in the middle part of epitopes constitute a putative TCR binding motif.
Abstract:
Allergy is an overreaction by the immune system to a previously encountered, ordinarily harmless substance - typically proteins - resulting in skin rash, swelling of mucous membranes, sneezing or wheezing, or other abnormal conditions. The use of modified proteins is increasingly widespread: their presence in food, commercial products, such as washing powder, and medical therapeutics and diagnostics, makes predicting and identifying potential allergens a crucial societal issue. The prediction of allergens has been explored widely using bioinformatics, with many tools being developed in the last decade; many of these are freely available online. Here, we report a set of novel models for allergen prediction utilizing amino acid E-descriptors, auto- and cross-covariance transformation, and several machine learning methods for classification, including logistic regression (LR), decision tree (DT), naïve Bayes (NB), random forest (RF), multilayer perceptron (MLP) and k nearest neighbours (kNN). The best performing method was kNN with 85.3% accuracy at 5-fold cross-validation. The resulting model has been implemented in a revised version of the AllerTOP server (http://www.ddg-pharmfac.net/AllerTOP). © Springer-Verlag 2014.