37 results for Bankruptcy prediction methods
in Aston University Research Archive
Abstract:
Background - Vaccine development in the post-genomic era often begins with the in silico screening of genome information, with the most probable protective antigens being predicted rather than requiring causative microorganisms to be grown. Despite the obvious advantages of this approach – such as speed and cost efficiency – its success remains dependent on the accuracy of antigen prediction. Most approaches use sequence alignment to identify antigens. This is problematic for several reasons. Some proteins lack obvious sequence similarity, although they may share similar structures and biological properties. The antigenicity of a sequence may be encoded in a subtle and recondite manner not amenable to direct identification by sequence alignment. The discovery of truly novel antigens will be frustrated by their lack of similarity to antigens of known provenance. To overcome the limitations of alignment-dependent methods, we propose a new alignment-free approach for antigen prediction, which is based on auto cross covariance (ACC) transformation of protein sequences into uniform vectors of principal amino acid properties. Results - Bacterial, viral and tumour protein datasets were used to derive models for prediction of whole protein antigenicity. Every set consisted of 100 known antigens and 100 non-antigens. The derived models were tested by internal leave-one-out cross-validation and external validation using test sets. An additional five training sets for each class of antigens were used to test the stability of the discrimination between antigens and non-antigens. The models performed well in both validations, showing prediction accuracies of 70% to 89%. The models were implemented in a server, which we call VaxiJen. Conclusion - VaxiJen is the first server for alignment-independent prediction of protective antigens. It was developed to allow antigen classification solely based on the physicochemical properties of proteins, without recourse to sequence alignment.
The server can be used on its own or in combination with alignment-based prediction methods.
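The ACC transformation described above can be sketched in a few lines. The z-scale descriptor values below are rounded illustrative numbers, not the exact property scales used by VaxiJen; the point of the sketch is that sequences of any length map to vectors of the same fixed length, making them directly comparable without alignment.

```python
import numpy as np

# Illustrative z-scale descriptors (z1: hydrophobicity, z2: size, z3: polarity)
# for a few amino acids; rounded approximations for demonstration only.
Z_SCALE = {
    "A": (0.07, -1.73, 0.09),
    "G": (2.23, -5.36, 0.30),
    "K": (2.84, 1.41, -3.14),
    "L": (-4.19, -1.03, -0.98),
    "S": (1.96, -1.63, 0.57),
}

def acc_transform(seq, max_lag=2):
    """Auto cross covariance (ACC) transform: turns a variable-length
    sequence into a fixed-length vector of lagged property covariances."""
    E = np.array([Z_SCALE[aa] for aa in seq])   # n x p property matrix
    E = E - E.mean(axis=0)                      # centre each property column
    n, p = E.shape
    out = []
    for lag in range(1, max_lag + 1):
        for j in range(p):                      # property at position i
            for k in range(p):                  # property at position i + lag
                out.append(np.dot(E[:n - lag, j], E[lag:, k]) / (n - lag))
    return np.array(out)

v1 = acc_transform("AGKLS")
v2 = acc_transform("AGKLSAGKLSAG")
assert v1.shape == v2.shape == (18,)   # uniform length regardless of input
```

Because every protein becomes an 18-dimensional vector here (2 lags x 3 x 3 property pairs), standard discriminant models can then be trained on antigen versus non-antigen sets.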
Abstract:
Protein structure prediction is a cornerstone of bioinformatics research. Membrane proteins require their own prediction methods due to their intrinsically different composition. A variety of tools exist for topology prediction of membrane proteins, many of them available on the Internet. The server described in this paper, BPROMPT (Bayesian PRediction Of Membrane Protein Topology), uses a Bayesian Belief Network to combine the results of other prediction methods, providing a more accurate consensus prediction. Topology predictions with accuracies of 70% for prokaryotes and 53% for eukaryotes were achieved. BPROMPT can be accessed at http://www.jenner.ac.uk/BPROMPT.
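The consensus idea behind BPROMPT can be illustrated with a much simpler probabilistic combiner than a full Bayesian belief network: a naive-Bayes vote in which each method's verdict is weighted by an assumed accuracy. The method names and accuracy figures below are invented for illustration; the real network models dependencies between predictors rather than treating them as independent.

```python
# Naive-Bayes combination of binary predictions (a minimal stand-in for a
# belief-network consensus). Each predictor votes True/False; votes from
# more accurate predictors move the posterior further.
def combine(predictions, accuracy, prior=0.5):
    """predictions: dict method -> bool vote; accuracy: dict method -> P(correct)."""
    p_yes, p_no = prior, 1.0 - prior
    for method, vote in predictions.items():
        a = accuracy[method]
        p_yes *= a if vote else (1.0 - a)
        p_no *= (1.0 - a) if vote else a
    return p_yes / (p_yes + p_no)

# Hypothetical accuracies for three topology predictors.
acc = {"m1": 0.8, "m2": 0.7, "m3": 0.6}
posterior = combine({"m1": True, "m2": True, "m3": False}, acc)
assert posterior > 0.5  # the two more reliable methods voted yes
```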
Abstract:
Motivation: The immunogenicity of peptides depends on their ability to bind to MHC molecules. MHC binding affinity prediction methods can save significant amounts of experimental work. The class II MHC binding site is open at both ends, making epitope prediction difficult because of the multiple binding ability of long peptides. Results: An iterative self-consistent partial least squares (PLS)-based additive method was applied to a set of 66 peptides no longer than 16 amino acids, binding to DRB1*0401. A regression equation containing the quantitative contributions of the amino acids at each of the nine positions was generated. Its predictability was tested using two external test sets, which gave r_pred = 0.593 and r_pred = 0.655, respectively. Furthermore, it was benchmarked using 25 known T-cell epitopes restricted by DRB1*0401, and we compared our results with four other online predictive methods. The additive method showed the best result, finding 24 of the 25 T-cell epitopes. Availability: Peptides used in the study are available from http://www.jenner.ac.uk/JenPep. The PLS method is available commercially in the SYBYL molecular modelling software package. The final model for affinity prediction of peptides binding to the DRB1*0401 molecule is available at http://www.jenner.ac.uk/MHCPred. Models developed for DRB1*0101 and DRB1*0701 are also available in MHCPred.
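The additive scheme lends itself to a very small sketch: the predicted affinity is a constant plus position-specific amino acid contributions summed along the peptide. The coefficient values below are invented placeholders; in the method described above they come from the PLS regression over the nine binding positions.

```python
# Position-specific contribution table (positions 1..9 in a full model).
# These numbers are illustrative only, not fitted PLS coefficients.
CONTRIB = {
    1: {"F": 0.9, "Y": 0.7, "A": 0.0},
    2: {"K": 0.3, "A": 0.1},
    # ... positions 3-9 would follow in a complete model
}
CONST = 1.5  # hypothetical regression constant

def predict_affinity(peptide):
    """Additive model: constant plus per-position amino acid contributions.
    Unknown position/residue pairs contribute zero."""
    score = CONST
    for pos, aa in enumerate(peptide, start=1):
        score += CONTRIB.get(pos, {}).get(aa, 0.0)
    return score

assert predict_affinity("FK") == 1.5 + 0.9 + 0.3
```

The appeal of the additive form is its transparency: each residue's contribution to binding can be read directly from the table, unlike in a black-box predictor.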
Abstract:
Immunoinformatics is an emergent branch of informatics science that long ago pullulated from the tree of knowledge that is bioinformatics. It is a discipline which applies informatic techniques to problems of the immune system. To a great extent, immunoinformatics is typified by epitope prediction methods. It has found disappointingly limited use in the design and discovery of new vaccines, an area where proper computational support is generally lacking. Most extant vaccines are not based around isolated epitopes but rather correspond to chemically-treated or attenuated whole pathogens, to individual proteins extracted from whole pathogens, or to complex carbohydrates. In this chapter we attempt to review what progress there has been in an as-yet-underexplored area of immunoinformatics: the computational discovery of whole protein antigens. The effective development of antigen prediction methods would significantly reduce the laboratory resource required to identify pathogenic proteins as candidate subunit vaccines. We begin our review by placing antigen prediction firmly into context, exploring the role of reverse vaccinology in the design and discovery of vaccines. We also highlight several competing yet ultimately complementary methodological approaches: sub-cellular location prediction, identifying antigens using sequence similarity, and the use of sophisticated statistical approaches for predicting the probability of antigen characteristics. We end by exploring how a systems immunomics approach to the prediction of immunogenicity would prove helpful in the prediction of antigens.
Abstract:
A study of vapour-liquid equilibria is presented together with current developments. The theory of vapour-liquid equilibria is discussed. Both experimental and prediction methods for obtaining vapour-liquid equilibrium data are critically reviewed. The development of a new family of equilibrium stills to measure experimental VLE data from sub-atmospheric pressure to 35 bar is described. Existing experimental techniques are reviewed to highlight the need for these new apparatuses and their major attributes. Details are provided of how the apparatus may be further improved and how computer control may be implemented. To provide a rigorous test of the apparatus, the stills have been commissioned using an acetic acid-water mixture at one atmosphere pressure. A Barker-type consistency test computer program, which allows for association in both phases, has been applied to the data generated and clearly shows that the stills produce data of very high quality. Two high-quality data sets, for the mixture acetone-chloroform, have been generated at one atmosphere and at 64.3 °C. These data are used to investigate the ability of the novel technique, based on molecular parameters, to predict VLE data for highly polar mixtures. Eight vapour-liquid equilibrium data sets have been produced for the cyclohexane-ethanol mixture at one atmosphere, 2, 4, 6, 8 and 11 bar, and at 90.9 °C and 132.8 °C. These data sets have been tested for thermodynamic consistency using a Barker-type fitting package and shown to be of high quality. The data have been used to investigate the temperature dependence of UNIQUAC parameters. The data have in addition been used to compare directly the performance of the predictive methods - original UNIFAC, a modified version of UNIFAC, and the novel technique based on molecular parameters developed from generalised London potential (GLP) theory.
Abstract:
A recent method for phase equilibria, the AGAPE method, has been used to predict activity coefficients and excess Gibbs energies for binary mixtures with good accuracy. The theory, based on a generalised London potential (GLP), accounts for intermolecular attractive forces. Unlike existing prediction methods, for example UNIFAC, the AGAPE method uses only information derived from accessible experimental data and molecular information for pure components. Presently, the AGAPE method has some limitations, namely that the mixtures must consist of small, non-polar compounds with no hydrogen bonding, at low to moderate pressures and at conditions below the critical conditions of the components. The distinction between vapour-liquid equilibria and gas-liquid solubility is rather arbitrary, and it seems reasonable to extend these ideas to solubility. The AGAPE model uses a molecular lattice-based mixing rule. By judicious use of computer programs, a methodology was created to examine a body of experimental gas-liquid solubility data for gases such as carbon dioxide, propane, n-butane or sulphur hexafluoride, which all have critical temperatures a little above 298 K, dissolved in benzene, cyclohexane and methanol. Within this methodology the value of the GLP as an ab initio combining rule for such solutes in very dilute solutions in a variety of liquids has been tested. Using the GLP as a mixing rule involves the computation of rotationally averaged interactions between the constituent atoms, and new calculations have had to be made to discover the magnitude of the unlike-pair interactions. These numbers are significant in their own right in the context of the behaviour of infinitely dilute solutions. A method for extending this treatment to "permanent" gases has also been developed.
The findings from the GLP method and from the more general AGAPE approach have been examined in the context of other models for gas-liquid solubility, both "classical" and contemporary, in particular those derived from equations-of-state methods and from reference solvent methods.
Abstract:
The theory of vapour-liquid equilibria is reviewed, as is the present status of prediction methods in this field. After a discussion of the experimental methods available, the development of a recirculating equilibrium still based on a previously successful design (the modified Raal, Code and Best still of O'Donnell and Jenkins) is described. This novel still is designed to work at pressures up to 35 bar and for the measurement of both isothermal and isobaric vapour-liquid equilibrium data. The equilibrium still was first commissioned by measuring the saturated vapour pressures of pure ethanol and cyclohexane in the temperature ranges 77-124°C and 80-142°C respectively. The data obtained were compared with available literature experimental values and with values derived from an extended form of the Antoine equation, for which parameters were given in the literature. Commissioning continued with the study of the phase behaviour of mixtures of the two pure components, as such mixtures are strongly non-ideal, showing azeotropic behaviour. No data existed above one atmosphere pressure. Isothermal measurements were made at 83.29°C and 106.54°C, whilst isobaric measurements were made at pressures of 1 bar, 3 bar and 5 bar. The experimental vapour-liquid equilibrium data obtained are assessed by a standard literature method incorporating a thermodynamic consistency test that minimises the errors in all the measured variables. This assessment showed that reasonable x-P-T data sets had been measured, from which y-values could be deduced, but the experimental y-values indicated the need for improvements in the design of the still. The final discussion sets out the improvements required and outlines how they might be attained.
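The basic (unextended) Antoine equation referred to above relates saturated vapour pressure to temperature as log10(P) = A - B / (C + T). A minimal sketch follows; the constants are of roughly the right magnitude for ethanol (P in bar, T in K) but are illustrative, not the literature parameters used in the commissioning work.

```python
def antoine_pressure(T_K, A, B, C):
    """Saturated vapour pressure from the Antoine equation:
    log10(P) = A - B / (C + T)."""
    return 10.0 ** (A - B / (C + T_K))

# Illustrative constants of roughly the right magnitude for ethanol
# (P in bar, T in K); real work should take fitted parameters, with their
# stated validity range, from a data compilation.
P = antoine_pressure(351.4, 5.247, 1598.7, -46.42)  # near the normal boiling point
assert 0.9 < P < 1.1   # about 1 atm, as expected at the boiling point
```

Extended forms add further terms to widen the valid temperature range; the fitted parameters are only meaningful inside the range quoted with them.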
Abstract:
Motivation: In any macromolecular polyprotic system - for example protein, DNA or RNA - the isoelectric point - commonly referred to as the pI - can be defined as the point of singularity in a titration curve, corresponding to the solution pH value at which the net overall surface charge - and thus the electrophoretic mobility - of the ampholyte sums to zero. Different modern analytical biochemistry and proteomics methods depend on the isoelectric point as a principal feature for protein and peptide characterization. Protein separation by isoelectric point is a critical part of 2-D gel electrophoresis, a key precursor of proteomics, where discrete spots can be digested in-gel and proteins subsequently identified by analytical mass spectrometry. Peptide fractionation according to pI is also widely used in current proteomics sample preparation procedures prior to LC-MS/MS analysis. Therefore, accurate theoretical prediction of pI would expedite such analyses. While such pI calculation is widely used, it remains largely untested, motivating our efforts to benchmark pI prediction methods. Results: Using data from the database PIP-DB and one publicly available dataset as our reference gold standard, we have undertaken the benchmarking of pI calculation methods. We find that methods vary in their accuracy and are highly sensitive to the choice of basis set. The machine-learning algorithms, especially the SVM-based algorithm, showed superior performance when studying peptide mixtures. In general, learning-based pI prediction methods (such as Cofactor, SVM and Branca) require a large training dataset, and their resulting performance strongly depends on the quality of that data. In contrast to iterative methods, machine-learning algorithms have the advantage of being able to add new features to improve the accuracy of prediction.
Contact: yperez@ebi.ac.uk Availability and Implementation: The software and data are freely available at https://github.com/ypriverol/pIR. Supplementary information: Supplementary data are available at Bioinformatics online.
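An iterative pI calculation of the kind benchmarked above can be sketched with a Henderson-Hasselbalch charge model and bisection on pH: net charge decreases monotonically with pH, so the zero crossing is the pI. The pKa values below are one common textbook set; as the abstract notes, results are sensitive to exactly this choice of "basis set".

```python
# One illustrative pKa basis set (textbook-style values; published methods differ).
PKA_POS = {"K": 10.5, "R": 12.5, "H": 6.0, "nterm": 9.0}
PKA_NEG = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "cterm": 2.0}

def net_charge(seq, pH):
    """Henderson-Hasselbalch net charge of a peptide at a given pH."""
    q = 1.0 / (1.0 + 10.0 ** (pH - PKA_POS["nterm"]))       # N-terminus
    q -= 1.0 / (1.0 + 10.0 ** (PKA_NEG["cterm"] - pH))      # C-terminus
    for aa in seq:
        if aa in PKA_POS:
            q += 1.0 / (1.0 + 10.0 ** (pH - PKA_POS[aa]))   # basic side chain
        elif aa in PKA_NEG:
            q -= 1.0 / (1.0 + 10.0 ** (PKA_NEG[aa] - pH))   # acidic side chain
    return q

def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Bisect on pH until the net charge crosses zero."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

pI = isoelectric_point("DEKR")
assert abs(net_charge("DEKR", pI)) < 1e-2   # essentially zero net charge at pI
```

Swapping in a different pKa table shifts the computed pI, which is exactly why benchmarking against experimental gold standards matters.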
Abstract:
This paper describes how modern machine learning techniques can be used in conjunction with statistical methods to forecast short term movements in exchange rates, producing models suitable for use in trading. It compares the results achieved by two different techniques, and shows how they can be used in a complementary fashion. The paper draws on experience of both inter- and intra-day forecasting taken from earlier studies conducted by Logica and Chemical Bank Quantitative Research and Trading (QRT) group's experience in developing trading models.
Abstract:
Vaccines are the greatest single instrument of prophylaxis against infectious diseases, with immeasurable benefits to human wellbeing. The accurate and reliable prediction of peptide-MHC binding is fundamental to the robust identification of T-cell epitopes and thus the successful design of peptide- and protein-based vaccines. The prediction of MHC class II peptide binding has hitherto proved recalcitrant and refractory. Here we illustrate the utility of existing computational tools for in silico prediction of peptides binding to class II MHCs. Most of the methods tested in the present study detect more than half of the true binders in the top 5% of all possible nonamers generated from one protein. This number increases in the top 10% and 15% and then does not change significantly. For the top 15%, the identified binders approach 86%. In terms of lab work this means 85% less expenditure on materials, labour and time. We show that while existing caveats are well founded, nonetheless the use of computational models of class II binding can still offer viable help to the work of the immunologist and vaccinologist.
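The screening protocol implied above can be sketched directly: enumerate every possible nonamer in a protein, score each with a binding predictor, and keep only the top few percent for laboratory testing. The `score_peptide` used here is a dummy stand-in for a real MHC class II binding predictor, and the protein sequence is arbitrary.

```python
def nonamers(protein):
    """All overlapping 9-mers in a protein sequence."""
    return [protein[i:i + 9] for i in range(len(protein) - 8)]

def top_fraction(protein, score_peptide, frac=0.05):
    """Rank all nonamers by predicted score and keep the top fraction."""
    peps = nonamers(protein)
    ranked = sorted(peps, key=score_peptide, reverse=True)
    keep = max(1, int(len(ranked) * frac))
    return ranked[:keep]

# Arbitrary example sequence and a toy scoring function (counts glutamines).
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQV"
hits = top_fraction(protein, score_peptide=lambda p: p.count("Q"))
assert all(len(p) == 9 for p in hits)
```

Keeping the top 5% of candidates rather than testing all of them is precisely where the 85% saving in materials, labour and time quoted above comes from.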
Abstract:
1. Fitting a linear regression to data provides much more information about the relationship between two variables than a simple correlation test. A goodness-of-fit test of the line should always be carried out. Hence, r squared estimates the strength of the relationship between Y and X, ANOVA tests whether a statistically significant line is present, and the 't' test whether the slope of the line is significantly different from zero.
2. Always check whether the data collected fit the assumptions for regression analysis and, if not, whether a transformation of the Y and/or X variables is necessary.
3. If the regression line is to be used for prediction, it is important to determine whether the prediction involves an individual y value or a mean. Care should be taken if predictions are made close to the extremities of the data; predictions are subject to considerable error if x falls beyond the range of the data. Multiple predictions require correction of the P values.
4. If several individual regression lines have been calculated from a number of similar sets of data, consider whether they should be combined to form a single regression line.
5. If the data exhibit a degree of curvature, then fitting a higher-order polynomial curve may provide a better fit than a straight line. In this case, a test of whether the data depart significantly from a linear regression should be carried out.
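Points 1 and 3 can be made concrete with a small least-squares fit that reports r squared and the t statistic for the slope; the data are made up for illustration.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y on x, returning slope, intercept,
    r squared, and the t statistic testing slope != 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_res / ss_tot
    se_slope = math.sqrt((ss_res / (n - 2)) / sxx)   # standard error of slope
    t = slope / se_slope
    return slope, intercept, r2, t

x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
slope, intercept, r2, t = fit_line(x, y)
assert r2 > 0.99     # strong linear relationship
assert t > 10        # slope clearly different from zero

# Extrapolating beyond the observed x-range (point 3) is unreliable:
y_at_20 = intercept + slope * 20   # treat with caution
```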
Abstract:
The sudden loss of plasma magnetic confinement, known as disruption, is one of the major issues in a nuclear fusion machine such as JET (Joint European Torus). Disruptions pose very serious problems for the safety of the machine. The energy stored in the plasma is released to the machine structure in a few milliseconds, resulting in forces that at JET reach several meganewtons. The problem is even more severe in a nuclear fusion power station, where the forces would be of the order of one hundred meganewtons. The events that occur during a disruption are still not well understood, even if some mechanisms that can lead to a disruption have been identified and can be used to predict them. Unfortunately, it is always a combination of these events that generates a disruption, and therefore it is not possible to use simple algorithms to predict it. This thesis analyses the possibility of using neural network algorithms to predict plasma disruptions in real time. This involves the determination of plasma parameters every few milliseconds. A plasma boundary reconstruction algorithm, XLOC, has been developed in collaboration with Dr. D. O'Brien and Dr. J. Ellis, capable of determining the plasma-wall distance every 2 milliseconds. The XLOC output has been used to develop a multilayer perceptron network to determine plasma parameters such as ℓi and qψ, with which a machine operational space has been experimentally defined. If the limits of this operational space are breached, the disruption probability increases considerably. Another approach for predicting disruptions is to use neural network classification methods to define the JET operational space. Two methods have been studied. The first method uses a multilayer perceptron network with a softmax activation function for the output layer. This method can be used for classifying the input patterns into various classes.
In this case the plasma input patterns have been divided between disrupting and safe patterns, giving the possibility of assigning a disruption probability to every plasma input pattern. The second method determines the novelty of an input pattern by calculating the probability density distribution of successful plasma patterns that have been run at JET. The density distribution is represented as a mixture distribution, and its parameters are determined using the Expectation-Maximisation method. If the dataset used to determine the distribution parameters covers the machine operational space sufficiently well, then the patterns flagged as novel can be regarded as patterns belonging to a disrupting plasma. Together with these methods, a network has been designed to predict the vertical forces that a disruption can cause, in order to avoid running plasma configurations that are too dangerous. This network can be run before the pulse, using the pre-programmed plasma configuration, or online, becoming a tool that allows dangerous plasma configurations to be stopped. All these methods have been implemented in real time on a dual Pentium Pro based machine. The Disruption Prediction and Prevention System has shown that internal plasma parameters can be determined on-line with good accuracy. The disruption detection algorithms also showed promising results, considering that JET is an experimental machine where new plasma configurations are constantly tested in an effort to improve its performance.
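The novelty-detection idea can be sketched in one dimension: fit a two-component Gaussian mixture to "safe" patterns by Expectation-Maximisation, then flag inputs with low density under the fitted model as novel. This is a toy 1-D stand-in for the multivariate mixture model described above; the data are synthetic.

```python
import math
import random

def em_fit(data, iters=50):
    """EM for a two-component 1-D Gaussian mixture."""
    mu = [min(data), max(data)]
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [w[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            s = p[0] + p[1]
            resp.append([p[0] / s, p[1] / s])
        # M-step: re-estimate weights, means, variances
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            var[k] = max(var[k], 1e-3)   # guard against variance collapse
    return w, mu, var

def density(x, w, mu, var):
    return sum(w[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
               / math.sqrt(2 * math.pi * var[k]) for k in range(2))

random.seed(0)
# Synthetic "safe" operating points clustered around two operating regimes.
safe = [random.gauss(0, 1) for _ in range(200)] + \
       [random.gauss(8, 1) for _ in range(200)]
params = em_fit(safe)

# A point between the two regimes has low density: it would be flagged novel.
assert density(0.0, *params) > density(4.0, *params)
```

In the real system the same logic runs in many dimensions and in real time, with "novel" patterns interpreted as belonging to a potentially disrupting plasma.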
Abstract:
This thesis introduces a flexible visual data exploration framework which combines advanced projection algorithms from the machine learning domain with visual representation techniques developed in the information visualisation domain, to help a user explore and understand large multi-dimensional datasets effectively. The advantage of such a framework over other techniques currently available to domain experts is that the user is directly involved in the data mining process, and advanced machine learning algorithms are employed for better projection. A hierarchical visualisation model guided by a domain expert allows them to obtain an informed segmentation of the input space. Two other components of this thesis exploit properties of these principled probabilistic projection algorithms: a guided mixture of local experts algorithm which provides robust prediction, and a model to estimate feature saliency simultaneously with the training of a projection algorithm. Local models are useful since a single global model cannot capture the full variability of a heterogeneous data space such as the chemical space. Probabilistic hierarchical visualisation techniques provide an effective soft segmentation of an input space by a visualisation hierarchy whose leaf nodes represent different regions of the input space. We use this soft segmentation to develop a guided mixture of local experts (GME) algorithm which is appropriate for the heterogeneous datasets found in chemoinformatics problems. Moreover, in this approach the domain experts are more involved in the model development process, which is suitable for an intuition- and domain-knowledge-driven task such as drug discovery. We also derive a generative topographic mapping (GTM) based data visualisation approach which estimates feature saliency simultaneously with the training of the visualisation model.
Abstract:
In vitro studies of drug absorption processes are undertaken to assess drug candidate or formulation suitability, to investigate mechanisms, and ultimately to develop predictive models. This study included each of these approaches, with the aim of developing novel in vitro methods for inclusion in a drug absorption model. Two model analgesic drugs, ibuprofen and paracetamol, were selected. The study focused on three main areas: the interaction of the model drugs with co-administered antacids, the elucidation of the mechanisms responsible for the increased absorption rate observed in a novel paracetamol formulation, and the development of novel ibuprofen tablet formulations containing alkalising excipients as dissolution promoters. Several novel dissolution methods were developed. A method to study the interaction of drug/excipient mixtures in the powder form was successfully used to select suitable dissolution-enhancing excipients. A method to study intrinsic dissolution rate using paddle apparatus was developed and used to study dissolution mechanisms. Methods to simulate stomach and intestine environments in terms of media composition, volume and drug/antacid doses were developed. Antacid addition greatly increased the dissolution of ibuprofen in the stomach model. Novel methods to measure drug permeability through rat stomach and intestine were developed, using sac methodology. The methods allowed direct comparison of the apparent permeability values obtained. Tissue stability, reproducibility and integrity were observed, with selectivity between paracellular and transcellular markers and between hydrophilic and lipophilic compounds within a homologous series of beta-blockers.
Abstract:
This thesis reports the development of a reliable method for the prediction of response to electromagnetically induced vibration in large electric machines. The machines of primary interest are DC ship-propulsion motors, but much of the work reported has broader significance. The investigation has involved work in five principal areas: (1) the development and use of dynamic substructuring methods; (2) the development of special elements to represent individual machine components; (3) laboratory-scale investigations to establish empirical values for properties which affect machine vibration levels; (4) experiments on machines on the factory test-bed to provide data for correlation with prediction; (5) reasoning with regard to the effect of various design features. The limiting factor in producing good vibration models for machines is the time required for an analysis to take place. Dynamic substructuring methods were adopted early in the project to maximise the efficiency of the analysis. A review of existing substructure-representation and composite-structure assembly methods includes comments on which are most suitable for this application. In three appendices to the main volume, methods are presented which were developed by the author to accelerate analyses. Despite significant advances in this area, the limiting factor in machine analyses is still time. The representation of individual machine components was addressed as another means by which the time required for an analysis could be reduced. This has resulted in the development of special elements which are more efficient than their finite-element counterparts. The laboratory-scale experiments reported were undertaken to establish empirical values for the properties of three distinct features - lamination stacks, bolted-flange joints in rings and cylinders, and the shimmed pole-yoke joint. These are central to the preparation of an accurate machine model.
The theoretical methods are tested numerically and correlated with tests on two machines (running and static). A system has been devised with which the general electromagnetic forcing may be split into its most fundamental components. This is used to draw some conclusions about the probable effects of various design features.