946 results for Machine à vecteurs de support
Abstract:
Background - The binding between peptide epitopes and major histocompatibility complex proteins (MHCs) is an important event in the cellular immune response. Accurate prediction of the binding between short peptides and the MHC molecules has long been a principal challenge for immunoinformatics. Recently, the modeling of MHC-peptide binding has come to emphasize quantitative predictions: instead of categorizing peptides as "binders" or "non-binders" or as "strong binders" and "weak binders", recent methods seek to make predictions about precise binding affinities. Results - We developed a quantitative support vector machine regression (SVR) approach, called SVRMHC, to model peptide-MHC binding affinities. As a non-linear method, SVRMHC was able to generate models that out-performed existing linear models, such as the "additive method". By adopting a new "11-factor encoding" scheme, SVRMHC takes into account similarities in the physicochemical properties of the amino acids constituting the input peptides. When applied to MHC-peptide binding data for three mouse class I MHC alleles, the SVRMHC models produced more accurate predictions than those produced previously. Furthermore, comparisons based on Receiver Operating Characteristic (ROC) analysis indicated that SVRMHC was able to out-perform several prominent methods in identifying strongly binding peptides. Conclusion - As a method with demonstrated performance in the quantitative modeling of MHC-peptide binding and in identifying strong binders, SVRMHC is a promising immunoinformatics tool with not inconsiderable future potential.
Abstract:
Formal grammars can be used to describe complex repeatable structures such as DNA sequences. In this paper, we describe the structural composition of DNA sequences using a context-free stochastic L-grammar. L-grammars are a special class of parallel grammars that can model the growth of living organisms, e.g. plant development, and the morphology of a variety of organisms. We believe that parallel grammars can also be used for modeling genetic mechanisms and sequences such as promoters. Promoters are short regulatory DNA sequences located upstream of a gene. Detection of promoters in DNA sequences is important for successful gene prediction. Promoters can be recognized by certain patterns that are conserved within a species, but there are many exceptions, which makes promoter recognition a complex problem. We replace the problem of promoter recognition with the induction of context-free stochastic L-grammar rules, which are later used for the structural analysis of promoter sequences. L-grammar rules are derived automatically from the Drosophila and vertebrate promoter datasets using a genetic programming technique, and their fitness is evaluated using a Support Vector Machine (SVM) classifier. The artificial promoter sequences generated using the derived L-grammar rules are analyzed and compared with natural promoter sequences.
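To make the idea of a context-free stochastic L-grammar concrete, the following is a minimal sketch in which every symbol of a DNA string is rewritten in parallel, independently of its neighbours, by a randomly chosen production. The rule table `RULES`, the probabilities, and the function names are hypothetical illustrations; they are not the rules derived in the paper, and the genetic programming search and SVM fitness evaluation are not shown.

```python
import random

# Hypothetical stochastic productions: symbol -> list of (probability, replacement).
# These values are illustrative only and are not taken from the paper.
RULES = {
    "A": [(0.6, "AT"), (0.4, "A")],
    "T": [(0.5, "TA"), (0.5, "T")],
    "C": [(0.7, "CG"), (0.3, "C")],
    "G": [(1.0, "G")],
}

def rewrite_symbol(symbol, rng):
    """Pick one replacement for a symbol according to its rule probabilities."""
    options = RULES.get(symbol, [(1.0, symbol)])
    r = rng.random()
    cumulative = 0.0
    for prob, replacement in options:
        cumulative += prob
        if r < cumulative:
            return replacement
    return options[-1][1]  # guard against floating-point round-off

def derive(axiom, steps, seed=0):
    """Rewrite every symbol in parallel (context-free) for the given number of steps."""
    rng = random.Random(seed)
    sequence = axiom
    for _ in range(steps):
        sequence = "".join(rewrite_symbol(s, rng) for s in sequence)
    return sequence

if __name__ == "__main__":
    # Generate an artificial sequence from a short, hypothetical axiom.
    print(derive("TATA", steps=3))
```

In a setting like the one described, sequences produced this way would be the "artificial promoter sequences" that are then compared with natural ones; the seed argument only makes the sketch reproducible.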
Abstract:
Data fluctuation in multiple measurements of Laser Induced Breakdown Spectroscopy (LIBS) greatly affects the accuracy of quantitative analysis. A new LIBS quantitative analysis method based on the Robust Least Squares Support Vector Machine (RLS-SVM) regression model is proposed. The usual way to enhance the analysis accuracy is to improve the quality and consistency of the emission signal, such as by averaging the spectral signals or spectrum standardization over a number of laser shots. The proposed method focuses more on how to enhance the robustness of the quantitative analysis regression model. The proposed RLS-SVM regression model originates from the Weighted Least Squares Support Vector Machine (WLS-SVM) but has an improved segmented weighting function and residual error calculation according to the statistical distribution of measured spectral data. Through the improved segmented weighting function, the information on the spectral data in the normal distribution will be retained in the regression model while the information on the outliers will be restrained or removed. Copper elemental concentration analysis experiments of 16 certified standard brass samples were carried out. The average value of relative standard deviation obtained from the RLS-SVM model was 3.06% and the root mean square error was 1.537%. The experimental results showed that the proposed method achieved better prediction accuracy and better modeling robustness compared with the quantitative analysis methods based on Partial Least Squares (PLS) regression, standard Support Vector Machine (SVM) and WLS-SVM. It was also demonstrated that the improved weighting function had better comprehensive performance in model robustness and convergence speed, compared with the four known weighting functions.
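As a point of reference for the segmented weighting idea, the sketch below implements the piecewise weighting commonly used with the weighted LS-SVM, in which residuals are scaled by a robust spread estimate and then mapped to weights. This is the baseline scheme, not the improved weighting function proposed in the paper, whose exact form the abstract does not give. The thresholds `c1 = 2.5` and `c2 = 3.0`, the floor `1e-4`, and the function names are assumptions chosen for illustration.

```python
import statistics

def robust_scale(residuals):
    """Robust spread estimate: 1.483 times the median absolute deviation."""
    med = statistics.median(residuals)
    mad = statistics.median([abs(e - med) for e in residuals])
    return 1.483 * mad

def weight_for(z, c1=2.5, c2=3.0, floor=1e-4):
    """Map a standardized absolute residual z to a weight in three segments."""
    if z <= c1:
        return 1.0                      # normal data: keep full weight
    if z <= c2:
        return (c2 - z) / (c2 - c1)     # transition zone: linearly reduced weight
    return floor                        # outliers: almost removed

def segmented_weights(residuals, c1=2.5, c2=3.0, floor=1e-4):
    """Compute one weight per residual from the regression's first pass."""
    s = robust_scale(residuals)
    if s == 0:
        return [1.0] * len(residuals)   # no spread: nothing to down-weight
    return [weight_for(abs(e) / s, c1, c2, floor) for e in residuals]

if __name__ == "__main__":
    # Hypothetical residuals; the last value mimics an outlying laser shot.
    print(segmented_weights([0.1, -0.2, 0.05, 0.15, -0.1, 5.0]))
```

The weights would then multiply the error terms when the regression is refitted, so that outlying measurements contribute little to the final model.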
Abstract:
Background: DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. Identification of DNA-binding proteins is one of the major challenges in the field of genome annotation. Several computational methods have been proposed in the literature for DNA-binding protein identification. However, most of them cannot provide a valuable knowledge base for our understanding of DNA-protein interactions. Results: We first presented a new protein sequence encoding method called PSSM Distance Transformation, and then constructed a DNA-binding protein identification method (SVM-PSSM-DT) by combining PSSM Distance Transformation with a support vector machine (SVM). First, the PSSM profiles are generated by using the PSI-BLAST program to search the non-redundant (NR) database. Next, the PSSM profiles are transformed into uniform numeric representations by a distance transformation scheme. Lastly, the resulting uniform numeric representations are fed into an SVM classifier, which predicts whether a sequence can bind to DNA. In a benchmark test on 525 DNA-binding and 550 non-DNA-binding proteins using jackknife validation, the present model achieved an ACC of 79.96%, an MCC of 0.622 and an AUC of 86.50%. This performance is considerably better than most of the existing state-of-the-art predictive methods. When tested on a recently constructed independent dataset, PDB186, SVM-PSSM-DT also achieved the best performance, with an ACC of 80.00%, an MCC of 0.647 and an AUC of 87.40%, and outperformed some existing state-of-the-art methods. Conclusions: The experimental results demonstrate that PSSM Distance Transformation is a viable protein sequence encoding method and SVM-PSSM-DT is a useful tool for identifying DNA-binding proteins.
A user-friendly web server for SVM-PSSM-DT was constructed, which is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/PSSM-DT/.
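For readers unfamiliar with the reported metrics, the sketch below shows how accuracy (ACC) and the Matthews correlation coefficient (MCC) are computed from a binary confusion matrix. The formulas are the standard definitions; the counts in the usage example are hypothetical and are not the paper's confusion matrix, which the abstract does not report.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC in [-1, 1]; returns 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

if __name__ == "__main__":
    # Hypothetical counts: 50 binders (40 found) and 50 non-binders (45 rejected).
    tp, fn, tn, fp = 40, 10, 45, 5
    print(accuracy(tp, tn, fp, fn), matthews_corrcoef(tp, tn, fp, fn))
```

MCC is often reported alongside ACC because it remains informative when the two classes differ in size, which is only mildly the case for the 525 versus 550 benchmark split.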
Abstract:
The increasing use of model-driven software development has renewed emphasis on using domain-specific models during application development. More specifically, there has been emphasis on using domain-specific modeling languages (DSMLs) to capture user-specified requirements when creating applications. The current approach to realizing these applications is to translate DSML models into source code using several model-to-model and model-to-code transformations. This approach is still dependent on the underlying source code representation and only raises the level of abstraction during development. Experience has shown that developers will often be required to manually modify the generated source code, which can be error-prone and time-consuming. An alternative to the aforementioned approach involves using an interpreted domain-specific modeling language (i-DSML) whose models can be directly executed using a Domain Specific Virtual Machine (DSVM). Direct execution of i-DSML models requires a semantically rich platform that reduces the gap between the application models and the underlying services required to realize the application. One layer in this platform is the domain-specific middleware that is responsible for the management and delivery of services in the specific domain. In this dissertation, we investigated the problem of designing the domain-specific middleware of the DSVM to facilitate the separation of the semantics of the domain and the model of execution (MoE) while supporting runtime adaptation and validation. We approached our investigation by seeking solutions to the following sub-problems: (1) How can the domain-specific knowledge (DSK) semantics be separated from the MoE for a given domain? (2) How do we define a generic model of execution (GMoE) of the middleware so that it is adaptable and realizes DSK operations to support delivery of services? (3) How do we validate the realization of DSK operations at runtime?
Our research into the domain-specific middleware was done using an i-DSML for the user-centric communication domain, Communication Modeling Language (CML), and for the microgrid energy management domain, Microgrid Modeling Language (MGridML). We have successfully developed a methodology to separate the DSK and GMoE of the middleware of a DSVM that supports specialization for a given domain and is able to perform adaptation and validation at runtime.
Abstract:
We experimentally demonstrate 7-dB reduction of nonlinearity penalty in 40-Gb/s CO-OFDM at 2000-km using support vector machine regression-based equalization. Simulation in WDM-CO-OFDM shows up to 12-dB enhancement in Q-factor compared to linear equalization.
Abstract:
Computational intelligent support for decision making is becoming increasingly popular and essential among medical professionals. Moreover, since modern medical devices are capable of communicating with ICT, the created models can easily be translated into practical software. Machine learning solutions for medicine range from the robust but opaque paradigms of support vector machines and neural networks to the similarly well-performing, yet more comprehensible, decision trees and rule-based models. So how can such different techniques be combined so that the professional obtains the whole spectrum of their particular advantages? The presented approaches have been conceived for various medical problems, while constantly bearing in mind the balance between good accuracy and understandable interpretation of the decision, in order to truly establish a trustworthy ‘artificial’ second opinion for the medical expert.
Abstract:
Software used by architectural and industrial designers has moved from being a tool for drafting towards use in verification, simulation, project management and remote project sharing. In more advanced models, parameters for the designed object can be adjusted so that a family of variations can be produced rapidly. With advances in computer-aided design technology, numerous design options can now be generated and analyzed in real time. However, the use of digital tools to support design as an activity is still at an early stage and has largely been limited in functionality with regard to the design process. To date, major CAD vendors have not developed an integrated tool that is able both to leverage specialized design knowledge from various discipline domains (known as expert knowledge systems) and to support the creation of design alternatives that satisfy different forms of constraints. We propose that evolutionary computing and machine learning be linked with parametric design techniques to record and respond to a designer’s own way of working and design history. It is expected that this will lead to results that impact future work on design support systems (ergonomics and interface) as well as implicit constraint and problem definition for problems that are difficult to quantify.
Abstract:
The use of appropriate features to characterize an output class or object is critical for all classification problems. This paper evaluates the capability of several spectral and texture features for object-based vegetation classification at the species level using airborne high-resolution multispectral imagery. Image objects, the basic classification unit, were generated through image segmentation. Statistical moments extracted from the original spectral bands and a vegetation index image are used as feature descriptors for image objects (i.e. tree crowns). Several state-of-the-art texture descriptors, such as the Gray-Level Co-Occurrence Matrix (GLCM), Local Binary Patterns (LBP) and its extensions, are also extracted for comparison. A Support Vector Machine (SVM) is employed for classification in the object-feature space. The experimental results showed that incorporating spectral vegetation indices can improve classification accuracy over using the original spectral bands alone, and that moments of the Ratio Vegetation Index obtained the highest average classification accuracy in our experiment. The experiments also indicate that the spectral moment features outperform, or are at least comparable with, the state-of-the-art texture descriptors in terms of classification accuracy.
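The moment-based descriptor can be illustrated with a short sketch that computes the Ratio Vegetation Index (near-infrared reflectance divided by red reflectance) for the pixels of one image object and summarizes it with its first three statistical moments. Which moments the paper actually uses is not stated in the abstract, so the choice of mean, standard deviation, and skewness, along with the function names and sample pixel values, is an assumption.

```python
import math

def ratio_vegetation_index(nir, red):
    """Per-pixel RVI = NIR / Red; pixels with non-positive red reflectance are skipped."""
    return [n / r for n, r in zip(nir, red) if r > 0]

def moments(values):
    """Return (mean, standard deviation, skewness) of a list of pixel values."""
    n = len(values)
    if n == 0:
        raise ValueError("at least one pixel value is required")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    if std == 0:
        return mean, 0.0, 0.0           # constant object: no spread, no asymmetry
    skewness = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    return mean, std, skewness

if __name__ == "__main__":
    # Hypothetical reflectances for the pixels of a single tree-crown object.
    nir_pixels = [0.42, 0.48, 0.51, 0.39]
    red_pixels = [0.08, 0.07, 0.09, 0.10]
    print(moments(ratio_vegetation_index(nir_pixels, red_pixels)))
```

The resulting tuple would form part of the feature vector for one image object, which is then passed to the classifier.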
Abstract:
A significant proportion of the cost of software development is due to software testing and maintenance. This is in part the result of the inevitable imperfections due to human error, lack of quality during the design and coding of software, and the increasing need to reduce faults to improve customer satisfaction in a competitive marketplace. Given the cost and importance of removing errors, improvements in fault detection and removal can be of significant benefit. The earlier in the development process faults can be found, the less it costs to correct them and the less likely other faults are to develop. This research aims to make the testing process more efficient and effective by identifying those software modules most likely to contain faults, allowing testing efforts to be carefully targeted. This is done with the use of machine learning algorithms, which use examples of fault-prone and non-fault-prone modules to develop predictive models of quality. In order to learn the numerical mapping between module and classification, a module is represented in terms of software metrics. A difficulty in this sort of problem is sourcing software engineering data of adequate quality. In this work, data is obtained from two sources: the NASA Metrics Data Program and the open source Eclipse project. Feature selection is applied before learning, and a number of different feature selection methods are compared to find which work best. Two machine learning algorithms are applied to the data, Naive Bayes and the Support Vector Machine, and predictive results are compared to those of previous efforts and found to be superior on selected data sets and comparable on others. In addition, a new classification method is proposed, Rank Sum, in which a ranking abstraction is laid over bin densities for each class, and a classification is determined based on the sum of ranks over features.
A novel extension of this method is also described based on an observed polarising of points by class when rank sum is applied to training data to convert it into 2D rank sum space. SVM is applied to this transformed data to produce models the parameters of which can be set according to trade-off curves to obtain a particular performance trade-off.
Abstract:
The ability to accurately predict the remaining useful life of machine components is critical for continuous machine operation and can also improve productivity and enhance system safety. In condition-based maintenance (CBM), maintenance is performed based on information collected through condition monitoring and assessment of the machine health. Effective diagnostics and prognostics are important aspects of CBM, allowing maintenance engineers to schedule a repair and to acquire replacement components before the components actually fail. Although a variety of prognostic methodologies have been reported recently, their application in industry is still relatively new and mostly focused on the prediction of specific component degradations. Furthermore, they require a significant and sufficient number of fault indicators to accurately prognose the component faults. Hence, sufficient usage of health indicators in prognostics for the effective interpretation of the machine degradation process is still required. Major challenges for accurate long-term prediction of remaining useful life (RUL) still remain to be addressed. Therefore, continuous development and improvement of a machine health management system and accurate long-term prediction of machine remnant life are required in real industrial applications. This thesis presents an integrated diagnostics and prognostics framework based on health state probability estimation for accurate and long-term prediction of machine remnant life. In the proposed model, prior empirical (historical) knowledge is embedded in the integrated diagnostics and prognostics system for classification of impending faults in the machine system and accurate probability estimation of discrete degradation stages (health states). The methodology assumes that machine degradation consists of a series of degraded states (health states) which effectively represent the dynamic and stochastic process of machine failure.
The estimation of discrete health state probability for the prediction of machine remnant life is performed using classification algorithms. To select an appropriate classifier for health state probability estimation in the proposed model, comparative intelligent diagnostic tests were conducted using five different classifiers applied to the progressive fault data of three different faults in a high pressure liquefied natural gas (HP-LNG) pump. As a result of this comparison study, SVMs were employed in health state probability estimation for the prediction of machine failure in this research. The proposed prognostic methodology has been successfully tested and validated using a number of case studies, from simulation tests to real industry applications. The results from two actual failure case studies using simulations and experiments indicate that accurate estimation of health states is achievable and that the proposed method provides accurate long-term prediction of machine remnant life. In addition, the results of experimental tests show that the proposed model is capable of providing early warning of abnormal machine operating conditions by identifying the transitional states of machine fault conditions. Finally, the proposed prognostic model is validated through two industrial case studies. The optimal number of health states, which can minimise the model training error without a significant decrease in prediction accuracy, was also examined through several health states of bearing failure. The results were very encouraging and show that the proposed prognostic model based on health state probability estimation has the potential to be used as a generic and scalable asset health estimation tool in industrial machinery.
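One plausible way to turn health state probabilities into a remnant-life figure is a probability-weighted average of the historical remaining life associated with each state. The abstract does not give the thesis's exact formula, so the sketch below is an assumption: the function name, the per-state remaining-life values, and the example probabilities are all hypothetical.

```python
def expected_remaining_life(state_probabilities, state_remaining_life):
    """Probability-weighted remaining life across discrete health states.

    state_probabilities: classifier output, one probability per health state.
    state_remaining_life: historical mean remaining life for each state
                          (same order and same time unit).
    """
    if len(state_probabilities) != len(state_remaining_life):
        raise ValueError("one remaining-life value is needed per health state")
    total = sum(state_probabilities)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    weighted = sum(p * t for p, t in zip(state_probabilities, state_remaining_life))
    return weighted / total             # normalize in case inputs do not sum to 1

if __name__ == "__main__":
    # Hypothetical example: three health states (early, mid, late degradation).
    probabilities = [0.1, 0.7, 0.2]
    remaining_hours = [100.0, 50.0, 10.0]
    print(expected_remaining_life(probabilities, remaining_hours))
```

Under this assumption, the estimate shifts smoothly towards shorter remaining life as probability mass moves to later degradation states, which is also how transitional states could produce an early warning.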
Abstract:
The primary genetic risk factor in multiple sclerosis (MS) is the HLA-DRB1*1501 allele; however, much of the remaining genetic contribution to MS has yet to be elucidated. Several lines of evidence support a role for neuroendocrine system involvement in autoimmunity which may, in part, be genetically determined. Here, we comprehensively investigated variation within eight candidate hypothalamic-pituitary-adrenal (HPA) axis genes and susceptibility to MS. A total of 326 SNPs were investigated in a discovery dataset of 1343 MS cases and 1379 healthy controls of European ancestry using a multi-analytical strategy. Random Forests, a supervised machine-learning algorithm, identified eight intronic SNPs within the corticotrophin-releasing hormone receptor 1 or CRHR1 locus on 17q21.31 as important predictors of MS. On the basis of univariate analyses, six CRHR1 variants were associated with decreased risk for disease following a conservative correction for multiple tests. Independent replication was observed for CRHR1 in a large meta-analysis comprising 2624 MS cases and 7220 healthy controls of European ancestry. Results from a combined meta-analysis of all 3967 MS cases and 8599 controls provide strong evidence for the involvement of CRHR1 in MS. The strongest association was observed for rs242936 (OR = 0.82, 95% CI = 0.74-0.90, P = 9.7 × 10^-5). Replicated CRHR1 variants appear to exist on a single associated haplotype. Further investigation of mechanisms involved in HPA axis regulation and response to stress in MS pathogenesis is warranted. © The Author 2010. Published by Oxford University Press. All rights reserved.
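The reported odds ratio, confidence interval, and P value for rs242936 can be related to one another with a simple Wald-type calculation on the log-odds scale. The sketch below is a consistency check under the assumption of a normal approximation; it is not the test used in the meta-analysis, and because the published interval is rounded to two decimals the recovered P value should only be expected to match the reported one in order of magnitude.

```python
import math

def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the standard error, z statistic and two-sided P from an OR and its 95% CI."""
    # On the log scale the 95% CI spans 2 * z_crit standard errors.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

if __name__ == "__main__":
    se, z, p = wald_from_ci(0.82, 0.74, 0.90)
    print(f"SE(log OR) = {se:.4f}, z = {z:.2f}, two-sided P = {p:.1e}")
```

An odds ratio below 1 with an interval that excludes 1 corresponds to the "decreased risk for disease" wording in the abstract.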