878 resultados para Support vector regression
Resumo:
Nowadays, classifying proteins in structural classes, which concerns the inference of patterns in their 3D conformation, is one of the most important open problems in Molecular Biology. The main reason for this is that the function of a protein is intrinsically related to its spatial conformation. However, such conformations are very difficult to be obtained experimentally in laboratory. Thus, this problem has drawn the attention of many researchers in Bioinformatics. Considering the great difference between the number of protein sequences already known and the number of three-dimensional structures determined experimentally, the demand of automated techniques for structural classification of proteins is very high. In this context, computational tools, especially Machine Learning (ML) techniques, have become essential to deal with this problem. In this work, ML techniques are used in the recognition of protein structural classes: Decision Trees, k-Nearest Neighbor, Naive Bayes, Support Vector Machine and Neural Networks. These methods have been chosen because they represent different paradigms of learning and have been widely used in the Bioinfornmatics literature. Aiming to obtain an improvment in the performance of these techniques (individual classifiers), homogeneous (Bagging and Boosting) and heterogeneous (Voting, Stacking and StackingC) multiclassification systems are used. Moreover, since the protein database used in this work presents the problem of imbalanced classes, artificial techniques for class balance (Undersampling Random, Tomek Links, CNN, NCL and OSS) are used to minimize such a problem. In order to evaluate the ML methods, a cross-validation procedure is applied, where the accuracy of the classifiers is measured using the mean of classification error rate, on independent test sets. These means are compared, two by two, by the hypothesis test aiming to evaluate if there is, statistically, a significant difference between them. With respect to the results obtained with the individual classifiers, Support Vector Machine presented the best accuracy. In terms of the multi-classification systems (homogeneous and heterogeneous), they showed, in general, a superior or similar performance when compared to the one achieved by the individual classifiers used - especially Boosting with Decision Tree and the StackingC with Linear Regression as meta classifier. The Voting method, despite of its simplicity, has shown to be adequate for solving the problem presented in this work. The techniques for class balance, on the other hand, have not produced a significant improvement in the global classification error. Nevertheless, the use of such techniques did improve the classification error for the minority class. In this context, the NCL technique has shown to be more appropriated
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
Several recent studies in literature have identified brain morphological alterations associated to Borderline Personality Disorder (BPD) patients. These findings are reported by studies based on voxel-based-morphometry analysis of structural MRI data, comparing mean gray-matter concentration between groups of BPD patients and healthy controls. On the other hand, mean differences between groups are not informative about the discriminative value of neuroimaging data to predict the group of individual subjects. In this paper, we go beyond mean differences analyses, and explore to what extent individual BPD patients can be differentiated from controls (25 subjects in each group), using a combination of automated-morphometric tools for regional cortical thickness/volumetric estimation and Support Vector Machine classifier. The approach included a feature selection step in order to identify the regions containing most discriminative information. The accuracy of this classifier was evaluated using the leave-one-subject-out procedure. The brain regions indicated as containing relevant information to discriminate groups were the orbitofrontal, rostral anterior cingulate, posterior cingulate, middle temporal cortices, among others. These areas, which are distinctively involved in emotional and affect regulation of BPD patients, were the most informative regions to achieve both sensitivity and specificity values of 80% in SVM classification. The findings suggest that this new methodology can add clinical and potential diagnostic value to neuroimaging of psychiatric disorders. (C) 2012 Elsevier Ltd. All rights reserved.
Resumo:
The early detection of subjects with probable Alzheimer's disease (AD) is crucial for effective appliance of treatment strategies. Here we explored the ability of a multitude of linear and non-linear classification algorithms to discriminate between the electroencephalograms (EEGs) of patients with varying degree of AD and their age-matched control subjects. Absolute and relative spectral power, distribution of spectral power, and measures of spatial synchronization were calculated from recordings of resting eyes-closed continuous EEGs of 45 healthy controls, 116 patients with mild AD and 81 patients with moderate AD, recruited in two different centers (Stockholm, New York). The applied classification algorithms were: principal component linear discriminant analysis (PC LDA), partial least squares LDA (PLS LDA), principal component logistic regression (PC LR), partial least squares logistic regression (PLS LR), bagging, random forest, support vector machines (SVM) and feed-forward neural network. Based on 10-fold cross-validation runs it could be demonstrated that even tough modern computer-intensive classification algorithms such as random forests, SVM and neural networks show a slight superiority, more classical classification algorithms performed nearly equally well. Using random forests classification a considerable sensitivity of up to 85% and a specificity of 78%, respectively for the test of even only mild AD patients has been reached, whereas for the comparison of moderate AD vs. controls, using SVM and neural networks, values of 89% and 88% for sensitivity and specificity were achieved. Such a remarkable performance proves the value of these classification algorithms for clinical diagnostics.
Resumo:
Solar radiation estimates with clear sky models require estimations of aerosol data. The low spatial resolution of current aerosol datasets, with their remarkable drift from measured data, poses a problem in solar resource estimation. This paper proposes a new downscaling methodology by combining support vector machines for regression (SVR) and kriging with external drift, with data from the MACC reanalysis datasets and temperature and rainfall measurements from 213 meteorological stations in continental Spain. The SVR technique was proven efficient in aerosol variable modeling. The Linke turbidity factor (TL) and the aerosol optical depth at 550 nm (AOD 550) estimated with SVR generated significantly lower errors in AERONET positions than MACC reanalysis estimates. The TL was estimated with relative mean absolute error (rMAE) of 10.2% (compared with AERONET), against the MACC rMAE of 18.5%. A similar behavior was seen with AOD 550, estimated with rMAE of 8.6% (compared with AERONET), against the MACC rMAE of 65.6%. Kriging using MACC data as an external drift was found useful in generating high resolution maps (0.05° × 0.05°) of both aerosol variables. We created high resolution maps of aerosol variables in continental Spain for the year 2008. The proposed methodology was proven to be a valuable tool to create high resolution maps of aerosol variables (TL and AOD 550). This methodology shows meaningful improvements when compared with estimated available databases and therefore, leads to more accurate solar resource estimations. This methodology could also be applied to the prediction of other atmospheric variables, whose datasets are of low resolution.
Resumo:
Deep brain stimulation (DBS) provides significant therapeutic benefit for movement disorders such as Parkinson’s disease (PD). Current DBS devices lack real-time feedback (thus are open loop) and stimulation parameters are adjusted during scheduled visits with a clinician. A closed-loop DBS system may reduce power consumption and side effects by adjusting stimulation parameters based on patient’s behavior. Thus behavior detection is a major step in designing such systems. Various physiological signals can be used to recognize the behaviors. Subthalamic Nucleus (STN) Local field Potential (LFP) is a great candidate signal for the neural feedback, because it can be recorded from the stimulation lead and does not require additional sensors. This thesis proposes novel detection and classification techniques for behavior recognition based on deep brain LFP. Behavior detection from such signals is the vital step in developing the next generation of closed-loop DBS devices. LFP recordings from 13 subjects are utilized in this study to design and evaluate our method. Recordings were performed during the surgery and the subjects were asked to perform various behavioral tasks. Various techniques are used understand how the behaviors modulate the STN. One method studies the time-frequency patterns in the STN LFP during the tasks. Another method measures the temporal inter-hemispheric connectivity of the STN as well as the connectivity between STN and Pre-frontal Cortex (PFC). Experimental results demonstrate that different behaviors create different m odulation patterns in STN and it’s connectivity. We use these patterns as features to classify behaviors. A method for single trial recognition of the patient’s current task is proposed. This method uses wavelet coefficients as features and support vector machine (SVM) as the classifier for recognition of a selection of behaviors: speech, motor, and random. The proposed method is 82.4% accurate for the binary classification and 73.2% for classifying three tasks. As the next step, a practical behavior detection method which asynchronously detects behaviors is proposed. This method does not use any priori knowledge of behavior onsets and is capable of asynchronously detect the finger movements of PD patients. Our study indicates that there is a motor-modulated inter-hemispheric connectivity between LFP signals recorded bilaterally from STN. We utilize a non-linear regression method to measure this inter-hemispheric connectivity and to detect the finger movements. Our experimental results using STN LFP recorded from eight patients with PD demonstrate this is a promising approach for behavior detection and developing novel closed-loop DBS systems.
Resumo:
Support vector machines (SVMs) have recently emerged as a powerful technique for solving problems in pattern classification and regression. Best performance is obtained from the SVM its parameters have their values optimally set. In practice, good parameter settings are usually obtained by a lengthy process of trial and error. This paper describes the use of genetic algorithm to evolve these parameter settings for an application in mobile robotics.
Resumo:
A novel versatile digital signal processing (DSP)-based equalizer using support vector machine regression (SVR) is proposed for 16-quadrature amplitude modulated (16-QAM) coherent optical orthogonal frequency-division multiplexing (CO-OFDM) and experimentally compared to traditional DSP-based deterministic fiber-induced nonlinearity equalizers (NLEs), namely the full-field digital back-propagation (DBP) and the inverse Volterra series transfer function-based NLE (V-NLE). For a 40 Gb/s 16-QAM CO-OFDM at 2000 km, SVR-NLE extends the optimum launched optical power (LOP) by 4 dB compared to V-NLE by means of reduction of fiber nonlinearity. In comparison to full-field DBP at a LOP of 6 dBm, SVR-NLE outperforms by ∼1 dB in Q-factor. In addition, SVR-NLE is the most computational efficient DSP-NLE.
Resumo:
In this study we have identified key genes that are critical in development of astrocytic tumors. Meta-analysis of microarray studies which compared normal tissue to astrocytoma revealed a set of 646 differentially expressed genes in the majority of astrocytoma. Reverse engineering of these 646 genes using Bayesian network analysis produced a gene network for each grade of astrocytoma (Grade I–IV), and ‘key genes’ within each grade were identified. Genes found to be most influential to development of the highest grade of astrocytoma, Glioblastoma multiforme were: COL4A1, EGFR, BTF3, MPP2, RAB31, CDK4, CD99, ANXA2, TOP2A, and SERBP1. All of these genes were up-regulated, except MPP2 (down regulated). These 10 genes were able to predict tumor status with 96–100% confidence when using logistic regression, cross validation, and the support vector machine analysis. Markov genes interact with NFkβ, ERK, MAPK, VEGF, growth hormone and collagen to produce a network whose top biological functions are cancer, neurological disease, and cellular movement. Three of the 10 genes - EGFR, COL4A1, and CDK4, in particular, seemed to be potential ‘hubs of activity’. Modified expression of these 10 Markov Blanket genes increases lifetime risk of developing glioblastoma compared to the normal population. The glioblastoma risk estimates were dramatically increased with joint effects of 4 or more than 4 Markov Blanket genes. Joint interaction effects of 4, 5, 6, 7, 8, 9 or 10 Markov Blanket genes produced 9, 13, 20.9, 26.7, 52.8, 53.2, 78.1 or 85.9%, respectively, increase in lifetime risk of developing glioblastoma compared to normal population. In summary, it appears that modified expression of several ‘key genes’ may be required for the development of glioblastoma. Further studies are needed to validate these ‘key genes’ as useful tools for early detection and novel therapeutic options for these tumors.
Resumo:
Purpose: To build a model that will predict the survival time for patients that were treated with stereotactic radiosurgery for brain metastases using support vector machine (SVM) regression.
Methods and Materials: This study utilized data from 481 patients, which were equally divided into training and validation datasets randomly. The SVM model used a Gaussian RBF function, along with various parameters, such as the size of the epsilon insensitive region and the cost parameter (C) that are used to control the amount of error tolerated by the model. The predictor variables for the SVM model consisted of the actual survival time of the patient, the number of brain metastases, the graded prognostic assessment (GPA) and Karnofsky Performance Scale (KPS) scores, prescription dose, and the largest planning target volume (PTV). The response of the model is the survival time of the patient. The resulting survival time predictions were analyzed against the actual survival times by single parameter classification and two-parameter classification. The predicted mean survival times within each classification were compared with the actual values to obtain the confidence interval associated with the model’s predictions. In addition to visualizing the data on plots using the means and error bars, the correlation coefficients between the actual and predicted means of the survival times were calculated during each step of the classification.
Results: The number of metastases and KPS scores, were consistently shown to be the strongest predictors in the single parameter classification, and were subsequently used as first classifiers in the two-parameter classification. When the survival times were analyzed with the number of metastases as the first classifier, the best correlation was obtained for patients with 3 metastases, while patients with 4 or 5 metastases had significantly worse results. When the KPS score was used as the first classifier, patients with a KPS score of 60 and 90/100 had similar strong correlation results. These mixed results are likely due to the limited data available for patients with more than 3 metastases or KPS scores of 60 or less.
Conclusions: The number of metastases and the KPS score both showed to be strong predictors of patient survival time. The model was less accurate for patients with more metastases and certain KPS scores due to the lack of training data.
Resumo:
Motivated by environmental protection concerns, monitoring the flue gas of thermal power plant is now often mandatory due to the need to ensure that emission levels stay within safe limits. Optical based gas sensing systems are increasingly employed for this purpose, with regression techniques used to relate gas optical absorption spectra to the concentrations of specific gas components of interest (NOx, SO2 etc.). Accurately predicting gas concentrations from absorption spectra remains a challenging problem due to the presence of nonlinearities in the relationships and the high-dimensional and correlated nature of the spectral data. This article proposes a generalized fuzzy linguistic model (GFLM) to address this challenge. The GFLM is made up of a series of “If-Then” fuzzy rules. The absorption spectra are input variables in the rule antecedent. The rule consequent is a general nonlinear polynomial function of the absorption spectra. Model parameters are estimated using least squares and gradient descent optimization algorithms. The performance of GFLM is compared with other traditional prediction models, such as partial least squares, support vector machines, multilayer perceptron neural networks and radial basis function networks, for two real flue gas spectral datasets: one from a coal-fired power plant and one from a gas-fired power plant. The experimental results show that the generalized fuzzy linguistic model has good predictive ability, and is competitive with alternative approaches, while having the added advantage of providing an interpretable model.
Resumo:
Motivated by environmental protection concerns, monitoring the flue gas of thermal power plant is now often mandatory due to the need to ensure that emission levels stay within safe limits. Optical based gas sensing systems are increasingly employed for this purpose, with regression techniques used to relate gas optical absorption spectra to the concentrations of specific gas components of interest (NOx, SO2 etc.). Accurately predicting gas concentrations from absorption spectra remains a challenging problem due to the presence of nonlinearities in the relationships and the high-dimensional and correlated nature of the spectral data. This article proposes a generalized fuzzy linguistic model (GFLM) to address this challenge. The GFLM is made up of a series of “If-Then” fuzzy rules. The absorption spectra are input variables in the rule antecedent. The rule consequent is a general nonlinear polynomial function of the absorption spectra. Model parameters are estimated using least squares and gradient descent optimization algorithms. The performance of GFLM is compared with other traditional prediction models, such as partial least squares, support vector machines, multilayer perceptron neural networks and radial basis function networks, for two real flue gas spectral datasets: one from a coal-fired power plant and one from a gas-fired power plant. The experimental results show that the generalized fuzzy linguistic model has good predictive ability, and is competitive with alternative approaches, while having the added advantage of providing an interpretable model.
Resumo:
Computational intelligent support for decision making is becoming increasingly popular and essential among medical professionals. Also, with the modern medical devices being capable to communicate with ICT, created models can easily find practical translation into software. Machine learning solutions for medicine range from the robust but opaque paradigms of support vector machines and neural networks to the also performant, yet more comprehensible, decision trees and rule-based models. So how can such different techniques be combined such that the professional obtains the whole spectrum of their particular advantages? The presented approaches have been conceived for various medical problems, while permanently bearing in mind the balance between good accuracy and understandable interpretation of the decision in order to truly establish a trustworthy ‘artificial’ second opinion for the medical expert.
Resumo:
Las organizaciones y sus entornos son sistemas complejos. Tales sistemas son difíciles de comprender y predecir. Pese a ello, la predicción es una tarea fundamental para la gestión empresarial y para la toma de decisiones que implica siempre un riesgo. Los métodos clásicos de predicción (entre los cuales están: la regresión lineal, la Autoregresive Moving Average y el exponential smoothing) establecen supuestos como la linealidad, la estabilidad para ser matemática y computacionalmente tratables. Por diferentes medios, sin embargo, se han demostrado las limitaciones de tales métodos. Pues bien, en las últimas décadas nuevos métodos de predicción han surgido con el fin de abarcar la complejidad de los sistemas organizacionales y sus entornos, antes que evitarla. Entre ellos, los más promisorios son los métodos de predicción bio-inspirados (ej. redes neuronales, algoritmos genéticos /evolutivos y sistemas inmunes artificiales). Este artículo pretende establecer un estado situacional de las aplicaciones actuales y potenciales de los métodos bio-inspirados de predicción en la administración.
Resumo:
PURPOSE: To evaluate the sensitivity and specificity of machine learning classifiers (MLCs) for glaucoma diagnosis using Spectral Domain OCT (SD-OCT) and standard automated perimetry (SAP). METHODS: Observational cross-sectional study. Sixty two glaucoma patients and 48 healthy individuals were included. All patients underwent a complete ophthalmologic examination, achromatic standard automated perimetry (SAP) and retinal nerve fiber layer (RNFL) imaging with SD-OCT (Cirrus HD-OCT; Carl Zeiss Meditec Inc., Dublin, California). Receiver operating characteristic (ROC) curves were obtained for all SD-OCT parameters and global indices of SAP. Subsequently, the following MLCs were tested using parameters from the SD-OCT and SAP: Bagging (BAG), Naive-Bayes (NB), Multilayer Perceptron (MLP), Radial Basis Function (RBF), Random Forest (RAN), Ensemble Selection (ENS), Classification Tree (CTREE), Ada Boost M1(ADA),Support Vector Machine Linear (SVML) and Support Vector Machine Gaussian (SVMG). Areas under the receiver operating characteristic curves (aROC) obtained for isolated SAP and OCT parameters were compared with MLCs using OCT+SAP data. RESULTS: Combining OCT and SAP data, MLCs' aROCs varied from 0.777(CTREE) to 0.946 (RAN).The best OCT+SAP aROC obtained with RAN (0.946) was significantly larger the best single OCT parameter (p<0.05), but was not significantly different from the aROC obtained with the best single SAP parameter (p=0.19). CONCLUSION: Machine learning classifiers trained on OCT and SAP data can successfully discriminate between healthy and glaucomatous eyes. The combination of OCT and SAP measurements improved the diagnostic accuracy compared with OCT data alone.