918 resultados para support vector machines


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background The residue-wise contact order (RWCO) describes the sequence separations between the residues of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered as a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable important information to reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. Results We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. Conclusion The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Due to the health impacts caused by exposures to air pollutants in urban areas, monitoring and forecasting of air quality parameters have become popular as an important topic in atmospheric and environmental research today. The knowledge on the dynamics and complexity of air pollutants behavior has made artificial intelligence models as a useful tool for a more accurate pollutant concentration prediction. This paper focuses on an innovative method of daily air pollution prediction using combination of Support Vector Machine (SVM) as predictor and Partial Least Square (PLS) as a data selection tool based on the measured values of CO concentrations. The CO concentrations of Rey monitoring station in the south of Tehran, from Jan. 2007 to Feb. 2011, have been used to test the effectiveness of this method. The hourly CO concentrations have been predicted using the SVM and the hybrid PLS–SVM models. Similarly, daily CO concentrations have been predicted based on the aforementioned four years measured data. Results demonstrated that both models have good prediction ability; however the hybrid PLS–SVM has better accuracy. In the analysis presented in this paper, statistic estimators including relative mean errors, root mean squared errors and the mean absolute relative error have been employed to compare performances of the models. It has been concluded that the errors decrease after size reduction and coefficients of determination increase from 56 to 81% for SVM model to 65–85% for hybrid PLS–SVM model respectively. Also it was found that the hybrid PLS–SVM model required lower computational time than SVM model as expected, hence supporting the more accurate and faster prediction ability of hybrid PLS–SVM model.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Heart rate variability (HRV) refers to the regulation of the sinoatrial node, the natural pacemaker of the heart by the sympathetic and parasympathetic branches of the autonomic nervous system. HRV analysis is an important tool to observe the heart’s ability to respond to normal regulatory impulses that affect its rhythm. Like many bio-signals, HRV signals are non-linear in nature. Higher order spectral analysis (HOS) is known to be a good tool for the analysis of non-linear systems and provides good noise immunity. A computer-based arrhythmia detection system of cardiac states is very useful in diagnostics and disease management. In this work, we studied the identification of the HRV signals using features derived from HOS. These features were fed to the support vector machine (SVM) for classification. Our proposed system can classify the normal and other four classes of arrhythmia with an average accuracy of more than 85%.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, downscaling models are developed using a support vector machine (SVM) for obtaining projections of monthly mean maximum and minimum temperatures (T-max and T-min) to river-basin scale. The effectiveness of the model is demonstrated through application to downscale the predictands for the catchment of the Malaprabha reservoir in India, which is considered to be a climatically sensitive region. The probable predictor variables are extracted from (1) the National Centers for Environmental Prediction (NCEP) reanalysis dataset for the period 1978-2000, and (2) the simulations from the third-generation Canadian Coupled Global Climate Model (CGCM3) for emission scenarios A1B, A2, B1 and COMMIT for the period 1978-2100. The predictor variables are classified into three groups, namely A, B and C. Large-scale atmospheric variables Such as air temperature, zonal and meridional wind velocities at 925 nib which are often used for downscaling temperature are considered as predictors in Group A. Surface flux variables such as latent heat (LH), sensible heat, shortwave radiation and longwave radiation fluxes, which control temperature of the Earth's surface are tried as plausible predictors in Group B. Group C comprises of all the predictor variables in both the Groups A and B. The scatter plots and cross-correlations are used for verifying the reliability of the simulation of the predictor variables by the CGCM3 and to Study the predictor-predictand relationships. The impact of trend in predictor variables on downscaled temperature was studied. The predictor, air temperature at 925 mb showed an increasing trend, while the rest of the predictors showed no trend. The performance of the SVM models that are developed, one for each combination of predictor group, predictand, calibration period and location-based stratification (land, land and ocean) of climate variables, was evaluated. In general, the models which use predictor variables pertaining to land surface improved the performance of SVM models for downscaling T-max and T-min

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Identifying unusual or anomalous patterns in an underlying dataset is an important but challenging task in many applications. The focus of the unsupervised anomaly detection literature has mostly been on vectorised data. However, many applications are more naturally described using higher-order tensor representations. Approaches that vectorise tensorial data can destroy the structural information encoded in the high-dimensional space, and lead to the problem of the curse of dimensionality. In this paper we present the first unsupervised tensorial anomaly detection method, along with a randomised version of our method. Our anomaly detection method, the One-class Support Tensor Machine (1STM), is a generalisation of conventional one-class Support Vector Machines to higher-order spaces. 1STM preserves the multiway structure of tensor data, while achieving significant improvement in accuracy and efficiency over conventional vectorised methods. We then leverage the theory of nonlinear random projections to propose the Randomised 1STM (R1STM). Our empirical analysis on several real and synthetic datasets shows that our R1STM algorithm delivers comparable or better accuracy to a state-of-the-art deep learning method and traditional kernelised approaches for anomaly detection, while being approximately 100 times faster in training and testing.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Screening and early identification of primary immunodeficiency disease (PID) genes is a major challenge for physicians. Many resources have catalogued molecular alterations in known PID genes along with their associated clinical and immunological phenotypes. However, these resources do not assist in identifying candidate PID genes. We have recently developed a platform designated Resource of Asian PDIs, which hosts information pertaining to molecular alterations, protein-protein interaction networks, mouse studies and microarray gene expression profiling of all known PID genes. Using this resource as a discovery tool, we describe the development of an algorithm for prediction of candidate PID genes. Using a support vector machine learning approach, we have predicted 1442 candidate PID genes using 69 binary features of 148 known PID genes and 3162 non-PID genes as a training data set. The power of this approach is illustrated by the fact that six of the predicted genes have recently been experimentally confirmed to be PID genes. The remaining genes in this predicted data set represent attractive candidates for testing in patients where the etiology cannot be ascribed to any of the known PID genes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The determination of the overconsolidation ratio (OCR) of clay deposits is an important task in geotechnical engineering practice. This paper examines the potential of a support vector machine (SVM) for predicting the OCR of clays from piezocone penetration test data. SVM is a statistical learning theory based on a structural risk minimization principle that minimizes both error and weight terms. The five input variables used for the SVM model for prediction of OCR are the corrected cone resistance (qt), vertical total stress (sigmav), hydrostatic pore pressure (u0), pore pressure at the cone tip (u1), and the pore pressure just above the cone base (u2). Sensitivity analysis has been performed to investigate the relative importance of each of the input parameters. From the sensitivity analysis, it is clear that qt=primary in situ data influenced by OCR followed by sigmav, u0, u2, and u1. Comparison between SVM and some of the traditional interpretation methods is also presented. The results of this study have shown that the SVM approach has the potential to be a practical tool for determination of OCR.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The determination of settlement of shallow foundations on cohesionless soil is an important task in geotechnical engineering. Available methods for the determination of settlement are not reliable. In this study, the support vector machine (SVM), a novel type of learning algorithm based on statistical theory, has been used to predict the settlement of shallow foundations on cohesionless soil. SVM uses a regression technique by introducing an ε – insensitive loss function. A thorough sensitive analysis has been made to ascertain which parameters are having maximum influence on settlement. The study shows that SVM has the potential to be a useful and practical tool for prediction of settlement of shallow foundation on cohesionless soil.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The most difficult operation in the flood inundation mapping using optical flood images is to separate fully inundated areas from the ‘wet’ areas where trees and houses are partly covered by water. This can be referred as a typical problem the presence of mixed pixels in the images. A number of automatic information extraction image classification algorithms have been developed over the years for flood mapping using optical remote sensing images. Most classification algorithms generally, help in selecting a pixel in a particular class label with the greatest likelihood. However, these hard classification methods often fail to generate a reliable flood inundation mapping because the presence of mixed pixels in the images. To solve the mixed pixel problem advanced image processing techniques are adopted and Linear Spectral unmixing method is one of the most popular soft classification technique used for mixed pixel analysis. The good performance of linear spectral unmixing depends on two important issues, those are, the method of selecting endmembers and the method to model the endmembers for unmixing. This paper presents an improvement in the adaptive selection of endmember subset for each pixel in spectral unmixing method for reliable flood mapping. Using a fixed set of endmembers for spectral unmixing all pixels in an entire image might cause over estimation of the endmember spectra residing in a mixed pixel and hence cause reducing the performance level of spectral unmixing. Compared to this, application of estimated adaptive subset of endmembers for each pixel can decrease the residual error in unmixing results and provide a reliable output. In this current paper, it has also been proved that this proposed method can improve the accuracy of conventional linear unmixing methods and also easy to apply. Three different linear spectral unmixing methods were applied to test the improvement in unmixing results. Experiments were conducted in three different sets of Landsat-5 TM images of three different flood events in Australia to examine the method on different flooding conditions and achieved satisfactory outcomes in flood mapping.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

At the time of restoration transmission line switching is one of the major causes, which creates transient overvoltages. Though detailed Electro Magnetic Transient studies are carried out extensively for the planning and design of transmission systems, such studies are not common in a day-today operation of power systems. However it is important for the operator to ensure during restoration of supply that peak overvoltages resulting from the switching operations are well within safe limits. This paper presents a support vector machine approach to classify the various cases of line energization in the category of safe or unsafe based upon the peak value of overvoltage at the receiving end of line. Operator can define the threshold value of voltage to assign the data pattern in either of the class. For illustration of proposed approach the power system used for switching transient peak overvoltages tests is a 400 kV equivalent system of an Indian southern gri

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Statistical learning algorithms provide a viable framework for geotechnical engineering modeling. This paper describes two statistical learning algorithms applied for site characterization modeling based on standard penetration test (SPT) data. More than 2700 field SPT values (N) have been collected from 766 boreholes spread over an area of 220 sqkm area in Bangalore. To get N corrected value (N,), N values have been corrected (Ne) for different parameters such as overburden stress, size of borehole, type of sampler, length of connecting rod, etc. In three-dimensional site characterization model, the function N-c=N-c (X, Y, Z), where X, Y and Z are the coordinates of a point corresponding to N, value, is to be approximated in which N, value at any half-space point in Bangalore can be determined. The first algorithm uses least-square support vector machine (LSSVM), which is related to aridge regression type of support vector machine. The second algorithm uses relevance vector machine (RVM), which combines the strengths of kernel-based methods and Bayesian theory to establish the relationships between a set of input vectors and a desired output. The paper also presents the comparative study between the developed LSSVM and RVM model for site characterization. Copyright (C) 2009 John Wiley & Sons,Ltd.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present two new support vector approaches for ordinal regression. These approaches find the concentric spheres with minimum volume that contain most of the training samples. Both approaches guarantee that the radii of the spheres are properly ordered at the optimal solution. The size of the optimization problem is linear in the number of training samples. The popular SMO algorithm is adapted to solve the resulting optimization problem. Numerical experiments on some real-world data sets verify the usefulness of our approaches for data mining.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper. we propose a novel method using wavelets as input to neural network self-organizing maps and support vector machine for classification of magnetic resonance (MR) images of the human brain. The proposed method classifies MR brain images as either normal or abnormal. We have tested the proposed approach using a dataset of 52 MR brain images. Good classification percentage of more than 94% was achieved using the neural network self-organizing maps (SOM) and 98% front support vector machine. We observed that the classification rate is high for a Support vector machine classifier compared to self-organizing map-based approach.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper discusses a method for scaling SVM with Gaussian kernel function to handle large data sets by using a selective sampling strategy for the training set. It employs a scalable hierarchical clustering algorithm to construct cluster indexing structures of the training data in the kernel induced feature space. These are then used for selective sampling of the training data for SVM to impart scalability to the training process. Empirical studies made on real world data sets show that the proposed strategy performs well on large data sets.