105 results for support vector machines


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Spam, commonly defined as unsolicited or unwanted email, poses a potential threat to Internet security. Users spend a considerable amount of time deleting spam emails. More importantly, the ever-increasing volume of spam occupies server storage space and consumes network bandwidth. Keyword-based spam filtering strategies will eventually become less successful at modeling spammer behavior, as spammers constantly change their tricks to circumvent these filters. The evasive tactics that spammers use are themselves patterns, and these patterns can be modeled to combat spam. This paper investigates the possibility of modeling spammer behavioral patterns with well-known classification algorithms such as the Naïve Bayes classifier, Decision Tree Induction (DTI) and Support Vector Machines (SVMs). Preliminary experimental results demonstrate a promising detection rate of around 92%, a considerable improvement over similar research on spammer behavior modeling.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis examined the application of data mining techniques to predicting the pilling propensity of wool knitwear. Using real industrial data, a pilling propensity prediction tool with embedded trained support vector machines was developed to provide highly accurate predictions for wool knitwear even before the yarn is spun!

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Offline handwriting recognition is an important automated process in the fields of pattern recognition and computer vision. This paper presents a polar coordinate-based handwriting recognition system that uses Support Vector Machine (SVM) classification to achieve high recognition performance. We compare and evaluate zoning feature extraction methods applied in the polar system. The proposed system was trained and tested using an SVM on a set of 650 handwritten character images. All input images are segmented (isolated) handwritten characters. Compared with a Cartesian-based handwriting recognition system, the recognition rate is more stable and improves to as much as 86.63%.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

To be diagnostically effective, structural magnetic resonance imaging (sMRI) must reliably distinguish a depressed individual from a healthy individual at the level of individual scans. One of the tasks in the automated diagnosis of depression from brain sMRI is classification, which determines the class to which a sample belongs (i.e., depressed/not depressed, remitted/not-remitted depression) based on the values of its features. Thus far, very little work has been reported on identifying a suitable classification algorithm for depression detection. In this paper, different types of classification algorithms are compared for effective diagnosis of depression. Ten independent classification schemes are applied and a comparative study is carried out. The algorithms are: Naïve Bayes, Support Vector Machines (SVM) with a Radial Basis Function (RBF) kernel, SVM with a sigmoid kernel, J48, Random Forest, Random Tree, Voting Feature Intervals (VFI), LogitBoost, Simple KMeans Classification Via Clustering (KMeans) and Classification Via Clustering with Expectation Maximization (EM). The performance of the algorithms is determined through a set of experiments on sMRI brain scans. An experimental procedure is developed to measure the performance of the tested algorithms, and classification accuracy is used to evaluate and compare the examined classifiers.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Modern healthcare is being reshaped by the growth of Electronic Medical Records (EMR). Recently, these records have been shown to be of great value for building clinical prediction models. In EMR data, patients' diseases and hospital interventions are captured through a set of diagnosis and procedure codes. These codes are usually represented in a tree form (e.g. the ICD-10 tree), and the codes within a tree branch may be highly correlated. These codes can be used as features to build a prediction model, and appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test, etc.) consider each variable independently and usually end up with a long feature list. Recently, Lasso and related l1-penalty based feature selection methods have become popular due to their joint feature selection property. However, Lasso is known to select one of many correlated features at random. This prevents clinicians from arriving at a stable feature set, which is crucial for the clinical decision-making process. In this paper, we solve this problem by using a recently proposed Tree-Lasso model. Since the stability behavior of Tree-Lasso is not well understood, we study it and compare it with other feature selection methods. Using one synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods, e.g. Information Gain, ReliefF and T-test. We further show that, using different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods.
Our result has implications in identifying stable risk factors for many healthcare problems and therefore can potentially assist clinical decision making for accurate medical prognosis.
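
The abstract does not say how feature-selection stability is measured. As a rough illustration only, one common choice is the average pairwise Jaccard similarity between the feature sets selected on different resamples of the data. The function names and the ICD-10-style codes below are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature sets (1.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def selection_stability(feature_sets):
    """Average pairwise Jaccard similarity over two or more selection runs."""
    pairs = list(combinations(feature_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical runs: a selector that always returns the same codes ...
stable_runs = [{"I21", "I25", "E11"}, {"I21", "I25", "E11"}, {"I21", "I25", "E11"}]
# ... versus one that picks a different code from a correlated branch each time.
unstable_runs = [{"I21.0", "E11"}, {"I21.1", "E11"}, {"I21.2", "E11"}]

print(selection_stability(stable_runs))
print(selection_stability(unstable_runs))
```

A selector that switches between correlated codes from the same tree branch scores low on this measure even if its predictive accuracy is unchanged, which is the behaviour the abstract attributes to Lasso.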

Relevance:

100.00% 100.00%

Publisher:

Abstract:

High-dimensional problem domains pose significant challenges for anomaly detection. The presence of irrelevant features can conceal the presence of anomalies. This problem, known as the 'curse of dimensionality', is an obstacle for many anomaly detection techniques. Building a robust anomaly detection model for use in high-dimensional spaces requires the combination of an unsupervised feature extractor and an anomaly detector. While one-class support vector machines are effective at producing decision surfaces from well-behaved feature vectors, they can be inefficient at modelling the variation in large, high-dimensional datasets. Architectures such as deep belief networks (DBNs) are a promising technique for learning robust features. We present a hybrid model where an unsupervised DBN is trained to extract generic underlying features, and a one-class SVM is trained from the features learned by the DBN. Since a linear kernel can be substituted for nonlinear ones in our hybrid model without loss of accuracy, our model is scalable and computationally efficient. The experimental results show that our proposed model yields comparable anomaly detection performance with a deep autoencoder, while reducing its training and testing time by a factor of 3 and 1000, respectively.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Appliance-specific Load Monitoring (LM) provides a possible solution to the problem of energy conservation, which is becoming increasingly challenging due to growing energy demands within offices and residential spaces. It is essential to perform automatic appliance recognition and monitoring for optimal resource utilization. In this paper, we study the use of non-intrusive LM methods that rely on steady-state appliance signatures for classifying the most commonly used office appliances, while demonstrating their limitations in accurately discerning low-power devices due to overlapping load signatures. We propose a multi-layer decision architecture that makes use of audio features derived from device sounds and fuses them with load signatures acquired from an energy meter. For the recognition of device sounds, we perform feature set selection by evaluating combinations of time-domain and FFT-based audio features on state-of-the-art machine learning algorithms. Further, we demonstrate that our proposed feature set, which is a concatenation of device audio features and load signatures, significantly improves device recognition accuracy in comparison to the use of steady-state load signatures only.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The problem of unsupervised anomaly detection arises in a wide variety of practical applications. While one-class support vector machines have demonstrated their effectiveness as an anomaly detection technique, their ability to model large datasets is limited due to their memory and time complexity for training. To address this issue for supervised learning of kernel machines, there has been growing interest in random projection methods as an alternative to the computationally expensive problems of kernel matrix construction and support vector optimisation. In this paper we leverage the theory of nonlinear random projections and propose the Randomised One-class SVM (R1SVM), which is an efficient and scalable anomaly detection technique that can be trained on large-scale datasets. Our empirical analysis on several real-life and synthetic datasets shows that our randomised 1SVM algorithm achieves accuracy comparable to or better than deep autoencoder and traditional kernelised approaches for anomaly detection, while being approximately 100 times faster in training and testing.
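
To give a feel for how a nonlinear random projection can stand in for kernel matrix construction, here is a minimal sketch of random Fourier features for the RBF kernel. It assumes that this well-known construction is representative of the idea. The abstract does not state which projection R1SVM uses, so the function names, parameters and sample points are illustrative assumptions rather than the authors' method.

```python
import math
import random


def make_rff_map(dim, n_features, gamma, seed=0):
    """Build a random feature map z so that z(x) . z(y) approximates
    the RBF kernel exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * gamma)  # std. dev. of the random directions
    weights = [[rng.gauss(0.0, sigma) for _ in range(dim)]
               for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def transform(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return transform


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel value, used here only for comparison."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


transform = make_rff_map(dim=3, n_features=2000, gamma=0.5)
x, y = [0.1, 0.2, 0.3], [0.4, 0.1, 0.0]
approx = sum(a * b for a, b in zip(transform(x), transform(y)))
exact = rbf_kernel(x, y, gamma=0.5)
print(approx, exact)
```

Once the data are mapped this way, a linear one-class model can be trained on the explicit features, so the kernel matrix never has to be built. The training step is omitted here.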

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Lung cancer is a leading cause of cancer-related death worldwide. Early diagnosis of cancer has been shown to be greatly helpful for curing the disease effectively. Microarray technology provides a promising approach to exploiting gene profiles for cancer diagnosis. In this study, the authors propose a gene expression programming (GEP)-based model to predict lung cancer from microarray data. The authors use two gene selection methods to extract the significant lung cancer related genes, and accordingly propose different GEP-based prediction models. Prediction performance evaluations and comparisons between the authors' GEP models and three representative machine learning methods (support vector machine, multi-layer perceptron and radial basis function neural network) were conducted thoroughly on real microarray lung cancer datasets. Reliability was assessed by cross-dataset validation. The experimental results show that the GEP model using fewer feature genes outperformed other models in terms of accuracy, sensitivity, specificity and area under the receiver operating characteristic curve. It is concluded that the GEP model is a better solution to lung cancer prediction problems.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Graph-based anomaly detection plays a vital role in various application domains such as network intrusion detection, social network analysis and road traffic monitoring. Although these evolving networks impose a curse of dimensionality on the learning models, they usually contain structural properties that anomaly detection schemes can exploit. The major challenge is finding a feature extraction technique that preserves graph structure while balancing the accuracy of the model against its scalability. We propose the use of a scalable technique known as random projection as a method for structure aware embedding, which extracts relational properties of the network, and present an analytical proof of this claim. We also analyze the effect of embedding on the accuracy of one-class support vector machines for anomaly detection on real and synthetic datasets. We demonstrate that the embedding can be effective in terms of scalability without detrimental influence on the accuracy of the learned model.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

This research proposes an intelligent decision support system for acute lymphoblastic leukaemia diagnosis from microscopic blood images. A novel clustering algorithm with stimulating discriminant measures (SDM) of both within- and between-cluster scatter variances is proposed to produce robust segmentation of nucleus and cytoplasm of lymphocytes/lymphoblasts. Specifically, the proposed between-cluster evaluation is formulated based on the trade-off of several between-cluster measures of well-known feature extraction methods. The SDM measures are used in conjunction with a Genetic Algorithm for clustering nucleus, cytoplasm, and background regions. Subsequently, a total of eighty features consisting of shape, texture, and colour information of the nucleus and cytoplasm sub-images are extracted. A number of classifiers (multi-layer perceptron, Support Vector Machine (SVM) and Dempster-Shafer ensemble) are employed for lymphocyte/lymphoblast classification. Evaluated with the ALL-IDB2 database, the proposed SDM-based clustering overcomes the shortcomings of Fuzzy C-means which focuses purely on within-cluster scatter variance. It also outperforms Linear Discriminant Analysis and Fuzzy Compactness and Separation for nucleus-cytoplasm separation. The overall system achieves superior recognition rates of 96.72% and 96.67% accuracies using bootstrapping and 10-fold cross validation with Dempster-Shafer and SVM, respectively. The results also compare favourably with those reported in the literature, indicating the usefulness of the proposed SDM-based clustering method.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Compared with conventional two-class learning schemes, one-class classification uses only a single class in the classifier training phase. Applying one-class classification to learn from an unbalanced data set is regarded as recognition-based learning and has been shown to have the potential to achieve better performance. As in two-class learning, parameter selection is a significant issue, especially when the classifier is sensitive to its parameters. For one-class learning schemes with a kernel function, such as the one-class Support Vector Machine and Support Vector Data Description, there is, besides the kernel parameters, another one-class specific parameter: the rejection rate ν. In this paper, we propose a general framework that involves the majority class in solving the parameter selection problem. In this framework, we first use the minority target class for training in the one-class classification stage; then we use both the minority and majority classes to estimate the generalization performance of the constructed classifier. This generalization performance is set as the optimization criterion. We employed Grid search and Experiment Design search to explore various parameter settings. Experiments on UCI and Reuters text data show that the parameter-optimized one-class classifiers outperform all the standard one-class learning schemes we examined.
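
The two-stage framework described above can be sketched as a small grid search over the rejection rate. To stay self-contained, this sketch swaps the one-class SVM for a much simpler centroid-and-radius model and uses balanced accuracy as the validation criterion. Both choices, and all names and toy data below, are assumptions made for illustration and are not the paper's setup.

```python
import math


def fit_one_class(targets, nu):
    """Fit a centroid-and-radius model on target-class points only.

    nu is the fraction of training targets allowed to fall outside the boundary.
    """
    dim = len(targets[0])
    centroid = [sum(p[i] for p in targets) / len(targets) for i in range(dim)]
    dists = sorted(math.dist(p, centroid) for p in targets)
    keep = max(1, math.ceil((1.0 - nu) * len(dists)))
    return centroid, dists[keep - 1]


def accepts(model, point):
    """True if the point falls inside the learned boundary."""
    centroid, radius = model
    return math.dist(point, centroid) <= radius


def balanced_accuracy(model, targets, outliers):
    """Mean of the target acceptance rate and the outlier rejection rate."""
    tpr = sum(accepts(model, p) for p in targets) / len(targets)
    tnr = sum(not accepts(model, p) for p in outliers) / len(outliers)
    return (tpr + tnr) / 2.0


def grid_search(train_targets, val_targets, val_outliers, nu_grid):
    """Train on targets only, score on both classes, keep the best nu."""
    best = None
    for nu in nu_grid:
        model = fit_one_class(train_targets, nu)
        score = balanced_accuracy(model, val_targets, val_outliers)
        if best is None or score > best[1]:
            best = (nu, score)
    return best


# Toy data: the last training point is a noisy target far from the rest.
train_targets = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
                 (-0.1, 0.0), (0.0, -0.1), (2.0, 2.0)]
val_targets = [(0.05, 0.05), (-0.05, 0.0)]
val_outliers = [(1.5, 1.5), (3.0, 3.0)]

best_nu, best_score = grid_search(train_targets, val_targets, val_outliers,
                                  nu_grid=[0.0, 0.2, 0.5])
print(best_nu, best_score)
```

On this toy data, a rejection rate of 0 lets the noisy point inflate the boundary, and a rate of 0.5 cuts into genuine targets. The held-out majority-class points are what allow the search to tell these cases apart.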

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Microarray data classification is one of the most important emerging clinical applications in the medical community. Machine learning algorithms are most frequently used to complete this task. We selected one of the state-of-the-art kernel-based algorithms, the support vector machine (SVM), to classify microarray data. As a large number of kernels are available, a significant research question is: what is the best kernel for patient diagnosis based on microarray data classification using SVM? We first suggest three solutions based on data visualization and quantitative measures. The proposed solutions are then tested on different types of microarray problems. Finally, we found that the rule-based approach is most useful for automatic kernel selection for SVM to classify microarray data.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

There exists an enormous gap between low-level visual features and high-level semantic information, and the accuracy of content-based image classification and retrieval depends greatly on the description of low-level visual features. Taking this into consideration, a novel texture and edge descriptor is proposed in this paper, which can be represented with a histogram. Furthermore, by seamlessly combining the color, texture and edge histograms, the images are grouped into semantic classes using a support vector machine (SVM). Experimental results show that the combined descriptor is more discriminative than other feature descriptors such as Gabor texture.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This paper presents a system that employs random forests to formulate a method for subcellular localisation of proteins. A random forest is an ensemble learner that grows classification trees. Each tree produces a classification decision, and an integrated output is calculated. The system classifies the protein-localisation patterns within fluorescent microscope images. 2D images of HeLa cells that include all major classes of subcellular structures, and the associated feature set are used. The performance of the developed system is compared against that of the support vector machine and decision tree approaches. Three experiments are performed to study the influence of the training and test set size on the performance of the examined methods. The calculated classification errors and execution times are presented and discussed. The lowest classification error (2.9%) has been produced by the developed system.
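
The step where each tree produces a decision and an integrated output is calculated can be illustrated with a plain majority vote. The abstract does not say how the outputs are integrated, so majority voting is assumed here as the usual choice for random forests. The three hand-written "trees" and the class labels are hypothetical stand-ins for trained classification trees.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most common label among the per-tree predictions."""
    counts = Counter(tree_predictions)
    # most_common(1) returns [(label, count)]; ties go to the label seen first.
    return counts.most_common(1)[0][0]


def forest_predict(trees, x):
    """Ask every tree for a decision and combine them by majority vote."""
    return majority_vote([tree(x) for tree in trees])


# Hypothetical stand-ins for trained trees: each maps a feature vector to a label.
trees = [
    lambda x: "nucleus" if x[0] > 0.5 else "cytoplasm",
    lambda x: "nucleus" if x[1] > 0.3 else "cytoplasm",
    lambda x: "nucleus" if x[0] + x[1] > 1.0 else "cytoplasm",
]

print(forest_predict(trees, (0.9, 0.4)))
print(forest_predict(trees, (0.2, 0.4)))
```

In the second call the trees disagree two to one, and the vote settles the outcome. Averaging over many trees in this way is what makes the ensemble less sensitive to the errors of any single tree.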