959 resultados para feature selection


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Phishing emails are more active than ever before and putting the average computer user and organizations at risk of significant data, brand and financial loss. Through an analysis of a number of phishing and ham email collected, this paper focused on fundamental attacker behavior which could be extracted from email header. It also put forward a hybrid feature selection approach based on combination of content-based and behavior-based. The approach could mine the attacker behavior based on email header. On a publicly available test corpus, our hybrid features selections are able to achieve 96% accuracy rate. In addition, we successfully tested the quality of our proposed behavior-based feature using the information gain.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A good intrusion system gives an accurate and efficient classification results. This ability is an essential functionality to build an intrusion detection system. In this paper, we focused on using various training functions with feature selection to achieve high accurate results. The data we used in our experiments are NSL-KDD. However, the training and testing time to build the model is very high. To address this, we proposed feature selection based on information gain, which can detect several attack types with high accurate result and low false rate. Moreover, we executed experiments to category each of the five classes (probe, denial of service (DoS), user to super-user (U2R), and remote to local (R2L), normal). Our proposed outperform other state-of-art methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

One of the issues associated with pattern classification using data based machine learning systems is the “curse of dimensionality”. In this paper, the circle-segments method is proposed as a feature selection method to identify important input features before the entire data set is provided for learning with machine learning systems. Specifically, four machine learning systems are deployed for classification, viz. Multilayer Perceptron (MLP), Support Vector Machine (SVM), Fuzzy ARTMAP (FAM), and k-Nearest Neighbour (kNN). The integration between the circle-segments method and the machine learning systems has been applied to two case studies comprising one benchmark and one real data sets. Overall, the results after feature selection using the circle segments method demonstrate improvements in performance even with more than 50% of the input features eliminated from the original data sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper is devoted to empirical investigation of novel multi-level ensemble meta classifiers for the detection and monitoring of progression of cardiac autonomic neuropathy, CAN, in diabetes patients. Our experiments relied on an extensive database and concentrated on ensembles of ensembles, or multi-level meta classifiers, for the classification of cardiac autonomic neuropathy progression. First, we carried out a thorough investigation comparing the performance of various base classifiers for several known sets of the most essential features in this database and determined that Random Forest significantly and consistently outperforms all other base classifiers in this new application. Second, we used feature selection and ranking implemented in Random Forest. It was able to identify a new set of features, which has turned out better than all other sets considered for this large and well-known database previously. Random Forest remained the very best classier for the new set of features too. Third, we investigated meta classifiers and new multi-level meta classifiers based on Random Forest, which have improved its performance. The results obtained show that novel multi-level meta classifiers achieved further improvement and obtained new outcomes that are significantly better compared with the outcomes published in the literature previously for cardiac autonomic neuropathy.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Detection of depression from structural MRI (sMRI) scans is relatively new in the mental health diagnosis. Such detection requires processes including image acquisition and pre-processing, feature extraction and selection, and classification. Identification of a suitable feature selection (FS) algorithm will facilitate the enhancement of the detection accuracy by selection of important features. In the field of depression study, there are very limited works that evaluate feature selection algorithms for sMRI data. This paper investigates the performance of four algorithms for FS of volumetric attributes in sMRI scans. The algorithms are One Rule (OneR), Support Vector Machine (SVM), Information Gain (IG) and ReliefF. The performances of the algorithms are determined through a set of experiments on sMRI brain scans. An experimental procedure is developed to measure the performance of the tested algorithms. The result of the evaluation of the FS algorithms is discussed by using a number of analyses.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Email has become the critical communication medium for most organizations. Unfortunately, email-born attacks in computer networks are causing considerable economic losses worldwide. Exiting phishing email blocking appliances have little effect in weeding out the vast majority of phishing emails. At the same time, online criminals are becoming more dangerous and sophisticated. Phishing emails are more active than ever before and putting the average computer user and organizations at risk of significant data, brand and financial loss. In this paper, we propose a hybrid feature selection approach based combination of content-based and behaviour-based. The approach could mine the attacker behaviour based on email header. On a publicly available test corpus, our hybrid features selection is able to achieve 94% accuracy rate.