66 resultados para feature selection

em Deakin Research Online - Australia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Selecting a set of features which is optimal for a given task is the problem which plays an important role in a wide variety of contexts including pattern recognition, images understanding and machine learning. The concept of reduction of the decision table based on the rough set is very useful for feature selection. In this paper, a genetic algorithm based approach is presented to search the relative reduct decision table of the rough set. This approach has the ability to accommodate multiple criteria such as accuracy and cost of classification into the feature selection process and finds the effective feature subset for texture classification . On the basis of the effective feature subset selected, this paper presents a method to extract the objects which are higher than their surroundings, such as trees or forest, in the color aerial images. The experiments results show that the feature subset selected and the method of the object extraction presented in this paper are practical and effective.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Spam is commonly defined as unsolicited email messages and the goal of spam filtering is to differentiate spam from legitimate email. Much work have been done to filter spam from legitimate emails using machine learning algorithm and substantial performance has been achieved with some amount of false positive (FP) tradeoffs. In this paper, architecture of spam filtering has been proposed based on support vector machine (SVM,) which will get better accuracy by reducing FP problems. In this architecture an innovative technique for feature selection called dynamic feature selection (DFS) has been proposed which is enhanced the overall performance of the architecture with reduction of FP problems. The experimental result shows that the proposed technique gives better performance compare to similar existing techniques.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Selecting a set of features which is optimal for a given task is a problem which plays an important role in a wide variety of contexts including pattern recognition, images understanding and machine learning. The paper describes an application of rough sets method to feature selection and reduction in texture images recognition. The proposed methods include continuous data discretization based on Kohonen neural network and maximum covariance, and rough set algorithms for feature selection and reduction. The experiments on trees extraction from aerial images show that the methods presented in this paper are practical and effective.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Different data classification algorithms have been developed and applied in various areas to analyze and extract valuable information and patterns from large datasets with noise and missing values. However, none of them could consistently perform well over all datasets. To this end, ensemble methods have been suggested as the promising measures. This paper proposes a novel hybrid algorithm, which is the combination of a multi-objective Genetic Algorithm (GA) and an ensemble classifier. While the ensemble classifier, which consists of a decision tree classifier, an Artificial Neural Network (ANN) classifier, and a Support Vector Machine (SVM) classifier, is used as the classification committee, the multi-objective Genetic Algorithm is employed as the feature selector to facilitate the ensemble classifier to improve the overall sample classification accuracy while also identifying the most important features in the dataset of interest. The proposed GA-Ensemble method is tested on three benchmark datasets, and compared with each individual classifier as well as the methods based on mutual information theory, bagging and boosting. The results suggest that this GA-Ensemble method outperform other algorithms in comparison, and be a useful method for classification and feature selection problems.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Feature selection is an important technique in dealing with application problems with large number of variables and limited training samples, such as image processing, combinatorial chemistry, and microarray analysis. Commonly employed feature selection strategies can be divided into filter and wrapper. In this study, we propose an embedded two-layer feature selection approach to combining the advantages of filter and wrapper algorithms while avoiding their drawbacks. The hybrid algorithm, called GAEF (Genetic Algorithm with embedded filter), divides the feature selection process into two stages. In the first stage, Genetic Algorithm (GA) is employed to pre-select features while in the second stage a filter selector is used to further identify a small feature subset for accurate sample classification. Three benchmark microarray datasets are used to evaluate the proposed algorithm. The experimental results suggest that this embedded two-layer feature selection strategy is able to improve the stability of the selection results as well as the sample classification accuracy.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Anti-malware software producers are continually challenged to identify and counter new malware as it is released into the wild. A dramatic increase in malware production in recent years has rendered the conventional method of manually determining a signature for each new malware sample untenable. This paper presents a scalable, automated approach for detecting and classifying malware by using pattern recognition algorithms and statistical methods at various stages of the malware analysis life cycle. Our framework combines the static features of function length and printable string information extracted from malware samples into a single test which gives classification results better than those achieved by using either feature individually. In our testing we input feature information from close to 1400 unpacked malware samples to a number of different classification algorithms. Using k-fold cross validation on the malware, which includes Trojans and viruses, along with 151 clean files, we achieve an overall classification accuracy of over 98%.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Phishing emails are more dynamic and cause high risk of significant data, brand and financial loss to average computer user and organizations. To address this problem, we propose a hybrid feature selection approach based on combination of content-based and behavior-based. Our proposed hybrid features selections are able to achieve 93% accuracy rate as compared to other approaches. In addition, we successfully tested the quality of our proposed behavior-based feature using the Information Gain, Gain Ratio and Symmetrical Uncertainty.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Phishing emails are more active than ever before and putting the average computer user and organizations at risk of significant data, brand and financial loss. Through an analysis of a number of phishing and ham email collected, this paper focused on fundamental attacker behavior which could be extracted from email header. It also put forward a hybrid feature selection approach based on combination of content-based and behavior-based. The approach could mine the attacker behavior based on email header. On a publicly available test corpus, our hybrid features selections are able to achieve 96% accuracy rate. In addition, we successfully tested the quality of our proposed behavior-based feature using the information gain.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A good intrusion system gives an accurate and efficient classification results. This ability is an essential functionality to build an intrusion detection system. In this paper, we focused on using various training functions with feature selection to achieve high accurate results. The data we used in our experiments are NSL-KDD. However, the training and testing time to build the model is very high. To address this, we proposed feature selection based on information gain, which can detect several attack types with high accurate result and low false rate. Moreover, we executed experiments to category each of the five classes (probe, denial of service (DoS), user to super-user (U2R), and remote to local (R2L), normal). Our proposed outperform other state-of-art methods.