99 results for Classification Nomenclature


Relevance:

20.00%

Abstract:

This work considers the challenging task of cancer prediction based on microarray data for the medical community. The research analysed microarray data for mostly common cancers (breast, colon, lung, prostate and leukemia), and suggests the use of modern machine learning techniques to predict cancer.

Relevance:

20.00%

Abstract:

To develop an objective and repeatable method of identification and classification of animal fibres, two different integrated systems were developed to mimic the human brain's ability to undertake feature extraction and discrimination of animal fibres. Both integrated systems are basically composed of an image processing system and an artificial neural network system.

Relevance:

20.00%

Abstract:

This thesis proposes an innovative adaptive multi-classifier spam filtering model, with a grey-list analyser and a dynamic feature selection method, to overcome false-positive problems in email classification. It also presents additional techniques to minimize the added complexity. Empirical evidence indicates the success of this model over existing approaches.

Relevance:

20.00%

Abstract:

Classifying malware correctly is an important research issue for anti-malware software producers. This paper presents an effective and efficient malware classification technique based on string information using several well-known classification algorithms. In our testing we extracted the printable strings from 1367 samples, including unpacked trojans and viruses and clean files. Information describing the printable strings contained in each sample was input to various classification algorithms, including tree-based classifiers, a nearest neighbour algorithm, statistical algorithms and AdaBoost. Using k-fold cross validation on the unpacked malware and clean files, we achieved a classification accuracy of 97%. Our results reveal that strings from library code (rather than malicious code itself) can be utilised to distinguish different malware families.
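The two mechanical steps named in this abstract, printable-string extraction and k-fold splitting, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the definition of a printable string (a run of at least four printable ASCII bytes, as in the Unix `strings` utility), the function names, and the sample bytes are all assumptions.

```python
import re


def printable_strings(data, min_len=4):
    """Return runs of printable ASCII bytes of at least min_len characters.

    Assumption: "printable string" means what the Unix `strings` tool
    extracts; the abstract does not state the exact rule.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Hypothetical bytes standing in for part of an unpacked executable.
sample = b"\x00\x01LoadLibraryA\x00\xffab\x00GetProcAddress\x90"
print(printable_strings(sample))

for train, test in k_fold_indices(10, 5):
    print(test, "held out;", len(train), "used for training")
```

Each sample's extracted strings would then be turned into features for the classifiers; that step is omitted because the abstract does not describe the encoding.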

Relevance:

20.00%

Abstract:

Recently, many scholars have used fusion of filters to enhance the performance of spam filtering. Over the past several years, much effort has been devoted to different ensemble methods to achieve better performance. In practice, how to select appropriate ensemble methods for spam filtering remains an unsolved problem. In this paper, we investigate this problem by designing a framework to compare the performance of various ensemble methods. It helps researchers fight spam email more effectively in applied systems. The experimental results indicate that online methods perform well on accuracy, while the off-line batch methods are evidently influenced by the size of the data set. When a large data set is involved, the performance of off-line batch methods is not on par with online methods, and within the framework of online methods, the parallel ensemble performs better when using complex filters only.

Relevance:

20.00%

Abstract:

This paper uses error-correcting codes for multilabel classification. A BCH code and a random forests learner are used to form the proposed method. Thus, the error-correcting properties of BCH are combined with the good performance of the random forests learner to enhance the multilabel classification results. Three experiments are conducted on three common benchmark datasets. The results are compared against those of several existing approaches. The proposed method does well against its counterparts on the three datasets of varying characteristics.
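The idea of protecting a label vector with an error-correcting code can be illustrated with a much smaller code than BCH. The sketch below swaps in a Hamming(7,4) code, which corrects a single flipped bit, and simulates one wrong bit-level prediction by hand; it is an assumption-laden illustration of the principle, not the paper's method. In the full scheme one binary learner (random forests in the paper) would predict each codeword bit.

```python
def hamming74_encode(labels):
    """Encode 4 label bits as a 7-bit Hamming codeword [p1,p2,d1,p3,d2,d3,d4]."""
    d1, d2, d3, d4 = labels
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit, then return the 4 label bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # 1-based; 0 means no error found
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


true_labels = [1, 0, 1, 1]          # hypothetical 4-label instance
codeword = hamming74_encode(true_labels)

# Simulate one per-bit classifier getting its bit wrong.
predicted = list(codeword)
predicted[4] ^= 1

print(hamming74_decode(predicted) == true_labels)
```

A BCH code plays the same role with longer codewords and the ability to correct several bit errors at once.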

Relevance:

20.00%

Abstract:

In the last decade, Internet email has become one of the primary methods of communication for the exchange of ideas and information. However, in recent years, along with the rapid growth of the Internet and email, there has been a dramatic growth in spam. Classification algorithms have been used successfully to filter spam, but with a certain number of false positives as a trade-off. This problem is mainly caused by the dynamic nature of spam content and spam delivery strategies, as well as the diversity of the classification algorithms. This paper presents an email classification approach that reduces the analysis burden of the grey list (GL) analyser, as a further refinement of our previous multi-classifier based email classification [10]. In this approach, we introduce a “majority voting grey list (MVGL)” analyzing technique, with two different variations, which analyzes only the GL emails. Our empirical evidence demonstrates the improvements of this approach, in terms of complexity and cost, compared to the existing GL analyser. This approach also removes the need for human interaction that limits the existing analyzing technique.
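A majority vote restricted to grey-list emails might look like the sketch below. The abstract does not say how the grey list is formed or what the two MVGL variations are, so the rules here are assumptions: an email is grey-listed when the classifiers disagree, and ties in the vote fall back to "ham" to avoid false positives. All function names are hypothetical.

```python
from collections import Counter


def first_pass(votes):
    """Return the unanimous label, or 'grey' when the classifiers disagree.

    votes: a non-empty list with one 'spam' or 'ham' label per classifier.
    """
    if len(set(votes)) == 1:
        return votes[0]
    return "grey"


def majority_vote(votes, tie_label="ham"):
    """Resolve a grey-list email by majority; ties go to tie_label (assumed)."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_label
    return ranked[0][0]


def classify(votes):
    """Only emails that land in the grey list go through the vote."""
    label = first_pass(votes)
    return majority_vote(votes) if label == "grey" else label


print(classify(["spam", "spam", "spam"]))  # unanimous, no vote needed
print(classify(["spam", "ham", "spam"]))   # grey-listed, then voted on
```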

Relevance:

20.00%

Abstract:

Protecting a user's mailbox from infiltration by phishing email is a significant research issue nowadays. Much research is under way on filtering phishing email using classification-based algorithms, with substantial performance achieved. Studies with different classification algorithms have observed that the outputs of the classifiers vary from one another on the same corpora. This paper presents the impact of classifier rescheduling in multi-tier classification of phishing email, to find the best scheduling in the classification process. In our method, the features of phishing email are extracted and classified sequentially using the multi-tier classification, and the outputs are sent to the decision fusion process. Empirical evidence shows that rescheduling the classifiers among the tiers gives diverse outcomes in terms of accuracy as well as the number of false positive instances.

Relevance:

20.00%

Abstract:

This paper presents a novel multi-label classification framework for domains with large numbers of labels. Automatic image annotation is such a domain, as the available semantic concepts typically number in the hundreds. The proposed framework comprises an initial clustering phase that breaks the original training set into several disjoint clusters of data. It then trains a multi-label classifier from the data of each cluster. Given a new test instance, the framework first finds the nearest cluster and then applies the corresponding model. Empirical results using two clustering algorithms, four multi-label classification algorithms and three image annotation data sets suggest that the proposed approach can improve the performance and reduce the training time of standard multi-label classification algorithms, particularly when the number of labels is large.
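The prediction step described here (find the nearest cluster, then apply that cluster's model) can be sketched in a few lines. The sketch assumes clusters are summarised by centroids compared with Euclidean distance and that each per-cluster model is a callable returning a set of labels; the centroids, models and labels below are hypothetical stand-ins, not the paper's components.

```python
import math


def nearest_cluster(x, centroids):
    """Index of the centroid closest to x (Euclidean distance, assumed)."""
    return min(range(len(centroids)), key=lambda i: math.dist(x, centroids[i]))


def predict(x, centroids, models):
    """Route x to the multi-label model trained on its nearest cluster."""
    return models[nearest_cluster(x, centroids)](x)


# Two hypothetical clusters, each with a toy model in place of a trained one.
centroids = [(0.0, 0.0), (10.0, 10.0)]
models = [
    lambda x: {"sky"},
    lambda x: {"grass", "tree"},
]

print(predict((1.0, 2.0), centroids, models))
print(predict((9.0, 8.0), centroids, models))
```

Because each model only sees the data of its own cluster, it is trained on fewer instances, which is consistent with the reduced training time the abstract reports.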

Relevance:

20.00%

Abstract:

This paper presents a dual-random ensemble multi-label classification method for classification of multi-label data. The method is formed by integrating and extending the concepts of the feature subspace method and the random k-label set ensemble multi-label classification method. Experimental results show that the developed method outperforms the existing multi-label classification methods on three different multi-label datasets, including the biological yeast and genbase datasets.
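One plausible reading of "dual-random" is that each ensemble member draws both a random feature subspace and a random set of k labels. The sketch below shows only that sampling step; the function name, the sizes and the fixed seed are illustrative assumptions, and training a classifier on each draw is left out because the abstract does not specify it.

```python
import random


def draw_member(n_features, n_labels, subspace_size, k, rng):
    """Sample one ensemble member's feature subspace and k-label set."""
    features = sorted(rng.sample(range(n_features), subspace_size))
    labelset = sorted(rng.sample(range(n_labels), k))
    return features, labelset


rng = random.Random(0)  # fixed seed so the illustration is repeatable
ensemble = [draw_member(100, 14, 20, 3, rng) for _ in range(5)]

for features, labelset in ensemble:
    print(len(features), "features, labels", labelset)
```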

Relevance:

20.00%

Abstract:

Lung nodules can be detected by examining CT scans. An automated lung nodule classification system is presented in this paper. The system employs random forests as its base classifier. A unique architecture for classification-aided-by-clustering is presented. Four experiments are conducted to study the performance of the developed system. 5721 CT lung image slices from the LIDC database are employed in the experiments. According to the experimental results, the system achieves a highest sensitivity of 97.92% and specificity of 96.28%. The results demonstrate that the system improves on the performance of its tested counterparts.
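For reference, sensitivity and specificity are computed from the confusion counts as TP / (TP + FN) and TN / (TN + FP). The sketch below uses made-up labels (1 for nodule, 0 for non-nodule), not data from the paper.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical ground truth and predictions for eight image regions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)  # 0.75 0.5
```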

Relevance:

20.00%

Abstract:

An automated lung nodule detection system can help spot lung abnormalities in CT lung images. Lung nodule detection can be achieved using template-based, segmentation-based, and classification-based methods. The existing systems that include a classification component in their structures have demonstrated better performance than their counterparts. Ensemble learners combine decisions of multiple classifiers to form an integrated output. To improve the performance of automated lung nodule detection, an ensemble classification aided by clustering (CAC) method is proposed. The method takes advantage of the random forest algorithm and offers a structure for a hybrid random forest based lung nodule classification aided by clustering. Several experiments are carried out involving the proposed method as well as two other existing methods. The parameters of the classifiers are varied to identify the best performing classifiers. The experiments are conducted using lung scans of 32 patients, comprising 5721 images in which nodule locations are marked by expert radiologists. Overall, the best sensitivity of 98.33% and specificity of 97.11% have been recorded for the proposed system. Also, a high receiver operating characteristic (ROC) Az of 0.9786 has been achieved.
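The area under the ROC curve can be estimated directly from classifier scores as the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half. Note that Az conventionally denotes the area under a fitted binormal ROC curve, so this empirical estimate is a related quantity and not necessarily how the paper obtained 0.9786. The scores below are made up.

```python
def roc_auc(scores_pos, scores_neg):
    """Empirical ROC area: fraction of positive/negative pairs ranked correctly."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical scores for nodule (positive) and non-nodule (negative) regions.
nodule_scores = [0.9, 0.8, 0.4]
non_nodule_scores = [0.7, 0.3, 0.2]

print(roc_auc(nodule_scores, non_nodule_scores))  # 8 of 9 pairs ranked correctly
```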