998 results for Typological Classification


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Recently, many scholars have used fusion of filters to enhance the performance of spam filtering. Over the past several years, considerable effort has been devoted to different ensemble methods to achieve better performance. In practice, how to select an appropriate ensemble method for spam filtering remains an unsolved problem. In this paper, we investigate this problem by designing a framework to compare the performance of various ensemble methods. The framework helps researchers fight spam email more effectively in applied systems. The experimental results indicate that online methods perform well on accuracy, while off-line batch methods are evidently influenced by the size of the data set. When a large data set is involved, the performance of off-line batch methods is not on par with that of online methods, and within the framework of online methods, the parallel ensemble performs better when only complex filters are used.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper uses error-correcting codes for multilabel classification. A BCH code and a random forests learner are combined to form the proposed method. Thus, the error-correcting properties of the BCH code are merged with the good performance of the random forests learner to enhance the multilabel classification results. Three experiments are conducted on three common benchmark datasets. The results are compared against those of several existing approaches. The proposed method performs well against its counterparts on the three datasets of varying characteristics.
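The general idea of pairing an error-correcting code with multilabel prediction can be sketched as follows. This is a minimal illustration, not the paper's method: a Hamming(7,4) code stands in for the BCH code, the four-label vector and the function names `encode` and `decode` are assumptions, and the per-bit learners (random forests in the abstract) are left out. The label vector is encoded into a longer codeword, one binary learner would predict each codeword bit, and decoding to the nearest codeword repairs a wrongly predicted bit.

```python
from itertools import product

def encode(bits4):
    """Encode a 4-bit label vector as a 7-bit Hamming(7,4) codeword.

    Stand-in for the BCH code named in the abstract (assumption).
    """
    d1, d2, d3, d4 = bits4
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return (p1, p2, d1, p3, d2, d3, d4)

# All 16 valid codewords, mapped back to their label vectors.
CODEBOOK = {encode(m): m for m in product((0, 1), repeat=4)}

def decode(bits7):
    """Return the label vector of the codeword nearest in Hamming distance."""
    nearest = min(CODEBOOK, key=lambda cw: sum(a != b for a, b in zip(cw, bits7)))
    return CODEBOOK[nearest]

# Usage: one of the seven per-bit predictions is wrong, yet the labels survive.
labels = (1, 0, 1, 1)
predicted = list(encode(labels))
predicted[4] ^= 1  # simulate one misclassified code bit
recovered = decode(predicted)
```

Because this code has minimum distance 3, any single wrong bit is corrected; a BCH code with a larger designed distance would tolerate more bit errors at the cost of training more per-bit learners.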

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Over the last decade, Internet email has become one of the primary methods of communication, used widely for the exchange of ideas and information. However, in recent years, along with the rapid growth of the Internet and email, there has been a dramatic growth in spam. Classification algorithms have been successfully used to filter spam, but at the cost of a certain number of false positives. This problem is mainly caused by the dynamic nature of spam content and spam delivery strategies, as well as the diversification of the classification algorithms. This paper presents an email classification approach that reduces the burden of the GL (grey list) analyser's analyzing technique, as a further refinement of our previous multi-classifier based email classification [10]. In this approach, we introduce a “majority voting grey list (MVGL)” analyzing technique with two different variations, which will analyze only the product of GL emails. Our empirical evidence shows the improvements of this approach, in terms of complexity and cost, compared to the existing GL analyser. This approach also overcomes the limitation of human interaction in the existing analyzing technique.
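A generic majority vote over grey-listed emails can be sketched as follows. The abstract does not describe the two MVGL variations, so this is only an illustration under stated assumptions: an email is treated as grey-listed when the classifiers disagree, the function name `classify_email` is hypothetical, and ties default to "ham" to limit false positives.

```python
from collections import Counter

def classify_email(votes):
    """Combine per-classifier votes ('spam' or 'ham') for one email.

    Returns (label, was_grey_listed). Grey-listing on disagreement and the
    tie-breaking rule are assumptions, not details from the abstract.
    """
    if len(set(votes)) == 1:
        return votes[0], False  # unanimous: no grey-list analysis needed
    ranked = Counter(votes).most_common()
    (top_label, top_count), (_, second_count) = ranked[0], ranked[1]
    if top_count == second_count:
        return "ham", True  # tie: assumed default that avoids a false positive
    return top_label, True  # majority decides the grey-listed email

# Usage: three classifiers disagree, so the email is resolved by majority.
result = classify_email(["spam", "ham", "spam"])
```

Resolving disagreements by vote removes the need for a human to inspect each grey-listed email, which is the limitation the abstract says the approach overcomes.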

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Protecting a user's mailbox from infiltration by phishing email is a significant research issue nowadays. Much research is under way on filtering phishing using classification-based algorithms, and it has achieved substantial performance. Different classification algorithms have been studied, and it has been observed that the outputs of the classifiers vary from one another on the same corpora. This paper presents the impact of classifier rescheduling in multi-tier classification of phishing email, in order to find the best scheduling in the classification process. In our method, the features of phishing email are extracted and classified sequentially using multi-tier classification, and the outputs are sent to the decision fusion process. Empirical evidence shows that rescheduling the classifiers among the tiers gives diverse outcomes in terms of accuracy as well as the number of false positive instances.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper presents a novel multi-label classification framework for domains with large numbers of labels. Automatic image annotation is such a domain, as the available semantic concepts typically number in the hundreds. The proposed framework comprises an initial clustering phase that breaks the original training set into several disjoint clusters of data. It then trains a multi-label classifier from the data of each cluster. Given a new test instance, the framework first finds the nearest cluster and then applies the corresponding model. Empirical results using two clustering algorithms, four multi-label classification algorithms and three image annotation data sets suggest that the proposed approach can improve the performance and reduce the training time of standard multi-label classification algorithms, particularly when the number of labels is large.
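The cluster-then-classify pipeline described above can be sketched as follows. This is a toy illustration under stated assumptions: cluster assignments are taken as given (any clustering algorithm could supply them), the per-cluster "model" is a placeholder that keeps labels seen in at least half of the cluster's instances rather than a real multi-label learner, and the names `fit` and `predict` are hypothetical.

```python
import math

def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def train_label_model(label_sets):
    """Placeholder multi-label learner: labels present in >= half the cluster."""
    counts = {}
    for labels in label_sets:
        for lab in labels:
            counts[lab] = counts.get(lab, 0) + 1
    return {lab for lab, c in counts.items() if 2 * c >= len(label_sets)}

def fit(X, Y, assignments):
    """Train one model per cluster; assignments[i] is the cluster id of X[i]."""
    clusters = {}
    for x, y, c in zip(X, Y, assignments):
        xs, ys = clusters.setdefault(c, ([], []))
        xs.append(x)
        ys.append(y)
    return {c: (centroid(xs), train_label_model(ys)) for c, (xs, ys) in clusters.items()}

def predict(models, x):
    """Find the nearest cluster centroid and apply that cluster's model."""
    _, model = min(models.values(), key=lambda m: math.dist(m[0], x))
    return model

# Usage with two hand-assigned clusters of image-like feature vectors.
X = [[0, 0], [0, 1], [10, 10], [10, 11]]
Y = [{"sky"}, {"sky", "cloud"}, {"grass"}, {"grass", "tree"}]
models = fit(X, Y, assignments=[0, 0, 1, 1])
near_origin = predict(models, [1, 1])
```

Each per-cluster model sees only a fraction of the training data and, typically, only the labels that occur in its cluster, which is consistent with the reduced training time the abstract reports.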

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper presents a dual-random ensemble multi-label classification method for classification of multi-label data. The method is formed by integrating and extending the concepts of the feature subspace method and the random k-label set ensemble multi-label classification method. Experimental results show that the developed method outperforms the existing multi-label classification methods on three different multi-label datasets, including the biological yeast and genbase datasets.
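One way to read "dual-random" is that each ensemble member draws both a random feature subspace and a random k-label set. The sketch below shows only that sampling step, under that reading; the function name `sample_dual_random`, its parameters, and the uniform sampling are assumptions, and the training of each member on its subset is omitted.

```python
import random

def sample_dual_random(n_features, labels, n_models, k, n_sub, seed=0):
    """Draw a (feature subset, k-label set) pair for each ensemble member.

    Hypothetical helper: uniform sampling without replacement is assumed.
    """
    rng = random.Random(seed)
    members = []
    for _ in range(n_models):
        feats = sorted(rng.sample(range(n_features), n_sub))  # random feature subspace
        labelset = sorted(rng.sample(labels, k))               # random k-label set
        members.append((feats, labelset))
    return members

# Usage: five members, each with 4 of 10 features and 3 of 6 labels.
label_names = ["l1", "l2", "l3", "l4", "l5", "l6"]
members = sample_dual_random(10, label_names, n_models=5, k=3, n_sub=4)
```

Each member would then be trained to predict only its own label set from its own features, and the members' outputs combined per label.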

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Lung nodules can be detected by examining CT scans. An automated lung nodule classification system is presented in this paper. The system employs random forests as its base classifier. A unique architecture for classification-aided-by-clustering is presented. Four experiments are conducted to study the performance of the developed system. 5721 CT lung image slices from the LIDC database are employed in the experiments. According to the experimental results, the system achieves a highest sensitivity of 97.92% and a highest specificity of 96.28%. The results demonstrate that the system improves on the performance of its tested counterparts.
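For reference, sensitivity and specificity are computed from the confusion counts as TP / (TP + FN) and TN / (TN + FP). The sketch below applies these standard definitions to made-up labels; the function name and the example data are illustrative and are not taken from the paper's experiments.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = nodule."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # nodules found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # nodules missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-nodules cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    return tp / (tp + fn), tn / (tn + fp)

# Usage on illustrative labels: 3 nodules and 5 non-nodules.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```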

Relevance:

20.00% 20.00%

Publisher:

Abstract:

An automated lung nodule detection system can help spot lung abnormalities in CT lung images. Lung nodule detection can be achieved using template-based, segmentation-based, and classification-based methods. Existing systems that include a classification component in their structure have demonstrated better performance than their counterparts. Ensemble learners combine the decisions of multiple classifiers to form an integrated output. To improve the performance of automated lung nodule detection, an ensemble classification aided by clustering (CAC) method is proposed. The method takes advantage of the random forest algorithm and offers a structure for hybrid random forest based lung nodule classification aided by clustering. Several experiments are carried out involving the proposed method as well as two other existing methods. The parameters of the classifiers are varied to identify the best performing classifiers. The experiments are conducted using lung scans of 32 patients, comprising 5721 images in which nodule locations are marked by expert radiologists. Overall, a best sensitivity of 98.33% and a best specificity of 97.11% have been recorded for the proposed system. In addition, a high area under the receiver operating characteristic (ROC) curve, Az, of 0.9786 has been achieved.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper presents an image to text translation platform consisting of image segmentation, region feature extraction, region blob clustering, and translation components. A multi-label learning method is suggested for realizing the translation component. Empirical studies show that the predictive performance of the translation component is better than that of its counterparts when a dual-random ensemble multi-label classification algorithm is employed.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This thesis develops an architectural framework for the proposed image to text translation system, which contains four components. It selects appropriate algorithms for the first three components and develops three effective multi-label classification algorithms for the fourth component, i.e. the translation component, for different problem settings.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The thesis investigates various machine learning approaches to reducing data dimensionality, and studies the impact of asymmetric data on learning in image retrieval. Efficient algorithms are proposed to reduce the data dimensionality. Integration strategies for one-class classification are designed to address the asymmetric data issue and improve retrieval effectiveness.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper presents a multilabel classification method that employs an error correction code together with a base ensemble learner to deal with multilabel data. It explores two different error correction codes: a convolutional code and a BCH code. A random forest learner is used as its base learner. The performance of the proposed method is evaluated experimentally. The popular multilabel yeast dataset is used for benchmarking. The results are compared against those of several existing approaches. The proposed method performs well against its counterparts.