929 results for ensemble classifiers


Relevance:

100.00%

Abstract:

This article is devoted to large multi-tier ensemble classifiers generated as ensembles of ensembles and applied to phishing websites. Our new ensemble construction is a special case of the general and productive multi-tier approach well known in information security. Many efficient multi-tier classifiers have been considered in the literature. Our new contribution is in generating new large systems as ensembles of ensembles by linking a top-tier ensemble to another middle-tier ensemble instead of a base classifier so that the top-tier ensemble can generate the whole system. This automatic generation capability includes many large ensemble classifiers in two tiers simultaneously and automatically combines them into one hierarchical unified system so that one ensemble is an integral part of another one. This new construction makes it easy to set up and run such large systems. The present article concentrates on the investigation of performance of these new multi-tier ensembles for the example of detection of phishing websites. We carried out systematic experiments evaluating several essential ensemble techniques as well as more recent approaches and studying their performance as parts of multi-level ensembles with three tiers. The results presented here demonstrate that new three-tier ensemble classifiers performed better than the base classifiers and standard ensembles included in the system. This example of application to the classification of phishing websites shows that the new method of combining diverse ensemble techniques into a unified hierarchical three-tier ensemble can be applied to increase the performance of classifiers in situations where data can be processed on a large computer.
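The idea of linking a top-tier ensemble to a middle-tier ensemble instead of a base classifier can be illustrated with a minimal sketch. Everything named here (`Stump`, `Bagging`, the toy data) is a hypothetical stand-in chosen for illustration; the abstract does not say which ensemble methods or base classifiers occupy each tier, and bagging of threshold stumps is only an assumption.

```python
import random
from collections import Counter


class Stump:
    """Base tier: a one-feature threshold rule with minimal training error."""

    def fit(self, X, y):
        best = None
        for j in range(len(X[0])):
            for t in sorted({row[j] for row in X}):
                for sign in (1, -1):
                    pred = [int(sign * (row[j] - t) >= 0) for row in X]
                    err = sum(p != label for p, label in zip(pred, y))
                    if best is None or err < best[0]:
                        best = (err, j, t, sign)
        _, self.j, self.t, self.sign = best
        return self

    def predict(self, X):
        return [int(self.sign * (row[self.j] - self.t) >= 0) for row in X]


class Bagging:
    """Bootstrap aggregation. `make_base` is any zero-argument factory, so it
    may return a plain classifier or another ensemble (ensemble of ensembles)."""

    def __init__(self, make_base, n_members, rng):
        self.make_base, self.n_members, self.rng = make_base, n_members, rng

    def fit(self, X, y):
        n = len(X)
        self.members = []
        for _ in range(self.n_members):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample
            member = self.make_base()
            member.fit([X[i] for i in idx], [y[i] for i in idx])
            self.members.append(member)
        return self

    def predict(self, X):
        votes = [m.predict(X) for m in self.members]
        return [Counter(col).most_common(1)[0][0] for col in zip(*votes)]


rng = random.Random(0)
X = [[float(i)] for i in range(10)]   # toy data, one feature
y = [0] * 5 + [1] * 5

middle = lambda: Bagging(Stump, 5, rng)  # middle tier: bagging over stumps
top = Bagging(middle, 3, rng).fit(X, y)  # top tier: bagging over middle tiers
pred = top.predict(X)
```

Because the top tier only needs a factory for its members, fitting the top-tier object builds the whole three-tier system in one call, which mirrors the automatic generation described above.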

Relevance:

100.00%

Abstract:

This paper is devoted to an empirical investigation of novel multi-level ensemble meta classifiers for the detection and monitoring of progression of cardiac autonomic neuropathy (CAN) in diabetes patients. Our experiments relied on an extensive database and concentrated on ensembles of ensembles, or multi-level meta classifiers, for the classification of cardiac autonomic neuropathy progression. First, we carried out a thorough investigation comparing the performance of various base classifiers for several known sets of the most essential features in this database and determined that Random Forest significantly and consistently outperforms all other base classifiers in this new application. Second, we used feature selection and ranking implemented in Random Forest. It was able to identify a new set of features, which turned out to be better than all other sets previously considered for this large and well-known database. Random Forest remained the best classifier for the new set of features too. Third, we investigated meta classifiers and new multi-level meta classifiers based on Random Forest, which have improved its performance. The results obtained show that novel multi-level meta classifiers achieved further improvement and obtained new outcomes that are significantly better than the outcomes previously published in the literature for cardiac autonomic neuropathy.

Relevance:

100.00%

Abstract:

This paper introduces and investigates large iterative multitier ensemble (LIME) classifiers specifically tailored for big data. These classifiers are very large, but are quite easy to generate and use. They can be so large that it makes sense to use them only for big data. They are generated automatically as a result of several iterations in applying ensemble meta classifiers. They incorporate diverse ensemble meta classifiers into several tiers simultaneously and combine them into one automatically generated iterative system so that many ensemble meta classifiers function as integral parts of other ensemble meta classifiers at higher tiers. In this paper, we carry out a comprehensive investigation of the performance of LIME classifiers for a problem concerning security of big data. Our experiments compare LIME classifiers with various base classifiers and standard ordinary ensemble meta classifiers. The results obtained demonstrate that LIME classifiers can significantly increase the accuracy of classifications. LIME classifiers performed better than the base classifiers and standard ensemble meta classifiers.

Relevance:

80.00%

Abstract:

Different data classification algorithms have been developed and applied in various areas to analyze and extract valuable information and patterns from large datasets with noise and missing values. However, none of them performs consistently well over all datasets. To this end, ensemble methods have been suggested as a promising remedy. This paper proposes a novel hybrid algorithm, which is the combination of a multi-objective Genetic Algorithm (GA) and an ensemble classifier. The ensemble classifier, which consists of a decision tree classifier, an Artificial Neural Network (ANN) classifier, and a Support Vector Machine (SVM) classifier, is used as the classification committee, while the multi-objective Genetic Algorithm is employed as the feature selector to help the ensemble classifier improve the overall sample classification accuracy while also identifying the most important features in the dataset of interest. The proposed GA-Ensemble method is tested on three benchmark datasets, and compared with each individual classifier as well as with methods based on mutual information theory, bagging and boosting. The results suggest that this GA-Ensemble method outperforms the other algorithms in the comparison, and can be a useful method for classification and feature selection problems.

Relevance:

80.00%

Abstract:

This paper is devoted to multi-tier ensemble classifiers for the detection and filtering of phishing emails. We introduce a new construction of ensemble classifiers, based on the well-known and productive multi-tier approach. Our experiments evaluate their performance for the detection and filtering of phishing emails. Multi-tier constructions are well known and have previously been used to design effective classifiers for email classification and other applications. We investigate new multi-tier ensemble classifiers, where diverse ensemble methods are combined in a unified system by incorporating different ensembles at a lower tier as an integral part of another ensemble at the top tier. Our novel contribution is to investigate the possibility and effectiveness of combining diverse ensemble methods into one large multi-tier ensemble for the example of detection and filtering of phishing emails. Our study handled a few essential ensemble methods and more recent approaches incorporated into a combined multi-tier ensemble classifier. The results show that new large multi-tier ensemble classifiers achieved better performance compared with the outcomes of the base classifiers and ensemble classifiers incorporated in the multi-tier system. This demonstrates that the new method of combining diverse ensembles into one unified multi-tier ensemble can be applied to increase the performance of classifiers if diverse ensembles are incorporated in the system.

Relevance:

70.00%

Abstract:

This article is devoted to a new iterative construction of hierarchical classifiers in SimpleCLI for the detection of phishing websites. Our new construction of hierarchical systems creates ensembles of ensembles in SimpleCLI by iteratively linking a top-level ensemble to another middle-level ensemble instead of a base classifier so that the top-level ensemble can generate a large multilevel system. This new construction makes it easy to set up and run such large systems in SimpleCLI. The present article concentrates on the investigation of performance of the iterative construction of such classifiers for the example of detection of phishing websites. We carried out systematic experiments evaluating several essential ensemble techniques as well as more recent approaches and studying their performance as parts of the iterative construction of hierarchical classifiers. The results presented here demonstrate that the iterative construction of hierarchical classifiers performed better than the base classifiers and standard ensembles. This example of application to the classification of phishing websites shows that the new iterative construction combining diverse ensemble techniques into the iterative construction of hierarchical classifiers can be applied to increase the performance in situations where data can be processed on a large computer. © 2014 ACADEMY PUBLISHER.

Relevance:

60.00%

Abstract:

Ensemble learning techniques generate multiple classifiers, so-called base classifiers, whose combined classification results are used in order to increase the overall classification accuracy. In most ensemble classifiers the base classifiers are based on the Top Down Induction of Decision Trees (TDIDT) approach. However, an alternative approach for the induction of rule-based classifiers is the Prism family of algorithms. Prism algorithms produce modular classification rules that do not necessarily fit into a decision tree structure. Prism classification rulesets achieve a comparable and sometimes higher classification accuracy compared with decision tree classifiers if the data is noisy and large. Yet Prism still suffers from overfitting on noisy and large datasets. In practice, ensemble techniques tend to reduce overfitting; however, no ensemble learner exists for modular classification rule inducers such as the Prism family of algorithms. This article describes the first development of an ensemble learner based on the Prism family of algorithms in order to enhance Prism’s classification accuracy by reducing overfitting.

Relevance:

60.00%

Abstract:

The reduction of size of ensemble classifiers is important for various security applications. The majority of known pruning algorithms belong to the following three categories: ranking based, clustering based, and optimization based methods. The present paper introduces and investigates a new pruning technique. It is called a Three-Level Pruning Technique, TLPT, because it simultaneously combines all three approaches in three levels of the process. This paper investigates the TLPT method combining the state-of-the-art ranking of the Ensemble Pruning via Individual Contribution ordering, EPIC, the clustering of the K-Means Pruning, KMP, and the optimisation method of Directed Hill Climbing Ensemble Pruning, DHCEP, for a phishing dataset. Our new experiments presented in this paper show that the TLPT is competitive in comparison to EPIC, KMP and DHCEP, and can achieve better outcomes. These experimental results demonstrate the effectiveness of the TLPT technique in this example of information security application.
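As a rough illustration of the optimization-based category of pruning only, the sketch below greedily removes ensemble members as long as the validation accuracy of the majority vote improves. It is a simplified backward-elimination hill climb written for this example, not the DHCEP, EPIC, KMP, or TLPT procedures themselves, whose details the abstract does not give. The function names and the toy prediction table are hypothetical.

```python
from collections import Counter


def majority_vote(pred_rows):
    """Combine per-member prediction lists into one list by plurality vote."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*pred_rows)]


def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def backward_prune(member_preds, truth):
    """Hill climbing by backward elimination: drop the member whose removal
    raises validation accuracy the most; stop when no removal helps."""
    kept = list(range(len(member_preds)))
    best = accuracy(majority_vote([member_preds[i] for i in kept]), truth)
    while len(kept) > 1:
        candidates = []
        for m in kept:
            rest = [member_preds[i] for i in kept if i != m]
            candidates.append((accuracy(majority_vote(rest), truth), m))
        acc, m = max(candidates)
        if acc <= best:
            break
        kept.remove(m)
        best = acc
    return kept, best


# Toy validation set: each row holds one member's predictions.
truth = [1, 1, 0, 0, 1, 0]
member_preds = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 1],   # a member that is wrong on every sample
]
kept, best = backward_prune(member_preds, truth)
print(kept, best)  # → [0, 1, 2] 1.0
```

In this toy case the always-wrong member is the only one removed, since dropping any of the remaining three would lower the voted accuracy.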

Relevance:

60.00%

Abstract:

Blood biochemistry attributes form an important class of tests, routinely collected several times per year for many patients with diabetes. The objective of this study is to investigate the role of blood biochemistry for improving the predictive accuracy of the diagnosis of cardiac autonomic neuropathy (CAN) progression. Blood biochemistry contributes to CAN, and so it is a causative factor that can provide additional power for the diagnosis of CAN especially in the absence of a complete set of Ewing tests. We introduce automated iterative multitier ensembles (AIME) and investigate their performance in comparison to base classifiers and standard ensemble classifiers for blood biochemistry attributes. AIME incorporate diverse ensembles into several tiers simultaneously and combine them into one automatically generated integrated system so that one ensemble acts as an integral part of another ensemble. We carried out extensive experimental analysis using large datasets from the diabetes screening research initiative (DiScRi) project. The results of our experiments show that several blood biochemistry attributes can be used to supplement the Ewing battery for the detection of CAN in situations where one or more of the Ewing tests cannot be completed because of the individual difficulties faced by each patient in performing the tests. The results show that AIME provide higher accuracy as a multitier CAN classification paradigm. The best predictive accuracy of 99.57% has been obtained by the AIME combining decorate on top tier with bagging on middle tier based on random forest. Practitioners can use these findings to increase the accuracy of CAN diagnosis.

Relevance:

60.00%

Abstract:

Maps obtained from digitally processed orbital remote sensing images have become fundamental to optimizing the conservation and monitoring of coral reefs. However, the accuracy reached in mapping submerged areas is limited by variation of the water column, which degrades the signal received by the orbital sensor and introduces errors in the final classification result. The limited capacity of traditional methods based on conventional statistical techniques to solve inter-class problems led to the search for alternative strategies in the area of Computational Intelligence. In this work, an ensemble classifier was built based on the combination of Support Vector Machines and a Minimum Distance Classifier with the objective of classifying remotely sensed images of coral reef ecosystems. The system is composed of three stages, through which the classification is progressively refined. Patterns that received an ambiguous classification at a given stage were re-evaluated in the subsequent stage. Unambiguous prediction for all the data was achieved through the reduction or elimination of false positives. The images were classified into five bottom types: deep water, underwater corals, intertidal corals, algal bottom, and sandy bottom. The highest overall accuracy (89%) was obtained from the SVM with a polynomial kernel. The accuracy of the classified image was compared, using an error matrix, with the results obtained by other classification methods based on a single classifier (a neural network and the k-means algorithm). Finally, the comparison of results demonstrated the potential of ensemble classifiers as a tool for classifying images of submerged areas subject to the noise caused by atmospheric effects and the water column.
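The staged refinement described above, in which ambiguous patterns are passed on to the next stage, might be sketched as follows. This is an assumption-laden illustration: the abstract does not define how ambiguity is measured or how the SVM and the Minimum Distance Classifier are arranged across the stages, so the sketch uses only minimum-distance stages that abstain when the two nearest class centroids are almost equally close. The centroids, margins, and function names are all hypothetical.

```python
import math


def min_distance_stage(centroids, margin):
    """Minimum-distance classifier that abstains (returns None) when the two
    nearest class centroids differ in distance by less than `margin`."""
    def stage(x):
        dists = sorted((math.dist(x, c), name) for name, c in centroids.items())
        if len(dists) > 1 and dists[1][0] - dists[0][0] < margin:
            return None  # ambiguous: defer to the next stage
        return dists[0][1]
    return stage


def cascade(stages, x):
    """Return (label, stage_index) from the first stage that does not abstain."""
    for i, stage in enumerate(stages):
        label = stage(x)
        if label is not None:
            return label, i
    raise ValueError("the last stage must always return a label")


# Hypothetical two-class, two-band example.
centroids = {"deep water": (0.0, 0.0), "sandy bottom": (10.0, 0.0)}
stages = [
    min_distance_stage(centroids, margin=6.0),   # strict first stage
    min_distance_stage(centroids, margin=1.5),   # more permissive second stage
    min_distance_stage(centroids, margin=0.0),   # final stage never abstains
]

clear = cascade(stages, (1.0, 0.0))
borderline = cascade(stages, (4.0, 0.0))
hard = cascade(stages, (4.5, 0.0))
```

Here the clear pixel is labelled at the first stage, while the two pixels closer to the class boundary are deferred and only labelled at the second and third stages respectively.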

Relevance:

60.00%

Abstract:

Feature selection is important in the medical field for many reasons. However, selecting important variables is a difficult task in the presence of censoring, which is a unique feature of survival data analysis. This paper proposes an approach to deal with the censoring problem in endovascular aortic repair survival data through Bayesian networks. It was merged and embedded with a hybrid feature selection process that combines Cox's univariate analysis with machine learning approaches such as ensemble artificial neural networks to select the most relevant predictive variables. The proposed algorithm was compared with common survival variable selection approaches such as the least absolute shrinkage and selection operator (LASSO) and the Akaike information criterion (AIC) methods. The results showed that it was capable of dealing with high censoring in the datasets. Moreover, ensemble classifiers increased the area under the ROC curves of the two datasets, which were collected separately from two centers located in the United Kingdom. Furthermore, ensembles constructed with center 1 enhanced the concordance index of center 2 prediction compared to the model built with a single network. Although the final reduced model using the neural networks and their ensembles is larger than those of the other methods, the model outperformed the others in both concordance index and sensitivity for center 2 prediction. This indicates that the reduced model is more powerful for cross-center prediction.

Relevance:

60.00%

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-08

Relevance:

60.00%

Abstract:

Dissertation (master's), Universidade de Brasília, Faculdade de Economia, Administração e Contabilidade, Programa de Pós-Graduação em Administração, 2016.