37 results for Ensemble doublement résolvant

in Deakin Research Online - Australia


Relevance: 20.00%

Abstract:

Determining the causal structure of a domain is frequently a key task in data mining and knowledge discovery. This paper introduces ensemble learning into linear causal model discovery and examines several algorithms based on different ensemble strategies, including Bagging, AdaBoost and GASEN. Experimental results show that (1) an ensemble discovery algorithm can achieve higher accuracy than an individual causal discovery algorithm; (2) among the ensemble discovery algorithms examined, the BWV algorithm, which uses a simple Bagging strategy, performs excellently compared with more sophisticated ensemble strategies; and (3) the ensemble method can also improve the stability of parameter estimation. In addition, ensemble discovery is amenable to parallel and distributed processing, which is important for data mining in large data sets.
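
The Bagging strategy described above can be sketched as follows: resample the data with replacement, run a base discovery algorithm on each resample, and keep the edges that most runs agree on. This is a minimal sketch, not the paper's BWV algorithm; the base learner `discover`, the strict-majority threshold and the default of 25 models are assumptions for illustration.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)
Row = Sequence[float]


def bagged_edge_vote(
    data: List[Row],
    discover: Callable[[List[Row]], Set[Edge]],
    n_models: int = 25,
    seed: int = 0,
) -> Set[Edge]:
    """Bagging with majority voting over causal edges.

    `discover` is a hypothetical stand-in for any base linear causal
    discovery routine that maps a data set to a set of directed edges.
    """
    rng = random.Random(seed)
    votes: Counter = Counter()
    for _ in range(n_models):
        # Bootstrap resample: len(data) rows drawn with replacement.
        sample = [rng.choice(data) for _ in range(len(data))]
        # Each run casts one vote for every edge it proposes.
        votes.update(discover(sample))
    # Keep edges proposed by a strict majority of the runs.
    return {edge for edge, count in votes.items() if 2 * count > n_models}
```

Usage: `bagged_edge_vote(rows, my_discover)`, where `my_discover` (a hypothetical name) wraps whatever single-model discovery algorithm is available. Because each resample is processed independently, the loop parallelizes trivially, which matches the abstract's point about distributed processing.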

Relevance: 20.00%

Abstract:

Accurate prediction of the roll separating force is critical to assuring the quality of the final product in steel manufacturing. This paper presents an ensemble model that addresses this problem. A stacked generalization approach to ensemble modeling is used with two sets of ensemble members: the first set is learnt from the current input-output data of the hot rolling finishing mill, while the second uses the available information on the previous coil in addition to the current information. Both sets of ensemble members include linear regression, multilayer perceptron, and k-nearest neighbor algorithms. A competitive selection model (a multilayer perceptron) then selects the output of one of the ensemble members as the final output of the ensemble model. The ensemble model created by this stacked generalization achieves extremely high accuracy in predicting the roll separating force, with an average relative accuracy within 1% of the actual measured roll force.
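
Reading the reported 1% figure as a mean relative error (an assumption; the abstract says "average relative accuracy"), the check amounts to the arithmetic below. The function names and the `tol` parameter are illustrative, not taken from the paper.

```python
from typing import Sequence


def mean_relative_error(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Average of |predicted - measured| / |measured| over all coils.

    Assumes every measured force is non-zero (roll forces are positive).
    """
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - m) / abs(m) for p, m in zip(predicted, measured)) / len(measured)


def within_tolerance(predicted: Sequence[float], measured: Sequence[float], tol: float = 0.01) -> bool:
    """True when the mean relative error is at most `tol` (0.01 = 1%)."""
    return mean_relative_error(predicted, measured) <= tol
```

For example, predictions of 1005 and 1990 against measured forces of 1000 and 2000 give relative errors of 0.5% each, so the mean is 0.5% and the 1% criterion is met.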

Relevance: 20.00%

Abstract:

The use of ensemble models in many problem domains has increased significantly in the last few years. Ensemble modeling, in particular boosting, has shown great promise in improving the predictive performance of a model. Ensemble members are normally combined in a co-operative fashion, where each member performs the same task and their predictions are aggregated to obtain improved performance. However, it is also possible to combine the ensemble members in a competitive fashion, where the prediction of the most relevant ensemble member is selected for a particular input. This option has previously been somewhat overlooked. The aim of this article is to investigate and compare the competitive and co-operative approaches to combining the models in an ensemble. A competitive ensemble model is compared with MARS with bagging, mixture of experts, hierarchical mixture of experts and a neural network ensemble over several public-domain regression problems that have a high degree of nonlinearity and noise. The empirical results show a substantial advantage of competitive learning over co-operative learning for all the regression problems investigated. The requirements for creating efficient ensembles and the available guidelines are also discussed.
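
A toy contrast between the two combination styles may help. Below, the co-operative rule averages all members, while the competitive rule picks one member per input. The selection rule (lowest error at the nearest validation point), the piecewise target and the two members are hypothetical choices for illustration; they are not the selector used in the paper.

```python
from typing import Callable, Sequence

Model = Callable[[float], float]


def cooperative(models: Sequence[Model], x: float) -> float:
    """Co-operative combination: every member predicts and the outputs are averaged."""
    return sum(m(x) for m in models) / len(models)


def competitive(
    models: Sequence[Model],
    val_x: Sequence[float],
    val_y: Sequence[float],
    x: float,
) -> float:
    """Competitive combination: return the prediction of the single member that
    had the lowest squared error on the validation point nearest to x."""
    nearest = min(range(len(val_x)), key=lambda i: abs(val_x[i] - x))
    best = min(models, key=lambda m: (m(val_x[nearest]) - val_y[nearest]) ** 2)
    return best(x)


# Hypothetical toy problem: the target is linear for x < 0 and quadratic for
# x >= 0, and each member is exact on one half of the input space.
def target(x: float) -> float:
    return x if x < 0 else x * x


members = [lambda x: x, lambda x: x * x]
val_x = [-2.0, -1.0, 2.0, 3.0]
val_y = [target(v) for v in val_x]
```

At x = -1.2 the competitive rule returns the linear member's exact answer, whereas averaging mixes in the quadratic member and lands far from the target. Averaging only helps when members make uncorrelated errors on the same inputs; when they specialize in different regions, selection wins.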

Relevance: 20.00%

Abstract:

Parameter estimation is one of the key issues in discovering graphical models from data. Current state-of-the-art methods have demonstrated their abilities on different kinds of graphical models. In this paper, we introduce ensemble learning into the process of parameter estimation and examine ensemble parameter estimation methods for different kinds of graphical models with both complete and incomplete data sets. Experimental results show that the ensemble method can achieve higher accuracy than the base parameter estimation method. In addition, the method is amenable to parallel or distributed processing, which is an important characteristic for data mining in large data sets.
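
As a minimal sketch of ensemble parameter estimation for the complete-data case, assume a two-node network A → B with binary variables: fit the maximum-likelihood conditional probability table on bootstrap resamples and average the estimates. The network, the function names and the number of resamples are assumptions; the incomplete-data case (which would typically need EM or a similar method) is not covered.

```python
import random
from typing import Dict, List, Tuple

Record = Tuple[int, int]  # (parent A, child B), both binary


def mle_cpt(data: List[Record]) -> Dict[int, float]:
    """Maximum-likelihood estimate of P(B=1 | A=a) for each parent value seen in data."""
    cpt: Dict[int, float] = {}
    for a in (0, 1):
        children = [b for (pa, b) in data if pa == a]
        if children:  # skip parent values absent from this sample
            cpt[a] = sum(children) / len(children)
    return cpt


def bagged_cpt(data: List[Record], n_models: int = 50, seed: int = 0) -> Dict[int, float]:
    """Average the MLE over bootstrap resamples (a Bagging-style ensemble estimate)."""
    rng = random.Random(seed)
    estimates: Dict[int, List[float]] = {0: [], 1: []}
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]  # bootstrap resample
        for a, p in mle_cpt(sample).items():
            estimates[a].append(p)
    return {a: sum(ps) / len(ps) for a, ps in estimates.items() if ps}
```

Each resample is estimated independently, so the loop can be distributed across workers, in line with the abstract's remark on parallel processing.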

Relevance: 20.00%

Abstract:

Ensemble learning, which combines the decisions of multiple weak classifiers to form an output, has recently emerged as an effective identification method. This paper presents a road-sign identification system based on the ensemble learning approach. The system assigns each region of interest extracted from the scene to the road-sign group it belongs to. A large road-sign image dataset is formed and used to train and test the system. Fifteen groups of road signs are chosen for identification. Five experiments are performed, and the results are presented and discussed.
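
The abstract does not specify how the weak classifiers' decisions are combined. A common rule for a multi-group problem such as this one is a plurality vote, optionally weighted, sketched below. The function name, the optional weights (as a boosting-style ensemble would supply) and the tie-breaking policy are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional, Sequence


def weighted_vote(
    predictions: Sequence[Hashable],
    weights: Optional[Sequence[float]] = None,
) -> Hashable:
    """Return the group with the largest total vote weight.

    With no weights every member counts once (plain plurality vote).
    Ties go to the tied group that appears first in member order.
    """
    if not predictions:
        raise ValueError("need at least one member prediction")
    if weights is None:
        weights = [1.0] * len(predictions)
    scores: Dict[Hashable, float] = defaultdict(float)
    for group, w in zip(predictions, weights):
        scores[group] += w
    best = max(scores.values())
    for group in predictions:
        if scores[group] == best:
            return group
```

With three members voting `["stop", "yield", "stop"]`, the unweighted vote returns `"stop"`; giving the second member a much larger weight can overturn that, which is how a boosted ensemble lets its more reliable members dominate.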

Relevance: 20.00%

Abstract:

In this paper, the impact of training-set size on the benefit from ensembles, i.e. the gain obtained by employing ensemble learning paradigms, is studied empirically. Experiments on Bagged/Boosted J4.8 decision trees with and without pruning show that enlarging the training set tends to increase the benefit from Boosting but does not significantly affect the benefit from Bagging. This phenomenon is then explained from the perspective of bias-variance reduction. Moreover, it is shown that even for Boosting, the benefit does not always increase consistently with training-set size, since single learners may sometimes learn relatively more than ensembles from additional, randomly provided training data. Furthermore, the benefit from an ensemble of unpruned decision trees is usually larger than that from an ensemble of pruned decision trees. This phenomenon is then explained from the perspective of the error-ambiguity balance.
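
The error-ambiguity balance is usually stated, for a weighted-average regression ensemble under squared loss, as: ensemble error = weighted mean member error minus weighted ambiguity, where ambiguity is the spread of the members around the ensemble output. The sketch below computes the three terms for one example so the identity can be checked numerically. The paper's exact formulation for decision-tree classifiers is not given in the abstract, so this shows only the classic regression form; the function name is illustrative.

```python
from typing import Sequence, Tuple


def error_ambiguity(
    preds: Sequence[float], weights: Sequence[float], y: float
) -> Tuple[float, float, float]:
    """Return (ensemble error, weighted mean member error, weighted ambiguity)
    for squared loss on a single example; weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    f_bar = sum(w * f for w, f in zip(weights, preds))              # ensemble output
    ens_err = (f_bar - y) ** 2
    mean_err = sum(w * (f - y) ** 2 for w, f in zip(weights, preds))
    ambiguity = sum(w * (f - f_bar) ** 2 for w, f in zip(weights, preds))
    return ens_err, mean_err, ambiguity
```

For two equally weighted members predicting 1.0 and 3.0 with target 2.5, the terms are 0.25, 1.25 and 1.0, and indeed 0.25 = 1.25 - 1.0. More diverse (higher-ambiguity) members, such as unpruned trees, can therefore lower ensemble error even when each member is individually worse.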

Relevance: 20.00%

Abstract:

Different data classification algorithms have been developed and applied in various areas to analyze and extract valuable information and patterns from large datasets with noise and missing values. However, none of them performs consistently well across all datasets. To this end, ensemble methods have been suggested as a promising remedy. This paper proposes a novel hybrid algorithm that combines a multi-objective Genetic Algorithm (GA) with an ensemble classifier. The ensemble classifier, which consists of a decision tree classifier, an Artificial Neural Network (ANN) classifier and a Support Vector Machine (SVM) classifier, serves as the classification committee, while the multi-objective GA is employed as the feature selector, helping the ensemble classifier improve overall sample classification accuracy while also identifying the most important features in the dataset of interest. The proposed GA-Ensemble method is tested on three benchmark datasets and compared with each individual classifier, as well as with methods based on mutual information theory, bagging and boosting. The results suggest that the GA-Ensemble method outperforms the other algorithms compared and is a useful method for classification and feature selection problems.
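
Multi-objective feature selection usually keeps the candidate subsets that are not dominated on both objectives. Assuming the two objectives are classification accuracy (maximize) and number of selected features (minimize), the Pareto filter at the core of such a GA can be sketched as below. The accuracies in the example are made-up values for illustration, not results from the paper, and the full GA loop (crossover, mutation, ensemble evaluation) is omitted.

```python
from typing import List, Tuple

# (feature mask, accuracy); mask[i] == 1 means feature i is selected
Candidate = Tuple[Tuple[int, ...], float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives (higher accuracy,
    fewer selected features) and strictly better on at least one."""
    acc_a, n_a = a[1], sum(a[0])
    acc_b, n_b = b[1], sum(b[0])
    return acc_a >= acc_b and n_a <= n_b and (acc_a > acc_b or n_a < n_b)


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates not dominated by any other candidate, in input order."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical candidates with illustrative accuracies.
A = ((1, 1, 1, 1), 0.90)
B = ((1, 1, 0, 0), 0.90)
C = ((1, 0, 0, 0), 0.80)
D = ((1, 1, 1, 0), 0.85)
```

Here `pareto_front([A, B, C, D])` keeps `B` (same accuracy as `A` with fewer features) and `C` (the smallest subset), and discards `A` and `D`, which `B` dominates.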

Relevance: 20.00%

Abstract:

This paper presents a method that detects lung nodules by classifying nodule and non-nodule patterns. It is based on random forests, which are ensemble learners that grow classification trees: each tree produces a classification decision, and an integrated output is calculated from these decisions. The performance of the developed method is compared with that of the support vector machine and decision tree methods. Three experiments are performed using lung scans of 32 patients, comprising thousands of images in which nodule locations are marked by expert radiologists. The classification errors and execution times are presented and discussed. The developed method produced the lowest classification error (2.4%).
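
The "grow many trees, let them vote" idea can be illustrated with a deliberately tiny random forest in pure Python. To stay short, this sketch assumes depth-1 trees (stumps) and picks one random feature per tree, which is a simplification of a real random forest; it is not the paper's implementation, and the toy data are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int, int]  # (feature, threshold, label if <= t, label if > t)


def majority(labels: Sequence[int]) -> int:
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X: List[Sequence[float]], y: List[int], feature: int) -> Stump:
    """Depth-1 tree on one feature: try midpoints between adjacent distinct
    values and keep the threshold with the fewest training errors."""
    default = majority(y)
    best_err = sum(label != default for label in y)
    best: Stump = (feature, float("inf"), default, default)  # constant fallback
    values = sorted({x[feature] for x in X})
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [lab for x, lab in zip(X, y) if x[feature] <= t]
        right = [lab for x, lab in zip(X, y) if x[feature] > t]
        l_lab, r_lab = majority(left), majority(right)
        err = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
        if err < best_err:
            best_err, best = err, (feature, t, l_lab, r_lab)
    return best


def fit_forest(X: List[Sequence[float]], y: List[int], n_trees: int = 25, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feature = rng.randrange(len(X[0]))                    # random feature for this tree
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict(forest: List[Stump], x: Sequence[float]) -> int:
    """Each tree votes; the integrated output is the majority class."""
    return majority([l if x[f] <= t else r for f, t, l, r in forest])
```

Bootstrap sampling and random feature choice make the trees differ from one another, and the majority vote smooths out the individual trees' mistakes (for instance, a tree whose bootstrap sample happened to contain only one class).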

Relevance: 20.00%

Abstract:

This thesis makes an outstanding contribution to automating the discovery of linear causal models. It introduces a highly efficient discovery algorithm that implements new encoding, ensemble and acceleration strategies. Theoretical research and experimental work show that the new discovery algorithm outperforms the previous system in both accuracy and efficiency.

Relevance: 20.00%

Abstract:

Recently, many researchers have used fusion of filters to enhance the performance of spam filtering, and in the past several years much effort has been devoted to different ensemble methods to achieve better performance. In practice, however, how to select appropriate ensemble methods for spam filtering remains an unsolved problem. In this paper, we investigate this problem by designing a framework to compare the performance of various ensemble methods, which can help researchers fight spam email more effectively in applied systems. The experimental results indicate that online methods achieve good accuracy, while off-line batch methods are evidently influenced by the size of the data set. When a large data set is involved, the performance of off-line batch methods is not on par with that of online methods, and within the online methods, the parallel ensemble performs better when only complex filters are used.
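
The abstract does not name the online ensemble methods it compares. As one classic example of an online ensemble, the Weighted Majority algorithm keeps one weight per filter, predicts by weighted vote, and shrinks the weight of every filter that errs once the true label is known. The filters, `beta = 0.5` and the tie-to-ham rule below are assumptions for illustration.

```python
from typing import Callable, List, Sequence

Filter = Callable[[str], int]  # returns 1 for spam, 0 for ham


class WeightedMajority:
    """Online ensemble: predict by weighted vote, then multiply the weight of
    every member that was wrong by beta (0 < beta < 1)."""

    def __init__(self, filters: Sequence[Filter], beta: float = 0.5) -> None:
        if not 0 < beta < 1:
            raise ValueError("beta must be in (0, 1)")
        self.filters: List[Filter] = list(filters)
        self.weights: List[float] = [1.0] * len(self.filters)
        self.beta = beta

    def predict(self, message: str) -> int:
        # Spam only if the spam-voting weight exceeds half the total; ties go to ham.
        spam = sum(w for f, w in zip(self.filters, self.weights) if f(message) == 1)
        return 1 if spam > sum(self.weights) / 2 else 0

    def update(self, message: str, label: int) -> None:
        # Called once the true label is known (e.g. from user feedback).
        for i, f in enumerate(self.filters):
            if f(message) != label:
                self.weights[i] *= self.beta
```

Because each message updates the weights immediately, the ensemble adapts as the stream grows without retraining on the whole data set, which is the property that distinguishes online from off-line batch combination.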

Relevance: 20.00%

Abstract:

This paper presents a dual-random ensemble multi-label classification method for classifying multi-label data. The method is formed by integrating and extending the concepts of the feature subspace method and the random k-label set ensemble multi-label classification method. Experimental results show that the developed method outperforms existing multi-label classification methods on three different multi-label datasets, including the biological yeast and genbase datasets.
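
In a random k-labelset ensemble, each member covers only a small subset of labels, so the final decision for a label is usually a vote among the members that cover it. The sketch below shows that aggregation step under the assumption of a 0.5 threshold. Training each member on a random feature subspace (the second source of randomness in the paper) is omitted, and the member predictions in the example are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# (labelset the member covers, labels it predicted as present)
Member = Tuple[Tuple[str, ...], Set[str]]


def rakel_vote(members: List[Member], labels: Iterable[str], threshold: float = 0.5) -> Set[str]:
    """For each label, average the 0/1 votes of the members whose labelset
    contains it, and predict the label when the average exceeds threshold."""
    result: Set[str] = set()
    for label in labels:
        votes = [label in predicted for labelset, predicted in members if label in labelset]
        if votes and sum(votes) / len(votes) > threshold:
            result.add(label)
    return result
```

With members covering `("a", "b")`, `("b", "c")` and `("a", "c")` that predict `{"a"}`, `{"b", "c"}` and `{"a", "c"}`, label `b` receives one vote out of two and is rejected, while `a` and `c` are accepted unanimously.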

Relevance: 20.00%

Abstract:

This paper presents a triple-random ensemble learning method for handling multi-label classification problems. The proposed method integrates and develops the concepts of the random subspace, bagging and random k-label set ensemble learning methods to form an approach to classifying multi-label data. It applies the random subspace method to the feature space, the label space and the instance space. The subset selection procedure is executed iteratively, and each multi-label classifier is trained on the randomly selected subsets. At the end of the iteration, optimal parameters are selected and the ensemble of multi-label classifiers (MLC) is constructed. The proposed method is implemented and its performance compared against that of popular multi-label classification methods. The experimental results reveal that the proposed method outperforms the examined counterparts in most cases when tested on six small to large multi-label datasets from different domains. This demonstrates that the developed method is generally applicable to various multi-label classification problems.
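
The triple-random sampling step can be sketched as drawing, for each ensemble member, a bootstrap sample of instances, a random feature subspace and a random k-labelset. The subset sizes (`feature_frac`, `k`), the `MemberSpec` structure and the seed handling are assumptions; the paper's iterative parameter selection is not reproduced.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class MemberSpec:
    instances: List[int]  # bootstrap indices, sampled with replacement (bagging)
    features: List[int]   # random feature subspace, sampled without replacement
    labels: List[int]     # random k-labelset, sampled without replacement


def sample_members(
    n_instances: int,
    n_features: int,
    n_labels: int,
    n_members: int,
    feature_frac: float = 0.5,
    k: int = 3,
    seed: int = 0,
) -> List[MemberSpec]:
    """Draw the three random subsets that define each ensemble member."""
    if not 1 <= k <= n_labels:
        raise ValueError("k must be between 1 and n_labels")
    rng = random.Random(seed)
    n_feat = max(1, int(feature_frac * n_features))
    specs = []
    for _ in range(n_members):
        specs.append(
            MemberSpec(
                instances=[rng.randrange(n_instances) for _ in range(n_instances)],
                features=sorted(rng.sample(range(n_features), n_feat)),
                labels=sorted(rng.sample(range(n_labels), k)),
            )
        )
    return specs
```

Each `MemberSpec` would then be used to train one multi-label classifier on the selected rows, columns and labels; the per-label votes of those classifiers are combined in the usual random k-labelset fashion.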

Relevance: 20.00%

Abstract:

Background: Feature selection techniques are critical to the analysis of high-dimensional datasets. This is especially true in gene selection from microarray data, which commonly have an extremely high feature-to-sample ratio. In addition to essential objectives such as reducing data noise, reducing data redundancy, improving sample classification accuracy and improving model generalization, feature selection also helps biologists focus on the selected genes to further validate their biological hypotheses.
Results: In this paper we describe an improved hybrid system for gene selection. It is based on a recently proposed genetic ensemble (GE) system. To enhance the generalization of the selected genes or gene subsets and to overcome the overfitting problem of the GE system, we devised a mapping strategy to fuse the goodness information that multiple filtering algorithms provide for each gene. This information is then used in the initialization and mutation operations of the genetic ensemble system.
Conclusion: We used four benchmark microarray datasets (including both binary-class and multi-class classification problems) for proof of concept and model evaluation. The experimental results indicate that the proposed multi-filter enhanced genetic ensemble (MF-GE) system is able to improve sample classification accuracy, generate more compact gene subsets, and converge to the selection results more quickly. The MF-GE system is very flexible, as various combinations of multiple filters and classifiers can be incorporated based on the data characteristics and user preferences.
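
The abstract does not describe the mapping strategy itself. One plausible way to fuse several filters' gene scores, shown here as an assumption, is to convert each filter's scores to normalized ranks, average them, and use the fused score to bias which genes a new chromosome starts with. `rank_fuse`, `init_chromosome` and the small weight offset are hypothetical; the mutation operator, which the paper also biases, is omitted.

```python
import random
from typing import List, Sequence


def rank_fuse(filter_scores: Sequence[Sequence[float]]) -> List[float]:
    """Fuse per-gene scores from several filters: convert each filter's scores
    to ranks (0 = worst), normalize to [0, 1], and average across filters.
    Ties keep gene-index order because sorted() is stable."""
    n = len(filter_scores[0])
    fused = [0.0] * n
    for scores in filter_scores:
        order = sorted(range(n), key=lambda g: scores[g])
        for rank, g in enumerate(order):
            fused[g] += rank / (n - 1) if n > 1 else 1.0
    return [f / len(filter_scores) for f in fused]


def init_chromosome(fused: Sequence[float], size: int, rng: random.Random) -> List[int]:
    """Pick `size` distinct genes, favouring those with high fused scores.
    A tiny offset keeps the worst-ranked genes selectable."""
    remaining = list(range(len(fused)))
    chosen = []
    for _ in range(size):
        g = rng.choices(remaining, weights=[fused[i] + 1e-6 for i in remaining])[0]
        remaining.remove(g)
        chosen.append(g)
    return sorted(chosen)
```

Rank-based fusion sidesteps the fact that different filters report scores on incomparable scales; genes that several filters agree on end up with the highest fused score and are seeded into the initial population most often.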

Relevance: 20.00%

Abstract:

Anti-spam technology has developed rapidly in recent years. With the emerging applications of machine learning in diverse fields, researchers as well as manufacturers around the world have tried a large number of related algorithms to prevent spam. In this paper, we design an effective anti-spam protection system, SpamCooling, based on active learning and parallel heterogeneous ensemble learning techniques. The system adopts a batch method to filter spam and can easily be incorporated into existing mail clients (MUAs). It can actively obtain user feedback to provide users with a personalized spam filtering experience. The parallel heterogeneous ensemble method helps the system achieve a high spam detection rate as well as a low ham misclassification rate.
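
Active learning in a spam filter typically means asking the user to label the messages the system is least sure about. Assuming each heterogeneous member outputs a spam probability and the parallel ensemble averages them, uncertainty sampling can be sketched as follows. The scorer interface, the averaging rule and `budget` are assumptions, not details from the SpamCooling paper.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[str], float]  # returns an estimated spam probability in [0, 1]


def ensemble_score(scorers: Sequence[Scorer], message: str) -> float:
    """Parallel combination: average the spam probabilities of all members."""
    return sum(s(message) for s in scorers) / len(scorers)


def select_for_feedback(scorers: Sequence[Scorer], messages: Sequence[str], budget: int) -> List[str]:
    """Uncertainty sampling: return the `budget` messages whose ensemble score
    is closest to 0.5, i.e. the ones the ensemble is least sure about."""
    return sorted(messages, key=lambda m: abs(ensemble_score(scorers, m) - 0.5))[:budget]
```

Only the selected messages are shown to the user; the labels collected this way would then be used to retrain the members, which is where the personalization comes from.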