997 resultados para rule induction


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Advances in hardware and software technologies allow to capture streaming data. The area of Data Stream Mining (DSM) is concerned with the analysis of these vast amounts of data as it is generated in real-time. Data stream classification is one of the most important DSM techniques allowing to classify previously unseen data instances. Different to traditional classifiers for static data, data stream classifiers need to adapt to concept changes (concept drift) in the stream in real-time in order to reflect the most recent concept in the data as accurately as possible. A recent addition to the data stream classifier toolbox is eRules which induces and updates a set of expressive rules that can easily be interpreted by humans. However, like most rule-based data stream classifiers, eRules exhibits a poor computational performance when confronted with continuous attributes. In this work, we propose an approach to deal with continuous data effectively and accurately in rule-based classifiers by using the Gaussian distribution as heuristic for building rule terms on continuous attributes. We show on the example of eRules that incorporating our method for continuous attributes indeed speeds up the real-time rule induction process while maintaining a similar level of accuracy compared with the original eRules classifier. We termed this new version of eRules with our approach G-eRules.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

En este proyecto se analiza y compara el comportamiento del algoritmo CTC diseñado por el grupo de investigación ALDAPA usando bases de datos muy desbalanceadas. En concreto se emplea un conjunto de bases de datos disponibles en el sitio web asociado al proyecto KEEL (http://sci2s.ugr.es/keel/index.php) y que han sido ya utilizadas con diferentes algoritmos diseñados para afrontar el problema de clases desbalanceadas (Class imbalance problem) en el siguiente trabajo: A. Fernandez, S. García, J. Luengo, E. Bernadó-Mansilla, F. Herrera, "Genetics-Based Machine Learning for Rule Induction: State of the Art, Taxonomy and Comparative Study". IEEE Transactions on Evolutionary Computation 14:6 (2010) 913-941, http://dx.doi.org/10.1109/TEVC.2009.2039140 Las bases de datos (incluidas las muestras del cross-validation), junto con los resultados obtenidos asociados a la experimentación de este trabajo se pueden encontrar en un sitio web creado a tal efecto: http://sci2s.ugr.es/gbml/. Esto hace que los resultados del CTC obtenidos con estas muestras sean directamente comparables con los obtenidos por todos los algoritmos obtenidos en este trabajo.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

No presente trabalho foram utilizados modelos de classificação para minerar dados relacionados à aprendizagem de Matemática e ao perfil de professores do ensino fundamental. Mais especificamente, foram abordados os fatores referentes aos educadores do Estado do Rio de Janeiro que influenciam positivamente e negativamente no desempenho dos alunos do 9 ano do ensino básico nas provas de Matemática. Os dados utilizados para extrair estas informações são disponibilizados pelo Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira que avalia o sistema educacional brasileiro em diversos níveis e modalidades de ensino, incluindo a Educação Básica, cuja avaliação, que foi foco deste estudo, é realizada pela Prova Brasil. A partir desta base, foi aplicado o processo de Descoberta de Conhecimento em Bancos de Dados (KDD - Knowledge Discovery in Databases), composto das etapas de preparação, mineração e pós-processamento dos dados. Os padrões foram extraídos dos modelos de classificação gerados pelas técnicas árvore de decisão, indução de regras e classificadores Bayesianos, cujos algoritmos estão implementados no software Weka (Waikato Environment for Knowledge Analysis). Além disso, foram aplicados métodos de grupos e uma metodologia para tornar as classes uniformemente distribuídas, afim de melhorar a precisão dos modelos obtidos. Os resultados apresentaram importantes fatores que contribuem para o ensino-aprendizagem de Matemática, assim como evidenciaram aspectos que comprometem negativamente o desempenho dos discentes. Por fim, os resultados extraídos fornecem ao educador e elaborador de políticas públicas fatores para uma análise que os auxiliem em posteriores tomadas de decisão.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Q. Shen and R. Jensen, 'Selecting Informative Features with Fuzzy-Rough Sets and its Application for Complex Systems Monitoring,' Pattern Recognition, vol. 37, no. 7, pp. 1351-1363, 2004.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

M. Galea and Q. Shen. Fuzzy rules from ant-inspired computation. Proceedings of the 13th International Conference on Fuzzy Systems, pages 1691-1696, 2004.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

M. Galea and Q. Shen. FRANTIC - A system for inducing accurate and comprehensible fuzzy rules. Proceedings of the 2004 UK Workshop on Computational Intelligence, pages 136-143.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

M. Galea and Q. Shen. Linguistic hedges for ant-generated rules. Proceedings of the 15th International Conference on Fuzzy Systems, pages 9105-9112, 2006.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

M. Galea and Q. Shen. Simultaneous ant colony optimisation algorithms for learning linguistic fuzzy rules. A. Abraham, C. Grosan and V. Ramos (Eds.), Swarm Intelligence in Data Mining, pages 75-99.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This thesis proposes a framework for identifying the root-cause of a voltage disturbance, as well as, its source location (upstream/downstream) from the monitoring place. The framework works with three-phase voltage and current waveforms collected in radial distribution networks without distributed generation. Real-world and synthetic waveforms are used to test it. The framework involves features that are conceived based on electrical principles, and assuming some hypothesis on the analyzed phenomena. Features considered are based on waveforms and timestamp information. Multivariate analysis of variance and rule induction algorithms are applied to assess the amount of meaningful information explained by each feature, according to the root-cause of the disturbance and its source location. The obtained classification rates show that the proposed framework could be used for automatic diagnosis of voltage disturbances collected in radial distribution networks. Furthermore, the diagnostic results can be subsequently used for supporting power network operation, maintenance and planning.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Inducing rules from very large datasets is one of the most challenging areas in data mining. Several approaches exist to scaling up classification rule induction to large datasets, namely data reduction and the parallelisation of classification rule induction algorithms. In the area of parallelisation of classification rule induction algorithms most of the work has been concentrated on the Top Down Induction of Decision Trees (TDIDT), also known as the ‘divide and conquer’ approach. However powerful alternative algorithms exist that induce modular rules. Most of these alternative algorithms follow the ‘separate and conquer’ approach of inducing rules, but very little work has been done to make the ‘separate and conquer’ approach scale better on large training data. This paper examines the potential of the recently developed blackboard based J-PMCRI methodology for parallelising modular classification rule induction algorithms that follow the ‘separate and conquer’ approach. A concrete implementation of the methodology is evaluated empirically on very large datasets.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Generally classifiers tend to overfit if there is noise in the training data or there are missing values. Ensemble learning methods are often used to improve a classifier's classification accuracy. Most ensemble learning approaches aim to improve the classification accuracy of decision trees. However, alternative classifiers to decision trees exist. The recently developed Random Prism ensemble learner for classification aims to improve an alternative classification rule induction approach, the Prism family of algorithms, which addresses some of the limitations of decision trees. However, Random Prism suffers like any ensemble learner from a high computational overhead due to replication of the data and the induction of multiple base classifiers. Hence even modest sized datasets may impose a computational challenge to ensemble learners such as Random Prism. Parallelism is often used to scale up algorithms to deal with large datasets. This paper investigates parallelisation for Random Prism, implements a prototype and evaluates it empirically using a Hadoop computing cluster.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Prism is a modular classification rule generation method based on the ‘separate and conquer’ approach that is alternative to the rule induction approach using decision trees also known as ‘divide and conquer’. Prism often achieves a similar level of classification accuracy compared with decision trees, but tends to produce a more compact noise tolerant set of classification rules. As with other classification rule generation methods, a principle problem arising with Prism is that of overfitting due to over-specialised rules. In addition, over-specialised rules increase the associated computational complexity. These problems can be solved by pruning methods. For the Prism method, two pruning algorithms have been introduced recently for reducing overfitting of classification rules - J-pruning and Jmax-pruning. Both algorithms are based on the J-measure, an information theoretic means for quantifying the theoretical information content of a rule. Jmax-pruning attempts to exploit the J-measure to its full potential because J-pruning does not actually achieve this and may even lead to underfitting. A series of experiments have proved that Jmax-pruning may outperform J-pruning in reducing overfitting. However, Jmax-pruning is computationally relatively expensive and may also lead to underfitting. This paper reviews the Prism method and the two existing pruning algorithms above. It also proposes a novel pruning algorithm called Jmid-pruning. The latter is based on the J-measure and it reduces overfitting to a similar level as the other two algorithms but is better in avoiding underfitting and unnecessary computational effort. The authors conduct an experimental study on the performance of the Jmid-pruning algorithm in terms of classification accuracy and computational efficiency. The algorithm is also evaluated comparatively with the J-pruning and Jmax-pruning algorithms.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

There is an increasing interest in the application of Evolutionary Algorithms (EAs) to induce classification rules. This hybrid approach can benefit areas where classical methods for rule induction have not been very successful. One example is the induction of classification rules in imbalanced domains. Imbalanced data occur when one or more classes heavily outnumber other classes. Frequently, classical machine learning (ML) classifiers are not able to learn in the presence of imbalanced data sets, inducing classification models that always predict the most numerous classes. In this work, we propose a novel hybrid approach to deal with this problem. We create several balanced data sets with all minority class cases and a random sample of majority class cases. These balanced data sets are fed to classical ML systems that produce rule sets. The rule sets are combined creating a pool of rules and an EA is used to build a classifier from this pool of rules. This hybrid approach has some advantages over undersampling, since it reduces the amount of discarded information, and some advantages over oversampling, since it avoids overfitting. The proposed approach was experimentally analysed and the experimental results show an improvement in the classification performance measured as the area under the receiver operating characteristics (ROC) curve.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper describes an investigation of the hybrid PSO/ACO algorithm to classify automatically the well drilling operation stages. The method feasibility is demonstrated by its application to real mud-logging dataset. The results are compared with bio-inspired methods, and rule induction and decision tree algorithms for data mining. © 2009 Springer Berlin Heidelberg.