1000 results for rule generation


Relevance:

100.00%

Publisher:

Summary:

K. Rasmani and Q. Shen. Data-driven fuzzy rule generation and its application for student academic performance evaluation. Applied Intelligence, 25(3):305-319, 2006.

Relevance:

70.00%

Publisher:

Summary:

Automatic generation of classification rules has become an increasingly popular technique in commercial applications such as Big Data analytics, rule-based expert systems and decision-making systems. However, a principal problem with most methods for generating classification rules is overfitting of the training data. With Big Data, this may result in a large number of complex rules, which may not only increase computational cost but also lower the accuracy in predicting unseen instances. This has led to the need for pruning methods that simplify rules. In addition, once generated, classification rules are used to make predictions. For efficiency, the first rule that fires should be found as quickly as possible when searching through a rule set, so a suitable structure is required to represent the rule set effectively. In this chapter, the authors introduce a unified framework for constructing rule-based classification systems, consisting of three operations on Big Data: rule generation, rule simplification and rule representation. The authors also review existing methods and techniques for each of the three operations and highlight their limitations. They then introduce novel methods and techniques that they developed recently and compare them with existing ones with respect to efficient processing of Big Data.
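
The "first rule that fires" lookup can be illustrated with a minimal sketch. The rule format (a dictionary of attribute-value conditions plus a class label) and the names `Rule`, `fires` and `first_firing_rule` are assumptions made for illustration, not the chapter's representation. The sketch uses a plain linear scan over an ordered list, which is the baseline that a more suitable rule set structure would aim to improve on.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    conditions: dict  # attribute -> required value (hypothetical rule format)
    label: str        # class predicted when the rule fires


def fires(rule: Rule, instance: dict) -> bool:
    """A rule fires when every one of its conditions matches the instance."""
    return all(instance.get(attr) == val for attr, val in rule.conditions.items())


def first_firing_rule(rules: list, instance: dict) -> Optional[Rule]:
    """Linear scan: return the first rule in the ordered list that fires."""
    for rule in rules:
        if fires(rule, instance):
            return rule
    return None


# Illustrative rule set; the last rule has no conditions, so it always fires.
rules = [
    Rule({"outlook": "sunny", "humidity": "high"}, "no"),
    Rule({"outlook": "overcast"}, "yes"),
    Rule({}, "yes"),
]

match = first_firing_rule(rules, {"outlook": "sunny", "humidity": "high"})
print(match.label if match else "no rule fired")
```

The cost of the scan grows with the number of rules and conditions, which is why rule simplification and rule representation both affect prediction time.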

Relevance:

70.00%

Publisher:

Summary:

Classification methods such as the Rocchio method, the Naïve Bayes method and SVM-based text classification are commonly used to categorize text documents. These methods learn from labeled text documents and then construct classifiers, which can predict the category of a new incoming text document. The keywords in a document are often used to form categorization rules; for example, “kw = computer” can be a rule for the IT documents category. However, the number of keywords is very large, and selecting among them is challenging. Recently, a rule generation method based on enumerating all possible keyword combinations has been proposed [2]. A crucial problem remains in this method: how to prune irrelevant combinations at the early stages of the rule generation procedure. In this paper, we propose a method that can effectively prune irrelevant keywords at an early stage.
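
To make the idea of keyword-combination rules with early pruning concrete, here is a minimal sketch. It is not the method proposed in the paper: it swaps in a generic support-based (Apriori-style) pruning step, and the function name `keyword_rules`, the thresholds and the toy documents are all assumptions made for illustration.

```python
from itertools import combinations


def keyword_rules(docs, category, min_support=2, min_confidence=0.8, max_size=2):
    """docs: list of (set_of_keywords, label) pairs.

    Enumerates keyword combinations level by level. A combination matched by
    fewer than min_support documents is dropped immediately, because adding
    more keywords can only shrink the set of matching documents.
    Returns a list of (keyword_tuple, confidence) rules for `category`.
    """
    vocab = sorted({kw for kws, _ in docs for kw in kws})
    frontier = [frozenset([kw]) for kw in vocab]
    rules = []
    for size in range(1, max_size + 1):
        survivors = []
        for combo in frontier:
            labels = [label for kws, label in docs if combo <= kws]
            if len(labels) < min_support:
                continue  # pruned early: never extended to larger combinations
            survivors.append(combo)
            confidence = labels.count(category) / len(labels)
            if confidence >= min_confidence:
                rules.append((tuple(sorted(combo)), confidence))
        # Build the next level only from combinations that survived pruning.
        frontier = sorted(
            {a | b for a, b in combinations(survivors, 2) if len(a | b) == size + 1},
            key=sorted,
        )
    return rules


# Hypothetical toy corpus.
docs = [
    ({"computer", "software"}, "IT"),
    ({"computer", "network"}, "IT"),
    ({"football", "game"}, "Sports"),
    ({"football", "league"}, "Sports"),
]
print(keyword_rules(docs, "IT"))
```

On this toy corpus, the only rule that survives for the IT category is the single keyword "computer", which matches the “kw = computer” example in the abstract.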

Relevance:

60.00%

Publisher:

Summary:

With the phenomenal growth of electronic data and information, there is great demand for efficient and effective systems (tools) to perform data mining tasks on multidimensional databases. Association rules describe associations between items in the same transaction (intra) or in different transactions (inter). Association mining attempts to find interesting or useful association rules in databases; this is the crucial issue for applying data mining in the real world. Association mining can be used in many application areas, such as discovering associations between customers’ locations and shopping behaviours in market basket analysis. Association mining includes two phases. The first phase, called pattern mining, is the discovery of frequent patterns. The second phase, called rule generation, is the discovery of interesting and useful association rules in the discovered patterns. The first phase, however, often takes a long time to find all frequent patterns, and these include much noise. The second phase is also time-consuming and can generate many redundant rules. To improve the quality of association mining in databases, this thesis provides an alternative technique, granule-based association mining, for knowledge discovery in databases, where a granule refers to a predicate that describes common features of a group of transactions. The new technique first transforms transaction databases into basic decision tables, then uses multi-tier structures to integrate pattern mining and rule generation into one phase for both intra- and inter-transaction association rule mining. To evaluate the proposed technique, this research defines the concept of meaningless rules by considering the correlations between data dimensions for intra-transaction association rule mining. It also uses precision to evaluate the effectiveness of inter-transaction association rules. The experimental results show that the proposed technique is promising.
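
The conventional two-phase process described above (pattern mining, then rule generation) can be sketched as follows. This is a brute-force illustration of the classic support/confidence scheme, not the granule-based technique proposed in the thesis. The function names, thresholds and toy transactions are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Phase 1 (pattern mining): itemsets found in at least min_support transactions."""
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            count = sum(1 for t in transactions if set(combo) <= t)
            if count >= min_support:
                frequent[frozenset(combo)] = count
                found = True
        if not found:
            break  # no frequent itemset of this size, so no larger one exists
    return frequent


def generate_rules(frequent, min_confidence):
    """Phase 2 (rule generation): split each frequent itemset into
    antecedent -> consequent and keep the splits with enough confidence."""
    rules = []
    for itemset, count in frequent.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is frequent, so frequent[a] exists.
                confidence = count / frequent[a]
                if confidence >= min_confidence:
                    rules.append((tuple(sorted(a)), tuple(sorted(itemset - a)), confidence))
    return rules


# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
patterns = frequent_itemsets(transactions, min_support=2)
print(generate_rules(patterns, min_confidence=0.8))
```

Even on four transactions the first phase examines every candidate itemset, which illustrates why pattern mining is the expensive step that the thesis aims to merge with rule generation.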

Relevance:

60.00%

Publisher:

Summary:

Clusters of text documents output by clustering algorithms are often hard to interpret. We describe motivating real-world scenarios that necessitate reconfigurability and high interpretability of clusters, and outline the problem of generating clusterings with interpretable and reconfigurable cluster models. We develop two clustering algorithms toward this goal. They generate clusters with associated rules composed of conditions on word occurrences or nonoccurrences. The two approaches differ in the complexity of the rule format: RGC employs disjunctions and conjunctions in rule generation, whereas RGC-D rules are simple disjunctions of conditions signifying the presence of various words. In both cases, each cluster comprises precisely the set of documents that satisfy the corresponding rule. Rules of the latter kind are easy to interpret, whereas the former lead to more accurate clustering. We show that our approaches outperform the unsupervised decision tree approach for rule-generating clustering, as well as an approach we provide for generating interpretable models for general clusterings, both by significant margins. We show empirically that the purity and F-measure losses incurred to achieve interpretability can be as little as 3% and 5%, respectively, using the algorithms presented herein.
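
As a small illustration of what a rule-defined cluster means, the sketch below evaluates disjunctive word-presence rules of the kind attributed to RGC-D: a document belongs to a cluster exactly when it contains at least one of the rule's words. It only shows how such rules are applied, not how the algorithms learn them. The function names and the toy documents and rules are assumptions.

```python
def satisfies(doc_words, rule_words):
    """Disjunctive rule: true if the document contains any of the rule's words."""
    return any(word in doc_words for word in rule_words)


def clusters_from_rules(docs, rules):
    """Each cluster is precisely the set of document indices satisfying its rule."""
    return {
        name: [i for i, doc in enumerate(docs) if satisfies(doc, words)]
        for name, words in rules.items()
    }


# Hypothetical documents (as word sets) and hand-written rules.
docs = [{"goal", "match"}, {"stock", "market"}, {"match", "stock"}]
rules = {"sports": {"goal", "match"}, "finance": {"stock", "market"}}
print(clusters_from_rules(docs, rules))
```

Because the rule is the cluster definition, reconfiguring a cluster amounts to editing its word list and re-evaluating membership.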

Relevance:

60.00%

Publisher:

Summary:

Prism is a modular classification rule generation method based on the ‘separate and conquer’ approach, an alternative to rule induction using decision trees, also known as ‘divide and conquer’. Prism often achieves a level of classification accuracy similar to that of decision trees, but tends to produce a more compact, noise-tolerant set of classification rules. As with other classification rule generation methods, a principal problem with Prism is overfitting due to over-specialised rules. In addition, over-specialised rules increase the associated computational complexity. These problems can be addressed by pruning methods. For the Prism method, two pruning algorithms have been introduced recently for reducing overfitting of classification rules: J-pruning and Jmax-pruning. Both algorithms are based on the J-measure, an information-theoretic means of quantifying the theoretical information content of a rule. Jmax-pruning attempts to exploit the J-measure to its full potential, because J-pruning does not actually achieve this and may even lead to underfitting. A series of experiments has shown that Jmax-pruning may outperform J-pruning in reducing overfitting. However, Jmax-pruning is computationally relatively expensive and may also lead to underfitting. This paper reviews the Prism method and the two existing pruning algorithms above. It also proposes a novel pruning algorithm called Jmid-pruning, which is based on the J-measure and reduces overfitting to a similar level as the other two algorithms but is better at avoiding underfitting and unnecessary computational effort. The authors conduct an experimental study of the performance of the Jmid-pruning algorithm in terms of classification accuracy and computational efficiency. The algorithm is also evaluated comparatively with the J-pruning and Jmax-pruning algorithms.
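
For readers unfamiliar with the J-measure, the sketch below computes it in its commonly cited form for a rule "IF X = x THEN Y = y": J = p(x) · [p(y|x) · log2(p(y|x)/p(y)) + (1 − p(y|x)) · log2((1 − p(y|x))/(1 − p(y)))]. The function name and the example probabilities are assumptions made for illustration. The pruning algorithms themselves (J-pruning, Jmax-pruning, Jmid-pruning) are not reproduced here.

```python
import math


def j_measure(p_x: float, p_y: float, p_y_given_x: float) -> float:
    """J-measure of the rule IF X=x THEN Y=y.

    p_x:          probability that the rule's antecedent holds
    p_y:          prior probability of the consequent (must satisfy 0 < p_y < 1)
    p_y_given_x:  probability of the consequent when the antecedent holds
    """
    def term(posterior: float, prior: float) -> float:
        if posterior == 0.0:
            return 0.0  # limit of t * log(t) as t -> 0
        return posterior * math.log2(posterior / prior)

    cross_entropy = term(p_y_given_x, p_y) + term(1.0 - p_y_given_x, 1.0 - p_y)
    return p_x * cross_entropy


# A rule that covers half the data and is always right about a 50/50 class.
print(j_measure(p_x=0.5, p_y=0.5, p_y_given_x=1.0))
# A rule that tells us nothing new: the posterior equals the prior.
print(j_measure(p_x=0.3, p_y=0.4, p_y_given_x=0.4))
```

The measure rewards rules that both apply often (large p(x)) and shift the class distribution far from its prior, so it can be tracked as terms are appended to a rule during induction.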

Relevance:

60.00%

Publisher:

Summary:

Many classification methods have been proposed to find patterns in text documents. However, according to Occam's razor, "the explanation of any phenomenon should make as few assumptions as possible", so short patterns are usually more explainable and meaningful for classifying text documents. In this paper, we propose a depth-first pattern generation algorithm, which can find short patterns in text documents more effectively than a breadth-first algorithm.

Relevance:

60.00%

Publisher:

Summary:

Continuously rising Internet attacks make it a severe challenge to develop an effective Intrusion Detection System (IDS) that detects both known and unknown malicious attacks. To address the problem of detecting known and unknown attacks and identifying attack groups, the authors provide new multi-stage rules for detecting anomalies. The authors used RIPPER for rule generation, which can create rule sets more quickly and can determine the attack types with a smaller number of rules. These rules would be efficient to apply in a Signature Intrusion Detection System (SIDS) and an Anomaly Intrusion Detection System (AIDS).

Relevance:

60.00%

Publisher:

Summary:

Forecasting bike sharing demand is of paramount importance for fleet management at the city level. Demand for this service changes rapidly because of a number of factors, including workdays, weekends, holidays and weather conditions. These nonlinear dependencies make prediction a difficult task. This work shows that type-1 and type-2 fuzzy inference-based prediction mechanisms can capture this highly variable trend with good accuracy. The Wang-Mendel rule generation method is used to generate the rule base, and then only current information, such as date-related information and weather conditions, is used to forecast bike share demand at any given point in the future. Simulation results reveal that fuzzy inference predictors can potentially outperform a traditional feed-forward neural network in terms of prediction accuracy.
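
The core of the Wang-Mendel procedure can be sketched as follows: each variable's range is divided into fuzzy regions, every training sample produces one candidate rule by assigning each value to its highest-membership region, and conflicting rules with the same antecedent are resolved by keeping the one with the highest degree. The evenly spaced triangular membership functions, the single temperature input, the function names and the sample values are assumptions for illustration. The paper's actual inputs and its type-2 extension are not modelled, and the defuzzification step used for prediction is omitted.

```python
def triangular_memberships(x, lo, hi, n_regions):
    """Membership of x in n_regions evenly spaced triangular fuzzy sets on [lo, hi]."""
    step = (hi - lo) / (n_regions - 1)
    return [max(0.0, 1.0 - abs(x - (lo + i * step)) / step) for i in range(n_regions)]


def wang_mendel(samples, ranges, n_regions=3):
    """samples: list of (inputs_tuple, output).
    ranges: (lo, hi) for each input, followed by (lo, hi) for the output.
    Returns {antecedent region indices: (consequent region index, rule degree)}.
    """
    rule_base = {}
    for inputs, output in samples:
        regions, degree = [], 1.0
        for value, (lo, hi) in zip(list(inputs) + [output], ranges):
            mu = triangular_memberships(value, lo, hi, n_regions)
            best = max(range(n_regions), key=mu.__getitem__)
            regions.append(best)
            degree *= mu[best]  # rule degree = product of the chosen memberships
        antecedent, consequent = tuple(regions[:-1]), regions[-1]
        # Conflict resolution: keep the highest-degree rule for each antecedent.
        if antecedent not in rule_base or degree > rule_base[antecedent][1]:
            rule_base[antecedent] = (consequent, degree)
    return rule_base


# Hypothetical data: temperature in [0, 40] -> hourly demand in [0, 100].
samples = [((5.0,), 10.0), ((20.0,), 60.0), ((22.0,), 90.0), ((38.0,), 95.0)]
ranges = [(0.0, 40.0), (0.0, 100.0)]
for antecedent, (consequent, degree) in sorted(wang_mendel(samples, ranges).items()):
    print(antecedent, "->", consequent, round(degree, 2))
```

In this toy run, the samples at 20 and 22 degrees both map to the middle temperature region but suggest different demand regions, so the higher-degree rule is retained.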

Relevance:

60.00%

Publisher:

Summary:

This work deals with the car sequencing (CS) problem, a combinatorial optimization problem for sequencing mixed-model assembly lines. The aim is to find a production sequence for different variants of a common base product such that work overload of the respective line operators is avoided or minimized. The variants are distinguished by certain options (e.g., sun roof yes/no) and therefore require different processing times at the stations of the line. CS introduces a so-called sequencing rule H:N for each option, which restricts the occurrence of this option to at most H in any N consecutive variants. It seeks a sequence that leads to no sequencing rule violations or to a minimum number of them. In this work, CS’ suitability for workload-oriented sequencing is analyzed. To this end, its solution quality is compared in experiments to that of the related mixed-model sequencing problem. A new sequencing rule generation approach as well as a new lower bound for the problem are presented. Different exact and heuristic solution methods for CS are developed and their efficiency is shown in experiments. Furthermore, CS is adjusted and applied to a resequencing problem with pull-off tables.
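
The H:N sequencing rule can be checked with a sliding window, as in the minimal sketch below. Counting one violation per window that exceeds H is an assumption; other violation-counting schemes exist and the abstract does not say which one the work uses. The function name and the example sequence are also illustrative.

```python
def count_violations(sequence, h, n):
    """Count windows of n consecutive variants with more than h option occurrences.

    sequence: list of 0/1 flags, where 1 means the variant at that position
              carries the option governed by the H:N rule.
    """
    return sum(
        1
        for start in range(len(sequence) - n + 1)
        if sum(sequence[start:start + n]) > h
    )


# Rule 1:3 (at most one sun roof in any three consecutive cars).
print(count_violations([1, 0, 1, 0, 0, 1, 1], h=1, n=3))
```

A solver for CS would evaluate this count for every option and search for the sequence that minimizes the total.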

Relevance:

60.00%

Publisher:

Summary:

Malware replicates itself and produces offspring with the same characteristics but different signatures by using code obfuscation techniques. Current-generation anti-virus engines employ a signature-template detection approach, in which malware can easily evade the existing signatures in the database. This reduces the capability of current anti-virus engines to detect malware. In this paper, we propose a stepwise binary logistic regression-based dimensionality reduction technique for malware detection using application program interface (API) call statistics. Finding the most significant malware features using traditional wrapper-based approaches has complexity exponential in the dimension (m) of the dataset with a brute-force search strategy, and of order (m-1) with a backward elimination filter heuristic. The novelty of the proposed approach is that its worst-case computational complexity is less than order (m-1). The proposed approach uses multi-linear regression and the p-value of each individual API feature to select the most uncorrelated and significant features, in order to reduce the dimensionality of the large malware data set and to ensure the absence of multi-collinearity. The stepwise logistic regression approach is then employed to test the significance of each individual malware feature based on its Wald statistic and to construct the binary decision model. When the selected most significant APIs are used in a decision rule generation system, this approach not only reduces the tree size but also improves classification performance. Exhaustive experiments on a large malware data set show that the proposed approach clearly outperforms the existing standard decision rule and support vector machine-based template approaches with complete data and provides a better statistical fit.

Relevance:

40.00%

Publisher:

Summary:

Artificial neural networks have a good potential to be employed for fault diagnosis and condition monitoring problems in complex processes. In this paper, the applicability of the fuzzy ARTMAP (FAM) neural network as an intelligent learning system for fault detection and diagnosis in a power generation plant is described. The process under scrutiny is the circulating water (CW) system, with specific attention to the conditions of heat transfer and tube blockage in the CW system. A series of experiments has been conducted systematically to investigate the effectiveness of FAM in fault detection and diagnosis tasks. In addition, a set of domain rules has been extracted from the trained FAM network so that its predictions can be explained and justified. The outcomes demonstrate the benefits of employing FAM as an intelligent fault detection and diagnosis tool with an explanatory capability for monitoring and diagnosing complex processes in power generation plants.