1000 results for ID3 algorithm


Relevance:

60.00%

Publisher:

Abstract:

Dissertation submitted for the Master's degree in Electrical Engineering (Energy branch)

Relevance:

30.00%

Publisher:

Abstract:

Decision trees are very powerful tools for classification in data mining tasks that involve different types of attributes. When handling numeric data sets, the attributes are usually first converted to categorical types and then classified using information gain. Information gain is a popular and useful concept that indicates whether splitting on a given attribute yields any benefit in terms of information content. However, this process is computationally intensive for large data sets. Moreover, popular decision tree algorithms such as ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain, together with the statistical mean as the split point, for attributes in completely numeric data sets. Using the ANOVA test on the standard UCI benchmarking datasets, the new algorithm has been shown to be competitive with its information-gain counterpart C4.5 and with many existing decision tree algorithms. Its specific advantages are that it avoids the computational overhead of computing information gain for large data sets with many attributes, and that it avoids converting huge numeric data sets to categorical data, which is itself time consuming. In summary, huge numeric data sets can be submitted directly to this algorithm without any attribute mappings or information gain computations. The approach also blends two closely related fields, statistics and data mining.
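The two split criteria contrasted above can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's algorithm: `information_gain` is the standard ID3 entropy-based gain for a categorical attribute, while `variance_split` assumes that a numeric attribute is split at its mean and that the split is scored by the reduction in population variance of numerically encoded class labels. The abstract does not give the exact criterion, and all function names here are hypothetical.

```python
from collections import Counter
from math import log2
from statistics import mean, pvariance


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """ID3-style gain of splitting on a categorical attribute."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def variance_split(values, targets):
    """Hypothetical variance criterion: split a numeric attribute at its mean
    and return (threshold, reduction in population variance of the targets)."""
    threshold = mean(values)
    left = [y for x, y in zip(values, targets) if x <= threshold]
    right = [y for x, y in zip(values, targets) if x > threshold]
    n = len(targets)
    # Weighted variance of the two partitions (skip an empty side)
    weighted = sum(len(part) / n * pvariance(part) for part in (left, right) if part)
    return threshold, pvariance(targets) - weighted


def best_numeric_attribute(rows, targets):
    """Pick the column whose mean split gives the largest variance reduction."""
    scores = {}
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores[j] = variance_split(column, targets)
    best = max(scores, key=lambda j: scores[j][1])
    return best, scores
```

For example, with `rows = [[1.0, 5.0], [2.0, 1.0], [8.0, 4.0], [9.0, 2.0]]` and targets `[0, 0, 1, 1]`, splitting column 0 at its mean 5.0 separates the two classes completely (variance reduction 0.25), while column 1 split at 3.0 gives no reduction, so column 0 is chosen without any discretization or entropy computation.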

Relevance:

30.00%

Publisher:

Abstract:

Inducing general functions from specific training examples is a central problem in machine learning. Sets of if-then rules are the most expressive and readable way to represent such functions. To find if-then rules, many induction algorithms have been proposed, such as ID3, AQ, CN2 and their variants, with sequential covering as their core technique. To avoid testing all possible selectors, ID3 uses entropy gain to select the best attribute, AQ introduced a constraint on the size of the star, and CN2 adopted beam search. These methods speed up induction, but many good selectors are filtered out. In this work, we introduce a new induction algorithm based on enumerating all possible selectors. Unlike previous work, we use pruning power to eliminate irrelevant selectors while guaranteeing that no good selectors are filtered out. Compared with other techniques, the experimental results demonstrate that the rules produced by our induction algorithm have high consistency and simplicity.
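The sequential covering scheme that the abstract identifies as the core of these algorithms can be sketched in Python. This is a simplified, hypothetical illustration, not the authors' algorithm: examples are assumed to be dicts of categorical attributes, a selector is assumed to be an `attribute == value` test, and `learn_one_rule` uses a plain greedy precision heuristic rather than entropy gain, AQ's star, CN2's beam search, or the exhaustive enumeration with pruning proposed here.

```python
from typing import Dict, List, Tuple

Example = Dict[str, str]   # attribute name -> categorical value
Selector = Tuple[str, str]  # (attribute, value): the test "attribute == value"
Rule = List[Selector]       # conjunction of selectors (the "if" part)


def covers(rule: Rule, example: Example) -> bool:
    """True if the example satisfies every selector in the rule."""
    return all(example[attr] == value for attr, value in rule)


def learn_one_rule(pos: List[Example], neg: List[Example]) -> Rule:
    """Greedily specialise a rule until it covers no negative examples.

    Only selectors taken from positives still covered are tried, scored by
    (precision, positives covered). This is a simple greedy search, not an
    exhaustive enumeration of selectors.
    """
    rule: Rule = []
    while any(covers(rule, e) for e in neg):
        used = {attr for attr, _ in rule}
        candidates = {(a, e[a]) for e in pos if covers(rule, e) for a in e if a not in used}
        if not candidates:  # no attribute left to specialise on
            break

        def score(sel: Selector) -> Tuple[float, int]:
            extended = rule + [sel]
            p = sum(covers(extended, e) for e in pos)
            n = sum(covers(extended, e) for e in neg)
            return (p / (p + n), p)

        # sorted() makes tie-breaking deterministic
        rule.append(max(sorted(candidates), key=score))
    return rule


def sequential_covering(pos: List[Example], neg: List[Example]) -> List[Rule]:
    """Learn one rule at a time, removing the positives it covers."""
    rules: List[Rule] = []
    remaining = list(pos)
    while remaining:
        rule = learn_one_rule(remaining, neg)
        still_uncovered = [e for e in remaining if not covers(rule, e)]
        if len(still_uncovered) == len(remaining):  # no progress; stop
            break
        rules.append(rule)
        remaining = still_uncovered
    return rules


def predict(rules: List[Rule], example: Example) -> bool:
    """Classify as positive if any learned rule fires."""
    return any(covers(rule, example) for rule in rules)
```

For instance, with positives `{"outlook": "overcast", "windy": "no"}` and `{"outlook": "rain", "windy": "no"}` and negatives `{"outlook": "rain", "windy": "yes"}` and `{"outlook": "sunny", "windy": "yes"}`, the sketch learns the single rule `[("windy", "no")]`, which covers both positives and no negatives. A greedy step like this is exactly where good selectors can be discarded, which is the weakness the proposed enumeration with pruning targets.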

Relevance:

20.00%

Publisher:

Relevance:

20.00%

Publisher: