2 resultados para prediction accuracy
em Brock University, Canada
Resumo:
Feature selection plays an important role in knowledge discovery and data mining nowadays. In traditional rough set theory, feature selection using reduct - the minimal discerning set of attributes - is an important area. Nevertheless, the original definition of a reduct is restrictive, so in one of the previous research it was proposed to take into account not only the horizontal reduction of information by feature selection, but also a vertical reduction considering suitable subsets of the original set of objects. Following the work mentioned above, a new approach to generate bireducts using a multi--objective genetic algorithm was proposed. Although the genetic algorithms were used to calculate reduct in some previous works, we did not find any work where genetic algorithms were adopted to calculate bireducts. Compared to the works done before in this area, the proposed method has less randomness in generating bireducts. The genetic algorithm system estimated a quality of each bireduct by values of two objective functions as evolution progresses, so consequently a set of bireducts with optimized values of these objectives was obtained. Different fitness evaluation methods and genetic operators, such as crossover and mutation, were applied and the prediction accuracies were compared. Five datasets were used to test the proposed method and two datasets were used to perform a comparison study. Statistical analysis using the one-way ANOVA test was performed to determine the significant difference between the results. The experiment showed that the proposed method was able to reduce the number of bireducts necessary in order to receive a good prediction accuracy. Also, the influence of different genetic operators and fitness evaluation strategies on the prediction accuracy was analyzed. It was shown that the prediction accuracies of the proposed method are comparable with the best results in machine learning literature, and some of them outperformed it.
Resumo:
The purpose of this study is to examine the impact of the choice of cut-off points, sampling procedures, and the business cycle on the accuracy of bankruptcy prediction models. Misclassification can result in erroneous predictions leading to prohibitive costs to firms, investors and the economy. To test the impact of the choice of cut-off points and sampling procedures, three bankruptcy prediction models are assessed- Bayesian, Hazard and Mixed Logit. A salient feature of the study is that the analysis includes both parametric and nonparametric bankruptcy prediction models. A sample of firms from Lynn M. LoPucki Bankruptcy Research Database in the U. S. was used to evaluate the relative performance of the three models. The choice of a cut-off point and sampling procedures were found to affect the rankings of the various models. In general, the results indicate that the empirical cut-off point estimated from the training sample resulted in the lowest misclassification costs for all three models. Although the Hazard and Mixed Logit models resulted in lower costs of misclassification in the randomly selected samples, the Mixed Logit model did not perform as well across varying business-cycles. In general, the Hazard model has the highest predictive power. However, the higher predictive power of the Bayesian model, when the ratio of the cost of Type I errors to the cost of Type II errors is high, is relatively consistent across all sampling methods. Such an advantage of the Bayesian model may make it more attractive in the current economic environment. This study extends recent research comparing the performance of bankruptcy prediction models by identifying under what conditions a model performs better. It also allays a range of user groups, including auditors, shareholders, employees, suppliers, rating agencies, and creditors' concerns with respect to assessing failure risk.