974 results for Classification Tree Pruning
Abstract:
Scene classification based on latent Dirichlet allocation (LDA) is a more general modeling method known as a bag of visual words, in which the construction of a visual vocabulary is a crucial quantization step for successful classification. A framework is developed with the following new aspects: Gaussian mixture clustering for the quantization process; an integrated visual vocabulary (IVV), built as the union of all centroids obtained from a separate quantization of each class; and several features, including the edge orientation histogram, CIELab color moments, and the gray-level co-occurrence matrix (GLCM). The experiments are conducted on IKONOS images with six semantic classes (tree, grassland, residential, commercial/industrial, road, and water). The results show that using an IVV increases the overall accuracy (OA) by 11 to 12% when applied to the selected features and by 6% when applied to all features. The selected features, CIELab color moments and GLCM combined, provide a better OA than either CIELab color moments or GLCM alone, although the increase in OA is only ∼2 to 3%. Moreover, the results show that LDA outperforms C4.5 and naive Bayes tree in OA by ∼20%. © 2014 Society of Photo-Optical Instrumentation Engineers (SPIE) [DOI: 10.1117/1.JRS.8.083690]
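The IVV idea can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the per-class centroids have already been computed (for example, as the means of a Gaussian mixture fitted separately to each class's descriptors), and it uses hard nearest-centroid assignment to turn an image's local descriptors into the word-count histogram that a topic model such as LDA consumes. The function names `build_ivv` and `bow_histogram` are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def build_ivv(class_centroids: Dict[str, List[Vector]]) -> List[Vector]:
    """Integrated visual vocabulary: the union of the centroids found per class."""
    vocab: List[Vector] = []
    for label in sorted(class_centroids):  # fixed order so word indices are stable
        vocab.extend(class_centroids[label])
    return vocab


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bow_histogram(descriptors: List[Vector], vocab: List[Vector]) -> List[int]:
    """Hard-assign each local descriptor to its nearest visual word and count."""
    hist = [0] * len(vocab)
    for d in descriptors:
        nearest = min(range(len(vocab)), key=lambda k: sq_dist(d, vocab[k]))
        hist[nearest] += 1
    return hist


# Toy 2-D "descriptors"; in practice the per-class centroids might be GMM means.
centroids = {"water": [(0.0, 0.0), (0.1, 0.2)], "tree": [(1.0, 1.0)]}
vocab = build_ivv(centroids)  # "tree" words first, then "water" words
print(bow_histogram([(0.9, 1.1), (0.05, 0.0), (0.0, 0.05), (0.1, 0.25)], vocab))
# → [1, 2, 1]
```

Because each class contributes its own centroids, words that are characteristic of a rare class are not absorbed by clusters dominated by more frequent classes, which is one plausible reason a union vocabulary helps.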
Abstract:
Automatic generation of classification rules has become an increasingly popular technique in commercial applications such as Big Data analytics, rule-based expert systems, and decision-making systems. However, a principal problem with most methods for generating classification rules is overfitting of the training data. With Big Data, this may result in the generation of a large number of complex rules, which may not only increase computational cost but also lower accuracy in predicting unseen instances. This has led to the need for pruning methods that simplify rules. In addition, once generated, classification rules are used to make predictions. For efficiency, the first rule that fires should be found as quickly as possible when searching through a rule set, so a suitable structure is required to represent the rule set effectively. In this chapter, the authors introduce a unified framework for the construction of rule-based classification systems consisting of three operations on Big Data: rule generation, rule simplification, and rule representation. The authors also review some existing methods and techniques used for each of the three operations and highlight their limitations. They introduce some novel methods and techniques they developed recently and compare them with existing ones with respect to the efficient processing of Big Data.
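To make the "first rule that fires" search concrete, here is a minimal sketch of the simplest possible representation, an ordered list scanned linearly. It is an assumption for illustration only and is not the representation the chapter's authors propose; the rule format and the names `fires` and `first_firing_class` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Instance = Dict[str, object]
# Hypothetical rule format: (conjunction of attribute == value tests, predicted class).
Rule = Tuple[List[Tuple[str, object]], str]


def fires(rule: Rule, x: Instance) -> bool:
    """A rule fires when every one of its conditions holds for the instance."""
    conditions, _ = rule
    return all(x.get(attr) == value for attr, value in conditions)


def first_firing_class(rules: List[Rule], x: Instance,
                       default: Optional[str] = None) -> Optional[str]:
    """Scan an ordered rule list and return the class of the first rule that fires."""
    for rule in rules:
        if fires(rule, x):
            return rule[1]
    return default


rules: List[Rule] = [
    ([("outlook", "sunny"), ("humidity", "high")], "no"),
    ([("outlook", "overcast")], "yes"),
    ([], "yes"),  # empty condition list: a catch-all rule that always fires
]
print(first_firing_class(rules, {"outlook": "sunny", "humidity": "high"}))  # → no
print(first_firing_class(rules, {"outlook": "rain"}))                       # → yes
```

In the worst case this scan re-checks every condition of every rule for each prediction, so its cost grows with the size of the rule set; that cost is what motivates looking for more efficient rule-set structures when rule sets become large.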
Abstract:
This work proposes and discusses an approach for inducing Bayesian classifiers that balances the tradeoff between the precise probability estimates produced by time-consuming unrestricted Bayesian networks and the computational efficiency of Naive Bayes (NB) classifiers. The proposed approach is based on the fundamental principles of heuristic-search Bayesian network learning. The Markov Blanket concept, as well as a proposed "approximate Markov Blanket", is used to reduce the number of nodes that form the Bayesian network to be induced from data. Consequently, the usually high computational cost of heuristic-search learning algorithms can be reduced, while Bayesian network structures better than NB can be achieved. The resulting algorithms, called DMBC (Dynamic Markov Blanket Classifier) and A-DMBC (Approximate DMBC), are empirically assessed in twelve domains that illustrate scenarios of particular interest. The results are compared with NB and Tree Augmented Network (TAN) classifiers and confirm that both proposed algorithms can provide good classification accuracy and better probability estimates than NB and TAN, while being more computationally efficient than the widely used K2 algorithm.
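The node reduction rests on the standard definition of a Markov Blanket: in a Bayesian network, a node is conditionally independent of every other node given its parents, its children, and its children's other parents. The sketch below computes that set from a DAG stored as a node-to-parents mapping. It illustrates only the textbook definition; the "approximate Markov Blanket" of A-DMBC is not reproduced, and the function name and toy graph are hypothetical.

```python
from typing import Dict, Set


def markov_blanket(parents: Dict[str, Set[str]], node: str) -> Set[str]:
    """Parents, children, and the children's other parents of `node` in a DAG."""
    children = {v for v, ps in parents.items() if node in ps}
    blanket = set(parents.get(node, set())) | children
    for child in children:
        blanket |= parents[child]  # co-parents ("spouses") of each child
    blanket.discard(node)  # the node itself is a parent of its children
    return blanket


# Toy DAG given as node -> set of parents; "C" plays the role of the class node.
dag = {
    "C": set(), "A": {"C"}, "B": {"C", "D"}, "D": set(),
    "E": {"A"}, "F": {"G"}, "G": set(),
}
print(sorted(markov_blanket(dag, "C")))  # → ['A', 'B', 'D']
```

Here E, F and G fall outside the blanket of C, so a classifier for C would not need them once A, B and D are known; shrinking the candidate node set this way is what lowers the cost of the heuristic structure search.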
Abstract:
Various popular machine learning techniques, such as support vector machines, were originally conceived for solving two-class (binary) classification problems. However, many real problems involve more than two classes. A common approach to generalizing binary learning techniques to problems with more than two classes, also known as multiclass classification problems, is to hierarchically decompose the multiclass problem into multiple binary sub-problems, whose outputs are combined to define the predicted class. This strategy results in a tree of binary classifiers, where each internal node corresponds to a binary classifier distinguishing two groups of classes and the leaf nodes correspond to the problem classes. This paper investigates how measures of the separability between classes can be employed in the construction of binary-tree-based multiclass classifiers, adapting the decompositions performed to each particular multiclass problem. © 2010 Elsevier B.V. All rights reserved.
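A separability-driven decomposition can be sketched as a recursive split. This is a hypothetical heuristic, not the paper's procedure: it assumes Euclidean distance between class centroids as the separability measure (the paper studies its own measures), seeds each split with the most separated pair of classes, and assigns every other class to the closer seed. Training the binary classifier at each internal node is not shown.

```python
from itertools import combinations
from math import dist
from typing import Dict, Sequence, Tuple, Union

# A leaf is a class label; an internal node is a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def build_tree(centroids: Dict[str, Sequence[float]]) -> Tree:
    """Recursively split classes into two groups, seeded by the most separated pair."""
    labels = sorted(centroids)
    if len(labels) == 1:
        return labels[0]
    # Separability proxy (assumption): Euclidean distance between class centroids.
    a, b = max(combinations(labels, 2),
               key=lambda p: dist(centroids[p[0]], centroids[p[1]]))
    # Each remaining class joins the seed it is closer to; b always goes right.
    left = {c: centroids[c] for c in labels
            if c != b and dist(centroids[c], centroids[a]) <= dist(centroids[c], centroids[b])}
    right = {c: centroids[c] for c in labels if c not in left}
    return (build_tree(left), build_tree(right))


centroids = {"road": (0.0, 0.0), "water": (0.5, 0.0), "tree": (10.0, 0.0), "grass": (10.0, 1.0)}
print(build_tree(centroids))  # → (('grass', 'tree'), ('road', 'water'))
```

Each tuple in the result stands for one binary classifier (for example, an SVM) trained to separate its left group from its right group; prediction walks from the root to a leaf. Grouping similar classes together tends to put the easiest separations near the root.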
Abstract:
The substitution of missing values, also called imputation, is an important data preparation task in many domains. Ideally, imputation should not insert bias into the dataset. This aspect has usually been assessed by measures of the prediction capability of imputation methods. Such measures rely on simulating missing entries for attributes whose values are actually known; these artificially missing values are imputed and then compared with the original values. Although this evaluation is useful, it does not reveal the influence of imputed values on the ultimate modelling task (e.g. classification). We argue that imputation cannot be properly evaluated apart from the modelling task, so alternative approaches are needed. This article elaborates on the influence of imputed values in classification. In particular, a practical procedure for estimating the inserted bias is described. As an additional contribution, we have used this procedure to empirically illustrate the performance of three imputation methods (majority, naive Bayes and Bayesian networks) on three datasets. Three classifiers (decision tree, naive Bayes and nearest neighbours) have been used as modelling tools in our experiments. The results illustrate a variety of situations that can arise in data preparation practice.
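The conventional prediction-capability evaluation described above (hide known values, impute them, compare) can be sketched as follows, using majority (mode) imputation, one of the three methods mentioned. This illustrates only the baseline evaluation the article argues is insufficient on its own; the article's procedure for estimating bias in classification is not reproduced, and the function names are hypothetical.

```python
import random
from collections import Counter
from typing import List, Optional


def majority_impute(column: List[Optional[str]]) -> List[str]:
    """Replace missing (None) entries with the most frequent observed value."""
    mode = Counter(v for v in column if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in column]


def imputation_hit_rate(column: List[str], fraction: float, seed: int = 0) -> float:
    """Hide known values, impute them, and score how many are recovered exactly.

    Assumes at least one value remains observed after masking.
    """
    rng = random.Random(seed)
    n_mask = max(1, int(len(column) * fraction))
    masked = rng.sample(range(len(column)), n_mask)
    damaged: List[Optional[str]] = list(column)
    for i in masked:
        damaged[i] = None  # simulate a missing entry whose true value is known
    imputed = majority_impute(damaged)
    return sum(imputed[i] == column[i] for i in masked) / n_mask


print(majority_impute(["a", None, "a", "b"]))  # → ['a', 'a', 'a', 'b']
print(imputation_hit_rate(["a"] * 8 + ["b"] * 2, fraction=0.3))
```

A high hit rate here says nothing about whether a classifier trained on the imputed data learns a distorted decision boundary (mode imputation, for instance, inflates the majority value), which is the gap the article addresses.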
Abstract:
When climatic suitability is considered, fruit plants are classified as tropical, subtropical, and temperate. For a long time, this traditional classification proved quite effective. More recent knowledge of the centers of origin of different species, technological advances in orchard management and fruit preservation, and especially genetic improvement have created exceptional conditions for growing tropical and temperate species in a subtropical climate. In the present work, the atemoya, persimmon, fig, and guava crops were selected, based not only on their national and regional importance but also on the different contributions that scientific research has made to these fruit crops. Atemoya: among the fruit species exploited on a large scale, it is perhaps the most recently introduced to cultivation in Brazil, starting in the mid-1980s. Several cultivation techniques were developed, such as rootstocks better suited to each region, training and production pruning, artificial pollination, pest and disease management, and various other technologies that allowed the crop to expand rapidly into several regions of the country. Although the important role of universities, research institutes, and extension services is unquestionable, the contribution of the pioneering growers, who began searching for solutions to emerging problems and indicated where research intervention was needed, was fundamental. Persimmon: Brazilian persimmon production (IBGE, 2009) of 171,555 t is obtained from an area of 8,770 ha and represents a value of 146.67 million reais. The largest producing states are São Paulo (111,646 t), Rio Grande do Sul, Paraná, and Rio de Janeiro. The main cultivars in production are Rama Forte, Giombo, and Fuyu, which are marketed primarily on the domestic market.
Fig: Brazilian fig production has remained stable, with small variations, during the 2000s, reaching 24,146 t in 2009 (IBGE, IBRAF), with the states of Rio Grande do Sul and São Paulo as the largest producers. In the state of São Paulo, cultivation is concentrated almost exclusively in the Campinas region, with production of 9,469 t in 2010 (IEA). Thanks to the technology developed, part of the harvested fruit is exported as table figs (1,645 t in 2008; source: DECEX (MICT), IBRAF, 2010). Guava: guava cultivation in Brazil now allows it to be considered a species fully adapted to the subtropical climate. The development of adapted varieties and special cultivation techniques has led to a great expansion of this crop in Brazil. According to IBGE/IBRAF, Brazil produced 297,377 t over an area of 15,048 ha in 2009. Pernambuco, São Paulo, Brasília, Rio de Janeiro, and Bahia are the main producers. In the state of São Paulo, the production of table guavas (50,000 t) stands out; owing to the high quality of the fruit, it is successfully exported.
Abstract:
The evidentiary basis of the currently accepted classification of living amphibians is discussed and shown not to warrant the degree of authority conferred on it by use and tradition. A new taxonomy of living amphibians is proposed to correct the deficiencies of the old one. This new taxonomy is based on the largest phylogenetic analysis of living Amphibia accomplished so far. We combined the comparative anatomical character evidence of Haas (2003) with DNA sequences from the mitochondrial transcription unit H1 (12S and 16S ribosomal RNA and tRNA-Valine genes; 2,400 bp of mitochondrial sequences) and the nuclear genes histone H3, rhodopsin, tyrosinase, and seven in absentia, and the large ribosomal subunit 28S (≈2,300 bp of nuclear sequences; ca. 1.8 million base pairs in total; x̄ = 3.7 kb/terminal). The dataset includes 532 terminals sampled from 522 species representative of the global diversity of amphibians, as well as seven of the closest living relatives of amphibians for outgroup comparisons.
Abstract:
In this article we describe a feature extraction algorithm for pattern classification based on Bayesian decision boundaries and pruning techniques. The proposed method can optimize MLP neural classifiers by retaining only those hidden-layer neurons that really contribute to correct classification. We also propose a method that defines a plausible number of hidden-layer neurons based on stem-and-leaf plots of the training samples. Experimental investigation demonstrates the efficiency of the proposed method. © 2002 IEEE.
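The general idea of keeping only hidden neurons that contribute to correct classification can be illustrated with a simple greedy rule: tentatively remove a neuron and keep it removed if training accuracy does not drop. This criterion is a plainly swapped-in stand-in, not the Bayesian-decision-boundary method of the article, and the network, weights, and function names below are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

Sample = Tuple[Sequence[float], int]


def predict(x: Sequence[float], W1: List[List[float]], b1: List[float],
            W2: List[List[float]], b2: List[float], active: Set[int]) -> int:
    """One-hidden-layer MLP (tanh); neurons outside `active` output 0 (pruned)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j]) if j in active else 0.0
              for j in range(len(W1))]
    scores = [sum(W2[k][j] * hidden[j] for j in range(len(hidden))) + b2[k]
              for k in range(len(W2))]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(data: List[Sample], W1, b1, W2, b2, active: Set[int]) -> float:
    return sum(predict(x, W1, b1, W2, b2, active) == y for x, y in data) / len(data)


def prune_hidden(data: List[Sample], W1, b1, W2, b2) -> Set[int]:
    """Greedily drop hidden neurons whose removal does not lower training accuracy."""
    active = set(range(len(W1)))
    base = accuracy(data, W1, b1, W2, b2, active)
    for j in sorted(active):  # iterate over a copy; `active` is rebound below
        trial = active - {j}
        if accuracy(data, W1, b1, W2, b2, trial) >= base:
            active = trial
    return active


# Toy 1-D task: class 1 iff x > 0. Neuron 1 has zero input weight, so it only adds
# the same constant to both class scores and never changes the decision.
W1, b1 = [[1.0], [0.0]], [0.0, 1.0]
W2, b2 = [[-1.0, 0.5], [1.0, 0.5]], [0.0, 0.0]
data = [((-2.0,), 0), ((-1.0,), 0), ((1.0,), 1), ((2.0,), 1)]
print(prune_hidden(data, W1, b1, W2, b2))  # → {0}
```

Greedy removal in a fixed order is cheap but order-dependent; a criterion grounded in the decision boundaries, as the article proposes, is meant to judge each neuron's contribution more directly.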
Abstract:
Powdery mildew of rubber tree, caused by Oidium heveae, is an important disease of rubber plantations worldwide. Identification and classification of this fungus remain uncertain because there is no authoritative report of its morphology and no record of its teleomorphic stage. In this study, we compared five specimens of the rubber powdery mildew fungus collected in Malaysia, Thailand, and Brazil based on morphological and molecular characteristics. Morphological results showed that the fungus on rubber tree belongs to Oidium subgen. Pseudoidium. Nucleotide sequence analyses of the ribosomal DNA internal transcribed spacer (ITS) region and the large subunit rRNA gene (28S rDNA) were conducted to determine the relationships of the rubber powdery mildew fungus and to link this anamorphic fungus with its allied teleomorph. The results showed that the rDNA sequences of the two specimens from Malaysia were identical to those of a specimen from Thailand, whereas they differed by three bases from the two Brazilian isolates: one nucleotide position in the ITS2 and two positions in the 28S sequences. The ITS sequences of the two Brazilian isolates were identical to sequences of Erysiphe sp. on Quercus phillyraeoides collected in Japan, although the 28S sequences differed at one base from sequences of this fungus. Phylogenetic trees of both rDNA regions constructed by the distance and parsimony methods showed that the rubber powdery mildew fungus grouped with Erysiphe sp. on Q. phillyraeoides with 100% bootstrap support. Comparisons of the anamorph of two isolates of Erysiphe sp. from Q. phillyraeoides with the rubber mildew did not reveal any obvious differences between the two powdery mildew taxa, which suggests that O. heveae may be an anamorph of Erysiphe sp. on Q. phillyraeoides. Cross-inoculation tests are required to substantiate this conclusion. © The Mycological Society of Japan and Springer-Verlag 2005.
Abstract:
We present a molecular phylogenetic analysis of caenophidian (advanced) snakes using sequences from two mitochondrial genes (12S and 16S rRNA) and one nuclear (c-mos) gene (1681 total base pairs), with 131 terminal taxa sampled from throughout all major caenophidian lineages but focusing on Neotropical xenodontines. Direct optimization parsimony analysis resulted in a well-resolved phylogenetic tree, which corroborates some clades identified in previous analyses and suggests new hypotheses for the composition and relationships of others. The major salient points of our analysis are: (1) placement of Acrochordus, xenodermatids, and pareatids as successive outgroups to all remaining caenophidians (including viperids, elapids, atractaspidids, and all other colubrid groups); (2) within the latter group, viperids and homalopsids are successive sister clades to all remaining snakes; (3) the following monophyletic clades within crown group caenophidians: Afro-Asian psammophiids (including Mimophis from Madagascar), Elapidae (including hydrophiines but excluding Homoroselaps), Pseudoxyrhophiinae, Colubrinae, Natricinae, Dipsadinae, and Xenodontinae. Homoroselaps is associated with atractaspidids. Our analysis suggests some taxonomic changes within xenodontines, including new taxonomy for Alsophis elegans and Liophis amarali, and further taxonomic changes within Xenodontini and the West Indian radiation of xenodontines. Based on our molecular analysis, we present a revised classification for caenophidians and provide morphological diagnoses for many of the included clades; we also highlight groups where much more work is needed. We name as new two higher taxonomic clades within Caenophidia, one new subfamily within Dipsadidae, and, within Xenodontinae, five new tribes, six new genera and two resurrected genera. We synonymize Xenoxybelis and Pseudablabes with Philodryas; Erythrolamprus with Liophis; and Lystrophis and Waglerophis with Xenodon.
Abstract:
This paper describes an investigation of a hybrid PSO/ACO algorithm for automatically classifying well-drilling operation stages. The method's feasibility is demonstrated by applying it to a real mud-logging dataset. The results are compared with those of bio-inspired methods and of rule induction and decision tree algorithms for data mining. © 2009 Springer Berlin Heidelberg.
Abstract:
The cultivation of temperate-climate fruit plants in tropical or subtropical regions can be a good income alternative for producers. However, because little information exists about the cultivation of these fruit plants, producers use techniques imported from other producing areas, or even a combination of practices used for other fruit plants, notably foliar spraying of micronutrients without an appropriate scientific basis. In this context, the objective of this study was to verify the effect of foliar spraying of B and Zn on the productivity and fruit quality of Japanese pear trees. The experiment was conducted from 2004 to 2005 in Ilha Solteira, in northwestern São Paulo State, Brazil. According to the Köppen classification, the climate is tropical wet and dry (Aw). The 'Okusankichi' cultivar grafted on Pyrus communis L. rootstock was used, with doses of 110 g ha⁻¹ of B and 250 g ha⁻¹ of Zn in each application. The treatments were: T1, water; T2, boric acid; T3, zinc sulfate; T4, T2 + T3; T5, boric acid + urea + citric acid + EDTA; T6, zinc sulfate + urea + citric acid + EDTA; T7, T5 + T6; T8, boric acid + urea + citric acid + EDTA + sodium molybdate + sulfur + calcium chloride; T9, zinc sulfate + urea + citric acid + EDTA + Fe sulfate + Mn sulfate + Mg sulfate; and T10, T8 + T9. A randomized block design was used, and means were compared by Tukey's test. In the first crop, the mixture of boric acid with chelating agents was efficient in supplying B to the plants, and zinc sulfate plus chelating agents was efficient in increasing leaf Zn content. However, productivity and fruit quality were not influenced by the foliar spraying of B and Zn.
In the second crop, leaf B and Zn contents and productivity were not influenced by foliar spraying; boric acid and zinc sulfate, with or without chelating agents, increased total soluble solids content, and boric acid, with or without chelating agents, increased total titratable acidity.
Abstract:
In this paper we shed light on the problem of the efficiency and effectiveness of image classification in large datasets. As the amount of data to be processed and classified has increased in recent years, faster and more precise pattern recognition algorithms are needed to perform online and offline training and classification. We deal here with the problem of classifying moist areas in radar images quickly. Experimental results using Optimum-Path Forest and its training set pruning algorithm are also provided and discussed. © 2011 IEEE.
Abstract:
Pattern recognition in large amounts of data has been paramount in the last decade, since it is not straightforward to design interactive and real-time classification systems. Very recently, the Optimum-Path Forest (OPF) classifier was proposed to overcome such limitations, together with its training set pruning algorithm, which requires a parameter that, to date, has been set empirically. In this paper, we propose a Harmony Search-based algorithm that can find near-optimal values for this parameter. The experimental results show that our algorithm is able to find proper values for the OPF pruning algorithm parameter. © 2011 IEEE.
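For readers unfamiliar with Harmony Search, the following is a minimal sketch of the basic algorithm applied to a single bounded parameter. It assumes the parameter lies in a known interval and treats the objective as a black box; the lambda used here is a hypothetical stand-in for "validation accuracy of OPF after pruning with parameter p", and the memory size, HMCR, PAR, bandwidth, and iteration count are illustrative defaults, not values from the paper.

```python
import random
from typing import Callable, Tuple


def harmony_search(objective: Callable[[float], float], low: float, high: float,
                   hms: int = 10, hmcr: float = 0.9, par: float = 0.3,
                   bandwidth: float = 0.05, iterations: int = 2000,
                   seed: int = 0) -> Tuple[float, float]:
    """Maximize a 1-D objective on [low, high] with basic Harmony Search."""
    rng = random.Random(seed)
    memory = [rng.uniform(low, high) for _ in range(hms)]  # harmony memory
    scores = [objective(v) for v in memory]
    for _ in range(iterations):
        if rng.random() < hmcr:
            value = rng.choice(memory)  # memory consideration
            if rng.random() < par:
                value += rng.uniform(-bandwidth, bandwidth)  # pitch adjustment
        else:
            value = rng.uniform(low, high)  # random improvisation
        value = min(max(value, low), high)  # keep the candidate in range
        score = objective(value)
        worst = min(range(hms), key=scores.__getitem__)
        if score > scores[worst]:  # replace the worst harmony if improved
            memory[worst], scores[worst] = value, score
    best = max(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


# Hypothetical stand-in for "validation accuracy of OPF after pruning with parameter p".
best_p, best_score = harmony_search(lambda p: -(p - 0.3) ** 2, 0.0, 1.0)
print(round(best_p, 2))
```

In a real setting each objective call would retrain or re-prune the classifier, so the iteration budget, rather than the search logic, dominates the cost; Harmony Search only needs objective values, not gradients, which suits a parameter like this one.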
Abstract:
Forty-two White Leghorn laying hens of the commercial Cuban hybrid L-33 were used for eight weeks during the laying peak (36 to 43 weeks of age) to assess the substitution of corn by cassava root meal (Manihot esculenta Crantz) and of crude soybean oil by crude African palm oil (Elaeis guineensis J.) in laying hen diets. Analysis of variance was conducted according to a simple (one-way) classification design, with three treatments and 14 repetitions (one hen per cage). The treatments consisted of three diets (1: corn meal + soybean oil; 2: 25% cassava meal + African palm oil; 3: 53% cassava meal + African palm oil), with 15.71% CP, 3.83% Ca, and 0.36% available P. Viability was 100% in all treatments. No differences were found in laying rate (92.21, 92.09 and 91.59%, surpassing the 90% potential of this hybrid during the laying peak), feed conversion (118 g feed/egg in all three treatments), egg mass produced (3,066, 3,114 and 3,071 g/bird), or mass conversion (1.99, 1.95 and 1.98 feed consumed/egg mass). Egg yolk pigmentation decreased as the level of cassava meal in the diets increased (6, 4 and 3 on the Roche scale), as did the cost of feed consumed per hen over 56 d (2.56, 2.15 and 1.83 USD/bird). It was determined that corn meal can be totally replaced by cassava meal, and soybean oil by African palm oil, in the diets of laying hens during the laying peak, with a positive economic effect and without impairing the birds' productive performance.