939 results for tree-augmented-Naive Bayes structure


Relevance: 100.00%

Abstract:

The Tree Augmented Naïve Bayes (TAN) classifier relaxes the sweeping independence assumptions of the Naïve Bayes approach by taking account of conditional probabilities. It does this in a limited sense, by incorporating the conditional probability of each attribute given the class and (at most) one other attribute. The method of boosting has previously proven very effective in improving the performance of Naïve Bayes classifiers and in this paper, we investigate its effectiveness on application to the TAN classifier.
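In a TAN, each attribute has the class plus at most one other attribute as parents. That structure is classically learned by weighting every attribute pair with its conditional mutual information given the class, I(X_i; X_j | C), and keeping a maximum-weight spanning tree. The sketch below assumes discrete attributes stored as tuples, uses plain empirical frequencies, and roots the tree at attribute 0. The function names are hypothetical, and the boosting studied in this paper is not included.

```python
import math
from collections import Counter


def conditional_mutual_information(data, labels, i, j):
    """Empirical I(X_i; X_j | C) in nats from discrete samples."""
    n = len(labels)
    c_ijc = Counter((row[i], row[j], c) for row, c in zip(data, labels))
    c_ic = Counter((row[i], c) for row, c in zip(data, labels))
    c_jc = Counter((row[j], c) for row, c in zip(data, labels))
    c_c = Counter(labels)
    cmi = 0.0
    for (xi, xj, c), count in c_ijc.items():
        # p(xi, xj, c) * log[ p(xi, xj | c) / (p(xi | c) * p(xj | c)) ]
        cmi += (count / n) * math.log(count * c_c[c] / (c_ic[(xi, c)] * c_jc[(xj, c)]))
    return cmi


def learn_tan_structure(data, labels, root=0):
    """Maximum spanning tree (Prim) over CMI weights, directed away from the root.

    Returns {attribute: parent attribute or None}; in a TAN every attribute
    additionally has the class as a parent.
    """
    d = len(data[0])
    weight = {(i, j): conditional_mutual_information(data, labels, i, j)
              for i in range(d) for j in range(i + 1, d)}

    def w(a, b):
        return weight[(min(a, b), max(a, b))]

    parent = {root: None}
    # best[v] = (weight, tree node) of the heaviest edge linking v to the current tree
    best = {v: (w(root, v), root) for v in range(d) if v != root}
    while best:
        v = max(best, key=lambda u: best[u][0])
        parent[v] = best.pop(v)[1]
        for u in best:
            if w(v, u) > best[u][0]:
                best[u] = (w(v, u), v)
    return parent
```

Growing the tree from the root with Prim's algorithm yields edges that already point away from the root, so no separate orientation step is needed.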

Relevance: 100.00%

Abstract:

This work proposes an extended version of the well-known tree-augmented naive Bayes (TAN) classifier where the structure learning step is performed without requiring features to be connected to the class. Based on a modification of Edmonds’ algorithm, our structure learning procedure explores a superset of the structures that are considered by TAN, yet achieves global optimality of the learning score function in a very efficient way (quadratic in the number of features, the same complexity as learning TANs). A range of experiments shows that we obtain models with better accuracy than TAN and comparable to the accuracy of the state-of-the-art classifier averaged one-dependence estimator.

Relevance: 100.00%

Abstract:

This work proposes an extended version of the well-known tree-augmented naive Bayes (TAN) classifier where the structure learning step is performed without requiring features to be connected to the class. Based on a modification of Edmonds' algorithm, our structure learning procedure explores a superset of the structures that are considered by TAN, yet achieves global optimality of the learning score function in a very efficient way (quadratic in the number of features, the same complexity as learning TANs). We enhance our procedure with a new score function that only takes into account arcs that are relevant to predict the class, as well as an optimization over the equivalent sample size during learning. These ideas may be useful for structure learning of Bayesian networks in general. A range of experiments shows that we obtain models with better prediction accuracy than naive Bayes and TAN, and comparable to the accuracy of the state-of-the-art classifier averaged one-dependence estimator (AODE). We release our implementation of ETAN so that it can be easily installed and run within Weka.

Relevance: 100.00%

Abstract:

This work presents a new general purpose classifier named Averaged Extended Tree Augmented Naive Bayes (AETAN), which is based on combining the advantageous characteristics of Extended Tree Augmented Naive Bayes (ETAN) and Averaged One-Dependence Estimator (AODE) classifiers. We describe the main properties of the approach and algorithms for learning it, along with an analysis of its computational time complexity. Empirical results with numerous data sets indicate that the new approach is superior to ETAN and AODE in terms of both zero-one classification accuracy and log loss. It also compares favourably against weighted AODE and hidden Naive Bayes. The learning phase of the new approach is slower than that of its competitors, while the time complexity for the testing phase is similar. Such characteristics suggest that the new classifier is ideal in scenarios where online learning is not required.
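AETAN itself is not reproduced here, but the AODE component it builds on can be sketched briefly. Each attribute takes a turn as a "super-parent" of all the others, and the resulting one-dependence estimates of the joint P(y, x) are averaged. This minimal version assumes discrete attributes and Laplace smoothing, and it omits the minimum-frequency threshold that AODE implementations usually apply to super-parent values. The class and method names are hypothetical.

```python
from collections import Counter


class AODE:
    """Averaged One-Dependence Estimator for discrete attributes (Laplace smoothing)."""

    def fit(self, data, labels):
        self.n = len(labels)
        self.classes = sorted(set(labels))
        self.d = len(data[0])
        # number of distinct values seen for each attribute (used for smoothing)
        self.card = [len({row[k] for row in data}) for k in range(self.d)]
        self.pair = Counter()    # counts of (y, i, x_i)
        self.triple = Counter()  # counts of (y, i, x_i, j, x_j) for j != i
        for row, y in zip(data, labels):
            for i in range(self.d):
                self.pair[(y, i, row[i])] += 1
                for j in range(self.d):
                    if j != i:
                        self.triple[(y, i, row[i], j, row[j])] += 1
        return self

    def joint(self, row, y):
        """Average over super-parents i of P(y, x_i) * prod_{j != i} P(x_j | y, x_i)."""
        total = 0.0
        for i in range(self.d):
            n_yi = self.pair[(y, i, row[i])]
            p = (n_yi + 1) / (self.n + len(self.classes) * self.card[i])
            for j in range(self.d):
                if j != i:
                    p *= (self.triple[(y, i, row[i], j, row[j])] + 1) / (n_yi + self.card[j])
            total += p
        return total / self.d

    def predict(self, row):
        return max(self.classes, key=lambda y: self.joint(row, y))
```

Training only needs one pass to collect pair and triple counts. That is one reason AODE-style models are attractive, and it is consistent with the abstract's point that AETAN's extra cost lies mainly in the learning phase.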

Relevance: 100.00%

Abstract:

In this paper, we apply the incremental EM method to Bayesian network classifiers to learn and interpret hyperspectral sensor data in robotic planetary missions. Hyperspectral image spectroscopy is an emerging technique for geological investigations from airborne or orbital sensors. Many spacecraft carry spectroscopic equipment because wavelengths outside the visible part of the electromagnetic spectrum give much more information about an object. The algorithm used is an extension of the standard Expectation Maximisation (EM) algorithm. The incremental method allows us to learn and interpret the data as they become available. Two Bayesian network classifiers were tested: the Naive Bayes and the Tree-Augmented Naive Bayes structures. Our preliminary experiments show that incremental learning with unlabelled data can improve the accuracy of the classifier.

Relevance: 100.00%

Abstract:

We present TANC, a TAN classifier (tree-augmented naive) based on imprecise probabilities. TANC models prior near-ignorance via the Extreme Imprecise Dirichlet Model (EDM). A first contribution of this paper is the experimental comparison between EDM and the global Imprecise Dirichlet Model using the naive credal classifier (NCC), with the aim of showing that EDM is a sensible approximation of the global IDM. TANC is able to deal with missing data in a conservative manner by considering all possible completions (without assuming them to be missing-at-random), but avoiding an exponential increase of the computational time. By experiments on real data sets, we show that TANC is more reliable than the Bayesian TAN and that it provides better performance compared to previous TANs based on imprecise probabilities. Yet, TANC is sometimes outperformed by NCC because the learned TAN structures are too complex; this calls for novel algorithms for learning the TAN structures, better suited for an imprecise probability classifier.

Relevance: 100.00%

Abstract:

In this paper we present TANC, i.e., a tree-augmented naive credal classifier based on imprecise probabilities; it models prior near-ignorance via the Extreme Imprecise Dirichlet Model (EDM) (Cano et al., 2007) and deals conservatively with missing data in the training set, without assuming them to be missing-at-random. The EDM is an approximation of the global Imprecise Dirichlet Model (IDM) that considerably simplifies the computation of upper and lower probabilities; yet, since it was only recently introduced, the quality of the approximation it provides still needs to be verified. As a first contribution, we extensively compare the output of the naive credal classifier (one of the few cases in which the global IDM can be exactly implemented) when learned with the EDM and with the global IDM; the output of the classifier appears to be identical in the vast majority of cases, thus supporting the adoption of the EDM in real classification problems. Then, through experiments we show that TANC is more reliable than the precise TAN (learned with a uniform prior), and also that it provides better performance than a previous TAN model based on imprecise probabilities (Zaffalon, 2003). TANC treats missing data by considering all possible completions of the training set, while avoiding an exponential increase in computational time; finally, we present some preliminary results with missing data.
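Take a single categorical variable with counts n_k, total N and IDM hyperparameter s. The IDM gives lower and upper probabilities n_k / (N + s) and (n_k + s) / (N + s). The EDM keeps only the extreme priors, the ones that put all of the prior mass s on a single category. The sketch below is restricted to this single-variable setting and uses hypothetical function names. It enumerates the extreme posteriors and checks that their minimum and maximum reproduce the IDM bounds. It does not implement TANC or its handling of missing data.

```python
from fractions import Fraction


def edm_extreme_points(counts, s=1):
    """Posterior category probabilities for each extreme prior (all mass s on one category).

    s is assumed to be an integer so that exact Fractions can be used.
    """
    total = sum(counts) + s
    return [
        [Fraction(n + (s if k == m else 0), total) for k, n in enumerate(counts)]
        for m in range(len(counts))
    ]


def idm_bounds(counts, s=1):
    """Lower and upper IDM probabilities for each category."""
    total = sum(counts) + s
    lower = [Fraction(n, total) for n in counts]
    upper = [Fraction(n + s, total) for n in counts]
    return lower, upper


counts = [3, 1]
extremes = edm_extreme_points(counts)
lower, upper = idm_bounds(counts)
for k in range(len(counts)):
    # for a single variable, the EDM extremes attain exactly the IDM bounds
    assert min(p[k] for p in extremes) == lower[k]
    assert max(p[k] for p in extremes) == upper[k]
```

For a single variable the two models agree on these bounds. The approximation question raised in the abstract concerns classifier-level computations that combine many such estimates.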

Relevance: 100.00%

Abstract:

This paper aims to identify, from a finite set of within-domain goals, the communication goal(s) of a user's natural-language information-seeking query. It proposes using Tree-Augmented Naive Bayes networks (TANs) for goal detection. The problem is formulated as N binary decisions, each performed by a TAN. A comparative study was carried out against Naive Bayes, fully-connected TANs, and multi-layer neural networks. Experimental results show that TANs consistently give better results when tested on the ATIS and DARPA Communicator corpora.

Relevance: 100.00%

Abstract:

Aims/hypothesis: Diabetic nephropathy is a major diabetic complication, and diabetes is the leading cause of end-stage renal disease (ESRD). Family studies suggest a hereditary component for diabetic nephropathy. However, only a few genes have been associated with diabetic nephropathy or ESRD in diabetic patients. Our aim was to detect novel genetic variants associated with diabetic nephropathy and ESRD. Methods: We exploited a novel algorithm, ‘Bag of Naive Bayes’, whose marker selection strategy is complementary to that of conventional genome-wide association models based on univariate association tests. The analysis was performed on a genome-wide association study of 3,464 patients with type 1 diabetes from the Finnish Diabetic Nephropathy (FinnDiane) Study and subsequently replicated with 4,263 type 1 diabetes patients from the Steno Diabetes Centre, the All Ireland-Warren 3-Genetics of Kidneys in Diabetes UK collection (UK–Republic of Ireland) and the Genetics of Kidneys in Diabetes US Study (GoKinD US). Results: Five genetic loci (WNT4/ZBTB40-rs12137135, RGMA/MCTP2-rs17709344, MAPRE1P2-rs1670754, SEMA6D/SLC24A5-rs12917114 and SIK1-rs2838302) were associated with ESRD in the FinnDiane study. An association between ESRD and rs17709344, tagging the previously identified rs12437854 and located between the RGMA and MCTP2 genes, was replicated in independent case–control cohorts. rs12917114 near SEMA6D was associated with ESRD in the replication cohorts under the genotypic model (p < 0.05), and rs12137135 upstream of WNT4 was associated with ESRD in Steno. Conclusions/interpretation: This study supports the previously identified findings on the RGMA/MCTP2 region and suggests novel susceptibility loci for ESRD. This highlights the importance of applying complementary statistical methods to detect novel genetic variants in diabetic nephropathy and, in general, in complex diseases.

Relevance: 100.00%

Abstract:

Changes in the configuration of a tree stem result in significant differences in its total volume and in the proportion of that volume that is merchantable timber. Tree allometry, as represented by stem-form, is the result of the vertical force of gravity and the horizontal force of wind. The effect of wind force is demonstrated in the relationship between stem-form, stand-closure and site-conditions. An increase in wind force on the individual tree due to a decrease in stand density should produce a more tapered tree. The density of the stand is determined by the conditions that the trees are growing under. The ability of the tree to respond to increased wind force may also be a function of these conditions. This stem-form/stand-closure/site-conditions relationship was examined using a pre-existing database from west-central Alberta. This database consisted of environmental, vegetation, soils and timber data covering a wide range of sites. The analysis was based on 653 sample trees with 82 variables. There were eight tree species (Pinus contorta, Picea mariana, Picea engelmannii x glauca, Abies lasiocarpa, Larix laricina, Populus tremuloides, Betula papyrifera and Populus balsamifera) plus a comprehensive all-species data set. As the actual conformation of the stem is very individual, stem-form was represented by the ratio of diameter at breast height to total height. The four stand-closure variables (crown closure, total basal area, total volume and total number of stems) were reduced to total basal area and total number of stems using a bivariate correlation matrix by species. Site-conditions were subdivided into macro, meso and micro variables and reduced in number using cross-tabulations, bivariate correlation and principal components analysis as screening tools.
The stem-form/stand-closure relationship was examined using bivariate correlation coefficients for stem-form with total number of stems and for stem-form with total basal area. The stem-form/site-conditions and the stand-closure/site-conditions relationships were examined using multiple correlation coefficients. The stem-form/stand-closure/site-conditions relationship was examined using multiple correlation coefficients in separate analyses for total number of stems and for total basal area. For most species, an increase in stand-closure produced a decrease in stem-form for both total number of stems and total basal area. For most species, there was a significant relationship between stem-form and site-conditions and between stand-closure and site-conditions, for both total number of stems and total basal area. There was also a significant relationship between stem-form and site-conditions including stand-closure for most species; total number of stems contributed to the prediction of stem-form independently of the site-conditions, whereas total basal area did not. Larix laricina and Betula papyrifera were the exceptions to the trends observed in most species. The influence of both stand-closure (total number of stems in particular) and site-conditions (elevation in particular) suggests that forest management practices should include these ecological parameters when determining appropriate restocking levels.

Relevance: 100.00%

Abstract:

Speech processing and consequent recognition are important areas of Digital Signal Processing, since speech allows people to communicate more naturally and efficiently. In this work, a speech recognition system is developed for recognizing digits in Malayalam. Recognizing speech requires extracting features from it, so the feature extraction method plays an important role in speech recognition. Here, front-end processing for extracting the features is performed using two wavelet-based methods, namely Discrete Wavelet Transforms (DWT) and Wavelet Packet Decomposition (WPD). A Naive Bayes classifier is used for classification. With the Naive Bayes classifier, DWT produced a recognition accuracy of 83.5% and WPD produced an accuracy of 80.7%. This paper aims to devise a new feature extraction method that improves recognition accuracy. To this end, a new method called Discrete Wavelet Packet Decomposition (DWPD) is introduced, which uses the hybrid features of both DWT and WPD. Evaluated with the Naive Bayes classifier, this new approach produced an improved recognition accuracy of 86.2%.
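The sketch below illustrates DWT-based front-end processing. It computes a multi-level orthonormal Haar DWT and uses the energy of each sub-band as a feature vector that a classifier such as Naive Bayes could consume. The Haar wavelet, the number of levels and the energy summary are all assumptions for illustration; the abstract does not say which wavelet or feature summary the authors used. DWPD is not reproduced. WPD differs from the DWT in that it also splits the detail branch at every level.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    x = list(signal)
    if len(x) % 2:
        x.append(x[-1])  # pad odd-length input by repeating the last sample
    h = 1 / math.sqrt(2)
    approx = [(x[k] + x[k + 1]) * h for k in range(0, len(x), 2)]
    detail = [(x[k] - x[k + 1]) * h for k in range(0, len(x), 2)]
    return approx, detail


def dwt_energy_features(signal, levels=3):
    """Energy of each detail band (finest first), then of the final approximation band."""
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)  # only the approximation is split further (DWT)
        features.append(sum(c * c for c in detail))
    features.append(sum(c * c for c in approx))
    return features
```

Because the transform is orthonormal, the band energies of a power-of-two-length frame sum to the frame's total energy. This makes a quick sanity check when wiring such features into a classifier.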

Relevance: 100.00%

Abstract:

There are numerous text documents available in electronic form. More and more are becoming available every day. Such documents represent a massive amount of information that is easily accessible. Seeking value in this huge collection requires organization; much of the work of organizing documents can be automated through text classification. The accuracy and our understanding of such systems greatly influence their usefulness. In this paper, we seek 1) to advance the understanding of commonly used text classification techniques, and 2) through that understanding, to improve the tools that are available for text classification. We begin by clarifying the assumptions made in the derivation of Naive Bayes, noting basic properties and proposing ways for its extension and improvement. Next, we investigate the quality of Naive Bayes parameter estimates and their impact on classification. Our analysis leads to a theorem that explains the improvements that can be found in multiclass classification with Naive Bayes using Error-Correcting Output Codes. We use experimental evidence on two commonly used data sets to exhibit an application of the theorem. Finally, we show fundamental flaws in a commonly used feature selection algorithm and develop a statistics-based framework for text feature selection. Greater understanding of Naive Bayes and the properties of text allows us to make better use of it in text classification.
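Multiclass classification with Error-Correcting Output Codes works as follows. Each class is assigned a binary codeword, and one binary classifier (for instance a Naive Bayes model) is trained per code bit. A test example then goes to the class whose codeword is nearest, in Hamming distance, to the vector of predicted bits. The sketch below shows only this decoding step. It uses the 4-class exhaustive code, in which every pair of codewords differs in 4 bits, so any single wrong bit is corrected. The names are hypothetical, and neither the binary classifiers nor the paper's theorem are reproduced.

```python
# 4-class exhaustive code: every pair of codewords differs in exactly 4 of the 7 bits
CODE = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 0],
]


def hamming(a, b):
    """Number of positions in which two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(code, predicted_bits):
    """Index of the class whose codeword is nearest to the predicted bit vector."""
    return min(range(len(code)), key=lambda k: hamming(code[k], predicted_bits))
```

With minimum distance 4, one flipped bit leaves the true codeword at distance 1 and every other codeword at distance 3 or more.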

Relevance: 100.00%

Abstract:

In an area of tropical seasonal semideciduous forest, the soil characteristics, floristic composition, physiognomic structure, and the distribution of three regeneration and three dispersal guilds were studied in four stands within the forest that had documented histories of varying degrees of human disturbance. The aim was to study forest regeneration in areas of preserved forest and secondary forest, with parts of both types of forest experiencing either 'intensive' or 'occasional' cattle trampling. The study was carried out in the Sebastiao Aleixo da Silva Ecological Station, Bauru, São Paulo State, Brazil. Two stands were called 'secondary' because they corresponded to forest tracts that had been felled and occupied by crops and pastures and then abandoned to forest regeneration ca. 40 years before this study. The other two stands, called 'preserved', corresponded to areas of the fragment where the forest has been maintained with only minor human impacts. The arboreal component of the tree community (diameter at breast height, dbh, ≥ 5 cm) was sampled in 20 plots of 40 m × 40 m, and the subarboreal component (diameter at the stem base, dbs, < 5 cm and height ≥ 0.5 m) in subplots of 40 m × 2 m. Physiognomic features, such as canopy height and density of climbing plants, were recorded across a 5 m × 5 m grid laid over the sample plots. Soil bulk samples were collected for chemical and textural analyses. Most of the differences detected contrasted the secondary with the preserved stands. The soils of the secondary stands showed higher proportions of sand and lower levels of mineral nutrients and organic matter than those of the preserved stands, probably due to higher losses by leaching and erosion.
Compared to the secondary stands, the preserved ones had higher proportions of tall trees, higher mean canopy height, lower species diversity, higher abundance of autochorous and shade-tolerant climax species, and lower abundance of pioneer and light-demanding climax species. Despite the high proportion of species shared by the preserved and secondary stands (108 out of 139), they differed consistently in the density of the most abundant species. On the other hand, the secondary and preserved stands had similar values for tree density and basal area, suggesting that 40 years were enough to restore these features. Effects of cattle trampling on the vegetation were detected for the frequency of trees of anemochorous and zoochorous species, which were higher in the stands under occasional and intensive cattle trampling, respectively. The density of thin climbers was lower in the stands with intensive trampling. (C) 2004 Elsevier B.V. All rights reserved.

Relevance: 100.00%

Abstract:

All trees with diameter at breast height (dbh) ≥ 10.0 cm were stem-mapped in a "terra firme" tropical rainforest in the Brazilian Amazon, at the EMBRAPA Experimental Site, Manaus, Brazil. Specifically, the relationships of tree species with soil properties were determined using canonical correspondence analyses based on nine soil variables and 68 tree species. From the canonical correspondence analyses, the species were divided into two groups: one whose species occur mainly on sandy sites with low organic matter content, and another whose species occur mainly on dry, clayey sites. We then used Ripley's K function to analyze the distribution of species in 32 plots ranging from 2,500 m² to 20,000 m², to determine whether each group shows spatial aggregation as a result of soil variation. Significant spatial aggregation for the two groups was found only in sampling units of over 10,000 m², particularly for species found in clayey soils and drier environments, where the sampling units investigated seemed to meet the species' requirements. Soil variables, mediated by topographic position, influenced species spatial aggregation, mainly at intermediate to large distances (≥ 20 m). Based on our findings, we conclude that environmental heterogeneity and a minimum sampling unit size of 10,000 m² should be considered in forest dynamics studies in order to understand the spatial processes structuring the "terra firme" tropical rainforest in the Brazilian Amazon.
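Ripley's K function counts, for each distance r, how many other points lie within r of a typical point, scaled by the point density. Under complete spatial randomness K(r) ≈ πr², so larger values indicate aggregation. The sketch below uses one common naive estimator without edge correction. That is a simplifying assumption, since field studies usually apply an edge correction. It also computes the variance-stabilizing L(r) = sqrt(K(r)/π). The function names and coordinates are hypothetical.

```python
import math


def ripley_k(points, r, area):
    """Naive K(r) = A / (n(n-1)) * #{ordered pairs i != j with distance <= r}; no edge correction."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    close_pairs = sum(
        1
        for a in range(n)
        for b in range(n)
        if a != b and math.dist(points[a], points[b]) <= r
    )
    return area * close_pairs / (n * (n - 1))


def ripley_l(points, r, area):
    """L(r) = sqrt(K(r) / pi); roughly equal to r under complete spatial randomness."""
    return math.sqrt(ripley_k(points, r, area) / math.pi)
```

Comparing L(r) with r over a range of distances is a simple way to see at which scales a group of stems departs from randomness. That is the kind of scale dependence the abstract reports for plots of different sizes.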