922 results for Tree planting
Abstract:
We present TANC, a tree-augmented naive (TAN) classifier based on imprecise probabilities. TANC models prior near-ignorance via the Extreme Imprecise Dirichlet Model (EDM). A first contribution of this paper is the experimental comparison between EDM and the global Imprecise Dirichlet Model using the naive credal classifier (NCC), with the aim of showing that EDM is a sensible approximation of the global IDM. TANC is able to deal with missing data in a conservative manner by considering all possible completions (without assuming them to be missing-at-random), while avoiding an exponential increase of the computational time. By experiments on real data sets, we show that TANC is more reliable than the Bayesian TAN and that it performs better than previous TANs based on imprecise probabilities. Yet, TANC is sometimes outperformed by NCC because the learned TAN structures are too complex; this calls for novel algorithms for learning the TAN structures, better suited for an imprecise probability classifier.
Abstract:
Retrospective clinical datasets are often characterized by a relatively small sample size and many missing data. In this case, a common way of handling the missingness is to discard patients with missing covariates from the analysis, which further reduces the sample size. Alternatively, if the mechanism that generated the missing data allows, incomplete data can be imputed on the basis of the observed data, avoiding the reduction of the sample size and allowing methods designed for complete data to be applied later on. Moreover, methodologies for data imputation might depend on the particular purpose and might achieve better results by considering specific characteristics of the domain. The problem of missing data treatment is studied in the context of survival tree analysis for the estimation of a prognostic patient stratification. Survival tree methods usually address this problem by using surrogate splits, that is, splitting rules that use other variables yielding similar results to the original ones. Instead, our methodology consists in modeling the dependencies among the clinical variables with a Bayesian network, which is then used to perform data imputation, thus allowing the survival tree to be applied to the completed dataset. The Bayesian network is learned directly from the incomplete data using a structural expectation–maximization (EM) procedure in which the maximization step is performed with an exact anytime method, so that the only source of approximation is the EM formulation itself. On both simulated and real data, our proposed methodology usually outperformed several existing methods for data imputation, and the imputation so obtained improved the stratification estimated by the survival tree (especially with respect to using surrogate splits).
Abstract:
In this paper we present TANC, a tree-augmented naive credal classifier based on imprecise probabilities; it models prior near-ignorance via the Extreme Imprecise Dirichlet Model (EDM) (Cano et al., 2007) and deals conservatively with missing data in the training set, without assuming them to be missing-at-random. The EDM is an approximation of the global Imprecise Dirichlet Model (IDM), which considerably simplifies the computation of upper and lower probabilities; yet, since it has only recently been introduced, the quality of the approximation it provides still needs to be verified. As a first contribution, we extensively compare the output of the naive credal classifier (one of the few cases in which the global IDM can be exactly implemented) when learned with the EDM and with the global IDM; the output of the classifier appears to be identical in the vast majority of cases, thus supporting the adoption of the EDM in real classification problems. Then, by experiments, we show that TANC is more reliable than the precise TAN (learned with uniform prior), and also that it performs better than a previous TAN model based on imprecise probabilities (Zaffalon, 2003). TANC treats missing data by considering all possible completions of the training set, while avoiding an exponential increase of the computational time; finally, we present some preliminary results with missing data.
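As a rough illustration of what "upper and lower probabilities" means here, the sketch below computes the standard Imprecise Dirichlet Model interval for a single categorical variable: with counts n_i, total N and hyperparameter s, the probability of category i lies in [n_i / (N + s), (n_i + s) / (N + s)]. This covers only the single-variable case; it is not the global IDM over a classifier, nor the TANC or EDM procedure from the paper. The function name `idm_bounds` and the example counts are assumptions made for illustration.

```python
def idm_bounds(counts, s=1.0):
    """Lower/upper probability of each category under a single-variable IDM.

    counts: dict mapping category -> observed count (hypothetical data)
    s:      IDM hyperparameter (prior strength), assumed > 0
    """
    total = sum(counts.values())
    return {
        category: (n / (total + s), (n + s) / (total + s))
        for category, n in counts.items()
    }


if __name__ == "__main__":
    # Illustrative counts only; not taken from any data set in the paper.
    for category, (lower, upper) in idm_bounds({"a": 3, "b": 1}, s=1.0).items():
        print(category, lower, upper)
```

The width of every interval is s / (N + s), so the imprecision shrinks as more data are observed.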
Abstract:
This paper strengthens the NP-hardness result for the (partial) maximum a posteriori (MAP) problem in Bayesian networks with topology of trees (every variable has at most one parent) and variable cardinality at most three. MAP is the problem of querying the most probable state configuration of some (not necessarily all) of the network variables given evidence. It is demonstrated that the problem remains hard even in such simplistic networks.
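To make the (partial) MAP query concrete, here is a minimal brute-force sketch on a tiny tree-shaped network in which every variable has at most one parent and at most three states. It maximizes over the query variables while summing out the remaining unobserved ones. The network, its probability tables and the names `joint` and `partial_map` are all hypothetical; exhaustive enumeration is used only for illustration and is exponential in the number of variables, which is consistent with the hardness result rather than a way around it.

```python
from itertools import product

# Hypothetical tree: A is the root; B and C each have A as their only parent.
parent = {"A": None, "B": "A", "C": "A"}
states = {"A": [0, 1], "B": [0, 1, 2], "C": [0, 1]}
# cpt[var][parent_state][state]; the root uses the key None.
cpt = {
    "A": {None: [0.6, 0.4]},
    "B": {0: [0.7, 0.2, 0.1], 1: [0.1, 0.3, 0.6]},
    "C": {0: [0.9, 0.1], 1: [0.2, 0.8]},
}


def joint(assign):
    """Joint probability of a full assignment (product of the local tables)."""
    p = 1.0
    for var, par in parent.items():
        key = None if par is None else assign[par]
        p *= cpt[var][key][assign[var]]
    return p


def partial_map(query, evidence):
    """Return the most probable query configuration and its joint score
    P(query, evidence), summing out all other (hidden) variables."""
    hidden = [v for v in parent if v not in query and v not in evidence]
    best, best_score = None, -1.0
    for q in product(*(states[v] for v in query)):
        fixed = {**evidence, **dict(zip(query, q))}
        score = sum(
            joint({**fixed, **dict(zip(hidden, h))})
            for h in product(*(states[v] for v in hidden))
        )
        if score > best_score:
            best, best_score = dict(zip(query, q)), score
    return best, best_score


if __name__ == "__main__":
    # Query only A, observe C = 1, and marginalize B.
    print(partial_map(["A"], {"C": 1}))
```

When the query covers every unobserved variable, the sum disappears and the same function returns the most probable full explanation.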
Abstract:
This work presents a new general purpose classifier named Averaged Extended Tree Augmented Naive Bayes (AETAN), which is based on combining the advantageous characteristics of Extended Tree Augmented Naive Bayes (ETAN) and Averaged One-Dependence Estimator (AODE) classifiers. We describe the main properties of the approach and algorithms for learning it, along with an analysis of its computational time complexity. Empirical results with numerous data sets indicate that the new approach is superior to ETAN and AODE in terms of both zero-one classification accuracy and log loss. It also compares favourably against weighted AODE and hidden Naive Bayes. The learning phase of the new approach is slower than that of its competitors, while the time complexity for the testing phase is similar. Such characteristics suggest that the new classifier is ideal in scenarios where online learning is not required.
Abstract:
This work proposes an extended version of the well-known tree-augmented naive Bayes (TAN) classifier where the structure learning step is performed without requiring features to be connected to the class. Based on a modification of Edmonds' algorithm, our structure learning procedure explores a superset of the structures that are considered by TAN, yet achieves global optimality of the learning score function in a very efficient way (quadratic in the number of features, the same complexity as learning TANs). We enhance our procedure with a new score function that only takes into account arcs that are relevant to predict the class, as well as an optimization over the equivalent sample size during learning. These ideas may be useful for structure learning of Bayesian networks in general. A range of experiments shows that we obtain models with better prediction accuracy than naive Bayes and TAN, and comparable to the accuracy of the state-of-the-art classifier averaged one-dependence estimator (AODE). We release our implementation of ETAN so that it can be easily installed and run within Weka.
Abstract:
Learning Bayesian networks with bounded tree-width has attracted much attention recently, because low tree-width allows exact inference to be performed efficiently. Some existing methods [12, 14] tackle the problem by using k-trees to learn the optimal Bayesian network with tree-width up to k. In this paper, we propose a sampling method to efficiently find representative k-trees by introducing an Informative score function to characterize the quality of a k-tree. The proposed algorithm can efficiently learn a Bayesian network with tree-width at most k. Experimental results indicate that our approach is comparable with exact methods, but is much more computationally efficient.
Abstract:
Bounding the tree-width of a Bayesian network can reduce the chance of overfitting, and allows exact inference to be performed efficiently. Several existing algorithms tackle the problem of learning bounded tree-width Bayesian networks by learning from k-trees as super-structures, but they do not scale to large domains and/or large tree-width. We propose a guided search algorithm to find k-trees with maximum Informative score, a measure of how well a k-tree yields good Bayesian networks. The algorithm achieves close to optimal performance compared to exact solutions in small domains, and can discover better networks than existing approximate methods in large domains. It also provides an optimal elimination order of variables that guarantees small complexity for later runs of exact inference. Comparisons with well-known approaches in terms of learning and inference accuracy illustrate its capabilities.
Abstract:
Master's dissertation, Biological Engineering, Universidade do Algarve, 2009
Abstract:
Aguardente de medronho is the name given in Portugal to a spirit made from the fermented fruit of Arbutus unedo (strawberry tree), a plant grown in the Mediterranean region. In order to gain a better understanding of the fermentation process as it is performed on farms, a natural fermentation with wild microbiota was carried out over 36 days, and some physicochemical and microbiological parameters were studied. The microbial parameters analyzed were total viable counts, lactic and acetic acid bacteria counts, and yeast counts. The physicochemical parameters monitored were sugars, minerals, ethanol, organic acids and pH. Yeasts were mainly responsible for the fermentation of the fruit, as lactic and acetic acid bacteria were absent. The sugars increased during the first 2 days and then gradually decreased over the rest of the fermentation period. Maintaining the good quality of the product could contribute to the preservation and valorization of traditional resources, which is of great importance to prevent their disappearance.
Abstract:
Since 2004, several studies have been carried out in order to identify the main insect species that usually inhabit the olive ecosystem. The field trials took place in two olive groves, one situated in Olhão and the other in Loulé, both in the Algarve and both under Integrated Pest Management (IPM). The sampling techniques used differed according to their purpose (sticky traps, pheromone traps, pitfall traps and samples of aerial parts of the trees such as inflorescences, leaves, fruits and branches). Results showed that the main insect pests of the olive tree in southern Portugal were the olive fruit fly Bactrocera oleae Gmelin (Diptera: Tephritidae) and the olive moth Prays oleae Bernard (Lepidoptera: Yponomeutidae). Other insect pests were also found in our olive groves, namely the olive psyllid Euphyllura olivina Costa (Homoptera: Psyllidae), the olive bark beetle Phloeotribus scarabaeoides Bernard (Coleoptera: Curculionidae), the Mediterranean black scale Saissetia oleae (Olivier) (Homoptera: Coccidae) and the olive thrips Liothrips oleae Costa (Thysanoptera: Phlaeothripidae). The auxiliary insects found in our olive groves belong to the following orders and families: Diptera (Syrphidae), Coleoptera (Carabidae, Coccinellidae and Staphylinidae), Hemiptera (Anthocoridae and Miridae), Neuroptera (Chrysopidae) and Hymenoptera (Braconidae, Encyrtidae, Eulophidae, Formicidae and Trichogrammatidae).
Abstract:
In the Mediterranean region, the fruits of the strawberry tree (Arbutus unedo L.) may be fermented and distilled to produce a traditional beverage that is much appreciated in Southern Europe. The aim of the present work was to study the diversity of the yeast population and the killer activity of the isolates identified as Saccharomyces cerevisiae, obtained during solid state industrial fermentations of the arbutus berries. The identification of the isolates was performed by the 5.8S rRNA-ITS region restriction analysis and by sequencing the D1/D2 region of the large subunit of the rRNA gene. At the start of the fermentations, various non-Saccharomyces species were detected, including Aureobasidium pullulans, Dothichiza pithyophila, Dioszegia zsoltii, Hanseniaspora uvarum and yeasts belonging to the genera Metschnikowia, Cryptococcus and Rhodotorula. However, as the biological processes progressed, the number of different species decreased, with S. cerevisiae and Pichia membranaefaciens becoming dominant at the advanced stages of must fermentation, which are characterized by high concentrations of ethanol. Forty-three isolates identified as S. cerevisiae were tested for killer activity against two sensitive reference strains and Zygosaccharomyces bailii. Their sensitivity to five reference killer toxins (K2, K5, K8, K9 and K10) was also studied. Of the isolates analyzed, 95.3% were sensitive and 4.7% were tolerant to the killer toxins tested. Only three isolates showed killer activity against one sensitive strain, and two of them against the spoilage yeast Z. bailii. The microbiota obtained showed an interesting potential for use as starter cultures to overcome unpredictable, uncontrolled fermentations of the arbutus fruits, as well as in other applications of biotechnological interest. (C) 2012 Elsevier Ltd. All rights reserved.
Abstract:
Secure group communication is a paradigm that primarily designates one-to-many communication security. Previous work on secure group communication has predominantly treated the whole network as a single group managed by a powerful central node capable of supporting heavy communication, computation and storage costs. However, a typical Wireless Sensor Network (WSN) may contain several groups, each maintained by a sensor node (the group controller) with constrained resources. Moreover, the previously proposed schemes require multicast routing support to deliver the rekeying messages, and multicast routing can incur heavy storage and communication overheads in a wireless sensor network. Because of these two major limitations, we propose a new secure group communication scheme with a lightweight rekeying process. Our proposal overcomes both limitations and can be applied to a homogeneous WSN with resource-constrained nodes, with no need for multicast routing support. The analysis and simulation results show that our scheme outperforms previous well-known solutions.