997 results for Tree islands


Relevance:

20.00%

Abstract:

Retrospective clinical datasets are often characterized by a relatively small sample size and a large amount of missing data. In this case, a common way of handling missingness is to discard patients with missing covariates from the analysis, which further reduces the sample size. Alternatively, if the mechanism that generated the missing data allows it, incomplete data can be imputed on the basis of the observed data, avoiding the reduction of the sample size and allowing methods that require complete data to be applied later on. Moreover, methodologies for data imputation might depend on the particular purpose and might achieve better results by considering specific characteristics of the domain. The problem of missing data treatment is studied in the context of survival tree analysis for the estimation of a prognostic patient stratification. Survival tree methods usually address this problem by using surrogate splits, that is, splitting rules that use other variables yielding similar results to the original ones. Instead, our methodology consists of modeling the dependencies among the clinical variables with a Bayesian network, which is then used to perform data imputation, thus allowing the survival tree to be applied to the completed dataset. The Bayesian network is learned directly from the incomplete data using a structural expectation–maximization (EM) procedure in which the maximization step is performed with an exact anytime method, so that the only source of approximation is the EM formulation itself. On both simulated and real data, our proposed methodology usually outperformed several existing methods for data imputation, and the resulting imputation improved the stratification estimated by the survival tree (especially compared with using surrogate splits).

Relevance:

20.00%

Abstract:

In this paper we present TANC, a tree-augmented naive credal classifier based on imprecise probabilities; it models prior near-ignorance via the Extreme Imprecise Dirichlet Model (EDM) (Cano et al., 2007) and deals conservatively with missing data in the training set, without assuming them to be missing-at-random. The EDM is an approximation of the global Imprecise Dirichlet Model (IDM), which considerably simplifies the computation of upper and lower probabilities; yet, having been only recently introduced, the quality of the provided approximation still needs to be verified. As a first contribution, we extensively compare the output of the naive credal classifier (one of the few cases in which the global IDM can be exactly implemented) when learned with the EDM and with the global IDM; the output of the classifier appears to be identical in the vast majority of cases, thus supporting the adoption of the EDM in real classification problems. Then, through experiments, we show that TANC is more reliable than the precise TAN (learned with a uniform prior), and also that it performs better than a previous TAN model based on imprecise probabilities (Zaffalon, 2003). TANC treats missing data by considering all possible completions of the training set while avoiding an exponential increase in computation time; finally, we present some preliminary results with missing data.

Relevance:

20.00%

Abstract:

This paper strengthens the NP-hardness result for the (partial) maximum a posteriori (MAP) problem in Bayesian networks with topology of trees (every variable has at most one parent) and variable cardinality at most three. MAP is the problem of querying the most probable state configuration of some (not necessarily all) of the network variables given evidence. It is demonstrated that the problem remains hard even in such simplistic networks.
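To make the definition of partial MAP above concrete, the following is a minimal brute-force sketch on a hypothetical four-variable chain A -> B -> C -> D, which is a tree in the sense used in the abstract (every variable has at most one parent). The network, the binary cardinalities, and all probability values are invented for illustration and do not come from the paper. The enumeration is exponential in the number of variables and only illustrates what is being queried; it says nothing about the hardness proof itself.

```python
from itertools import product

STATES = (0, 1)

# Hypothetical conditional probability tables (illustration values only).
P_A = {0: 0.6, 1: 0.4}
P_B_GIVEN_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
P_C_GIVEN_B = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}
P_D_GIVEN_C = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}


def joint(a, b, c, d):
    """Joint probability of one full configuration of the chain."""
    return P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c] * P_D_GIVEN_C[c][d]


def partial_map(d_observed):
    """Partial MAP over (A, C) given evidence D = d_observed.

    B is neither a query variable nor evidence, so it is summed out.
    Scores are the unnormalized values P(a, c, D = d_observed); the
    maximizer is the same as for the posterior P(a, c | D = d_observed).
    """
    scores = {}
    for a, c in product(STATES, repeat=2):
        scores[(a, c)] = sum(joint(a, b, c, d_observed) for b in STATES)
    best = max(scores, key=scores.get)
    return best, scores


best, scores = partial_map(d_observed=1)
print("MAP assignment for (A, C):", best, "score:", scores[best])
```

The mix of maximization (over A and C) and summation (over B) is what distinguishes partial MAP from finding the most probable configuration of all variables at once.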

Relevance:

20.00%

Abstract:

Extensive drilling of the Great Barrier Reef (GBR) in the 1970s and 1980s illuminated the main factors controlling reef growth during the Holocene. However, questions remain about: (1) the precise nature and timing of reef "turn-on" or initiation, (2) whether consistent spatio-temporal patterns occur in the bio-sedimentologic response of the reef to Holocene sea-level rise and subsequent stability, and (3) how these factors are expressed in the context of the different evolutionary states (juvenile-mature-senile reefs). Combining 21 new C14-AMS and 146 existing recalibrated radiocarbon and U/Th ages, we investigated the detailed spatial and temporal variations in sedimentary facies and coralgal assemblages in fifteen cores across four reefs (Wreck, Fairfax, One Tree and Fitzroy) from the Southern GBR. Our newly defined facies and assemblages record distinct chronostratigraphic patterns in the cores, displaying both lateral zonation across the different reefs and shallowing-upwards sequences, characterised by a transition from deep (Porites/faviids) to shallow (Acropora/Isopora) coral types. The revised reef accretion curves show a significant lag period, ranging from 0.7 to 2 ka, between flooding of the antecedent Pleistocene substrate and Holocene reef turn-on. This lag period and the dominance of more environmentally tolerant early colonizers (e.g., domal Porites and faviids) suggest initial conditions that were unfavourable for coral growth. We contend that higher input of fine siliciclastic material from regional terrigenous sources, exposure to hydrodynamic forces and colonisation in deeper waters are the main factors influencing initially reduced growth and development. All four reefs record a time lag, and we argue that the size and shape of the antecedent platform are most important in determining the duration between flooding and recolonisation of the Holocene reef.
Finally, our study of Capricorn Bunker Group Holocene reefs suggests that the size and shape of the antecedent substrate have a greater impact on reef evolution and final evolutionary state (mature vs. senile) than substrate depth alone.

Relevance:

20.00%

Abstract:

This work presents a new general purpose classifier named Averaged Extended Tree Augmented Naive Bayes (AETAN), which is based on combining the advantageous characteristics of Extended Tree Augmented Naive Bayes (ETAN) and Averaged One-Dependence Estimator (AODE) classifiers. We describe the main properties of the approach and algorithms for learning it, along with an analysis of its computational time complexity. Empirical results with numerous data sets indicate that the new approach is superior to ETAN and AODE in terms of both zero-one classification accuracy and log loss. It also compares favourably against weighted AODE and hidden Naive Bayes. The learning phase of the new approach is slower than that of its competitors, while the time complexity for the testing phase is similar. Such characteristics suggest that the new classifier is ideal in scenarios where online learning is not required.
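The two evaluation measures named in the abstract above, zero-one classification accuracy and log loss, can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' evaluation code: the function names, the use of the natural logarithm, and the toy predictions are all assumptions made here.

```python
import math


def zero_one_accuracy(probabilities, labels):
    """Fraction of instances whose most probable class equals the true label."""
    correct = 0
    for probs, label in zip(probabilities, labels):
        predicted = max(range(len(probs)), key=lambda k: probs[k])
        correct += int(predicted == label)
    return correct / len(labels)


def log_loss(probabilities, labels):
    """Mean negative log-probability assigned to the true class (natural log assumed)."""
    total = 0.0
    for probs, label in zip(probabilities, labels):
        total -= math.log(probs[label])
    return total / len(labels)


# Hypothetical predicted class distributions for three test instances.
toy_probabilities = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
toy_labels = [0, 1, 1]

print("accuracy:", zero_one_accuracy(toy_probabilities, toy_labels))
print("log loss:", log_loss(toy_probabilities, toy_labels))
```

Accuracy only looks at the top-ranked class, whereas log loss also penalizes confident probabilities placed on the wrong class, which is why the two are usually reported together.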

Relevance:

20.00%

Abstract:

This work proposes an extended version of the well-known tree-augmented naive Bayes (TAN) classifier where the structure learning step is performed without requiring features to be connected to the class. Based on a modification of Edmonds' algorithm, our structure learning procedure explores a superset of the structures that are considered by TAN, yet achieves global optimality of the learning score function in a very efficient way (quadratic in the number of features, the same complexity as learning TANs). We enhance our procedure with a new score function that only takes into account arcs that are relevant to predict the class, as well as an optimization over the equivalent sample size during learning. These ideas may be useful for structure learning of Bayesian networks in general. A range of experiments shows that we obtain models with better prediction accuracy than naive Bayes and TAN, and comparable to the accuracy of the state-of-the-art classifier averaged one-dependence estimator (AODE). We release our implementation of ETAN so that it can be easily installed and run within Weka.

Relevance:

20.00%

Abstract:

Learning Bayesian networks with bounded tree-width has attracted much attention recently, because low tree-width allows exact inference to be performed efficiently. Some existing methods [12, 14] tackle the problem by using k-trees to learn the optimal Bayesian network with tree-width up to k. In this paper, we propose a sampling method to efficiently find representative k-trees by introducing an Informative score function to characterize the quality of a k-tree. The proposed algorithm can efficiently learn a Bayesian network with tree-width at most k. Experimental results indicate that our approach is comparable with exact methods but is much more computationally efficient.

Relevance:

20.00%

Abstract:

Bounding the tree-width of a Bayesian network can reduce the chance of overfitting, and allows exact inference to be performed efficiently. Several existing algorithms tackle the problem of learning bounded tree-width Bayesian networks by learning from k-trees as super-structures, but they do not scale to large domains and/or large tree-width. We propose a guided search algorithm to find k-trees with maximum Informative score, a measure of the quality of a k-tree in yielding good Bayesian networks. The algorithm achieves close to optimal performance compared to exact solutions in small domains, and can discover better networks than existing approximate methods in large domains. It also provides an optimal elimination order of variables that guarantees small complexity for later runs of exact inference. Comparisons with well-known approaches in terms of learning and inference accuracy illustrate its capabilities.

Relevance:

20.00%

Abstract:

Background: Native pig breeds in the Iberian Peninsula are broadly classified as belonging to either the Celtic or the Mediterranean breed groups, but there are other local populations that do not fit into either of these groups. Most of the native pig breeds in Iberia are in danger of extinction, and the assessment of their genetic diversity and population structure, relationships and possible admixture between breeds, and the appraisal of conservation alternatives are crucial to adopt appropriate management strategies. Methods: A panel of 24 microsatellite markers was used to genotype 844 animals representing the 17 most important native swine breeds and wild populations existing in Portugal and Spain, and various statistical tools were applied to analyze the results. Results: Genetic diversity was high in the breeds studied, with an overall mean of 13.6 alleles per locus and an average expected heterozygosity of 0.80. Signs of genetic bottlenecks were observed in breeds with a small census size, and population substructure was present in some of the breeds with larger census sizes. Variability among breeds accounted for about 20% of the total genetic diversity, and was explained mostly by differences among the Celtic, Mediterranean and Basque breed groups, rather than by differences between domestic and wild pigs. Breeds clustered closely according to group, and proximity was detected between wild pigs and the Mediterranean cluster of breeds. Most breeds had their own structure and identity, with very little evidence of admixture, except for the Retinto and Entrepelado varieties of the Mediterranean group, which are very similar. Genetic influence of the identified breed clusters extends beyond the specific geographical areas across borders throughout the Iberian Peninsula, with a very sharp transition from one breed group to another.
Analysis of conservation priorities confirms that the ranking of a breed for conservation depends on the emphasis placed on its contribution to the between- and within-breed components of genetic diversity. Conclusions: Native pig breeds in Iberia reveal high levels of genetic diversity, a solid breed structure and a clear organization in well-defined clusters.
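For readers unfamiliar with the "expected heterozygosity" statistic reported in the abstract above, a minimal sketch of the textbook definition follows: for one locus, one minus the sum of squared allele frequencies, then averaged over loci. The abstract does not say which estimator the authors used (for example, whether a small-sample correction was applied), so this uncorrected form and the toy allele frequencies are assumptions made purely for illustration.

```python
def expected_heterozygosity(allele_frequencies):
    """Uncorrected expected heterozygosity for one locus: 1 - sum(p_i ** 2)."""
    total = sum(allele_frequencies)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_frequencies)


def mean_expected_heterozygosity(loci):
    """Average of the per-locus values over all loci."""
    return sum(expected_heterozygosity(freqs) for freqs in loci) / len(loci)


# Hypothetical allele frequencies at two loci (not data from the study).
toy_loci = [
    [0.5, 0.5],                # two equally common alleles
    [0.25, 0.25, 0.25, 0.25],  # four equally common alleles
]

print(mean_expected_heterozygosity(toy_loci))
```

The value rises with the number of alleles and with how evenly they are distributed, which is consistent with a high heterozygosity being reported alongside a high mean number of alleles per locus.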

Relevance:

20.00%

Abstract:

Master's dissertation, Biological Engineering, Universidade do Algarve, 2009

Relevance:

20.00%

Abstract:

Master's dissertation, Water and Coastal Management, Faculdade de Ciências e Tecnologia, Universidade do Algarve, 2010