947 results for Hierarchical Bayesian Methods
Abstract:
K-Means is a popular clustering algorithm that uses iterative refinement to determine data partitions and to compute their associated centres of mass, called centroids. The straightforward implementation of the algorithm is often referred to as 'brute force', since it computes a proximity measure from each data point to each centroid at every iteration of the K-Means process. Efficient implementations of the K-Means algorithm have been predominantly based on multi-dimensional binary search trees (KD-Trees). Combining an efficient data structure with geometrical constraints reduces the number of distance computations required at each iteration. In this work we present a general space partitioning approach for improving the efficiency and the scalability of the K-Means algorithm. We propose to adopt approximate hierarchical clustering methods, rather than KD-Trees, to generate binary space partitioning trees. In the experimental analysis, we have tested the performance of the proposed Binary Space Partitioning K-Means (BSP-KM) when a divisive clustering algorithm is used. We have carried out extensive experimental tests to compare the proposed approach with the one based on KD-Trees (KD-KM) across a wide range of the parameter space. BSP-KM is more scalable than KD-KM, while keeping the deterministic nature of the 'brute force' algorithm. In particular, the proposed space partitioning approach has been shown to overcome the well-known limitation of KD-Trees in high-dimensional spaces and can also be adopted to improve the efficiency of other algorithms in which KD-Trees have been used.
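To make the 'brute force' baseline concrete, the following is a minimal sketch of plain K-Means in which every point is compared with every centroid at each iteration. It is not the BSP-KM or KD-KM method of the abstract; the function name `brute_force_kmeans`, the random initialisation and the stopping rule are illustrative assumptions.

```python
import math
import random

def brute_force_kmeans(points, k, iterations=100, seed=0):
    """Plain K-Means: each point is compared with each centroid per iteration."""
    rng = random.Random(seed)
    # Assumed initialisation: k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: n * k distance computations (the 'brute force' cost).
        assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # partitions are stable
        centroids = new_centroids
    return centroids, assignment

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignment = brute_force_kmeans(points, k=2)
```

Tree-based variants such as those discussed in the abstract aim to skip many of the distance computations in the assignment step while producing the same partitions.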
Abstract:
Undeniably, anticipation plays a crucial role in cognition. By what means, to what extent, and what it achieves remain open questions. In a recent BBS target article, Clark (in press) depicts an integrative model of the brain that builds on hierarchical Bayesian models of neural processing (Rao and Ballard, 1999; Friston, 2005; Brown et al., 2011), and their most recent formulation using the free-energy principle borrowed from thermodynamics (Feldman and Friston, 2010; Friston, 2010; Friston et al., 2010). Hierarchical generative models of cognition, such as those described by Clark, presuppose the manipulation of representations and internal models of the world, in as much detail as is perceptually available. Perhaps surprisingly, Clark acknowledges the existence of a “virtual version of the sensory data” (p. 4), but with no reference to some of the historical debates that shaped cognitive science, related to the storage, manipulation, and retrieval of representations in a cognitive system (Shanahan, 1997), or accounting for the emergence of intentionality within such a system (Searle, 1980; Preston and Bishop, 2002). Instead of demonstrating how this Bayesian framework responds to these foundational questions, Clark describes the structure and the functional properties of an action-oriented, multi-level system that is meant to combine perception, learning, and experience (Niedenthal, 2007).
Abstract:
We present five new cloud detection algorithms over land based on dynamic threshold or Bayesian techniques, applicable to the Advanced Along Track Scanning Radiometer (AATSR) instrument, and compare these with the standard threshold-based SADIST cloud detection scheme. We use a manually classified dataset as a reference to assess algorithm performance and quantify the impact of each cloud detection scheme on land surface temperature (LST) retrieval. The use of probabilistic Bayesian cloud detection methods improves algorithm true skill scores by 8-9 % over SADIST (maximum score of 77.93 % compared to 69.27 %). We present an assessment of the impact of imperfect cloud masking, in relation to the reference cloud mask, on the retrieved AATSR LST, imposing a 2 K tolerance over a 3x3 pixel domain. We find an increase of 5-7 % in the observations falling within this tolerance when using Bayesian methods (maximum of 92.02 % compared to 85.69 %). We also demonstrate that the use of dynamic thresholds in the tests employed by SADIST can significantly improve performance, and that this approach is applicable to cloud-test data to be provided by the Sea and Land Surface Temperature Radiometer (SLSTR), due to be launched on the Sentinel 3 mission (estimated 2014).
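The abstract does not define its true skill score. As a sketch, assuming the usual definition (hit rate minus false alarm rate, also known as the Hanssen-Kuipers discriminant), the score for a predicted cloud mask against a reference mask could be computed as follows. The function name and the toy masks are hypothetical.

```python
def true_skill_score(predicted, reference):
    """Hit rate minus false alarm rate for two boolean masks (True = cloudy)."""
    pairs = [(bool(p), bool(r)) for p, r in zip(predicted, reference)]
    tp = sum(1 for p, r in pairs if p and r)          # cloud correctly flagged
    fn = sum(1 for p, r in pairs if not p and r)      # cloud missed
    fp = sum(1 for p, r in pairs if p and not r)      # clear pixel flagged as cloud
    tn = sum(1 for p, r in pairs if not p and not r)  # clear pixel correctly passed
    hit_rate = tp / (tp + fn)
    false_alarm_rate = fp / (fp + tn)
    return hit_rate - false_alarm_rate

# Toy masks, invented for illustration only.
predicted = [True, True, False, False, True, False]
reference = [True, False, False, False, True, True]
score = true_skill_score(predicted, reference)
```

Under this definition a perfect mask scores 1 and a mask with no skill scores 0, so the percentages quoted above would correspond to scores of roughly 0.69 to 0.78.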
Dating WF16: exploring the chronology of a Pre-Pottery Neolithic A settlement in the Southern Levant
Abstract:
A pre-requisite for understanding the transition to the Neolithic in the Levant is the establishment of a robust chronology, most notably for the late Epi-Palaeolithic and Pre-Pottery Neolithic A (PPNA) periods. In this contribution we undertake a dating analysis of the Pre-Pottery Neolithic site of WF16, southern Jordan, drawing on a sample of 46 AMS 14C dates. We utilise Bayesian methods to quantify an old wood effect to provide an offset that we factor into chronological models for a number of individual structures at WF16 and for the settlement as a whole. In doing so we address the influence of slope variations in the calibration curve and expose the significance of sediment and sample redeposition within sites of this nature. We conclude that for the excavated deposits at WF16 human activity is likely to have started by c. 11.84 ka cal bp and lasted for at least c. 1590 years, ceasing by c. 10.24 ka cal bp. This is marked by a particularly intensive period of activity lasting for c. 350 years centred on 11.25 ka cal bp followed by less intensive activity lasting a further c. 880 years. The study reveals the potential of WF16 as a laboratory to explore methodological issues concerning 14C dating of early Neolithic sites in arid, erosional environments.
Abstract:
Species of the genus Culex Linnaeus have been incriminated as the main vectors of lymphatic filariases and are important vectors of arboviruses, including West Nile virus. Sequences corresponding to a 478 bp fragment of the cytochrome c oxidase subunit I gene, which includes part of the barcode region, were generated for 37 individuals of 17 species of the genus Culex to establish relationships among five subgenera, Culex, Phenacomyia, Melanoconion, Microculex, and Carrollia, and one species of the genus Lutzia that occurs in Brazil. Bayesian methods were employed for the phylogenetic analyses. Results of sequence comparisons showed that individuals identified as Culex dolosus, Culex mollis, and Culex imitator possess high intraspecific divergence (3.1, 2.3, and 3.5%, respectively) under the Kimura two-parameter model. These differences were associated with distinct morphological characteristics of either the male genitalia or the larval and pupal stages, suggesting that these may represent species complexes. The Bayesian topology suggested that the genus and subgenus Culex are paraphyletic relative to Lutzia and Phenacomyia, respectively. The cytochrome c oxidase subunit I sequences may be a useful tool both to estimate phylogenetic relationships and to identify morphologically similar species of the genus Culex.
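For readers unfamiliar with the Kimura two-parameter model, the sketch below computes its pairwise distance, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions between two aligned sequences. The function name, the handling of gaps and the toy sequences are assumptions for illustration, not the authors' pipeline.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)

# Toy sequences, invented for illustration: one transition, one transversion.
d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAC")
```

The corrected distance is larger than the raw proportion of differing sites (0.2 here) because the model allows for multiple substitutions at the same site.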
Abstract:
The toucan genus Ramphastos (Piciformes: Ramphastidae) has been a model in the formulation of Neotropical paleobiogeographic hypotheses. Weckstein (2005) reported on the phylogenetic history of this genus based on three mitochondrial genes, but some relationships were weakly supported and one of the subspecies of R. vitellinus (citreolaemus) was unsampled. This study expands on Weckstein (2005) by adding more DNA sequence data (including a nuclear marker) and more samples, including R. v. citreolaemus. Maximum parsimony, maximum likelihood, and Bayesian methods recovered similar trees, with nodes showing high support. A monophyletic R. vitellinus complex was strongly supported as the sister-group to R. brevis. The results also confirmed that the southeastern and northern populations of R. vitellinus ariel are paraphyletic. R. v. citreolaemus is sister to the Amazonian subspecies of the vitellinus complex. Using three protein-coding genes (COI, cytochrome-b and ND2) and interval-calibrated nodes under a Bayesian relaxed-clock framework, we infer that ramphastid genera originated in the middle Miocene to early Pliocene, Ramphastos species originated between late Miocene and early Pleistocene, and intra-specific divergences took place throughout the Pleistocene. Parsimony-based reconstruction of ancestral areas indicated that evolution of the four trans-Andean Ramphastos taxa (R. v. citreolaemus, R. a. swainsonii, R. brevis and R. sulfuratus) was associated with four independent dispersals from the cis-Andean region. The last pulse of Andean uplift may have been important for the evolution of R. sulfuratus, whereas the origin of the other trans-Andean Ramphastos taxa is consistent with vicariance due to drying events in the lowland forests north of the Andes. Estimated rates of molecular evolution were higher than the "standard" bird rate of 2% substitutions/site/million years for two of the three genes analyzed (cytochrome-b and ND2). © 2009 Elsevier Inc. All rights reserved.
Abstract:
The substitution of missing values, also called imputation, is an important data preparation task for many domains. Ideally, the substitution of missing values should not insert biases into the dataset. This aspect has been usually assessed by some measures of the prediction capability of imputation methods. Such measures assume the simulation of missing entries for some attributes whose values are actually known. These artificially missing values are imputed and then compared with the original values. Although this evaluation is useful, it does not allow the influence of imputed values in the ultimate modelling task (e.g. in classification) to be inferred. We argue that imputation cannot be properly evaluated apart from the modelling task. Thus, alternative approaches are needed. This article elaborates on the influence of imputed values in classification. In particular, a practical procedure for estimating the inserted bias is described. As an additional contribution, we have used such a procedure to empirically illustrate the performance of three imputation methods (majority, naive Bayes and Bayesian networks) in three datasets. Three classifiers (decision tree, naive Bayes and nearest neighbours) have been used as modelling tools in our experiments. The achieved results illustrate a variety of situations that can take place in the data preparation practice.
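The conventional evaluation described above (hide values that are actually known, impute them, and compare with the originals) can be sketched as follows for the simplest of the three imputation methods, majority imputation. The function names, the missing rate and the toy attribute are illustrative assumptions. This measures prediction capability only, which is exactly the limitation the abstract points out.

```python
import random
from collections import Counter

def majority_impute(values):
    """Replace None entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

def imputation_accuracy(values, missing_rate=0.3, seed=0):
    """Hide known values at random, impute them, and score the matches."""
    rng = random.Random(seed)
    n_hidden = max(1, int(missing_rate * len(values)))
    hidden = set(rng.sample(range(len(values)), n_hidden))
    masked = [None if i in hidden else v for i, v in enumerate(values)]
    imputed = majority_impute(masked)
    hits = sum(1 for i in hidden if imputed[i] == values[i])
    return hits / len(hidden)

# Toy categorical attribute, invented for illustration.
attribute = ["a"] * 8 + ["b"] * 2
accuracy = imputation_accuracy(attribute)
```

A high score here says nothing about how the imputed values affect a downstream classifier. Assessing that requires training the classifier on the imputed data and comparing its behaviour with training on the complete data.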
Abstract:
In the context of either Bayesian or classical sensitivity analyses of over-parametrized models for incomplete categorical data, it is well known that the dependence on the prior of posterior inferences about nonidentifiable parameters, or the use of overly parsimonious over-parametrized models, may lead to erroneous conclusions. Nevertheless, some authors either pay no attention to which parameters are nonidentifiable or do not appropriately account for possible prior dependence. We review the literature on this topic and consider simple examples to emphasize that, in both inferential frameworks, the subjective components can influence results in nontrivial ways, irrespective of the sample size. Specifically, we show that prior distributions commonly regarded as slightly informative or noninformative may actually be too informative for nonidentifiable parameters, and that the choice of over-parametrized models may drastically impact the results, suggesting that a careful examination of their effects should be considered before drawing conclusions. Résumé: Whether in a Bayesian or a classical framework, it is well known that over-parametrization in models for incomplete categorical data can lead to erroneous conclusions. Nevertheless, some authors continue to neglect the problems caused by the presence of nonidentified parameters. We review the literature in this area and consider some simple over-parametrized examples in which the subjective elements influence the results in a non-negligible way, regardless of sample size. More precisely, we show how priors regarded as weakly informative or noninformative can turn out to be extremely informative about the nonidentified parameters, and that the use of over-parametrized models can have a considerable impact on the final conclusions. This suggests a very careful examination of the potential impact of the priors.
Abstract:
Random regression models were used in this study to estimate genetic parameters for test-day milk yield in Alpine dairy goats using Bayesian methodology. The estimates obtained were compared with those from random regression analysis using REML. Heritabilities from the Bayesian analysis ranged from 0.18 to 0.37, whereas those from REML ranged from 0.09 to 0.32. Genetic correlations between adjacent test days approached unity and decreased gradually as the interval between test days increased. The results indicate that: the covariance structure of test-day milk yield in goats over the course of lactation can be adequately modelled by random regression; prediction of genetic gains and selection of genetically superior animals are feasible throughout the lactation trajectory; and the results of the random regression analyses using Gibbs sampling and REML were similar, although the estimates of genetic variances and heritabilities were slightly higher in the Bayesian analysis using Gibbs sampling.
Abstract:
INTRODUCTION: Malaria is endemic in the Brazilian Amazon region, and the detection of possible risk factors can be of great interest to public health authorities. The aim of this article is to investigate the association between environmental variables and annual malaria records in the Amazon region using Bayesian spatio-temporal methods. METHODS: Spatio-temporal Poisson regression models were used to analyse annual malaria case counts from 1999 to 2008, taking into account factors such as the deforestation rate. Under a Bayesian approach, inferences were obtained by Markov chain Monte Carlo (MCMC) methods, which simulated samples from the joint posterior distribution of interest. Discrimination between different models was also discussed. RESULTS: The proposed model suggested that the deforestation rate, the number of inhabitants per km² and the human development index (HDI) are important for predicting malaria cases. CONCLUSIONS: Human development, population growth, deforestation and the ecological changes associated with these factors are associated with an increased risk of malaria. It can further be concluded that Poisson regression models that capture temporal and spatial effects within a Bayesian framework are a good strategy for modelling malaria count data.
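As a much-simplified illustration of MCMC inference for a Poisson regression, the sketch below fits a log-linear model with a single covariate using a random-walk Metropolis sampler. It omits the spatial and temporal effects of the study entirely. The normal priors, the step size, the simulated data and all function names are assumptions made for this example.

```python
import math
import random

def simulate_counts(xs, beta0, beta1, seed=42):
    """Draw Poisson counts with mean exp(beta0 + beta1 * x) (Knuth's method)."""
    rng = random.Random(seed)
    counts = []
    for x in xs:
        limit = math.exp(-math.exp(beta0 + beta1 * x))
        k, prod = 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        counts.append(k)
    return counts

def log_posterior(beta0, beta1, xs, ys, prior_sd=10.0):
    """Poisson log-likelihood (log link) plus assumed independent normal priors."""
    log_lik = 0.0
    for x, y in zip(xs, ys):
        eta = beta0 + beta1 * x
        log_lik += y * eta - math.exp(eta)  # the constant -log(y!) is dropped
    log_prior = -(beta0 ** 2 + beta1 ** 2) / (2 * prior_sd ** 2)
    return log_lik + log_prior

def metropolis(xs, ys, n_iter=5000, step=0.05, seed=1):
    """Random-walk Metropolis sampler for (beta0, beta1)."""
    rng = random.Random(seed)
    beta = (0.0, 0.0)
    current = log_posterior(*beta, xs, ys)
    samples = []
    for _ in range(n_iter):
        proposal = (beta[0] + rng.gauss(0, step), beta[1] + rng.gauss(0, step))
        candidate = log_posterior(*proposal, xs, ys)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        samples.append(beta)
    return samples

xs = [i / 100 - 1 for i in range(200)]   # synthetic covariate on [-1, 1)
ys = simulate_counts(xs, beta0=0.5, beta1=0.8)
samples = metropolis(xs, ys)
kept = samples[1000:]                    # discard burn-in
slope_mean = sum(b1 for _, b1 in kept) / len(kept)
intercept_mean = sum(b0 for b0, _ in kept) / len(kept)
```

In a real analysis the linear predictor would also carry spatial and temporal random effects, and convergence of the chain would need to be checked before interpreting the posterior summaries.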
Abstract:
Sixty-nine Psidium accessions collected in six Brazilian states were analysed using two non-hierarchical clustering methods and principal components (PC) analysis, with a view to guiding breeding programmes. The variables analysed were ascorbic acid, β-carotene, lycopene, total phenols, total flavonoids, antioxidant activity, titratable acidity, soluble solids, total soluble sugars, moisture content, lateral and transverse fruit diameter, pulp and seed weight per fruit, and number and yield of fruits per plant. Specific groupings of the araçá accessions were observed with the Tocher and k-means methods and in the three-dimensional scatter of the four PCs. The araçá accessions were separated from the guava accessions. No specific grouping by state of collection was observed, indicating the absence of barriers to the propagation of guava accessions. The analyses suggest prospecting a larger number of germplasm samples in a smaller number of regions, as well as divergent accessions with a high content of nutritional compounds.
Abstract:
The aim of this work was to compare different multivariate techniques for characterising 35 sesame genotypes using 769 RAPD markers. Genetic distances were obtained as the arithmetic complement of the Jaccard coefficient and clustered by the hierarchical nearest-neighbour, furthest-neighbour and unweighted pair group method with arithmetic mean (UPGMA) methods, by the Tocher optimisation method, and by principal coordinates analysis. The grouping of genotypes changed according to the method used. Adopting the same genetic distance (0.36) as the cut-off, four groups were distinguished by the nearest-neighbour method, 13 by the furthest-neighbour method, 11 by UPGMA and four by Tocher. Among the hierarchical methods, UPGMA gave the best fit between original and estimated distances (CCC = 0.89). Principal coordinates analysis confirmed the low diversity among the genotypes. The greatest divergence was between the cultivars Seridó 1 and Arawaca 4, and the smallest between genotypes VCR-101 and GP-3314. The first three principal coordinates accounted for 35.13% of total variability, and 18 eigenvalues were needed to explain 81% of the genetic variation. UPGMA, Tocher optimisation and principal coordinates analysis are complementary in forming the groups.
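A minimal sketch of two of the steps named above: the distance as the complement of the Jaccard coefficient, and the groups produced by nearest-neighbour (single-linkage) clustering at a fixed cut-off. Cutting a single-linkage dendrogram at height t gives the connected components of the graph that links every pair at distance t or less, so a union-find structure is enough here. The marker profiles and function names are invented for illustration.

```python
def jaccard_distance(a, b):
    """1 - Jaccard coefficient for two binary marker profiles (1 = band present)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - both / either if either else 0.0

def single_linkage_groups(profiles, cutoff):
    """Groups from nearest-neighbour linkage when the dendrogram is cut at `cutoff`."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the group representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if jaccard_distance(profiles[u], profiles[v]) <= cutoff:
                parent[find(u)] = find(v)  # merge the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())

# Hypothetical presence/absence profiles for four genotypes and five markers.
profiles = {
    "g1": [1, 1, 0, 0, 1],
    "g2": [1, 1, 0, 0, 0],
    "g3": [0, 0, 1, 1, 0],
    "g4": [0, 0, 1, 1, 1],
}
groups = single_linkage_groups(profiles, cutoff=0.36)
```

Furthest-neighbour and UPGMA linkage update inter-group distances differently after each merge, which is why the same cut-off can yield very different numbers of groups, as the abstract reports.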
Abstract:
Purpose - The purpose of this paper is to present designs for an accelerated life test (ALT). Design/methodology/approach - Bayesian methods and Markov chain Monte Carlo (MCMC) simulation methods were used. Findings - The paper proposes a Bayesian method based on MCMC for ALT under the Exponentiated-Weibull (EW) distribution (for lifetime) and Arrhenius models (relating the stress variable to the parameters). The paper concludes that this is a reasonable alternative to classical statistical methods, since the proposed method is simple to implement, does not require advanced computational understanding, and allows inferences on the parameters to be made easily. Using the predictive density of a future observation, a procedure was developed to plan an ALT and also to verify whether the conformance fraction of the manufacturing process reaches some desired level of quality. This procedure is useful for statistical process control in many industrial applications. Research limitations/implications - The results may be applied by a semiconductor manufacturer. Originality/value - The Exponentiated-Weibull-Arrhenius model has never before been used to plan an ALT. © Emerald Group Publishing Limited.
Abstract:
Toadlets of the genus Brachycephalus are endemic to the Atlantic rainforests of southeastern and southern Brazil. The 14 species currently described have snout-vent lengths less than 18 mm and are thought to have evolved through miniaturization: an evolutionary process leading to an extremely small adult body size. Here, we present the first comprehensive phylogenetic analysis for Brachycephalus, using a multilocus approach based on two nuclear (Rag-1 and Tyr) and three mitochondrial (Cyt b, 12S, and 16S rRNA) gene regions. Phylogenetic relationships were inferred using a partitioned Bayesian analysis of concatenated sequences and the hierarchical Bayesian method (BEST) that estimates species trees based on the multispecies coalescent model. Individual gene trees showed conflict and also varied in resolution. With the exception of the mitochondrial gene tree, no gene tree was completely resolved. The concatenated gene tree was completely resolved and is identical in topology and degree of statistical support to the individual mtDNA gene tree. On the other hand, the BEST species tree showed reduced significant node support relative to the concatenated tree and recovered a basal trichotomy, although some bipartitions were significantly supported at the tips of the species tree. Comparison of the log likelihoods for the concatenated and BEST trees suggests that the method implemented in BEST explains the multilocus data for Brachycephalus better than the Bayesian analysis of concatenated data. Landmark-based geometric morphometrics revealed marked variation in cranial shape between the species of Brachycephalus. In addition, a statistically significant association was demonstrated between variation in cranial shape and genetic distances estimated from the mtDNA and nuclear loci. Notably, B. ephippium and B. garbeana, which are predicted to be sister species in the individual and concatenated gene trees and in the BEST species tree, share an evolutionary novelty, the hyperossified dorsal plate. © 2011 Elsevier Inc.
Abstract:
(10) Hygiea is the fourth largest asteroid of the main belt by volume and mass, and it is the largest member of its family, which is made up mostly of low-albedo, C-type asteroids typical of the outer main belt. Like many other large families, it is associated with a 'halo' of objects that extends far beyond the boundary of the core family, as detected by traditional hierarchical clustering methods (HCM) in proper element domains. Numerical simulations of the orbital evolution of family members may help in estimating the age of the family and of its halo, and the original ejection velocity field. However, in order to minimize the errors associated with including too many interlopers, it is important to have good estimates of family membership that include available data on local asteroid taxonomy, geometric albedo and local dynamics. For this purpose, we obtained synthetic proper elements and frequencies of asteroids in the Hygiea orbital region, with their errors. We revised the current knowledge on asteroid taxonomy, including Sloan Digital Sky Survey-Moving Object Catalog 4th release (SDSS-MOC 4) data, and geometric albedo data from the Wide-field Infrared Survey Explorer (WISE) and Near-Earth Object WISE (NEOWISE). We identified asteroid family members using HCM in the domain of proper elements (a, e, sin (i)) and in the domains of proper frequencies most appropriate to study diffusion in the local web of secular resonances, and eliminated possible interlopers based on taxonomic and geometric albedo considerations. To identify the family halo, we devised a new hierarchical clustering method in an extended domain that includes proper elements, the principal components PC1 and PC2 obtained from SDSS photometric data and, for the first time, WISE and NEOWISE geometric albedo. Data on asteroid size distribution, light curves and rotations were also revised for the Hygiea family.
The Hygiea family is the largest group in its region, with two smaller families in the proper element domain and 18 families in various frequency domains identified in this work for the first time. Frequency groups tend to extend vertically in the (a, sin (i)) plane and cross not only the Hygiea family but also the nearby C-type families of Themis and Veritas, causing a mixture of objects, all of relatively low albedo, in the Hygiea family area. A few high-albedo asteroids, most likely associated with the Eos family, are also present in the region. Finally, the new multidomain hierarchical clustering method allowed us to obtain a good and robust estimate of the membership of the Hygiea family halo, which is well separated from the haloes of other asteroid families in the region and has a very limited (about 3 per cent) presence of likely interlopers. © 2013 The Author. Published by Oxford University Press on behalf of the Royal Astronomical Society.