910 resultados para Bayesian maximum entropy
Resumo:
Non-linear methods for estimating variability in time-series are currently of widespread use. Among such methods are approximate entropy (ApEn) and sample approximate entropy (SampEn). The applicability of ApEn and SampEn in analyzing data is evident and their use is increasing. However, consistency is a point of concern in these tools, i.e., the classification of the temporal organization of a data set might indicate a relative less ordered series in relation to another when the opposite is true. As highlighted by their proponents themselves, ApEn and SampEn might present incorrect results due to this lack of consistency. In this study, we present a method which gains consistency by using ApEn repeatedly in a wide range of combinations of window lengths and matching error tolerance. The tool is called volumetric approximate entropy, vApEn. We analyze nine artificially generated prototypical time-series with different degrees of temporal order (combinations of sine waves, logistic maps with different control parameter values, random noises). While ApEn/SampEn clearly fail to consistently identify the temporal order of the sequences, vApEn correctly do. In order to validate the tool we performed shuffled and surrogate data analysis. Statistical analysis confirmed the consistency of the method. (C) 2008 Elsevier Ltd. All rights reserved.
The genus Coleodactylus (Sphaerodactylinae, Gekkota) revisited: A molecular phylogenetic perspective
Resumo:
Nucleotide sequence data from a mitochondrial gene (16S) and two nuclear genes (c-mos, RAG-1) were used to evaluate the monophyly of the genus Coleodactylus, to provide the first phylogenetic hypothesis of relationships among its species in a cladistic framework, and to estimate the relative timing, of species divergences. Maximum Parsimony, Maximum Likelihood and Bayesian analyses of the combined data sets retrieved Coleodactylus as a monophyletic genus, although weakly Supported. Species were recovered as two genetically and morphological distinct clades, with C. amazonicus populations forming the sister taxon to the meridionalis group (C. brachystoma, C. meridionalis, C. natalensis, and C. septentrionalis). Within this group, C. septentrionalis was placed as the sister taxon to a clade comprising the rest of the species, C. meridionalis was recovered as the sister species to C. brachystoma, and C natalensis was found nested within C. meridionalis. Divergence time estimates based on penalized likelihood and Bayesian dating methods do not Support the previous hypothesis based on the Quaternary rain forest fragmentation model proposed to explain the diversification of the genus. The basal cladogenic event between major lineages of Coleodactylus was estimated to have occurred in the late Cretaceous (72.6 +/- 1.77 Mya), approximately at the same point in time than the other genera of Sphaerodactylinae diverged from each other. Within the meridionalis group, the split between C. septentrionalis and C. brachystoma + C. meridionalis was placed in the Eocene (46.4 +/- 4.22 Mya), and the divergence between C. brachystoma and C. meridionalis was estimated to have occurred in the Oligocene (29.3 +/- 4.33 Mya). Most intraspecific cladogenesis occurred through Miocene to Pliocene, and only for two conspecific samples and for C. natalensis could a Quaternary differentiation be assumed (1.9 +/- 1.3 Mya). (C) 2008 Elsevier Inc. All rights reserved.
Resumo:
The open vegetation corridor of South America is a region dominated by savanna biomes. It contains forests (i.e. riverine forests) that may act as corridors for rainforest specialists between the open vegetation corridor and its neighbouring biomes (i.e. the Amazonian and Atlantic forests). A prediction for this scenario is that populations of rainforest specialists in the open vegetation corridor and in the forested biomes show no significant genetic divergence. We addressed this hypothesis by studying plumage and genetic variation of the Planalto woodcreeper Dendrocolaptes platyrostris Spix (1824) (Aves: Furnariidae), a forest specialist that occurs in both open habitat and in the Atlantic forest. The study questions were: (1) is there any evidence of genetic continuity between populations of the open habitat and the Atlantic forest and (2) is plumage variation congruent with patterns of neutral genetic structure or with ecological factors related to habitat type? We used cytochrome b and mitochondrial DNA control region sequences to show that D. platyrostris is monophyletic and presents substantial intraspecific differentiation. We found two areas of plumage stability: one associated with Cerrado and the other associated with southern Atlantic Forest. Multiple Mantel tests showed that most of the plumage variation followed the transition of habitats but not phylogeographical gaps, suggesting that selection may be related to the evolution of the plumage of the species. The results were not compatible with the idea that forest specialists in the open vegetation corridor and in the Atlantic forest are linked at the population level because birds from each region were not part of the same genetic unit. Divergence in the presence of gene flow across the ecotone between both regions might explain our results. Also, our findings indicate that the southern Atlantic forest may have been significantly affected by Pleistocene climatic alteration, although such events did not cause local extinction of most taxa, as occurred in other regions of the globe where forests were significantly affected by global glaciations. Finally, our results neither support plumage stability areas, nor subspecies as full species. (C) 2011 The Linnean Society of London, Biological Journal of the Linnean Society, 2011, 103, 801-820.
Resumo:
The toucan genus Ramphastos (Piciformes: Ramphastidae) has been a model in the formulation of Neotropical paleobiogeographic hypotheses. Weckstein (2005) reported on the phylogenetic history of this genus based on three mitochondrial genes, but some relationships were weakly supported and one of the subspecies of R. vitellinus (citreolaemus) was unsampled. This study expands on Weckstein (2005) by adding more DNA sequence data (including a nuclear marker) and more samples, including R v. citreolaemus. Maximum parsimony, maximum likelihood, and Bayesian methods recovered similar trees, with nodes showing high support. A monophyletic R. vitellinus complex was strongly supported as the sister-group to R. brevis. The results also confirmed that the southeastern and northern populations of R. vitellinus ariel are paraphyletic. X v. citreolaemus is sister to the Amazonian subspecies of the vitellinus complex. Using three protein-coding genes (COI, cytochrome-b and ND2) and interval-calibrated nodes under a Bayesian relaxed-clock framework, we infer that ramphastid genera originated in the middle Miocene to early Pliocene, Ramphastos species originated between late Miocene and early Pleistocene, and intra-specific divergences took place throughout the Pleistocene. Parsimony-based reconstruction of ancestral areas indicated that evolution of the four trans-Andean Ramphastos taxa (R. v. citreolaemus, R. a. swainsonii, R. brevis and R. sulfuratus) was associated with four independent dispersals from the cis-Andean region. The last pulse of Andean uplift may have been important for the evolution of R. sulfuratus, whereas the origin of the other trans-Andean Ramphastos taxa is consistent with vicariance due to drying events in the lowland forests north of the Andes. Estimated rates of molecular evolution were higher than the ""standard"" bird rate of 2% substitutions/site/million years for two of the three genes analyzed (cytochrome-b and ND2). (C) 2009 Elsevier Inc. All rights reserved.
Resumo:
GB virus C/hepatitis G (GBV-C) is an RNA virus of the family Flaviviridae. Despite replicating with an RNA-dependent RNA polymerase, some previous estimates of rates of evolutionary change in GBV-C suggest that it fixes mutations at the anomalously low rate of similar to 100(-7) nucleotide substitution per site, per year. However, these estimates were largely based on the assumption that GBV-C and its close relative GBV-A (New World monkey GB viruses) codiverged with their primate hosts over millions of years. Herein, we estimated the substitution rate of GBV-C using the largest set of dated GBV-C isolates compiled to date and a Bayesian coalescent approach that utilizes the year of sampling and so is independent of the assumption of codivergence. This revealed a rate of evolutionary change approximately four orders of magnitude higher than that estimated previously, in the range of 10(-2) to 10(-3) sub/site/year, and hence in line with those previously determined for RNA viruses in general and the Flaviviridae in particular. In addition, we tested the assumption of host-virus codivergence in GBV-A by performing a reconciliation analysis of host and virus phylogenies. Strikingly, we found no statistical evidence for host-virus codivergence in GBV-A, indicating that substitution rates in the GB viruses should not be estimated from host divergence times.
Resumo:
This work proposes and discusses an approach for inducing Bayesian classifiers aimed at balancing the tradeoff between the precise probability estimates produced by time consuming unrestricted Bayesian networks and the computational efficiency of Naive Bayes (NB) classifiers. The proposed approach is based on the fundamental principles of the Heuristic Search Bayesian network learning. The Markov Blanket concept, as well as a proposed ""approximate Markov Blanket"" are used to reduce the number of nodes that form the Bayesian network to be induced from data. Consequently, the usually high computational cost of the heuristic search learning algorithms can be lessened, while Bayesian network structures better than NB can be achieved. The resulting algorithms, called DMBC (Dynamic Markov Blanket Classifier) and A-DMBC (Approximate DMBC), are empirically assessed in twelve domains that illustrate scenarios of particular interest. The obtained results are compared with NB and Tree Augmented Network (TAN) classifiers, and confinn that both proposed algorithms can provide good classification accuracies and better probability estimates than NB and TAN, while being more computationally efficient than the widely used K2 Algorithm.
Resumo:
It is known that patients may cease participating in a longitudinal study and become lost to follow-up. The objective of this article is to present a Bayesian model to estimate the malaria transition probabilities considering individuals lost to follow-up. We consider a homogeneous population, and it is assumed that the considered period of time is small enough to avoid two or more transitions from one state of health to another. The proposed model is based on a Gibbs sampling algorithm that uses information of lost to follow-up at the end of the longitudinal study. To simulate the unknown number of individuals with positive and negative states of malaria at the end of the study and lost to follow-up, two latent variables were introduced in the model. We used a real data set and a simulated data to illustrate the application of the methodology. The proposed model showed a good fit to these data sets, and the algorithm did not show problems of convergence or lack of identifiability. We conclude that the proposed model is a good alternative to estimate probabilities of transitions from one state of health to the other in studies with low adherence to follow-up.
Resumo:
Sensitivity and specificity are measures that allow us to evaluate the performance of a diagnostic test. In practice, it is common to have situations where a proportion of selected individuals cannot have the real state of the disease verified, since the verification could be an invasive procedure, as occurs with biopsy. This happens, as a special case, in the diagnosis of prostate cancer, or in any other situation related to risks, that is, not practicable, nor ethical, or in situations with high cost. For this case, it is common to use diagnostic tests based only on the information of verified individuals. This procedure can lead to biased results or workup bias. In this paper, we introduce a Bayesian approach to estimate the sensitivity and the specificity for two diagnostic tests considering verified and unverified individuals, a result that generalizes the usual situation based on only one diagnostic test.
Resumo:
In this paper, we compare the performance of two statistical approaches for the analysis of data obtained from the social research area. In the first approach, we use normal models with joint regression modelling for the mean and for the variance heterogeneity. In the second approach, we use hierarchical models. In the first case, individual and social variables are included in the regression modelling for the mean and for the variance, as explanatory variables, while in the second case, the variance at level 1 of the hierarchical model depends on the individuals (age of the individuals), and in the level 2 of the hierarchical model, the variance is assumed to change according to socioeconomic stratum. Applying these methodologies, we analyze a Colombian tallness data set to find differences that can be explained by socioeconomic conditions. We also present some theoretical and empirical results concerning the two models. From this comparative study, we conclude that it is better to jointly modelling the mean and variance heterogeneity in all cases. We also observe that the convergence of the Gibbs sampling chain used in the Markov Chain Monte Carlo method for the jointly modeling the mean and variance heterogeneity is quickly achieved.
Resumo:
In this paper, we introduce a Bayesian analysis for survival multivariate data in the presence of a covariate vector and censored observations. Different ""frailties"" or latent variables are considered to capture the correlation among the survival times for the same individual. We assume Weibull or generalized Gamma distributions considering right censored lifetime data. We develop the Bayesian analysis using Markov Chain Monte Carlo (MCMC) methods.
Resumo:
The substitution of missing values, also called imputation, is an important data preparation task for many domains. Ideally, the substitution of missing values should not insert biases into the dataset. This aspect has been usually assessed by some measures of the prediction capability of imputation methods. Such measures assume the simulation of missing entries for some attributes whose values are actually known. These artificially missing values are imputed and then compared with the original values. Although this evaluation is useful, it does not allow the influence of imputed values in the ultimate modelling task (e.g. in classification) to be inferred. We argue that imputation cannot be properly evaluated apart from the modelling task. Thus, alternative approaches are needed. This article elaborates on the influence of imputed values in classification. In particular, a practical procedure for estimating the inserted bias is described. As an additional contribution, we have used such a procedure to empirically illustrate the performance of three imputation methods (majority, naive Bayes and Bayesian networks) in three datasets. Three classifiers (decision tree, naive Bayes and nearest neighbours) have been used as modelling tools in our experiments. The achieved results illustrate a variety of situations that can take place in the data preparation practice.
Resumo:
In this paper, we introduce a Bayesian analysis for bioequivalence data assuming multivariate pharmacokinetic measures. With the introduction of correlation parameters between the pharmacokinetic measures or between the random effects in the bioequivalence models, we observe a good improvement in the bioequivalence results. These results are of great practical interest since they can yield higher accuracy and reliability for the bioequivalence tests, usually assumed by regulatory offices. An example is introduced to illustrate the proposed methodology by comparing the usual univariate bioequivalence methods with multivariate bioequivalence. We also consider some usual existing discrimination Bayesian methods to choose the best model to be used in bioequivalence studies.
Resumo:
Nesse artigo, tem-se o interesse em avaliar diferentes estratégias de estimação de parâmetros para um modelo de regressão linear múltipla. Para a estimação dos parâmetros do modelo foram utilizados dados de um ensaio clínico em que o interesse foi verificar se o ensaio mecânico da propriedade de força máxima (EM-FM) está associada com a massa femoral, com o diâmetro femoral e com o grupo experimental de ratas ovariectomizadas da raça Rattus norvegicus albinus, variedade Wistar. Para a estimação dos parâmetros do modelo serão comparadas três metodologias: a metodologia clássica, baseada no método dos mínimos quadrados; a metodologia Bayesiana, baseada no teorema de Bayes; e o método Bootstrap, baseado em processos de reamostragem.
Resumo:
In this paper we deal with a Bayesian analysis for right-censored survival data suitable for populations with a cure rate. We consider a cure rate model based on the negative binomial distribution, encompassing as a special case the promotion time cure model. Bayesian analysis is based on Markov chain Monte Carlo (MCMC) methods. We also present some discussion on model selection and an illustration with a real dataset.
Resumo:
The purpose of this paper is to develop a Bayesian analysis for nonlinear regression models under scale mixtures of skew-normal distributions. This novel class of models provides a useful generalization of the symmetrical nonlinear regression models since the error distributions cover both skewness and heavy-tailed distributions such as the skew-t, skew-slash and the skew-contaminated normal distributions. The main advantage of these class of distributions is that they have a nice hierarchical representation that allows the implementation of Markov chain Monte Carlo (MCMC) methods to simulate samples from the joint posterior distribution. In order to examine the robust aspects of this flexible class, against outlying and influential observations, we present a Bayesian case deletion influence diagnostics based on the Kullback-Leibler divergence. Further, some discussions on the model selection criteria are given. The newly developed procedures are illustrated considering two simulations study, and a real data previously analyzed under normal and skew-normal nonlinear regression models. (C) 2010 Elsevier B.V. All rights reserved.