98 results for Bayesian maximum entropy
in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Abstract:
This paper presents an Adaptive Maximum Entropy (AME) approach for modeling the geographical distribution of biological species. The Maximum Entropy algorithm (MaxEnt) is one of the most widely used methods for modeling species' geographical distributions, and the approach presented here is an alternative to the classical algorithm. Instead of training with the same fixed set of features, AME tries to insert or remove a single feature at each iteration. The aim is to reach convergence faster without degrading the performance of the generated models. The preliminary experiments performed well, showing gains in both accuracy and execution time. Comparisons with other algorithms are beyond the scope of this paper. Several important lines of research are proposed as future work.
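To make the training loop concrete, here is a minimal sketch of the classical MaxEnt fit that AME modifies. It assumes a finite grid of cells x with feature values f_j(x) and a Gibbs model p(x) ∝ exp(Σ_j λ_j f_j(x)), fitted by gradient ascent on the mean log-likelihood of the presence cells. The gradient for λ_j is the empirical mean of f_j minus its model mean. The `active` set lets a caller insert or remove one feature between fits, mimicking AME's per-iteration move. The rule AME uses to pick which feature to toggle is not described in the abstract and is not reproduced here. Unlike practical MaxEnt software, the sketch has no regularization, and all names and the toy grid are hypothetical.

```python
import math

def gibbs(features, lam):
    """Gibbs distribution p(x) proportional to exp(sum_j lam[j] * f_j(x)) over all cells."""
    scores = [sum(l * v for l, v in zip(lam, f)) for f in features]
    top = max(scores)  # subtract the max score to avoid overflow in exp
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]

def empirical_means(features, presences):
    """Average of each feature over the cells where the species was recorded."""
    return [sum(features[i][j] for i in presences) / len(presences)
            for j in range(len(features[0]))]

def maxent_fit(features, presences, active, steps=5000, lr=0.5):
    """Unregularised MaxEnt fit by gradient ascent on the mean presence log-likelihood.

    Only weights whose index is in `active` are updated; the others stay at 0,
    so inserting or removing a feature amounts to changing `active` and refitting.
    """
    lam = [0.0] * len(features[0])
    emp = empirical_means(features, presences)
    for _ in range(steps):
        p = gibbs(features, lam)
        for j in active:
            model_mean = sum(pi * f[j] for pi, f in zip(p, features))
            lam[j] += lr * (emp[j] - model_mean)  # gradient = empirical mean - model mean
    return lam, gibbs(features, lam)

if __name__ == "__main__":
    # Hypothetical grid: 5 cells, 2 environmental features scaled to [0, 1].
    cells = [[0.0, 1.0], [0.25, 0.5], [0.5, 0.0], [0.75, 0.5], [1.0, 1.0]]
    records = [3, 4, 4, 1]  # indices of cells with presence records
    for active in ({0}, {0, 1}):  # e.g. one AME-style "insert feature 1" move
        lam, p = maxent_fit(cells, records, active)
        print(sorted(active), [round(l, 3) for l in lam], [round(q, 3) for q in p])
```

At convergence the fitted model reproduces the empirical mean of every active feature. This moment-matching property is what makes the fitted model the maximum-entropy distribution subject to those constraints.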
Abstract:
We discuss the connection between information theory and copula theory by showing that a copula can be used to decompose the information content of a multivariate distribution into marginal and dependence components, with the latter quantified by the mutual information. We define the information excess as a measure of deviation from a maximum-entropy distribution. The idea of marginal-invariant dependence measures is also discussed and used to show that empirical linear correlation underestimates the amplitude of the actual correlation when the marginals are non-Gaussian. The mutual information is shown to provide an upper bound for the asymptotic empirical log-likelihood of a copula. An analytical expression for the information excess of t-copulas is provided, allowing simple model identification within this family. We illustrate the framework on a financial data set. Copyright (C) EPLA, 2009
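The marginal-invariance point can be illustrated numerically for the Gaussian copula only. Its mutual information is I = -½ ln(1 - ρ²), and Linfoot's informational coefficient of correlation, sqrt(1 - e^(-2I)), maps I back to |ρ|. Mutual information depends only on the copula, so it does not change when both margins are made lognormal by exp(). Pearson's correlation of the transformed data, by contrast, falls below ρ; its theoretical value for ρ = 0.8 with unit log-scale margins is about 0.71. Lognormal margins are an arbitrary choice for the demonstration. Nothing here reproduces the paper's information-excess formula for t-copulas, and the function names are illustrative.

```python
import math
import random

def gaussian_copula_mi(rho):
    """Mutual information, in nats, of a Gaussian copula with correlation rho."""
    return -0.5 * math.log(1.0 - rho * rho)

def informational_correlation(mi):
    """Linfoot's informational coefficient of correlation, sqrt(1 - exp(-2 I))."""
    return math.sqrt(1.0 - math.exp(-2.0 * mi))

def pearson(xs, ys):
    """Sample (empirical linear) correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def gaussian_copula_sample(rho, n, rng):
    """n pairs of standard normals with correlation rho (Gaussian copula, normal margins)."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((z1, z2))
    return pairs

if __name__ == "__main__":
    rho = 0.8
    pairs = gaussian_copula_sample(rho, 100_000, random.Random(42))
    normal_x, normal_y = zip(*pairs)
    # exp() is monotone, so the copula (and hence the mutual information) is unchanged
    logn_x = [math.exp(v) for v in normal_x]
    logn_y = [math.exp(v) for v in normal_y]
    print("Pearson, normal margins:   ", round(pearson(normal_x, normal_y), 3))
    print("Pearson, lognormal margins:", round(pearson(logn_x, logn_y), 3))
    print("informational correlation: ", informational_correlation(gaussian_copula_mi(rho)))
```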
Abstract:
In this paper we introduce the Weibull power series (WPS) class of distributions, which is obtained by compounding Weibull and power series distributions following the compounding procedure previously used by Adamidis and Loukas (1998). This new class contains as a particular case the two-parameter exponential power series (EPS) class of distributions (Chahkandi and Ganjali 2009), which includes several lifetime models such as the exponential geometric (Adamidis and Loukas 1998), exponential Poisson (Kus 2007) and exponential logarithmic (Tahmasbi and Rezaei 2008) distributions. The hazard function of our class can be increasing, decreasing and upside-down bathtub shaped, among others, whereas the hazard function of an EPS distribution is only decreasing. We obtain several properties of the WPS distributions, such as moments, order statistics, estimation by maximum likelihood and inference for large samples. Furthermore, the EM algorithm is used to determine the maximum likelihood estimates of the parameters, and we discuss maximum entropy characterizations under suitable constraints. Special distributions are studied in some detail. Applications to two real data sets are given to show the flexibility and potential of the new class of distributions. (C) 2010 Elsevier B.V. All rights reserved.
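The compounding step can be checked by simulation. The sketch assumes the minimum-based construction of Adamidis and Loukas: X = min(Y_1, ..., Y_Z), where the Y_i are i.i.d. Weibull with survival S(x) = exp(-(βx)^α) and Z follows a zero-truncated power series distribution, P(Z = z) = a_z θ^z / C(θ). Conditioning on Z = z gives P(X > x | Z = z) = S(x)^z. Summing over z then yields F(x) = 1 - C(θ S(x)) / C(θ). The Poisson member (C(θ) = e^θ - 1) is used below. The Weibull parameterization and all function names are assumptions of this sketch.

```python
import math
import random

def wps_poisson_cdf(x, alpha, beta, theta):
    """CDF of the Weibull-Poisson member of the WPS class.

    F(x) = 1 - C(theta * S(x)) / C(theta), with C(t) = exp(t) - 1 (zero-truncated
    Poisson) and Weibull survival S(x) = exp(-(beta * x) ** alpha).
    """
    if x <= 0:
        return 0.0
    s = math.exp(-((beta * x) ** alpha))
    return 1.0 - math.expm1(theta * s) / math.expm1(theta)

def sample_zero_truncated_poisson(theta, rng):
    """Knuth's multiplication sampler for Poisson(theta), redrawn until the count is >= 1."""
    limit = math.exp(-theta)
    while True:
        k, prod = 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        if k > 0:
            return k

def sample_wps_poisson(alpha, beta, theta, rng):
    """X = min(Y_1, ..., Y_Z): Z zero-truncated Poisson, each Y_i Weibull by inverse transform."""
    z = sample_zero_truncated_poisson(theta, rng)
    return min((-math.log(1.0 - rng.random())) ** (1.0 / alpha) / beta for _ in range(z))

if __name__ == "__main__":
    rng = random.Random(1)
    xs = [sample_wps_poisson(1.5, 1.0, 2.0, rng) for _ in range(20_000)]
    for x in (0.25, 0.5, 1.0):
        empirical = sum(v <= x for v in xs) / len(xs)
        print(x, round(empirical, 3), round(wps_poisson_cdf(x, 1.5, 1.0, 2.0), 3))
```

Setting α = 1 turns the Weibull components into exponentials, which gives the exponential Poisson member of the EPS class mentioned above.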
Abstract:
Hepatitis B is a worldwide health problem affecting about 2 billion people, of whom more than 350 million are chronic carriers of the virus. Nine HBV genotypes (A to I) have been described. The geographical distribution of HBV genotypes is not completely understood because of the limited number of samples from some parts of the world. One such example is Colombia, where few studies have described HBV genotypes. In this study, we characterized HBV genotypes in 143 HBsAg-positive volunteer blood donors from Colombia. A 1306-bp fragment partially comprising the HBsAg and DNA polymerase coding regions (S/POL) was amplified and sequenced. Bayesian phylogenetic analyses were conducted with the Markov chain Monte Carlo (MCMC) approach to obtain the maximum clade credibility (MCC) tree using BEAST v.1.5.3. Of all samples, 68 were positive and 52 were successfully sequenced. Genotype F was the most prevalent in this population (77%), comprising subgenotypes F3 (75%) and F1b (2%). Genotype G (7.7%) and subgenotype A2 (15.3%) were also found. Analysis of the genotype G sequences suggests distinct introductions of this genotype into the country. Furthermore, we estimated the time to the most recent common ancestor (TMRCA) for each HBV/F subgenotype and for the Colombian F3 sequences using two different datasets: (i) 77 sequences comprising 1306 bp of the S/POL region and (ii) 283 sequences comprising 681 bp of the S/POL region. We also used two previously estimated evolutionary rates: (i) 2.60 × 10⁻⁴ and (ii) 1.5 × 10⁻⁵ substitutions/site/year. Here we report the HBV genotypes circulating in Colombia and the estimated TMRCA for the four subgenotypes of genotype F. (C) 2010 Elsevier B.V. All rights reserved.
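Molecular-clock dates scale inversely with the assumed substitution rate. The two rates quoted in the abstract therefore imply time scales that differ roughly 17-fold (2.60 × 10⁻⁴ / 1.5 × 10⁻⁵ ≈ 17.3). A rough strict-clock sketch makes this concrete. Two lineages that split t years ago each accumulate r·t substitutions/site, so their pairwise distance is d = 2rt and t = d / (2r). This is not the BEAST MCMC analysis used in the study, and the distance value is hypothetical.

```python
def strict_clock_divergence_time(distance, rate):
    """Years since two lineages split, assuming a strict molecular clock.

    Each lineage accumulates rate * t substitutions/site, so the pairwise
    distance is d = 2 * rate * t and therefore t = d / (2 * rate).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)

HYPOTHETICAL_DISTANCE = 0.01  # substitutions/site; illustrative only, not from the study

if __name__ == "__main__":
    for rate in (2.60e-4, 1.5e-5):
        years = strict_clock_divergence_time(HYPOTHETICAL_DISTANCE, rate)
        print(f"rate {rate:.2e} s/s/y -> {years:.0f} years")
```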
Abstract:
Phylogenetic analyses of chloroplast DNA sequences, morphology, and combined data have provided consistent support for many of the major branches within the angiosperm clade Dipsacales. Here we use sequences from three mitochondrial loci to test the existing broad-scale phylogeny and to attempt to resolve several relationships that have remained uncertain. Parsimony, maximum likelihood, and Bayesian analyses of a combined mitochondrial data set recover trees broadly consistent with previous studies, although resolution and support are lower than in the largest chloroplast analyses. Combining chloroplast and mitochondrial data results in a generally well-resolved and very strongly supported topology, but the previously recognized problem areas remain. To investigate why these relationships have been difficult to resolve, we conducted a series of experiments using different data partitions and heterogeneous substitution models. More complex modeling schemes are usually favored regardless of the partitions recognized, but model choice had little effect on topology or support values. In contrast, there are consistent but weakly supported differences between the topologies recovered from coding and non-coding matrices. These conflicts correspond directly to relationships that were poorly resolved in analyses of the full combined chloroplast-mitochondrial data set. We suggest that incongruent signal has contributed to our inability to confidently resolve these problem areas. (c) 2007 Elsevier Inc. All rights reserved.
Abstract:
The multivariate skew-t distribution (J Multivar Anal 79:93-113, 2001; J R Stat Soc, Ser B 65:367-389, 2003; Statistics 37:359-363, 2003) includes the Student t, skew-Cauchy and Cauchy distributions as special cases and the normal and skew-normal ones as limiting cases. In this paper, we explore the use of Markov chain Monte Carlo (MCMC) methods to develop a Bayesian analysis of repeated-measures pretest/post-test data under a multivariate null-intercept measurement error model (J Biopharm Stat 13(4):763-771, 2003), where the random errors and the unobserved value of the covariate (latent variable) follow a Student t and a skew-t distribution, respectively. The results and methods are illustrated numerically with an example from dentistry.
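The special cases listed above follow from the usual stochastic representation of the skew-t: a skew-normal variable divided by sqrt(V), with V ~ χ²_ν/ν independent of it. Setting the shape α = 0 gives the Student t, letting ν → ∞ gives the skew-normal, and a location-scale version is ξ + ω·T. Below is a minimal univariate sketch with a moment check. The paper works with the multivariate version inside an MCMC sampler, which is not reproduced here, and the helper names are illustrative.

```python
import math
import random

def sample_skew_t(alpha, nu, rng):
    """One draw from the standard univariate skew-t with shape alpha and nu degrees of freedom.

    Z = delta*|U0| + sqrt(1 - delta**2)*U1 is skew-normal SN(0, 1, alpha);
    dividing by sqrt(V), with V ~ chi-square(nu)/nu independent of Z, gives the skew-t.
    """
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    z = delta * abs(rng.gauss(0.0, 1.0)) + math.sqrt(1.0 - delta * delta) * rng.gauss(0.0, 1.0)
    v = rng.gammavariate(nu / 2.0, 2.0) / nu  # Gamma(shape nu/2, scale 2) is chi-square(nu)
    return z / math.sqrt(v)

def skew_t_mean(alpha, nu):
    """Theoretical mean of the standard skew-t (finite only for nu > 1)."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    log_ratio = math.lgamma((nu - 1) / 2.0) - math.lgamma(nu / 2.0)
    return delta * math.sqrt(nu / math.pi) * math.exp(log_ratio)

if __name__ == "__main__":
    rng = random.Random(7)
    draws = [sample_skew_t(3.0, 10.0, rng) for _ in range(100_000)]
    print(sum(draws) / len(draws), skew_t_mean(3.0, 10.0))
```

Because Z² is χ²₁ whatever the skewness, E[T²] = ν/(ν - 2). This is a convenient extra check on any sampler.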
Abstract:
Item response theory (IRT) comprises a set of statistical models that are useful in many fields, especially when there is interest in studying latent variables. These latent variables are directly considered in item response models (IRM) and are usually called latent traits. A common assumption for parameter estimation in IRM, considering a single group of examinees, is that the latent traits are random variables following a standard normal distribution. However, many studies suggest that this assumption often does not hold. Furthermore, when it does not hold, the parameter estimates tend to be biased and inference can be misleading. It is therefore important to model the distribution of the latent traits properly. In this paper we present an alternative latent-trait model based on the so-called skew-normal distribution; see Genton (2004). We used the centred parameterization proposed by Azzalini (1985), which ensures model identifiability, as pointed out by Azevedo et al. (2009b). In addition, a Metropolis-Hastings within Gibbs sampling (MHWGS) algorithm was built for parameter estimation using an augmented-data approach. A simulation study was performed to assess parameter recovery under the proposed model and estimation method, as well as the effect of the asymmetry level of the latent-trait distribution on parameter estimation. Our approach was also compared with other estimation methods that assume a symmetric normal latent-trait distribution. The results indicated that the proposed algorithm recovers all parameters properly. Specifically, the greater the asymmetry level, the better our approach performs relative to the others, mainly for small sample sizes (number of examinees).
Furthermore, we analyzed a real data set that shows signs of asymmetry in the latent-trait distribution. The results obtained with our approach confirmed the presence of strong negative asymmetry in the latent-trait distribution. (C) 2010 Elsevier B.V. All rights reserved.
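The centred parameterization replaces the direct skew-normal parameters (location ξ, scale ω, shape α) with the mean μ, standard deviation σ and skewness γ1 of the latent-trait distribution. Below is a minimal sketch of the conversion in both directions. It uses the standard skew-normal moment formulas with δ = α/sqrt(1 + α²) and μ_z = sqrt(2/π)·δ. The conversion is only defined for |γ1| below about 0.995, the largest skewness a skew-normal can reach. The helper names are illustrative, and this is not the authors' MHWGS code.

```python
import math
import random

B = math.sqrt(2.0 / math.pi)

def dp_to_cp(xi, omega, alpha):
    """Direct (xi, omega, alpha) -> centred (mean, sd, skewness) skew-normal parameters."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    mu_z = B * delta                      # mean of the standardised SN(0, 1, alpha)
    sigma_z = math.sqrt(1.0 - mu_z * mu_z)
    gamma1 = 0.5 * (4.0 - math.pi) * (mu_z / sigma_z) ** 3
    return xi + omega * mu_z, omega * sigma_z, gamma1

def cp_to_dp(mu, sigma, gamma1):
    """Centred (mean, sd, skewness) -> direct (xi, omega, alpha) skew-normal parameters."""
    max_gamma1 = 0.5 * (4.0 - math.pi) * (B / math.sqrt(1.0 - B * B)) ** 3  # about 0.9953
    if abs(gamma1) >= max_gamma1:
        raise ValueError("skewness outside the range a skew-normal can reach")
    # r = mu_z / sigma_z, recovered from gamma1 = (4 - pi)/2 * r**3 with a signed cube root
    r = math.copysign(abs(2.0 * gamma1 / (4.0 - math.pi)) ** (1.0 / 3.0), gamma1)
    mu_z = r / math.sqrt(1.0 + r * r)
    delta = mu_z / B
    alpha = delta / math.sqrt(1.0 - delta * delta)
    omega = sigma / math.sqrt(1.0 - mu_z * mu_z)
    return mu - omega * mu_z, omega, alpha

def sample_skew_normal(xi, omega, alpha, rng):
    """xi + omega * Z with Z = delta*|U0| + sqrt(1 - delta**2)*U1 ~ SN(0, 1, alpha)."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    z = delta * abs(rng.gauss(0.0, 1.0)) + math.sqrt(1.0 - delta * delta) * rng.gauss(0.0, 1.0)
    return xi + omega * z

if __name__ == "__main__":
    # Latent traits with mean 0, sd 1 and negative skewness, as in the real-data example.
    xi, omega, alpha = cp_to_dp(0.0, 1.0, -0.8)
    rng = random.Random(3)
    thetas = [sample_skew_normal(xi, omega, alpha, rng) for _ in range(50_000)]
    print(round(alpha, 3), round(sum(thetas) / len(thetas), 3))
```

Fixing μ = 0 and σ = 1 in the centred scale keeps the latent-trait location and scale pinned while the skewness is estimated. This is the practical appeal of the parameterization for identifiability.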
Abstract:
Relevant results for (sub-)distribution functions related to parallel systems are discussed. The reverse hazard rate is defined using the product integral. Consequently, the restriction to absolutely continuous distributions can be relaxed; the only requirement is that the sets of discontinuity points of the parallel distributions be disjoint. Nonparametric Bayesian estimators of all survival (sub-)distribution functions are derived. Whereas series systems yield minimum lifetimes as observations, parallel systems record maximum lifetimes. Multivariate Dirichlet processes, forming a class of prior distributions, are considered for the nonparametric Bayesian estimation of the component distribution functions and of the system reliability. Two striking numerical examples are presented for illustration.
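For a discrete distribution, the product-integral definition can be checked exactly. With reverse-hazard increment dR(s) = P(X = s) / F(s), the cdf is recovered as F(t) = ∏_{s > t} (1 - dR(s)). For independent components of a parallel system, F_sys = F_1 · F_2, and the reverse-hazard increments add. When the components have no common jump point, ∏(1 - dR_1 - dR_2) factorizes into F_1 · F_2. With a shared jump, a cross term dR_1·dR_2 appears, which is one way to see why disjoint discontinuity sets are required. The toy distributions are hypothetical, and this is not the paper's Dirichlet-process estimator.

```python
from fractions import Fraction

def reverse_hazard_increments(pmf):
    """dR(s) = P(X = s) / F(s) at each support point s (masses assumed positive)."""
    increments, cdf = {}, Fraction(0)
    for s in sorted(pmf):
        cdf += pmf[s]
        increments[s] = pmf[s] / cdf
    return increments

def cdf_from_reverse_hazard(increments, t):
    """Product integral F(t) = prod over support points s > t of (1 - dR(s))."""
    result = Fraction(1)
    for s, dr in increments.items():
        if s > t:
            result *= 1 - dr
    return result

def parallel_system_increments(*components):
    """Reverse-hazard increments of a parallel system of independent components.

    They simply add up; shared jump points are refused, because then the
    product of (1 - dR1 - dR2) no longer factorises into F1 * F2.
    """
    total = {}
    for inc in components:
        for s, dr in inc.items():
            if s in total:
                raise ValueError(f"components share a discontinuity at {s}")
            total[s] = dr
    return total

if __name__ == "__main__":
    # Hypothetical component lifetimes with disjoint support points.
    f1 = reverse_hazard_increments({1: Fraction(1, 4), 2: Fraction(1, 2), 4: Fraction(1, 4)})
    f2 = reverse_hazard_increments({3: Fraction(1, 2), 5: Fraction(1, 2)})
    system = parallel_system_increments(f1, f2)
    for t in range(7):
        print(t, cdf_from_reverse_hazard(system, t))
```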
Abstract:
OBJECTIVE: The aim of the present study was to determine the in vitro maximum inhibitory dilution (MID) of two chlorhexidine-based mouthwashes (CHX), Noplak® and Periogard®, and one polyhexamethylene biguanide-based mouthwash (PHMB), Sanifill Premium®, against 28 field strains of Staphylococcus aureus using the agar dilution method. MATERIALS AND METHODS: For each product, serial twofold dilutions ranging from 1/10 to 1/655,360 were prepared in distilled water and added to Mueller Hinton agar. After homogenization, the culture medium was poured into Petri dishes. Strains were inoculated with a Steers multipoint inoculator, and the dishes were incubated at 37 °C for 24 hours. The MID was defined as the highest dilution of the mouthwash still capable of inhibiting microbial growth. RESULTS: Sanifill Premium® inhibited the growth of all strains at the 1/40 dilution and of 1 strain at the 1/80 dilution. Noplak® inhibited the growth of 23 strains at the 1/640 dilution and of all 28 strains at the 1/320 dilution. Periogard® inhibited the growth of 7 strains at the 1/640 dilution and of all 28 strains at the 1/320 dilution. The Kruskal-Wallis test showed significant differences among the mouthwashes evaluated (p<0.05). No significant difference was found between Noplak® and Periogard® (p>0.05). Sanifill Premium® was the least effective (p<0.05). CONCLUSION: CHX-based mouthwashes showed better antimicrobial activity against S. aureus than the PHMB-based mouthwash.
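The dilution range 1/10 to 1/655,360 is a twofold series, since 655,360 = 10 × 2¹⁶, which gives 17 plates per product. The MID is the highest dilution at which growth is still inhibited. A small sketch of both steps follows. The function names and the plate readings in the usage example are hypothetical.

```python
def twofold_dilution_series(start=10, end=655_360):
    """Denominators of the serial dilutions 1/start, 1/(2*start), ..., 1/end."""
    series = [start]
    while series[-1] < end:
        series.append(series[-1] * 2)
    if series[-1] != end:
        raise ValueError("end must equal start * 2**k")
    return series

def maximum_inhibitory_dilution(denominators, inhibited):
    """Highest dilution (largest denominator) whose plate showed no growth, or None."""
    hits = [d for d, no_growth in zip(denominators, inhibited) if no_growth]
    return max(hits) if hits else None

if __name__ == "__main__":
    series = twofold_dilution_series()
    # Hypothetical readings for one strain: growth inhibited up to 1/320.
    readings = [d <= 320 for d in series]
    print(len(series), maximum_inhibitory_dilution(series, readings))  # 17 320
```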
Abstract:
The aim of this in vitro study was to determine the maximum inhibitory dilution (MID) of four cetylpyridinium chloride (CPC)-based mouthwashes, CPC+Propolis, CPC+Malva, CPC+Eucaliptol+Juá+Romã+Propolis (Natural Honey®) and CPC (Cepacol®), against 28 Staphylococcus aureus field strains, using the agar dilution method. Serial twofold dilutions ranging from 1/10 to 1/655,360 were prepared and added to Mueller Hinton agar. Strains were inoculated with a Steers multipoint inoculator: the inocula were seeded onto the surface of the culture medium in Petri dishes containing the different dilutions of the mouthwashes. The dishes were incubated at 37 °C for 24 h. The MID was defined as the highest dilution of mouthwash still capable of inhibiting microbial growth. CPC+Propolis showed antimicrobial activity against 27 strains at the 1/320 dilution and against all 28 strains at the 1/160 dilution; CPC+Malva inhibited the growth of all 28 strains at the 1/320 dilution; CPC+Eucaliptol+Juá+Romã+Propolis inhibited the growth of 2 strains at the 1/640 dilution and of all 28 strains at the 1/320 dilution; and Cepacol® showed antimicrobial activity against 3 strains at the 1/320 dilution and against all 28 strains at the 1/160 dilution. The Kruskal-Wallis test showed that the MID of Cepacol® was lower than those of the other products (p<0.05). In conclusion, the CPC-based mouthwashes showed antimicrobial activity against S. aureus, and the addition of other substances to CPC improved its antimicrobial effect.
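Both mouthwash studies compare per-strain MID values across products with the Kruskal-Wallis test. Below is a self-contained sketch of the H statistic with the usual tie correction. The correction matters here because many strains share the same MID. SciPy's `scipy.stats.kruskal` computes the same tie-corrected statistic together with its chi-square p-value on k - 1 degrees of freedom. The inputs in the usage example are toy values, not study data.

```python
from collections import Counter

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic, corrected for ties."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    counts = Counter(pooled)
    first_rank = {}
    for rank, x in enumerate(pooled, start=1):
        first_rank.setdefault(x, rank)
    # tied values share the mean of the ranks they occupy
    mid_rank = {x: first_rank[x] + (counts[x] - 1) / 2.0 for x in counts}
    rank_term = sum(sum(mid_rank[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)
    correction = 1.0 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied")
    return h / correction

if __name__ == "__main__":
    # Toy MID reciprocals (e.g. 320 = growth inhibited up to 1/320); not study data.
    print(kruskal_wallis_h([320, 320, 640], [160, 320, 320], [40, 40, 80]))
```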
Abstract:
The objective of this experiment was to evaluate the effect of increasing rates of P2O5 on canopy height, tiller number, and leaf and stem dry matter production of Mombaça grass at different ages. The experiment was carried out on a Eutrophic Red Nitosol (Nitossolo Vermelho Eutrófico). The experimental design was randomized complete blocks with four replicates, five P2O5 rates (30, 60, 90, 120 and 150 kg ha⁻¹) and a control. For the first and second harvests, a linear effect of phosphorus on tillering was observed; for the third and fourth harvests, the data fitted a quadratic model. The proportion of leaf blades in shoot dry matter decreased with increasing P2O5 rates, whereas the proportion of stems increased. Shoot dry matter (DM) production in the first, second and third harvests responded linearly to P2O5, with estimated increases of 7, 15 and 19 kg ha⁻¹ of DM per kg ha⁻¹ of P2O5, respectively. For the fourth harvest, the data fitted a quadratic regression model, with maximum production (8.3 Mg ha⁻¹ of DM) obtained with 103 kg ha⁻¹ of P2O5.
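For the fourth harvest, a fitted quadratic y = a + b·x + c·x² in the P2O5 rate x reaches its maximum at x* = -b/(2c) when c < 0. Below is a minimal sketch that fits the quadratic by least squares and locates the optimum. The doses are standardized internally to keep the normal equations well conditioned. In the usage example the control is treated as dose 0. The response curve there is synthetic, built to peak at 103 kg ha⁻¹ and 8.3 Mg ha⁻¹; it is not the experimental data.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c) in the original x units."""
    n = len(xs)
    m = sum(xs) / n
    s = (sum((x - m) ** 2 for x in xs) / n) ** 0.5
    us = [(x - m) / s for x in xs]  # standardised doses keep the system well conditioned
    p = [sum(u ** k for u in us) for k in range(5)]
    q = [sum(y * u ** k for u, y in zip(us, ys)) for k in range(3)]
    aug = [[p[0], p[1], p[2], q[0]],
           [p[1], p[2], p[3], q[1]],
           [p[2], p[3], p[4], q[2]]]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(3):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    a_u, b_u, c_u = (aug[i][3] / aug[i][i] for i in range(3))
    # expand a_u + b_u*u + c_u*u**2 with u = (x - m)/s back into powers of x
    c = c_u / s ** 2
    b = b_u / s - 2 * c_u * m / s ** 2
    a = a_u - b_u * m / s + c_u * m ** 2 / s ** 2
    return a, b, c

def quadratic_optimum(a, b, c):
    """Dose x* = -b/(2c) and the response there; a maximum when c < 0."""
    if c == 0:
        raise ValueError("no curvature: the fitted response is linear")
    x_star = -b / (2 * c)
    return x_star, a + b * x_star + c * x_star ** 2

if __name__ == "__main__":
    doses = [0, 30, 60, 90, 120, 150]  # control + five P2O5 rates, kg/ha
    # Synthetic response (Mg DM/ha) built to peak at 103 kg/ha; NOT the study's data.
    dm = [8.3 - 0.0002 * (x - 103) ** 2 for x in doses]
    a, b, c = fit_quadratic(doses, dm)
    print(quadratic_optimum(a, b, c))  # approximately (103.0, 8.3)
```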
Abstract:
Gene clustering is a useful exploratory technique for grouping together genes with similar expression levels under distinct cell-cycle phases or distinct conditions. It helps biologists identify potentially meaningful relationships between genes. In this study, we propose a clustering method based on multivariate normal mixture models, in which the number of clusters is predicted via sequential hypothesis tests: at each step, the method considers a mixture model with m components (m = 2 in the first step) and tests whether it should in fact have m - 1 components. If this hypothesis is rejected, m is increased and a new test is carried out. The method continues (increasing m) until the hypothesis is accepted. The theoretical core of the method is the Full Bayesian Significance Test (FBST), an intuitive Bayesian approach that requires neither a model-complexity penalty nor positive prior probabilities for sharp hypotheses. Numerical experiments were based on a cDNA microarray dataset consisting of the expression levels of 205 genes belonging to four functional categories, for 10 distinct strains of Saccharomyces cerevisiae. To analyze the method's sensitivity to data dimension, we performed principal components analysis on the original dataset and predicted the number of classes using 2 to 10 principal components. Compared with Mclust (model-based clustering), our method gives more consistent results.
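The sequential stopping rule described above can be sketched as a short loop. The per-step test is passed in as a callable and stands in for the FBST, which is not implemented here. The sketch also assumes that the first accepted hypothesis "m - 1 components suffice" fixes the number of clusters at m - 1; the abstract does not state this explicitly. The function name and the `m_max` cap are hypothetical.

```python
from typing import Callable

def sequential_number_of_clusters(accepts_fewer: Callable[[int], bool], m_max: int = 20) -> int:
    """Sequential choice of the number of mixture components.

    Starting at m = 2, `accepts_fewer(m)` tests "an (m - 1)-component mixture suffices";
    m grows after each rejection, and the first accepted hypothesis gives m - 1 clusters.
    `accepts_fewer` is a placeholder for the per-step FBST, which is not implemented here.
    """
    for m in range(2, m_max + 1):
        if accepts_fewer(m):
            return m - 1
    raise RuntimeError(f"hypothesis still rejected at m = {m_max}")

if __name__ == "__main__":
    # Hypothetical test outcomes: rejections at m = 2, 3, 4, acceptance at m = 5.
    print(sequential_number_of_clusters(lambda m: m >= 5))  # 4
```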
Abstract:
The objective of the present study was to evaluate herbage accumulation, morphological composition, growth rate and structural characteristics of Mombasa grass swards subjected to different cutting intervals (3, 5 and 7 wk) during the rainy and dry seasons of the year. Treatments were assigned to experimental units (17.5 m²) in a randomised complete block design with four replicates. Herbage accumulation was greater in the rainy season than in the dry season (83% and 17% of the total, respectively). Herbage accumulation (24,300 kg DM ha⁻¹), average growth rate (140 kg DM ha⁻¹ d⁻¹) and sward height (111 cm) were highest with the 7-wk cutting interval, but leaf proportion (56%), leaf:stem ratio (1.6) and leaf:non-leaf ratio (1.3) decreased. The highest leaf proportion was recorded with the 3-wk cutting interval. Herbage accumulation, morphological composition and structure of Mombasa grass swards may therefore be manipulated through defoliation frequency. Longer cutting intervals negatively affected sward structure, with potential negative effects on utilization efficiency, animal intake and performance.
Abstract:
Background: With nearly 1,100 species, the fish family Characidae represents more than half of the species of Characiformes and is a key component of Neotropical freshwater ecosystems. The composition, phylogeny, and classification of Characidae are currently uncertain, despite significant efforts based on analyses of morphological and molecular data. No consensus about the monophyly of this group or its position within the order Characiformes has been reached, a problem compounded by the fact that many key studies to date have non-overlapping taxonomic representation and focus only on subsets of this diversity. Results: In the present study we propose a new definition of the family Characidae and a hypothesis of relationships for the Characiformes based on phylogenetic analysis of DNA sequences of two mitochondrial and three nuclear genes (4,680 base pairs). The sequences were obtained from 211 samples representing 166 genera distributed among all 18 recognized families in the order Characiformes, all 14 recognized subfamilies in the Characidae, plus 56 of the genera so far considered incertae sedis in the Characidae. The phylogeny obtained is robust, with most lineages significantly supported by posterior probabilities in Bayesian analysis and by high bootstrap values in maximum likelihood and parsimony analyses. Conclusion: A monophyletic assemblage strongly supported in all our phylogenetic analyses is herein defined as the Characidae; it includes the characiform species that lack a supraorbital bone and have a derived position of the emergence of the hyoid artery from the anterior ceratohyal. To recognize this and several other monophyletic groups within characiforms, we propose changes in the limits of several families to facilitate future studies in the Characiformes and particularly the Characidae.
This work presents a new phylogenetic framework for a speciose and morphologically diverse group of freshwater fishes of significant ecological and evolutionary importance across the Neotropics and portions of Africa.
Abstract:
Hardy-Weinberg Equilibrium (HWE) is an important genetic property that populations are expected to show unless they are subject to disturbing conditions such as a complete lack of panmixia, excess mutation or excess selection pressure. HWE has been evaluated for decades, and both frequentist and Bayesian methods are in use today. While the HWE formula was historically developed to examine the transmission of alleles in a population from one generation to the next, the use of HWE concepts has expanded in studies of human diseases to detect genotyping error and disease susceptibility (association); see Ryckman and Williams (2008). Most analyses focus on answering whether a population is in HWE; they do not try to quantify how far from equilibrium the population is. In this paper, we propose the use of a simple disequilibrium coefficient for a locus with two alleles. Based on the posterior density of this disequilibrium coefficient, we show how to conduct a Bayesian analysis to assess how far from HWE a population is. Other coefficients have been introduced in the literature; the advantage of the one introduced here is that, like standard correlation coefficients, its range is bounded and it is symmetric around zero (equilibrium) for positive and negative values. To test the hypothesis of equilibrium, we use a simple Bayesian significance test, the Full Bayesian Significance Test (FBST); see Pereira, Stern and Wechsler (2008) for a complete review. The proposed disequilibrium coefficient provides an easy and efficient way to carry out the analyses, especially within Bayesian statistics. An R routine (R Development Core Team, 2009) that implements the calculations is provided for readers.
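Below is a minimal sketch of the posterior-simulation idea for a biallelic locus. It assumes genotype counts (AA, Aa, aa) and a uniform Dirichlet(1, 1, 1) prior. The posterior of the genotype frequencies is then Dirichlet(n_AA + 1, n_Aa + 1, n_aa + 1), and any disequilibrium measure can be summarized by Monte Carlo. As a stand-in, the sketch uses the classical D = P(AA) - p_A². This is not the bounded, symmetric coefficient proposed in the paper, and the sketch does not compute the FBST e-value. The prior and the function name are assumptions.

```python
import random

def posterior_disequilibrium_draws(n_aa, n_ab, n_bb, draws=20_000, seed=0):
    """Monte Carlo draws of D = P(AA) - p_A**2 under a Dirichlet(1, 1, 1) prior.

    D is the classical disequilibrium measure, used here as a stand-in; it is NOT
    the bounded, symmetric coefficient proposed in the paper.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(draws):
        # Dirichlet posterior sampled as normalised independent gamma variates
        g = [rng.gammavariate(n + 1.0, 1.0) for n in (n_aa, n_ab, n_bb)]
        total = sum(g)
        p_aa, p_ab = g[0] / total, g[1] / total
        p_a = p_aa + p_ab / 2.0  # allele frequency of A
        out.append(p_aa - p_a ** 2)
    return out

if __name__ == "__main__":
    # Hypothetical counts with a heterozygote excess.
    d = posterior_disequilibrium_draws(10, 80, 10)
    print("posterior mean of D:", sum(d) / len(d))
    print("P(D < 0 | data):", sum(x < 0 for x in d) / len(d))
```

Quantities such as the posterior mean of D and P(D < 0 | data) describe how far from equilibrium the population is, rather than only whether HWE is rejected.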