102 resultados para Bayesian mixture model

em CentAUR: Central Archive University of Reading - UK


Relevância:

100.00% 100.00%

Publicador:

Resumo:

We describe a general likelihood-based 'mixture model' for inferring phylogenetic trees from gene-sequence or other character-state data. The model accommodates cases in which different sites in the alignment evolve in qualitatively distinct ways, but does not require prior knowledge of these patterns or partitioning of the data. We call this qualitative variability in the pattern of evolution across sites "pattern-heterogeneity" to distinguish it from both a homogenous process of evolution and from one characterized principally by differences in rates of evolution. We present studies to show that the model correctly retrieves the signals of pattern-heterogeneity from simulated gene-sequence data, and we apply the method to protein-coding genes and to a ribosomal 12S data set. The mixture model outperforms conventional partitioning in both these data sets. We implement the mixture model such that it can simultaneously detect rate- and pattern-heterogeneity. The model simplifies to a homogeneous model or a rate- variability model as special cases, and therefore always performs at least as well as these two approaches, and often considerably improves upon them. We make the model available within a Bayesian Markov-chain Monte Carlo framework for phylogenetic inference, as an easy-to-use computer program.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Estimation of a population size by means of capture-recapture techniques is an important problem occurring in many areas of life and social sciences. We consider the frequencies of frequencies situation, where a count variable is used to summarize how often a unit has been identified in the target population of interest. The distribution of this count variable is zero-truncated since zero identifications do not occur in the sample. As an application we consider the surveillance of scrapie in Great Britain. In this case study holdings with scrapie that are not identified (zero counts) do not enter the surveillance database. The count variable of interest is the number of scrapie cases per holding. For count distributions a common model is the Poisson distribution and, to adjust for potential heterogeneity, a discrete mixture of Poisson distributions is used. Mixtures of Poissons usually provide an excellent fit as will be demonstrated in the application of interest. However, as it has been recently demonstrated, mixtures also suffer under the so-called boundary problem, resulting in overestimation of population size. It is suggested here to select the mixture model on the basis of the Bayesian Information Criterion. This strategy is further refined by employing a bagging procedure leading to a series of estimates of population size. Using the median of this series, highly influential size estimates are avoided. In limited simulation studies it is shown that the procedure leads to estimates with remarkable small bias.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Population size estimation with discrete or nonparametric mixture models is considered, and reliable ways of construction of the nonparametric mixture model estimator are reviewed and set into perspective. Construction of the maximum likelihood estimator of the mixing distribution is done for any number of components up to the global nonparametric maximum likelihood bound using the EM algorithm. In addition, the estimators of Chao and Zelterman are considered with some generalisations of Zelterman’s estimator. All computations are done with CAMCR, a special software developed for population size estimation with mixture models. Several examples and data sets are discussed and the estimators illustrated. Problems using the mixture model-based estimators are highlighted.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Population size estimation with discrete or nonparametric mixture models is considered, and reliable ways of construction of the nonparametric mixture model estimator are reviewed and set into perspective. Construction of the maximum likelihood estimator of the mixing distribution is done for any number of components up to the global nonparametric maximum likelihood bound using the EM algorithm. In addition, the estimators of Chao and Zelterman are considered with some generalisations of Zelterman’s estimator. All computations are done with CAMCR, a special software developed for population size estimation with mixture models. Several examples and data sets are discussed and the estimators illustrated. Problems using the mixture model-based estimators are highlighted.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we focus on the one year ahead prediction of the electricity peak-demand daily trajectory during the winter season in Central England and Wales. We define a Bayesian hierarchical model for predicting the winter trajectories and present results based on the past observed weather. Thanks to the flexibility of the Bayesian approach, we are able to produce the marginal posterior distributions of all the predictands of interest. This is a fundamental progress with respect to the classical methods. The results are encouraging in both skill and representation of uncertainty. Further extensions are straightforward at least in principle. The main two of those consist in conditioning the weather generator model with respect to additional information like the knowledge of the first part of the winter and/or the seasonal weather forecast. Copyright (C) 2006 John Wiley & Sons, Ltd.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we focus on the one year ahead prediction of the electricity peak-demand daily trajectory during the winter season in Central England and Wales. We define a Bayesian hierarchical model for predicting the winter trajectories and present results based on the past observed weather. Thanks to the flexibility of the Bayesian approach, we are able to produce the marginal posterior distributions of all the predictands of interest. This is a fundamental progress with respect to the classical methods. The results are encouraging in both skill and representation of uncertainty. Further extensions are straightforward at least in principle. The main two of those consist in conditioning the weather generator model with respect to additional information like the knowledge of the first part of the winter and/or the seasonal weather forecast. Copyright (C) 2006 John Wiley & Sons, Ltd.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Investigation of preferred structures of planetary wave dynamics is addressed using multivariate Gaussian mixture models. The number of components in the mixture is obtained using order statistics of the mixing proportions, hence avoiding previous difficulties related to sample sizes and independence issues. The method is first applied to a few low-order stochastic dynamical systems and data from a general circulation model. The method is next applied to winter daily 500-hPa heights from 1949 to 2003 over the Northern Hemisphere. A spatial clustering algorithm is first applied to the leading two principal components (PCs) and shows significant clustering. The clustering is particularly robust for the first half of the record and less for the second half. The mixture model is then used to identify the clusters. Two highly significant extratropical planetary-scale preferred structures are obtained within the first two to four EOF state space. The first pattern shows a Pacific-North American (PNA) pattern and a negative North Atlantic Oscillation (NAO), and the second pattern is nearly opposite to the first one. It is also observed that some subspaces show multivariate Gaussianity, compatible with linearity, whereas others show multivariate non-Gaussianity. The same analysis is also applied to two subperiods, before and after 1978, and shows a similar regime behavior, with a slight stronger support for the first subperiod. In addition a significant regime shift is also observed between the two periods as well as a change in the shape of the distribution. The patterns associated with the regime shifts reflect essentially a PNA pattern and an NAO pattern consistent with the observed global warming effect on climate and the observed shift in sea surface temperature around the mid-1970s.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This study presents a new simple approach for combining empirical with raw (i.e., not bias corrected) coupled model ensemble forecasts in order to make more skillful interval forecasts of ENSO. A Bayesian normal model has been used to combine empirical and raw coupled model December SST Niño-3.4 index forecasts started at the end of the preceding July (5-month lead time). The empirical forecasts were obtained by linear regression between December and the preceding July Niño-3.4 index values over the period 1950–2001. Coupled model ensemble forecasts for the period 1987–99 were provided by ECMWF, as part of the Development of a European Multimodel Ensemble System for Seasonal to Interannual Prediction (DEMETER) project. Empirical and raw coupled model ensemble forecasts alone have similar mean absolute error forecast skill score, compared to climatological forecasts, of around 50% over the period 1987–99. The combined forecast gives an increased skill score of 74% and provides a well-calibrated and reliable estimate of forecast uncertainty.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The rate at which a given site in a gene sequence alignment evolves over time may vary. This phenomenon-known as heterotachy-can bias or distort phylogenetic trees inferred from models of sequence evolution that assume rates of evolution are constant. Here, we describe a phylogenetic mixture model designed to accommodate heterotachy. The method sums the likelihood of the data at each site over more than one set of branch lengths on the same tree topology. A branch-length set that is best for one site may differ from the branch-length set that is best for some other site, thereby allowing different sites to have different rates of change throughout the tree. Because rate variation may not be present in all branches, we use a reversible-jump Markov chain Monte Carlo algorithm to identify those branches in which reliable amounts of heterotachy occur. We implement the method in combination with our 'pattern-heterogeneity' mixture model, applying it to simulated data and five published datasets. We find that complex evolutionary signals of heterotachy are routinely present over and above variation in the rate or pattern of evolution across sites, that the reversible-jump method requires far fewer parameters than conventional mixture models to describe it, and serves to identify the regions of the tree in which heterotachy is most pronounced. The reversible-jump procedure also removes the need for a posteriori tests of 'significance' such as the Akaike or Bayesian information criterion tests, or Bayes factors. Heterotachy has important consequences for the correct reconstruction of phylogenies as well as for tests of hypotheses that rely on accurate branch-length information. These include molecular clocks, analyses of tempo and mode of evolution, comparative studies and ancestral state reconstruction. The model is available from the authors' website, and can be used for the analysis of both nucleotide and morphological data.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This work proposes a unified neurofuzzy modelling scheme. To begin with, the initial fuzzy base construction method is based on fuzzy clustering utilising a Gaussian mixture model (GMM) combined with the analysis of covariance (ANOVA) decomposition in order to obtain more compact univariate and bivariate membership functions over the subspaces of the input features. The mean and covariance of the Gaussian membership functions are found by the expectation maximisation (EM) algorithm with the merit of revealing the underlying density distribution of system inputs. The resultant set of membership functions forms the basis of the generalised fuzzy model (GFM) inference engine. The model structure and parameters of this neurofuzzy model are identified via the supervised subspace orthogonal least square (OLS) learning. Finally, instead of providing deterministic class label as model output by convention, a logistic regression model is applied to present the classifier’s output, in which the sigmoid type of logistic transfer function scales the outputs of the neurofuzzy model to the class probability. Experimental validation results are presented to demonstrate the effectiveness of the proposed neurofuzzy modelling scheme.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

A new coupled cloud physics–radiation parameterization of the bulk optical properties of ice clouds is presented. The parameterization is consistent with assumptions in the cloud physics scheme regarding particle size distributions (PSDs) and mass–dimensional relationships. The parameterization is based on a weighted ice crystal habit mixture model, and its bulk optical properties are parameterized as simple functions of wavelength and ice water content (IWC). This approach directly couples IWC to the bulk optical properties, negating the need for diagnosed variables, such as the ice crystal effective dimension. The parameterization is implemented into the Met Office Unified Model Global Atmosphere 5.0 (GA5) configuration. The GA5 configuration is used to simulate the annual 20-yr shortwave (SW) and longwave (LW) fluxes at the top of the atmosphere (TOA), as well as the temperature structure of the atmosphere, under various microphysical assumptions. The coupled parameterization is directly compared against the current operational radiation parameterization, while maintaining the same cloud physics assumptions. In this experiment, the impacts of the two parameterizations on the SW and LW radiative effects at TOA are also investigated and compared against observations. The 20-yr simulations are compared against the latest observations of the atmospheric temperature and radiative fluxes at TOA. The comparisons demonstrate that the choice of PSD and the assumed ice crystal shape distribution are as important as each other. Moreover, the consistent radiation parameterization removes a long-standing tropical troposphere cold temperature bias but slightly warms the southern midlatitudes by about 0.5 K.