18 resultados para Mixture-models

em CentAUR: Central Archive University of Reading - UK


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The contribution investigates the problem of estimating the size of a population, also known as the missing cases problem. Suppose a registration system is targeting to identify all cases having a certain characteristic such as a specific disease (cancer, heart disease, ...), disease related condition (HIV, heroin use, ...) or a specific behavior (driving a car without license). Every case in such a registration system has a certain notification history in that it might have been identified several times (at least once) which can be understood as a particular capture-recapture situation. Typically, cases are left out which have never been listed at any occasion, and it is this frequency one wants to estimate. In this paper modelling is concentrating on the counting distribution, e.g. the distribution of the variable that counts how often a given case has been identified by the registration system. Besides very simple models like the binomial or Poisson distribution, finite (nonparametric) mixtures of these are considered providing rather flexible modelling tools. Estimation is done using maximum likelihood by means of the EM algorithm. A case study on heroin users in Bangkok in the year 2001 is completing the contribution.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We investigate the performance of phylogenetic mixture models in reducing a well-known and pervasive artifact of phylogenetic inference known as the node-density effect, comparing them to partitioned analyses of the same data. The node-density effect refers to the tendency for the amount of evolutionary change in longer branches of phylogenies to be underestimated compared to that in regions of the tree where there are more nodes and thus branches are typically shorter. Mixture models allow more than one model of sequence evolution to describe the sites in an alignment without prior knowledge of the evolutionary processes that characterize the data or how they correspond to different sites. If multiple evolutionary patterns are common in sequence evolution, mixture models may be capable of reducing node-density effects by characterizing the evolutionary processes more accurately. In gene-sequence alignments simulated to have heterogeneous patterns of evolution, we find that mixture models can reduce node-density effects to negligible levels or remove them altogether, performing as well as partitioned analyses based on the known simulated patterns. The mixture models achieve this without knowledge of the patterns that generated the data and even in some cases without specifying the full or true model of sequence evolution known to underlie the data. The latter result is especially important in real applications, as the true model of evolution is seldom known. We find the same patterns of results for two real data sets with evidence of complex patterns of sequence evolution: mixture models substantially reduced node-density effects and returned better likelihoods compared to partitioning models specifically fitted to these data. We suggest that the presence of more than one pattern of evolution in the data is a common source of error in phylogenetic inference and that mixture models can often detect these patterns even without prior knowledge of their presence in the data. Routine use of mixture models alongside other approaches to phylogenetic inference may often reveal hidden or unexpected patterns of sequence evolution and can improve phylogenetic inference.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Investigation of preferred structures of planetary wave dynamics is addressed using multivariate Gaussian mixture models. The number of components in the mixture is obtained using order statistics of the mixing proportions, hence avoiding previous difficulties related to sample sizes and independence issues. The method is first applied to a few low-order stochastic dynamical systems and data from a general circulation model. The method is next applied to winter daily 500-hPa heights from 1949 to 2003 over the Northern Hemisphere. A spatial clustering algorithm is first applied to the leading two principal components (PCs) and shows significant clustering. The clustering is particularly robust for the first half of the record and less for the second half. The mixture model is then used to identify the clusters. Two highly significant extratropical planetary-scale preferred structures are obtained within the first two to four EOF state space. The first pattern shows a Pacific-North American (PNA) pattern and a negative North Atlantic Oscillation (NAO), and the second pattern is nearly opposite to the first one. It is also observed that some subspaces show multivariate Gaussianity, compatible with linearity, whereas others show multivariate non-Gaussianity. The same analysis is also applied to two subperiods, before and after 1978, and shows a similar regime behavior, with a slight stronger support for the first subperiod. In addition a significant regime shift is also observed between the two periods as well as a change in the shape of the distribution. The patterns associated with the regime shifts reflect essentially a PNA pattern and an NAO pattern consistent with the observed global warming effect on climate and the observed shift in sea surface temperature around the mid-1970s.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Population size estimation with discrete or nonparametric mixture models is considered, and reliable ways of construction of the nonparametric mixture model estimator are reviewed and set into perspective. Construction of the maximum likelihood estimator of the mixing distribution is done for any number of components up to the global nonparametric maximum likelihood bound using the EM algorithm. In addition, the estimators of Chao and Zelterman are considered with some generalisations of Zelterman’s estimator. All computations are done with CAMCR, a special software developed for population size estimation with mixture models. Several examples and data sets are discussed and the estimators illustrated. Problems using the mixture model-based estimators are highlighted.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Population size estimation with discrete or nonparametric mixture models is considered, and reliable ways of construction of the nonparametric mixture model estimator are reviewed and set into perspective. Construction of the maximum likelihood estimator of the mixing distribution is done for any number of components up to the global nonparametric maximum likelihood bound using the EM algorithm. In addition, the estimators of Chao and Zelterman are considered with some generalisations of Zelterman’s estimator. All computations are done with CAMCR, a special software developed for population size estimation with mixture models. Several examples and data sets are discussed and the estimators illustrated. Problems using the mixture model-based estimators are highlighted.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Preferred structures in the surface pressure variability are investigated in and compared between two 100-year simulations of the Hadley Centre climate model HadCM3. In the first (control) simulation, the model is forced with pre-industrial carbon dioxide concentration (1×CO2) and in the second simulation the model is forced with doubled CO2 concentration (2×CO2). Daily winter (December-January-February) surface pressures over the Northern Hemisphere are analysed. The identification of preferred patterns is addressed using multivariate mixture models. For the control simulation, two significant flow regimes are obtained at 5% and 2.5% significance levels within the state space spanned by the leading two principal components. They show a high pressure centre over the North Pacific/Aleutian Islands associated with a low pressure centre over the North Atlantic, and its reverse. For the 2×CO2 simulation, no such behaviour is obtained. At higher-dimensional state space, flow patterns are obtained from both simulations. They are found to be significant at the 1% level for the control simulation and at the 2.5% level for the 2×CO2 simulation. Hence under CO2 doubling, regime behaviour in the large-scale wave dynamics weakens. Doubling greenhouse gas concentration affects both the frequency of occurrence of regimes and also the pattern structures. The less frequent regime becomes amplified and the more frequent regime weakens. The largest change is observed over the Pacific where a significant deepening of the Aleutian low is obtained under CO2 doubling.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The problem of estimating the individual probabilities of a discrete distribution is considered. The true distribution of the independent observations is a mixture of a family of power series distributions. First, we ensure identifiability of the mixing distribution assuming mild conditions. Next, the mixing distribution is estimated by non-parametric maximum likelihood and an estimator for individual probabilities is obtained from the corresponding marginal mixture density. We establish asymptotic normality for the estimator of individual probabilities by showing that, under certain conditions, the difference between this estimator and the empirical proportions is asymptotically negligible. Our framework includes Poisson, negative binomial and logarithmic series as well as binomial mixture models. Simulations highlight the benefit in achieving normality when using the proposed marginal mixture density approach instead of the empirical one, especially for small sample sizes and/or when interest is in the tail areas. A real data example is given to illustrate the use of the methodology.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The rate at which a given site in a gene sequence alignment evolves over time may vary. This phenomenon-known as heterotachy-can bias or distort phylogenetic trees inferred from models of sequence evolution that assume rates of evolution are constant. Here, we describe a phylogenetic mixture model designed to accommodate heterotachy. The method sums the likelihood of the data at each site over more than one set of branch lengths on the same tree topology. A branch-length set that is best for one site may differ from the branch-length set that is best for some other site, thereby allowing different sites to have different rates of change throughout the tree. Because rate variation may not be present in all branches, we use a reversible-jump Markov chain Monte Carlo algorithm to identify those branches in which reliable amounts of heterotachy occur. We implement the method in combination with our 'pattern-heterogeneity' mixture model, applying it to simulated data and five published datasets. We find that complex evolutionary signals of heterotachy are routinely present over and above variation in the rate or pattern of evolution across sites, that the reversible-jump method requires far fewer parameters than conventional mixture models to describe it, and serves to identify the regions of the tree in which heterotachy is most pronounced. The reversible-jump procedure also removes the need for a posteriori tests of 'significance' such as the Akaike or Bayesian information criterion tests, or Bayes factors. Heterotachy has important consequences for the correct reconstruction of phylogenies as well as for tests of hypotheses that rely on accurate branch-length information. These include molecular clocks, analyses of tempo and mode of evolution, comparative studies and ancestral state reconstruction. The model is available from the authors' website, and can be used for the analysis of both nucleotide and morphological data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

While the standard models of concentration addition and independent action predict overall toxicity of multicomponent mixtures reasonably, interactions may limit the predictive capability when a few compounds dominate a mixture. This study was conducted to test if statistically significant systematic deviations from concentration addition (i.e. synergism/antagonism, dose ratio- or dose level-dependency) occur when two taxonomically unrelated species, the earthworm Eisenia fetida and the nematode Caenorhabditis elegans were exposed to a full range of mixtures of the similar acting neonicotinoid pesticides imidacloprid and thiacloprid. The effect of the mixtures on C. elegans was described significantly better (p<0.01) by a dose level-dependent deviation from the concentration addition model than by the reference model alone, while the reference model description of the effects on E. fetida could not be significantly improved. These results highlight that deviations from concentration addition are possible even with similar acting compounds, but that the nature of such deviations are species dependent. For improving ecological risk assessment of simple mixtures, this implies that the concentration addition model may need to be used in a probabilistic context, rather than in its traditional deterministic manner. Crown Copyright (C) 2008 Published by Elsevier Inc. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This article is about modeling count data with zero truncation. A parametric count density family is considered. The truncated mixture of densities from this family is different from the mixture of truncated densities from the same family. Whereas the former model is more natural to formulate and to interpret, the latter model is theoretically easier to treat. It is shown that for any mixing distribution leading to a truncated mixture, a (usually different) mixing distribution can be found so. that the associated mixture of truncated densities equals the truncated mixture, and vice versa. This implies that the likelihood surfaces for both situations agree, and in this sense both models are equivalent. Zero-truncated count data models are used frequently in the capture-recapture setting to estimate population size, and it can be shown that the two Horvitz-Thompson estimators, associated with the two models, agree. In particular, it is possible to achieve strong results for mixtures of truncated Poisson densities, including reliable, global construction of the unique NPMLE (nonparametric maximum likelihood estimator) of the mixing distribution, implying a unique estimator for the population size. The benefit of these results lies in the fact that it is valid to work with the mixture of truncated count densities, which is less appealing for the practitioner but theoretically easier. Mixtures of truncated count densities form a convex linear model, for which a developed theory exists, including global maximum likelihood theory as well as algorithmic approaches. Once the problem has been solved in this class, it might readily be transformed back to the original problem by means of an explicitly given mapping. Applications of these ideas are given, particularly in the case of the truncated Poisson family.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Models of windblown pollen or spore movement are required to predict gene flow from genetically modified (GM) crops and the spread of fungal diseases. We suggest a simple form for a function describing the distance moved by a pollen grain or fungal spore, for use in generic models of dispersal. The function has power-law behaviour over sub-continental distances. We show that air-borne dispersal of rapeseed pollen in two experiments was inconsistent with an exponential model, but was fitted by power-law models, implying a large contribution from distant fields to the catches observed. After allowance for this 'background' by applying Fourier transforms to deconvolve the mixture of distant and local sources, the data were best fit by power-laws with exponents between 1.5 and 2. We also demonstrate that for a simple model of area sources, the median dispersal distance is a function of field radius and that measurement from the source edge can be misleading. Using an inverse-square dispersal distribution deduced from the experimental data and the distribution of rapeseed fields deduced by remote sensing, we successfully predict observed rapeseed pollen density in the city centres of Derby and Leicester (UK).