54 resultados para Asymptotic behaviour, Bayesian methods, Mixture models, Overfitting, Posterior concentration
em CentAUR: Central Archive University of Reading - UK
Resumo:
This article presents a statistical method for detecting recombination in DNA sequence alignments, which is based on combining two probabilistic graphical models: (1) a taxon graph (phylogenetic tree) representing the relationship between the taxa, and (2) a site graph (hidden Markov model) representing interactions between different sites in the DNA sequence alignments. We adopt a Bayesian approach and sample the parameters of the model from the posterior distribution with Markov chain Monte Carlo, using a Metropolis-Hastings and Gibbs-within-Gibbs scheme. The proposed method is tested on various synthetic and real-world DNA sequence alignments, and we compare its performance with the established detection methods RECPARS, PLATO, and TOPAL, as well as with two alternative parameter estimation schemes.
Resumo:
We develop the linearization of a semi-implicit semi-Lagrangian model of the one-dimensional shallow-water equations using two different methods. The usual tangent linear model, formed by linearizing the discrete nonlinear model, is compared with a model formed by first linearizing the continuous nonlinear equations and then discretizing. Both models are shown to perform equally well for finite perturbations. However, the asymptotic behaviour of the two models differs as the perturbation size is reduced. This leads to difficulties in showing that the models are correctly coded using the standard tests. To overcome this difficulty we propose a new method for testing linear models, which we demonstrate both theoretically and numerically. © Crown copyright, 2003. Royal Meteorological Society
Resumo:
The contribution investigates the problem of estimating the size of a population, also known as the missing cases problem. Suppose a registration system is targeting to identify all cases having a certain characteristic such as a specific disease (cancer, heart disease, ...), disease related condition (HIV, heroin use, ...) or a specific behavior (driving a car without license). Every case in such a registration system has a certain notification history in that it might have been identified several times (at least once) which can be understood as a particular capture-recapture situation. Typically, cases are left out which have never been listed at any occasion, and it is this frequency one wants to estimate. In this paper modelling is concentrating on the counting distribution, e.g. the distribution of the variable that counts how often a given case has been identified by the registration system. Besides very simple models like the binomial or Poisson distribution, finite (nonparametric) mixtures of these are considered providing rather flexible modelling tools. Estimation is done using maximum likelihood by means of the EM algorithm. A case study on heroin users in Bangkok in the year 2001 is completing the contribution.
Resumo:
We investigate the performance of phylogenetic mixture models in reducing a well-known and pervasive artifact of phylogenetic inference known as the node-density effect, comparing them to partitioned analyses of the same data. The node-density effect refers to the tendency for the amount of evolutionary change in longer branches of phylogenies to be underestimated compared to that in regions of the tree where there are more nodes and thus branches are typically shorter. Mixture models allow more than one model of sequence evolution to describe the sites in an alignment without prior knowledge of the evolutionary processes that characterize the data or how they correspond to different sites. If multiple evolutionary patterns are common in sequence evolution, mixture models may be capable of reducing node-density effects by characterizing the evolutionary processes more accurately. In gene-sequence alignments simulated to have heterogeneous patterns of evolution, we find that mixture models can reduce node-density effects to negligible levels or remove them altogether, performing as well as partitioned analyses based on the known simulated patterns. The mixture models achieve this without knowledge of the patterns that generated the data and even in some cases without specifying the full or true model of sequence evolution known to underlie the data. The latter result is especially important in real applications, as the true model of evolution is seldom known. We find the same patterns of results for two real data sets with evidence of complex patterns of sequence evolution: mixture models substantially reduced node-density effects and returned better likelihoods compared to partitioning models specifically fitted to these data. We suggest that the presence of more than one pattern of evolution in the data is a common source of error in phylogenetic inference and that mixture models can often detect these patterns even without prior knowledge of their presence in the data. Routine use of mixture models alongside other approaches to phylogenetic inference may often reveal hidden or unexpected patterns of sequence evolution and can improve phylogenetic inference.
Resumo:
In this article we describe recent progress on the design, analysis and implementation of hybrid numerical-asymptotic boundary integral methods for boundary value problems for the Helmholtz equation that model time harmonic acoustic wave scattering in domains exterior to impenetrable obstacles. These hybrid methods combine conventional piecewise polynomial approximations with high-frequency asymptotics to build basis functions suitable for representing the oscillatory solutions. They have the potential to solve scattering problems accurately in a computation time that is (almost) independent of frequency and this has been realized for many model problems. The design and analysis of this class of methods requires new results on the analysis and numerical analysis of highly oscillatory boundary integral operators and on the high-frequency asymptotics of scattering problems. The implementation requires the development of appropriate quadrature rules for highly oscillatory integrals. This article contains a historical account of the development of this currently very active field, a detailed account of recent progress and, in addition, a number of original research results on the design, analysis and implementation of these methods.
Resumo:
Systems Engineering often involves computer modelling the behaviour of proposed systems and their components. Where a component is human, fallibility must be modelled by a stochastic agent. The identification of a model of decision-making over quantifiable options is investigated using the game-domain of Chess. Bayesian methods are used to infer the distribution of players’ skill levels from the moves they play rather than from their competitive results. The approach is used on large sets of games by players across a broad FIDE Elo range, and is in principle applicable to any scenario where high-value decisions are being made under pressure.
Resumo:
In this paper, the mixed logit (ML) using Bayesian methods was employed to examine willingness-to-pay (WTP) to consume bread produced with reduced levels of pesticides so as to ameliorate environmental quality, from data generated by a choice experiment. Model comparison used the marginal likelihood, which is preferable for Bayesian model comparison and testing. Models containing constant and random parameters for a number of distributions were considered, along with models in ‘preference space’ and ‘WTP space’ as well as those allowing for misreporting. We found: strong support for the ML estimated in WTP space; little support for fixing the price coefficient a common practice advocated and adopted in the environmental economics literature; and, weak evidence for misreporting.
Resumo:
The problem of estimating the individual probabilities of a discrete distribution is considered. The true distribution of the independent observations is a mixture of a family of power series distributions. First, we ensure identifiability of the mixing distribution assuming mild conditions. Next, the mixing distribution is estimated by non-parametric maximum likelihood and an estimator for individual probabilities is obtained from the corresponding marginal mixture density. We establish asymptotic normality for the estimator of individual probabilities by showing that, under certain conditions, the difference between this estimator and the empirical proportions is asymptotically negligible. Our framework includes Poisson, negative binomial and logarithmic series as well as binomial mixture models. Simulations highlight the benefit in achieving normality when using the proposed marginal mixture density approach instead of the empirical one, especially for small sample sizes and/or when interest is in the tail areas. A real data example is given to illustrate the use of the methodology.
Resumo:
We consider second kind integral equations of the form x(s) - (abbreviated x - K x = y ), in which Ω is some unbounded subset of Rn. Let Xp denote the weighted space of functions x continuous on Ω and satisfying x (s) = O(|s|-p ),s → ∞We show that if the kernel k(s,t) decays like |s — t|-q as |s — t| → ∞ for some sufficiently large q (and some other mild conditions on k are satisfied), then K ∈ B(XP) (the set of bounded linear operators on Xp), for 0 ≤ p ≤ q. If also (I - K)-1 ∈ B(X0) then (I - K)-1 ∈ B(XP) for 0 < p < q, and (I- K)-1∈ B(Xq) if further conditions on k hold. Thus, if k(s, t) = O(|s — t|-q). |s — t| → ∞, and y(s)=O(|s|-p), s → ∞, the asymptotic behaviour of the solution x may be estimated as x (s) = O(|s|-r), |s| → ∞, r := min(p, q). The case when k(s,t) = к(s — t), so that the equation is of Wiener-Hopf type, receives especial attention. Conditions, in terms of the symbol of I — K, for I — K to be invertible or Fredholm on Xp are established for certain cases (Ω a half-space or cone). A boundary integral equation, which models three-dimensional acoustic propaga-tion above flat ground, absorbing apart from an infinite rigid strip, illustrates the practical application and sharpness of the above results. This integral equation mod-els, in particular, road traffic noise propagation along an infinite road surface sur-rounded by absorbing ground. We prove that the sound propagating along the rigid road surface eventually decays with distance at the same rate as sound propagating above the absorbing ground.
Resumo:
We derive energy-norm a posteriori error bounds, using gradient recovery (ZZ) estimators to control the spatial error, for fully discrete schemes for the linear heat equation. This appears to be the �rst completely rigorous derivation of ZZ estimators for fully discrete schemes for evolution problems, without any restrictive assumption on the timestep size. An essential tool for the analysis is the elliptic reconstruction technique.Our theoretical results are backed with extensive numerical experimentation aimed at (a) testing the practical sharpness and asymptotic behaviour of the error estimator against the error, and (b) deriving an adaptive method based on our estimators. An extra novelty provided is an implementation of a coarsening error "preindicator", with a complete implementation guide in ALBERTA in the appendix.
Resumo:
Preferred structures in the surface pressure variability are investigated in and compared between two 100-year simulations of the Hadley Centre climate model HadCM3. In the first (control) simulation, the model is forced with pre-industrial carbon dioxide concentration (1×CO2) and in the second simulation the model is forced with doubled CO2 concentration (2×CO2). Daily winter (December-January-February) surface pressures over the Northern Hemisphere are analysed. The identification of preferred patterns is addressed using multivariate mixture models. For the control simulation, two significant flow regimes are obtained at 5% and 2.5% significance levels within the state space spanned by the leading two principal components. They show a high pressure centre over the North Pacific/Aleutian Islands associated with a low pressure centre over the North Atlantic, and its reverse. For the 2×CO2 simulation, no such behaviour is obtained. At higher-dimensional state space, flow patterns are obtained from both simulations. They are found to be significant at the 1% level for the control simulation and at the 2.5% level for the 2×CO2 simulation. Hence under CO2 doubling, regime behaviour in the large-scale wave dynamics weakens. Doubling greenhouse gas concentration affects both the frequency of occurrence of regimes and also the pattern structures. The less frequent regime becomes amplified and the more frequent regime weakens. The largest change is observed over the Pacific where a significant deepening of the Aleutian low is obtained under CO2 doubling.
Resumo:
In this article we review recent progress on the design, analysis and implementation of numerical-asymptotic boundary integral methods for the computation of frequency-domain acoustic scattering in a homogeneous unbounded medium by a bounded obstacle. The main aim of the methods is to allow computation of scattering at arbitrarily high frequency with finite computational resources.
Resumo:
Investigation of preferred structures of planetary wave dynamics is addressed using multivariate Gaussian mixture models. The number of components in the mixture is obtained using order statistics of the mixing proportions, hence avoiding previous difficulties related to sample sizes and independence issues. The method is first applied to a few low-order stochastic dynamical systems and data from a general circulation model. The method is next applied to winter daily 500-hPa heights from 1949 to 2003 over the Northern Hemisphere. A spatial clustering algorithm is first applied to the leading two principal components (PCs) and shows significant clustering. The clustering is particularly robust for the first half of the record and less for the second half. The mixture model is then used to identify the clusters. Two highly significant extratropical planetary-scale preferred structures are obtained within the first two to four EOF state space. The first pattern shows a Pacific-North American (PNA) pattern and a negative North Atlantic Oscillation (NAO), and the second pattern is nearly opposite to the first one. It is also observed that some subspaces show multivariate Gaussianity, compatible with linearity, whereas others show multivariate non-Gaussianity. The same analysis is also applied to two subperiods, before and after 1978, and shows a similar regime behavior, with a slight stronger support for the first subperiod. In addition a significant regime shift is also observed between the two periods as well as a change in the shape of the distribution. The patterns associated with the regime shifts reflect essentially a PNA pattern and an NAO pattern consistent with the observed global warming effect on climate and the observed shift in sea surface temperature around the mid-1970s.