23 resultados para Probabilistic graphical models

em CentAUR: Central Archive University of Reading - UK


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This article presents a statistical method for detecting recombination in DNA sequence alignments, which is based on combining two probabilistic graphical models: (1) a taxon graph (phylogenetic tree) representing the relationship between the taxa, and (2) a site graph (hidden Markov model) representing interactions between different sites in the DNA sequence alignments. We adopt a Bayesian approach and sample the parameters of the model from the posterior distribution with Markov chain Monte Carlo, using a Metropolis-Hastings and Gibbs-within-Gibbs scheme. The proposed method is tested on various synthetic and real-world DNA sequence alignments, and we compare its performance with the established detection methods RECPARS, PLATO, and TOPAL, as well as with two alternative parameter estimation schemes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

When performing data fusion, one often measures where targets were and then wishes to deduce where targets currently are. There has been recent research on the processing of such out-of-sequence data. This research has culminated in the development of a number of algorithms for solving the associated tracking problem. This paper reviews these different approaches in a common Bayesian framework and proposes an architecture that orthogonalises the data association and out-of-sequence problems such that any combination of solutions to these two problems can be used together. The emphasis is not on advocating one approach over another on the basis of computational expense, but rather on understanding the relationships among the algorithms so that any approximations made are explicit. Results for a multi-sensor scenario involving out-of-sequence data association are used to illustrate the utility of this approach in a specific context.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

There are several scoring rules that one can choose from in order to score probabilistic forecasting models or estimate model parameters. Whilst it is generally agreed that proper scoring rules are preferable, there is no clear criterion for preferring one proper scoring rule above another. This manuscript compares and contrasts some commonly used proper scoring rules and provides guidance on scoring rule selection. In particular, it is shown that the logarithmic scoring rule prefers erring with more uncertainty, the spherical scoring rule prefers erring with lower uncertainty, whereas the other scoring rules are indifferent to either option.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Undirected graphical models are widely used in statistics, physics and machine vision. However Bayesian parameter estimation for undirected models is extremely challenging, since evaluation of the posterior typically involves the calculation of an intractable normalising constant. This problem has received much attention, but very little of this has focussed on the important practical case where the data consists of noisy or incomplete observations of the underlying hidden structure. This paper specifically addresses this problem, comparing two alternative methodologies. In the first of these approaches particle Markov chain Monte Carlo (Andrieu et al., 2010) is used to efficiently explore the parameter space, combined with the exchange algorithm (Murray et al., 2006) for avoiding the calculation of the intractable normalising constant (a proof showing that this combination targets the correct distribution in found in a supplementary appendix online). This approach is compared with approximate Bayesian computation (Pritchard et al., 1999). Applications to estimating the parameters of Ising models and exponential random graphs from noisy data are presented. Each algorithm used in the paper targets an approximation to the true posterior due to the use of MCMC to simulate from the latent graphical model, in lieu of being able to do this exactly in general. The supplementary appendix also describes the nature of the resulting approximation.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Logistic models are studied as a tool to convert dynamical forecast information (deterministic and ensemble) into probability forecasts. A logistic model is obtained by setting the logarithmic odds ratio equal to a linear combination of the inputs. As with any statistical model, logistic models will suffer from overfitting if the number of inputs is comparable to the number of forecast instances. Computational approaches to avoid overfitting by regularization are discussed, and efficient techniques for model assessment and selection are presented. A logit version of the lasso (originally a linear regression technique), is discussed. In lasso models, less important inputs are identified and the corresponding coefficient is set to zero, providing an efficient and automatic model reduction procedure. For the same reason, lasso models are particularly appealing for diagnostic purposes.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Several methods are examined which allow to produce forecasts for time series in the form of probability assignments. The necessary concepts are presented, addressing questions such as how to assess the performance of a probabilistic forecast. A particular class of models, cluster weighted models (CWMs), is given particular attention. CWMs, originally proposed for deterministic forecasts, can be employed for probabilistic forecasting with little modification. Two examples are presented. The first involves estimating the state of (numerically simulated) dynamical systems from noise corrupted measurements, a problem also known as filtering. There is an optimal solution to this problem, called the optimal filter, to which the considered time series models are compared. (The optimal filter requires the dynamical equations to be known.) In the second example, we aim at forecasting the chaotic oscillations of an experimental bronze spring system. Both examples demonstrate that the considered time series models, and especially the CWMs, provide useful probabilistic information about the underlying dynamical relations. In particular, they provide more than just an approximation to the conditional mean.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Improvements in the resolution of satellite imagery have enabled extraction of water surface elevations at the margins of the flood. Comparison between modelled and observed water surface elevations provides a new means for calibrating and validating flood inundation models, however the uncertainty in this observed data has yet to be addressed. Here a flood inundation model is calibrated using a probabilistic treatment of the observed data. A LiDAR guided snake algorithm is used to determine an outline of a flood event in 2006 on the River Dee, North Wales, UK, using a 12.5m ERS-1 image. Points at approximately 100m intervals along this outline are selected, and the water surface elevation recorded as the LiDAR DEM elevation at each point. With a planar water surface from the gauged upstream to downstream water elevations as an approximation, the water surface elevations at points along this flooded extent are compared to their ‘expected’ value. The pattern of errors between the two show a roughly normal distribution, however when plotted against coordinates there is obvious spatial autocorrelation. The source of this spatial dependency is investigated by comparing errors to the slope gradient and aspect of the LiDAR DEM. A LISFLOOD-FP model of the flood event is set-up to investigate the effect of observed data uncertainty on the calibration of flood inundation models. Multiple simulations are run using different combinations of friction parameters, from which the optimum parameter set will be selected. For each simulation a T-test is used to quantify the fit between modelled and observed water surface elevations. The points chosen for use in this T-test are selected based on their error. The criteria for selection enables evaluation of the sensitivity of the choice of optimum parameter set to uncertainty in the observed data. This work explores the observed data in detail and highlights possible causes of error. The identification of significant error (RMSE = 0.8m) between approximate expected and actual observed elevations from the remotely sensed data emphasises the limitations of using this data in a deterministic manner within the calibration process. These limitations are addressed by developing a new probabilistic approach to using the observed data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Process-based integrated modelling of weather and crop yield over large areas is becoming an important research topic. The production of the DEMETER ensemble hindcasts of weather allows this work to be carried out in a probabilistic framework. In this study, ensembles of crop yield (groundnut, Arachis hypogaea L.) were produced for 10 2.5 degrees x 2.5 degrees grid cells in western India using the DEMETER ensembles and the general large-area model (GLAM) for annual crops. Four key issues are addressed by this study. First, crop model calibration methods for use with weather ensemble data are assessed. Calibration using yield ensembles was more successful than calibration using reanalysis data (the European Centre for Medium-Range Weather Forecasts 40-yr reanalysis, ERA40). Secondly, the potential for probabilistic forecasting of crop failure is examined. The hindcasts show skill in the prediction of crop failure, with more severe failures being more predictable. Thirdly, the use of yield ensemble means to predict interannual variability in crop yield is examined and their skill assessed relative to baseline simulations using ERA40. The accuracy of multi-model yield ensemble means is equal to or greater than the accuracy using ERA40. Fourthly, the impact of two key uncertainties, sowing window and spatial scale, is briefly examined. The impact of uncertainty in the sowing window is greater with ERA40 than with the multi-model yield ensemble mean. Subgrid heterogeneity affects model accuracy: where correlations are low on the grid scale, they may be significantly positive on the subgrid scale. The implications of the results of this study for yield forecasting on seasonal time-scales are as follows. (i) There is the potential for probabilistic forecasting of crop failure (defined by a threshold yield value); forecasting of yield terciles shows less potential. (ii) Any improvement in the skill of climate models has the potential to translate into improved deterministic yield prediction. (iii) Whilst model input uncertainties are important, uncertainty in the sowing window may not require specific modelling. The implications of the results of this study for yield forecasting on multidecadal (climate change) time-scales are as follows. (i) The skill in the ensemble mean suggests that the perturbation, within uncertainty bounds, of crop and climate parameters, could potentially average out some of the errors associated with mean yield prediction. (ii) For a given technology trend, decadal fluctuations in the yield-gap parameter used by GLAM may be relatively small, implying some predictability on those time-scales.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

With the current concern over climate change, descriptions of how rainfall patterns are changing over time can be useful. Observations of daily rainfall data over the last few decades provide information on these trends. Generalized linear models are typically used to model patterns in the occurrence and intensity of rainfall. These models describe rainfall patterns for an average year but are more limited when describing long-term trends, particularly when these are potentially non-linear. Generalized additive models (GAMS) provide a framework for modelling non-linear relationships by fitting smooth functions to the data. This paper describes how GAMS can extend the flexibility of models to describe seasonal patterns and long-term trends in the occurrence and intensity of daily rainfall using data from Mauritius from 1962 to 2001. Smoothed estimates from the models provide useful graphical descriptions of changing rainfall patterns over the last 40 years at this location. GAMS are particularly helpful when exploring non-linear relationships in the data. Care is needed to ensure the choice of smooth functions is appropriate for the data and modelling objectives. (c) 2008 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Graphical tracking is a technique for crop scheduling where the actual plant state is plotted against an ideal target curve which encapsulates all crop and environmental characteristics. Management decisions are made on the basis of the position of the actual crop against the ideal position. Due to the simplicity of the approach it is possible for graphical tracks to be developed on site without the requirement for controlled experimentation. Growth models and graphical tracks are discussed, and an implementation of the Richards curve for graphical tracking described. In many cases, the more intuitively desirable growth models perform sub-optimally due to problems with the specification of starting conditions, environmental factors outside the scope of the original model and the introduction of new cultivars. Accurate specification for a biological model requires detailed and usually costly study, and as such is not adaptable to a changing cultivar range and changing cultivation techniques. Fitting of a new graphical track for a new cultivar can be conducted on site and improved over subsequent seasons. Graphical tracking emphasises the current position relative to the objective, and as such does not require the time consuming or system specific input of an environmental history, although it does require detailed crop measurement. The approach is flexible and could be applied to a variety of specification metrics, with digital imaging providing a route for added value. For decision making regarding crop manipulation from the observed current state, there is a role for simple predictive modelling over the short term to indicate the short term consequences of crop manipulation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Abstract: Long-term exposure of skylarks to a fictitious insecticide and of wood mice to a fictitious fungicide were modelled probabilistically in a Monte Carlo simulation. Within the same simulation the consequences of exposure to pesticides on reproductive success were modelled using the toxicity-exposure-linking rules developed by R.S. Bennet et al. (2005) and the interspecies extrapolation factors suggested by R. Luttik et al.(2005). We built models to reflect a range of scenarios and as a result were able to show how exposure to pesticide might alter the number of individuals engaged in any given phase of the breeding cycle at any given time and predict the numbers of new adults at the season’s end.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Ensembles of extended Atmospheric Model Intercomparison Project (AMIP) runs from the general circulation models of the National Centers for Environmental Prediction (formerly the National Meteorological Center) and the Max-Planck Institute (Hamburg, Germany) are used to estimate the potential predictability (PP) of an index of the Pacific–North America (PNA) mode of climate change. The PP of this pattern in “perfect” prediction experiments is 20%–25% of the index’s variance. The models, particularly that from MPI, capture virtually all of this variance in their hindcasts of the winter PNA for the period 1970–93. The high levels of internally generated model noise in the PNA simulations reconfirm the need for an ensemble averaging approach to climate prediction. This means that the forecasts ought to be expressed in a probabilistic manner. It is shown that the models’ skills are higher by about 50% during strong SST events in the tropical Pacific, so the probabilistic forecasts need to be conditional on the tropical SST. Taken together with earlier studies, the present results suggest that the original set of AMIP integrations (single 10-yr runs) is not adequate to reliably test the participating models’ simulations of interannual climate variability in the midlatitudes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Once you have generated a 3D model of a protein, how do you know whether it bears any resemblance to the actual structure? To determine the usefulness of 3D models of proteins, they must be assessed in terms of their quality by methods that predict their similarity to the native structure. The ModFOLD4 server is the latest version of our leading independent server for the estimation of both the global and local (per-residue) quality of 3D protein models. The server produces both machine readable and graphical output, providing users with intuitive visual reports on the quality of predicted protein tertiary structures. The ModFOLD4 server is freely available to all at: http://www.reading.ac.uk/bioinf/ModFOLD/.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Three wind gust estimation (WGE) methods implemented in the numerical weather prediction (NWP) model COSMO-CLM are evaluated with respect to their forecast quality using skill scores. Two methods estimate gusts locally from mean wind speed and the turbulence state of the atmosphere, while the third one considers the mixing-down of high momentum within the planetary boundary layer (WGE Brasseur). One hundred and fifty-eight windstorms from the last four decades are simulated and results are compared with gust observations at 37 stations in Germany. Skill scores reveal that the local WGE methods show an overall better behaviour, whilst WGE Brasseur performs less well except for mountain regions. The here introduced WGE turbulent kinetic energy (TKE) permits a probabilistic interpretation using statistical characteristics of gusts at observational sites for an assessment of uncertainty. The WGE TKE formulation has the advantage of a ‘native’ interpretation of wind gusts as result of local appearance of TKE. The inclusion of a probabilistic WGE TKE approach in NWP models has, thus, several advantages over other methods, as it has the potential for an estimation of uncertainties of gusts at observational sites.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We compare a number of models of post War US output growth in terms of the degree and pattern of non-linearity they impart to the conditional mean, where we condition on either the previous period's growth rate, or the previous two periods' growth rates. The conditional means are estimated non-parametrically using a nearest-neighbour technique on data simulated from the models. In this way, we condense the complex, dynamic, responses that may be present in to graphical displays of the implied conditional mean.