44 resultados para likelihood-based inference
Recent developments in genetic data analysis: what can they tell us about human demographic history?
Resumo:
Over the last decade, a number of new methods of population genetic analysis based on likelihood have been introduced. This review describes and explains the general statistical techniques that have recently been used, and discusses the underlying population genetic models. Experimental papers that use these methods to infer human demographic and phylogeographic history are reviewed. It appears that the use of likelihood has hitherto had little impact in the field of human population genetics, which is still primarily driven by more traditional approaches. However, with the current uncertainty about the effects of natural selection, population structure and ascertainment of single-nucleotide polymorphism markers, it is suggested that likelihood-based methods may have a greater impact in the future.
Resumo:
Many well-established statistical methods in genetics were developed in a climate of severe constraints on computational power. Recent advances in simulation methodology now bring modern, flexible statistical methods within the reach of scientists having access to a desktop workstation. We illustrate the potential advantages now available by considering the problem of assessing departures from Hardy-Weinberg (HW) equilibrium. Several hypothesis tests of HW have been established, as well as a variety of point estimation methods for the parameter which measures departures from HW under the inbreeding model. We propose a computational, Bayesian method for assessing departures from HW, which has a number of important advantages over existing approaches. The method incorporates the effects-of uncertainty about the nuisance parameters--the allele frequencies--as well as the boundary constraints on f (which are functions of the nuisance parameters). Results are naturally presented visually, exploiting the graphics capabilities of modern computer environments to allow straightforward interpretation. Perhaps most importantly, the method is founded on a flexible, likelihood-based modelling framework, which can incorporate the inbreeding model if appropriate, but also allows the assumptions of the model to he investigated and, if necessary, relaxed. Under appropriate conditions, information can be shared across loci and, possibly, across populations, leading to more precise estimation. The advantages of the method are illustrated by application both to simulated data and to data analysed by alternative methods in the recent literature.
Resumo:
This paper presents a video surveillance framework that robustly and efficiently detects abandoned objects in surveillance scenes. The framework is based on a novel threat assessment algorithm which combines the concept of ownership with automatic understanding of social relations in order to infer abandonment of objects. Implementation is achieved through development of a logic-based inference engine based on Prolog. Threat detection performance is conducted by testing against a range of datasets describing realistic situations and demonstrates a reduction in the number of false alarms generated. The proposed system represents the approach employed in the EU SUBITO project (Surveillance of Unattended Baggage and the Identification and Tracking of the Owner).
Resumo:
The Homeric epics are among the greatest masterpieces of literature, but when they were produced is not known with certainty. Here we apply evolutionary-linguistic phylogenetic statistical methods to differences in Homeric, Modern Greek and ancient Hittite vocabulary items to estimate a date of approximately 710–760 BCE for these great works. Our analysis compared a common set of vocabulary items among the three pairs of languages, recording for each item whether the words in the two languages were cognate – derived from a shared ancestral word – or not. We then used a likelihood-based Markov chain Monte Carlo procedure to estimate the most probable times in years separating these languages given the percentage of words they shared, combined with knowledge of the rates at which different words change. Our date for the epics is in close agreement with historians' and classicists' beliefs derived from historical and archaeological sources.
Resumo:
Background Polygalacturonase-inhibiting proteins (PGIPs) are leucine-rich repeat (LRR) plant cell wall glycoproteins involved in plant immunity. They are typically encoded by gene families with a small number of gene copies whose evolutionary origin has been poorly investigated. Here we report the complete characterization of the full complement of the pgip family in soybean (Glycine max [L.] Merr.) and the characterization of the genomic region surrounding the pgip family in four legume species. Results BAC clone and genome sequence analyses showed that the soybean genome contains two pgip loci. Each locus is composed of three clustered genes that are induced following infection with the fungal pathogen Sclerotinia sclerotiorum (Lib.) de Bary, and remnant sequences of pgip genes. The analyzed homeologous soybean genomic regions (about 126 Kb) that include the pgip loci are strongly conserved and this conservation extends also to the genomes of the legume species Phaseolus vulgaris L., Medicago truncatula Gaertn. and Cicer arietinum L., each containing a single pgip locus. Maximum likelihood-based gene trees suggest that the genes within the pgip clusters have independently undergone tandem duplication in each species. Conclusions The paleopolyploid soybean genome contains two pgip loci comprised in large and highly conserved duplicated regions, which are also conserved in bean, M. truncatula and C. arietinum. The genomic features of these legume pgip families suggest that the forces driving the evolution of pgip genes follow the birth-and-death model, similar to that proposed for the evolution of resistance (R) genes of NBS-LRR-type.
Resumo:
We investigated the processes of how adult readers evaluate and revise their situation model during reading by monitoring their eye movements as they read narrative texts and subsequent critical sentences. In each narrative text, a short introduction primed a knowledge-based inference, followed by a target concept that was either expected (e.g., “oven”) or unexpected (e.g., “grill”) in relation to the inferred concept. Eye movements showed that readers detected a mismatch between the new unexpected information and their prior interpretation, confirming their ability to evaluate inferential information. Just below the narrative text, a critical sentence included a target word that was either congruent (e.g., “roasted”) or incongruent (e.g., “barbecued”) with the expected but not the unexpected concept. Readers spent less time reading the congruent than the incongruent target word, reflecting the facilitation of prior information. In addition, when the unexpected (but not expected) concept had been presented, participants with lower verbal (but not visuospatial) working memory span exhibited longer reading times and made more regressions (from the critical sentence to previous information) on encountering congruent information, indicating difficulty in inhibiting their initial incorrect interpretation and revising their situation model
Resumo:
We introduce a procedure for association based analysis of nuclear families that allows for dichotomous and more general measurements of phenotype and inclusion of covariate information. Standard generalized linear models are used to relate phenotype and its predictors. Our test procedure, based on the likelihood ratio, unifies the estimation of all parameters through the likelihood itself and yields maximum likelihood estimates of the genetic relative risk and interaction parameters. Our method has advantages in modelling the covariate and gene-covariate interaction terms over recently proposed conditional score tests that include covariate information via a two-stage modelling approach. We apply our method in a study of human systemic lupus erythematosus and the C-reactive protein that includes sex as a covariate.
Resumo:
Stephens and Donnelly have introduced a simple yet powerful importance sampling scheme for computing the likelihood in population genetic models. Fundamental to the method is an approximation to the conditional probability of the allelic type of an additional gene, given those currently in the sample. As noted by Li and Stephens, the product of these conditional probabilities for a sequence of draws that gives the frequency of allelic types in a sample is an approximation to the likelihood, and can be used directly in inference. The aim of this note is to demonstrate the high level of accuracy of "product of approximate conditionals" (PAC) likelihood when used with microsatellite data. Results obtained on simulated microsatellite data show that this strategy leads to a negligible bias over a wide range of the scaled mutation parameter theta. Furthermore, the sampling variance of likelihood estimates as well as the computation time are lower than that obtained with importance sampling on the whole range of theta. It follows that this approach represents an efficient substitute to IS algorithms in computer intensive (e.g. MCMC) inference methods in population genetics. (c) 2006 Elsevier Inc. All rights reserved.
Resumo:
The Bryaceae are a large cosmopolitan family of mosses containing genera of considerable taxonomic difficulty. Phylogenetic relationships within the family were inferred using data from chloroplast DNA sequences (rps4 and trnL-trnF region). Parsimony and maximum likelihood optimality criteria, and Bayesian phylogenetic inference procedures were employed to reconstruct relationships. The genera Bryum and Brachymenium are not monophyletic groups. A clade comprising Plagiobryum, Acidodontium, Mielichhoferia macrocarpa, Bryum sects. Bryum, Apalodictyon, Limbata, Leucodontium, Caespiticia, Capillaria (in part: sect. Capillaria), and Brachymenium sect. Dicranobryum, is well supported in all analyses and represents a major lineage within the family. Section Dicranobryum of Brachymenium is more closely related to section Bryum than to the other sections of Brachymenium, as are Mielichhoferia macrocarpa and M. himalayana. Species of Acidodontium form a clade with Anomobryum julaceum. The grouping of species with a rosulate gametophytic growth form suggests the presence of a 'rosulate' clade similar in circumscription to the genus Rosulabryum. Mielichhoferia macrocarpa and M. himalayana are transferred to Bryum as B. porsildii and B. caucasicum, respectively.
Resumo:
Statistical methods of inference typically require the likelihood function to be computable in a reasonable amount of time. The class of “likelihood-free” methods termed Approximate Bayesian Computation (ABC) is able to eliminate this requirement, replacing the evaluation of the likelihood with simulation from it. Likelihood-free methods have gained in efficiency and popularity in the past few years, following their integration with Markov Chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) in order to better explore the parameter space. They have been applied primarily to estimating the parameters of a given model, but can also be used to compare models. Here we present novel likelihood-free approaches to model comparison, based upon the independent estimation of the evidence of each model under study. Key advantages of these approaches over previous techniques are that they allow the exploitation of MCMC or SMC algorithms for exploring the parameter space, and that they do not require a sampler able to mix between models. We validate the proposed methods using a simple exponential family problem before providing a realistic problem from human population genetics: the comparison of different demographic models based upon genetic data from the Y chromosome.
Resumo:
The use of Bayesian inference in the inference of time-frequency representations has, thus far, been limited to offline analysis of signals, using a smoothing spline based model of the time-frequency plane. In this paper we introduce a new framework that allows the routine use of Bayesian inference for online estimation of the time-varying spectral density of a locally stationary Gaussian process. The core of our approach is the use of a likelihood inspired by a local Whittle approximation. This choice, along with the use of a recursive algorithm for non-parametric estimation of the local spectral density, permits the use of a particle filter for estimating the time-varying spectral density online. We provide demonstrations of the algorithm through tracking chirps and the analysis of musical data.
Resumo:
Despite the many models developed for phosphorus concentration prediction at differing spatial and temporal scales, there has been little effort to quantify uncertainty in their predictions. Model prediction uncertainty quantification is desirable, for informed decision-making in river-systems management. An uncertainty analysis of the process-based model, integrated catchment model of phosphorus (INCA-P), within the generalised likelihood uncertainty estimation (GLUE) framework is presented. The framework is applied to the Lugg catchment (1,077 km2), a River Wye tributary, on the England–Wales border. Daily discharge and monthly phosphorus (total reactive and total), for a limited number of reaches, are used to initially assess uncertainty and sensitivity of 44 model parameters, identified as being most important for discharge and phosphorus predictions. This study demonstrates that parameter homogeneity assumptions (spatial heterogeneity is treated as land use type fractional areas) can achieve higher model fits, than a previous expertly calibrated parameter set. The model is capable of reproducing the hydrology, but a threshold Nash-Sutcliffe co-efficient of determination (E or R 2) of 0.3 is not achieved when simulating observed total phosphorus (TP) data in the upland reaches or total reactive phosphorus (TRP) in any reach. Despite this, the model reproduces the general dynamics of TP and TRP, in point source dominated lower reaches. This paper discusses why this application of INCA-P fails to find any parameter sets, which simultaneously describe all observed data acceptably. The discussion focuses on uncertainty of readily available input data, and whether such process-based models should be used when there isn’t sufficient data to support the many parameters.
Resumo:
It has been generally accepted that the method of moments (MoM) variogram, which has been widely applied in soil science, requires about 100 sites at an appropriate interval apart to describe the variation adequately. This sample size is often larger than can be afforded for soil surveys of agricultural fields or contaminated sites. Furthermore, it might be a much larger sample size than is needed where the scale of variation is large. A possible alternative in such situations is the residual maximum likelihood (REML) variogram because fewer data appear to be required. The REML method is parametric and is considered reliable where there is trend in the data because it is based on generalized increments that filter trend out and only the covariance parameters are estimated. Previous research has suggested that fewer data are needed to compute a reliable variogram using a maximum likelihood approach such as REML, however, the results can vary according to the nature of the spatial variation. There remain issues to examine: how many fewer data can be used, how should the sampling sites be distributed over the site of interest, and how do different degrees of spatial variation affect the data requirements? The soil of four field sites of different size, physiography, parent material and soil type was sampled intensively, and MoM and REML variograms were calculated for clay content. The data were then sub-sampled to give different sample sizes and distributions of sites and the variograms were computed again. The model parameters for the sets of variograms for each site were used for cross-validation. Predictions based on REML variograms were generally more accurate than those from MoM variograms with fewer than 100 sampling sites. A sample size of around 50 sites at an appropriate distance apart, possibly determined from variograms of ancillary data, appears adequate to compute REML variograms for kriging soil properties for precision agriculture and contaminated sites. (C) 2007 Elsevier B.V. All rights reserved.