208 resultados para Ancestral inference
em Université de Lausanne, Switzerland
Resumo:
Aim Recently developed parametric methods in historical biogeography allow researchers to integrate temporal and palaeogeographical information into the reconstruction of biogeographical scenarios, thus overcoming a known bias of parsimony-based approaches. Here, we compare a parametric method, dispersal-extinction-cladogenesis (DEC), against a parsimony-based method, dispersal-vicariance analysis (DIVA), which does not incorporate branch lengths but accounts for phylogenetic uncertainty through a Bayesian empirical approach (Bayes-DIVA). We analyse the benefits and limitations of each method using the cosmopolitan plant family Sapindaceae as a case study.Location World-wide.Methods Phylogenetic relationships were estimated by Bayesian inference on a large dataset representing generic diversity within Sapindaceae. Lineage divergence times were estimated by penalized likelihood over a sample of trees from the posterior distribution of the phylogeny to account for dating uncertainty in biogeographical reconstructions. We compared biogeographical scenarios between Bayes-DIVA and two different DEC models: one with no geological constraints and another that employed a stratified palaeogeographical model in which dispersal rates were scaled according to area connectivity across four time slices, reflecting the changing continental configuration over the last 110 million years.Results Despite differences in the underlying biogeographical model, Bayes-DIVA and DEC inferred similar biogeographical scenarios. The main differences were: (1) in the timing of dispersal events - which in Bayes-DIVA sometimes conflicts with palaeogeographical information, and (2) in the lower frequency of terminal dispersal events inferred by DEC. Uncertainty in divergence time estimations influenced both the inference of ancestral ranges and the decisiveness with which an area can be assigned to a node.Main conclusions By considering lineage divergence times, the DEC method gives more accurate reconstructions that are in agreement with palaeogeographical evidence. In contrast, Bayes-DIVA showed the highest decisiveness in unequivocally reconstructing ancestral ranges, probably reflecting its ability to integrate phylogenetic uncertainty. Care should be taken in defining the palaeogeographical model in DEC because of the possibility of overestimating the frequency of extinction events, or of inferring ancestral ranges that are outside the extant species ranges, owing to dispersal constraints enforced by the model. The wide-spanning spatial and temporal model proposed here could prove useful for testing large-scale biogeographical patterns in plants.
Resumo:
Restriction site-associated DNA sequencing (RADseq) provides researchers with the ability to record genetic polymorphism across thousands of loci for nonmodel organisms, potentially revolutionizing the field of molecular ecology. However, as with other genotyping methods, RADseq is prone to a number of sources of error that may have consequential effects for population genetic inferences, and these have received only limited attention in terms of the estimation and reporting of genotyping error rates. Here we use individual sample replicates, under the expectation of identical genotypes, to quantify genotyping error in the absence of a reference genome. We then use sample replicates to (i) optimize de novo assembly parameters within the program Stacks, by minimizing error and maximizing the retrieval of informative loci; and (ii) quantify error rates for loci, alleles and single-nucleotide polymorphisms. As an empirical example, we use a double-digest RAD data set of a nonmodel plant species, Berberis alpina, collected from high-altitude mountains in Mexico.
Resumo:
Continuing developments in science and technology mean that the amounts of information forensic scientists are able to provide for criminal investigations is ever increasing. The commensurate increase in complexity creates difficulties for scientists and lawyers with regard to evaluation and interpretation, notably with respect to issues of inference and decision. Probability theory, implemented through graphical methods, and specifically Bayesian networks, provides powerful methods to deal with this complexity. Extensions of these methods to elements of decision theory provide further support and assistance to the judicial system. Bayesian Networks for Probabilistic Inference and Decision Analysis in Forensic Science provides a unique and comprehensive introduction to the use of Bayesian decision networks for the evaluation and interpretation of scientific findings in forensic science, and for the support of decision-makers in their scientific and legal tasks. Includes self-contained introductions to probability and decision theory. Develops the characteristics of Bayesian networks, object-oriented Bayesian networks and their extension to decision models. Features implementation of the methodology with reference to commercial and academically available software. Presents standard networks and their extensions that can be easily implemented and that can assist in the reader's own analysis of real cases. Provides a technique for structuring problems and organizing data based on methods and principles of scientific reasoning. Contains a method for the construction of coherent and defensible arguments for the analysis and evaluation of scientific findings and for decisions based on them. Is written in a lucid style, suitable for forensic scientists and lawyers with minimal mathematical background. Includes a foreword by Ian Evett. The clear and accessible style of this second edition makes this book ideal for all forensic scientists, applied statisticians and graduate students wishing to evaluate forensic findings from the perspective of probability and decision analysis. It will also appeal to lawyers and other scientists and professionals interested in the evaluation and interpretation of forensic findings, including decision making based on scientific information.
Multimodel inference and multimodel averaging in empirical modeling of occupational exposure levels.
Resumo:
Empirical modeling of exposure levels has been popular for identifying exposure determinants in occupational hygiene. Traditional data-driven methods used to choose a model on which to base inferences have typically not accounted for the uncertainty linked to the process of selecting the final model. Several new approaches propose making statistical inferences from a set of plausible models rather than from a single model regarded as 'best'. This paper introduces the multimodel averaging approach described in the monograph by Burnham and Anderson. In their approach, a set of plausible models are defined a priori by taking into account the sample size and previous knowledge of variables influent on exposure levels. The Akaike information criterion is then calculated to evaluate the relative support of the data for each model, expressed as Akaike weight, to be interpreted as the probability of the model being the best approximating model given the model set. The model weights can then be used to rank models, quantify the evidence favoring one over another, perform multimodel prediction, estimate the relative influence of the potential predictors and estimate multimodel-averaged effects of determinants. The whole approach is illustrated with the analysis of a data set of 1500 volatile organic compound exposure levels collected by the Institute for work and health (Lausanne, Switzerland) over 20 years, each concentration having been divided by the relevant Swiss occupational exposure limit and log-transformed before analysis. Multimodel inference represents a promising procedure for modeling exposure levels that incorporates the notion that several models can be supported by the data and permits to evaluate to a certain extent model selection uncertainty, which is seldom mentioned in current practice.
Resumo:
Restriction site-associated DNA sequencing (RADseq) provides researchers with the ability to record genetic polymorphism across thousands of loci for nonmodel organisms, potentially revolutionizing the field of molecular ecology. However, as with other genotyping methods, RADseq is prone to a number of sources of error that may have consequential effects for population genetic inferences, and these have received only limited attention in terms of the estimation and reporting of genotyping error rates. Here we use individual sample replicates, under the expectation of identical genotypes, to quantify genotyping error in the absence of a reference genome. We then use sample replicates to (i) optimize de novo assembly parameters within the program Stacks, by minimizing error and maximizing the retrieval of informative loci; and (ii) quantify error rates for loci, alleles and single-nucleotide polymorphisms. As an empirical example, we use a double-digest RAD data set of a nonmodel plant species, Berberis alpina, collected from high-altitude mountains in Mexico.
Improving the performance of positive selection inference by filtering unreliable alignment regions.
Resumo:
Errors in the inferred multiple sequence alignment may lead to false prediction of positive selection. Recently, methods for detecting unreliable alignment regions were developed and were shown to accurately identify incorrectly aligned regions. While removing unreliable alignment regions is expected to increase the accuracy of positive selection inference, such filtering may also significantly decrease the power of the test, as positively selected regions are fast evolving, and those same regions are often those that are difficult to align. Here, we used realistic simulations that mimic sequence evolution of HIV-1 genes to test the hypothesis that the performance of positive selection inference using codon models can be improved by removing unreliable alignment regions. Our study shows that the benefit of removing unreliable regions exceeds the loss of power due to the removal of some of the true positively selected sites.
Resumo:
Natural selection is typically exerted at some specific life stages. If natural selection takes place before a trait can be measured, using conventional models can cause wrong inference about population parameters. When the missing data process relates to the trait of interest, a valid inference requires explicit modeling of the missing process. We propose a joint modeling approach, a shared parameter model, to account for nonrandom missing data. It consists of an animal model for the phenotypic data and a logistic model for the missing process, linked by the additive genetic effects. A Bayesian approach is taken and inference is made using integrated nested Laplace approximations. From a simulation study we find that wrongly assuming that missing data are missing at random can result in severely biased estimates of additive genetic variance. Using real data from a wild population of Swiss barn owls Tyto alba, our model indicates that the missing individuals would display large black spots; and we conclude that genes affecting this trait are already under selection before it is expressed. Our model is a tool to correctly estimate the magnitude of both natural selection and additive genetic variance.
Resumo:
Analysis of TRIM5α and APOBEC3G genes suggests that these two restriction factors underwent strong positive selection throughout primate evolution. This pressure was possibly imposed by ancient exogenous retroviruses, of which endogenous retroviruses are remnants. Our study aims to assess in vitro the activity of these factors against ancient retroviruses by reconstructing their ancestral gag sequences, as well as the ancestral TRIM5α and APOBEC3G for primates. Based on evolutionary genomics approach, we reconstructed ancestors of the two largest families of human endogenous retroviruses (HERV), namely HERV-K and HERV-H, as well as primate ancestral TRIM5α and APOBEC3G variants. The oldest TRIM5α sequence was the catarhinne TRIM5α, common ancestor of Old World monkeys and hominoids, dated from 25 million years ago (mya). From the oldest, to the youngest, ancestral TRIM5α variants showed less restriction of HIV-1 in vitro [1]. Likewise three ancestral APOBEC3Gs sequences common to hominoids (18 mya), Old World monkeys, and catarhinnes (25 mya) were reconstructed. All ancestral APOBEC3G variants inhibited efficiently HIV-1Δvif in vitro, compared to modern APOBEC3Gs. The ability of Vif proteins (HIV-1, HIV-2, SIVmac and SIVagm) to counteract their activity tallied with the residue 128 on ancestral APOBEC3Gs. Moreover we are attempting to reconstruct older ancestral sequences of both restriction factors by using prosimian orthologue sequences. An infectious onemillion- years-old HERV-KCON previously reconstituted was shown to be resistant to modern TRIM5α and APOBEC3G [2]. Our ancestral TRIM5α and APOBEC3G variants were inactive against HERV-KCON. Besides we reconstructed chimeric HERV-K bearing ancestral capsids (up to 7 mya) that resulted in infectious viruses resistant to modern and ancestral TRIM5α. Likewise HERV-K viruses bearing ancestral nucleocapsids will be tested for ancestral and modern APOBEC3G restriction. In silico reconstruction and structural modeling of ancestral HERV-H capsids resulted in structures homologous to that of the gammaretrovirus MLV. Thus we are attempting to construct chimeric MLV virus bearing HERV-H ancestral capsids. These chimeric ancestral HERVs will be tested for infectivity and restriction by ancestral TRIM5α. Similarly chimeric MLV viruses bearing ancestral HERV-H nucleocapsids will be reconstructed and tested for APOBEC3G restriction.
Resumo:
In the forensic examination of DNA mixtures, the question of how to set the total number of contributors (N) presents a topic of ongoing interest. Part of the discussion gravitates around issues of bias, in particular when assessments of the number of contributors are not made prior to considering the genotypic configuration of potential donors. Further complication may stem from the observation that, in some cases, there may be numbers of contributors that are incompatible with the set of alleles seen in the profile of a mixed crime stain, given the genotype of a potential contributor. In such situations, procedures that take a single and fixed number contributors as their output can lead to inferential impasses. Assessing the number of contributors within a probabilistic framework can help avoiding such complication. Using elements of decision theory, this paper analyses two strategies for inference on the number of contributors. One procedure is deterministic and focuses on the minimum number of contributors required to 'explain' an observed set of alleles. The other procedure is probabilistic using Bayes' theorem and provides a probability distribution for a set of numbers of contributors, based on the set of observed alleles as well as their respective rates of occurrence. The discussion concentrates on mixed stains of varying quality (i.e., different numbers of loci for which genotyping information is available). A so-called qualitative interpretation is pursued since quantitative information such as peak area and height data are not taken into account. The competing procedures are compared using a standard scoring rule that penalizes the degree of divergence between a given agreed value for N, that is the number of contributors, and the actual value taken by N. Using only modest assumptions and a discussion with reference to a casework example, this paper reports on analyses using simulation techniques and graphical models (i.e., Bayesian networks) to point out that setting the number of contributors to a mixed crime stain in probabilistic terms is, for the conditions assumed in this study, preferable to a decision policy that uses categoric assumptions about N.
Resumo:
The CD209 gene family that encodes C-type lectins in primates includes CD209 (DC-SIGN), CD209L (L-SIGN) and CD209L2. Understanding the evolution of these genes can help understand the duplication events generating this family, the process leading to the repeated neck region and identify protein domains under selective pressure. We compiled sequences from 14 primates representing 40 million years of evolution and from three non-primate mammal species. Phylogenetic analyses used Bayesian inference, and nucleotide substitutional patterns were assessed by codon-based maximum likelihood. Analyses suggest that CD209 genes emerged from a first duplication event in the common ancestor of anthropoids, yielding CD209L2 and an ancestral CD209 gene, which, in turn, duplicated in the common Old World primate ancestor, giving rise to CD209L and CD209. K(A)/K(S) values averaged over the entire tree were 0.43 (CD209), 0.52 (CD209L) and 0.35 (CD209L2), consistent with overall signatures of purifying selection. We also assessed the Toll-like receptor (TLR) gene family, which shares with CD209 genes a common profile of evolutionary constraint. The general feature of purifying selection of CD209 genes, despite an apparent redundancy (gene absence and gene loss), may reflect the need to faithfully recognize a multiplicity of pathogen motifs, commensals and a number of self-antigens