11 results for Divergence time estimates
at Duke University
Abstract:
Alewife, Alosa pseudoharengus, populations occur in two discrete life-history variants, an anadromous form and a landlocked (freshwater resident) form. Landlocked populations display a consistent pattern of life-history divergence from anadromous populations, including earlier age at maturity, smaller adult body size, and reduced fecundity. In Connecticut (USA), dams constructed on coastal streams separate anadromous spawning runs from lake-resident landlocked populations. Here, we used sequence data from the mtDNA control region and allele frequency data from five microsatellite loci to ask whether coastal Connecticut landlocked alewife populations are independently evolved from anadromous populations or whether they share a common freshwater ancestor. We then used microsatellite data to estimate the timing of the divergence between anadromous and landlocked populations. Finally, we examined anadromous and landlocked populations for divergence in foraging morphology and used divergence time estimates to calculate the rate of evolution for foraging traits. Our results indicate that landlocked populations have evolved multiple times independently. Tests of population divergence and estimates of gene flow show that landlocked populations are genetically isolated, whereas anadromous populations exchange genes. These results support a 'phylogenetic raceme' model of landlocked alewife divergence, with anadromous populations forming an ancestral core from which landlocked populations independently diverged. Divergence time estimates suggest that landlocked populations diverged from a common anadromous ancestor no more than 5000 years ago and perhaps as recently as 300 years ago, depending on the microsatellite mutation rate assumed. Examination of foraging traits reveals landlocked populations to have significantly narrower gapes and smaller gill raker spacings than anadromous populations, suggesting that they are adapted to foraging on smaller prey items. 
Estimates of evolutionary rates (in haldanes) indicate rapid evolution of foraging traits, possibly in response to changes in available resources.
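For reference, a rate in haldanes is conventionally the change in the mean of a ln-transformed trait, expressed in pooled phenotypic standard deviations, per generation. The sketch below illustrates only that arithmetic. The function name, the sample measurements, and the generation count are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, variance


def haldanes(ancestral, derived, generations):
    """Evolutionary rate in haldanes between two samples of a trait.

    ancestral, derived: raw trait measurements (hypothetical values here).
    generations: number of generations separating the two populations,
    e.g. divergence time in years divided by generation time in years.
    """
    ln_a = [math.log(x) for x in ancestral]
    ln_d = [math.log(x) for x in derived]
    n_a, n_d = len(ln_a), len(ln_d)
    # Pooled variance of the ln-transformed measurements.
    pooled_var = ((n_a - 1) * variance(ln_a) + (n_d - 1) * variance(ln_d)) / (n_a + n_d - 2)
    pooled_sd = math.sqrt(pooled_var)
    # Difference in standardized means, per generation.
    return (mean(ln_d) - mean(ln_a)) / pooled_sd / generations


# Hypothetical gape widths (mm) for an anadromous and a landlocked sample.
anadromous = [4.0, 4.2, 4.4, 4.1]
landlocked = [3.6, 3.8, 3.7, 3.9]
rate = haldanes(anadromous, landlocked, generations=100)
```

A negative value here means the trait mean decreased. Because the divergence time enters through the generation count, the younger bound of a divergence estimate yields a proportionally faster rate.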
Abstract:
Mitchell et al. argue that divergence-time estimates for our avian phylogeny were too young because of an "inappropriate" maximum age constraint for the most recent common ancestor of modern birds and that, as a result, most modern bird orders diverged before the Cretaceous-Paleogene mass extinction event 66 million years ago instead of after. However, their interpretations of the fossil record and timetrees are incorrect.
Abstract:
New applications of genetic data to questions of historical biogeography have revolutionized our understanding of how organisms have come to occupy their present distributions. Phylogenetic methods in combination with divergence time estimation can reveal biogeographical centres of origin, differentiate between hypotheses of vicariance and dispersal, and reveal the directionality of dispersal events. Despite their power, however, phylogenetic methods can sometimes yield patterns that are compatible with multiple, equally well-supported biogeographical hypotheses. In such cases, additional approaches must be integrated to differentiate among conflicting dispersal hypotheses. Here, we use a synthetic approach that draws upon the analytical strengths of coalescent and population genetic methods to augment phylogenetic analyses in order to assess the biogeographical history of Madagascar's Triaenops bats (Chiroptera: Hipposideridae). Phylogenetic analyses of mitochondrial DNA sequence data for Malagasy and east African Triaenops reveal a pattern that equally supports two competing hypotheses. While the phylogeny cannot determine whether Africa or Madagascar was the centre of origin for the species investigated, it serves as the essential backbone for the application of coalescent and population genetic methods. From the application of these methods, we conclude that a hypothesis of two independent but unidirectional dispersal events from Africa to Madagascar is best supported by the data.
Abstract:
Illicit trade carries the potential to magnify existing tobacco-related health care costs through increased availability of untaxed and inexpensive cigarettes. What is known with respect to the magnitude of illicit trade for Vietnam is produced primarily by the industry, and methodologies are typically opaque. Independent assessment of the illicit cigarette trade in Vietnam is vital to tobacco control policy. This paper measures the magnitude of illicit cigarette trade for Vietnam between 1998 and 2010 using two methods: discrepancies between legitimate domestic cigarette sales and domestic tobacco consumption estimated from surveys, and trade discrepancies as recorded by Vietnam and trade partners. The results indicate that Vietnam likely experienced net smuggling in during the period studied. With the inclusion of adjustments for survey respondent under-reporting, inward illicit trade likely occurred in three of the four years for which surveys were available. Discrepancies in trade records indicate that the value of smuggled cigarettes into Vietnam ranges from $100 million to $300 million between 2000 and 2010 and that these cigarettes primarily originate in Singapore, Hong Kong, Macao, Malaysia, and Australia. Notable differences in trends over time exist between the two methods, but by comparison, the industry estimates consistently place the magnitude of illicit trade at the upper bounds of what this study shows. This study has limitations. First, the unavailability of annual, survey-based estimates of consumption may obscure the true, annual trend over time. Second, as surveys changed over time, estimates relying on them may be inconsistent with one another. Finally, these two methods measure different components of illicit trade, specifically consumption of illicit cigarettes regardless of origin and smuggling of cigarettes into a particular market. 
However, absent a gold standard, comparisons of different approaches to illicit trade measurement serve efforts to refine and improve measurement approaches and estimates.
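The two measurement approaches described in this abstract reduce to simple differences, sketched below. The function names, the form of the under-reporting adjustment (dividing by one minus an assumed under-reporting rate), and all figures are illustrative assumptions, not the paper's actual procedure or data.

```python
def consumption_gap(survey_consumption, legal_sales, underreport_rate=0.0):
    """Survey-estimated consumption minus legitimate domestic sales.

    A positive result suggests net inward illicit trade.
    underreport_rate is an assumed share of true consumption that
    respondents fail to report (hypothetical adjustment).
    """
    adjusted = survey_consumption / (1.0 - underreport_rate)
    return adjusted - legal_sales


def trade_discrepancy(partner_reported_exports, recorded_imports):
    """Exports that partners report sending minus imports the country records."""
    return partner_reported_exports - recorded_imports


# Hypothetical figures in billions of sticks.
raw_gap = consumption_gap(90.0, 100.0)                              # no adjustment
adjusted_gap = consumption_gap(90.0, 100.0, underreport_rate=0.25)  # with adjustment
```

With these made-up numbers the unadjusted gap is negative and the adjusted gap is positive, which shows why the under-reporting adjustment can change the sign of the estimate.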
Abstract:
PREMISE OF THE STUDY: We investigated the origins of 252 Southern Appalachian woody species representing 158 clades to analyze larger patterns of biogeographic connectivity around the northern hemisphere. We tested biogeographic hypotheses regarding the timing of species disjunctions to eastern Asia and among areas of North America. METHODS: We delimited species into biogeographically informative clades, compiled sister-area data, and generated graphic representations of area connections across clades. We calculated taxon diversity within clades and plotted divergence times. KEY RESULTS: Of the total taxon diversity, 45% were distributed among 25 North American endemic clades. Sister taxa within eastern North America and eastern Asia were proportionally equal in frequency, accounting for over 50% of the sister-area connections. At increasing phylogenetic depth, connections to the Old World dominated. Divergence times for 65 clades with intercontinental disjunctions were continuous, whereas 11 intracontinental disjunctions to western North America and nine to eastern Mexico were temporally congruent. CONCLUSIONS: Over one third of the clades have likely undergone speciation within the region of eastern North America. The biogeographic pattern for the region is asymmetric, consisting of mostly mixed-aged, low-diversity clades connecting to the Old World, and a minority of New World clades. Divergence time data suggest that climate change in the Late Miocene to Early Pliocene generated disjunct patterns within North America. Continuous splitting times during the last 45 million years support the hypothesis that widespread distributions formed repeatedly during favorable periods, with serial cooling trends producing pseudocongruent area disjunctions between eastern North America and eastern Asia.
Abstract:
PREMISE OF THE STUDY: Understanding fern (monilophyte) phylogeny and its evolutionary timescale is critical for broad investigations of the evolution of land plants, and for providing the point of comparison necessary for studying the evolution of the fern sister group, seed plants. Molecular phylogenetic investigations have revolutionized our understanding of fern phylogeny; however, to date, these studies have relied almost exclusively on plastid data. METHODS: Here we take a curated phylogenomics approach to infer the first broad fern phylogeny from multiple nuclear loci, by combining broad taxon sampling (73 ferns and 12 outgroup species) with focused character sampling (25 loci comprising 35877 bp), along with rigorous alignment, orthology inference and model selection. KEY RESULTS: Our phylogeny corroborates some earlier inferences and provides novel insights; in particular, we find strong support for Equisetales as sister to the rest of ferns, Marattiales as sister to leptosporangiate ferns, and Dennstaedtiaceae as sister to the eupolypods. Our divergence-time analyses reveal that divergences among the extant fern orders all occurred prior to ∼200 MYA. Finally, our species-tree inferences are congruent with analyses of concatenated data, but generally with lower support. Those cases where species-tree support values are higher than expected involve relationships that have been supported by smaller plastid datasets, suggesting that deep coalescence may be reducing support from the concatenated nuclear data. CONCLUSIONS: Our study demonstrates the utility of a curated phylogenomics approach to inferring fern phylogeny, and highlights the need to consider underlying data characteristics, along with data quantity, in phylogenetic studies.
Abstract:
Numerical approximation of the long time behavior of a stochastic differential equation (SDE) is considered. Error estimates for time-averaging estimators are obtained and then used to show that the stationary behavior of the numerical method converges to that of the SDE. The error analysis is based on using an associated Poisson equation for the underlying SDE. The main advantages of this approach are its simplicity and universality. It works equally well for a range of explicit and implicit schemes, including those with simple simulation of random variables, and for hypoelliptic SDEs. To simplify the exposition, we consider only the case where the state space of the SDE is a torus, and we study only smooth test functions. However, we anticipate that the approach can be applied more widely. An analogy between our approach and Stein's method is indicated. Some practical implications of the results are discussed. Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.
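As a concrete illustration of a time-averaging estimator, the sketch below runs an explicit Euler-Maruyama scheme on the one-dimensional torus [0, 2π) and averages a smooth test function along the trajectory. The specific SDE (drift -sin x, constant noise level sigma), the step size, and the function names are assumptions chosen for illustration and do not come from the paper. For this gradient-type drift the stationary density is proportional to exp(2 cos(x) / sigma²), so a quadrature reference value is available for comparison.

```python
import math
import random


def time_average(f, drift, sigma, dt, n_steps, x0=0.0, seed=0):
    """Average f along an Euler-Maruyama path of dX = drift(X) dt + sigma dW on the torus."""
    rng = random.Random(seed)
    two_pi = 2.0 * math.pi
    sqrt_dt = math.sqrt(dt)
    x = x0
    total = 0.0
    for _ in range(n_steps):
        total += f(x)
        # One explicit Euler-Maruyama step, wrapped back onto [0, 2*pi).
        x = (x + drift(x) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)) % two_pi
    return total / n_steps


def stationary_mean(f, sigma, n_grid=20000):
    """Quadrature for the stationary mean of f when drift(x) = -sin(x).

    Uses the density proportional to exp(2 cos(x) / sigma**2).
    """
    h = 2.0 * math.pi / n_grid
    num = den = 0.0
    for k in range(n_grid):
        x = k * h
        w = math.exp(2.0 * math.cos(x) / sigma ** 2)
        num += f(x) * w
        den += w
    return num / den


sigma = math.sqrt(2.0)
estimate = time_average(math.cos, lambda x: -math.sin(x), sigma, dt=0.01, n_steps=400000)
reference = stationary_mean(math.cos, sigma)
```

The gap between `estimate` and `reference` combines a statistical error that shrinks as the simulated time grows and a discretization bias that depends on the step size `dt`.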
Abstract:
BACKGROUND: Speciation begins when populations become genetically separated through a substantial reduction in gene flow, and it is at this point that a genetically cohesive set of populations attains the sole property of species: the independent evolution of a population-level lineage. The comprehensive delimitation of species within biodiversity hotspots, regardless of their level of divergence, is important for understanding the factors that drive the diversification of biota and for identifying them as targets for conservation. However, delimiting recently diverged species is challenging due to insufficient time for the differential evolution of characters--including morphological differences, reproductive isolation, and gene tree monophyly--that are typically used as evidence for separately evolving lineages. METHODOLOGY: In this study, we assembled multiple lines of evidence from the analysis of mtDNA and nDNA sequence data for the delimitation of a high diversity of cryptically diverged population-level mouse lemur lineages across the island of Madagascar. Our study uses a multi-faceted approach that applies phylogenetic, population genetic, and genealogical analysis for recognizing lineage diversity and presents the most thoroughly sampled species delimitation of mouse lemurs ever performed. CONCLUSIONS: The resolution of a large number of geographically defined clades in the mtDNA gene tree provides strong initial evidence for recognizing a high diversity of population-level lineages in mouse lemurs. We find additional support for lineage recognition in the striking concordance between mtDNA clades and patterns of nuclear population structure. Lineages identified using these two sources of evidence also exhibit patterns of population divergence according to genealogical exclusivity estimates. 
Mouse lemur lineage diversity is reflected in both a geographically fine-scaled pattern of population divergence within established and geographically widespread taxa, as well as newly resolved patterns of micro-endemism revealed through expanded field sampling into previously poorly and well-sampled regions.
Abstract:
Studies of adaptive divergence have traditionally focused on the ecological causes of trait diversification, while the ecological consequences of phenotypic divergence remain relatively unexplored. Divergence in predator foraging traits, in particular, has the potential to impact the structure and dynamics of ecological communities. To examine the effects of predator trait divergence on prey communities, we exposed zooplankton communities in lake mesocosms to predation from either anadromous or landlocked (freshwater resident) alewives, which have undergone recent and rapid phenotypic differentiation in foraging traits (gape width, gill raker spacing, and prey size-selectivity). Anadromous alewives, which exploit large prey items, significantly reduced the mean body size, total biomass, species richness, and diversity of crustacean zooplankton relative to landlocked alewives, which exploit smaller prey. The zooplankton responses observed in this experiment are consistent with patterns observed in lakes. This study provides direct evidence that phenotypic divergence in predators, even in its early stages, can play a critical role in determining prey community structure.
Abstract:
Molecular data have converged on a consensus about the genus-level phylogeny of extant platyrrhine monkeys, but for most extinct taxa and certainly for those older than the Pleistocene we must rely upon morphological evidence from fossils. This raises the question as to how well anatomical data mirror molecular phylogenies and how best to deal with discrepancies between the molecular and morphological data as we seek to extend our phylogenies to the placement of fossil taxa. Here I present parsimony-based phylogenetic analyses of extant and fossil platyrrhines based on an anatomical dataset of 399 dental characters and osteological features of the cranium and postcranium. I sample 16 extant taxa (one from each platyrrhine genus) and 20 extinct taxa of platyrrhines. The tree structure is constrained with a "molecular scaffold" of extant species as implemented in maximum parsimony using PAUP with the molecular-based 'backbone' approach. The data set encompasses most of the known extinct species of platyrrhines, ranging in age from latest Oligocene (∼26 Ma) to the Recent. The tree is rooted with extant catarrhines, and Late Eocene and Early Oligocene African anthropoids. Among the more interesting patterns to emerge are: (1) known early platyrrhines from the Late Oligocene through Early Miocene (26-16.5 Ma) represent only stem platyrrhine taxa; (2) representatives of the three living platyrrhine families first occur between 15.7 Ma and 13.5 Ma; and (3) recently extinct primates from the Greater Antilles (Cuba, Jamaica, Hispaniola) are sister to the clade of extant platyrrhines and may have diverged in the Early Miocene. It is probable that the crown platyrrhine clade did not originate before about 20-24 Ma, a conclusion consistent with the phylogenetic analysis of fossil taxa presented here and with recent molecular clock estimates. 
The following biogeographic scenario is consistent with the phylogenetic findings and climatic and geologic evidence: Tropical South America has been a center for platyrrhine diversification since platyrrhines arrived on the continent in the middle Cenozoic. Platyrrhines dispersed from tropical South America to Patagonia at ∼25-24 Ma via a "Paraná Portal" through eastern South America across a retreating Paranense Sea. Phylogenetic bracketing suggests Antillean primates arrived via a sweepstakes route or island chain from northern South America in the Early Miocene, not via a proposed land bridge or island chain (GAARlandia) in the Early Oligocene (∼34 Ma). Patagonian and Antillean platyrrhines went extinct without leaving living descendants, the former at the end of the Early Miocene and the latter within the past six thousand years. Molecular evidence suggests crown platyrrhines arrived in Central America by crossing an intermittent connection through the Isthmus of Panama at or after 3.5 Ma. Any more ancient Central American primates, should they be discovered, are unlikely to have given rise to the extant Central American taxa in situ.
Abstract:
Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, even though uncertainty quantification remains relevant in the sciences, where the number of parameters to estimate often exceeds the sample size despite the huge increases in n seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n=all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.
Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.
One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
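To make the reduced-rank factorization concrete, the sketch below builds the probability mass function of a latent class (PARAFAC-type) model: each cell probability is a mixture over latent classes of products of per-variable conditional probabilities. The function name, the argument layout, and the numbers are illustrative assumptions; the collapsed Tucker decompositions proposed in the thesis are not implemented here.

```python
from itertools import product


def latent_class_pmf(weights, class_probs):
    """Joint pmf of categorical variables under a latent class model.

    weights[h]           : probability of latent class h
    class_probs[j][h][c] : P(variable j takes level c | latent class h)
    Returns a dict mapping each cell (c_1, ..., c_p) to its probability.
    """
    levels = [len(var[0]) for var in class_probs]
    pmf = {}
    for cell in product(*(range(d) for d in levels)):
        p = 0.0
        for h, w in enumerate(weights):
            term = w
            for j, c in enumerate(cell):
                term *= class_probs[j][h][c]
            p += term  # sum of rank-one terms, one per latent class
        pmf[cell] = p
    return pmf


# Hypothetical example: two binary variables, two latent classes.
weights = [0.3, 0.7]
class_probs = [
    [[0.9, 0.1], [0.2, 0.8]],  # variable 1: one row per class
    [[0.5, 0.5], [0.1, 0.9]],  # variable 2: one row per class
]
pmf = latent_class_pmf(weights, class_probs)
```

With a single latent class the table reduces to the product of its margins, that is, independence. Adding classes raises the nonnegative rank the tensor can attain.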
Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.
In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
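The basic quantity in this paradigm, the waiting time between exceedances of a high threshold, can be extracted from a time-indexed series as sketched below. This shows only the raw ingredient, not the max-stable velocity process model or the inferential framework of the thesis; the function name and the toy series are hypothetical.

```python
def exceedance_waiting_times(series, threshold):
    """Gaps (in time steps) between successive values that exceed the threshold."""
    times = [t for t, x in enumerate(series) if x > threshold]
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Hypothetical series with exceedances of 4.0 at time indices 1, 4, and 5.
series = [0.0, 5.0, 0.0, 0.0, 6.0, 7.0, 0.0]
waits = exceedance_waiting_times(series, threshold=4.0)
```

Short waiting times indicate extremes that cluster in time, which is the temporal information that windowing or declustering approaches may discard.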
The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo (MCMC), the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
Data augmentation Gibbs samplers are arguably the most popular class of algorithms for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
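The slow mixing described above can be observed in a toy setting. The sketch below implements a truncated-normal data augmentation Gibbs sampler for an intercept-only probit model with a flat prior, applied to data with very few successes. The model simplification, the prior, the sample sizes, and the function names are assumptions for illustration; this is not the thesis's experimental setup or the advertising dataset.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
TINY = 1e-15  # keeps inverse-CDF arguments strictly inside (0, 1)


def truncated_normal(mean, positive, rng):
    """Draw z ~ N(mean, 1) restricted to z > 0 (positive=True) or z <= 0."""
    p_neg = STD_NORMAL.cdf(-mean)  # P(z <= 0) before truncation
    if positive:
        u = p_neg + (1.0 - p_neg) * rng.random()
    else:
        u = p_neg * rng.random()
    u = min(max(u, TINY), 1.0 - TINY)
    return mean + STD_NORMAL.inv_cdf(u)


def probit_gibbs(n, successes, iterations, seed=0):
    """Data augmentation Gibbs draws of the intercept in a probit model (flat prior)."""
    rng = random.Random(seed)
    beta = 0.0
    draws = []
    for _ in range(iterations):
        # Step 1: latent z_i given beta, truncated by the sign implied by y_i.
        z_sum = 0.0
        for i in range(n):
            z_sum += truncated_normal(beta, i < successes, rng)
        # Step 2: beta given z is normal with mean z-bar and variance 1/n.
        beta = rng.gauss(z_sum / n, 1.0 / math.sqrt(n))
        draws.append(beta)
    return draws


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a sequence of draws."""
    m = sum(xs) / len(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den


# Hypothetical rare-event data: 1 success in 500 trials.
draws = probit_gibbs(n=500, successes=1, iterations=200)
rho = lag1_autocorrelation(draws)
```

Each update moves the intercept by roughly the reciprocal of the square root of n, which is small relative to the posterior spread when successes are rare, so successive draws are strongly correlated.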