923 results for Population-structure
Abstract:
Understanding genetic variability and gene flow between populations of scleractinian corals separated by one to several hundred kilometers is crucial as we head into a century of climate change in which population connectivity is a critically important question in management. Genetic methods that directly use molecular variance in the DNA should offer greater precision in detecting differences among individuals and populations than the more traditional allozyme electrophoresis. However, this paper highlights the point that the limited set of DNA markers that have been identified for scleractinian coral genetic studies does not necessarily offer greater precision than that offered by allozymes. In fact, at present allozyme electrophoresis yields greater information than the eight different DNA markers used in this study. Given the relative ease of use of allozymes and the wealth of comparable data sets from numerous previously published studies, allozyme electrophoresis should not be dismissed for population structure and connectivity studies on coral reefs. While continued effort should be placed into searching for new DNA markers, until a more sensitive DNA marker becomes available for scleractinian corals, allozyme electrophoresis remains a powerful and relevant technique for understanding the connectivity of coral populations.
Abstract:
Genotypic diversity in Fusarium pseudograminearum and F. graminearum from Australia and the relationship between diversity and pathogen aggressiveness for head blight and/or crown rot of wheat were examined. Amplified fragment length polymorphism (AFLP) analysis revealed a high level of genotypic diversity within each species. Sixty-three of the 149 AFLP loci were significantly different between the two species and 70 of 72 F. pseudograminearum and 56 of 59 F. graminearum isolates had distinct haplotypes. When head blight and crown rot severity data from a recently published work on isolates representing the entire range of aggressiveness were used, only the genotypic diversity of F. pseudograminearum was significantly associated with its aggressiveness for the two diseases. Cluster analyses clearly demonstrated the polyphyletic structures that exist in both pathogen populations. The spatial diversity within F. graminearum was high within a single field, while frequent gene flow (Nm ≈ 14) and a low fixation index (Gst = 0.03) were recorded among F. pseudograminearum isolates from the adjacent states of New South Wales and Queensland. The differences in population structure between the heterothallic F. pseudograminearum (teleomorph G. coronicola) and the homothallic F. graminearum (teleomorph G. zeae) were not as pronounced as expected given their contrasting mating systems. Neither species was panmictic or strictly clonal. This points to sexual recombination in F. pseudograminearum, suggesting that ascospores of G. coronicola may also play a role in its biology and epidemiology.
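As a rough illustration of what a fixation index measures, the sketch below computes Nei's Gst = (Ht - Hs) / Ht for a single locus from subpopulation allele frequencies, weighting subpopulations equally. It is not the procedure used in the study (AFLP markers are dominant and need their own frequency estimators); the function names and the example frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Nei's gene diversity for one locus: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def g_st(subpop_freqs):
    """Gst = (Ht - Hs) / Ht for one locus, with equal subpopulation weights.

    subpop_freqs: one list of allele frequencies per subpopulation,
    all listing the same alleles in the same order.
    """
    n_pops = len(subpop_freqs)
    n_alleles = len(subpop_freqs[0])
    # Hs: mean within-subpopulation gene diversity
    h_s = sum(expected_heterozygosity(f) for f in subpop_freqs) / n_pops
    # Ht: gene diversity of the pooled (mean) allele frequencies
    mean_freqs = [sum(f[a] for f in subpop_freqs) / n_pops for a in range(n_alleles)]
    h_t = expected_heterozygosity(mean_freqs)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical biallelic locus in two subpopulations (illustrative values only)
example = [[0.6, 0.4], [0.4, 0.6]]
print(round(g_st(example), 4))
```

Values near zero, like the 0.03 reported above, indicate that almost all of the gene diversity lies within subpopulations rather than between them.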
Abstract:
We have isolated 18 polymorphic microsatellite loci for Cophixalus ornatus from genomic libraries enriched for (AAAG)n, (AACC)n and (AAGG)n repetitive elements. The number of alleles ranges from five to 22 per locus, with the observed heterozygosity ranging from 0.10 to 0.92. These markers will be useful for the analysis of population structure in C. ornatus and for testing alternative models of speciation.
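The two per-locus summaries quoted above can be sketched as follows, assuming diploid genotypes recorded as pairs of allele sizes. The function name and the sample genotypes are hypothetical and are not data from the study.

```python
def locus_summary(genotypes):
    """Return (number of distinct alleles, observed heterozygosity) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    """
    alleles = {allele for pair in genotypes for allele in pair}
    n_heterozygotes = sum(1 for a, b in genotypes if a != b)
    return len(alleles), n_heterozygotes / len(genotypes)


# Hypothetical allele sizes (bp) for five individuals at one microsatellite locus
sample = [(180, 184), (180, 180), (184, 192), (188, 188), (180, 192)]
n_alleles, h_obs = locus_summary(sample)
print(n_alleles, h_obs)
```

Observed heterozygosity is simply the fraction of sampled individuals carrying two different alleles at the locus.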
Abstract:
The aim of this study was to identify a set of genetic polymorphisms that efficiently divides methicillin-resistant Staphylococcus aureus (MRSA) strains into groups consistent with the population structure. The rationale was that such polymorphisms could underpin rapid real-time PCR or low-density array-based methods for monitoring MRSA dissemination in a cost-effective manner. Previously, the authors devised a computerized method for identifying sets of single nucleotide polymorphisms (SNPs) with high resolving power that are defined by multilocus sequence typing (MLST) databases, and also developed a real-time PCR method for interrogating a seven-member SNP set for genotyping S. aureus. Here, it is shown that these seven SNPs efficiently resolve the major MRSA lineages and define 27 genotypes. The SNP-based genotypes are consistent with the MRSA population structure as defined by eBURST analysis. The capacity of binary markers to improve resolution was tested using 107 diverse MRSA isolates of Australian origin that encompass nine SNP-based genotypes. The addition of the virulence-associated genes cna, pvl and bbp/sdrE, and the integrated plasmids pT181, pI258 and pUB110, resolved the nine SNP-based genotypes into 21 combinatorial genotypes. Subtyping of the SCCmec locus revealed new SCCmec types and increased the number of combinatorial genotypes to 24. It was concluded that these polymorphisms provide a facile means of assigning MRSA isolates into well-recognized lineages.
Abstract:
We have developed a novel multilocus sequence typing (MLST) scheme and database (http://pubmlst.org/pacnes/) for Propionibacterium acnes based on the analysis of seven core housekeeping genes. The scheme, which was validated against previously described antibody, single locus and random amplification of polymorphic DNA typing methods, displayed excellent resolution and differentiated 123 isolates into 37 sequence types (STs). An overall clonal population structure was detected with six eBURST groups representing the major clades I, II and III, along with two singletons. Two highly successful and global clonal lineages, ST6 (type IA) and ST10 (type IB1), representing 64% of this current MLST isolate collection were identified. The ST6 clone and closely related single locus variants, which comprise a large clonal complex CC6, dominated isolates from patients with acne, and were also significantly associated with ophthalmic infections. Our data therefore support an association between acne and P. acnes strains from the type IA cluster and highlight the role of a widely disseminated clonal genotype in this condition. Characterization of type I cell surface-associated antigens that are not detected in ST10 or strains of type II and III identified two dermatan-sulphate-binding proteins with putative phase/antigenic variation signatures. We propose that the expression of these proteins by type IA organisms contributes to their role in the pathophysiology of acne and helps explain the recurrent nature of the disease. The MLST scheme and database described in this study should provide a valuable platform for future epidemiological and evolutionary studies of P. acnes.
Abstract:
This study investigated how harvest and water management affected the ecology of the Pig Frog, Rana grylio. It also examined how mercury levels in leg muscle tissue vary spatially across the Everglades. Rana grylio is an intermediate link in the Everglades food web. Although common, this inconspicuous species can be affected by three forms of anthropogenic disturbance: harvest, water management and mercury contamination. This frog is harvested both commercially and recreationally for its legs, is aquatic and thus may be susceptible to water management practices, and can transfer mercury throughout the Everglades food web. This two-year study took place in three major regions: Everglades National Park (ENP), Water Conservation Area 3A (A), and Water Conservation Area 3B (B). The study categorized the three sites by their relative harvest level and hydroperiod. During the spring of 2001, areas of the Everglades dried completely. On a regional and local scale Pig Frog abundance was highest in Site A, the heavily harvested site with the longest hydroperiod, followed by ENP and B. More frogs were found along survey transects and in capture-recapture plots before the dry-down than after the dry-down in Sites ENP and B. Individual growth patterns were similar across all sites, suggesting differences in body size may be due to selective harvest. Frogs from Site A, the flooded and harvested site, had no differences in survival rates between adults and juveniles. Site B populations shifted from a juvenile to adult dominated population after the dry-down. Dry-downs appeared to affect survival rates more than harvest. Total mercury in frog leg tissue was highest in protected areas of Everglades National Park, where harvesting is prohibited, with a maximum concentration of 2.3 mg/kg wet mass. Similar spatial patterns in mercury levels were found among Pig Frogs and other wildlife throughout parts of the Everglades. Pig Frogs may be transferring substantial levels of mercury to other wildlife species in ENP. In summary, although abundance and survival were reduced by dry-down, the lack of adult size classes in Site A suggests that harvest also plays a role in regulating population structure.
Abstract:
Paracalanus quasimodo and Temora turbinata are two calanoid copepods prominent in the planktonic communities of the southeastern United States. Despite their prominence, the species and population level structure of these copepods is as yet unexplored. The phylogeographic, temporal and phylogenetic structure of P. quasimodo and T. turbinata are examined in my study. Samples were collected from ten sites along the Gulf of Mexico and Florida peninsular coasts. Three sites were sampled quarterly for two years. Individuals were screened for unique ITS-1 sequences with denaturing gradient gel electrophoresis. Unique variants were sequenced at the nuclear ITS-1 and mitochondrial COI loci. Sampling sites were analyzed for pairwise community differences and for variances between geographic and temporal groupings. Genetic variants were analyzed for phylogenetic and coalescent topology. Paracalanus quasimodo is highly structured geographically with populations divided between the Gulf of Mexico, temperate Atlantic and subtropical Atlantic, in addition to isolation by distance. No significant differences were detected between the T. turbinata samples. Both P. quasimodo and T. turbinata are stable within sites over time and between sites within a sampling period, with two exceptions. The first was a pilot sample from Miami taken two years prior to the general sampling whose community showed significant differences from most of the other Miami samples. Paracalanus quasimodo had a positive correlation of Fst with time. The second was high temporal variability detected in the samples from Fort Pierce. Phylogenetically, both P. quasimodo and T. turbinata were in well supported, congeneric clades. Paracalanus quasimodo was not monophyletic, divided into two well-supported clades. Temora turbinata variants were in one clade with insignificant support for topology within the clade and very little intraspecific variation. Paracalanus quasimodo and T. turbinata populations show opposite trends. Paracalanus quasimodo occurs near shore and shows population structure mediated by hydrological features and distance, both geographic and temporal. The phylogeny shows two deeply divergent clades suggestive of cryptic speciation. In contrast, T. turbinata populations range further offshore and show little geographic or temporal structure. However, the low genetic variation detected in this region suggests a recent bottleneck event.
Abstract:
In fire-dependent forests, managers are interested in predicting the consequences of prescribed burning on postfire tree mortality. We examined the effects of prescribed fire on tree mortality in Florida Keys pine forests, using a factorial design with understory type, season, and year of burn as factors. We also used logistic regression to model the effects of burn season, fire severity, and tree dimensions on individual tree mortality. Despite limited statistical power due to problems in carrying out the full suite of planned experimental burns, associations with tree and fire variables were observed. Post-fire pine tree mortality was negatively correlated with tree size and positively correlated with char height and percent crown scorch. Unlike post-fire mortality, tree mortality associated with storm surge from Hurricane Wilma was greater in the large size classes. Due to their influence on population structure and fuel dynamics, the size-selective mortality patterns following fire and storm surge have practical importance for using fire as a management tool in Florida Keys pinelands in the future, particularly when the threats to their continued existence from tropical storms and sea level rise are expected to increase.
Abstract:
Peer reviewed
Abstract:
Peer reviewed
Abstract:
Funding: This work was supported by the HADEEP projects, funded by the Nippon Foundation, Japan (2009765188), the Natural Environment Research Council, UK (NE/E007171/1) and the Total Foundation, France. We acknowledge additional support from the Marine Alliance for Science and Technology for Scotland (MASTS) funded by the Scottish Funding Council (Ref: HR09011) and contributing institutions. We also acknowledge support from the Leverhulme Trust to SBP. Additional sea time was supported by NIWA’s ‘Impact of Resource Use on Vulnerable Deep-Sea Communities’ project (CO1_0906).
Abstract:
Peer reviewed
Abstract:
Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, yet uncertainty quantification remains relevant in the sciences, where the number of parameters to estimate often exceeds the sample size despite the huge increases in n seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n=all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.
Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.
One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
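To make the reduced-rank factorization concrete, the following minimal sketch builds the probability mass function of a latent class (PARAFAC-type) model, P(y_1, ..., y_p) = sum over classes h of w_h times the product over variables j of lambda[h][j][y_j]. The function name, argument layout and numerical values are assumptions made for illustration and do not come from the thesis.

```python
from itertools import product


def latent_class_pmf(weights, cond_probs):
    """Joint pmf of p categorical variables under a latent class model.

    weights: k mixture weights summing to 1.
    cond_probs[h][j]: category probabilities of variable j within class h.
    Returns a dict mapping each cell (y_1, ..., y_p) to its probability.
    """
    n_vars = len(cond_probs[0])
    levels = [range(len(cond_probs[0][j])) for j in range(n_vars)]
    pmf = {}
    for cell in product(*levels):
        total = 0.0
        for w, class_probs in zip(weights, cond_probs):
            term = w
            # Variables are independent given the latent class
            for j, y in enumerate(cell):
                term *= class_probs[j][y]
            total += term
        pmf[cell] = total
    return pmf


# Hypothetical example: 2 latent classes, 3 binary variables
weights = [0.7, 0.3]
cond_probs = [
    [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4]],  # class 0
    [[0.2, 0.8], [0.3, 0.7], [0.5, 0.5]],  # class 1
]
pmf = latent_class_pmf(weights, cond_probs)
print(len(pmf), round(sum(pmf.values()), 12))
```

Each latent class contributes one rank-one term, so the resulting probability tensor has nonnegative rank at most the number of classes.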
Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.
In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis-Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis-Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo (MCMC). MCMC is the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
Abstract:
Constant technology advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for analyzing biological data. For example, DNA sequence data can be viewed as categorical variables with each nucleotide taking one of four categories. Gene expression data, depending on the quantitative technology, could be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data has become unprecedentedly rich. Therefore efficient statistical approaches are crucial in this big data era.
Previous statistical methods for big data often aim to find low dimensional structures in the observed data. For example, in a factor analysis model a latent Gaussian distributed multivariate vector is assumed. With this assumption a factor model produces a low rank estimation of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, in which the mixture proportions of topics are assumed to be represented by a Dirichlet distributed variable. This dissertation proposes several novel extensions to the previous statistical methods that are developed to address challenges in big data. Those novel methods are applied in multiple real world applications including construction of condition specific gene co-expression networks, estimating shared topics among newsgroups, analysis of promoter sequences, analysis of political-economics risk data and estimating population structure from genotype data.