10 results for Thom’s theorem at Duke University
Abstract:
This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham's-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains. © Institute of Mathematical Statistics, 2010.
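The automatic multiplicity correction can already be seen in the prior, before any data arrive. The sketch below is our own illustration, not code from the paper. It assumes m candidate variables, each included independently with a common probability p, and compares a fixed p = 1/2 with a uniform prior on p that is integrated out (a fully Bayesian treatment). Integrating gives a prior probability of 1/((m + 1)·C(m, k)) for any particular model with k variables, so the prior odds of adding one specific variable to the null model shrink like 1/m, whereas a fixed p = 1/2 keeps those odds at 1 however many variables are tested. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def prior_model_prob_uniform_p(k: int, m: int) -> Fraction:
    """Prior probability of one specific model with k of m variables,
    after integrating p ~ Uniform(0, 1) out of p**k * (1 - p)**(m - k)."""
    # Beta integral: B(k + 1, m - k + 1) = k! (m - k)! / (m + 1)!
    return Fraction(1, (m + 1) * comb(m, k))


def prior_model_prob_fixed_p(k: int, m: int, p: Fraction = Fraction(1, 2)) -> Fraction:
    """Prior probability of one specific model when p is fixed in advance."""
    return p**k * (1 - p) ** (m - k)


def prior_odds_add_one(m: int, model_prob) -> Fraction:
    """Prior odds of the model {one specific variable} versus the null model."""
    return model_prob(1, m) / model_prob(0, m)


if __name__ == "__main__":
    for m in (10, 100, 1000):
        print(m,
              prior_odds_add_one(m, prior_model_prob_uniform_p),
              prior_odds_add_one(m, prior_model_prob_fixed_p))
```

Exact fractions are used so the 1/m pattern is visible without rounding. In this setup an empirical-Bayes analysis would typically plug an estimate of p into `prior_model_prob_fixed_p` instead of integrating over p; the abstract's point is that this plug-in can behave quite differently from the fully Bayesian answer.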
Abstract:
BACKGROUND: Speciation begins when populations become genetically separated through a substantial reduction in gene flow, and it is at this point that a genetically cohesive set of populations attains the sole property of species: the independent evolution of a population-level lineage. The comprehensive delimitation of species within biodiversity hotspots, regardless of their level of divergence, is important for understanding the factors that drive the diversification of biota and for identifying them as targets for conservation. However, delimiting recently diverged species is challenging due to insufficient time for the differential evolution of characters (including morphological differences, reproductive isolation, and gene tree monophyly) that are typically used as evidence for separately evolving lineages. METHODOLOGY: In this study, we assembled multiple lines of evidence from the analysis of mtDNA and nDNA sequence data for the delimitation of a high diversity of cryptically diverged population-level mouse lemur lineages across the island of Madagascar. Our study uses a multi-faceted approach that applies phylogenetic, population genetic, and genealogical analysis for recognizing lineage diversity and presents the most thoroughly sampled species delimitation of mouse lemur ever performed. CONCLUSIONS: The resolution of a large number of geographically defined clades in the mtDNA gene tree provides strong initial evidence for recognizing a high diversity of population-level lineages in mouse lemurs. We find additional support for lineage recognition in the striking concordance between mtDNA clades and patterns of nuclear population structure. Lineages identified using these two sources of evidence also exhibit patterns of population divergence according to genealogical exclusivity estimates.
Mouse lemur lineage diversity is reflected in both a geographically fine-scaled pattern of population divergence within established and geographically widespread taxa, as well as newly resolved patterns of micro-endemism revealed through expanded field sampling into previously poorly and well-sampled regions.
Abstract:
Today, the only surviving wild population of giant tortoises in the Indian Ocean occurs on the island of Aldabra. However, giant tortoises once inhabited islands throughout the western Indian Ocean. Madagascar, Africa, and India have all been suggested as possible sources of colonization for these islands. To address the origin of Indian Ocean tortoises (Dipsochelys, formerly Geochelone gigantea), we sequenced the 12S, 16S, and cyt b genes of the mitochondrial DNA. Our phylogenetic analysis shows Dipsochelys to be embedded within the Malagasy lineage, providing evidence that Indian Ocean giant tortoises are derived from a common Malagasy ancestor. This result points to Madagascar as the source of colonization for western Indian Ocean islands by giant tortoises. Tortoises are known to survive long oceanic voyages by floating with ocean currents, and thus, currents flowing northward towards the Aldabra archipelago from the east coast of Madagascar would have provided means for the colonization of western Indian Ocean islands. Additionally, we found an accelerated rate of sequence evolution in the two Malagasy Pyxis species examined. This finding supports previous theories that shorter generation time and smaller body size are related to an increase in mitochondrial DNA substitution rate in vertebrates.
Abstract:
New applications of genetic data to questions of historical biogeography have revolutionized our understanding of how organisms have come to occupy their present distributions. Phylogenetic methods in combination with divergence time estimation can reveal biogeographical centres of origin, differentiate between hypotheses of vicariance and dispersal, and reveal the directionality of dispersal events. Despite their power, however, phylogenetic methods can sometimes yield patterns that are compatible with multiple, equally well-supported biogeographical hypotheses. In such cases, additional approaches must be integrated to differentiate among conflicting dispersal hypotheses. Here, we use a synthetic approach that draws upon the analytical strengths of coalescent and population genetic methods to augment phylogenetic analyses in order to assess the biogeographical history of Madagascar's Triaenops bats (Chiroptera: Hipposideridae). Phylogenetic analyses of mitochondrial DNA sequence data for Malagasy and east African Triaenops reveal a pattern that equally supports two competing hypotheses. While the phylogeny cannot determine whether Africa or Madagascar was the centre of origin for the species investigated, it serves as the essential backbone for the application of coalescent and population genetic methods. From the application of these methods, we conclude that a hypothesis of two independent but unidirectional dispersal events from Africa to Madagascar is best supported by the data.
Association between DNA damage response and repair genes and risk of invasive serous ovarian cancer.
Abstract:
BACKGROUND: We analyzed the association between 53 genes related to DNA repair and p53-mediated damage response and serous ovarian cancer risk using case-control data from the North Carolina Ovarian Cancer Study (NCOCS), a population-based, case-control study. METHODS/PRINCIPAL FINDINGS: The analysis was restricted to 364 invasive serous ovarian cancer cases and 761 controls of white, non-Hispanic race. The statistical analysis had two stages: a screen using marginal Bayes factors (BFs) for 484 SNPs, and a modeling stage in which we calculated multivariate adjusted posterior probabilities of association for the 77 SNPs that passed the screen. These probabilities were conditional on subject age at diagnosis/interview, batch, a DNA quality metric and the genotypes of other SNPs, and allowed for uncertainty in the genetic parameterizations of the SNPs and in the number of associated SNPs. Six SNPs had Bayes factors greater than 10 in favor of an association with invasive serous ovarian cancer. These included rs5762746 (median per-allele odds ratio (OR) = 0.66; 95% credible interval (CI) = 0.44-1.00) and rs6005835 (median per-allele OR = 0.69; 95% CI = 0.53-0.91) in CHEK2, rs2078486 (median per-allele OR = 1.65; 95% CI = 1.21-2.25) and rs12951053 (median per-allele OR = 1.65; 95% CI = 1.20-2.26) in TP53, rs411697 (median rare-homozygote OR = 0.53; 95% CI = 0.35-0.79) in BACH1, and rs10131 (median rare-homozygote OR not estimable) in LIG4. The six most highly associated SNPs are either predicted to be functionally significant or are in linkage disequilibrium (LD) with such a variant. The variants in TP53 were confirmed to be associated in a large follow-up study. CONCLUSIONS/SIGNIFICANCE: Based on our findings, further follow-up of the DNA repair and response pathways in a larger dataset is warranted to confirm these results.
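A Bayes factor turns prior odds into posterior odds (posterior odds = BF × prior odds), so the same BF can correspond to very different posterior probabilities depending on the prior. The sketch below is a hypothetical illustration of a marginal-BF screen followed by that conversion for one SNP at a time. The threshold, the prior probability of association and the SNP labels are made-up placeholders, not values from the study, and the study's second stage computed multivariate adjusted posterior probabilities rather than this single-SNP conversion.

```python
def posterior_prob(bayes_factor: float, prior_prob: float) -> float:
    """Convert a Bayes factor into a posterior probability of association."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = bayes_factor * prior_odds  # Bayes' rule in odds form
    return posterior_odds / (1.0 + posterior_odds)


def screen(bayes_factors: dict[str, float], threshold: float) -> dict[str, float]:
    """Keep only the SNPs whose marginal BF exceeds a (hypothetical) threshold."""
    return {snp: bf for snp, bf in bayes_factors.items() if bf > threshold}


if __name__ == "__main__":
    # Hypothetical SNP labels and BFs, for illustration only
    marginal_bfs = {"snp_a": 0.4, "snp_b": 3.2, "snp_c": 12.0}
    kept = screen(marginal_bfs, threshold=1.0)
    for snp, bf in kept.items():
        print(snp, bf, round(posterior_prob(bf, prior_prob=0.01), 3))
```

With even prior odds, BF = 10 gives a posterior probability of 10/11; with an assumed 1% prior probability of association, the same BF gives only 10/109, about 0.09. This is why a screening threshold and a reporting threshold need not coincide.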
Abstract:
BACKGROUND: Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is paid to employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. RESULTS: Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), rhinovirus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of performing nonparametric factor analysis on these data, as well as with sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. CONCLUSIONS: By applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.
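The Indian Buffet Process is what lets such models leave the number of factors open: it is a prior over binary matrices with a fixed number of rows (observations) and an unbounded number of columns (latent features or factors). The sketch below draws from the standard one-parameter IBP prior using its "restaurant" construction: customer i takes each existing dish k with probability m_k / i, where m_k counts the earlier customers who took it, and then tries Poisson(α / i) new dishes. It covers only the prior, not the Beta Process factor model or the Gibbs and variational Bayes inference described above; the function names and the value of α are our own choices.

```python
import math
import random


def poisson(lam: float, rng: random.Random) -> int:
    """Poisson draw via Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_ibp(num_customers: int, alpha: float, seed: int = 0) -> list[list[int]]:
    """Draw a binary feature matrix Z from the one-parameter IBP prior."""
    rng = random.Random(seed)
    dish_counts: list[int] = []  # m_k: earlier customers who took dish k
    rows: list[list[int]] = []
    for i in range(1, num_customers + 1):
        # Existing dishes: take dish k with probability m_k / i
        row = [1 if rng.random() < m / i else 0 for m in dish_counts]
        for k, z in enumerate(row):
            dish_counts[k] += z
        # New dishes: Poisson(alpha / i) of them, each taken by customer i
        new = poisson(alpha / i, rng)
        row.extend([1] * new)
        dish_counts.extend([1] * new)
        rows.append(row)
    num_dishes = len(dish_counts)
    # Pad earlier rows with zeros for dishes introduced later
    return [r + [0] * (num_dishes - len(r)) for r in rows]


if __name__ == "__main__":
    Z = sample_ibp(num_customers=10, alpha=2.0, seed=42)
    print("number of features:", len(Z[0]) if Z else 0)
    for row in Z:
        print("".join(str(v) for v in row))
```

Under this prior the expected number of columns is α(1 + 1/2 + ... + 1/N) for N rows, so the number of factors grows slowly with the data and is ultimately set by the posterior rather than fixed in advance.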
Abstract:
Given a probability distribution on an open book (a metric space obtained by gluing a disjoint union of copies of a half-space along their boundary hyperplanes), we define a precise concept of when the Fréchet mean (barycenter) is sticky. This nonclassical phenomenon is quantified by a law of large numbers (LLN) stating that the empirical mean eventually almost surely lies on the (codimension 1 and hence measure 0) spine that is the glued hyperplane, and a central limit theorem (CLT) stating that the limiting distribution is Gaussian and supported on the spine. We also state versions of the LLN and CLT for the cases where the mean is nonsticky (i.e., not lying on the spine) and partly sticky (i.e., on the spine but not sticky). © Institute of Mathematical Statistics, 2013.
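The simplest open book has one-dimensional pages: k half-lines glued at a single point, which plays the role of the spine (a "k-spider"). There the Fréchet mean has a closed form. Fold page j onto the positive half-line and every other page onto the negative one; if the mean of these folded distances is positive for some page j, the Fréchet mean lies on page j at that distance, and otherwise it is the spine point. The sketch below is our own illustration of this special case, not the paper's general construction, and the sampling distributions are arbitrary choices. With three equally likely pages and Exp(1) distances, every folded mean is 1/3 - 2/3 = -1/3 < 0, so for large n the empirical mean sits exactly on the spine: the sticky LLN in miniature.

```python
import random


def spider_frechet_mean(points, num_pages):
    """Empirical Fréchet mean on a spider (k half-lines glued at one point).

    points: list of (page, distance) pairs with distance >= 0.
    Returns (page, distance) if the mean is off the spine, else (None, 0.0).
    """
    n = len(points)
    for j in range(num_pages):
        # Folded mean: +distance on page j, -distance on every other page.
        # Minimizing E[(t - folded)^2] over t >= 0 gives t = max(0, folded mean).
        folded = sum(d if page == j else -d for page, d in points) / n
        if folded > 0:
            # At most one page can have a positive folded mean.
            return (j, folded)
    return (None, 0.0)  # the spine (gluing point)


def sample_spider(n, page_weights, seed=0):
    """Draw n points: page chosen by page_weights, distance ~ Exp(1)."""
    rng = random.Random(seed)
    pages = rng.choices(range(len(page_weights)), weights=page_weights, k=n)
    return [(p, rng.expovariate(1.0)) for p in pages]


if __name__ == "__main__":
    sticky = sample_spider(10_000, [1, 1, 1])
    print(spider_frechet_mean(sticky, 3))     # expected: the spine, (None, 0.0)
    nonsticky = sample_spider(10_000, [8, 1, 1])
    print(spider_frechet_mean(nonsticky, 3))  # expected: page 0, distance near 0.6
```

In the nonsticky case page 0 carries probability 0.8, so its folded mean is 0.8 - 0.2 = 0.6 and the empirical mean leaves the spine; small perturbations of the sticky distribution, by contrast, leave the mean on the spine.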
Abstract:
The time reversal of stochastic diffusion processes is revisited with emphasis on the physical meaning of the time-reversed drift and the noise prescription in the case of multiplicative noise. The local kinematics and mechanics of free diffusion are linked to the hydrodynamic description. These properties also provide an interpretation of the Pope-Ching formula for the steady-state probability density function along with a geometric interpretation of the fluctuation-dissipation relation. Finally, the statistics of the local entropy production rate of diffusion are discussed in the light of local diffusion properties, and a stochastic differential equation for entropy production is obtained using the Girsanov theorem for reversed diffusion. The results are illustrated for the Ornstein-Uhlenbeck process.
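For the Ornstein-Uhlenbeck process the time-reversed drift can be checked directly. With additive noise, dX = b(X) dt + σ dW, the reversed process run forward in time has drift -b(x) + σ² d/dx log p(x), where p is the density (here, the stationary one). For the OU drift b(x) = -θx the stationary density is Gaussian with variance σ²/(2θ), and the formula returns -θx again, i.e. the stationary OU process is reversible. The sketch below is our own numerical check of this, restricted to additive noise (the abstract's treatment of multiplicative-noise prescriptions is not covered); θ, σ and the finite-difference step are arbitrary choices.

```python
import math


def ou_drift(x, theta):
    """Forward OU drift b(x) = -theta * x."""
    return -theta * x


def ou_log_stationary_density(x, theta, sigma):
    """Log of the N(0, sigma^2 / (2 theta)) stationary density."""
    var = sigma**2 / (2.0 * theta)
    return -0.5 * math.log(2.0 * math.pi * var) - x**2 / (2.0 * var)


def reversed_drift(x, theta, sigma, h=1e-5):
    """Time-reversed drift for additive noise: -b(x) + sigma^2 * d/dx log p(x).

    The score d/dx log p(x) is approximated by a central difference.
    """
    score = (ou_log_stationary_density(x + h, theta, sigma)
             - ou_log_stationary_density(x - h, theta, sigma)) / (2.0 * h)
    return -ou_drift(x, theta) + sigma**2 * score


if __name__ == "__main__":
    theta, sigma = 0.7, 1.3
    for x in (-2.0, 0.0, 1.5):
        print(x, ou_drift(x, theta), reversed_drift(x, theta, sigma))
```

The score is computed numerically rather than written as -2θx/σ² so that the same function would work for any stationary density supplied in place of the Gaussian one.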
Abstract:
We recently developed an approach for testing the accuracy of network inference algorithms by applying them to biologically realistic simulations with known network topology. Here, we seek to determine the degree to which the network topology and data sampling regime influence the ability of our Bayesian network inference algorithm, NETWORKINFERENCE, to recover gene regulatory networks. NETWORKINFERENCE performed well at recovering feedback loops and multiple targets of a regulator with small amounts of data, but required more data to recover multiple regulators of a gene. When the same number of data samples was collected at different intervals from the system, the best recovery was produced by sampling intervals long enough that sampling covered the propagation of regulation through the network, but not so long that the intervals missed internal dynamics. These results further elucidate the possibilities and limitations of network inference based on biological data.