6 resultados para Molecular Sequence Data.
em Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Resumo:
A number of experimental methods have been reported for estimating the number of genes in a genome, or the closely related coding density of a genome, defined as the fraction of base pairs in codons. Recently, DNA sequence data representative of the genome as a whole have become available for several organisms, making the problem of estimating coding density amenable to sequence analytic methods. Estimates of coding density for a single genome vary widely, so that methods with characterized error bounds have become increasingly desirable. We present a method to estimate the protein coding density in a corpus of DNA sequence data, in which a ‘coding statistic’ is calculated for a large number of windows of the sequence under study, and the distribution of the statistic is decomposed into two normal distributions, assumed to be the distributions of the coding statistic in the coding and noncoding fractions of the sequence windows. The accuracy of the method is evaluated using known data and application is made to the yeast chromosome III sequence and to C.elegans cosmid sequences. It can also be applied to fragmentary data, for example a collection of short sequences determined in the course of STS mapping.
Resumo:
Several approaches have been developed to estimate both the relative and absolute rates of speciation and extinction within clades based on molecular phylogenetic reconstructions of evolutionary relationships, according to an underlying model of diversification. However, the macroevolutionary models established for eukaryotes have scarcely been used with prokaryotes. We have investigated the rate and pattern of cladogenesis in the genus Aeromonas (γ-Proteobacteria, Proteobacteria, Bacteria) using the sequences of five housekeeping genes and an uncorrelated relaxed-clock approach. To our knowledge, until now this analysis has never been applied to all the species described in a bacterial genus and thus opens up the possibility of establishing models of speciation from sequence data commonly used in phylogenetic studies of prokaryotes. Our results suggest that the genus Aeromonas began to diverge between 248 and 266 million years ago, exhibiting a constant divergence rate through the Phanerozoic, which could be described as a pure birth process.
Resumo:
Several approaches have been developed to estimate both the relative and absolute rates of speciation and extinction within clades based on molecular phylogenetic reconstructions of evolutionary relationships, according to an underlying model of diversification. However, the macroevolutionary models established for eukaryotes have scarcely been used with prokaryotes. We have investigated the rate and pattern of cladogenesis in the genus Aeromonas (γ-Proteobacteria, Proteobacteria, Bacteria) using the sequences of five housekeeping genes and an uncorrelated relaxed-clock approach. To our knowledge, until now this analysis has never been applied to all the species described in a bacterial genus and thus opens up the possibility of establishing models of speciation from sequence data commonly used in phylogenetic studies of prokaryotes. Our results suggest that the genus Aeromonas began to diverge between 248 and 266 million years ago, exhibiting a constant divergence rate through the Phanerozoic, which could be described as a pure birth process.
Resumo:
In recent years, new analytical tools have allowed researchers to extract historical information contained in molecular data, which has fundamentally transformed our understanding of processes ruling biological invasions. However, the use of these new analytical tools has been largely restricted to studies of terrestrial organisms despite the growing recognition that the sea contains ecosystems that are amongst the most heavily affected by biological invasions, and that marine invasion histories are often remarkably complex. Here, we studied the routes of invasion and colonisation histories of an invasive marine invertebrate Microcosmus squamiger (Ascidiacea) using microsatellite loci, mitochondrial DNA sequence data and 11 worldwide populations. Discriminant analysis of principal components, clustering methods and approximate Bayesian computation (ABC) methods showed that the most likely source of the introduced populations was a single admixture event that involved populations from two genetically differentiated ancestral regions - the western and eastern coasts of Australia. The ABC analyses revealed that colonisation of the introduced range of M. squamiger consisted of a series of non-independent introductions along the coastlines of Africa, North America and Europe. Furthermore, we inferred that the sequence of colonisation across continents was in line with historical taxonomic records - first the Mediterranean Sea and South Africa from an unsampled ancestral population, followed by sequential introductions in California and, more recently, the NE Atlantic Ocean. We revealed the most likely invasion history for world populations of M. squamiger, which is broadly characterized by the presence of multiple ancestral sources and non-independent introductions within the introduced range. The results presented here illustrate the complexity of marine invasion routes and identify a cause-effect relationship between human-mediated transport and the success of widespread marine non-indigenous species, which benefit from stepping-stone invasions and admixture processes involving different sources for the spread and expansion of their range.
Resumo:
Background: Single Nucleotide Polymorphisms, among other type of sequence variants, constitute key elements in genetic epidemiology and pharmacogenomics. While sequence data about genetic variation is found at databases such as dbSNP, clues about the functional and phenotypic consequences of the variations are generally found in biomedical literature. The identification of the relevant documents and the extraction of the information from them are hampered by the large size of literature databases and the lack of widely accepted standard notation for biomedical entities. Thus, automatic systems for the identification of citations of allelic variants of genes in biomedical texts are required. Results: Our group has previously reported the development of OSIRIS, a system aimed at the retrieval of literature about allelic variants of genes http://ibi.imim.es/osirisform.html. Here we describe the development of a new version of OSIRIS (OSIRISv1.2, http://ibi.imim.es/OSIRISv1.2.html webcite) which incorporates a new entity recognition module and is built on top of a local mirror of the MEDLINE collection and HgenetInfoDB: a database that collects data on human gene sequence variations. The new entity recognition module is based on a pattern-based search algorithm for the identification of variation terms in the texts and their mapping to dbSNP identifiers. The performance of OSIRISv1.2 was evaluated on a manually annotated corpus, resulting in 99% precision, 82% recall, and an F-score of 0.89. As an example, the application of the system for collecting literature citations for the allelic variants of genes related to the diseases intracranial aneurysm and breast cancer is presented. Conclusion: OSIRISv1.2 can be used to link literature references to dbSNP database entries with high accuracy, and therefore is suitable for collecting current knowledge on gene sequence variations and supporting the functional annotation of variation databases. The application of OSIRISv1.2 in combination with controlled vocabularies like MeSH provides a way to identify associations of biomedical interest, such as those that relate SNPs with diseases.
Resumo:
DnaSP, DNA Sequence Polymorphism, is a software package for the analysis of nucleotide polymorphism from aligned DNA sequence data. DnaSP can estimate several measures of DNA sequence variation within and between populations (in noncoding, synonymous or nonsynonymous sites, or in various sorts of codon positions), as well as linkage disequilibrium, recombination, gene flow and gene conversion parameters. DnaSP can also carry out several tests of neutrality: Hudson, Kreitman and Aguadé (1987), Tajima (1989), McDonald and Kreitman (1991), Fu and Li (1993), and Fu (1997) tests. Additionally, DnaSP can estimate the confidence intervals of some test-statistics by the coalescent. The results of the analyses are displayed on tabular and graphic form.