894 results for SEQUENCE DATABASES
Abstract:
Selenoproteins are a diverse group of proteins usually misidentified and misannotated in sequence databases. The presence of an in-frame UGA (stop) codon in the coding sequence of selenoprotein genes precludes their identification and correct annotation. The in-frame UGA codons are recoded to cotranslationally incorporate selenocysteine, a rare selenium-containing amino acid. The development of ad hoc experimental and, more recently, computational approaches has allowed the efficient identification and characterization of the selenoproteomes of a growing number of species. Today, dozens of selenoprotein families have been described and more are being discovered in recently sequenced species, but the correct genomic annotation is not available for the majority of these genes. SelenoDB is a long-term project that aims to provide, through the collaborative effort of experimental and computational researchers, automatic and manually curated annotations of selenoprotein genes, proteins and SECIS elements. Version 1.0 of the database includes an initial set of eukaryotic genomic annotations, with special emphasis on the human selenoproteome, for immediate inspection by selenium researchers or incorporation into more general databases. SelenoDB is freely available at http://www.selenodb.org.
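To make the annotation problem concrete, the sketch below lists in-frame TGA (UGA) codons that occur before the final codon of a coding sequence. These are the positions a standard gene finder would read as premature stops. The function name and the toy sequences are hypothetical, and this is not the SelenoDB pipeline: a real selenoprotein call would also need evidence such as a SECIS element.

```python
def in_frame_tga_positions(cds: str) -> list[int]:
    """Return 0-based offsets of in-frame TGA codons, excluding the last codon.

    Assumes `cds` starts in frame (first base is codon position 1).
    RNA input is accepted: U is treated as T.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3      # ignore a trailing partial codon
    last_codon_start = usable - 3         # the terminal codon is a normal stop
    return [
        i for i in range(0, last_codon_start, 3)
        if seq[i:i + 3] == "TGA"
    ]


if __name__ == "__main__":
    # Toy sequence: ATG GCC TGA GGC TAA
    print(in_frame_tga_positions("ATGGCCTGAGGCTAA"))
```

An out-of-frame TGA (for example inside ATG ATG AAA) is deliberately not reported, because only in-frame codons are candidates for recoding.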
Abstract:
A number of experimental methods have been reported for estimating the number of genes in a genome, or the closely related coding density of a genome, defined as the fraction of base pairs in codons. Recently, DNA sequence data representative of the genome as a whole have become available for several organisms, making the problem of estimating coding density amenable to sequence analytic methods. Estimates of coding density for a single genome vary widely, so that methods with characterized error bounds have become increasingly desirable. We present a method to estimate the protein coding density in a corpus of DNA sequence data, in which a ‘coding statistic’ is calculated for a large number of windows of the sequence under study, and the distribution of the statistic is decomposed into two normal distributions, assumed to be the distributions of the coding statistic in the coding and noncoding fractions of the sequence windows. The accuracy of the method is evaluated using known data and application is made to the yeast chromosome III sequence and to C.elegans cosmid sequences. It can also be applied to fragmentary data, for example a collection of short sequences determined in the course of STS mapping.
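The decomposition step described above can be sketched as a two-component normal mixture fitted by expectation-maximisation (EM). EM is an assumption here: the abstract does not say which fitting procedure the authors used. The sketch also assumes that coding windows have the higher mean statistic, and it uses simulated values in place of a real coding statistic.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_normals(values, iterations=100):
    """Fit a two-component 1-D normal mixture by EM.

    Returns (weight_high, (mean_low, sd_low), (mean_high, sd_high)).
    weight_high is the estimated fraction of windows in the high component.
    """
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    # Crude start: means of the lower and upper halves, shared overall sd.
    m0 = sum(xs[:half]) / half
    m1 = sum(xs[half:]) / (n - half)
    grand = sum(xs) / n
    s0 = s1 = max(math.sqrt(sum((x - grand) ** 2 for x in xs) / n), 1e-6)
    w1 = 0.5
    for _ in range(iterations):
        # E-step: probability that each window belongs to the high component.
        resp = []
        for x in xs:
            p0 = (1.0 - w1) * normal_pdf(x, m0, s0)
            p1 = w1 * normal_pdf(x, m1, s1)
            total = p0 + p1
            resp.append(p1 / total if total > 0.0 else 0.5)
        # M-step: re-estimate weight, means and standard deviations.
        n1 = max(sum(resp), 1e-12)
        n0 = max(n - n1, 1e-12)
        m0 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / n0
        m1 = sum(r * x for r, x in zip(resp, xs)) / n1
        s0 = max(math.sqrt(
            sum((1.0 - r) * (x - m0) ** 2 for r, x in zip(resp, xs)) / n0), 1e-6)
        s1 = max(math.sqrt(
            sum(r * (x - m1) ** 2 for r, x in zip(resp, xs)) / n1), 1e-6)
        w1 = n1 / n
    return w1, (m0, s0), (m1, s1)


def simulate_statistics(n_noncoding, n_coding, seed=0):
    """Hypothetical window statistics: noncoding ~ N(0, 1), coding ~ N(5, 1)."""
    rng = random.Random(seed)
    return ([rng.gauss(0.0, 1.0) for _ in range(n_noncoding)]
            + [rng.gauss(5.0, 1.0) for _ in range(n_coding)])


if __name__ == "__main__":
    data = simulate_statistics(700, 300)
    weight, low, high = fit_two_normals(data)
    print(f"estimated coding fraction of windows: {weight:.3f}")
```

Under these assumptions the mixing weight of the high component serves as the estimate of the fraction of coding windows. How well the two components separate depends entirely on the coding statistic chosen.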
Abstract:
Background: Single nucleotide polymorphisms (SNPs) are the most frequent type of sequence variation between individuals, and represent a promising tool for finding genetic determinants of complex diseases and understanding the differences in drug response. In this regard, it is of particular interest to study the effect of non-synonymous SNPs in the context of biological networks such as cell signalling pathways. UniProt provides curated information about the functional and phenotypic effects of sequence variation, including SNPs, as well as on mutations of protein sequences. However, no strategy has been developed to integrate this information with biological networks, with the ultimate goal of studying the impact of the functional effect of SNPs in the structure and dynamics of biological networks. Results: First, we identified the different challenges posed by the integration of the phenotypic effect of sequence variants and mutations with biological networks. Second, we developed a strategy for the combination of data extracted from public resources, such as UniProt, NCBI dbSNP, Reactome and BioModels. We generated attribute files containing phenotypic and genotypic annotations to the nodes of biological networks, which can be imported into network visualization tools such as Cytoscape. These resources allow the mapping and visualization of mutations and natural variations of human proteins and their phenotypic effect on biological networks (e.g. signalling pathways, protein-protein interaction networks, dynamic models). Finally, an example on the use of the sequence variation data in the dynamics of a network model is presented. Conclusion: In this paper we present a general strategy for the integration of pathway and sequence variation data for visualization, analysis and modelling purposes, including the study of the functional impact of protein sequence variations on the dynamics of signalling pathways. 
This is of particular interest when the SNP or mutation is known to be associated with disease. We expect that this approach will help in the study of the functional impact of disease-associated SNPs on the behaviour of cell signalling pathways, which ultimately will lead to a better understanding of the mechanisms underlying complex diseases.
Abstract:
Background: A number of studies have used protein interaction data alone for protein function prediction. Here, we introduce a computational approach for annotation of enzymes, based on the observation that similar protein sequences are more likely to perform the same function if they share similar interacting partners. Results: The method has been tested against the PSI-BLAST program using a set of 3,890 protein sequences for which interaction data were available. For protein sequences that align with at least 40% sequence identity to a known enzyme, the specificity of our method in predicting the first three EC digits increased from 80% to 90% at 80% coverage when compared to PSI-BLAST. Conclusion: Our method can also be used in proteins for which homologous sequences with known interacting partners can be detected. Thus, our method could increase by 10% the specificity of genome-wide enzyme predictions based on sequence matching by PSI-BLAST alone.
Abstract:
Shrews of the genus Sorex are characterized by a Holarctic distribution, and relationships among extant taxa have never been fully resolved. Phylogenies have been proposed based on morphological, karyological, and biochemical comparisons, but these analyses often produced controversial and contradictory results. Phylogenetic analyses of partial mitochondrial cytochrome b gene sequences (1011 bp) were used to examine the relationships among 27 Sorex species. The molecular data suggest that Sorex comprises two major monophyletic lineages, one restricted mostly to the New World and one with a primarily Palearctic distribution. Furthermore, several sister-species relationships are revealed by the analysis. Based on the split between the Soricinae and Crocidurinae subfamilies, we used a 95% confidence interval for both the calibration of a molecular clock and the subsequent calculation of major diversification events within the genus Sorex. Our analysis does not support an unambiguous acceleration of the molecular clock in shrews, the estimated rate being similar to other estimates of mammalian mitochondrial clocks. In addition, the data presented here indicate that estimates from the fossil record greatly underestimate divergence dates among Sorex taxa.
Abstract:
The bacterial insertion sequence IS21 shares with many insertion sequences a two-step, reactive junction transposition pathway, for which a model is presented in this review: a reactive junction with abutted inverted repeats is first formed and subsequently integrated into the target DNA. The reactive junction occurs in IS21-IS21 tandems and IS21 minicircles. In addition, IS21 shows a unique specialization of transposition functions. By alternative translation initiation, the transposase gene codes for two products: the transposase, capable of promoting both steps of the reactive junction pathway, and the cointegrase, which only promotes the integration of reactive junctions but with higher efficiency. This review also includes a survey of the IS21 family and speculates on the possibility that other members present a similar transpositional specialization.
Abstract:
The Mountain Research Initiative invited Dr Eva Spehn, Director of the Global Mountain Biodiversity Assessment (GMBA), and Dr Antoine Guisan, head of the Spatial Ecology Group at the University of Lausanne, to introduce the reader to their coordinated efforts to advance understanding and prediction of mountain biodiversity. Antoine Guisan's EUROMONT project is one of the many scientific projects that may potentially provide data for the new GMBA initiative for a GIS mountain biodiversity database.
Abstract:
Epidemiological processes leave a fingerprint in the pattern of genetic structure of virus populations. Here, we provide a new method to infer epidemiological parameters directly from viral sequence data. The method is based on phylogenetic analysis using a birth-death model (BDM) rather than the commonly used coalescent as the model for the epidemiological transmission of the pathogen. Using the BDM has the advantage that transmission and death rates are estimated independently and therefore enables for the first time the estimation of the basic reproductive number of the pathogen using only sequence data, without further assumptions like the average duration of infection. We apply the method to genetic data of the HIV-1 epidemic in Switzerland.
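As a small worked illustration of why independent rate estimates are enough: in a birth-death model the basic reproductive number is commonly taken as the ratio of the transmission (birth) rate to the rate of becoming noninfectious (death). The function name and the numeric rates below are hypothetical, not estimates from the Swiss HIV-1 data, and the sketch ignores any sampling process a full model may include.

```python
def basic_reproductive_number(transmission_rate: float, death_rate: float) -> float:
    """R0 as transmission rate divided by the rate of becoming noninfectious.

    Both rates must share the same time unit (e.g. per year).
    """
    if transmission_rate < 0 or death_rate <= 0:
        raise ValueError("rates must be non-negative, with death_rate > 0")
    return transmission_rate / death_rate


if __name__ == "__main__":
    # Hypothetical rates per year, for illustration only.
    print(basic_reproductive_number(0.5, 0.25))
```

Because both rates enter the ratio directly, no separate assumption about the average duration of infection is needed once they are estimated.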
Abstract:
In contrast with mammals and birds, most poikilothermic vertebrates feature structurally undifferentiated sex chromosomes, which may result either from frequent turnovers, or from occasional events of XY recombination. The latter mechanism was recently suggested to be responsible for sex-chromosome homomorphy in European tree frogs (Hyla arborea). However, no single case of male recombination has been identified in large-scale laboratory crosses, and populations from NW Europe consistently display sex-specific allelic frequencies with male-diagnostic alleles, suggesting the absence of recombination in their recent history. To address this apparent paradox, we extended the phylogeographic scope of investigations, by analyzing the sequences of three sex-linked markers throughout the whole species distribution. Refugial populations (southern Balkans and Adriatic coast) show a mix of X and Y alleles in haplotypic networks, and no more within-individual pairwise nucleotide differences in males than in females, testifying to recurrent XY recombination. In contrast, populations of NW Europe, which originated from a recent postglacial expansion, show a clear pattern of XY differentiation; the X and Y gametologs of the sex-linked gene Med15 present different alleles, likely fixed by drift on the front wave of expansions, and kept differentiated since. Our results support the view that sex-chromosome homomorphy in H. arborea is maintained by occasional or historical events of recombination; whether the frequency of these events indeed differs between populations remains to be clarified.
Abstract:
Microtubule plus-end-tracking proteins (+TIPs) specifically localize to the growing plus-ends of microtubules to regulate microtubule dynamics and functions. A large group of +TIPs contain a short linear motif, SXIP, which is essential for them to bind to end-binding proteins (EBs) and target microtubule ends. The SXIP sequence site thus acts as a widespread microtubule tip localization signal (MtLS). Here we have analyzed the sequence-function relationship of a canonical MtLS. Using synthetic peptide arrays on membrane supports, we identified the residue preferences at each amino acid position of the SXIP motif and its surrounding sequence with respect to EB binding. We further developed an assay based on fluorescence polarization to assess the mechanism of the EB-SXIP interaction and to correlate EB binding and microtubule tip tracking of MtLS sequences from different +TIPs. Finally, we investigated the role of phosphorylation in regulating the EB-SXIP interaction. Together, our results define the sequence determinants of a canonical MtLS and provide the experimental data for bioinformatics approaches to carry out genome-wide predictions of novel +TIPs in multiple organisms.
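A first-pass genome-wide screen of the kind mentioned above could start from a plain pattern match for the SXIP motif, where X is any residue. The sketch below does only that. The function name and the toy peptide are hypothetical, and because the abstract reports that surrounding residues also influence EB binding, a bare match should be read as a candidate, not a prediction.

```python
import re

# S, any single residue, then I and P (one-letter amino acid codes).
SXIP_PATTERN = re.compile(r"S[A-Z]IP")


def find_sxip(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched motif) for each SXIP occurrence."""
    return [
        (match.start() + 1, match.group(0))
        for match in SXIP_PATTERN.finditer(protein.upper())
    ]


if __name__ == "__main__":
    # Toy peptide, not a real +TIP sequence.
    for position, motif in find_sxip("MAASKIPTTSRIPQ"):
        print(position, motif)
```

Positions are reported 1-based to match the usual residue numbering in protein annotations.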
Abstract:
In coronary magnetic resonance angiography, a magnetization-preparation scheme for T2-weighting (T2 Prep) is widely used to enhance contrast between the coronary blood-pool and the myocardium. This prepulse is commonly applied without spatial selection to minimize flow sensitivity, but the nonselective implementation results in a reduced magnetization of the in-flowing blood and a related penalty in signal-to-noise ratio. It is hypothesized that a spatially selective T2 Prep would leave the magnetization of blood outside the T2 Prep volume unaffected and thereby lower the signal-to-noise ratio penalty. To test this hypothesis, a spatially selective T2 Prep was implemented where the user could freely adjust angulation and position of the T2 Prep slab to avoid covering the ventricular blood-pool and saturating the in-flowing spins. A time gap of 150 ms was further added between the T2 Prep and other prepulses to allow for in-flow of a larger volume of unsaturated spins. Consistent with numerical simulation, the spatially selective T2 Prep increased in vivo human coronary artery signal-to-noise ratio (42.3 ± 2.9 vs. 31.4 ± 2.2, n = 22, P < 0.0001) and contrast-to-noise ratio (18.6 ± 1.5 vs. 13.9 ± 1.2, P = 0.009) as compared to those of the nonselective T2 Prep. Additionally, a segmental analysis demonstrated that the spatially selective T2 Prep was most beneficial in proximal and mid segments where the in-flowing blood volume was largest compared to the distal segments. Magn Reson Med, 2013. © 2012 Wiley Periodicals, Inc.
Abstract:
The InterPro database (http://www.ebi.ac.uk/interpro/) is a freely available resource that can be used to classify sequences into protein families and to predict the presence of important domains and sites. Central to the InterPro database are predictive models, known as signatures, from a range of different protein family databases that have different biological focuses and use different methodological approaches to classify protein families and domains. InterPro integrates these signatures, capitalizing on the respective strengths of the individual databases, to produce a powerful protein classification resource. Here, we report on the status of InterPro as it enters its 15th year of operation, and give an overview of new developments with the database and its associated Web interfaces and software. In particular, the new domain architecture search tool is described and the process of mapping of Gene Ontology terms to InterPro is outlined. We also discuss the challenges faced by the resource given the explosive growth in sequence data in recent years. InterPro (version 48.0) contains 36 766 member database signatures integrated into 26 238 InterPro entries, an increase of over 3993 entries (5081 signatures), since 2012.
Abstract:
The large spatial inhomogeneity in the transmit B1 field (B1+) observable in human MR images at high static magnetic fields (B0) severely impairs image quality. To overcome this effect in brain T1-weighted images, the MPRAGE sequence was modified to generate two different images at different inversion times, MP2RAGE. By combining the two images in a novel fashion, it was possible to create T1-weighted images where the resulting image was free of proton density contrast, T2 contrast, reception bias field, and, to first order, transmit field inhomogeneity. MP2RAGE sequence parameters were optimized using Bloch equations to maximize contrast-to-noise ratio per unit of time between brain tissues and minimize the effect of B1+ variations through space. Images of high anatomical quality and excellent brain tissue differentiation suitable for applications such as segmentation and voxel-based morphometry were obtained at 3 and 7 T. From such T1-weighted images, acquired within 12 min, high-resolution 3D T1 maps were routinely calculated at 7 T with sub-millimeter voxel resolution (0.65-0.85 mm isotropic). T1 maps were validated in phantom experiments. In humans, the T1 values obtained at 7 T were 1.15 ± 0.06 s for white matter (WM) and 1.92 ± 0.16 s for grey matter (GM), in good agreement with literature values obtained at lower spatial resolution. At 3 T, where whole-brain acquisitions with 1 mm isotropic voxels were acquired in 8 min, the T1 values obtained (0.81 ± 0.03 s for WM and 1.35 ± 0.05 s for GM) were once again found to be in very good agreement with values in the literature.
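The abstract does not spell out how the two images are combined. The sketch below uses the ratio usually given for MP2RAGE in the literature: the real part of the product of the first inversion image (complex conjugated) with the second, divided by the sum of their squared magnitudes. The function name and the voxel values are hypothetical. The example shows why a common multiplicative factor, such as a reception bias field, cancels out.

```python
def mp2rage_combine(inv1: complex, inv2: complex) -> float:
    """Combine two complex voxel signals into a bounded ratio in [-0.5, 0.5].

    inv1, inv2: signals at the first and second inversion times.
    Returns 0.0 where both signals are zero (e.g. background).
    """
    denominator = abs(inv1) ** 2 + abs(inv2) ** 2
    if denominator == 0.0:
        return 0.0
    return (inv1.conjugate() * inv2).real / denominator


if __name__ == "__main__":
    voxel = (2 + 0j, 1 + 0j)             # hypothetical signals
    biased = (3 * voxel[0], 3 * voxel[1])  # same voxel under a 3x receive bias
    print(mp2rage_combine(*voxel), mp2rage_combine(*biased))
```

Scaling both inputs by the same factor multiplies numerator and denominator equally, so the two printed values agree.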