891 results for sequence database
Abstract:
Background: It has been suggested that chromosomal rearrangements harbor the molecular footprint of the biological phenomena which they induce, in the form, for instance, of changes in the sequence divergence rates of linked genes. So far, all the studies of these potential associations have focused on the relationship between structural changes and the rates of evolution of single-copy DNA and have tried to exclude segmental duplications (SDs). This is paradoxical, since SDs are one of the primary forces driving the evolution of structure and function in our genomes and have been linked not only with novel genes acquiring new functions, but also with overall higher DNA sequence divergence and major chromosomal rearrangements. Results: Here we take the opposite view and focus on SDs. We analyze several of the features of SDs, including the rates of intraspecific divergence between paralogous copies of human SDs and of interspecific divergence between human SDs and chimpanzee DNA. We study how divergence measures relate to chromosomal rearrangements, while considering other factors that affect evolutionary rates in single-copy DNA. Conclusion: We find that interspecific SD divergence behaves similarly to divergence of single-copy DNA. In contrast, old and recent paralogous copies of SDs do present different patterns of intraspecific divergence. Also, we show that some relatively recent SDs accumulate in regions that carry inversions in sister lineages.
Abstract:
TWEAK (TNF homologue with weak apoptosis-inducing activity) and Fn14 (fibroblast growth factor-inducible protein 14) are members of the tumor necrosis factor (TNF) ligand and receptor super-families. Having observed that Xenopus Fn14 cross-reacts with human TWEAK, despite its relatively low sequence homology to human Fn14, we examined the conservation in tertiary fold and binding interfaces between the two species. Our results, combining NMR solution structure determination, binding assays, extensive site-directed mutagenesis and molecular modeling, reveal that, in addition to the known and previously characterized β-hairpin motif, the helix-loop-helix motif makes an essential contribution to the receptor/ligand binding interface. We further discuss the insight provided by the structural analyses regarding how the cysteine-rich domains of the TNF receptor super-family may have evolved over time. DATABASE: Structural data are available under the accession codes 2KMZ, 2KN0 and 2KN1 (Protein Data Bank) and 17237, 17247 and 17252 (BioMagResBank). STRUCTURED DIGITAL ABSTRACT: TWEAK binds to hFn14 by surface plasmon resonance; xeFn14 binds to TWEAK by enzyme-linked immunosorbent assay; TWEAK binds to xeFn14 by surface plasmon resonance; hFn14 binds to TWEAK by enzyme-linked immunosorbent assay.
Abstract:
The Quaternary Active Faults Database of Iberia (QAFI) is an initiative led by the Institute of Geology and Mines of Spain (IGME) to build a public repository of scientific data on faults with documented activity during the last 2.59 Ma (Quaternary). QAFI also addresses a need to transfer geologic knowledge to practitioners of seismic hazard and risk in Iberia by identifying and characterizing seismogenic fault-sources. QAFI is populated with information freely provided by more than 40 Earth science researchers and to date stores a total of 262 records. In this article we describe the development and evolution of the database, as well as its internal architecture. Additionally, a first global analysis of the data is provided, with a special focus on the fault parameters length and slip rate. Finally, the completeness of the database and the internal consistency of the data are discussed. Even though QAFI v.2.0 is the most current resource for calculating fault-related seismic hazard in Iberia, the database is still incomplete and requires further review.
Abstract:
Many eukaryote organisms are polyploid. However, despite their importance, evolutionary inference of polyploid origins and modes of inheritance has been limited by a need for analyses of allele segregation at multiple loci using crosses. The increasing availability of sequence data for nonmodel species now allows the application of established approaches for the analysis of genomic data in polyploids. Here, we ask whether approximate Bayesian computation (ABC), applied to realistic traditional and next-generation sequence data, allows correct inference of the evolutionary and demographic history of polyploids. Using simulations, we evaluate the robustness of evolutionary inference by ABC for tetraploid species as a function of the number of individuals and loci sampled, and the presence or absence of an outgroup. We find that ABC adequately retrieves the recent evolutionary history of polyploid species on the basis of both old and new sequencing technologies. The application of ABC to sequence data from diploid and polyploid species of the plant genus Capsella confirms its utility. Our analysis strongly supports an allopolyploid origin of C. bursa-pastoris about 80 000 years ago. This conclusion runs contrary to previous findings based on the same data set but using an alternative approach and is in agreement with recent findings based on whole-genome sequencing. Our results indicate that ABC is a promising and powerful method for revealing the evolution of polyploid species, without the need to attribute alleles to a homeologous chromosome pair. The approach can readily be extended to more complex scenarios involving higher ploidy levels.
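To make the idea of approximate Bayesian computation concrete, here is a minimal rejection-sampling sketch. It is not the authors' polyploid model: the toy simulator (a count of polymorphic loci governed by a single probability `theta`), the uniform prior, the tolerance and all function names are assumptions chosen only to show the draw, simulate, compare, accept loop that ABC relies on.

```python
import random


def abc_rejection(observed, simulate, prior_sample, distance,
                  tolerance, n_draws, seed=0):
    """Return the prior draws whose simulated summary statistic lies
    within `tolerance` of the observed summary statistic."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)          # 1. draw a parameter from the prior
        simulated = simulate(theta, rng)   # 2. simulate data under that parameter
        if distance(simulated, observed) <= tolerance:
            accepted.append(theta)         # 3. keep draws that reproduce the data
    return accepted


# Hypothetical toy model: each of 50 loci is polymorphic with probability theta.
def simulate_polymorphic_loci(theta, rng, n_loci=50):
    return sum(rng.random() < theta for _ in range(n_loci))


def uniform_prior(rng):
    return rng.random()


def abs_distance(a, b):
    return abs(a - b)


posterior = abc_rejection(
    observed=15,
    simulate=simulate_polymorphic_loci,
    prior_sample=uniform_prior,
    distance=abs_distance,
    tolerance=1,
    n_draws=20000,
)
posterior_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} draws, posterior mean = {posterior_mean:.3f}")
```

The accepted draws approximate the posterior distribution of `theta`. In a real application the simulator would be a coalescent or demographic model and the summary statistics would be chosen to be informative about the competing scenarios.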
Abstract:
The objective of this work was to standardize a semiautomated method for genotyping soybean, based on universal tail sequence primers (UTSP), and to compare it with the conventional genotyping method that uses electrophoresis in polyacrylamide gels. Thirty soybean cultivars were genotypically characterized by both methods, using 13 microsatellite loci. For the UTSP method, the number of alleles (NA) was 50 (2-7 per marker) and the polymorphic information content (PIC) ranged from 0.40 to 0.74. For the conventional method, the NA was 38 (2-5 per marker) and the PIC varied from 0.39 to 0.67. The genetic dissimilarity matrices obtained by the two methods were highly correlated with each other (0.8026), and the formed groups were coherent with the phenotypic data used for varietal registration. The 13 markers allowed the distinction of all analyzed cultivars. The low cost of the UTSP method, associated with its high accuracy, makes it ideal for the characterization of soybean cultivars and for the determination of genetic purity.
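The abstract reports PIC values without stating how they were computed. The sketch below assumes the widely used definition attributed to Botstein and colleagues (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of allele i at a marker. The function names and the example allele calls are hypothetical.

```python
from collections import Counter


def allele_frequencies(allele_calls):
    """Convert a list of observed allele labels into relative frequencies."""
    counts = Counter(allele_calls)
    total = sum(counts.values())
    return [count / total for count in counts.values()]


def polymorphic_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2 (assumed definition)."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    cross_term = 0.0
    for i in range(len(freqs) - 1):
        for j in range(i + 1, len(freqs)):
            cross_term += 2 * freqs[i] ** 2 * freqs[j] ** 2
    return 1.0 - homozygosity - cross_term


# Hypothetical marker scored in eight cultivars, alleles named by fragment size.
calls = ["152", "152", "156", "156", "160", "160", "164", "164"]
pic = polymorphic_information_content(allele_frequencies(calls))
print(f"PIC = {pic:.4f}")
```

Under this definition a marker with two equally frequent alleles has a PIC of 0.375, and the value rises as the number of alleles grows and their frequencies become more even.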
Abstract:
BACKGROUND: Several European HIV observational databases have, over the last decade, accumulated a substantial number of resistance test results and developed large sample repositories. There is a need to link these efforts together. We here describe the development of a novel tool that binds these databases together in a distributed fashion, in which control and data remain with the cohorts, rather than through classic data mergers. METHODS: As proof of concept we entered two basic queries into the tool: available resistance tests and available samples. We asked for patients still alive after 1998-01-01 and between 180 and 195 cm in height, and how many samples or resistance tests would be available for these patients. The queries were uploaded with the tool to a central web server, from which each participating cohort downloaded the queries with the tool and ran them against its own database. The numbers gathered were then submitted back to the server, where we could accumulate the number of available samples and resistance tests. RESULTS: We obtained the following results from the cohorts on available samples/resistance tests: EuResist: not available/11,194; EuroSIDA: 20,716/1,992; ICONA: 3,751/500; Rega: 302/302; SHCS: 53,783/1,485. In total, 78,552 samples and 15,473 resistance tests were available amongst these five cohorts.
Once these data items have been identified, it is trivial to generate lists of relevant samples that would be useful for ultra-deep sequencing in addition to the already available resistance tests. Soon the tool will include small analysis packages that allow each cohort to pull a report on its cohort profile and also survey emerging resistance trends in its own cohort. CONCLUSIONS: We plan on providing this tool to all cohorts within the Collaborative HIV and Anti-HIV Drug Resistance Network (CHAIN) and will provide the tool free of charge to others for any non-commercial use. The potential of this tool is to ease collaborations, that is, in projects requiring data, to speed up identification of novel resistance mutations by increasing the number of observations across multiple cohorts instead of waiting for single cohorts or studies to reach the critical number needed to address such issues.
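The distributed workflow described in the METHODS can be sketched as follows: one shared query runs inside each cohort's own database, and only the resulting counts are pooled. This is an illustrative sketch, not the tool itself. The table layout, column names, cohort names and example rows are all assumptions, and in-memory SQLite databases stand in for the cohorts' real systems.

```python
import sqlite3

# The shared query: count samples from patients alive after a date and
# within a height range. The schema is hypothetical.
COUNT_QUERY = """
SELECT COUNT(DISTINCT s.sample_id)
FROM patients AS p
JOIN samples AS s ON s.patient_id = p.patient_id
WHERE (p.death_date IS NULL OR p.death_date > :alive_after)
  AND p.height_cm BETWEEN :min_height AND :max_height
"""


def make_cohort_db(patients, samples):
    """Build a stand-in for one cohort's private database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, "
        "death_date TEXT, height_cm REAL)"
    )
    conn.execute(
        "CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, patient_id INTEGER)"
    )
    conn.executemany("INSERT INTO patients VALUES (?, ?, ?)", patients)
    conn.executemany("INSERT INTO samples VALUES (?, ?)", samples)
    return conn


def run_locally(conn, params):
    """Executed at the cohort site; only a single number leaves the site."""
    return conn.execute(COUNT_QUERY, params).fetchone()[0]


params = {"alive_after": "1998-01-01", "min_height": 180, "max_height": 195}

cohorts = {
    "cohort_a": make_cohort_db(
        patients=[(1, None, 182.0), (2, "1997-05-01", 185.0), (3, None, 170.0)],
        samples=[(10, 1), (11, 1), (12, 2), (13, 3)],
    ),
    "cohort_b": make_cohort_db(
        patients=[(1, "2004-03-15", 190.0), (2, None, 196.0)],
        samples=[(20, 1), (21, 2)],
    ),
}

# The central server sees per-cohort counts only, never patient-level rows.
counts = {name: run_locally(conn, params) for name, conn in cohorts.items()}
total = sum(counts.values())
print(counts, total)
```

Storing dates as ISO 8601 strings keeps the comparison `death_date > :alive_after` correct under plain string ordering, which is why the sketch uses that representation.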
Abstract:
Dermatophytes are the most common agents of superficial mycoses, and exclusively infect stratum corneum, nails or hair. Therefore, secreted proteolytic activity is considered a virulence trait of these fungi. In a medium containing protein as a sole nitrogen and carbon source Trichophyton rubrum secretes a metallocarboxypeptidase (TruMcpA) of the M14 family according to the MEROPS proteolytic enzyme database. TruMcpA is homologous to human pancreatic carboxypeptidase A, and is synthesized as a precursor in a preproprotein form. The propeptide is removed to generate the mature active enzyme alternatively by either one of two subtilisins which are concomitantly secreted by the fungus. In addition, T. rubrum was shown to possess two genes (TruSCPA and TruSCPB) encoding serine carboxypeptidases of the S10 family which are homologues of the previously characterized Aspergillus and Penicillium secreted acid carboxypeptidases. However, in contrast to the Aspergillus and Penicillium homologues, TruScpA and TruScpB enzymes are not secreted into the environment, but are membrane-associated with a glycosylphosphatidylinositol (GPI) anchor. During infection, T. rubrum secreted and GPI-anchored carboxypeptidases may contribute to fungal virulence by cooperating with previously characterized endoproteases and aminopeptidases in the degradation of compact keratinized tissues into assimilable amino acids and short peptides.
Abstract:
Bacillus subtilis is the best-characterized member of the Gram-positive bacteria. Its genome of 4,214,810 base pairs comprises 4,100 protein-coding genes. Of these protein-coding genes, 53% are represented once, while a quarter of the genome corresponds to several gene families that have been greatly expanded by gene duplication, the largest family containing 77 putative ATP-binding transport proteins. In addition, a large proportion of the genetic capacity is devoted to the utilization of a variety of carbon sources, including many plant-derived molecules. The identification of five signal peptidase genes, as well as several genes for components of the secretion apparatus, is important given the capacity of Bacillus strains to secrete large amounts of industrially important enzymes. Many of the genes are involved in the synthesis of secondary metabolites, including antibiotics, that are more typically associated with Streptomyces species. The genome contains at least ten prophages or remnants of prophages, indicating that bacteriophage infection has played an important evolutionary role in horizontal gene transfer, in particular in the propagation of bacterial pathogenesis.
Abstract:
One of the most important issues in molecular biology is to understand the regulatory mechanisms that control gene expression. Gene expression is often regulated by proteins called transcription factors, which bind to short (5 to 20 base pairs), degenerate segments of DNA. Experimental efforts towards understanding the sequence specificity of transcription factors are laborious and expensive, but can be substantially accelerated with the use of computational predictions. This thesis describes the use of algorithms and resources for transcription factor binding site analysis in addressing quantitative modelling, where probabilistic models are built to represent binding properties of a transcription factor and can be used to find new functional binding sites in genomes. Initially, an open-access database (HTPSELEX) was created, holding high-quality binding sequences for two eukaryotic families of transcription factors, namely CTF/NF1 and LEF1/TCF. The binding sequences were elucidated using a recently described experimental procedure called HTP-SELEX, which allows generation of a large number (> 1000) of binding sites using mass sequencing technology. For each HTP-SELEX experiment we also provide accurate primary experimental information about the protein material used, details of the wet lab protocol, an archive of sequencing trace files, and assembled clone sequences of binding sequences. The database also offers reasonably large SELEX libraries obtained with conventional low-throughput protocols. The database is available at http://wwwisrec.isb-sib.ch/htpselex/ and ftp://ftp.isrec.isb-sib.ch/pub/databases/htpselex. The Expectation-Maximisation (EM) algorithm is one of the frequently used methods to estimate probabilistic models to represent the sequence specificity of transcription factors.
We present computer simulations to estimate the precision of EM-estimated models as a function of data set parameters (such as the length of the initial sequences, the number of initial sequences and the percentage of nonbinding sequences). We observed a remarkable robustness of the EM algorithm with regard to the length of the training sequences and the degree of contamination. The HTPSELEX database and the benchmarked results of the EM algorithm formed part of the foundation for the subsequent project, in which a statistical framework called a hidden Markov model was developed to represent the sequence specificity of the transcription factors CTF/NF1 and LEF1/TCF using the HTP-SELEX experiment data. The hidden Markov model framework is capable of both predicting and classifying CTF/NF1 and LEF1/TCF binding sites. A covariance analysis of the binding sites revealed non-independent base preferences at different nucleotide positions, providing insight into the binding mechanism. We next tested the LEF1/TCF model by computing binding scores for a set of LEF1/TCF binding sequences for which relative affinities were determined experimentally using non-linear regression. The predicted and experimentally determined binding affinities correlated well.
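As a simple illustration of how a probabilistic model of binding specificity can score candidate sites, the sketch below builds a log-odds position weight matrix from aligned sites and scans a sequence with it. This is deliberately simpler than the hidden Markov model described in the thesis: a position weight matrix treats positions as independent, which the covariance analysis above suggests is only an approximation. The training sites, the pseudocount, the uniform background and the function names are all assumptions for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds position weight matrix from equal-length aligned sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + len(BASES) * pseudocount
    pwm = []
    for pos in range(width):
        counts = Counter(site[pos] for site in sites)
        column = {
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in BASES
        }
        pwm.append(column)
    return pwm


def score_site(pwm, site):
    """Sum of per-position log-odds scores for one candidate site."""
    return sum(column[base] for column, base in zip(pwm, site))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in `sequence`."""
    width = len(pwm)
    return max(
        (score_site(pwm, sequence[start:start + width]), start)
        for start in range(len(sequence) - width + 1)
    )


# Hypothetical aligned binding sites, not taken from the HTPSELEX database.
training_sites = ["TTGGCA", "TTGGCT", "TTGGCA", "CTGGCA"]
pwm = build_pwm(training_sites)
score, start = best_hit(pwm, "ACGTTTGGCAGT")
print(f"best window starts at {start} with score {score:.2f}")
```

The pseudocount keeps unseen bases from producing a score of minus infinity, which matters when the training set is small.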
Abstract:
This master's thesis studies the implementation of real-time activity-based costing in the information system of a Finnish SME that manufactures laser chips. It also examines the effects of activity-based costing on operational activities and on the management of activities. The literature part of the thesis reviews, on the basis of published sources, the theories of activity-based costing, costing methods and the technologies used in the technical implementation. In the implementation part, a web-based activity-based costing system was designed and implemented to support the case company's cost accounting and financial administration. The tool was integrated into the company's enterprise resource planning and manufacturing control systems. In contrast to the data collection systems of traditional activity-based costing models, the inputs to the costing system in the case company arrive in real time as part of a larger information system integration. The thesis aims to establish a relationship between the requirements of activity-based costing and database systems. The company can use the activity-based costing system, for example, in product pricing and cost accounting by viewing product-related costs from different perspectives. Conclusions can be drawn from accurate cost information, and the data produced by the system can be used to determine whether developing a particular project, customer relationship or product is economically viable.
Abstract:
The numerous yeast genome sequences presently available provide a rich source of information for functional as well as evolutionary genomics but unequally cover the large phylogenetic diversity of extant yeasts. We present here the complete sequence of the nuclear genome of the haploid-type strain of Kuraishia capsulata (CBS1993(T)), a nitrate-assimilating Saccharomycetales of uncertain taxonomy, isolated from tunnels of insect larvae underneath coniferous barks and characterized by its copious production of extracellular polysaccharides. The sequence is composed of seven scaffolds, one per chromosome, totaling 11.4 Mb and containing 6,029 protein-coding genes, ~13.5% of which are interrupted by introns. This GC-rich yeast genome (45.7%) appears phylogenetically related to the few other nitrate-assimilating yeasts sequenced so far, Ogataea polymorpha, O. parapolymorpha, and Dekkera bruxellensis, with which it shares a very reduced number of tRNA genes, a novel tRNA sparing strategy, and a common nitrate assimilation cluster, three features specific to this group of yeasts. Centromeres were recognized in GC-poor troughs of each scaffold. The strain bears MAT alpha genes at a single MAT locus and presents a significant degree of conservation with Saccharomyces cerevisiae genes, suggesting that it can perform sexual cycles in nature, although genes involved in meiosis were not all recognized. The complete absence of conservation of synteny between K. capsulata and any other yeast genome described so far, including the three other nitrate-assimilating species, validates the interest of this species for long-range evolutionary genomic studies among Saccharomycotina yeasts.
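The abstract notes that centromeres were recognized in GC-poor troughs of each scaffold. One simple way to locate such a trough is a sliding-window GC profile, sketched below. The abstract does not describe the authors' actual procedure, so the window size, step, function names and toy sequence here are assumptions.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases of `seq`."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def sliding_gc(seq, window, step):
    """List of (start, GC fraction) for each full window along `seq`."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def gc_trough(seq, window, step):
    """Return the (start, GC fraction) of the most GC-poor window."""
    return min(sliding_gc(seq, window, step), key=lambda item: item[1])


# Toy scaffold: a GC-rich sequence with an AT-only stretch in the middle.
scaffold = "GC" * 5 + "AT" * 5 + "GC" * 5
profile = sliding_gc(scaffold, window=10, step=5)
trough_start, trough_gc = gc_trough(scaffold, window=10, step=5)
print(profile)
print(f"GC-poor trough at {trough_start} (GC = {trough_gc:.2f})")
```

On real scaffolds the window would span kilobases rather than ten bases, and a candidate trough would still need supporting evidence before being called a centromere.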
Abstract:
Phylogenetic trees representing the evolutionary relationships of homologous genes are the entry point for many evolutionary analyses. For instance, the use of a phylogenetic tree can aid in the inference of orthology and paralogy relationships, and in the detection of relevant evolutionary events such as gene family expansions and contractions, horizontal gene transfer, recombination or incomplete lineage sorting. Similarly, given the plurality of evolutionary histories among genes encoded in a given genome, there is a need for the combined analysis of genome-wide collections of phylogenetic trees (phylomes). Here, we introduce a new release of PhylomeDB (http://phylomedb.org), a public repository of phylomes. Currently, PhylomeDB hosts 120 public phylomes, comprising >1.5 million maximum likelihood trees and multiple sequence alignments. In the current release, phylogenetic trees are annotated with taxonomic, protein-domain arrangement, functional and evolutionary information. PhylomeDB is also a major source for phylogeny-based predictions of orthology and paralogy, covering >10 million proteins across 1059 sequenced species. Here we describe newly implemented PhylomeDB features, and discuss a benchmark of the orthology predictions provided by the database, the impact of proteome updates and the use of the phylome approach in the analysis of newly sequenced genomes and transcriptomes.
Abstract:
With the increasing availability of various 'omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. We here present the fourth version of the eggNOG database (available at http://eggnog.embl.de) that derives nonsupervised orthologous groups (NOGs) from complete genomes, and then applies a comprehensive characterization and analysis pipeline to the resulting gene families. Compared with the previous version, we have more than tripled the underlying species set to cover 3686 organisms, keeping pace with genome project completions while prioritizing the inclusion of high-quality genomes to minimize error propagation from incomplete proteome sets. Major technological advances include (i) a robust and scalable procedure for the identification and inclusion of high-quality genomes, (ii) provision of orthologous groups for 107 different taxonomic levels compared with 41 in eggNOGv3, (iii) identification and annotation of particularly closely related orthologous groups, facilitating analysis of related gene families, (iv) improvements of the clustering and functional annotation approach, (v) adoption of a revised tree building procedure based on the multiple alignments generated during the process and (vi) implementation of quality control procedures throughout the entire pipeline. As in previous versions, eggNOGv4 provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation. Users can access the complete database of orthologous groups via a web interface, as well as through bulk download.
Abstract:
The introduction of next-generation sequencing technologies is set to revolutionize modern medicine. These new tools have already contributed to the discovery of new genes and cellular pathways involved in the pathology of rare and common genetic diseases. However, the enormous quantity of data generated by these systems, together with the complexity of the bioinformatic analyses required, creates a bottleneck in solving the most difficult cases. The objective of this thesis was to identify the genetic causes of two hereditary diseases using these new sequencing techniques, coupled with gene enrichment technologies. In this context, we developed our own pipeline for aligning sequence fragments (reads). Following the identification of genes, we carried out functional analyses to elucidate their role in the disease. First, we studied and identified mutations involved in a recessive form of retinitis pigmentosa, which is to date the most frequent hereditary retinal degeneration. In particular, we found that missense mutations in the gene FAM161A were the cause of the retinitis pigmentosa previously associated with the RP28 locus. Furthermore, we showed that this gene functions at the level of the photoreceptor cilium, adding to the broad spectrum of hereditary retinal ciliopathies. Second, we explored the possibility that a recurrent fever syndrome that is relatively frequent in pediatrics, called PFAPA (an acronym for periodic fever with aphthous stomatitis, pharyngitis and cervical adenitis), could have a genetic origin. As the etiology of this disease is unclear, we attempted to identify the genetic spectrum of PFAPA patients.
As we could not uncover a single new mutated gene responsible for the disease in all screened individuals, it appears that a more complex genetic model, suggesting the involvement of several genes in the pathology, applies to the affected patients. These genes appear to be involved notably in inflammation-related processes, which would broaden the impact of these studies to other autoinflammatory diseases.