984 results for Molecular Sequence Annotation


Relevance:

100.00% 100.00%

Publisher:

Abstract:

High-throughput screening of physical, genetic and chemical-genetic interactions brings important perspectives in the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and result in a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) Annotation module, which assigns annotations from several databases for the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, GO biological processes and KEGG pathways enrichment. This module generates an XGMML file that can be imported into Cytoscape or be visualized directly on the web. We have developed IIS by integrating diverse databases, in response to the need for appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions. IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it is also extendable to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.
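
The abstract above mentions that the Interactome module exports networks as XGMML for import into Cytoscape. As a rough illustration only, the sketch below writes a minimal XGMML graph from a list of interactions. It is not the IIS implementation: the function name `build_xgmml`, the example identifiers and the `interaction` attribute are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace commonly used by XGMML documents that Cytoscape reads.
XGMML_NS = "http://www.cs.rpi.edu/XGMML"


def build_xgmml(network_label, interactions):
    """Return an XGMML document as a string.

    `interactions` is a list of (source, target, kind) tuples; this input
    shape is a hypothetical choice for the sketch, not an IIS format.
    """
    ET.register_namespace("", XGMML_NS)
    graph = ET.Element(
        f"{{{XGMML_NS}}}graph", {"label": network_label, "directed": "0"}
    )

    # First pass: emit one <node> per distinct interactor.
    node_ids = {}
    for source, target, _kind in interactions:
        for name in (source, target):
            if name not in node_ids:
                node_ids[name] = str(len(node_ids) + 1)
                ET.SubElement(
                    graph,
                    f"{{{XGMML_NS}}}node",
                    {"id": node_ids[name], "label": name},
                )

    # Second pass: emit one <edge> per interaction, with its type as an <att>.
    for source, target, kind in interactions:
        edge = ET.SubElement(
            graph,
            f"{{{XGMML_NS}}}edge",
            {
                "source": node_ids[source],
                "target": node_ids[target],
                "label": f"{source} ({kind}) {target}",
            },
        )
        ET.SubElement(
            edge,
            f"{{{XGMML_NS}}}att",
            {"name": "interaction", "type": "string", "value": kind},
        )

    return ET.tostring(graph, encoding="unicode")


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    print(build_xgmml("demo", [("GeneA", "GeneB", "physical"),
                               ("GeneB", "DrugX", "chemical-genetic")]))
```

Attributes such as expression levels or subcellular localization could be attached to nodes with further `att` elements in the same way.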

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background and aims: Recent studies have adopted a broad definition of Sapindaceae that includes taxa traditionally placed in Aceraceae and Hippocastanaceae, achieving monophyly but yielding a family difficult to characterize and for which no obvious morphological synapomorphy exists. This expanded circumscription was necessitated by the finding that the monotypic, temperate Asian genus Xanthoceras, historically placed in Sapindaceae tribe Harpullieae, is basal within the group. Here we seek to clarify the relationships of Xanthoceras based on phylogenetic analyses using a dataset encompassing nearly 3/4 of sapindaceous genera, comparing the results with information from morphology and biogeography, in particular with respect to the other taxa placed in Harpullieae. We then re-examine the appropriateness of maintaining the current broad, morphologically heterogeneous definition of Sapindaceae and explore the advantages of an alternative family circumscription. Methods: Using 243 samples representing 104 of the 142 currently recognized genera of Sapindaceae s. lat. (including all in Harpullieae), sequence data were analyzed for nuclear (ITS) and plastid (matK, rpoB, trnD-trnT, trnK-matK, trnL-trnF and trnS-trnG) markers, adopting the methodology of a recent family-wide study, performing single-gene and total evidence analyses based on maximum likelihood (ML) and maximum parsimony (MP) criteria, and applying heuristic searches developed for large datasets, viz. a new strategy implemented in RAxML (for ML) and the parsimony ratchet (for MP). Bootstrap analyses were performed for each method to test for congruence between markers. Key results: Our findings support earlier suggestions that Harpullieae are polyphyletic: Xanthoceras is confirmed as sister to all other sampled taxa of Sapindaceae s. lat.; the remaining members belong to three other clades within Sapindaceae s. lat., two of which correspond respectively to the groups traditionally treated as Aceraceae and Hippocastanaceae, together forming a clade sister to the largely tropical Sapindaceae s. str., which is monophyletic and morphologically coherent provided Xanthoceras is excluded. Conclusion: To overcome the difficulties of a broadly circumscribed Sapindaceae, we resurrect the historically recognized temperate families Aceraceae and Hippocastanaceae, and describe a new family, Xanthoceraceae, thus adopting a monophyletic and easily characterized circumscription of Sapindaceae nearly identical to that used for over a century.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The GO annotation dataset provided by the UniProt Consortium (GOA: http://www.ebi.ac.uk/GOA) is a comprehensive set of evidence-based associations between terms from the Gene Ontology resource and UniProtKB proteins. Currently supplying over 100 million annotations to 11 million proteins in more than 360,000 taxa, this resource has increased 2-fold over the last 2 years. It has benefited from a wealth of checks to improve annotation correctness and consistency, and now supplies greater information content, enabled by GO Consortium annotation format developments. Detailed, manual GO annotations obtained from the curation of peer-reviewed papers are directly contributed by all UniProt curators and supplemented with manual and electronic annotations from 36 model organism and domain-focused scientific resources. The inclusion of high-quality, automatic annotation predictions ensures the UniProt GO annotation dataset supplies functional information to a wide range of proteins, including those from poorly characterized, non-model organism species. UniProt GO annotations are freely available in a range of formats accessible by both file downloads and web-based views. In addition, the introduction of a new, normalized file format in 2010 has made for easier handling of the complete UniProt-GOA data set.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

BACKGROUND: The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body of information, we have developed a system which extracts information from the scientific literature for the most frequently annotated PTMs in UniProtKB. RESULTS: The procedure uses a pattern-matching and rule-based approach to extract sentences with information on the type and site of modification. A ranked list of protein candidates for the modification is also provided. For PTM extraction, precision varies from 57% to 94%, and recall from 75% to 95%, according to the type of modification. The procedure was used to track new publications on PTMs and to recover potential supporting evidence for phosphorylation sites annotated based on the results of large-scale proteomics experiments. CONCLUSIONS: The information retrieval and extraction method we have developed in this study forms the basis of a simple tool for the manual curation of protein post-translational modifications in UniProtKB/Swiss-Prot. Our work demonstrates that even simple text-mining tools can be effectively adapted for database curation tasks, provided that a thorough understanding of the working process and requirements is first obtained. This system can be accessed at http://eagl.unige.ch/PTM/.
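
To make the idea of pattern-matching extraction of modification type and site more concrete, the following is a minimal sketch and not the authors' system. The keyword table, the residue pattern, the naive sentence splitter and the function name `extract_ptm_mentions` are all assumptions chosen for illustration; the rules of the published tool are not given in the abstract.

```python
import re

# Hypothetical keyword table: PTM type -> regex matching its verb/noun forms.
# The leading \b avoids matching "dephosphorylation", "deacetylation", etc.
PTM_KEYWORDS = {
    "phosphorylation": r"\bphosphorylat\w*",
    "acetylation": r"\bacetylat\w*",
    "methylation": r"\bmethylat\w*",
}

# Residue name (three-letter code or full name) followed by a position,
# e.g. "Ser-15", "Thr 20", "lysine 9".
RESIDUES = "Ser|Thr|Tyr|Lys|Arg|serine|threonine|tyrosine|lysine|arginine"
SITE_PATTERN = re.compile(rf"\b({RESIDUES})[\s-]?(\d+)\b", re.IGNORECASE)


def split_sentences(text):
    """Very naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_ptm_mentions(text):
    """Return sentences that name both a PTM type and at least one site."""
    hits = []
    for sentence in split_sentences(text):
        sites = [
            f"{residue.capitalize()}{position}"
            for residue, position in SITE_PATTERN.findall(sentence)
        ]
        if not sites:
            continue  # rule: a sentence without a site is not reported
        for ptm_type, keyword in PTM_KEYWORDS.items():
            if re.search(keyword, sentence, re.IGNORECASE):
                hits.append(
                    {"type": ptm_type, "sites": sites, "sentence": sentence}
                )
    return hits


if __name__ == "__main__":
    example = (
        "Kinase A phosphorylates the substrate at Ser-15 and Thr 20. "
        "The protein is abundant in liver. "
        "Lysine 9 is acetylated in vivo."
    )
    for hit in extract_ptm_mentions(example):
        print(hit["type"], hit["sites"])
```

A real curation aid would also need protein name recognition and ranking, which the abstract mentions but this sketch does not attempt.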

Relevance:

100.00% 100.00%

Publisher:

Abstract:

UniPathway (http://www.unipathway.org) is a fully manually curated resource for the representation and annotation of metabolic pathways. UniPathway provides explicit representations of enzyme-catalyzed and spontaneous chemical reactions, as well as a hierarchical representation of metabolic pathways. This hierarchy uses linear subpathways as the basic building block for the assembly of larger and more complex pathways, including species-specific pathway variants. All of the pathway data in UniPathway has been extensively cross-linked to existing pathway resources such as KEGG and MetaCyc, as well as sequence resources such as the UniProt KnowledgeBase (UniProtKB), for which UniPathway provides a controlled vocabulary for pathway annotation. We introduce here the basic concepts underlying the UniPathway resource, with the aim of allowing users to fully exploit the information provided by UniPathway.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Population subdivision complicates analysis of molecular variation. Even if neutrality is assumed, three evolutionary forces need to be considered: migration, mutation, and drift. Simplification can be achieved by assuming that the process of migration among and drift within subpopulations occurs fast compared to mutation and drift in the entire population. This allows a two-step approach in the analysis: (i) analysis of population subdivision and (ii) analysis of molecular variation in the migrant pool. We model population subdivision using an infinite island model, where we allow the migration/drift parameter Theta to vary among populations. Thus, central and peripheral populations can be differentiated. For inference of Theta, we use a coalescence approach, implemented via a Markov chain Monte Carlo (MCMC) integration method that allows estimation of allele frequencies in the migrant pool. The second step of this approach (analysis of molecular variation in the migrant pool) uses the estimated allele frequencies in the migrant pool for the study of molecular variation. We apply this method to a Drosophila ananassae sequence data set. We find little indication of isolation by distance, but large differences in the migration parameter among populations. The population as a whole seems to be expanding. A population from Bogor (Java, Indonesia) shows the highest variation and seems closest to the species center.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Over the last few decades, bioinformatics has played a fundamental role in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, the crucial problem is to learn as much as possible about its coding regions. Protein sequence annotation is challenging and, due to the size of the problem, only computational approaches can provide a feasible solution. As recently pointed out by the Critical Assessment of Function Annotations (CAFA), the most accurate methods are those based on the transfer-by-homology approach, and the most incisive contribution is given by cross-genome comparisons. This thesis describes a non-hierarchical sequence clustering method for automatic large-scale protein annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) inside clusters by means of a statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure (when a template is available). This is made possible by cluster-specific HMM profiles that can be used to calculate reliable template-to-target alignments even in the case of distantly related proteins (sequence identity < 30%). Other BAR+ based applications were developed during my doctorate, including the prediction of magnesium binding sites in human proteins, the classification of the ABC transporter superfamily and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment, BAR+ placed among the ten most accurate methods. At present, BAR+ is freely available as a web server for the functional and structural annotation of protein sequences at http://bar.biocomp.unibo.it/bar2.0.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The international Functional Annotation Of the Mammalian Genomes 4 (FANTOM4) research collaboration set out to better understand the transcriptional network that regulates macrophage differentiation and to uncover novel components of the transcriptome employing a series of high-throughput experiments. The primary and unique technique is cap analysis of gene expression (CAGE), sequencing mRNA 5'-ends with a second-generation sequencer to quantify promoter activities even in the absence of gene annotation. Additional genome-wide experiments complement the setup, including short RNA sequencing, microarray gene expression profiling on large-scale perturbation experiments and ChIP-chip for epigenetic marks and transcription factors. All the experiments are performed in a differentiation time course of the THP-1 human leukemic cell line. Furthermore, we performed a large-scale mammalian two-hybrid (M2H) assay between transcription factors and monitored their expression profile across human and mouse tissues with qRT-PCR to address combinatorial effects of regulation by transcription factors. These interdependent data have been analyzed individually and in combination with each other and are published in related but distinct papers. We provide all data together with systematic annotation in an integrated view as a resource for the scientific community (http://fantom.gsc.riken.jp/4/). Additionally, we assembled a rich set of derived analysis results including published predicted and validated regulatory interactions. Here we introduce the resource and its update after the initial release.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The Gene Ontology (GO) (http://www.geneontology.org) is a community bioinformatics resource that represents gene product function through the use of structured, controlled vocabularies. The number of GO annotations of gene products has increased due to curation efforts among GO Consortium (GOC) groups, including focused literature-based annotation and ortholog-based functional inference. The GO ontologies continue to expand and improve as a result of targeted ontology development, including the introduction of computable logical definitions and development of new tools for the streamlined addition of terms to the ontology. The GOC continues to support its user community through the use of e-mail lists, social media and web-based resources.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The ability of Mycobacterium tuberculosis to establish a latent infection (LTBI) in humans confounds the treatment of tuberculosis. Consequently, there is a need to discover new therapeutic agents that can kill M. tuberculosis both during active disease and LTBI. The streptomycin-dependent strain of M. tuberculosis, 18b, provides a useful tool for this purpose since upon removal of streptomycin (STR) it enters a non-replicating state that mimics latency both in vitro and in animal models. The 4.41 Mb genome sequence of M. tuberculosis 18b was determined and this revealed the strain to belong to clade 3 of the ancient ancestral lineage of the Beijing family. STR-dependence was attributable to insertion of a single cytosine in the 530 loop of the 16S rRNA and to a single amino acid insertion in the N-terminal domain of initiation factor 3. RNA-seq was used to understand the genetic programme activated upon STR-withdrawal and hence to gain insight into LTBI. This revealed reconfiguration of gene expression and metabolic pathways showing strong similarities between non-replicating 18b and M. tuberculosis residing within macrophages, and with the core stationary phase and microaerophilic responses. The findings of this investigation confirm the validity of 18b as a model for LTBI, and provide insight into both the evolution of tubercle bacilli and the functioning of the ribosome.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

While the influence of HLA-AB and -DRB1 matching on the outcome of bone marrow transplantation (BMT) with unrelated donors is clear, the evaluation of HLA-C has been hampered by its poor serological definition. Because the low resolution of standard HLA-C typing could explain the significant number of positive cytotoxic T lymphocyte precursor frequency (CTLpf) tests found among HLA-AB-subtype, DRB1/B3/B5-subtype matched patient/donor pairs, we have identified by sequencing the incompatibilities recognized by CD8+ CTL clones obtained from such positive CTLpf tests. In most cases the target molecules were HLA-C antigens that had escaped detection by serology (e.g. Cw*1601, 1502 or 0702). Direct recognition of HLA-C by a CTL clone was demonstrated by lysis of the HLA class I-negative 721.221 cell line transfected with Cw*1601 cDNA. Because of the functional importance of Cw polymorphism, a PCR-SSO oligotyping procedure was set up allowing the resolution of 29 Cw alleles. Oligotyping of a panel of 382 individuals (including 101 patients and their 272 potential unrelated donors, 5 related donors and 4 platelet donors) allowed the determination of HLA-C and HLA A-B-Cw-DRB1 allelic frequencies, as well as a number of A-Cw, B-Cw, and DRB1-Cw associations. Two new HLA-Cw alleles (Cw*02023 and Cw*0707) were identified by DNA sequencing of PCR-amplified exon 2-intron 2-exon 3 amplicons. Furthermore, we determined the degree of HLA-C compatibility in 287 matched pairs that could be formed from 73 patients and their 184 potential unrelated donors compatible for HLA-AB by serology and for HLA-DRB1/B3/B5 by oligotyping. Cw mismatches were identified in 42.1% of these pairs, and AB-subtype oligotyping showed that 30% of these Cw-incompatible pairs were also mismatched for A or B-locus subtype. The degree of HLA-C incompatibility was strongly influenced by the linkage with B alleles and by the ABDR haplotypes. Cw alleles linked with B*4403, B*5101, B18, and B62 haplotypes were frequently mismatched. Apparently, high-resolution DNA typing for HLA-AB does not result in full matching at locus C. Since HLA-C polymorphism is recognized by alloreactive CTLs, such incompatibilities might be as relevant as AB-subtype mismatches in clinical transplantation.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Affiliation: Centre Robert-Cedergren de l'Université de Montréal en bio-informatique et génomique & Département de biochimie, Université de Montréal

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Grape berry is considered a non-climacteric fruit, but there is some evidence that ethylene plays a role in the control of berry ripening. This PhD thesis aimed to give insights into the role of ethylene and ethylene-related genes in the regulation of grape berry ripening. During this study, a small increase in ethylene concentration one week before véraison was measured in Vitis vinifera L. ‘Pinot Noir’ grapes, confirming previous findings in ‘Cabernet Sauvignon’. In addition, ethylene-related genes were identified in the grapevine genome sequence. As in other species, ethylene biosynthesis and receptor genes are present in grapevine as multi-gene families, and their expression appeared to be tissue-specific or developmental stage-specific. All the other elements of the ethylene signal transduction cascade were also identified in the grape genome. Among them were ethylene response factors (ERFs), which modulate the transcription of many effector genes in response to ethylene. In this study, seven grapevine ERFs were characterized, and they showed expression profiles specific to tissue and berry developmental stage. Two sequences, VvERF045 and VvERF063, seemed likely to be involved in the control of berry ripening, given their expression profiles and their sequence annotation. VvERF045 was induced before véraison and was specific to the ripe berry; by sequence similarity, it was likely a transcription activator. VvERF063 displayed high sequence similarity to repressors of transcription, and its expression, very high in green berries, was lowest at véraison and during ripening. To functionally characterize VvERF045 and VvERF063, a stable transformation strategy was chosen. Both sequences were cloned into vectors for over-expression and silencing and transferred into grape by Agrobacterium-mediated or biolistic-mediated gene transfer. In vitro, transgenic plants over-expressing VvERF045 displayed an epinastic phenotype whose extent was correlated with the transgene expression level. Four pathogen stress response genes were significantly induced in the transgenic plants, suggesting a putative function of VvERF045 in biotic stress defense during berry ripening. Further molecular analysis of the transgenic plants will help to identify the actual VvERF045 target genes and, together with the phenotypic characterization of the adult transgenic plants, will make it possible to define more fully the role of VvERF045 in berry ripening.