37 results for KEGG
Abstract:
High-throughput screening of physical, genetic and chemical-genetic interactions opens important perspectives in Systems Biology, because the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variation, the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline of connected tools that automatically integrates data from diverse sources into a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) the Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) the Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and to add them to the project; (iii) the Annotation module, which assigns annotations from several databases to the contigs/singlets or lists of proteins/genes, generating tables of automatic annotations that can be manually curated; and (iv) the Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather newly identified interactions, protein and metabolite expression/concentration levels, subcellular localization, computed topological metrics, and GO biological process and KEGG pathway enrichment. This module generates an XGMML file that can be imported into Cytoscape or visualized directly on the web. We developed IIS by integrating diverse databases to meet the need for appropriate tools for the systematic analysis of physical, genetic and chemical-genetic interactions.
IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it can also be extended to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.
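The XGMML export mentioned above can be illustrated with a minimal Python sketch. It assumes the network is available as a list of (source, target, interaction type) triples; the function name and input format are hypothetical and are not taken from IIS. The element layout (a `graph` root holding `node` and `edge` elements, with edge labels in Cytoscape's `A (type) B` style) follows the public XGMML format that Cytoscape can import.

```python
import xml.etree.ElementTree as ET

XGMML_NS = "http://www.cs.rpi.edu/XGMML"

def edges_to_xgmml(edges, label="network"):
    """Serialize (source, target, interaction) triples as a minimal XGMML document."""
    ET.register_namespace("", XGMML_NS)  # emit XGMML as the default namespace
    graph = ET.Element(f"{{{XGMML_NS}}}graph", {"label": label, "directed": "0"})
    node_ids = {}
    # One <node> per distinct protein/gene/metabolite name
    for source, target, _ in edges:
        for name in (source, target):
            if name not in node_ids:
                node_ids[name] = str(len(node_ids) + 1)
                ET.SubElement(graph, f"{{{XGMML_NS}}}node",
                              {"id": node_ids[name], "label": name})
    # One <edge> per interaction, typed with an "interaction" attribute
    for source, target, interaction in edges:
        edge = ET.SubElement(graph, f"{{{XGMML_NS}}}edge", {
            "source": node_ids[source],
            "target": node_ids[target],
            "label": f"{source} ({interaction}) {target}",
        })
        ET.SubElement(edge, f"{{{XGMML_NS}}}att",
                      {"name": "interaction", "type": "string", "value": interaction})
    return ET.tostring(graph, encoding="unicode")
```

Writing the returned string to a `.xgmml` file should be enough for Cytoscape to open it; a fuller export would attach node attributes such as expression levels or localization as further `att` elements.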
Abstract:
Hevea brasiliensis (Willd. Ex Adr. Juss.) Muell.-Arg., a tree native to the Amazon rainforest, is the primary source of natural rubber. The singular properties of natural rubber make it superior to, and competitive with, synthetic rubber in several applications. Here, we performed RNA sequencing (RNA-seq) of H. brasiliensis bark on the Illumina GAIIx platform, which generated 179,326,804 raw reads. A total of 50,384 contigs longer than 400 bp were obtained and subjected to further analyses. A similarity search against the non-redundant (nr) protein database returned 32,018 (63%) positive BLASTx hits. The transcriptome was annotated using the Clusters of Orthologous Groups (COG), Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Pfam databases. A search for putative molecular markers was performed to identify simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs); in total, 17,927 SSRs and 404,114 SNPs were detected. Finally, to validate the SNP markers, we selected sequences identified as belonging to the mevalonate (MVA) and 2-C-methyl-D-erythritol 4-phosphate (MEP) pathways, which are involved in rubber biosynthesis. A total of 78 SNPs were validated in 36 genotypes of H. brasiliensis. This new dataset is a powerful source of information on rubber tree bark genes and will be an important tool for developing microsatellite and SNP markers for future genetic analyses such as genetic linkage mapping, quantitative trait locus identification, investigation of linkage disequilibrium and marker-assisted selection.
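SSR detection amounts to scanning each contig for perfect tandem repeats of short motifs. The abstract does not say which tool or thresholds were used, so the minimum copy numbers below are hypothetical placeholders; the sketch only illustrates the idea and ignores imperfect and compound microsatellites.

```python
# Hypothetical minimum copy numbers per motif length (mono- to hexanucleotide)
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def is_compound(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA' or 'ATAT')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, min_copies in min_repeats.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            j = i + k
            while seq[j:j + k] == motif:  # extend while the motif keeps repeating
                j += k
            copies = (j - i) // k
            if copies >= min_copies and set(motif) <= set("ACGT") and not is_compound(motif):
                hits.append((i, motif, copies))
                i = j  # continue scanning after this repeat
            else:
                i += 1
    return sorted(hits)
```

Rejecting compound motifs keeps a run such as `AAAAAAAAAAAA` from being reported again as a dinucleotide or hexanucleotide repeat.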
Abstract:
L-2-hydroxyglutaric aciduria (L-2-HGA, MIM 236792) is a neurometabolic disorder caused by the toxic accumulation of high concentrations of L-2-hydroxyglutaric acid in plasma and cerebrospinal fluid. Distinct mutations in the L2HGDH gene have been associated with its clinical and biochemical phenotype. Here we present three novel mutations (Gln197X, Gly211Val and c.540+1G>A), which bring the number of deleterious L2HGDH mutations compiled in this study to 35. In addition, we used haplotype information based on polymorphic markers to demonstrate the common origin of Gly57Arg-harboring chromosomes. Journal of Human Genetics (2010) 55, 55-58; doi: 10.1038/jhg.2009.110; published online 13 November 2009
Abstract:
Although patterns of somatic alterations have been reported for tumor genomes, little is known about how they compare with alterations present in non-tumor genomes. Such a comparison would be crucial to better characterize the genetic alterations driving tumorigenesis. We sequenced the genomes of a lymphoblastoid cell line (HCC1954BL) and a breast tumor cell line (HCC1954) derived from the same patient and compared the somatic alterations present in each. The lymphoblastoid genome presents a comparable number and a similar spectrum of nucleotide substitutions to those found in the tumor genome. However, the ratio of non-synonymous to synonymous substitutions differed significantly between the two genomes (P = 0.031). Protein-protein interaction analysis revealed that mutations in the tumor genome preferentially affect hub genes (P = 0.0017) and are co-selected to present synergistic functions (P < 0.0001). KEGG analysis showed that, in the tumor genome, most mutated genes were organized into signaling pathways related to tumorigenesis. No such organization or synergy was observed in the lymphoblastoid genome. Our results indicate that endogenous mutagens and replication errors can generate the overall number of mutations required to drive tumorigenesis, and that it is the combination rather than the frequency of mutations that is crucial to complete tumorigenic transformation.
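Hub genes are usually the most highly connected nodes of a protein-protein interaction network. The abstract does not state how hubs were defined, so the sketch below assumes a hypothetical top-20% degree cutoff on an undirected interaction list; the significance test behind the reported P value is not reproduced.

```python
from collections import defaultdict

def degree_map(interactions):
    """Number of distinct interaction partners per protein (undirected network)."""
    partners = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore self-interactions
            partners[a].add(b)
            partners[b].add(a)
    return {protein: len(p) for protein, p in partners.items()}

def hub_genes(interactions, top_fraction=0.2):
    """Proteins whose degree reaches the top `top_fraction` of the network (ties included)."""
    degrees = degree_map(interactions)
    if not degrees:
        return set()
    ranked = sorted(degrees.values(), reverse=True)
    threshold = ranked[max(1, int(len(ranked) * top_fraction)) - 1]
    return {protein for protein, d in degrees.items() if d >= threshold}
```

Given a list of mutated genes for each genome, `len(set(mutated) & hubs) / len(mutated)` gives the fraction of mutated genes that are hubs, the kind of quantity such a tumor versus lymphoblastoid comparison would test.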
Abstract:
This Thesis describes the application of automatic learning methods to a) the classification of organic and metabolic reactions, and b) the mapping of Potential Energy Surfaces (PES). The classification of reactions was approached with two distinct methodologies: a representation of chemical reactions based on NMR data, and a representation of chemical reactions derived from the reaction equation, based on the physico-chemical and topological features of chemical bonds.
NMR-based classification of photochemical and enzymatic reactions. Photochemical and metabolic reactions were classified by Kohonen Self-Organizing Maps (Kohonen SOMs) and Random Forests (RFs), taking as input the difference between the 1H NMR spectra of the products and those of the reactants. Such a representation can be applied to the automatic analysis of changes in the 1H NMR spectrum of a mixture and to their interpretation in terms of the chemical reactions taking place. Possible applications include the monitoring of reaction processes, the evaluation of the stability of chemicals and the interpretation of metabonomic data. A Kohonen SOM trained with a data set of metabolic reactions catalysed by transferases correctly classified 75% of an independent test set in terms of EC number subclass; Random Forests improved the correct predictions to 79%. With photochemical reactions classified into 7 groups, an independent test set was classified with 86-93% accuracy. The data set of photochemical reactions was also used to simulate mixtures with two reactions occurring simultaneously. Kohonen SOMs and Feed-Forward Neural Networks (FFNNs) were trained to classify the reactions occurring in a mixture based on the 1H NMR spectra of the products and reactants. Kohonen SOMs correctly assigned 53-63% of the mixtures in a test set, and Counter-Propagation Neural Networks (CPNNs) gave similar results.
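The difference representation can be sketched as follows, assuming each compound's 1H NMR spectrum is given as a list of (chemical shift in ppm, intensity) peaks. The 0-12 ppm range and 0.5 ppm bin width are hypothetical choices, not the settings used in the Thesis.

```python
def bin_spectrum(peaks, low=0.0, high=12.0, width=0.5):
    """Histogram (shift_ppm, intensity) peaks into fixed-width chemical-shift bins."""
    n_bins = round((high - low) / width)
    bins = [0.0] * n_bins
    for shift, intensity in peaks:
        if low <= shift < high:
            index = min(int((shift - low) / width), n_bins - 1)  # guard float edge
            bins[index] += intensity
    return bins

def summed_spectrum(spectra, **binning):
    """Element-wise sum of the binned spectra of several compounds."""
    total = bin_spectrum([], **binning)
    for peaks in spectra:
        total = [t + b for t, b in zip(total, bin_spectrum(peaks, **binning))]
    return total

def reaction_nmr_descriptor(reactant_spectra, product_spectra, **binning):
    """Binned product spectrum minus binned reactant spectrum."""
    products = summed_spectrum(product_spectra, **binning)
    reactants = summed_spectrum(reactant_spectra, **binning)
    return [p - r for p, r in zip(products, reactants)]
```

Peaks that are unchanged between reactants and products cancel, so the vector fed to the SOM or Random Forest highlights only the spectral changes caused by the reaction.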
The use of supervised learning techniques improved the results: correct assignments rose to 77% with an ensemble of ten FFNNs and to 80% with Random Forests. This study was performed with NMR data simulated from the molecular structure by the SPINUS program; in the design of one test set, simulated data were combined with experimental data. The results support the proposal of linking databases of chemical reactions to experimental or simulated NMR data for the automatic classification of reactions and mixtures of reactions.
Genome-scale classification of enzymatic reactions from their reaction equation. The MOLMAP descriptor relies on a Kohonen SOM that defines types of bonds on the basis of their physico-chemical and topological properties. The MOLMAP descriptor of a molecule represents the types of bonds available in that molecule. The MOLMAP descriptor of a reaction is defined as the difference between the MOLMAPs of the products and those of the reactants, and numerically encodes the pattern of bonds that are broken, changed and made during a chemical reaction. The automatic perception of chemical similarities between metabolic reactions is required for a variety of applications, ranging from the computer validation of classification systems and the genome-scale reconstruction (or comparison) of metabolic pathways to the classification of enzymatic mechanisms. Catalytic functions of proteins are generally described by EC numbers, which are simultaneously employed as identifiers of reactions, enzymes and enzyme genes, thus linking metabolic and genomic information. Methods are therefore needed to compare metabolic reactions automatically and to assign EC numbers automatically to reactions not yet officially classified.
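The product-minus-reactant construction of a reaction MOLMAP can be sketched with an already trained Kohonen map, represented here simply as a list of neuron weight vectors. Each bond is described by a numeric feature vector (the actual physico-chemical and topological bond descriptors are not reproduced), assigned to its closest neuron, and counted. This is a simplified, hypothetical version: the published descriptor may also spread contributions onto neighbouring neurons.

```python
import math

def winning_neuron(weights, bond):
    """Index of the SOM neuron whose weight vector is closest (Euclidean) to the bond."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], bond))

def molmap(weights, bonds):
    """Molecule descriptor: how many of its bonds fall into each neuron."""
    counts = [0] * len(weights)
    for bond in bonds:
        counts[winning_neuron(weights, bond)] += 1
    return counts

def reaction_molmap(weights, reactants, products):
    """Sum of product MOLMAPs minus sum of reactant MOLMAPs."""
    diff = [0] * len(weights)
    for sign, molecules in ((1, products), (-1, reactants)):
        for bonds in molecules:
            for i, count in enumerate(molmap(weights, bonds)):
                diff[i] += sign * count
    return diff
```

Negative entries mark bond types that disappear during the reaction and positive entries mark bond types that are formed, which is what lets similar reactions land close together on a second-level map.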
In this study, the genome-scale data set of enzymatic reactions available in the KEGG database was encoded by MOLMAP descriptors and submitted to Kohonen SOMs in order to compare the resulting map with the official EC number classification, to explore the possibility of predicting EC numbers from the reaction equation, and to assess the internal consistency of the EC classification at the class level. General agreement with the EC classification was observed, i.e. the similarity of MOLMAPs was related to the similarity of EC numbers. At the same time, MOLMAPs were able to discriminate between EC sub-subclasses. EC numbers could be assigned at the class, subclass and sub-subclass levels with accuracies of up to 92%, 80% and 70%, respectively, for independent test sets. The correspondence between the chemical similarity of metabolic reactions and their MOLMAP descriptors was used to identify a number of reactions mapped into the same neuron but belonging to different EC classes, which demonstrated the ability of the MOLMAP/SOM approach to verify the internal consistency of classifications in databases of metabolic reactions. RFs were also used to assign the four levels of the EC hierarchy from the reaction equation: EC numbers were correctly assigned in 95%, 90%, 85% and 86% of the cases (for independent test sets) at the class, subclass, sub-subclass and full EC number levels, respectively. Experiments classifying reactions from their main reactants and products were performed with RFs, and EC numbers were assigned at the class, subclass and sub-subclass levels with accuracies of 78%, 74% and 63%, respectively. In the course of the experiments with metabolic reactions, we suggested that the MOLMAP/SOM concept could be extended to the representation of other levels of metabolic information, such as metabolic pathways.
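Accuracy at each level of the EC hierarchy, as reported above, can be computed by comparing only the leading fields of the predicted and true EC numbers. A minimal sketch, assuming EC numbers are plain dotted strings without an `EC` prefix (the function name is hypothetical):

```python
def ec_level_accuracy(true_ecs, predicted_ecs, level):
    """Fraction of reactions whose predicted EC number matches on the first `level` fields.

    level 1 = class, 2 = subclass, 3 = sub-subclass, 4 = full EC number.
    """
    if len(true_ecs) != len(predicted_ecs) or not true_ecs:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(
        t.split(".")[:level] == p.split(".")[:level]
        for t, p in zip(true_ecs, predicted_ecs)
    )
    return hits / len(true_ecs)
```

Because a match at a deeper level implies a match at every shallower level, accuracy computed on a single test set can only stay equal or decrease from class to full EC number.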
Following the MOLMAP idea, the pattern of neurons activated by the reactions of a metabolic pathway is a representation of the reactions involved in that pathway, i.e. a descriptor of the metabolic pathway. This reasoning enabled the comparison of different pathways, the automatic classification of pathways, and a classification of organisms based on their biochemical machinery. The three levels of classification (from bonds to metabolic pathways) made it possible to map and perceive chemical similarities between metabolic pathways, even for pathways of different types of metabolism and pathways that share no similarity in terms of EC numbers.
Mapping of PES by neural networks (NNs). In a first series of experiments, ensembles of Feed-Forward NNs (EnsFFNNs) and Associative Neural Networks (ASNNs) were trained to reproduce PES represented by the Lennard-Jones (LJ) analytical potential function. The accuracy of the method was assessed by comparing the results of molecular dynamics simulations (thermal, structural and dynamic properties) obtained from the NN-derived PES and from the LJ function. The results indicated that, for LJ-type potentials, NNs can be trained to generate accurate PES for use in molecular simulations. EnsFFNNs and ASNNs gave better results than single FFNNs. The NN models showed a remarkable ability to interpolate between distant curves and to accurately reproduce potentials for use in molecular simulations. The purpose of this first study was to systematically analyse the accuracy of different NNs. Our main motivation, however, is reflected in the next study: the mapping of multidimensional PES by NNs to simulate, by Molecular Dynamics or Monte Carlo, the adsorption and self-assembly of solvated organic molecules on noble-metal electrodes. For such complex and heterogeneous systems, developing suitable analytical functions that fit quantum mechanical interaction energies is a non-trivial or even impossible task.
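The Lennard-Jones reference surface used in the first series of experiments has the analytical form V(r) = 4ε[(σ/r)^12 - (σ/r)^6], in reduced units when ε = σ = 1. A minimal sketch of generating (distance, energy) training pairs from it follows; the distance range and number of points are hypothetical, and the NN training itself is not shown.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_training_set(r_min=0.9, r_max=3.0, n=50, epsilon=1.0, sigma=1.0):
    """Evenly spaced (r, V(r)) pairs usable as reference data for an NN potential."""
    step = (r_max - r_min) / (n - 1)
    return [(r_min + i * step, lennard_jones(r_min + i * step, epsilon, sigma))
            for i in range(n)]
```

Two checkpoints make the function easy to validate: V(σ) = 0, and the minimum V = -ε lies at r = 2^(1/6)σ.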
The data consisted of energy values from Density Functional Theory (DFT) calculations at different distances, for several molecular orientations and three electrode adsorption sites. The results indicate that NNs require a data set large enough to cover well the diversity of possible interaction sites, distances and orientations. NNs trained with such data sets can perform as well as, or even better than, analytical functions, and can therefore be used in molecular simulations, particularly for the ethanol/Au(111) interface, which is the case studied in the present Thesis. Once properly trained, the networks can produce any required number of energy points for accurate interpolation.
Abstract:
Hydrolases are a group of enzymes that catalyse the cleavage of covalent bonds by reaction with water; among them are proteases, amylases, lipases, pectinases, cellulases and catalases. These enzymes are very important and widely used throughout industry. Soil is a very rich and diverse microbial environment and is considered the largest source of substances such as enzymes and antibiotics. Metagenomics has made this microbial potential far more accessible, enabling the discovery of new genes and biomolecules. In this study, soil samples (0-10 cm depth) were collected in northern Paraná to search for functional microbial hydrolases. DNA was extracted from a Latossolo Vermelho Eutroférrico (eutroferric Red Latosol) under four distinct soil and crop management systems, and the samples were sequenced on the 454 platform (Applied Science). The DNA sequences were compared with the NCBI (National Center for Biotechnology Information) non-redundant (NR) database and with KEGG (Kyoto Encyclopedia of Genes and Genomes) to search for similarity to proteases, amylases, lipases, pectinases, cellulases and catalases. From the total DNA, PCR (Polymerase Chain Reaction) was performed with degenerate primers designed to amplify pectinase, cellulase and laccase genes, and the PCR products were purified with the PureLink kit (Invitrogen®) and sequenced (ABI 3500xL, Applied Biosystems®). Comparison with the NCBI and KEGG sequences identified 1,137 sequences with high similarity to the enzyme laccase, 16,883 to cellulase, 2,001 to pectinase, 1,006 to amylase and 3,725 to lipase. These results show that these agricultural soils are an important source of biological resources for industrial application, especially cellulase enzymes.
To date, sequencing of 26 PCR-amplified products has shown identity for one sample, which was identified as the enzyme cellulase.
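The similarity counts above can be approximated by tallying, for each read, the description of its best database hit. The sketch assumes BLAST+ tabular output produced with `-outfmt "6 qseqid sseqid evalue stitle"`, hits listed best-first for each query, a hypothetical e-value cutoff of 1e-5, and an illustrative keyword list; the study's actual comparison against NR and KEGG may have used different criteria.

```python
from collections import Counter

# Illustrative, hypothetical keyword lists; curated identifiers such as KEGG
# orthology assignments are more reliable than free-text hit titles.
ENZYME_KEYWORDS = {
    "protease": ("protease", "peptidase"),
    "amylase": ("amylase",),
    "lipase": ("lipase",),
    "pectinase": ("pectinase", "polygalacturonase", "pectate lyase"),
    "cellulase": ("cellulase", "endoglucanase"),
    "laccase": ("laccase",),
    "catalase": ("catalase",),
}

def count_enzyme_hits(blast_lines, keywords=ENZYME_KEYWORDS, max_evalue=1e-5):
    """Count reads whose best-hit title mentions one of the enzyme keywords."""
    counts = Counter()
    seen = set()
    for line in blast_lines:
        qseqid, _sseqid, evalue, stitle = line.rstrip("\n").split("\t", 3)
        if qseqid in seen:
            continue  # keep only the first (best) hit per read
        seen.add(qseqid)
        if float(evalue) > max_evalue:
            continue
        title = stitle.lower()
        for enzyme, words in keywords.items():
            if any(word in title for word in words):
                counts[enzyme] += 1
                break  # assign each read to at most one enzyme class
    return counts
```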
Abstract:
Master's dissertation in Bioinformatics
Abstract:
This report details the steps and processes carried out to build an application that enables the cross-referencing of genetic data using information held in remote databases. It presents an in-depth study of the content and structure of the remote NCBI and KEGG databases, documenting a data-mining process aimed at extracting from them the information needed to develop the genetic data cross-referencing application. Finally, it describes the programs, scripts and graphical environments implemented to build and then deploy the application that provides the cross-referencing functionality that is the subject of this final-year project.
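A small piece of such a cross-referencing application can be sketched with the current KEGG REST API, whose `conv` and `link` operations return two-column, tab-separated text of the form `hsa:10458<TAB>ncbi-geneid:10458`. This is an illustration only: the project may have used different interfaces, the function names are hypothetical, and the sample identifiers in the test are just format examples. The parsing functions work on plain text, so they can be tested without network access.

```python
from collections import defaultdict
from urllib.request import urlopen

KEGG_REST = "https://rest.kegg.jp"

def fetch(operation):
    """Download a KEGG REST response, e.g. fetch('conv/ncbi-geneid/hsa')."""
    with urlopen(f"{KEGG_REST}/{operation}") as response:
        return response.read().decode("utf-8")

def parse_pairs(text):
    """Parse KEGG's two-column tab-separated output into (left, right) tuples."""
    return [tuple(line.split("\t")[:2]) for line in text.splitlines() if "\t" in line]

def ncbi_to_pathways(conv_text, link_text):
    """Cross NCBI Gene IDs with KEGG pathways through the shared KEGG gene identifier."""
    kegg_to_ncbi = {kegg: ncbi.split(":", 1)[1] for kegg, ncbi in parse_pairs(conv_text)}
    result = defaultdict(set)
    for kegg_gene, pathway in parse_pairs(link_text):
        if kegg_gene in kegg_to_ncbi:
            result[kegg_to_ncbi[kegg_gene]].add(pathway)
    return dict(result)
```

Online, the two inputs would come from `fetch("conv/ncbi-geneid/hsa")` and `fetch("link/pathway/hsa")`.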
Abstract:
Pneumocystis jirovecii is a fungus causing severe pneumonia in immunocompromised patients. Progress in understanding its pathogenicity and epidemiology has been hampered by the lack of a long-term in vitro culture method. Obligate parasitism of this pathogen has been suggested on the basis of various features but remains controversial. We analysed the 7.0 Mb draft genome sequence of the closely related species Pneumocystis carinii, which infects rats and is a well-established experimental model of the disease. We predicted 8,085 (redundant) peptides, 14.9% of which were mapped onto the KEGG biochemical pathways. The proteome of the closely related yeast Schizosaccharomyces pombe was used as a control for the annotation procedure (4,974 genes, 14.1% mapped). About two thirds of the mapped peptides of each organism (65.7% and 73.2%, respectively) corresponded to crucial enzymes for basal metabolism and standard cellular processes. However, the proportion of P. carinii genes relative to those of S. pombe was significantly smaller for the "amino acid metabolism" category of pathways than for all other categories taken together (40 versus 114 against 278 versus 427, P<0.002). Importantly, we identified in P. carinii only 2 enzymes specifically dedicated to the synthesis of the 20 standard amino acids. By contrast, all 54 enzymes dedicated to this synthesis reported in the KEGG atlas for S. pombe were detected upon reannotation of the S. pombe proteome (2 versus 54 against 278 versus 427, P<0.0001). This finding strongly suggests that species of the genus Pneumocystis scavenge amino acids from their host's lung environment. Consequently, they would have no form able to live independently of another organism, and these parasites would be obligate in addition to being opportunistic. These findings have implications for the management of patients susceptible to P. jirovecii infection, given that the only source of infection would be other humans.
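The abstract compares 2x2 tables of gene counts without naming the test. One standard choice for such tables is Fisher's exact test; a self-contained two-sided version is sketched below. It is not necessarily the test the authors used, so its P values need not match the reported ones exactly.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Margins are held fixed; the P value sums the probabilities of all tables
    that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(x_min, x_max + 1)]
    p = sum(q for q in probs if q <= p_observed * (1 + 1e-7))  # tolerance for float ties
    return min(1.0, p)
```

For example, `fisher_exact_two_sided(40, 114, 278, 427)` evaluates the amino-acid-metabolism comparison and `fisher_exact_two_sided(2, 54, 278, 427)` the amino-acid-synthesis comparison.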
Abstract:
UniPathway (http://www.unipathway.org) is a fully manually curated resource for the representation and annotation of metabolic pathways. UniPathway provides explicit representations of enzyme-catalyzed and spontaneous chemical reactions, as well as a hierarchical representation of metabolic pathways. This hierarchy uses linear subpathways as the basic building block for the assembly of larger and more complex pathways, including species-specific pathway variants. All of the pathway data in UniPathway has been extensively cross-linked to existing pathway resources such as KEGG and MetaCyc, as well as sequence resources such as the UniProt KnowledgeBase (UniProtKB), for which UniPathway provides a controlled vocabulary for pathway annotation. We introduce here the basic concepts underlying the UniPathway resource, with the aim of allowing users to fully exploit the information provided by UniPathway.
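The idea of assembling pathways from linear subpathways can be pictured with a small data model. The classes below are hypothetical and do not reflect UniPathway's actual schema; each reaction is reduced to one main substrate and one main product so that linearity can be checked by chaining them.

```python
from dataclasses import dataclass, field

@dataclass
class Reaction:
    rid: str
    substrate: str  # main substrate only (simplification)
    product: str    # main product only (simplification)

@dataclass
class SubPathway:
    """A linear chain of reactions: each main product feeds the next reaction."""
    name: str
    steps: list[Reaction]

    def is_linear(self):
        return all(a.product == b.substrate for a, b in zip(self.steps, self.steps[1:]))

@dataclass
class Pathway:
    """A larger pathway (or species-specific variant) assembled from subpathways."""
    name: str
    subpathways: list[SubPathway] = field(default_factory=list)

    def reactions(self):
        return [r for sp in self.subpathways for r in sp.steps]
```

Keeping the linear subpathway as the unit means a species-specific variant can reuse the same subpathway objects and differ only in which ones it lists.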
Abstract:
Chronic hepatitis B virus (HBV) and hepatitis C virus (HCV) infections are the most important factors associated with hepatocellular carcinoma (HCC), but tumor prognosis remains poor owing to the lack of diagnostic biomarkers. To identify novel diagnostic markers and therapeutic targets, the gene expression profiles associated with viral and non-viral HCC were assessed in 9 tumor samples using oligo-microarrays. The differentially expressed genes were examined using z-scores and KEGG pathway analysis to search for the biological processes involved. We selected a non-redundant set of the 15 genes with the lowest P values to cluster the samples into three groups using the unsupervised k-means algorithm. Fisher's linear discriminant analysis was then applied in an exhaustive search of gene trios that could be used to build classifiers for class distinction. Different transcriptional levels of genes were identified in HCC of different etiologies and in different HCC samples. When comparing HBV-HCC vs HCV-HCC, HBV-HCC/HCV-HCC vs non-viral (NV)-HCC, HBV-HCC vs NV-HCC, and HCV-HCC vs NV-HCC, only 6 (IKBKβ, CREBBP, WNT10B, PRDX6, ITGAV and IFNAR1) of the 58 non-redundant differentially expressed genes were found to be associated with hepatic carcinogenesis. By combining trios, classifiers could be generated that correctly classified 100% of the samples. This expression profiling may provide a useful tool for research into the pathophysiology of HCC. A detailed understanding of how these distinct genes are involved in molecular pathways is of fundamental importance for the development of effective HCC chemoprevention and treatment.
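The clustering step can be sketched as plain Lloyd's k-means, with each sample represented by the expression values of the selected genes. The abstract does not describe the initialisation, so the deterministic farthest-point seeding here is an assumption made for reproducibility; the subsequent exhaustive search of gene trios with Fisher's discriminant is not shown.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's k-means with deterministic farthest-point initialisation; returns labels."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        # Assignment step: each sample joins its nearest centroid
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

With 15 genes, each sample is a 15-dimensional point; cluster numbers are arbitrary, so only which samples share a label is meaningful.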
Abstract:
In this study, biomarkers and transcription factor motifs were identified in order to investigate the etiology and phenotypic severity of Down syndrome. The datasets GSE1281, GSE1611 and GSE5390 were downloaded from the Gene Expression Omnibus (GEO). The Robust Multi-array Average (RMA) algorithm was applied to detect differentially expressed genes (DEGs). To screen for biological pathways, the Database for Annotation, Visualization and Integrated Discovery (DAVID) was used to perform Gene Ontology (GO) functional enrichment of the DEGs and to interrogate the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database. Finally, a transcriptional regulatory network was constructed, and a hypergeometric distribution test was applied to select significantly enriched transcription factor motifs. CBR1, DYRK1A, HMGN1, ITSN1, RCAN1, SON, TMEM50B and TTC3 were each up-regulated two-fold in Down syndrome samples compared with normal samples; of these, SON and TTC3 were newly reported. All eight genes are located on human chromosome 21 (mouse chromosome 16). The DEGs were significantly enriched in macromolecular complex subunit organization and in focal adhesion pathways. Eleven significantly enriched transcription factor motifs (PAX5, EGR1, XBP1, SREBP1, OLF1, MZF1, NFY, NFKAPPAB, MYCMAX, NFE2 and RP58) were identified. The DEGs and transcription factor motifs identified in this study provide biomarkers for understanding Down syndrome pathogenesis and progression.
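The hypergeometric test used to select enriched motifs (and, in the same form, GO or KEGG terms) asks how likely it is to see at least k annotated genes among n DEGs drawn from a background of N genes, K of which carry the annotation. A minimal sketch of that upper-tail probability follows; any multiple-testing correction the study applied is not included.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn).

    Enrichment reading: N background genes, K carry the motif, n are DEGs,
    and k of the DEGs carry the motif.
    """
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)
```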
Abstract:
The aim of this project was to link genes and metabolites, with a view to eventually proposing metabolites to measure in relation to gene function. More specifically, we focused on genes encoding proteins that affect metabolism, namely the enzymes that catalyse the reactions making up metabolic pathways. To quantify this link, we developed a bioinformatics method to compute a distance, defined as the number of reactions between the enzyme encoded by the gene and the metabolite in the global metabolic map of the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Our hypothesis was that the metabolites of interest are substrates/products located close to the reactions catalysed by the enzyme encoded by the gene. To test this hypothesis and validate the method, we used genome-wide association studies combined with metabolomics (mGWAS), because they report associations between genetic variants, annotated to genes, and measured metabolites. Specifically, the method was applied to the mGWAS by Shin et al. Although the coverage of the Shin et al. associations was limited (24/299), we were able to validate significantly the proximity between associated genes and metabolites (P<0.01). Overall, this method and its future developments will make it possible to interpret mGWAS associations quantitatively, to predict which metabolites to measure in relation to a gene's function and, more generally, to better understand the genetic control of metabolism.
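The gene-metabolite distance can be sketched as a breadth-first search over a reaction network. The sketch assumes reactions are given as sets of participating metabolites, treats them as undirected, and counts a metabolite taking part in one of the enzyme's own reactions as distance 1. These conventions and the function name are assumptions, not details from the project; highly connected cofactors, which can create shortcuts, are not filtered here.

```python
from collections import deque

def gene_metabolite_distance(reactions, enzyme_reactions, target):
    """Smallest number of reactions linking an enzyme to a metabolite, or None.

    reactions: dict reaction_id -> set of metabolites (substrates and products).
    enzyme_reactions: ids of the reactions catalysed by the gene's enzyme.
    """
    # Index: metabolite -> reactions in which it takes part
    by_metabolite = {}
    for rid, mets in reactions.items():
        for m in mets:
            by_metabolite.setdefault(m, set()).add(rid)
    seen_reactions = set(enzyme_reactions)
    seen_metabolites = set()
    queue = deque((rid, 1) for rid in enzyme_reactions)
    while queue:  # breadth-first, so depths come out in non-decreasing order
        rid, depth = queue.popleft()
        for m in reactions.get(rid, ()):
            if m == target:
                return depth
            if m in seen_metabolites:
                continue
            seen_metabolites.add(m)
            for nxt in by_metabolite.get(m, ()):
                if nxt not in seen_reactions:
                    seen_reactions.add(nxt)
                    queue.append((nxt, depth + 1))
    return None
```

Applied to each mGWAS gene-metabolite pair, such distances could then be compared with those of random pairs to test whether associated pairs are unusually close.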
Abstract:
The animal gastrointestinal tract houses a large microbial community, the gut microbiota, that confers many benefits on its host, such as protection from pathogens and provision of essential metabolites. Metagenomic approaches have defined the chicken fecal microbiota in other studies; here, we wished to assess the correlation between the metagenome and the bacterial proteome in order to better understand the healthy chicken gut microbiota. We performed high-throughput sequencing of 16S rRNA gene amplicons and metaproteomic analysis of fecal samples to determine microbial gut composition and protein expression. 16S rRNA gene sequencing identified members of Clostridiales, Bacteroidaceae and Lactobacillaceae as the most abundant taxa in the gut. For metaproteomic analysis, peptides were generated using the filter-aided sample preparation (FASP) method and subsequently fractionated by strong anion exchange. Metaproteomic analysis identified 3,673 proteins. Among the most frequently identified proteins, 380 belonged to Lactobacillus spp., 155 to Clostridium spp. and 66 to Streptococcus spp. The most frequently identified proteins were heat shock chaperones, including 349 GroEL proteins from many bacterial species, whereas the most abundant enzymes were pyruvate kinases, as judged by the number of peptides identified per protein (spectral counting). Gene Ontology and KEGG pathway analyses revealed the functions and locations of the identified proteins. The findings of both the metaproteomic and 16S rRNA sequencing analyses are discussed.
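Spectral counting, used above to rank protein abundance, can be sketched as tallying peptide-spectrum matches (PSMs) per protein and then aggregating by taxon. The input format (spectrum id, peptide, protein accession) and the grouping function are hypothetical, and the sketch ignores peptides shared between proteins, which real metaproteomic pipelines must handle.

```python
from collections import Counter

def spectral_counts(psms):
    """Count PSMs (identified spectra) per protein accession."""
    return Counter(protein for _spectrum, _peptide, protein in psms)

def counts_by_group(counts, group_of):
    """Aggregate protein-level counts into groups such as genus or protein family."""
    grouped = Counter()
    for protein, n in counts.items():
        grouped[group_of(protein)] += n
    return grouped
```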