205 results for Bioinformatic


Relevance: 20.00%

Abstract:

To carry out their specific roles in the cell, genes and gene products often work together in groups, forming many relationships among themselves and with other molecules. Such relationships include physical protein-protein interactions, regulatory relationships, metabolic relationships, genetic relationships, and many more. With advances in science and technology, high-throughput technologies have been developed that can simultaneously detect tens of thousands of pairwise protein-protein and protein-DNA interactions. However, the data generated by high-throughput methods are prone to noise. Furthermore, these technologies have their own limitations and cannot detect all kinds of relationships between genes and their products. There is therefore a pressing need to use bioinformatic approaches to investigate all kinds of relationships and their roles in a living system; this is a central challenge in Computational Biology and Systems Biology. This dissertation focuses on exploring relationships between genes and gene products using bioinformatic approaches. Specifically, we consider problems related to regulatory relationships, protein-protein interactions, and semantic relationships between genes. A regulatory element is an important pattern or "signal", often located in the promoter of a gene, that is used in the process of turning the gene "on" or "off". Predicting regulatory elements is a key step in exploring the regulatory relationships between genes and gene products. In this dissertation, we consider the problem of improving the prediction of regulatory elements by using comparative genomics data. With regard to protein-protein interactions, we have developed bioinformatic techniques to estimate the support for the data on these interactions.
While protein-protein interactions and regulatory relationships can be detected by high-throughput biological techniques, another type of relationship, called a semantic relationship, cannot be detected by a single technique but can be inferred using multiple sources of biological data. The contributions of this thesis are the development and application of a set of bioinformatic approaches that address the challenges mentioned above. These include (i) an EM-based algorithm that improves the prediction of regulatory elements using comparative genomics data, (ii) an approach for estimating the support of protein-protein interaction data, with application to the functional annotation of genes, (iii) a novel method for inferring functional networks of genes, and (iv) techniques for clustering genes using multi-source data.

Relevance: 20.00%

Abstract:

Multiple Myeloma (MM) is a hematologic cancer with a heterogeneous and complex genomic landscape, in which Copy Number Alterations (CNAs) play a key role in the disease's pathogenesis and prognosis. The temporal occurrence of early alterations is of biological and clinical interest, because these alterations act as disease "drivers" by deregulating key tumor pathways. This study presents an innovative suite of bioinformatic tools created to harmonize CNAs and trace their origin throughout the evolutionary history of MM. To this end, large cohorts of newly diagnosed MM (NDMM, N=1582) and smoldering MM (SMM, N=282) were aggregated. The tools developed in this study harmonize CNAs obtained from different genomic platforms so that high statistical power can be achieved. The large size of these cohorts was thereby harnessed to identify novel "driver" genes (NFKB2, NOTCH2, MAX, EVI5 and the MYC-ME2 enhancer) and to build an innovative timing model, implemented with a statistical method that introduces confidence intervals into the CNA calls. By applying this model to both the NDMM and SMM cohorts, it was possible to identify specific CNAs (1q(CKS1B)amp, 13q(RB1)del, 11q(CCND1)amp and 14q(MAX)del) and categorize them as "early"/"driver" events. The narrow confidence intervals of the timing estimates guaranteed a high level of precision. These CNAs were proposed as critical MM alterations that play a foundational role in the evolutionary history of both SMM and NDMM. Finally, a multivariate survival model identified the independent genomic alterations with the greatest effect on patient survival, including RB1-del, CKS1B-amp, MYC-amp, NOTCH2-amp and TRAF3-del/mut.
In conclusion, the alterations identified both as "early drivers" and as correlated with patient survival were proposed as biomarkers that, if included in wider survival models, could provide better disease stratification and a more precise definition of prognosis.

Relevance: 10.00%

Abstract:

Melanoma is a highly aggressive, therapy-resistant tumor for which the identification of specific markers and therapeutic targets is highly desirable. We describe here the development and use of a bioinformatic pipeline tool, made publicly available under the name EST2TSE, for the in silico detection of candidate genes with tissue-specific expression. Using this tool, we mined the human EST (Expressed Sequence Tag) database for sequences derived exclusively from melanoma. We found 29 UniGene clusters of multiple ESTs with the potential to predict novel genes with melanoma-specific expression. Using a diverse panel of human tissues and cell lines, we validated that the expression of a subset of three previously uncharacterized genes (clusters Hs.295012, Hs.518391 and Hs.559350) is highly restricted to melanoma/melanocytes, and named them RMEL1, 2 and 3, respectively. Expression analysis in nevi, primary melanomas and metastatic melanomas revealed RMEL1 as a novel melanocytic lineage-specific gene up-regulated during melanoma development. RMEL2 expression was restricted to melanoma tissues and glioblastoma. RMEL3 was strongly up-regulated in nevi and lost in metastatic tumors. Interestingly, RMEL2 and RMEL3 expression correlated with improved patient outcome, suggesting tumor and/or metastasis suppressor functions for these genes. The three genes are composed of multiple exons and map to 2q12.2, 1q25.3 and 5q11.2, respectively. They are well conserved throughout primates but not in other genomes, and were predicted to have no coding potential, although primate-conserved and human-specific short ORFs could be found. Hairpin RNA secondary structures were also predicted. In conclusion, this work offers new melanoma-specific genes for future validation as prognostic markers or as targets for the development of therapeutic strategies against melanoma.
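
The selection step described above, keeping only clusters made of multiple ESTs that all derive from melanoma, can be sketched as a simple filter. This is not the EST2TSE implementation: the input format (a mapping from a cluster ID to the tissue labels of its EST source libraries), the function name `tissue_exclusive_clusters` and the `min_ests` threshold are hypothetical.

```python
def tissue_exclusive_clusters(est_sources, tissue, min_ests=2):
    """Return IDs of clusters whose ESTs all come from libraries of `tissue`.

    est_sources: dict mapping a cluster ID to a list of source-tissue labels,
    one label per EST (a hypothetical, simplified input format).
    """
    hits = []
    for cluster_id, sources in est_sources.items():
        # Require several ESTs so that a single stray read cannot define a cluster,
        # and require every EST to come from the target tissue.
        if len(sources) >= min_ests and all(s == tissue for s in sources):
            hits.append(cluster_id)
    return sorted(hits)


# Hypothetical toy data, not real UniGene clusters.
example = {
    "cluster_1": ["melanoma", "melanoma", "melanoma"],
    "cluster_2": ["melanoma", "liver"],
    "cluster_3": ["melanoma"],
}
print(tissue_exclusive_clusters(example, "melanoma"))  # ['cluster_1']
```

A real tool would also have to deal with ambiguous or missing library annotations, which the abstract does not describe.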

Relevance: 10.00%

Abstract:

Dermcidin (DCD) is a human gene mapped to the chromosome 12q13 region, which is co-amplified with multiple oncogenes that have a well-established role in the growth, survival and progression of breast cancers. Here, we present a summary of a DNA microarray-based study that identified the genes up- and down-regulated in three human MDA-361 clones expressing short hairpin RNA against three different regions of DCD mRNA, compared with a pLKO control clone. A total of 235 genes were differentially expressed among independent clones (> 3-fold change and P < 0.005). Expression of 208 of these genes was reduced, and that of 27 was increased, in the three DCD-RNAi clones compared with the pLKO control clone. The expression of 77 genes (37%) encoding enzymes involved in amino acid metabolism, glucose metabolism and oxidoreductase activity, and of several genes required for cell survival and DNA repair, was decreased. The expression of the EGFR/ErbB-1 gene, an important predictor of outcome in breast cancer, was reduced, together with that of the genes for betacellulin and amphiregulin, two known ligands of EGFR/ErbB receptors. Many of the 27 genes up-regulated by DCD-RNAi expression have not yet been fully characterized; among those with known function, we identified calcium/calmodulin-dependent protein kinase II delta and calcineurin A alpha. We compared 132 up-regulated and 12 down-regulated genes in our dataset with genes up- and down-regulated by inhibitors targeting various signaling pathway components. The analysis showed that the genes in the DCD pathway align with those functionally influenced by the drugs sirolimus, LY-294002 and wortmannin. Therefore, DCD may exert its function by activating the PI3K/AKT/mTOR signaling pathway. Together, these bioinformatic approaches suggest the involvement of DCD in regulating genes for breast cancer cell metabolism, proliferation and survival.
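
The selection criterion quoted above (> 3-fold change and P < 0.005) can be illustrated with a small filter. This is a sketch, not the study's analysis pipeline. It assumes fold change is a knockdown/control expression ratio and treats a ratio below 1/3 as a 3-fold decrease. The names `GeneResult` and `split_differential` and all values are hypothetical. The abstract does not say how P-values were computed or whether they were corrected for multiple testing.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    fold_change: float  # expression ratio, knockdown clone / control clone
    p_value: float


def split_differential(results, min_fold=3.0, alpha=0.005):
    """Split genes into (down, up) lists using fold-change and P-value cutoffs.

    A gene counts as changed when P < alpha and its ratio is above min_fold
    (up-regulated) or below 1 / min_fold (down-regulated).
    """
    down, up = [], []
    for r in results:
        if r.p_value >= alpha:
            continue  # not significant at the chosen threshold
        if r.fold_change > min_fold:
            up.append(r.gene)
        elif r.fold_change < 1.0 / min_fold:
            down.append(r.gene)
    return down, up


# Hypothetical genes and values, for illustration only.
results = [
    GeneResult("geneA", 0.2, 0.001),   # 5-fold down, significant
    GeneResult("geneB", 4.0, 0.0001),  # 4-fold up, significant
    GeneResult("geneC", 5.0, 0.01),    # large change, but P too high
    GeneResult("geneD", 1.5, 0.0001),  # significant, but change too small
]
print(split_differential(results))  # (['geneA'], ['geneB'])
```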

Relevance: 10.00%

Abstract:

T cells recognize peptide epitopes bound to major histocompatibility complex (MHC) molecules. Human T-cell epitopes have diagnostic and therapeutic applications in autoimmune diseases. However, defining them accurately within an autoantigen by T-cell bioassay, usually proliferation, requires many costly peptides and a large amount of blood. We have therefore developed a strategy to predict T-cell epitopes and applied it to tyrosine phosphatase IA-2, an autoantigen in IDDM, and HLA-DR4(*0401). First, the binding of synthetic overlapping peptides spanning IA-2 to purified DR4 was measured directly. Second, a large body of HLA-DR4 binding data was analysed by alignment using a genetic algorithm and used to train an artificial neural network to predict binding affinity. This bioinformatic prediction method was then validated experimentally and used to predict DR4-binding peptides in IA-2. The predicted binding set encompassed 85% of experimentally determined T-cell epitopes. Both the experimental and bioinformatic methods had high negative predictive values, 92% and 95% respectively, indicating that this strategy of combining experimental results with computer modelling should substantially reduce the amount of blood and the number of peptides required to define T-cell epitopes in humans.
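
The negative predictive value reported above is the fraction of peptides predicted not to be epitopes (or binders) that indeed are not. A high NPV is what lets peptides predicted negative be dropped from the bioassay, which is where the savings in blood and peptide synthesis come from. A minimal sketch of the standard definitions follows; the function name and counts are invented for illustration.

```python
import math


def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts.

    PPV = TP / (TP + FP): fraction of predicted binders that are true binders.
    NPV = TN / (TN + FN): fraction of predicted non-binders that are true non-binders.
    Returns NaN for a value whose denominator is zero.
    """
    ppv = tp / (tp + fp) if tp + fp else math.nan
    npv = tn / (tn + fn) if tn + fn else math.nan
    return ppv, npv


# Hypothetical counts, not taken from the study.
print(predictive_values(tp=12, fp=4, tn=45, fn=5))  # (0.75, 0.9)
```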

Relevance: 10.00%

Abstract:

Motivation: Prediction methods for identifying binding peptides could minimize the number of peptides that must be synthesized and assayed, and thereby facilitate the identification of potential T-cell epitopes. We developed a bioinformatic method for predicting peptide binding to MHC class II molecules. Results: Experimental binding data and expert knowledge of anchor positions and binding motifs were combined with an evolutionary algorithm (EA) and an artificial neural network (ANN): binding data extraction --> peptide alignment --> ANN training and classification. This method, termed PERUN, was implemented to predict peptides that bind to HLA-DR4(B1*0401). The positive predictive values of PERUN predictions for high-, moderate-, low- and zero-affinity binders were 0.8, 0.7, 0.5 and 0.8, respectively, by cross-validation, and 1.0, 0.8, 0.3 and 0.7 by experimental binding. This illustrates the synergy between experimentation and computer modeling, and its application to the identification of potential immunotherapeutic peptides.
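
PERUN reports a separate positive predictive value for each affinity class. For a multi-class predictor, this is the fraction of peptides assigned to a class that truly belong to it. The sketch below shows that calculation as an illustration of the metric, not PERUN's evaluation code. The class labels, data and function name are hypothetical.

```python
from collections import Counter


def per_class_ppv(predicted, observed):
    """PPV for each predicted class.

    For class c: (# peptides predicted as c that are observed as c) /
    (# peptides predicted as c).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    totals = Counter(predicted)
    correct = Counter(p for p, o in zip(predicted, observed) if p == o)
    return {cls: correct[cls] / n for cls, n in totals.items()}


# Hypothetical affinity classes for six peptides.
predicted = ["high", "high", "moderate", "zero", "zero", "low"]
observed = ["high", "moderate", "moderate", "zero", "low", "low"]
print(per_class_ppv(predicted, observed))
# {'high': 0.5, 'moderate': 1.0, 'zero': 0.5, 'low': 1.0}
```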

Relevance: 10.00%

Abstract:

Background: A variety of methods for predicting peptide binding to major histocompatibility complex (MHC) molecules have been proposed. These methods are based on binding motifs, binding matrices, hidden Markov models (HMM), or artificial neural networks (ANN). There has been little prior work comparing these methods. Materials and Methods: We compared the performance of six methods, including binding matrices and motifs, ANNs, and HMMs, in predicting peptide binding to two human MHC class I molecules. Results: The choice of the optimal prediction method depends on the amount of available data (the number of peptides of known binding affinity to the MHC molecule of interest), the biases in the data set, and the intended purpose of the prediction (screening of a single protein versus mass screening). When few or no peptide data are available, binding motifs are the most useful alternative to random guessing or to using a complete overlapping set of peptides to select candidate binders. As the number of known peptide binders increases, binding matrices and HMMs become more useful predictors. ANNs and HMMs are the predictive methods of choice for MHC alleles with more than 100 known binding peptides. Conclusion: The ability of bioinformatic methods to reliably predict MHC-binding peptides, and thereby potential T-cell epitopes, has major implications for clinical immunology, particularly in vaccine design.
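
Of the method families compared above, the binding matrix is the simplest to illustrate. Each amino acid at each peptide position receives a coefficient, and a peptide's score is the sum of its coefficients, which assumes that positions contribute independently. The sketch below scores every overlapping window of a protein, as in the "screening of a single protein" use case. The matrix values are invented and only loosely mimic anchor residues at positions 2 and 9. Real matrices are derived from measured binding data, and the function names are hypothetical.

```python
def score_peptide(peptide, matrix, default=0.0):
    """Additive binding-matrix score: sum of per-position residue coefficients.

    matrix is a list with one dict per peptide position, mapping an amino acid
    to its coefficient; residues missing from a dict score `default`.
    """
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must equal matrix length")
    return sum(pos.get(aa, default) for aa, pos in zip(peptide, matrix))


def scan_protein(protein, matrix, top=3):
    """Score every overlapping window of the protein; return the top hits.

    Each hit is (score, start_index, peptide), best scores first.
    """
    k = len(matrix)
    hits = [
        (score_peptide(protein[i:i + k], matrix), i, protein[i:i + k])
        for i in range(len(protein) - k + 1)
    ]
    hits.sort(key=lambda h: (-h[0], h[1]))  # highest score, then earliest start
    return hits[:top]


# Toy 9-position matrix with invented coefficients at positions 2 and 9 only.
matrix = [{} for _ in range(9)]
matrix[1] = {"L": 2.0, "M": 1.5}
matrix[8] = {"V": 2.0, "L": 1.5}

print(scan_protein("AALAAAAAAVA", matrix, top=1))  # [(4.0, 1, 'ALAAAAAAV')]
```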

Relevance: 10.00%

Abstract:

Allergy is a major cause of morbidity worldwide. The number of characterized allergens and the amount of related information are increasing rapidly, creating demand for advanced information storage, retrieval and analysis. Bioinformatics provides useful tools for analysing allergens that complement traditional laboratory techniques. Specific applications include structural analysis of allergens, identification of B- and T-cell epitopes, assessment of allergenicity and cross-reactivity, and genome analysis. In this paper, we review the most important bioinformatic tools and methods relevant to the study of allergy.

Relevance: 10.00%

Abstract:

Background: A major goal in the post-genomic era is to identify and characterise disease susceptibility genes and to apply this knowledge to disease prevention and treatment. Rodents and humans have remarkably similar genomes and share closely related biochemical, physiological and pathological pathways. In this work, we used the latest information on the mouse transcriptome, as revealed by the RIKEN FANTOM2 project, to identify novel human disease-related candidate genes. We define a new term, patholog, to mean a homolog of a human disease-related gene encoding a product (transcript, antisense RNA or protein) potentially relevant to disease. Rather than focusing only on Mendelian inheritance, we applied the analysis to all potential pathologs regardless of their inheritance pattern. Results: Bioinformatic analysis and human curation of 60,770 RIKEN full-length mouse cDNA clones produced 2,578 sequences that showed similarity (70-85% identity) to known human disease genes. Using a newly developed biological information extraction and annotation tool (FACTS) in parallel with human expert analysis of 17,051 MEDLINE scientific abstracts, we identified 182 novel potential pathologs. Of these, 36 were identified by computational tools only, 49 by human expert analysis only and 97 by both methods. These pathologs were related to neoplastic (53%), hereditary (24%), immunological (5%), cardiovascular (4%) or other (14%) disorders. Conclusions: Large-scale genome projects continue to produce vast amounts of data with potential application to the study of human disease. For this potential to be realised, we need intelligent strategies for data categorisation and the ability to link sequence data with the relevant literature. This paper demonstrates the power of combining human expert annotation with FACTS, a newly developed bioinformatics tool, to identify novel pathologs within large-scale mouse transcript datasets.
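
The similarity window quoted above (70-85% identity) implies a percent-identity calculation on aligned sequences followed by a range filter. The sketch below assumes the sequences are already aligned, for example by a BLAST-style search, and computes identity over ungapped columns. Percent identity has several competing definitions, and the abstract does not state which was used. The function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two aligned sequences.

    Counts identical residues over columns where neither sequence has a gap ('-').
    Other conventions divide by alignment length or by the shorter sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def in_identity_window(pid, low=70.0, high=85.0):
    """True if a percent identity falls within [low, high]."""
    return low <= pid <= high


# Hypothetical pre-aligned fragments.
pid = percent_identity("ACGTACGTAC", "ACGTACGTTT")
print(pid, in_identity_window(pid))  # 80.0 True
```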

Relevance: 10.00%

Abstract:

Recombinant protein production in bacteria is efficient, except that insoluble inclusion bodies form when some gene sequences are expressed. Such proteins must undergo renaturation, which is inefficient because the protein aggregates upon dilution from concentrated denaturant. In this study, the protein-protein interactions of eight distinct inclusion-body proteins are quantified under different solution conditions by measuring protein second virial coefficients (SVCs). Protein solubility is shown to decrease as the SVC is reduced (i.e., as protein interactions become more attractive). Plots of SVC versus denaturant concentration reveal two clear groups of proteins: a more aggregative group and a group with higher SVC and better solubility. A correlation of the measured SVC with protein molecular weight and hydropathicity is presented that can predict which group each of the eight proteins falls into. Including additives known to inhibit aggregation during renaturation improves solubility and increases the SVC of both protein groups. Furthermore, an estimate of the maximum refolding yield (or solubility) was obtained by high-performance liquid chromatography for each protein tested under different environmental conditions, enabling a relationship between yield and SVC to be demonstrated. Combined, the results enable an approximate estimate of the maximum refolding yield attainable for each of the eight proteins examined in a selected chemical environment. Although the correlations must be tested with a far larger set of protein sequences, this work represents a significant move beyond empirical approaches for optimizing renaturation conditions. The approach moves toward the ideal of predicting the maximum refolding yield using simple bioinformatic metrics that can be estimated from the gene sequence.
Such a capability could potentially be used to screen sequences in silico, separating those suitable for expression in bacteria from those that must be expressed in more complex hosts. (C) 2004 Wiley Periodicals, Inc.
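
The abstract proposes predicting refolding behaviour from simple metrics computable from the gene sequence, namely molecular weight and hydropathicity. Hydropathicity is commonly summarized as the GRAVY score, the mean Kyte-Doolittle hydropathy value over all residues of the translated protein. The abstract does not name the scale used, so Kyte-Doolittle is an assumption here and the function name is hypothetical. The correlation with SVC itself is not reproduced.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    residues = protein.upper()
    if not residues:
        raise ValueError("empty sequence")
    unknown = set(residues) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


print(round(gravy("ACDE"), 3))  # -0.675
```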

Relevance: 10.00%

Abstract:

Leptospirosis is a worldwide zoonosis caused by pathogenic Leptospira. The whole-genome sequence of Leptospira interrogans serovar Copenhageni, together with bioinformatic tools, allows us to search for novel antigen candidates suitable for improved vaccines against leptospirosis. This study focused on three genes encoding conserved hypothetical proteins predicted to be exported to the outer membrane. The genes were amplified by PCR from six predominant pathogenic serovars in Brazil, then cloned and expressed in Escherichia coli strain BL21-SI using the expression vector pDEST17. The recombinant proteins, tagged with an N-terminal 6xHis, were purified by metal-charged chromatography. The proteins were recognized by antibodies present in sera from experimentally infected hamsters. Immunization of hamsters followed by challenge with a lethal dose of a virulent Leptospira strain showed that the recombinant protein rLIC12730 afforded statistically significant protection (44%), followed by rLIC10494 (40%) and rLIC12922 (30%). Immunization with these proteins increased antibody titres with subsequent boosters, suggesting the involvement of a T-helper 2 response. Although more studies are needed, these data suggest that rLIC12730 and rLIC10494 are promising candidates for a multivalent vaccine for the prevention of leptospirosis.

Relevance: 10.00%

Abstract:

Context: We previously described a six-generation family with the G533C RET mutation and medullary thyroid carcinoma (MTC), the largest such family reported to date. Of particular interest, phenotypic variability in the age of onset and clinical presentation of the disease was observed. Objective: We evaluated whether single SNPs within the RET oncogene, or haplotypes comprising RET variants (defined by Haploview), could predispose to early development of MTC in this family and influence its clinical manifestation. Design: Eight SNPs were selected based on their previous association with the clinical course of hereditary or sporadic MTC, in particular with early disease onset. The variants were initially tested in 77 G533C carriers and 100 controls using either direct PCR sequencing or PCR-RFLP. Association between a SNP or haplotype and age at diagnosis or the presence of lymph node metastasis was tested in 34 G533C carriers with MTC. Different bioinformatic tools were used to evaluate potential effects on RNA splicing. Results: An association was found between IVS1-126G>T and age at diagnosis. The variant [IVS8 +82A>G; 85-86 insC] was associated with the presence of lymph node metastases at diagnosis. In silico analysis suggested that this variant may induce abnormal splicing by disrupting and/or creating exonic splicing enhancer motifs. Conclusions: We identified two RET variants associated with phenotypic variability in G533C carriers, highlighting that the modifier effect of a variant may depend on the type of mutation.

Relevance: 10.00%

Abstract:

RNA silencing refers to a series of nuclear and cytoplasmic processes involved in the post-transcriptional regulation of gene expression, or post-transcriptional gene silencing (PTGS), either by sequence-specific mRNA degradation or by translational arrest. The best-characterized small RNAs are microRNAs (miRNAs), which predominantly silence genes through post-transcriptional mechanisms. In this work, we used bioinformatic approaches to identify sequences of the parasitic trematode Schistosoma mansoni that are similar to enzymes involved in miRNA-mediated post-transcriptional gene silencing. We searched the S. mansoni genome and transcriptome databases with the amino acid sequences of well-known proteins of the miRNA pathway, identifying a total of 13 putative proteins in the parasite. In addition, the transcript levels of SmDicer1 and SmAgo2/3/4 were measured by qRT-PCR in cercariae, adult worms, eggs and in vitro cultivated schistosomula. Our results showed that SmDicer1 and SmAgo2/3/4 are differentially expressed during schistosomula development, suggesting that the miRNA pathway is regulated at the transcript level and may therefore control gene expression during the life cycle of S. mansoni. (C) 2008 Published by Elsevier Ireland Ltd.

Relevance: 10.00%

Abstract:

A substantial number of growth hormone (GH)-regulated genes have been reported in mature hepatocytes, but genes involved in GH-initiated cell differentiation have not yet been identified. Here we have studied a well-characterised model of GH-dependent differentiation, adipogenesis of 3T3-F442A preadipocytes, to identify genes rapidly induced by GH. Using the suppression subtractive hybridisation technique, we identified eight genes induced within 60 min of GH treatment and verified these by northern analysis. Six were identifiable as Stat 2, Stat 3, thrombospondin-1, oncostatin M receptor beta chain, a DEAD box RNA helicase, and muscleblind, a developmental transcription factor. Bioinformatic approaches assigned one of the two remaining unknown genes as a novel 436-residue serine/threonine kinase. As each of the identified genes has important developmental roles, they may be important in initiating GH-induced adipogenesis. (C) 2002 Elsevier Science Ireland Ltd. All rights reserved.

Relevance: 10.00%

Abstract:

Breast cancer and colorectal cancer are two of the leading causes of death worldwide. Between 5 and 10% of these cases are associated with germline/hereditary variants in cancer susceptibility genes. The aim of this work was to validate the use of next-generation sequencing (NGS) to identify variants previously detected by the Sanger method in several breast and colorectal cancer susceptibility genes. Sixty-four DNA samples from patients with clinical suspicion of hereditary predisposition to breast or colorectal cancer were sequenced by NGS, using the TruSight Cancer sequencing panel and the MiSeq platform (Illumina). These samples had previously been sequenced by the Sanger method for the genes BRCA1, BRCA2, TP53, APC, MUTYH, MLH1, MSH2 and STK11. Bioinformatic analysis of the results was performed with the MiSeq Reporter, VariantStudio, Isaac Enrichment (Illumina) and Integrative Genomics Viewer (Broad Institute) software. NGS showed high analytical sensitivity and specificity for detecting sequence variants in 8 colorectal and breast cancer susceptibility genes, as it identified all 412 variants (93 unique, including 27 pathogenic variants) previously detected by the Sanger method. NGS sequencing panels of cancer predisposition genes enable a more comprehensive, faster and more cost-effective molecular diagnosis than conventional methodologies.
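
The validation described above amounts to checking that every variant called by Sanger sequencing is also called by NGS. The sketch below makes that comparison with variants keyed as (sample, gene, variant) tuples. Sensitivity relative to Sanger is the fraction of Sanger calls recovered. Collapsing the sample field gives the number of unique variants, which mirrors the 412 total versus 93 unique distinction in the abstract. The data and function name are hypothetical. Extra NGS calls are not necessarily false positives, because the panel may target genes or regions outside those analysed by Sanger.

```python
def concordance(reference_calls, test_calls):
    """Compare variant calls from a test method against a reference method.

    Calls are hashable keys, here (sample, gene, variant) tuples.
    Sensitivity = reference calls also found by the test / all reference calls.
    """
    ref, test = set(reference_calls), set(test_calls)
    shared = ref & test
    return {
        "sensitivity": len(shared) / len(ref) if ref else float("nan"),
        "missed": sorted(ref - test),   # reference calls the test did not find
        "extra": sorted(test - ref),    # test calls absent from the reference
        "unique_ref_variants": len({(gene, var) for _, gene, var in ref}),
    }


# Hypothetical calls, not data from the study.
sanger = [
    ("sample1", "GENE_A", "c.100A>G"),
    ("sample1", "GENE_B", "c.200del"),
    ("sample2", "GENE_A", "c.100A>G"),
]
ngs = sanger + [("sample2", "GENE_C", "c.50C>T")]

result = concordance(sanger, ngs)
print(result["sensitivity"], result["unique_ref_variants"])  # 1.0 2
```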