74 resultados para Imprinted Genes


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In a number of programs for gene structure prediction in higher eukaryotic genomic sequences, exon prediction is decoupled from gene assembly: a large pool of candidate exons is predicted and scored from features located in the query DNA sequence, and candidate genes are assembled from such a pool as sequences of nonoverlapping frame-compatible exons. Genes are scored as a function of the scores of the assembled exons, and the highest scoring candidate gene is assumed to be the most likely gene encoded by the query DNA sequence. Considering additive gene scoring functions, currently available algorithms to determine such a highest scoring candidate gene run in time proportional to the square of the number of predicted exons. Here, we present an algorithm whose running time grows only linearly with the size of the set of predicted exons. Polynomial algorithms rely on the fact that, while scanning the set of predicted exons, the highest scoring gene ending in a given exon can be obtained by appending the exon to the highest scoring among the highest scoring genes ending at each compatible preceding exon. The algorithm here relies on the simple fact that such highest scoring gene can be stored and updated. This requires scanning the set of predicted exons simultaneously by increasing acceptor and donor position. On the other hand, the algorithm described here does not assume an underlying gene structure model. Indeed, the definition of valid gene structures is externally defined in the so-called Gene Model. The Gene Model specifies simply which gene features are allowed immediately upstream which other gene features in valid gene structures. This allows for great flexibility in formulating the gene identification problem. In particular it allows for multiple-gene two-strand predictions and for considering gene features other than coding exons (such as promoter elements) in valid gene structures.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Selenoproteins are a diverse group of proteinsusually misidentified and misannotated in sequencedatabases. The presence of an in-frame UGA (stop)codon in the coding sequence of selenoproteingenes precludes their identification and correctannotation. The in-frame UGA codons are recodedto cotranslationally incorporate selenocysteine,a rare selenium-containing amino acid. The developmentof ad hoc experimental and, more recently,computational approaches have allowed the efficientidentification and characterization of theselenoproteomes of a growing number of species.Today, dozens of selenoprotein families have beendescribed and more are being discovered in recentlysequenced species, but the correct genomic annotationis not available for the majority of thesegenes. SelenoDB is a long-term project that aims toprovide, through the collaborative effort of experimentaland computational researchers, automaticand manually curated annotations of selenoproteingenes, proteins and SECIS elements. Version 1.0 ofthe database includes an initial set of eukaryoticgenomic annotations, with special emphasis on thehuman selenoproteome, for immediate inspectionby selenium researchers or incorporation into moregeneral databases. SelenoDB is freely available athttp://www.selenodb.org.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The vast majority of the biology of a newly sequenced genome is inferred from the set of encoded proteins. Predicting this set is therefore invariably the first step after the completion of the genome DNA sequence. Here we review the main computational pipelines used to generate the human reference protein-coding gene sets.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The recent availability of the chicken genome sequence poses the question of whether there are human protein-coding genes conserved in chicken that are currently not included in the human gene catalog. Here, we show, using comparative gene finding followed by experimental verification of exon pairs by RT–PCR, that the addition to the multi-exonic subset of this catalog could be as little as 0.2%, suggesting that we may be closing in on the human gene set. Our protocol, however, has two shortcomings: (i) the bioinformatic screening of the predicted genes, applied to filter out false positives, cannot handle intronless genes; and (ii) the experimental verification could fail to identify expression at a specific developmental time. This highlights the importance of developing methods that could provide a reliable estimate of the number of these two types of genes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

BACKGROUND: The trithorax group (trxG) and Polycomb group (PcG) proteins are responsible for the maintenance of stable transcriptional patterns of many developmental regulators. They bind to specific regions of DNA and direct the post-translational modifications of histones, playing a role in the dynamics of chromatin structure. RESULTS: We have performed genome-wide expression studies of trx and ash2 mutants in Drosophila melanogaster. Using computational analysis of our microarray data, we have identified 25 clusters of genes potentially regulated by TRX. Most of these clusters consist of genes that encode structural proteins involved in cuticle formation. This organization appears to be a distinctive feature of the regulatory networks of TRX and other chromatin regulators, since we have observed the same arrangement in clusters after experiments performed with ASH2, as well as in experiments performed by others with NURF, dMyc, and ASH1. We have also found many of these clusters to be significantly conserved in D. simulans, D. yakuba, D. pseudoobscura and partially in Anopheles gambiae. CONCLUSION: The analysis of genes governed by chromatin regulators has led to the identification of clusters of functionally related genes conserved in other insect species, suggesting this chromosomal organization is biologically important. Moreover, our results indicate that TRX and other chromatin regulators may act globally on chromatin domains that contain transcriptionally co-regulated genes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Placental malaria is a special form of malaria that causes up to 200,000 maternal and infant deaths every year. Previous studies show that two receptor molecules, hyaluronic acid and chondroitin sulphate A, are mediating the adhesion of parasite-infected erythrocytes in the placenta of patients, which is believed to be a key step in the pathogenesis of the disease. In this study, we aimed at identifying sites of malaria-induced adaptation by scanning for signatures of natural selection in 24 genes in the complete biosynthesis pathway of these two receptor molecules. We analyzed a total of 24 Mb of publicly available polymorphism data from the International HapMap project for three human populations with European, Asian and African ancestry, with the African population from a region of presently and historically high malaria prevalence. Using the methods based on allele frequency distributions, genetic differentiation between populations, and on long-range haplotype structure, we found only limited evidence for malaria-induced genetic adaptation in this set of genes in the African population; however, we identified one candidate gene with clear evidence of selection in the Asian population. Although historical exposure to malaria in this population cannot be ruled out, we speculate that it might be caused by other pathogens, as there is growing evidence that these molecules are important receptors in a variety of host-pathogen interactions. We propose to use the present methods in a systematic way to help identify candidate regions under positive selection as a consequence of malaria.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A large proportion of the death toll associated with malaria is a consequence of malaria infection during pregnancy, causing up to 200,000 infant deaths annually. We previously published the first extensive genetic association study of placental malaria infection, and here we extend this analysis considerably, investigating genetic variation in over 9,000 SNPs in more than 1,000 genes involved in immunity and inflammation for their involvement in susceptibility to placental malaria infection. We applied a new approach incorporating results from both single gene analysis as well as gene-gene interactionson a protein-protein interaction network. We found suggestive associations of variants in the gene KLRK1 in the single geneanalysis, as well as evidence for associations of multiple members of the IL-7/IL-7R signalling cascade in the combined analysis. To our knowledge, this is the first large-scale genetic study on placental malaria infection to date, opening the door for follow-up studies trying to elucidate the genetic basis of this neglected form of malaria.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background: An excess of caffeine is cytotoxic to all eukaryotic cell types. We aim to study how cells become tolerant to atoxic dose of this drug, and the relationship between caffeine and oxidative stress pathways.Methodology/Principal Findings: We searched for Schizosaccharomyces pombe mutants with inhibited growth on caffeinecontainingplates. We screened a collection of 2,700 haploid mutant cells, of which 98 were sensitive to caffeine. The genes mutated in these sensitive clones were involved in a number of cellular roles including the H2O2-induced Pap1 and Sty1 stress pathways, the integrity and calcineurin pathways, cell morphology and chromatin remodeling. We have investigated the role of the oxidative stress pathways in sensing and promoting survival to caffeine. The Pap1 and the Sty1 pathways are both required for normal tolerance to caffeine, but only the Sty1 pathway is activated by the drug. Cells lacking Pap1 aresensitive to caffeine due to the decreased expression of the efflux pump Hba2. Indeed, ?hba2 cells are sensitive to caffeine, and constitutive activation of the Pap1 pathway enhances resistance to caffeine in an Hba2-dependent manner. Conclusions/Significance: With our caffeine-sensitive, genome-wide screen of an S. pombe deletion collection, we havedemonstrated the importance of some oxidative stress pathway components on wild-type tolerance to the drug.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background: It is well known that the pattern of linkage disequilibrium varies between human populations, with remarkable geographical stratification. Indirect association studies routinely exploit linkage disequilibrium around genes, particularly in isolated populations where it is assumed to be higher. Here, we explore both the amount and the decay of linkage disequilibrium with physical distance along 211 gene regions, most of them related to complex diseases, across 39 HGDP-CEPH population samples, focusing particularly on the populations defined as isolates. Within each gene region and population we use r2 between all possible single nucleotide polymorphism (SNP) pairs as a measure of linkage disequilibrium and focus on the proportion of SNP pairs with r2 greater than 0.8.Results: Although the average r2 was found to be significantly different both between and within continental regions, a much higher proportion of r2 variance could be attributed to differences between continental regions (2.8% vs. 0.5%, respectively). Similarly, while the proportion of SNP pairs with r2 > 0.8 was significantly different across continents for all distance classes, it was generally much more homogenous within continents, except in the case of Africa and the Americas. The only isolated populations with consistently higher LD in all distance classes with respect to their continent are the Kalash (Central South Asia) and the Surui (America). Moreover, isolated populations showed only slightly higher proportions of SNP pairs with r2 > 0.8 per gene region than non-isolated populations in the same continent. Thus, the number of SNPs in isolated populations that need to be genotyped may be only slightly less than in non-isolates. Conclusion: The "isolated population" label by itself does not guarantee a greater genotyping efficiency in association studies, and properties other than increased linkage disequilibrium may make these populations interesting in genetic epidemiology.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background: Different regions in a genome evolve at different rates depending on structural and functional constraints. Some genomic regions are highly conserved during metazoan evolution, while other regions may evolve rapidly, either in all species or in a lineage-specific manner. A strong or even moderate change in constraints in functional regions, for example in coding regions, can have significant evolutionary consequences. Results: Here we discuss a novel framework, 'BaseDiver', to classify groups of genes in humans based on the patterns of evolutionary constraints on polymorphic positions in their coding regions. Comparing the nucleotide-level divergence among mammals with the extent of deviation from the ancestral base in the human lineage, we identify patterns of evolutionary pressure on nonsynonymous base-positions in groups of genes belonging to the same functional category. Focussing on groups of genes in functional categories, we find that transcription factors contain a significant excess of nonsynonymous base-positions that are conserved in other mammals but changed in human, while immunity related genes harbour mutations at base-positions that evolve rapidly in all mammals including humans due to strong preference for advantageous alleles. Genes involved in olfaction also evolve rapidly in all mammals, and in humans this appears to be due to weak negative selection. Conclusion: While recent studies have identified genes under positive selection in humans, our approach identifies evolutionary constraints on Gene Ontology groups identifying changes in humans relative to some of the other mammals.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background: Cancer is a major medical problem in modern societies. However, the incidence of this disease in non-human primates is very low. To study whether genetic differences between human and chimpanzee could contribute to their distinct cancer susceptibility, we have examined in the chimpanzee genome the orthologous genes of a set of 333 human cancer genes. Results: This analysis has revealed that all examined human cancer genes are present in chimpanzee, contain intact open reading frames and show a high degree of conservation between both species. However, detailed analysis of this set of genes has shown some differences in genes of special relevance for human cancer. Thus, the chimpanzee gene encoding p53 contains a Pro residue at codon 72, while this codon is polymorphic in humans and can code for Arg or Pro, generating isoforms with different ability to induce apoptosis or interact with p73. Moreover, sequencing of the BRCA1 gene has shown an 8 Kb deletion in the chimpanzee sequence that prematurely truncates the co-regulated NBR2 gene. Conclusion: These data suggest that small differences in cancer genes, as those found in tumor suppressor genes, might influence the differences in cancer susceptibility between human and chimpanzee. Nevertheless, further analysis will be required to determine the exact contribution of the genetic changes identified in this study to the different cancer incidence in non-human primates.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background: One of the main goals of cancer genetics is to identify the causative elements at the molecular level leading to cancer.Results: We have conducted an analysis of a set of genes known to be involved in cancer in order to unveil their unique features that can assist towards the identification of new candidate cancer genes. Conclusion: We have detected key patterns in this group of genes in terms of the molecular function or the biological process in which they are involved as well as sequence properties. Based on these features we have developed an accurate Bayesian classification model with which human genes have been scored for their likelihood of involvement in cancer.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background: Systematic approaches for identifying proteins involved in different types of cancer are needed. Experimental techniques such as microarrays are being used to characterize cancer, but validating their results can be a laborious task. Computational approaches are used to prioritize between genes putatively involved in cancer, usually based on further analyzing experimental data. Results: We implemented a systematic method using the PIANA software that predicts cancer involvement of genes by integrating heterogeneous datasets. Specifically, we produced lists of genes likely to be involved in cancer by relying on: (i) protein-protein interactions; (ii) differential expression data; and (iii) structural and functional properties of cancer genes. The integrative approach that combines multiple sources of data obtained positive predictive values ranging from 23% (on a list of 811 genes) to 73% (on a list of 22 genes), outperforming the use of any of the data sources alone. We analyze a list of 20 cancer gene predictions, finding that most of them have been recently linked to cancer in literature. Conclusion: Our approach to identifying and prioritizing candidate cancer genes can be used to produce lists of genes likely to be involved in cancer. Our results suggest that differential expression studies yielding high numbers of candidate cancer genes can be filtered using protein interaction networks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background: Single Nucleotide Polymorphisms, among other type of sequence variants, constitute key elements in genetic epidemiology and pharmacogenomics. While sequence data about genetic variation is found at databases such as dbSNP, clues about the functional and phenotypic consequences of the variations are generally found in biomedical literature. The identification of the relevant documents and the extraction of the information from them are hampered by the large size of literature databases and the lack of widely accepted standard notation for biomedical entities. Thus, automatic systems for the identification of citations of allelic variants of genes in biomedical texts are required. Results: Our group has previously reported the development of OSIRIS, a system aimed at the retrieval of literature about allelic variants of genes http://ibi.imim.es/osirisform.html. Here we describe the development of a new version of OSIRIS (OSIRISv1.2, http://ibi.imim.es/OSIRISv1.2.html webcite) which incorporates a new entity recognition module and is built on top of a local mirror of the MEDLINE collection and HgenetInfoDB: a database that collects data on human gene sequence variations. The new entity recognition module is based on a pattern-based search algorithm for the identification of variation terms in the texts and their mapping to dbSNP identifiers. The performance of OSIRISv1.2 was evaluated on a manually annotated corpus, resulting in 99% precision, 82% recall, and an F-score of 0.89. As an example, the application of the system for collecting literature citations for the allelic variants of genes related to the diseases intracranial aneurysm and breast cancer is presented. Conclusion: OSIRISv1.2 can be used to link literature references to dbSNP database entries with high accuracy, and therefore is suitable for collecting current knowledge on gene sequence variations and supporting the functional annotation of variation databases. The application of OSIRISv1.2 in combination with controlled vocabularies like MeSH provides a way to identify associations of biomedical interest, such as those that relate SNPs with diseases.