1000 resultados para Universitat Pompeu Fabra
Resumo:
A report of the 6th Georgia Tech-Oak Ridge National Lab International Conference on Bioinformatics 'In silico Biology: Gene Discovery and Systems Genomics', Atlanta, USA, 15-17 November, 2007.
Resumo:
Background: We present the results of EGASP, a community experiment to assess the state-ofthe-art in genome annotation within the ENCODE regions, which span 1% of the human genomesequence. The experiment had two major goals: the assessment of the accuracy of computationalmethods to predict protein coding genes; and the overall assessment of the completeness of thecurrent human genome annotations as represented in the ENCODE regions. For thecomputational prediction assessment, eighteen groups contributed gene predictions. Weevaluated these submissions against each other based on a ‘reference set’ of annotationsgenerated as part of the GENCODE project. These annotations were not available to theprediction groups prior to the submission deadline, so that their predictions were blind and anexternal advisory committee could perform a fair assessment.Results: The best methods had at least one gene transcript correctly predicted for close to 70%of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into accountalternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotidelevel, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programsrelying on mRNA and protein sequences were the most accurate in reproducing the manuallycurated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could beverified.Conclusions: This is the first such experiment in human DNA, and we have followed thestandards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe theresults presented here contribute to the value of ongoing large-scale annotation projects and shouldguide further experimental methods when being scaled up to the entire human genome sequence.
Resumo:
Selenoproteins are a diverse group of proteinsusually misidentified and misannotated in sequencedatabases. The presence of an in-frame UGA (stop)codon in the coding sequence of selenoproteingenes precludes their identification and correctannotation. The in-frame UGA codons are recodedto cotranslationally incorporate selenocysteine,a rare selenium-containing amino acid. The developmentof ad hoc experimental and, more recently,computational approaches have allowed the efficientidentification and characterization of theselenoproteomes of a growing number of species.Today, dozens of selenoprotein families have beendescribed and more are being discovered in recentlysequenced species, but the correct genomic annotationis not available for the majority of thesegenes. SelenoDB is a long-term project that aims toprovide, through the collaborative effort of experimentaland computational researchers, automaticand manually curated annotations of selenoproteingenes, proteins and SECIS elements. Version 1.0 ofthe database includes an initial set of eukaryoticgenomic annotations, with special emphasis on thehuman selenoproteome, for immediate inspectionby selenium researchers or incorporation into moregeneral databases. SelenoDB is freely available athttp://www.selenodb.org.
Resumo:
A number of experimental methods have been reported for estimating the number of genes in a genome, or the closely related coding density of a genome, defined as the fraction of base pairs in codons. Recently, DNA sequence data representative of the genome as a whole have become available for several organisms, making the problem of estimating coding density amenable to sequence analytic methods. Estimates of coding density for a single genome vary widely, so that methods with characterized error bounds have become increasingly desirable. We present a method to estimate the protein coding density in a corpus of DNA sequence data, in which a ‘coding statistic’ is calculated for a large number of windows of the sequence under study, and the distribution of the statistic is decomposed into two normal distributions, assumed to be the distributions of the coding statistic in the coding and noncoding fractions of the sequence windows. The accuracy of the method is evaluated using known data and application is made to the yeast chromosome III sequence and to C.elegans cosmid sequences. It can also be applied to fragmentary data, for example a collection of short sequences determined in the course of STS mapping.
Resumo:
Selenoproteins contain the amino acid selenocysteine which is encoded by a UGA Sec codon. Recoding UGA Sec requires a complex mechanism, comprising the cis-acting SECIS RNA hairpin in the 3′UTR of selenoprotein mRNAs, and trans-acting factors. Among these, the SECIS Binding Protein 2 (SBP2) is central to the mechanism. SBP2 has been so far functionally characterized only in rats and humans. In this work, we report the characterization of the Drosophila melanogaster SBP2 (dSBP2). Despite its shorter length, it retained the same selenoprotein synthesis-promoting capabilities as the mammalian counterpart. However, a major difference resides in the SECIS recognition pattern: while human SBP2 (hSBP2) binds the distinct form 1 and 2 SECIS RNAs with similar affinities, dSBP2 exhibits high affinity toward form 2 only. In addition, we report the identification of a K (lysine)-rich domain in all SBP2s, essential for SECIS and 60S ribosomal subunit binding, differing from the well-characterized L7Ae RNA-binding domain. Swapping only five amino acids between dSBP2 and hSBP2 in the K-rich domain conferred reversed SECIS-binding properties to the proteins, thus unveiling an important sequence for form 1 binding.
Resumo:
The vast majority of the biology of a newly sequenced genome is inferred from the set of encoded proteins. Predicting this set is therefore invariably the first step after the completion of the genome DNA sequence. Here we review the main computational pipelines used to generate the human reference protein-coding gene sets.
Resumo:
The recent availability of the chicken genome sequence poses the question of whether there are human protein-coding genes conserved in chicken that are currently not included in the human gene catalog. Here, we show, using comparative gene finding followed by experimental verification of exon pairs by RT–PCR, that the addition to the multi-exonic subset of this catalog could be as little as 0.2%, suggesting that we may be closing in on the human gene set. Our protocol, however, has two shortcomings: (i) the bioinformatic screening of the predicted genes, applied to filter out false positives, cannot handle intronless genes; and (ii) the experimental verification could fail to identify expression at a specific developmental time. This highlights the importance of developing methods that could provide a reliable estimate of the number of these two types of genes.
Resumo:
Selenocysteine (Sec) is co-translationally inserted into selenoproteins in response to codon UGA with the help of the selenocysteine insertion sequence (SECIS) element. The number of selenoproteins in animals varies, with humans having 25 and mice having 24 selenoproteins. To date, however, only one selenoprotein, thioredoxin reductase, has been detected in Caenorhabditis elegans, and this enzyme contains only one Sec. Here, we characterize the selenoproteomes of C.elegans and Caenorhabditis briggsae with three independent algorithms, one searching for pairs of homologous nematode SECIS elements, another searching for Cys- or Sec-containing homologs of potential nematode selenoprotein genes and the third identifying Sec-containing homologs of annotated nematode proteins. These methods suggest that thioredoxin reductase is the only Sec-containing protein in the C.elegans and C.briggsae genomes. In contrast, we identified additional selenoproteins in other nematodes. Assuming that Sec insertion mechanisms are conserved between nematodes and other eukaryotes, the data suggest that nematode selenoproteomes were reduced during evolution, and that in an extreme reduction case Sec insertion systems probably decode only a single UGA codon in C.elegans and C.briggsae genomes. In addition, all detected genes had a rare form of SECIS element containing a guanosine in place of a conserved adenosine present in most other SECIS structures, suggesting that in organisms with small selenoproteomes SECIS elements may change rapidly.
Resumo:
Background: Despite the continuous production of genome sequence for a number of organisms,reliable, comprehensive, and cost effective gene prediction remains problematic. This is particularlytrue for genomes for which there is not a large collection of known gene sequences, such as therecently published chicken genome. We used the chicken sequence to test comparative andhomology-based gene-finding methods followed by experimental validation as an effective genomeannotation method.Results: We performed experimental evaluation by RT-PCR of three different computational genefinders, Ensembl, SGP2 and TWINSCAN, applied to the chicken genome. A Venn diagram wascomputed and each component of it was evaluated. The results showed that de novo comparativemethods can identify up to about 700 chicken genes with no previous evidence of expression, andcan correctly extend about 40% of homology-based predictions at the 5' end.Conclusions: De novo comparative gene prediction followed by experimental verification iseffective at enhancing the annotation of the newly sequenced genomes provided by standardhomology-based methods.
Resumo:
We address the problem of comparing and characterizing the promoter regions of genes with similar expression patterns. This remains a challenging problem in sequence analysis, because often the promoter regions of co-expressed genes do not show discernible sequence conservation. In our approach, thus, we have not directly compared the nucleotide sequence of promoters. Instead, we have obtained predictions of transcription factor binding sites, annotated the predicted sites with the labels of the corresponding binding factors, and aligned the resulting sequences of labels—to which we refer here as transcription factor maps (TF-maps). To obtain the global pairwise alignment of two TF-maps, we have adapted an algorithm initially developed to align restriction enzyme maps. We have optimized the parameters of the algorithm in a small, but well-curated, collection of human–mouse orthologous gene pairs. Results in this dataset, as well as in an independent much larger dataset from the CISRED database, indicate that TF-map alignments are able to uncover conserved regulatory elements, which cannot be detected by the typical sequence alignments.
Resumo:
Background: Despite the fact that labour market flexibility has resulted in an expansion of precarious employment in industrialized countries, to date there is limited empirical evidence about its health consequences. The Employment Precariousness Scale (EPRES) is a newly developed, theory-based, multidimensional questionnaire specifically devised for epidemiological studies among waged and salaried workers. Objective: To assess acceptability, reliability and construct validity of EPRES in a sample of waged and salaried workers in Spain. Methods: Cross-sectional study, using a sub-sample of 6.968 temporary and permanent workers from a population-based survey carried out in 2004-2005. The survey questionnaire was interviewer administered and included the six EPRES subscales, measures of the psychosocial work environment (COPSOQ ISTAS21), and perceived general and mental health (SF-36). Results: A high response rate to all EPRES items indicated good acceptability; Cronbach’s alpha coefficients, over 0.70 for all subscales and the global score, demonstrated good internal consistency reliability; exploratory factor analysis using principal axis analysis and varimax rotation confirmed the six-subscale structure and the theoretical allocation of all items. Patterns across known groups and correlation coefficients with psychosocial work environment measures and perceived health demonstrated the expected relations, providing evidence of construct validity. Conclusions: Our results provide evidence in support of the psychometric properties of EPRES, which appears to be a promising tool for the measurement of employment precariousness in public health research.
Resumo:
Studies of large sets of SNP data have proven to be a powerful tool in the analysis of the genetic structure of human populations. In this work, we analyze genotyping data for 2,841 SNPs in 12 Sub-Saharan African populations, including a previously unsampled region of south-eastern Africa (Mozambique). We show that robust results in a world-wide perspective can be obtained when analyzing only 1,000 SNPs. Our main results both confirm the results of previous studies, and show new and interesting features in Sub-Saharan African genetic complexity. There is a strong differentiation of Nilo-Saharans, much beyond what would be expected by geography. Hunter-gatherer populations (Khoisan and Pygmies) show a clear distinctiveness with very intrinsic Pygmy (and not only Khoisan) genetic features. Populations of the West Africa present an unexpected similarity among them, possibly the result of a population expansion. Finally, we find a strong differentiation of the south-eastern Bantu population from Mozambique, which suggests an assimilation of a pre-Bantu substrate by Bantu speakers in the region.
Resumo:
Either calorie restriction, loss of function of the nutrient-dependent PKA or TOR/SCH9 pathways, or activation of stress defences improves longevity in different eukaryotes. However, the molecular links between glucose depletion, nutrient-dependent pathways and stress responses are unknown. Here we show that either calorie restriction or inactivation of nutrient-dependent pathways induces life-span extension in fission yeast, and that such effect is dependent on the activation of the stress-dependent Sty1 MAP kinase. During transition to stationary phase in glucose-limiting conditions, Sty1 becomes activated and triggers a transcriptional stress program, whereas such activation does not occur under glucose-rich conditions. Deletion of the genes coding for the SCH9-homologue Sck2 or the Pka1 kinases, or mutations leading to constitutive activation of the Sty1 stress pathway increase life span under glucose-rich conditions, and importantly such beneficial effects depend ultimately on Sty1. Furthermore, cells lacking Pka1 display enhanced oxygen consumption and Sty1 activation under glucose-rich conditions. We conclude that calorie restriction favours oxidative metabolism, reactive oxygen species production and Sty1 MAP kinase activation, and this stress pathway favours life-span extension.
Resumo:
Malaria in pregnancy forms a substantial part of the worldwide burden of malaria, with an estimated annual death toll of up to 200,000 infants, as well as increased maternal morbidity and mortality. Studies of genetic susceptibility to malaria have so far focused on infant malaria, with only a few studies investigating the genetic basis of placental malaria, focusing only on a limited number of candidate genes. The aim of this study therefore was to identify novel host genetic factors involved in placental malaria infection. To this end we carried out a nested case-control study on 180 Mozambican pregnant women with placental malaria infection, and 180 controls within an intervention trial of malaria prevention. We genotyped 880 SNPs in a set of 64 functionally related genes involved in glycosylation and innate immunity. A SNP located in the gene FUT9, rs3811070, was significantly associated with placental malaria infection (OR = 2.31, permutation p-value = 0.028). Haplotypic analysis revealed a similarly strong association of a common haplotype of four SNPs including rs3811070. FUT9 codes for a fucosyl-transferase that is catalyzing the last step in the biosynthesis of the Lewis-x antigen, which forms part of the Lewis blood group-related antigens. These results therefore suggest an involvement of this antigen in the pathogenesis of placental malaria infection.