1000 results for Francesc Raset Busquets -- Intervius
Abstract:
We have carried out an initial analysis of the dynamics of the recent evolution of splice-site sequences on a large collection of human, rodent (mouse and rat), and chicken introns. Our results indicate that the sequences of splice sites are largely homogeneous within Tetrapoda. We have also found that orthologous splice signals between human and rodents and within rodents are more conserved than unrelated splice sites, but the additional conservation can be explained mostly by background intron conservation. In contrast, additional conservation over background is detectable in orthologous mammalian and chicken splice sites. Our results also indicate that the U2 and U12 intron classes seem to have evolved independently since the split of mammals and birds; we have not been able to find a convincing case of interconversion between these two classes in our collections of orthologous introns. Similarly, we have not found a single case of switching between AT-AC and GT-AG subtypes within U12 introns, suggesting that this event has been a rare occurrence in recent evolutionary times. Switching between GT-AG and the noncanonical GC-AG U2 subtypes, on the contrary, does not appear to be unusual; in particular, T to C mutations appear to be relatively well tolerated in GT-AG introns with very strong donor sites.
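As an illustration of the subtype switching discussed in this abstract, the following minimal Python sketch labels an intron by its terminal dinucleotides and flags a pair of orthologous introns whose labels differ. The function names are hypothetical and are not taken from the paper. The sketch looks only at the first and last two bases, so it cannot distinguish the U2 and U12 classes, which would require additional signals such as the branch-point region.

```python
# Hypothetical helper names; only the terminal dinucleotides are inspected.
KNOWN_SUBTYPES = {"GT-AG", "GC-AG", "AT-AC"}


def terminal_dinucleotides(intron: str) -> str:
    """Return the donor and acceptor dinucleotides as e.g. 'GT-AG'."""
    seq = intron.upper().replace("U", "T")  # accept RNA or DNA alphabets
    if len(seq) < 4:
        raise ValueError("intron sequence is too short")
    return f"{seq[:2]}-{seq[-2:]}"


def boundary_subtype(intron: str) -> str:
    """Label an intron as GT-AG, GC-AG, AT-AC, or 'other'."""
    pair = terminal_dinucleotides(intron)
    return pair if pair in KNOWN_SUBTYPES else "other"


def subtype_switched(intron_a: str, intron_b: str) -> bool:
    """True if two (assumed orthologous) introns differ in boundary subtype."""
    return boundary_subtype(intron_a) != boundary_subtype(intron_b)


if __name__ == "__main__":
    human = "GTAAGTCTTTCAG"   # toy sequences, not real data
    mouse = "GCAAGTCTTTCAG"
    print(boundary_subtype(human), boundary_subtype(mouse))
    print(subtype_switched(human, mouse))
```

With the toy sequences above, a single T to C change at the second donor position is what moves an intron from the GT-AG label to the GC-AG label.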
Abstract:
UEV proteins are enzymatically inactive variants of the E2 ubiquitin-conjugating enzymes that regulate noncanonical elongation of ubiquitin chains. In Saccharomyces cerevisiae, UEV is part of the RAD6-mediated error-free DNA repair pathway. In mammalian cells, UEV proteins can modulate c-FOS transcription and the G2-M transition of the cell cycle. Here we show that the UEV genes from phylogenetically distant organisms present a remarkable conservation in their exon–intron structure. We also show that the human UEV1 gene is fused with the previously unknown gene Kua. In Caenorhabditis elegans and Drosophila melanogaster, Kua and UEV are in separate loci, and are expressed as independent transcripts and proteins. In humans, Kua and UEV1 are adjacent genes, expressed either as separate transcripts encoding independent Kua and UEV1 proteins, or as a hybrid Kua–UEV transcript, encoding a two-domain protein. Kua proteins represent a novel class of conserved proteins with juxtamembrane histidine-rich motifs. Experiments with epitope-tagged proteins show that UEV1A is a nuclear protein, whereas both Kua and Kua–UEV localize to cytoplasmic structures, indicating that the Kua domain determines the cytoplasmic localization of Kua–UEV. Therefore, the addition of a Kua domain to UEV in the fused Kua–UEV protein confers new biological properties to this regulator of variant polyubiquitination. [Kua cDNAs isolated by RT-PCR and described in this paper have been deposited in the GenBank data library under accession nos. AF1155120 (H. sapiens) and AF152361 (D. melanogaster). Genomic clones containing UEV genes: S. cerevisiae, YGL087c (accession no. Z72609); S. pombe, c338 (accession no. AL023781); P. falciparum, MAL3P2 (accession no. AL034558); A. thaliana, F26F24 (accession no. AC005292); C. elegans, F39B2 (accession no. Z92834); D. melanogaster, AC014908; and H. sapiens, 1185N5 (accession no. AL034423). Accession numbers for Kua cDNAs in GenBank dbEST: M. musculus, AA7853; T. cruzi, AI612534. Other Kua-containing sequences: A. thaliana genomic clones F10M23 (accession no. AL035440), F19K23 (accession no. AC000375), and T20K9 (accession no. AC004786).]
Abstract:
One of the first useful products from the human genome will be a set of predicted genes. Besides its intrinsic scientific interest, the accuracy and completeness of this data set are of considerable importance for human health and medicine. Though progress has been made on computational gene identification in terms of both methods and accuracy evaluation measures, most of the sequence sets in which the programs are tested are short genomic sequences, and there is concern that these accuracy measures may not extrapolate well to larger, more challenging data sets. Given the absence of experimentally verified large genomic data sets, we constructed a semiartificial test set comprising a number of short single-gene genomic sequences with randomly generated intergenic regions. This test set, which should still present an easier problem than real human genomic sequence, mimics the approximately 200 kb long BACs being sequenced. In our experiments with these longer genomic sequences, the accuracy of GENSCAN, one of the most accurate ab initio gene prediction programs, dropped significantly, although its sensitivity remained high. Conversely, the accuracy of similarity-based programs, such as GENEWISE, PROCRUSTES, and BLASTX, was not affected significantly by the presence of random intergenic sequence, but depended on the strength of the similarity to the protein homolog. As expected, the accuracy dropped if the models were built using more distant homologs, and we were able to quantitatively estimate this decline. However, the specificities of these techniques are still rather good even when the similarity is weak, which is a desirable characteristic for driving expensive follow-up experiments. Our experiments suggest that though gene prediction will improve with every new protein that is discovered and through improvements in the current set of tools, we still have a long way to go before we can decipher the precise exonic structure of every gene in the human genome using purely computational methodology.
Abstract:
The completion of the sequencing of the mouse genome promises to help predict human genes with greater accuracy. While current ab initio gene prediction programs are remarkably sensitive (i.e., they predict at least a fragment of most genes), their specificity is often low, predicting a large number of false-positive genes in the human genome. Sequence conservation at the protein level with the mouse genome can help eliminate some of those false positives. Here we describe SGP2, a gene prediction program that combines ab initio gene prediction with TBLASTX searches between two genome sequences to provide both sensitive and specific gene predictions. The accuracy of SGP2 when used to predict genes by comparing the human and mouse genomes is assessed on a number of data sets, including single-gene data sets, the highly curated human chromosome 22 predictions, and entire genome predictions from ENSEMBL. Results indicate that SGP2 outperforms purely ab initio gene prediction methods. Results also indicate that SGP2 works about as well with 3x shotgun data as it does with fully assembled genomes. SGP2 provides a high enough specificity that its predictions can be experimentally verified at a reasonable cost. SGP2 was used to generate a complete set of gene predictions on both the human and mouse by comparing the genomes of these two species. Our results suggest that another few thousand human and mouse genes currently not in ENSEMBL are worth verifying experimentally.
Abstract:
The genetic characterization of Native Mexicans is important to understand the multiethnic-based features influencing the medical genetics of present Mexican populations, as well as to reconstruct the peopling of the Americas. We describe the Y-chromosome genetic diversity of 197 Native Mexicans from 11 populations and 1,044 individuals from 44 Native American populations after combining with publicly available data. We found extensive heterogeneity among Native Mexican populations and ample segregation of Q-M242* (46%) and Q-M3 (54%) haplogroups within Mexico. The northernmost sampled populations falling outside Mesoamerica (Pima and Tarahumara) showed a clear differentiation with respect to the other populations, which is in agreement with previous results from mtDNA lineages. However, our results point toward a complex genetic makeup of Native Mexicans whose maternal and paternal lineages reveal different narratives of their population history, with sex-biased continental contributions and different admixture proportions. At a continental scale, we found that Arctic populations and the northernmost groups from North America cluster together, but we did not find a clear differentiation within Mesoamerica and the rest of the continent, which, coupled with the fact that the majority of individuals from Central and South American samples are restricted to the Q-M3 branch, supports the notion that most Native Americans from Mesoamerica southwards are descendants of a single wave of migration. This observation is compatible with the idea that present-day Mexico might have constituted an area of transition in the diversification of paternal lineages during the colonization of the Americas.
Abstract:
The human olfactory receptor repertoire is reduced in comparison to other mammals and to other non-human primates. Nonetheless, this olfactory decline opens an opportunity for evolutionary innovation and improvement. In the present study, we focus on an olfactory receptor gene, OR5I1, which had previously been shown to present an excess of amino acid replacement substitutions between humans and chimpanzees. We analyze the genetic variation in OR5I1 in a large worldwide human panel and find an excess of derived alleles segregating at relatively high frequencies in all populations. Additional evidence for selection includes departures from neutrality in allele frequency spectra tests but no unusually extended haplotype structure. Moreover, molecular structural inference suggests that one of the nonsynonymous polymorphisms defining the presumably adaptive protein form of OR5I1 may alter the functional binding properties of the olfactory receptor. These results are compatible with positive selection having modeled the pattern of variation found in the OR5I1 gene and with a relatively ancient, mild selective sweep predating the “Out of Africa” expansion of modern humans.
Abstract:
Background: We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a ‘reference set’ of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. Results: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified. Conclusions: This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.
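To make the nucleotide-level figures in this abstract concrete, here is a minimal Python sketch of how sensitivity and specificity could be computed from annotated and predicted coding intervals. It assumes the convention common in the gene-prediction literature, where specificity means the fraction of predicted coding nucleotides that are annotated as coding (TP / (TP + FP)); whether EGASP used exactly this definition is an assumption here. The function name and the half-open interval representation are hypothetical, and strand and reading frame are ignored.

```python
from typing import Iterable, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates, an assumption


def _covered(intervals: Iterable[Interval]) -> Set[int]:
    """Expand intervals into the set of nucleotide positions they cover."""
    positions: Set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def nucleotide_accuracy(annotated: Iterable[Interval],
                        predicted: Iterable[Interval]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the coding-nucleotide level.

    sensitivity = TP / (TP + FN): annotated coding bases that were predicted
    specificity = TP / (TP + FP): predicted coding bases that are annotated
    """
    truth = _covered(annotated)
    guess = _covered(predicted)
    true_positives = len(truth & guess)
    sensitivity = true_positives / len(truth) if truth else 0.0
    specificity = true_positives / len(guess) if guess else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    annotated_exons = [(100, 200), (300, 400)]   # toy coordinates
    predicted_exons = [(150, 200), (300, 450)]
    print(nucleotide_accuracy(annotated_exons, predicted_exons))
```

Expanding intervals into position sets keeps the sketch short; for chromosome-scale data an interval-merging approach would use far less memory.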
Abstract:
Studies of large sets of SNP data have proven to be a powerful tool in the analysis of the genetic structure of human populations. In this work, we analyze genotyping data for 2,841 SNPs in 12 Sub-Saharan African populations, including a previously unsampled region of south-eastern Africa (Mozambique). We show that robust results in a world-wide perspective can be obtained when analyzing only 1,000 SNPs. Our main results both confirm the results of previous studies and show new and interesting features in Sub-Saharan African genetic complexity. There is a strong differentiation of Nilo-Saharans, much beyond what would be expected by geography. Hunter-gatherer populations (Khoisan and Pygmies) show a clear distinctiveness with very intrinsic Pygmy (and not only Khoisan) genetic features. Populations of West Africa present an unexpected similarity among them, possibly the result of a population expansion. Finally, we find a strong differentiation of the south-eastern Bantu population from Mozambique, which suggests an assimilation of a pre-Bantu substrate by Bantu speakers in the region.
Abstract:
Mesoamerica, defined as the broad linguistic and cultural area from middle southern Mexico to Costa Rica, might have played a pivotal role during the colonization of the American continent. It has been suggested that the Mesoamerican isthmus could have played an important role in severely restricting prehistoric gene flow between North and South America. Although the Native American component has been already described in admixed Mexican populations, few studies have been carried out in native Mexican populations. In this study we present mitochondrial DNA (mtDNA) sequence data for the first hypervariable region (HVR-I) in 477 unrelated individuals belonging to eleven different native populations from Mexico. Almost all the Native Mexican mtDNAs could be classified into the four pan-Amerindian haplogroups (A2, B2, C1 and D1); only three of them could be allocated to the rare Native American lineage D4h3. Their haplogroup phylogenies are clearly star-like, as expected from relatively young populations that have experienced diverse episodes of genetic drift (e.g. extensive isolation, genetic drift and founder effects) and posterior population expansions. In agreement with this observation is the fact that Native Mexican populations show a high degree of heterogeneity in their patterns of haplogroup frequencies. Haplogroup X2a was absent in our samples, supporting previous observations where this clade was only detected in the American northernmost areas. The search for identical sequences in the American continent shows that, although Native Mexican populations seem to show a closer relationship to North American populations, they cannot be related to a single geographical region within the continent. Finally, we did not find significant population structure on the maternal lineages when considering the four main and distinct linguistic groups represented in our Mexican samples (Oto-Manguean, Uto-Aztecan, Tarascan, and Mayan), suggesting that genetic divergence predates linguistic diversification in Mexico.
Abstract:
Background: It is well known that the pattern of linkage disequilibrium varies between human populations, with remarkable geographical stratification. Indirect association studies routinely exploit linkage disequilibrium around genes, particularly in isolated populations where it is assumed to be higher. Here, we explore both the amount and the decay of linkage disequilibrium with physical distance along 211 gene regions, most of them related to complex diseases, across 39 HGDP-CEPH population samples, focusing particularly on the populations defined as isolates. Within each gene region and population we use r2 between all possible single nucleotide polymorphism (SNP) pairs as a measure of linkage disequilibrium and focus on the proportion of SNP pairs with r2 greater than 0.8. Results: Although the average r2 was found to be significantly different both between and within continental regions, a much higher proportion of r2 variance could be attributed to differences between continental regions (2.8% vs. 0.5%, respectively). Similarly, while the proportion of SNP pairs with r2 > 0.8 was significantly different across continents for all distance classes, it was generally much more homogenous within continents, except in the case of Africa and the Americas. The only isolated populations with consistently higher LD in all distance classes with respect to their continent are the Kalash (Central South Asia) and the Surui (America). Moreover, isolated populations showed only slightly higher proportions of SNP pairs with r2 > 0.8 per gene region than non-isolated populations in the same continent. Thus, the number of SNPs in isolated populations that need to be genotyped may be only slightly less than in non-isolates. Conclusion: The "isolated population" label by itself does not guarantee a greater genotyping efficiency in association studies, and properties other than increased linkage disequilibrium may make these populations interesting in genetic epidemiology.
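The summary statistic described in this abstract, the proportion of SNP pairs with r2 greater than 0.8, can be sketched in a few lines of Python. The sketch assumes phased haplotypes coded as 0/1 alleles, which the abstract does not specify, and uses the standard definition r2 = D^2 / (pA (1 - pA) pB (1 - pB)) with D = pAB - pA pB. The function names are hypothetical and this is not the authors' pipeline; monomorphic SNPs, for which r2 is undefined, are simply skipped.

```python
from itertools import combinations
from typing import List, Optional, Sequence


def r_squared(snp_a: Sequence[int], snp_b: Sequence[int]) -> Optional[float]:
    """r2 between two biallelic SNPs given phased 0/1 haplotype alleles.

    Returns None when either SNP is monomorphic (r2 is undefined).
    """
    if len(snp_a) != len(snp_b) or not snp_a:
        raise ValueError("SNP vectors must be non-empty and of equal length")
    n = len(snp_a)
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    p_ab = sum(a & b for a, b in zip(snp_a, snp_b)) / n  # both derived alleles
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        return None
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denominator


def proportion_high_ld(snps: List[Sequence[int]], threshold: float = 0.8) -> float:
    """Fraction of informative SNP pairs whose r2 exceeds the threshold."""
    values = [r_squared(a, b) for a, b in combinations(snps, 2)]
    informative = [v for v in values if v is not None]
    if not informative:
        return 0.0
    return sum(v > threshold for v in informative) / len(informative)


if __name__ == "__main__":
    # Toy haplotypes: rows are SNPs, columns are chromosomes.
    region = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
    print(proportion_high_ld(region))
```

In a study like the one summarized here, this proportion would be computed per gene region and per population, optionally within physical-distance classes, before comparing isolates with non-isolates.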
Abstract:
Background: Before the arrival of Europeans in Cuba, the island was inhabited by two Native American groups, the Tainos and the Ciboneys. Most of the present archaeological, linguistic and ancient DNA evidence indicates a South American origin for these populations. In colonial times, Cuban Native American people were replaced by European settlers and slaves from Africa. It is still unknown, however, to what extent their genetic pool intermingled with and was 'diluted' by the arrival of newcomers. In order to investigate the demographic processes that gave rise to the current Cuban population, we analyzed the hypervariable region I (HVS-I) and five single nucleotide polymorphisms (SNPs) in the mitochondrial DNA (mtDNA) coding region in 245 individuals, and 40 Y-chromosome SNPs in 132 male individuals. Results: The Native American contribution to present-day Cubans accounted for 33% of the maternal lineages, whereas Africa and Eurasia contributed 45% and 22% of the lineages, respectively. This Native American substrate in Cuba cannot be traced back to a single origin within the American continent, as previously suggested by ancient DNA analyses. Strikingly, no Native American lineages were found for the Y-chromosome, for which the Eurasian and African contributions were around 80% and 20%, respectively. Conclusion: While the ancestral Native American substrate is still appreciable in the maternal lineages, the extensive process of population admixture in Cuba has left no trace of the paternal Native American lineages, mirroring the strong sexual bias in the admixture processes taking place during colonial times.
Abstract:
Background: We address the problem of studying recombinational variations in (human) populations. In this paper, our focus is on one computational aspect of the general task: given two networks G1 and G2, with both mutation and recombination events, defined on overlapping sets of extant units, the objective is to compute a consensus network G3 with a minimum number of additional recombinations. We describe a polynomial time algorithm with a guarantee that the number of computed new recombination events is within ϵ = sz(G1, G2) (function sz is a well-behaved function of the sizes and topologies of G1 and G2) of the optimal number of recombinations. To date, this is the best known result for a network consensus problem. Results: Although the network consensus problem can be applied to a variety of domains, here we focus on the structure of human populations. With our preliminary analysis on a segment of the human Chromosome X data we are able to infer ancient recombinations, population-specific recombinations and more, which also support the widely accepted 'Out of Africa' model. These results have been verified independently using traditional manual procedures. To the best of our knowledge, this is the first recombinations-based characterization of human populations. Conclusion: We show that our mathematical model identifies recombination spots in the individual haplotypes; the aggregate of these spots over a set of haplotypes defines a recombinational landscape that has enough signal to detect continental as well as population divide based on a short segment of Chromosome X. In particular, we are able to infer ancient recombinations, population-specific recombinations and more, which also support the widely accepted 'Out of Africa' model. The agreement with mutation-based analysis can be viewed as an indirect validation of our results and the model. Since the model in principle gives us more information embedded in the networks, in our future work, we plan to investigate more non-traditional questions via these structures computed by our methodology.
Abstract:
Background: The human FOXI1 gene codes for a transcription factor involved in the physiology of the inner ear, testis, and kidney. Using three interspecies comparisons, it has been suggested that this may be a gene under human-specific selection. We sought to confirm this finding by using an extended set of orthologous sequences. Additionally, we explored for signals of natural selection within humans by sequencing the gene in 20 Europeans, 20 East Asians and 20 Yorubas and by analysing SNP variation in a 2 Mb region centered on FOXI1 in 39 worldwide human populations from the HGDP-CEPH diversity panel. Results: The genome sequences recently available from other primate and non-primate species showed that FOXI1 divergence patterns are compatible with neutral evolution. Sequence-based neutrality tests were not significant in Europeans, East Asians or Yorubas. However, the Long Range Haplotype (LRH) test, as well as the iHS and XP-Rsb statistics revealed significantly extended tracks of homozygosity around FOXI1 in Africa, suggesting a recent episode of positive selection acting on this gene. A functionally relevant SNP, as well as several SNPs either on the putatively selected core haplotypes or with significant iHS or XP-Rsb values, displayed allele frequencies strongly correlated with the absolute geographical latitude of the populations sampled. Conclusions: We present evidence for recent positive selection in the FOXI1 gene region in Africa. Climate might be related to this recent adaptive event in humans. Of the multiple functions of FOXI1, its role in kidney-mediated water-electrolyte homeostasis is the most obvious candidate for explaining a climate-related adaptation.
Abstract:
Background: Data provided by the social sciences as well as genetic research suggest that the 8-10 million Roma (Gypsies) who live in Europe today are best described as a conglomerate of genetically isolated founder populations. The relationship between the traditional social structure observed by the Roma, where the Group is the primary unit, and the boundaries, demographic history and biological relatedness of the diverse founder populations appears complex and has not been addressed by population genetic studies. Results: Recent medical genetic research has identified a number of novel, or previously known but rare conditions, caused by private founder mutations. A summary of the findings, provided in this review, should assist diagnosis and counselling in affected families, and promote future collaborative research. The available incomplete epidemiological data suggest a non-random distribution of disease-causing mutations among Romani groups. Conclusion: Although far from systematic, the published information indicates that medical genetics has an important role to play in improving the health of this underprivileged and forgotten people of Europe. Reported carrier rates for some Mendelian disorders are in the range of 5-15%, sufficient to justify newborn screening and early treatment, or community-based education and carrier testing programs for disorders where no therapy is currently available. To be most productive, future studies of the epidemiology of single gene disorders should take social organisation and cultural anthropology into consideration, thus allowing the targeting of public health programs and contributing to the understanding of population structure and demographic history of the Roma.
Abstract:
Background: A number of studies have used protein interaction data alone for protein function prediction. Here, we introduce a computational approach for annotation of enzymes, based on the observation that similar protein sequences are more likely to perform the same function if they share similar interacting partners. Results: The method has been tested against the PSI-BLAST program using a set of 3,890 protein sequences for which interaction data were available. For protein sequences that align with at least 40% sequence identity to a known enzyme, the specificity of our method in predicting the first three EC digits increased from 80% to 90% at 80% coverage when compared to PSI-BLAST. Conclusion: Our method can also be used in proteins for which homologous sequences with known interacting partners can be detected. Thus, our method could increase by 10% the specificity of genome-wide enzyme predictions based on sequence matching by PSI-BLAST alone.