957 results for human genome variation
Abstract:
We analyze here the relation between alternative splicing and gene duplication in light of recent genomic data, with a focus on the human genome. We show that the previously reported negative correlation between level of alternative splicing and family size no longer holds true. We clarify this pattern and show that it is sufficiently explained by two factors. First, genes progressively gain new splice variants with time. The gain is consistent with a selectively relaxed regime, until purifying selection slows it down as aging genes accumulate a large number of variants. Second, we show that duplication does not lead to a loss of splice forms, but rather that genes with low levels of alternative splicing tend to duplicate more frequently. This leads us to reconsider the role of alternative splicing in duplicate retention.
Current millennium biotechniques for biomedical research on parasites and host-parasite interactions
Abstract:
The development of biotechnology over the last three decades has created the impression that the newest scientific achievements will deliver a high quality of life through an abundance of food and the means to combat diseases successfully. As the new biotechnologies give access to genetic information, there is a common belief that physiological and pathological processes result from subtle modifications of gene expression. Indeed, modern genetics has produced genetic maps, physical maps and complete nucleotide sequences from 141 viruses, 51 organelles, two eubacteria, one archaeon and one eukaryote (Saccharomyces cerevisiae). In addition, during the Centennial Commemoration of the Oswaldo Cruz Institute the nearly complete human genome map was proudly announced, whereas the latest Brazilian keystone contribution to science was the publication of the Xylella fastidiosa genomic sequence, highlighted on the cover of an issue of Nature. There is a popular belief that further scientific accomplishments will rapidly lead to new drugs and methodological approaches to cure genetic diseases and other incurable ailments. Yet much evidence has accumulated showing that a large information gap exists between our knowledge of genome sequence and our knowledge of genome function. Now that many genome maps are available, people wish to know what we are going to do with them. Certainly, all these scientific accomplishments will shed light on many more secrets of life. Nevertheless, restraint in the weekly announcements of promising scientific achievements is necessary. We also need many more creative experimental biologists to discover new, as yet unenvisaged biotechnological approaches, and the basic resources needed for carrying out the milestone research necessary to lead us to that "promised land" often proclaimed by the mass media.
Abstract:
Since the start of the Human Genome Project and its success in 2001, the genomes of a multitude of species have been sequenced. Improvements in sequencing technologies have generated data volumes that grow exponentially. The project "Bioinformatic analyses on Hadoop technology" (Análisis bioinformáticos sobre la tecnología Hadoop) covers the parallel computation of biological data such as DNA sequences. The study has been guided by the nature of the problem to be solved: the alignment of genetic sequences with the MapReduce paradigm.
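As a rough illustration of how sequence alignment can be phrased in MapReduce terms, the sketch below simulates the map, shuffle and reduce phases in plain Python. It is not the project's implementation and uses no Hadoop API. The seed length `K`, the toy sequences and all function names are hypothetical, and exact k-mer seed voting stands in for a real alignment algorithm.

```python
from collections import defaultdict

K = 4  # seed length; hypothetical choice for this toy example


def map_kmers(record):
    """Map step: emit (k-mer, (source, name, offset)) pairs for one sequence."""
    source, name, seq = record
    for i in range(len(seq) - K + 1):
        yield seq[i:i + K], (source, name, i)


def shuffle(pairs):
    """Group mapped values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_seeds(kmer, values):
    """Reduce step: pair each read hit with each reference hit of the same k-mer."""
    refs = [(n, i) for s, n, i in values if s == "ref"]
    reads = [(n, i) for s, n, i in values if s == "read"]
    for read_name, read_off in reads:
        for ref_name, ref_off in refs:
            # Implied start position of the read on the reference.
            yield read_name, ref_name, ref_off - read_off


# Toy input: one reference and one read (made-up sequences).
records = [
    ("ref", "chrToy", "ACGTACGGTCA"),
    ("read", "r1", "CGGTC"),
]

pairs = [p for rec in records for p in map_kmers(rec)]
votes = defaultdict(int)
for kmer, values in shuffle(pairs).items():
    for read_name, ref_name, start in reduce_seeds(kmer, values):
        votes[(read_name, ref_name, start)] += 1

# The candidate start supported by the most shared seeds.
best = max(votes, key=votes.get)
print(best, votes[best])
```

Keying on the k-mer is the design choice that makes the problem parallel: every reducer sees all occurrences of one seed and can work independently of the others.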
Abstract:
An essential step of the life cycle of retroviruses is the stable insertion of a copy of their DNA genome into the host cell genome, and lentiviruses are no exception. This integration step, catalyzed by the viral-encoded integrase, ensures long-term expression of the viral genes, thus allowing productive viral replication and making retroviral vectors attractive for the field of gene therapy. At the same time, this ability to integrate into the host genome raises safety concerns regarding the use of retroviral-based gene therapy vectors, due to the genomic locations of integration sites. The availability of the human genome sequence made possible the analysis of integration site preferences, which proved to be nonrandom and retrovirus-specific, i.e. all lentiviruses studied so far favor integration in active transcription units, while other retroviruses have a different integration site distribution. Several mechanisms have been proposed that may influence integration targeting, which include (i) chromatin accessibility, (ii) cell cycle effects, and (iii) tethering proteins. Recent data provide evidence that integration site selection can occur via a tethering mechanism, through the recruitment of the lentiviral integrase by the cellular LEDGF/p75 protein, these two proteins being the major players in lentiviral integration targeting.
Abstract:
The ENCyclopedia Of DNA Elements (ENCODE) Project aims to identify all functional elements in the human genome sequence. The pilot phase of the Project is focused on a specified 30 megabases (approximately 1%) of the human genome sequence and is organized as an international consortium of computational and laboratory-based scientists working to develop and apply high-throughput approaches for detecting all sequence elements that confer biological function. The results of this pilot phase will guide future efforts to analyze the entire human genome.
Abstract:
To provide a novel resource for analysis of the genome of Biomphalaria glabrata, members of the international Biomphalaria glabrata Genome Initiative (biology.unm.edu/biomphalaria-genome.html), working with the Arizona Genomics Institute (AGI) and supported by the National Human Genome Research Institute (NHGRI), produced a high quality bacterial artificial chromosome (BAC) library. The BB02 strain B. glabrata, a field isolate (Belo Horizonte, Minas Gerais, Brazil) that is susceptible to several strains of Schistosoma mansoni, was selfed for two generations to reduce haplotype diversity in the offspring. High molecular weight DNA was isolated from ovotestes of 40 snails, partially digested with HindIII, and ligated into pAGIBAC1 vector. The resulting B. glabrata BAC library (BG_BBa) consists of 61824 clones (136.3 kb average insert size) and provides 9.05× coverage of the 931 Mb genome. Probing with single/low copy number genes from B. glabrata and fingerprinting of selected BAC clones indicated that the BAC library sufficiently represents the gene complement. BAC end sequence data (514 reads, 299860 nt) indicated that the genome of B. glabrata contains ~63% AT, and disclosed several novel genes, transposable elements, and groups of high frequency sequence elements. This BG_BBa BAC library, available from AGI at cost to the research community, gains in relevance because BB02 strain B. glabrata is targeted for whole genome sequencing by NHGRI.
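The coverage figure quoted above can be checked with a short calculation. The sketch below uses only the numbers given in the abstract. The second function applies the classical library-representation formula P = 1 - (1 - insert/genome)^N, which is an added assumption for illustration and is not stated in the abstract; the function names are hypothetical.

```python
# Figures quoted in the abstract.
n_clones = 61_824
mean_insert_kb = 136.3
genome_mb = 931


def library_coverage(n_clones, mean_insert_kb, genome_mb):
    """Fold coverage: total cloned DNA divided by genome size."""
    total_insert_mb = n_clones * mean_insert_kb / 1000  # kb -> Mb
    return total_insert_mb / genome_mb


def representation_probability(n_clones, mean_insert_kb, genome_mb):
    """Probability that a given locus is in at least one clone.

    Assumes clones sample the genome uniformly at random, which is an
    idealisation not claimed by the abstract.
    """
    fraction = (mean_insert_kb / 1000) / genome_mb
    return 1 - (1 - fraction) ** n_clones


coverage = library_coverage(n_clones, mean_insert_kb, genome_mb)
p_present = representation_probability(n_clones, mean_insert_kb, genome_mb)
print(f"{coverage:.2f}x coverage, P(locus represented) = {p_present:.5f}")
```

Under the random-sampling assumption, a library of this depth would be expected to contain almost any given single-copy locus, which is consistent with the abstract's statement that the library sufficiently represents the gene complement.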
Abstract:
Analysing human genetic variation provides a powerful tool in understanding risk factors for disease. Toxoplasma gondii acquired by the mother can be transmitted to the fetus. Infants with the most severe clinical signs in brain and eye are those infected early in pregnancy when fetal immunity is least well developed. Genetic analysis could provide unique insight into events in utero that are otherwise difficult to determine. We tested the hypothesis that propensity for T. gondii to cause eye disease is associated with genes previously implicated in congenital or juvenile onset ocular disease. Using mother-child pairs from Europe (EMSCOT) and child/parent trios from North America (NCCCTS), we demonstrated that ocular and brain disease in congenital toxoplasmosis associate with polymorphisms in ABCA4 encoding ATP-binding cassette transporter, subfamily A, member 4 previously associated with juvenile onset retinal dystrophies including Stargardt's disease. Polymorphisms at COL2A1 encoding type II collagen, previously associated with Stickler syndrome, associated only with ocular disease in congenital toxoplasmosis. Experimental studies showed that both ABCA4 and COL2A1 show isoform-specific epigenetic modifications consistent with imprinting, which provided an explanation for the patterns of inheritance observed. These genetic and epigenetic risk factors provide unique insight into molecular pathways in the pathogenesis of disease.
Abstract:
Several epidemiological studies have related an increase of lipids in the postprandial state to an individual's risk of developing CVD, possibly due to the increased plasma levels of TAG and fatty acids (FA) through enzymes of FA metabolism. The interaction between nutrition and the human genome determines gene expression and metabolic response. The aim of the present study was to evaluate the influence of a fat overload on the gene mRNA levels of lipogenic regulators in peripheral blood mononuclear cells (PBMC) from patients with the metabolic syndrome. The study included twenty-one patients with criteria for the metabolic syndrome who underwent a fat overload. Anthropometric and biochemical variables, and the gene mRNA levels of lipogenic factors, were measured before and after the fat overload. The main results were that the fat overload led to increased mRNA levels of sterol regulatory element binding protein-1 (SREBP1), retinoid X receptor α (RXRα) and liver X receptor α (LXRα) in PBMC, and this increase was associated with the FA synthase (FASN) mRNA levels. We also found that TAG levels correlated with FASN mRNA levels. In addition, there was a positive correlation of SREBP1 with RXRα and of LXRα with the plasma lipoperoxide concentration. The fat overload led to an increase in regulators of lipogenesis in PBMC from patients with the metabolic syndrome.
Abstract:
The stable insertion of a copy of their genome into the host cell genome is an essential step of the life cycle of retroviruses. The site of viral DNA integration, mediated by the viral-encoded integrase enzyme, has important consequences for both the virus and the host cell. The analysis of retroviral integration site distribution was facilitated by the availability of the human genome sequence, revealing the non-random feature of integration site selection and identifying different favored and disfavored genomic locations for individual retroviruses. This review will summarize the current knowledge about retroviral differences in their integration site preferences as well as the mechanisms involved in this process.
Abstract:
SUMMARY Following the complete sequencing of the human genome, the field of nutrition has begun utilizing this vast quantity of information to comprehensively explore the interactions between diet and genes. This approach, coined nutrigenomics, aims to determine the influence of common dietary ingredients on the genome, and attempts to relate the resulting different phenotypes to differences in the cellular and/or genetic response of the biological system. However, complementary to defining the biological outcomes of dietary ingredients, we must also understand the influence of the multiple factors (such as the microbiota, bile, and function of transporters) that may contribute to the bioavailability, and ultimately bioefficacy, of these ingredients. The gastrointestinal tract (GIT) is the body's foremost tissue boundary, interacting with nutrients, exogenous compounds and microbiota, and its condition is influenced by the complex interplay between these environmental factors and genetic elements. In order to understand GIT nutrient-gene interactions, our goal was to comprehensively elucidate the region-specific gene expression underlying intestinal functions. We found important regional differences in the expression of members of the ATP-binding cassette family of transporters in the mouse intestine, suggesting that absorption of dietary compounds may vary along the GIT. Furthermore, the influence of the microbiota on host gene expression indicated that this luminal factor predominantly influences immune function and water transport throughout the GIT; however, the identification of region-specific functions suggests distinct host-bacterial interactions along the GIT. Thus, these findings reinforce that to understand nutrient bioavailability and GIT function, one must consider the physiologically distinct regions of the gut.
Nutritional molecules absorbed by the enterocytes of the GIT enter circulation and will be selectively absorbed and metabolised by tissues throughout the body; however, their bioefficacy in the body will depend on the unique and shared molecular mechanisms of the various tissues. Using a nutrigenomic approach, the biological responses of the liver and hippocampus of mice fed different long chain-polyunsaturated fatty acid diets revealed tissue-specific responses. Furthermore, we identified stearoyl-CoA desaturase as a hepatic target for arachidonic acid, suggesting a potentially novel molecular mechanism that may protect against diet-induced obesity. In summary, this work begins to unveil the fundamentally important role that nutrigenomics will play in unravelling the molecular mechanisms, and those exogenous factors capable of influencing these mechanisms, that regulate the bioefficacy of nutritional molecules.
Abstract:
Extracellular calcium participates in several key physiological functions, such as control of blood coagulation, bone calcification or muscle contraction. Calcium homeostasis in humans is regulated in part by genetic factors, as illustrated by rare monogenic diseases characterized by hypo- or hypercalcaemia. Both serum calcium and urinary calcium excretion are heritable continuous traits in humans. Serum calcium levels are tightly regulated by two main hormonal systems, i.e. parathyroid hormone and vitamin D, which are themselves also influenced by genetic factors. Recent technological advances in molecular biology allow for the screening of the human genome at an unprecedented level of detail and using hypothesis-free approaches, such as genome-wide association studies (GWAS). GWAS identified novel loci for calcium-related phenotypes (i.e. serum calcium and 25-OH vitamin D) that shed new light on the biology of calcium in humans. The substantial overlap (i.e. CYP24A1, CASR, GATA3; CYP2R1) between genes involved in rare monogenic diseases and genes located within loci identified in GWAS suggests a genetic and phenotypic continuum between monogenic diseases of calcium homeostasis and slight disturbances of calcium homeostasis in the general population. Future studies using whole-exome and whole-genome sequencing will further advance our understanding of the genetic architecture of calcium homeostasis in humans. These findings will likely provide new insight into the complex mechanisms involved in calcium homeostasis and hopefully lead to novel preventive and therapeutic approaches. Keywords: calcium, monogenic, genome-wide association studies, genetics.
Abstract:
BACKGROUND: It is unknown why patients with extensive ulcerative colitis (UC) have a higher risk of colorectal cancer compared with patients with left-sided UC. This study characterizes the inflammatory processes in left-sided UC, pancolitis, and UC-associated dysplasia at the transcriptional level to identify potential biomarkers and transcripts of importance for the carcinogenic behavior of chronic inflammation. METHODS: The Affymetrix GeneChip Human Genome U133 Plus 2.0 was applied to colonic biopsies from UC patients with left-sided UC, pancolitis, dysplasia, and controls. Reverse transcription polymerase chain reaction and immunohistochemistry were performed to validate selected transcripts in the initial cohort and in 2 independent cohorts of patients with UC. Microarray data were analyzed by principal component analysis, and reverse transcription polymerase chain reaction and immunohistochemistry data by Wilcoxon's rank-sum test. RESULTS: The principal component analysis results revealed separate clusters for left-sided UC, pancolitis, dysplasia, and controls. Close clustering of dysplastic and pancolitic samples indicated similarities in gene expression. Indeed, 101 and 656 parallel upregulated and downregulated transcripts, respectively, were identified in specimens from dysplasia and pancolitis. Validation of selected transcripts among these identified insulin receptor alpha (INSRA) and MAP kinase interacting serine/threonine kinase 2 (MKNK2) with enhanced expression in dysplasia compared with left-sided UC and controls, whereas laminin γ2 (LAMC2) showed lower expression in dysplasia compared with the remaining 3 groups. CONCLUSIONS: This study demonstrates pancolitis and left-sided UC as distinct inflammatory processes at the transcriptional level, and identifies INSRA, MKNK2, and LAMC2 as potential critical transcripts in the inflammation-driven preneoplastic process of UC.
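For readers unfamiliar with the rank-sum test named in the methods, the following is a minimal sketch of a two-sided Wilcoxon rank-sum test using the normal approximation, without tie or continuity correction. It is not the study's analysis code, and the expression values are invented for illustration only.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction).

    Returns (W, z, p), where W is the sum of the ranks of sample x.
    """
    combined = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        i = j
    w = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    n1, n2 = len(x), len(y)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w, z, p


# Hypothetical relative expression values for two groups (made-up numbers).
controls = [1.2, 1.5, 1.1, 1.4]
dysplasia = [2.3, 2.8, 2.1, 2.6]

w, z, p = rank_sum_test(controls, dysplasia)
print(f"W = {w}, z = {z:.3f}, p = {p:.4f}")
```

With groups this small an exact test would normally be preferred; the normal approximation is used here only to keep the example short.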
Abstract:
One of the first useful products from the human genome will be a set of predicted genes. Besides its intrinsic scientific interest, the accuracy and completeness of this data set is of considerable importance for human health and medicine. Though progress has been made on computational gene identification in terms of both methods and accuracy evaluation measures, most of the sequence sets in which the programs are tested are short genomic sequences, and there is concern that these accuracy measures may not extrapolate well to larger, more challenging data sets. Given the absence of experimentally verified large genomic data sets, we constructed a semiartificial test set comprising a number of short single-gene genomic sequences with randomly generated intergenic regions. This test set, which should still present an easier problem than real human genomic sequence, mimics the approximately 200kb long BACs being sequenced. In our experiments with these longer genomic sequences, the accuracy of GENSCAN, one of the most accurate ab initio gene prediction programs, dropped significantly, although its sensitivity remained high. Conversely, the accuracy of similarity-based programs, such as GENEWISE, PROCRUSTES, and BLASTX was not affected significantly by the presence of random intergenic sequence, but depended on the strength of the similarity to the protein homolog. As expected, the accuracy dropped if the models were built using more distant homologs, and we were able to quantitatively estimate this decline. However, the specificities of these techniques are still rather good even when the similarity is weak, which is a desirable characteristic for driving expensive follow-up experiments. 
Our experiments suggest that though gene prediction will improve with every new protein that is discovered and through improvements in the current set of tools, we still have a long way to go before we can decipher the precise exonic structure of every gene in the human genome using purely computational methodology.
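The contrast drawn above between sensitivity and specificity can be made concrete with a small nucleotide-level calculation. This sketch assumes the convention common in gene-finding evaluation, where sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP); the abstract does not spell out its definitions. The exon coordinates and function names are hypothetical.

```python
def covered(intervals):
    """Expand half-open (start, end) intervals into a set of positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def nucleotide_accuracy(true_exons, predicted_exons):
    """Return (sensitivity, specificity) measured per coding nucleotide."""
    truth = covered(true_exons)
    predicted = covered(predicted_exons)
    tp = len(truth & predicted)
    sensitivity = tp / len(truth) if truth else 0.0
    specificity = tp / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Made-up example: both real exons are found, plus one spurious exon
# predicted in what would be intergenic sequence.
true_exons = [(100, 200), (300, 400)]
predicted_exons = [(100, 200), (300, 400), (1000, 1100)]

sn, sp = nucleotide_accuracy(true_exons, predicted_exons)
print(f"sensitivity = {sn:.2f}, specificity = {sp:.2f}")
```

The example mirrors the pattern described for ab initio prediction on long sequences: every true exon is still recovered, so sensitivity stays high, while extra predictions in intergenic regions lower specificity.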
Abstract:
For the ∼1% of the human genome in the ENCODE regions, only about half of the transcriptionally active regions (TARs) identified with tiling microarrays correspond to annotated exons. Here we categorize this large amount of “unannotated transcription.” We use a number of disparate features to classify the 6988 novel TARs—array expression profiles across cell lines and conditions, sequence composition, phylogenetic profiles (presence/absence of syntenic conservation across 17 species), and locations relative to genes. In the classification, we first filter out TARs with unusual sequence composition and those likely resulting from cross-hybridization. We then associate some of those remaining with proximal exons having correlated expression profiles. Finally, we cluster unclassified TARs into putative novel loci, based on similar expression and phylogenetic profiles. To encapsulate our classification, we construct a Database of Active Regions and Tools (DART.gersteinlab.org). DART has special facilities for rapidly handling and comparing many sets of TARs and their heterogeneous features, synchronizing across builds, and interfacing with other resources. Overall, we find that ∼14% of the novel TARs can be associated with known genes, while ∼21% can be clustered into ∼200 novel loci. We observe that TARs associated with genes are enriched in the potential to form structural RNAs and many novel TAR clusters are associated with nearby promoters. To benchmark our classification, we design a set of experiments for testing the connectivity of novel TARs. Overall, we find that 18 of the 46 connections tested validate by RT-PCR and four of five sequenced PCR products confirm connectivity unambiguously.
Abstract:
This report presents systematic empirical annotation of transcript products from 399 annotated protein-coding loci across the 1% of the human genome targeted by the Encyclopedia of DNA elements (ENCODE) pilot project using a combination of 5' rapid amplification of cDNA ends (RACE) and high-density resolution tiling arrays. We identified previously unannotated and often tissue- or cell-line-specific transcribed fragments (RACEfrags), both 5' distal to the annotated 5' terminus and internal to the annotated gene bounds for the vast majority (81.5%) of the tested genes. Half of the distal RACEfrags span large segments of genomic sequences away from the main portion of the coding transcript and often overlap with the upstream-annotated gene(s). Notably, at least 20% of the resultant novel transcripts have changes in their open reading frames (ORFs), most of them fusing ORFs of adjacent transcripts. A significant fraction of distal RACEfrags show expression levels comparable to those of known exons of the same locus, suggesting that they are not part of very minority splice forms. These results have significant implications concerning (1) our current understanding of the architecture of protein-coding genes; (2) our views on locations of regulatory regions in the genome; and (3) the interpretation of sequence polymorphisms mapping to regions hitherto considered to be "noncoding," ultimately relating to the identification of disease-related sequence alterations.