931 resultados para wide genome sequencing
Resumo:
Embryonic stem cells (ESCs) possess two unique characteristics: infinite self-renewal and the potential to differentiate into almost every cell type (pluripotency). Recently, global expression analyses of metastatic breast and lung cancers revealed an ESC-like expression program or signature, specifically for cancers that are mutant for p53 function. Surprisingly, although p53 is widely recognized as the guardian of the genome, due to its roles in cell cycle checkpoints, programmed cell death or senescence, relatively little is known about p53 functions in normal cells, especially in ESCs. My hypothesis is that p53 has specific transcription regulatory functions in human ESCs (hESCs) that a) oppose pluripotency and b) protect the stem cell genome in response to DNA damage and stress signaling. In mouse ESCs, these roles are believed to coincide, as p53 promotes differentiation in response to DNA damage, but this is unexplored in hESCs. To determine the biological roles of p53, specifically in hESCs, we mapped genome-wide chromatin interactions of p53 by chromatin immunoprecipitation and massively parallel tag sequencing (ChIP-Seq), and did so under three VIdifferent conditions of hESC status: pluripotency, differentiation-initiated and DNA-damage-induced. ChIP-Seq showed that p53 is enriched at distinct, induction-specific gene loci during each of these different conditions. Microarray gene expression analysis and functional annotation of the distinct p53-target genes revealed that p53 regulates specific genes encoding developmental regulators, which are expressed in differentiation-initiated but not DNA- damaged hESCs. We further discovered that, in response to differentiation signaling, p53 binds regions of chromatin that are repressed but also poised for rapid activation by core pluripotency factors OCT4 and NANOG in pluripotent hESCs. In response to DNA damage, genes associated with migration and motility are targeted by p53; whereas, the prime targets of p53 in control of cell death are conserved for p53 regulation in both differentiation and DNA damage. Our genome-wide profiling and bioinformatics analyses show that p53 occupies a special set of developmental regulatory genes during early differentiation of hESCs and functions in an induction-specific manner. In conclusion, our research unveiled previously unknown functions of p53 in ESC biology, which augments our understanding of one of the most deregulated proteins in human cancers.
Resumo:
The genomic era brought by recent advances in the next-generation sequencing technology makes the genome-wide scans of natural selection a reality. Currently, almost all the statistical tests and analytical methods for identifying genes under selection was performed on the individual gene basis. Although these methods have the power of identifying gene subject to strong selection, they have limited power in discovering genes targeted by moderate or weak selection forces, which are crucial for understanding the molecular mechanisms of complex phenotypes and diseases. Recent availability and rapid completeness of many gene network and protein-protein interaction databases accompanying the genomic era open the avenues of exploring the possibility of enhancing the power of discovering genes under natural selection. The aim of the thesis is to explore and develop normal mixture model based methods for leveraging gene network information to enhance the power of natural selection target gene discovery. The results show that the developed statistical method, which combines the posterior log odds of the standard normal mixture model and the Guilt-By-Association score of the gene network in a naïve Bayes framework, has the power to discover moderate/weak selection gene which bridges the genes under strong selection and it helps our understanding the biology under complex diseases and related natural selection phenotypes.^
Resumo:
Background: Copy number variations (CNVs) have been shown to account for substantial portions of observed genomic variation and have been associated with qualitative and quantitative traits and the onset of disease in a number of species. Information from high-resolution studies to detect, characterize and estimate population-specific variant frequencies will facilitate the incorporation of CNVs in genomic studies to identify genes affecting traits of importance. Results: Genome-wide CNVs were detected in high-density single nucleotide polymorphism (SNP) genotyping data from 1,717 Nelore (Bos indicus) cattle, and in NGS data from eight key ancestral bulls. A total of 68,007 and 12,786 distinct CNVs were observed, respectively. Cross-comparisons of results obtained for the eight resequenced animals revealed that 92 % of the CNVs were observed in both datasets, while 62 % of all detected CNVs were observed to overlap with previously validated cattle copy number variant regions (CNVRs). Observed CNVs were used for obtaining breed-specific CNV frequencies and identification of CNVRs, which were subsequently used for gene annotation. A total of 688 of the detected CNVRs were observed to overlap with 286 non-redundant QTLs associated with important production traits in cattle. All of 34 CNVs previously reported to be associated with milk production traits in Holsteins were also observed in Nelore cattle. Comparisons of estimated frequencies of these CNVs in the two breeds revealed 14, 13, 6 and 14 regions in high (>20 %), low (<20 %) and divergent (NEL > HOL, NEL < HOL) frequencies, respectively. Conclusions: Obtained results significantly enriched the bovine CNV map and enabled the identification of variants that are potentially associated with traits under selection in Nelore cattle, particularly in genome regions harboring QTLs affecting production traits.
Resumo:
Background : In tropical countries, losses caused by bovine tick Rhipicephalus (Boophilus) microplus infestation have a tremendous economic impact on cattle production systems. Genetic variation between Bos taurus and Bos indicus to tick resistance and molecular biology tools might allow for the identification of molecular markers linked to resistance traits that could be used as an auxiliary tool in selection programs. The objective of this work was to identify QTL associated with tick resistance/susceptibility in a bovine F2 population derived from the Gyr (Bos indicus) x Holstein (Bos taurus) cross. Results: Through a whole genome scan with microsatellite markers, we were able to map six genomic regions associated with bovine tick resistance. For most QTL, we have found that depending on the tick evaluation season (dry and rainy) different sets of genes could be involved in the resistance mechanism. We identified dry season specific QTL on BTA 2 and 10, rainy season specific QTL on BTA 5, 11 and 27. We also found a highly significant genome wide QTL for both dry and rainy seasons in the central region of BTA 23. Conclusions: The experimental F2 population derived from Gyr x Holstein cross successfully allowed the identification of six highly significant QTL associated with tick resistance in cattle. QTL located on BTA 23 might be related with the bovine histocompatibility complex. Further investigation of these QTL will help to isolate candidate genes involved with tick resistance in cattle.
Resumo:
Background: Genome wide association studies (GWAS) are becoming the approach of choice to identify genetic determinants of complex phenotypes and common diseases. The astonishing amount of generated data and the use of distinct genotyping platforms with variable genomic coverage are still analytical challenges. Imputation algorithms combine directly genotyped markers information with haplotypic structure for the population of interest for the inference of a badly genotyped or missing marker and are considered a near zero cost approach to allow the comparison and combination of data generated in different studies. Several reports stated that imputed markers have an overall acceptable accuracy but no published report has performed a pair wise comparison of imputed and empiric association statistics of a complete set of GWAS markers. Results: In this report we identified a total of 73 imputed markers that yielded a nominally statistically significant association at P < 10(-5) for type 2 Diabetes Mellitus and compared them with results obtained based on empirical allelic frequencies. Interestingly, despite their overall high correlation, association statistics based on imputed frequencies were discordant in 35 of the 73 (47%) associated markers, considerably inflating the type I error rate of imputed markers. We comprehensively tested several quality thresholds, the haplotypic structure underlying imputed markers and the use of flanking markers as predictors of inaccurate association statistics derived from imputed markers. Conclusions: Our results suggest that association statistics from imputed markers showing specific MAF (Minor Allele Frequencies) range, located in weak linkage disequilibrium blocks or strongly deviating from local patterns of association are prone to have inflated false positive association signals. The present study highlights the potential of imputation procedures and proposes simple procedures for selecting the best imputed markers for follow-up genotyping studies.
Resumo:
Genome-wide association studies (GWAS) have been successful in identifying common genetic variation involved in susceptibility to etiologically complex disease. We conducted a GWAS to identify common genetic variation involved in susceptibility to upper aero-digestive tract (UADT) cancers. Genome-wide genotyping was carried out using the Illumina HumanHap300 beadchips in 2,091 UADT cancer cases and 3,513 controls from two large European multi-centre UADT cancer studies, as well as 4,821 generic controls. The 19 top-ranked variants were investigated further in an additional 6,514 UADT cancer cases and 7,892 controls of European descent from an additional 13 UADT cancer studies participating in the INHANCE consortium. Five common variants presented evidence for significant association in the combined analysis (p <= 5 x 10(-7)). Two novel variants were identified, a 4q21 variant (rs1494961, p = 1 x 10(-8)) located near DNA repair related genes HEL308 and FAM175A (or Abraxas) and a 12q24 variant (rs4767364, p = 2 x 10(-8)) located in an extended linkage disequilibrium region that contains multiple genes including the aldehyde dehydrogenase 2 (ALDH2) gene. Three remaining variants are located in the ADH gene cluster and were identified previously in a candidate gene study involving some of these samples. The association between these three variants and UADT cancers was independently replicated in 5,092 UADT cancer cases and 6,794 controls non-overlapping samples presented here (rs1573496-ADH7, p = 5 x 10(-8); rs1229984-ADH1B, p = 7 x 10(-9); and rs698-ADH1C, p = 0.02). These results implicate two variants at 4q21 and 12q24 and further highlight three ADH variants in UADT cancer susceptibility.
Resumo:
Background: The ideal malaria parasite populations for initial mapping of genomic regions contributing to phenotypes such as drug resistance and virulence, through genome-wide association studies, are those with high genetic diversity, allowing for numerous informative markers, and rare meiotic recombination, allowing for strong linkage disequilibrium (LD) between markers and phenotype-determining loci. However, levels of genetic diversity and LD in field populations of the major human malaria parasite P. vivax remain little characterized. Results: We examined single-nucleotide polymorphisms (SNPs) and LD patterns across a 100-kb chromosome segment of P. vivax in 238 field isolates from areas of low to moderate malaria endemicity in South America and Asia, where LD tends to be more extensive than in holoendemic populations, and in two monkey-adapted strains (Salvador-I, from El Salvador, and Belem, from Brazil). We found varying levels of SNP diversity and LD across populations, with the highest diversity and strongest LD in the area of lowest malaria transmission. We found several clusters of contiguous markers with rare meiotic recombination and characterized a relatively conserved haplotype structure among populations, suggesting the existence of recombination hotspots in the genome region analyzed. Both silent and nonsynonymous SNPs revealed substantial between-population differentiation, which accounted for similar to 40% of the overall genetic diversity observed. Although parasites clustered according to their continental origin, we found evidence for substructure within the Brazilian population of P. vivax. We also explored between-population differentiation patterns revealed by loci putatively affected by natural selection and found marked geographic variation in frequencies of nucleotide substitutions at the pvmdr-1 locus, putatively associated with drug resistance. Conclusion: These findings support the feasibility of genome-wide association studies in carefully selected populations of P. vivax, using relatively low densities of markers, but underscore the risk of false positives caused by population structure at both local and regional levels.
Resumo:
Serpentine receptors comprise a large family of membrane receptors distributed over diverse organisms, such as bacteria, fungi, plants and all metazoans. However, the presence of serpentine receptors in protozoan parasites is largely unknown so far. In the present study we performed a genome-wide search for proteins containing seven transmembrane domains (7TM) in the human malaria parasite Plasmodium falciparum and identified four serpentine receptor-like proteins. These proteins, denoted PfSR1, PfSR10, PfSR12 and PfSR25, show membrane topologies that resemble those exhibited by members belonging to different families of serpentine receptors. Expression of the pfsrs genes was detected by Real Time PCR in P. falciparum intraerythrocytic stages, indicating that they potentially code for functional proteins. We also found corresponding homologues for the PfSRs in five other Plasmodium species, two primate and three rodent parasites. PfSR10 and 25 are the most conserved receptors among the different species, while PfSR1 and 12 are more divergent. Interestingly, we found that PfSR10 and PfSR12 possess similarity to orphan serpentine receptors of other organisms. The identification of potential parasite membrane receptors raises a new perspective for essential aspects of malaria parasite host cell infection.
Resumo:
Background Meta-analysis is increasingly being employed as a screening procedure in large-scale association studies to select promising variants for follow-up studies. However, standard methods for meta-analysis require the assumption of an underlying genetic model, which is typically unknown a priori. This drawback can introduce model misspecifications, causing power to be suboptimal, or the evaluation of multiple genetic models, which augments the number of false-positive associations, ultimately leading to waste of resources with fruitless replication studies. We used simulated meta-analyses of large genetic association studies to investigate naive strategies of genetic model specification to optimize screenings of genome-wide meta-analysis signals for further replication. Methods Different methods, meta-analytical models and strategies were compared in terms of power and type-I error. Simulations were carried out for a binary trait in a wide range of true genetic models, genome-wide thresholds, minor allele frequencies (MAFs), odds ratios and between-study heterogeneity (tau(2)). Results Among the investigated strategies, a simple Bonferroni-corrected approach that fits both multiplicative and recessive models was found to be optimal in most examined scenarios, reducing the likelihood of false discoveries and enhancing power in scenarios with small MAFs either in the presence or in absence of heterogeneity. Nonetheless, this strategy is sensitive to tau(2) whenever the susceptibility allele is common (MAF epsilon 30%), resulting in an increased number of false-positive associations compared with an analysis that considers only the multiplicative model. Conclusion Invoking a simple Bonferroni adjustment and testing for both multiplicative and recessive models is fast and an optimal strategy in large meta-analysis-based screenings. However, care must be taken when examined variants are common, where specification of a multiplicative model alone may be preferable.
Resumo:
2016
Resumo:
Metabolic traits are molecular phenotypes that can drive clinical phenotypes and may predict disease progression. Here, we report results from a metabolome- and genome-wide association study on (1)H-NMR urine metabolic profiles. The study was conducted within an untargeted approach, employing a novel method for compound identification. From our discovery cohort of 835 Caucasian individuals who participated in the CoLaus study, we identified 139 suggestively significant (P<5×10(-8)) and independent associations between single nucleotide polymorphisms (SNP) and metabolome features. Fifty-six of these associations replicated in the TasteSensomics cohort, comprising 601 individuals from São Paulo of vastly diverse ethnic background. They correspond to eleven gene-metabolite associations, six of which had been previously identified in the urine metabolome and three in the serum metabolome. Our key novel findings are the associations of two SNPs with NMR spectral signatures pointing to fucose (rs492602, P = 6.9×10(-44)) and lysine (rs8101881, P = 1.2×10(-33)), respectively. Fine-mapping of the first locus pinpointed the FUT2 gene, which encodes a fucosyltransferase enzyme and has previously been associated with Crohn's disease. This implicates fucose as a potential prognostic disease marker, for which there is already published evidence from a mouse model. The second SNP lies within the SLC7A9 gene, rare mutations of which have been linked to severe kidney damage. The replication of previous associations and our new discoveries demonstrate the potential of untargeted metabolomics GWAS to robustly identify molecular disease markers.
Resumo:
Dramatic improvements in DNA sequencing technologies have led to amore than 1,000-fold reduction in sequencing costs over the past five years.Genome-wide research approaches can thus now be applied beyond medicallyrelevant questions to examine the molecular-genetic basis of behavior,development and unique life histories in almost any organism. A first step foran emerging model organism is usually establishing a reference genomesequence. I offer insight gained from the fire ant genome project. First, I detailhow the project came to be and how sequencing, assembly and annotationstrategies were chosen. Subsequently, I describe some of the issues linked toworking with data from recently sequenced genomes. Finally, I discuss anapproach undertaken in a follow-up project based on the fire ant genomesequence.
Resumo:
BACKGROUND & AIMS: Hepatitis C virus (HCV) induces chronic infection in 50% to 80% of infected persons; approximately 50% of these do not respond to therapy. We performed a genome-wide association study to screen for host genetic determinants of HCV persistence and response to therapy. METHODS: The analysis included 1362 individuals: 1015 with chronic hepatitis C and 347 who spontaneously cleared the virus (448 were coinfected with human immunodeficiency virus [HIV]). Responses to pegylated interferon alfa and ribavirin were assessed in 465 individuals. Associations between more than 500,000 single nucleotide polymorphisms (SNPs) and outcomes were assessed by multivariate logistic regression. RESULTS: Chronic hepatitis C was associated with SNPs in the IL28B locus, which encodes the antiviral cytokine interferon lambda. The rs8099917 minor allele was associated with progression to chronic HCV infection (odds ratio [OR], 2.31; 95% confidence interval [CI], 1.74-3.06; P = 6.07 x 10(-9)). The association was observed in HCV mono-infected (OR, 2.49; 95% CI, 1.64-3.79; P = 1.96 x 10(-5)) and HCV/HIV coinfected individuals (OR, 2.16; 95% CI, 1.47-3.18; P = 8.24 x 10(-5)). rs8099917 was also associated with failure to respond to therapy (OR, 5.19; 95% CI, 2.90-9.30; P = 3.11 x 10(-8)), with the strongest effects in patients with HCV genotype 1 or 4. This risk allele was identified in 24% of individuals with spontaneous HCV clearance, 32% of chronically infected patients who responded to therapy, and 58% who did not respond (P = 3.2 x 10(-10)). Resequencing of IL28B identified distinct haplotypes that were associated with the clinical phenotype. CONCLUSIONS: The association of the IL28B locus with natural and treatment-associated control of HCV indicates the importance of innate immunity and interferon lambda in the pathogenesis of HCV infection.
Resumo:
Rigorous organization and quality control (QC) are necessary to facilitate successful genome-wide association meta-analyses (GWAMAs) of statistics aggregated across multiple genome-wide association studies. This protocol provides guidelines for (i) organizational aspects of GWAMAs, and for (ii) QC at the study file level, the meta-level across studies and the meta-analysis output level. Real-world examples highlight issues experienced and solutions developed by the GIANT Consortium that has conducted meta-analyses including data from 125 studies comprising more than 330,000 individuals. We provide a general protocol for conducting GWAMAs and carrying out QC to minimize errors and to guarantee maximum use of the data. We also include details for the use of a powerful and flexible software package called EasyQC. Precise timings will be greatly influenced by consortium size. For consortia of comparable size to the GIANT Consortium, this protocol takes a minimum of about 10 months to complete.