973 resultados para Human Genome
Resumo:
BACKGROUND: DNA sequence polymorphisms analysis can provide valuable information on the evolutionary forces shaping nucleotide variation, and provides an insight into the functional significance of genomic regions. The recent ongoing genome projects will radically improve our capabilities to detect specific genomic regions shaped by natural selection. Current available methods and software, however, are unsatisfactory for such genome-wide analysis. RESULTS: We have developed methods for the analysis of DNA sequence polymorphisms at the genome-wide scale. These methods, which have been tested on a coalescent-simulated and actual data files from mouse and human, have been implemented in the VariScan software package version 2.0. Additionally, we have also incorporated a graphical-user interface. The main features of this software are: i) exhaustive population-genetic analyses including those based on the coalescent theory; ii) analysis adapted to the shallow data generated by the high-throughput genome projects; iii) use of genome annotations to conduct a comprehensive analyses separately for different functional regions; iv) identification of relevant genomic regions by the sliding-window and wavelet-multiresolution approaches; v) visualization of the results integrated with current genome annotations in commonly available genome browsers. CONCLUSION: VariScan is a powerful and flexible suite of software for the analysis of DNA polymorphisms. The current version implements new algorithms, methods, and capabilities, providing an important tool for an exhaustive exploratory analysis of genome-wide DNA polymorphism data.
Resumo:
Genome-wide association studies (GWAS) have identified many risk loci for complex diseases, but effect sizes are typically small and information on the underlying biological processes is often lacking. Associations with metabolic traits as functional intermediates can overcome these problems and potentially inform individualized therapy. Here we report a comprehensive analysis of genotype-dependent metabolic phenotypes using a GWAS with non-targeted metabolomics. We identified 37 genetic loci associated with blood metabolite concentrations, of which 25 show effect sizes that are unusually high for GWAS and account for 10-60% differences in metabolite levels per allele copy. Our associations provide new functional insights for many disease-related associations that have been reported in previous studies, including those for cardiovascular and kidney disorders, type 2 diabetes, cancer, gout, venous thromboembolism and Crohn's disease. The study advances our knowledge of the genetic basis of metabolic individuality in humans and generates many new hypotheses for biomedical and pharmaceutical research.
Resumo:
Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence the phenotype. Genome-wide association (GWA) studies have identified more than 600 variants associated with human traits, but these typically explain small fractions of phenotypic variation, raising questions about the use of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P = 0.016) and that underlie skeletal growth defects (P < 0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented among variants that alter amino-acid structure of proteins and expression levels of nearby genes. Our data explain approximately 10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to approximately 16% of phenotypic variation (approximately 20% of heritable variation). Although additional approaches are needed to dissect the genetic architecture of polygenic human traits fully, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.
Resumo:
Abstract : The human body is composed of a huge number of cells acting together in a concerted manner. The current understanding is that proteins perform most of the necessary activities in keeping a cell alive. The DNA, on the other hand, stores the information on how to produce the different proteins in the genome. Regulating gene transcription is the first important step that can thus affect the life of a cell, modify its functions and its responses to the environment. Regulation is a complex operation that involves specialized proteins, the transcription factors. Transcription factors (TFs) can bind to DNA and activate the processes leading to the expression of genes into new proteins. Errors in this process may lead to diseases. In particular, some transcription factors have been associated with a lethal pathological state, commonly known as cancer, associated with uncontrolled cellular proliferation, invasiveness of healthy tissues and abnormal responses to stimuli. Understanding cancer-related regulatory programs is a difficult task, often involving several TFs interacting together and influencing each other's activity. This Thesis presents new computational methodologies to study gene regulation. In addition we present applications of our methods to the understanding of cancer-related regulatory programs. The understanding of transcriptional regulation is a major challenge. We address this difficult question combining computational approaches with large collections of heterogeneous experimental data. In detail, we design signal processing tools to recover transcription factors binding sites on the DNA from genome-wide surveys like chromatin immunoprecipitation assays on tiling arrays (ChIP-chip). We then use the localization about the binding of TFs to explain expression levels of regulated genes. In this way we identify a regulatory synergy between two TFs, the oncogene C-MYC and SP1. C-MYC and SP1 bind preferentially at promoters and when SP1 binds next to C-NIYC on the DNA, the nearby gene is strongly expressed. The association between the two TFs at promoters is reflected by the binding sites conservation across mammals, by the permissive underlying chromatin states 'it represents an important control mechanism involved in cellular proliferation, thereby involved in cancer. Secondly, we identify the characteristics of TF estrogen receptor alpha (hERa) target genes and we study the influence of hERa in regulating transcription. hERa, upon hormone estrogen signaling, binds to DNA to regulate transcription of its targets in concert with its co-factors. To overcome the scarce experimental data about the binding sites of other TFs that may interact with hERa, we conduct in silico analysis of the sequences underlying the ChIP sites using the collection of position weight matrices (PWMs) of hERa partners, TFs FOXA1 and SP1. We combine ChIP-chip and ChIP-paired-end-diTags (ChIP-pet) data about hERa binding on DNA with the sequence information to explain gene expression levels in a large collection of cancer tissue samples and also on studies about the response of cells to estrogen. We confirm that hERa binding sites are distributed anywhere on the genome. However, we distinguish between binding sites near promoters and binding sites along the transcripts. The first group shows weak binding of hERa and high occurrence of SP1 motifs, in particular near estrogen responsive genes. The second group shows strong binding of hERa and significant correlation between the number of binding sites along a gene and the strength of gene induction in presence of estrogen. Some binding sites of the second group also show presence of FOXA1, but the role of this TF still needs to be investigated. Different mechanisms have been proposed to explain hERa-mediated induction of gene expression. Our work supports the model of hERa activating gene expression from distal binding sites by interacting with promoter bound TFs, like SP1. hERa has been associated with survival rates of breast cancer patients, though explanatory models are still incomplete: this result is important to better understand how hERa can control gene expression. Thirdly, we address the difficult question of regulatory network inference. We tackle this problem analyzing time-series of biological measurements such as quantification of mRNA levels or protein concentrations. Our approach uses the well-established penalized linear regression models where we impose sparseness on the connectivity of the regulatory network. We extend this method enforcing the coherence of the regulatory dependencies: a TF must coherently behave as an activator, or a repressor on all its targets. This requirement is implemented as constraints on the signs of the regressed coefficients in the penalized linear regression model. Our approach is better at reconstructing meaningful biological networks than previous methods based on penalized regression. The method is tested on the DREAM2 challenge of reconstructing a five-genes/TFs regulatory network obtaining the best performance in the "undirected signed excitatory" category. Thus, these bioinformatics methods, which are reliable, interpretable and fast enough to cover large biological dataset, have enabled us to better understand gene regulation in humans.
Resumo:
Huntington's disease (HD) pathology is well understood at a histological level but a comprehensive molecular analysis of the effect of the disease in the human brain has not previously been available. To elucidate the molecular phenotype of HD on a genome-wide scale, we compared mRNA profiles from 44 human HD brains with those from 36 unaffected controls using microarray analysis. Four brain regions were analyzed: caudate nucleus, cerebellum, prefrontal association cortex [Brodmann's area 9 (BA9)] and motor cortex [Brodmann's area 4 (BA4)]. The greatest number and magnitude of differentially expressed mRNAs were detected in the caudate nucleus, followed by motor cortex, then cerebellum. Thus, the molecular phenotype of HD generally parallels established neuropathology. Surprisingly, no mRNA changes were detected in prefrontal association cortex, thereby revealing subtleties of pathology not previously disclosed by histological methods. To establish that the observed changes were not simply the result of cell loss, we examined mRNA levels in laser-capture microdissected neurons from Grade 1 HD caudate compared to control. These analyses confirmed changes in expression seen in tissue homogenates; we thus conclude that mRNA changes are not attributable to cell loss alone. These data from bona fide HD brains comprise an important reference for hypotheses related to HD and other neurodegenerative diseases.
Resumo:
BACKGROUND: Small RNAs (sRNAs) are widespread among bacteria and have diverse regulatory roles. Most of these sRNAs have been discovered by a combination of computational and experimental methods. In Pseudomonas aeruginosa, a ubiquitous Gram-negative bacterium and opportunistic human pathogen, the GacS/GacA two-component system positively controls the transcription of two sRNAs (RsmY, RsmZ), which are crucial for the expression of genes involved in virulence. In the biocontrol bacterium Pseudomonas fluorescens CHA0, three GacA-controlled sRNAs (RsmX, RsmY, RsmZ) regulate the response to oxidative stress and the expression of extracellular products including biocontrol factors. RsmX, RsmY and RsmZ contain multiple unpaired GGA motifs and control the expression of target mRNAs at the translational level, by sequestration of translational repressor proteins of the RsmA family. RESULTS: A combined computational and experimental approach enabled us to identify 14 intergenic regions encoding sRNAs in P. aeruginosa. Eight of these regions encode newly identified sRNAs. The intergenic region 1698 was found to specify a novel GacA-controlled sRNA termed RgsA. GacA regulation appeared to be indirect. In P. fluorescens CHA0, an RgsA homolog was also expressed under positive GacA control. This 120-nt sRNA contained a single GGA motif and, unlike RsmX, RsmY and RsmZ, was unable to derepress translation of the hcnA gene (involved in the biosynthesis of the biocontrol factor hydrogen cyanide), but contributed to the bacterium's resistance to hydrogen peroxide. In both P. aeruginosa and P. fluorescens the stress sigma factor RpoS was essential for RgsA expression. CONCLUSION: The discovery of an additional sRNA expressed under GacA control in two Pseudomonas species highlights the complexity of this global regulatory system and suggests that the mode of action of GacA control may be more elaborate than previously suspected. Our results also confirm that several GGA motifs are required in an sRNA for sequestration of the RsmA protein.
Resumo:
The genotyping of human papillomaviruses (HPV) is essential for the surveillance of HPV vaccines. We describe and validate a low-cost PGMY-based PCR assay (PGMY-CHUV) for the genotyping of 31 HPV by reverse blotting hybridization (RBH). Genotype-specific detection limits were 50 to 500 genome equivalents per reaction. RBH was 100% specific and 98.61% sensitive using DNA sequencing as the gold standard (n = 1,024 samples). PGMY-CHUV was compared to the validated and commercially available linear array (Roche) on 200 samples. Both assays identified the same positive (n = 182) and negative samples (n = 18). Seventy-six percent of the positives were fully concordant after restricting the comparison to the 28 genotypes shared by both assays. At the genotypic level, agreement was 83% (285/344 genotype-sample combinations; κ of 0.987 for single infections and 0.853 for multiple infections). Fifty-seven of the 59 discordant cases were associated with multiple infections and with the weakest genotypes within each sample (P < 0.0001). PGMY-CHUV was significantly more sensitive for HPV56 (P = 0.0026) and could unambiguously identify HPV52 in mixed infections. PGMY-CHUV was reproducible on repeat testing (n = 275 samples; 392 genotype-sample combinations; κ of 0.933) involving different reagents lots and different technicians. Discordant results (n = 47) were significantly associated with the weakest genotypes in samples with multiple infections (P < 0.0001). Successful participation in proficiency testing also supported the robustness of this assay. The PGMY-CHUV reagent costs were estimated at $2.40 per sample using the least expensive yet proficient genotyping algorithm that also included quality control. This assay may be used in low-resource laboratories that have sufficient manpower and PCR expertise.
Resumo:
Adenovirus serotype 5 (Ad5) vectors and specific neutralizing antibodies (NAbs) generate immune complexes (ICs) which are potent inducers of dendritic cell (DC) maturation. Here we show that ICs generated with rare Ad vector serotypes, such as Ad26 and Ad35, which are lead candidates in HIV vaccine development, are poor inducers of DC maturation and that their potency in inducing DC maturation strongly correlated with the number of Toll-like receptor 9 (TLR9)-agonist motifs present in the Ad vector's genome. In addition, we showed that antihexon but not antifiber antibodies are responsible for the induction of Ad IC-mediated DC maturation.
Resumo:
BACKGROUND: LDL cholesterol has a causal role in the development of cardiovascular disease. Improved understanding of the biological mechanisms that underlie the metabolism and regulation of LDL cholesterol might help to identify novel therapeutic targets. We therefore did a genome-wide association study of LDL-cholesterol concentrations. METHODS: We used genome-wide association data from up to 11,685 participants with measures of circulating LDL-cholesterol concentrations across five studies, including data for 293 461 autosomal single nucleotide polymorphisms (SNPs) with a minor allele frequency of 5% or more that passed our quality control criteria. We also used data from a second genome-wide array in up to 4337 participants from three of these five studies, with data for 290,140 SNPs. We did replication studies in two independent populations consisting of up to 4979 participants. Statistical approaches, including meta-analysis and linkage disequilibrium plots, were used to refine association signals; we analysed pooled data from all seven populations to determine the effect of each SNP on variations in circulating LDL-cholesterol concentrations. FINDINGS: In our initial scan, we found two SNPs (rs599839 [p=1.7x10(-15)] and rs4970834 [p=3.0x10(-11)]) that showed genome-wide statistical association with LDL cholesterol at chromosomal locus 1p13.3. The second genome screen found a third statistically associated SNP at the same locus (rs646776 [p=4.3x10(-9)]). Meta-analysis of data from all studies showed an association of SNPs rs599839 (combined p=1.2x10(-33)) and rs646776 (p=4.8x10(-20)) with LDL-cholesterol concentrations. SNPs rs599839 and rs646776 both explained around 1% of the variation in circulating LDL-cholesterol concentrations and were associated with about 15% of an SD change in LDL cholesterol per allele, assuming an SD of 1 mmol/L. INTERPRETATION: We found evidence for a novel locus for LDL cholesterol on chromosome 1p13.3. These results potentially provide insight into the biological mechanisms that underlie the regulation of LDL cholesterol and might help in the discovery of novel therapeutic targets for cardiovascular disease.
Resumo:
Recently, a locus centred on rs9273349 in the HLA-DQ region emerged from genome-wide association studies of adult-onset asthma. We aimed to further investigate the role of human leukocyte antigen (HLA) class II in adult-onset asthma and a possible interaction with occupational exposures. We imputed classical HLA-II alleles from 7579 single-nucleotide polymorphisms in 6025 subjects (1202 with adult-onset asthma) from European cohorts: ECRHS, SAPALDIA, EGEA and B58C, and from surveys of bakers and agricultural workers. Based on an asthma-specific job-exposure matrix, 2629 subjects had ever been exposed to high molecular weight (HMW) allergens. We explored associations between 23 common HLA-II alleles and adult-onset asthma, and tested for gene-environment interaction with occupational exposure to HMW allergens. Interaction was also tested for rs9273349. Marginal associations of classical HLA-II alleles and adult-onset asthma were not statistically significant. Interaction was detected between the DPB1*03:01 allele and exposure to HMW allergens (p = 0.009), in particular to latex (p = 0.01). In the unexposed group, the DPB1*03:01 allele was associated with adult-onset asthma (OR 0.67, 95%CI 0.53-0.86). HMW allergen exposures did not modify the association of rs9273349 with adult-onset asthma. Common classical HLA-II alleles were not marginally associated with adult-onset asthma. The association of latex exposure and adult-onset asthma may be modified by DPB1*03:01.
Resumo:
OBJECTIVE: Proinsulin is a precursor of mature insulin and C-peptide. Higher circulating proinsulin levels are associated with impaired β-cell function, raised glucose levels, insulin resistance, and type 2 diabetes (T2D). Studies of the insulin processing pathway could provide new insights about T2D pathophysiology. RESEARCH DESIGN AND METHODS: We have conducted a meta-analysis of genome-wide association tests of ∼2.5 million genotyped or imputed single nucleotide polymorphisms (SNPs) and fasting proinsulin levels in 10,701 nondiabetic adults of European ancestry, with follow-up of 23 loci in up to 16,378 individuals, using additive genetic models adjusted for age, sex, fasting insulin, and study-specific covariates. RESULTS: Nine SNPs at eight loci were associated with proinsulin levels (P < 5 × 10(-8)). Two loci (LARP6 and SGSM2) have not been previously related to metabolic traits, one (MADD) has been associated with fasting glucose, one (PCSK1) has been implicated in obesity, and four (TCF7L2, SLC30A8, VPS13C/C2CD4A/B, and ARAP1, formerly CENTD2) increase T2D risk. The proinsulin-raising allele of ARAP1 was associated with a lower fasting glucose (P = 1.7 × 10(-4)), improved β-cell function (P = 1.1 × 10(-5)), and lower risk of T2D (odds ratio 0.88; P = 7.8 × 10(-6)). Notably, PCSK1 encodes the protein prohormone convertase 1/3, the first enzyme in the insulin processing pathway. A genotype score composed of the nine proinsulin-raising alleles was not associated with coronary disease in two large case-control datasets. CONCLUSIONS: We have identified nine genetic variants associated with fasting proinsulin. Our findings illuminate the biology underlying glucose homeostasis and T2D development in humans and argue against a direct role of proinsulin in coronary artery disease pathogenesis.
Resumo:
We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.
Resumo:
To identify genetic loci influencing central obesity and fat distribution, we performed a meta-analysis of 16 genome-wide association studies (GWAS, N = 38,580) informative for adult waist circumference (WC) and waist-hip ratio (WHR). We selected 26 SNPs for follow-up, for which the evidence of association with measures of central adiposity (WC and/or WHR) was strong and disproportionate to that for overall adiposity or height. Follow-up studies in a maximum of 70,689 individuals identified two loci strongly associated with measures of central adiposity; these map near TFAP2B (WC, P = 1.9x10(-11)) and MSRA (WC, P = 8.9x10(-9)). A third locus, near LYPLAL1, was associated with WHR in women only (P = 2.6x10(-8)). The variants near TFAP2B appear to influence central adiposity through an effect on overall obesity/fat-mass, whereas LYPLAL1 displays a strong female-only association with fat distribution. By focusing on anthropometric measures of central obesity and fat distribution, we have identified three loci implicated in the regulation of human adiposity.
Resumo:
Pneumocystis jirovecii is a fungal parasite that colonizes specifically humans and turns into an opportunistic pathogen in immunodeficient individuals. The fungus is able to reproduce extracellularly in host lungs without eliciting massive cellular death. The molecular mechanisms that govern this process are poorly understood, in part because of the lack of an in vitro culture system for Pneumocystis spp. In this study, we explored the origin and evolution of the putative biotrophy of P. jirovecii through comparative genomics and reconstruction of ancestral gene repertoires. We used the maximum parsimony method and genomes of related fungi of the Taphrinomycotina subphylum. Our results suggest that the last common ancestor of Pneumocystis spp. lost 2,324 genes in relation to the acquisition of obligate biotrophy. These losses may result from neutral drift and affect the biosyntheses of amino acids and thiamine, the assimilation of inorganic nitrogen and sulfur, and the catabolism of purines. In addition, P. jirovecii shows a reduced panel of lytic proteases and has lost the RNA interference machinery, which might contribute to its genome plasticity. Together with other characteristics, that is, a sex life cycle within the host, the absence of massive destruction of host cells, difficult culturing, and the lack of virulence factors, these gene losses constitute a unique combination of characteristics which are hallmarks of both obligate biotrophs and animal parasites. These findings suggest that Pneumocystis spp. should be considered as the first described obligate biotrophs of animals, whose evolution has been marked by gene losses.
Resumo:
In vitro and in vivo analyses identified a significant component of heritability in cellular or host susceptibility to HIV-1. The bases for susceptibility can be traced to genetic differences (inter-species) resulting from evolutionary adaptation to exogenous (and endogenous) retroviral infections, and to intra-species and inter-individual (human) differences associated with genetic variation. We have completed large scale evolutionary analysis of genes involved in HIV life cycle and pathogenesis, as well as participating and conducting genome-wide association studies, linkage analysis, and transcriptome analysis. These studies allowed a better understanding of the influence of common human variants in HIV-1 susceptibility and define a number of experimental challenges in the filed: understanding of the role of rare and private mutations in susceptibility, and the development of better tools for the integration of data from large-scale studies.