948 results for Genome-Wide Association
Abstract:
Background: Prolificacy is the most important trait influencing the reproductive efficiency of pig production systems. The low heritability and sex-limited expression of prolificacy have hindered to some extent the improvement of this trait through artificial selection. Moreover, the relative contributions of additive, dominant and epistatic QTL to the genetic variance of pig prolificacy remain to be defined. In this work, we have addressed this issue by performing one-dimensional and bi-dimensional genome scans for number of piglets born alive (NBA) and total number of piglets born (TNB) in a three-generation Iberian by Meishan F2 intercross. Results: The one-dimensional genome scan for NBA and TNB revealed two genome-wide highly significant QTL located on SSC13 (P < 0.001) and SSC17 (P < 0.01) with effects on both traits. This relative paucity of significant results contrasted very strongly with the wide array of highly significant epistatic QTL that emerged in the bi-dimensional genome-wide scan analysis. As many as 18 epistatic QTL were found for NBA (four at P < 0.01 and five at P < 0.05) and TNB (three at P < 0.01 and six at P < 0.05). These epistatic QTL were distributed in multiple genomic regions, which covered 13 of the 18 pig autosomes, and they had small individual effects that ranged between 3 and 4% of the phenotypic variance. Different patterns of interactions (a × a, a × d, d × a and d × d) were found amongst the epistatic QTL pairs identified in the current work. Conclusions: The complex inheritance of prolificacy traits in pigs has been evidenced by identifying multiple additive (SSC13 and SSC17), dominant and epistatic QTL in an Iberian × Meishan F2 intercross.
Our results demonstrate that a significant fraction of the phenotypic variance of swine prolificacy traits can be attributed to first-order gene-by-gene interactions, emphasizing that the phenotypic effects of alleles might be strongly modulated by the genetic background in which they segregate.
Abstract:
SUMMARY: Research into the evolution of subdivided plant populations has long involved the study of phenotypic variation across plant geographic ranges and the genetic details underlying that variation. Genetic polymorphism at different marker loci has also allowed us to infer the long- and short-term histories of gene flow within and among populations, including range expansions and colonization-extinction dynamics. However, the advent of affordable genome-wide sequences for large numbers of individuals is opening up new possibilities for the study of subdivided populations. In this review, we consider what the new tools and technologies may allow us to do. In particular, we encourage researchers to look beyond the description of variation and to use genomic tools to address new hypotheses, or old ones afresh. Because subdivided plant populations are complex structures, we caution researchers against adopting simplistic interpretations of their data and encourage them to consider the patterns they observe in terms of the population genetic processes that have given rise to them; here, the genealogical framework of the coalescent will continue to be conceptually and analytically useful.
Abstract:
BACKGROUND: The only known albino gorilla, named Snowflake, was a wild-born male from Equatorial Guinea who lived at the Barcelona Zoo for almost 40 years. He was diagnosed with non-syndromic oculocutaneous albinism, i.e. white hair, light eyes, pink skin, photophobia and reduced visual acuity. Despite previous efforts to explain it, the genetic cause is still unknown. Here, we study the genetic cause of his albinism and, making use of whole-genome sequencing data, find a higher inbreeding coefficient than in other gorillas. RESULTS: We successfully identified the causal genetic variant for Snowflake's albinism, a non-synonymous single nucleotide variant located in a transmembrane region of SLC45A2. This transporter is known to be involved in oculocutaneous albinism type 4 (OCA4) in humans. We provide experimental evidence that this amino acid replacement alters the membrane-spanning capability of this transmembrane region. Finally, we provide a comprehensive study of genome-wide patterns of autozygosity revealing that Snowflake's parents were related, this being the first report of inbreeding in a wild-born Western lowland gorilla. CONCLUSIONS: In this study we demonstrate how whole-genome sequencing can be extended to link genotype and phenotype in non-model organisms, and how, with the expected decrease in sequencing cost, it can be a powerful tool in conservation genetics (e.g., inbreeding and genetic diversity).
Abstract:
Variation in body iron is associated with or causes diseases, including anaemia and iron overload. Here, we analyse genetic association data on biochemical markers of iron status from 11 European-population studies, with replication in eight additional cohorts (total up to 48,972 subjects). We find 11 genome-wide-significant (P < 5 × 10⁻⁸) loci, some including known iron-related genes (HFE, SLC40A1, TF, TFR2, TFRC, TMPRSS6) and others novel (ABO, ARNTL, FADS2, NAT2, TEX14). SNPs at ARNTL, TF, and TFR2 affect iron markers in HFE C282Y homozygotes at risk for hemochromatosis. There is substantial overlap between our iron loci and loci affecting erythrocyte and lipid phenotypes. These results will facilitate investigation of the roles of iron in disease.
Abstract:
The contribution of genes, environment and gene-environment interactions to sleep disorders is increasingly recognized. Well-documented familial and twin studies of sleep disorders suggest an important influence of genetic factors. However, only a few sleep disorders have an established genetic basis, including four rare diseases that may result from a single gene mutation: fatal familial insomnia, familial advanced sleep-phase syndrome, chronic primary insomnia, and narcolepsy with cataplexy. Most sleep disorders are complex in terms of their genetic susceptibility, and the expressivity of the phenotype is variable even within the same family. Recent linkage, genome-wide and candidate gene association studies have identified gene mutations, gene localizations, or evidence for susceptibility genes and/or loci in several sleep disorders. Molecular techniques, mainly genome-wide linkage and association studies, are still required to identify the contribution of new genes. The identified genetic susceptibility determinants will provide clues to better understand the pathogenesis of sleep disorders, to assess disease risk and to find new drug targets to treat and prevent the underlying conditions. Here we review the genetic basis of most key sleep disorders.
Abstract:
Background/Purpose: Gout is a common and excruciatingly painful inflammatory arthritis caused by hyperuricemia. In addition to various lifestyle risk factors, a substantial genetic predisposition to gout has long been recognized. The Global Urate Genetics Consortium (GUGC) has aimed to comprehensively investigate the genetics of serum uric acid and gout using data from >140,000 individuals of European ancestry, 8,340 individuals of Indian ancestry, 5,820 African-Americans, and 15,286 Japanese. Methods: We performed discovery GWAS meta-analyses of serum urate levels (n = 110,347 individuals) followed by replication analyses (n = 32,813 different individuals). Our gout analysis involved 3,151 cases and 68,350 controls, including 1,036 incident gout cases that met the American College of Rheumatology Criteria. We also examined the association of gout with fractional excretion of uric acid (n = 6,799). A weighted genetic urate score was constructed based on the number of risk alleles across urate-associated loci, and its association with the risk of gout was evaluated. Furthermore, we examined implicated transcript expression in cis (expression quantitative trait loci databases) for potential insights into the gene underlying the association signal. Finally, in order to further identify urate-associated genomic regions, we performed functional network analyses that incorporated prior knowledge on molecular interactions in which the gene products of implicated genes operate. Results: We identified and replicated 28 genome-wide significant loci in association with serum urate (P < 5 × 10⁻⁸), including all previously reported loci as well as 18 novel genetic loci. Unlike the majority of previously identified loci, none of the novel loci appeared to be obvious candidates for urate transport. Rather, they were mapped to genes that encode for purine production, transcription, or growth factors with broad downstream responses.
Besides SLC2A9 and ABCG2, no additional regions contained SNPs that differed significantly (P < 5 × 10⁻⁸) between sexes. Urate-increasing alleles were associated with an increased risk of gout for all loci. The urate genetic risk score (ranging from 10 to 45) was significantly associated with increased odds of prevalent gout (OR per unit increase, 1.11; 95% CI, 1.09-1.14) and incident gout (OR, 1.10; 95% CI, 1.08-1.13). Associations for many of the loci were of similar magnitude in individuals of non-European ancestry. Detailed characterization of the loci revealed associations with transcript expression and the fractional excretion of urate. Network analyses implicated the inhibins-activins signaling pathways and glucose metabolism in systemic urate control. Conclusion: The novel genetic candidates identified in this urate/gout consortium study, the largest to date, highlight the importance of metabolic control of urate production and urate excretion. Modulation by signaling processes that influence metabolic pathways such as glycolysis and the pentose phosphate pathway appears to be a central mechanism underpinned by the novel GWAS candidates. These findings may have implications for further research into urate-lowering drugs to treat and prevent gout.
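As an illustration of how a weighted genetic risk score of the kind described in this abstract can be computed, the following minimal Python sketch sums risk-allele counts multiplied by per-locus weights and converts a score difference into an odds ratio. The locus names and weights are hypothetical placeholders, and the abstract does not state how its weights were derived; only the per-unit odds ratio of 1.11 is taken from the text.

```python
def weighted_risk_score(dosages, weights):
    """Weighted genetic score: sum over loci of (risk-allele count x weight).

    dosages: dict mapping locus name -> number of risk alleles (0, 1 or 2)
    weights: dict mapping locus name -> per-allele weight (hypothetical values)
    """
    if dosages.keys() != weights.keys():
        raise ValueError("dosages and weights must cover the same loci")
    for locus, count in dosages.items():
        if count not in (0, 1, 2):
            raise ValueError(f"invalid allele count at {locus}: {count}")
    return sum(weights[locus] * dosages[locus] for locus in weights)


def odds_ratio_for_difference(or_per_unit, score_difference):
    """Odds ratio implied by a score difference under a log-linear
    (logistic) model, where each unit multiplies the odds by or_per_unit."""
    return or_per_unit ** score_difference


# Hypothetical three-locus example (names and weights are made up).
example_weights = {"locus_A": 0.5, "locus_B": 0.25, "locus_C": 1.0}
example_dosages = {"locus_A": 2, "locus_B": 1, "locus_C": 0}
example_score = weighted_risk_score(example_dosages, example_weights)

# Using the per-unit OR of 1.11 reported for prevalent gout:
# odds multiplier for two people whose scores differ by 5 units.
five_unit_or = odds_ratio_for_difference(1.11, 5)
```

Under this log-linear assumption, odds ratios multiply across units, so a 5-unit difference corresponds to 1.11 raised to the fifth power rather than five times 1.11.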
Abstract:
Abstract: The human body is composed of a huge number of cells acting together in a concerted manner. The current understanding is that proteins perform most of the activities necessary to keep a cell alive. The DNA, on the other hand, stores the information on how to produce the different proteins in the genome. Regulating gene transcription is thus the first important step that can affect the life of a cell, modifying its functions and its responses to the environment. Regulation is a complex operation that involves specialized proteins, the transcription factors. Transcription factors (TFs) can bind to DNA and activate the processes leading to the expression of genes into new proteins. Errors in this process may lead to diseases. In particular, some transcription factors have been associated with a lethal pathological state, commonly known as cancer, characterized by uncontrolled cellular proliferation, invasion of healthy tissues and abnormal responses to stimuli. Understanding cancer-related regulatory programs is a difficult task, often involving several TFs interacting together and influencing each other's activity. This thesis presents new computational methodologies to study gene regulation. In addition, we present applications of our methods to the understanding of cancer-related regulatory programs. The understanding of transcriptional regulation is a major challenge. We address this difficult question by combining computational approaches with large collections of heterogeneous experimental data. In detail, we design signal processing tools to recover transcription factor binding sites on the DNA from genome-wide surveys such as chromatin immunoprecipitation assays on tiling arrays (ChIP-chip). We then use the information on where TFs bind to explain the expression levels of regulated genes. In this way we identify a regulatory synergy between two TFs, the oncogene C-MYC and SP1.
C-MYC and SP1 bind preferentially at promoters, and when SP1 binds next to C-MYC on the DNA, the nearby gene is strongly expressed. The association between the two TFs at promoters is reflected by the conservation of the binding sites across mammals and by the permissive underlying chromatin states; it represents an important control mechanism involved in cellular proliferation, and thereby in cancer. Secondly, we identify the characteristics of the target genes of the TF estrogen receptor alpha (hERα) and we study the influence of hERα in regulating transcription. Upon estrogen signaling, hERα binds to DNA to regulate transcription of its targets in concert with its co-factors. To overcome the scarcity of experimental data on the binding sites of other TFs that may interact with hERα, we conduct in silico analysis of the sequences underlying the ChIP sites using the collection of position weight matrices (PWMs) of the hERα partners FOXA1 and SP1. We combine ChIP-chip and ChIP-paired-end-diTags (ChIP-PET) data on hERα binding to DNA with the sequence information to explain gene expression levels in a large collection of cancer tissue samples and in studies of the cellular response to estrogen. We confirm that hERα binding sites are distributed throughout the genome. However, we distinguish between binding sites near promoters and binding sites along the transcripts. The first group shows weak binding of hERα and a high occurrence of SP1 motifs, in particular near estrogen-responsive genes. The second group shows strong binding of hERα and a significant correlation between the number of binding sites along a gene and the strength of gene induction in the presence of estrogen. Some binding sites of the second group also show the presence of FOXA1, but the role of this TF still needs to be investigated. Different mechanisms have been proposed to explain hERα-mediated induction of gene expression.
Our work supports the model of hERα activating gene expression from distal binding sites by interacting with promoter-bound TFs, like SP1. hERα has been associated with survival rates of breast cancer patients, though explanatory models are still incomplete: this result is important to better understand how hERα can control gene expression. Thirdly, we address the difficult question of regulatory network inference. We tackle this problem by analyzing time series of biological measurements such as quantification of mRNA levels or protein concentrations. Our approach uses well-established penalized linear regression models in which we impose sparseness on the connectivity of the regulatory network. We extend this method by enforcing the coherence of the regulatory dependencies: a TF must coherently behave as an activator or as a repressor on all its targets. This requirement is implemented as constraints on the signs of the regressed coefficients in the penalized linear regression model. Our approach is better at reconstructing meaningful biological networks than previous methods based on penalized regression. The method is tested on the DREAM2 challenge of reconstructing a five-gene/TF regulatory network, obtaining the best performance in the "undirected signed excitatory" category. Thus, these bioinformatics methods, which are reliable, interpretable and fast enough to cover large biological datasets, have enabled us to better understand gene regulation in humans.
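The idea of a sparse regression with sign constraints on the coefficients can be sketched as follows. This is an illustrative simplification, not the thesis's actual method: it fits a single target gene by L1-penalized least squares using coordinate descent and assumes the sign of each TF (activator or repressor) is already known, whereas the thesis enforces a coherent sign for each TF across all its targets. The function names and the toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Soft-thresholding operator used by L1-penalized (lasso) regression."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sign_constrained_lasso(X, y, signs, lam, n_iter=200):
    """Coordinate descent for
        minimize (1 / 2n) * ||y - X b||^2 + lam * ||b||_1
        subject to b[j] >= 0 if signs[j] > 0, b[j] <= 0 if signs[j] < 0
    (signs[j] == 0 leaves coefficient j unconstrained).

    X: list of n rows, each a list of p regulator (TF) activity values
    y: list of n expression values of one target gene
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # residual = y - X @ beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            column = [row[j] for row in X]
            scale = sum(v * v for v in column) / n
            if scale == 0.0:
                continue  # constant-zero regulator carries no information
            # Correlation of regulator j with the partial residual
            # (the residual with regulator j's own contribution added back).
            rho = sum(column[i] * (residual[i] + column[i] * beta[j])
                      for i in range(n)) / n
            new_value = soft_threshold(rho, lam) / scale
            # Project onto the allowed sign: activator (>= 0) or repressor (<= 0).
            if signs[j] > 0:
                new_value = max(new_value, 0.0)
            elif signs[j] < 0:
                new_value = min(new_value, 0.0)
            delta = new_value - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= column[i] * delta
                beta[j] = new_value
    return beta


# Toy data: the target is activated by regulator 0 and repressed by regulator 1.
X_toy = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y_toy = [2.0, -1.0, -2.0, 1.0]
coherent = sign_constrained_lasso(X_toy, y_toy, signs=[1, -1], lam=0.1)
forced_activator = sign_constrained_lasso(X_toy, y_toy, signs=[1, 1], lam=0.1)
```

In the toy data, forcing regulator 1 to act as an activator drives its coefficient to zero, which is how a sign constraint removes an edge that would contradict the assumed role of a TF.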
Abstract:
Huntington's disease (HD) pathology is well understood at a histological level but a comprehensive molecular analysis of the effect of the disease in the human brain has not previously been available. To elucidate the molecular phenotype of HD on a genome-wide scale, we compared mRNA profiles from 44 human HD brains with those from 36 unaffected controls using microarray analysis. Four brain regions were analyzed: caudate nucleus, cerebellum, prefrontal association cortex [Brodmann's area 9 (BA9)] and motor cortex [Brodmann's area 4 (BA4)]. The greatest number and magnitude of differentially expressed mRNAs were detected in the caudate nucleus, followed by motor cortex, then cerebellum. Thus, the molecular phenotype of HD generally parallels established neuropathology. Surprisingly, no mRNA changes were detected in prefrontal association cortex, thereby revealing subtleties of pathology not previously disclosed by histological methods. To establish that the observed changes were not simply the result of cell loss, we examined mRNA levels in laser-capture microdissected neurons from Grade 1 HD caudate compared to control. These analyses confirmed changes in expression seen in tissue homogenates; we thus conclude that mRNA changes are not attributable to cell loss alone. These data from bona fide HD brains comprise an important reference for hypotheses related to HD and other neurodegenerative diseases.
Abstract:
Uromodulin is expressed exclusively in the thick ascending limb and is the most abundant protein excreted in normal urine. Variants in UMOD, which encodes uromodulin, are associated with renal function, and urinary uromodulin levels may be a biomarker for kidney disease. However, the genetic factors regulating uromodulin excretion are unknown. We conducted a meta-analysis of urinary uromodulin levels to identify associated common genetic variants in the general population. We included 10,884 individuals of European descent from three genetic isolates and three urban cohorts. Each study measured uromodulin indexed to creatinine and conducted linear regression analysis of approximately 2.5 million single nucleotide polymorphisms using an additive model. We also tested whether variants in genes expressed in the thick ascending limb associate with uromodulin levels. rs12917707, located near UMOD and previously associated with renal function and CKD, had the strongest association with urinary uromodulin levels (P<0.001). In all cohorts, carriers of a G allele of this variant had higher uromodulin levels than noncarriers did (geometric means 10.24, 14.05, and 17.67 μg/g creatinine for zero, one, or two copies of the G allele). rs12446492 in the adjacent gene PDILT (protein disulfide isomerase-like, testis expressed) also reached genome-wide significance (P<0.001). Regarding genes expressed in the thick ascending limb, variants in KCNJ1, SORL1, and CAB39 associated with urinary uromodulin levels. These data indicate that common variants in the UMOD promoter region may influence urinary uromodulin levels. They also provide insights into uromodulin biology and the association of UMOD variants with renal function.
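The additive model mentioned in this abstract codes each genotype as the number of copies of an allele (0, 1 or 2) and regresses the trait on that count. The minimal Python sketch below fits such a model by ordinary least squares. As a loudly simplified illustration, it treats the three geometric means reported in the abstract as three equally weighted data points on the natural-log scale; the studies fitted individual-level data, and neither the log transformation nor the equal weighting is stated in the abstract.

```python
import math


def fit_additive_model(allele_counts, trait_values):
    """Ordinary least squares fit of trait = intercept + slope * allele_count.

    allele_counts: number of copies of the tested allele per observation (0, 1, 2)
    trait_values:  trait value per observation
    Returns (intercept, slope); slope is the per-allele effect.
    """
    n = len(allele_counts)
    if n != len(trait_values) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(allele_counts) / n
    mean_y = sum(trait_values) / n
    sxx = sum((x - mean_x) ** 2 for x in allele_counts)
    if sxx == 0.0:
        raise ValueError("allele counts do not vary; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(allele_counts, trait_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Geometric means from the abstract (micrograms per gram creatinine)
# for zero, one and two copies of the G allele, on the natural-log scale.
g_copies = [0, 1, 2]
log_uromodulin = [math.log(10.24), math.log(14.05), math.log(17.67)]
intercept, per_allele_log_effect = fit_additive_model(g_copies, log_uromodulin)

# Back-transformed: multiplicative change in uromodulin per additional G allele.
per_allele_fold_change = math.exp(per_allele_log_effect)
```

With these three points, the fitted per-allele effect corresponds to roughly a 1.3-fold higher uromodulin level per additional G allele; an analysis of individual-level data would weight genotypes by their frequency and could give a somewhat different estimate.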
Abstract:
Previous studies in the lab of Dr. Liliane Michalik have shown that the nuclear hormone receptor Peroxisome Proliferator Activated Receptor beta/delta (PPARβ/δ) is an important regulator of skin homeostasis, being involved in the regulation of keratinocyte differentiation, inflammation, apoptosis, and mouse skin wound healing. Studies of PPARβ/δ knockout mice have suggested a possible role for this receptor in cancer. However, contradictory observations of the role of PPARβ/δ in tumor growth have been published, depending on cellular contexts and biological models. Given the controversial role of PPARβ/δ in skin carcinoma development, the main aim of this PhD work has been to further explore the implication of PPARβ/δ in the skin response to UV and in skin tumor growth. This PhD dissertation is divided into four chapters. The first chapter describes the core part of the project, where I explored the changes in miRNA expression in the skin upon chronic UV irradiation of PPARβ/δ wild-type and knockout mice. This analysis shed light on a miRNA-PPARβ/δ signature and also predicted that miR-21-3p (previously named miR-21*) is a key regulator of the PPARβ/δ-dependent UV response in pre-lesional skin. Using acutely UV-irradiated mice, I further demonstrated that miR-21-3p is indirectly regulated by PPARβ/δ through activation of Transforming Growth Factor (TGFβ)-1 under UV exposure. I also show that miR-21-3p is deregulated in human cutaneous squamous cell carcinoma. In cultured keratinocytes, application of a miR-21-3p mimic oligonucleotide sequence leads to the regulation of lipid metabolism-related pathways. In the second chapter, I demonstrate that a combined mRNA/miRNA bioinformatics analysis leads to the discovery of important pathways involved in the PPARβ/δ-miRNA response of the skin to chronic UV irradiation. Indeed, I validated angiogenesis and lipid metabolism as important functions regulated by PPARβ/δ in this context.
In the third chapter, we demonstrate that PPARβ/δ knockout mice have a decreased incidence of cutaneous squamous cell carcinomas compared to wild-type mice and that PPARβ/δ directly activates the cSrc kinase gene. In the last chapter, we review novel insights into PPAR functions in keratinocytes and liver, with emphasis on PPARβ/δ but also on PPARα. In summary, this PhD study shows that i) PPARβ/δ is able to regulate biological function through regulation of miRNAs, and specifically through miR-21-3p, the passenger miRNA of the oncomiR miR-21, and that ii) the PPARβ/δ-dependent skin response to UV involves the regulation of angiogenesis and lipid metabolism. Furthermore, the bioinformatics study highlights the relevance of performing integrated mRNA and miRNA genome-wide studies in order to better screen mRNAs and/or miRNAs of interest in the biological context of diseases. - Previous studies in the laboratory of Dr. Liliane Michalik have shown that the nuclear receptor PPARβ/δ is an important regulator of skin homeostasis, being involved in the regulation of keratinocyte differentiation, inflammation, apoptosis and skin wound healing in mice. Studies of PPARβ/δ knockout mice have suggested a possible role for this receptor in cancer. However, conflicting observations have been published, suggesting a pro- or anti-cancer role depending on the tissue and cell type involved. Given this controversy over the role of PPARβ/δ in skin cancer development, the main goal of my research project has been to explore further the role of PPARβ/δ in the skin response to UV and in cancer development. This thesis is divided into four parts.
The first part, the core of my research, describes the discovery of the involvement of microRNAs (miRNAs) in the PPARβ/δ-dependent response to UV, and more specifically the involvement of the miRNA miR-21-3p (previously named miR-21*). Using a model of acutely UV-irradiated mice, we show that the regulation of miR-21-3p is PPARβ/δ-dependent and that this regulation occurs through transforming growth factor TGFβ-1. In cultured human keratinocytes, transfection of an oligonucleotide sequence mimicking miR-21-3p (mimic) reveals the involvement of miR-21-3p in functions important for cancer development, such as lipid metabolism. In a second chapter, we show that a bioinformatics method combining messenger RNA and miRNA expression highlights biological functions that are important in the PPARβ/δ response to chronic irradiation. Angiogenesis, oxidative stress and lipid metabolism are among the functions regulated by PPARβ/δ in UV-irradiated skin. We also demonstrate the regulation of the Lpcat3 gene by PPARβ/δ in UV-irradiated skin as well as in human keratinocytes, suggesting a role for PPARβ/δ in membrane lipid remodeling. In a third part, we establish a link between regulation of the Src oncogene and activation of PPARβ/δ in cutaneous squamous cell carcinomas. Finally, in a fourth chapter, we review recent research on the role of PPARβ/δ and PPARα in the liver and the skin. In summary, this thesis project advances research on the involvement of PPARβ/δ in the skin response to UV. For the first time, a link is established between this transcription factor and the regulation of microRNAs in the context of squamous cell carcinoma.
Until now overshadowed by miR-21-5p, miR-21-3p is in fact strongly increased both in a mouse model of UV irradiation and in human squamous cell carcinoma. New biological functions for PPARβ/δ were also identified in this work, such as the regulation of angiogenesis and of lipid metabolism in the skin. In addition, this dissertation highlights the value of combining laboratory work with bioinformatics.
Abstract:
Cells are subjected to dramatic changes of gene expression upon environmental changes. Stress causes a general down-regulation of gene expression together with the induction of a set of stress-responsive genes. The p38-related stress-activated protein kinase Hog1 is an important regulator of transcription upon osmostress in yeast. Genome-wide localization studies of RNA polymerase II (RNA Pol II) and Hog1 showed that stress induced major changes in RNA Pol II localization, with a shift toward stress-responsive genes relative to housekeeping genes. RNA Pol II relocalization required Hog1, which was also localized to stress-responsive loci. In addition to RNA Pol II-bound genes, Hog1 also localized to RNA polymerase III-bound genes, pointing to a wider role for Hog1 in transcriptional control than initially expected. Interestingly, an increasing association of Hog1 with stress-responsive genes was strongly correlated with chromatin remodeling and increased gene expression. Remarkably, MNase-Seq analysis showed that although chromatin structure was not significantly altered at a genome-wide level in response to stress, there was pronounced chromatin remodeling for those genes that displayed Hog1 association. Hog1 serves to bypass the general down-regulation of gene expression that occurs in response to osmostress, and does so both by targeting RNA Pol II machinery and by inducing chromatin remodeling at stress-responsive loci.
Abstract:
We propose a novel multifactor dimensionality reduction method for epistasis detection in small or extended pedigrees, FAM-MDR. It combines features of the Genome-wide Rapid Association using Mixed Model And Regression approach (GRAMMAR) with Model-Based MDR (MB-MDR). We focus on continuous traits, although the method is general and can be used for outcomes of any type, including binary and censored traits. When comparing FAM-MDR with Pedigree-based Generalized MDR (PGMDR), which is a generalization of Multifactor Dimensionality Reduction (MDR) to continuous traits and related individuals, FAM-MDR was found to outperform PGMDR in terms of power, in most of the considered simulated scenarios. Additional simulations revealed that PGMDR does not appropriately deal with multiple testing and consequently gives rise to overly optimistic results. FAM-MDR adequately deals with multiple testing in epistasis screens and is in contrast rather conservative, by construction. Furthermore, simulations show that correcting for lower order (main) effects is of utmost importance when claiming epistasis. As Type 2 Diabetes Mellitus (T2DM) is a complex phenotype likely influenced by gene-gene interactions, we applied FAM-MDR to examine data on glucose area-under-the-curve (GAUC), an endophenotype of T2DM for which multiple independent genetic associations have been observed, in the Amish Family Diabetes Study (AFDS). This application reveals that FAM-MDR makes more efficient use of the available data than PGMDR and can deal with multi-generational pedigrees more easily. In conclusion, we have validated FAM-MDR and compared it to PGMDR, the current state-of-the-art MDR method for family data, using both simulations and a practical dataset. FAM-MDR is found to outperform PGMDR in that it handles the multiple testing issue more correctly, has increased power, and efficiently uses all available information.
Abstract:
Copy number variation (CNV) has recently gained considerable interest as a source of genetic variation likely to play a role in phenotypic diversity and evolution. Much effort has been put into the identification and mapping of regions that vary in copy number among seemingly normal individuals in humans and a number of model organisms, using bioinformatics or hybridization-based methods. These efforts have uncovered associations between copy number changes and complex diseases in whole-genome association studies, and have identified new genomic disorders. At the genome-wide scale, however, the functional impact of CNV remains poorly studied. Here we review the current catalogs of CNVs, their association with diseases and how they link genotype and phenotype. We describe initial evidence that genes in CNV regions are expressed at lower and more variable levels than genes mapping elsewhere, and that CNV not only affects the expression of genes varying in copy number but also has a global influence on the transcriptome. Further studies are warranted for complete cataloguing and fine mapping of CNVs, as well as to elucidate the different mechanisms by which they influence gene expression.
Abstract:
Abstract: Although the genomes of any two human individuals are more than 99.99% identical at the sequence level, some structural variation can be observed. Differences between genomes include single nucleotide polymorphisms (SNPs), inversions and copy number changes (gain or loss of DNA). The latter can range from submicroscopic events (CNVs, at least 1 kb in size) to complete chromosomal aneuploidies. Small copy number variations often have no (lethal) consequences for the cell, but a few have been associated with disease susceptibility and phenotypic variation. Larger rearrangements (e.g. complete chromosome gain) are frequently associated with more severe consequences for health, such as genomic disorders and cancer. High-throughput technologies like DNA microarrays enable the detection of CNVs in a genome-wide fashion. Since the initial catalogue of CNVs in the human genome in 2006, there has been tremendous interest in CNVs in the context of both population and medical genetics. Understanding CNV patterns within and between human populations is essential to elucidate their possible contribution to disease. But genome analysis is a challenging task; the technology evolves rapidly, creating needs for novel, efficient and robust analytical tools, which need to be compared with existing ones. Also, while the link between CNV and disease has been established, the relative CNV contribution is not fully understood and the predisposition to disease from CNVs of the general population has not yet been investigated. During my PhD thesis, I worked on several aspects related to CNVs. As I will report in chapter 3, I was interested in computational methods to detect CNVs from the general population. I had access to the CoLaus dataset, a population-based study with more than 6,000 participants from the Lausanne area. All these individuals were analysed on SNP arrays, and extensive clinical information was available.
My work explored existing CNV detection methods, and I developed a variety of metrics to compare their performance. Since these methods were not producing entirely satisfactory results, I implemented my own method, which outperformed two existing methods. I also devised strategies to combine CNVs from different individuals into CNV regions. I was also interested in the clinical impact of CNVs in common disease (chapter 4). Through an international collaboration led by the Centre Hospitalier Universitaire Vaudois (CHUV) and Imperial College London, I was involved as a main data analyst in the investigation of a rare deletion at chromosome 16p11 detected in obese patients. Specifically, we compared 8,456 obese patients and 11,856 individuals from the general population and found that the deletion accounted for 0.7% of the morbid obesity cases and was absent in healthy non-obese controls. This highlights the importance of rare variants with strong impact and provides new insights into the design of clinical studies to identify the missing heritability in common disease. Furthermore, I was interested in the detection of somatic copy number alterations (SCNA) and their consequences in cancer (chapter 5). This project was a collaboration initiated by the Ludwig Institute for Cancer Research and involved other groups from the Swiss Institute of Bioinformatics, the CHUV and the Universities of Lausanne and Geneva. The focus of my work was to identify genes with altered expression levels within SCNAs in seven metastatic melanoma cell lines, using CGH and SNP arrays, RNA-seq, and karyotyping. Very few SCNA genes were shared by even two melanoma samples, making it difficult to draw any conclusions at the individual gene level. To overcome this limitation, I used a network-guided analysis to determine whether any pathways, defined by amplified or deleted genes, were common among the samples.
Six of the melanoma samples were potentially altered in four pathways, and five samples harboured copy-number and expression changes in components of six pathways. In total, this approach identified 28 pathways. Validation with two large external melanoma datasets confirmed all but three of the detected pathways and demonstrated the utility of network-guided approaches for the analysis of both large and small datasets.

Summary

Although the genomes of two individuals are more than 99.99% similar, structural differences can be observed. These differences include single nucleotide polymorphisms, inversions and copy number changes (gain or loss of DNA). The latter range from small, so-called submicroscopic events (at least 1 kb in size), called CNVs (copy number variants), to larger events that can affect entire chromosomes. Small variations generally have no consequence for the cell; however, some have been implicated in predisposition to certain diseases and in phenotypic variation in the general population. Larger rearrangements (for example, an additional copy of a chromosome, commonly called a trisomy) have more serious repercussions for health, as in certain genomic syndromes and in cancer. High-throughput technologies such as DNA microarrays allow the detection of CNVs at the scale of the human genome. The mapping of CNVs in the human genome in 2006 generated strong interest in population genetics and medical genetics. Detecting differences within and between populations is a key element in elucidating the possible contribution of CNVs to disease. However, genome analysis remains a difficult task: the technology evolves very rapidly, creating new needs for the development of tools, the improvement of existing ones and the comparison of different methods.
Moreover, although the link between CNVs and disease has been established, their precise contribution is not yet understood. Likewise, studies of disease predisposition conferred by CNVs detected in the general population have not yet been carried out.

During my doctorate, I concentrated on three main themes related to CNVs. In chapter 3, I detail my work on methods for analysing DNA microarrays. I had access to data from the CoLaus project, a study of the population of Lausanne. In this study, the genomes of more than 6,000 individuals were analysed with SNP arrays and extensive clinical information was collected. During my work, I used and compared several CNV detection methods. Since the results were not completely satisfactory, I implemented my own method, which performs better than two of the three other methods used. I also investigated strategies for combining CNVs from different individuals into regions.

I was also interested in the clinical impact of CNVs in common genetic diseases (chapter 4). This project was made possible by a close collaboration with the Centre Hospitalier Universitaire Vaudois (CHUV) and Imperial College London. In this project, I was one of the main analysts and worked on the clinical impact of a rare deletion at chromosome 16p11 found in patients with obesity. In this multidisciplinary collaboration, we compared 8,456 patients with obesity and 11,856 individuals from the general population. We found that the deletion was implicated in 0.7% of morbid obesity cases and was absent in healthy (non-obese) controls. Our study illustrates the importance of rare CNVs, which can have a very strong clinical impact.
Furthermore, this suggests an alternative to association studies for improving our understanding of the aetiology of common genetic diseases.

I also worked on the detection of somatic copy number alterations (SCNAs) and their consequences in cancer (chapter 5). This project was a collaboration initiated by the Ludwig Institute for Cancer Research and involving the Swiss Institute of Bioinformatics, the CHUV and the Universities of Lausanne and Geneva. I focused on identifying genes affected by SCNAs and showing over- or under-expression in cell lines derived from metastatic melanomas. The data used were generated by DNA microarrays (CGH and SNP) and high-throughput sequencing of the transcriptome. My research showed that few genes are recurrently affected across melanomas, which makes the results difficult to interpret. To work around these limitations, I used a network analysis to determine whether signalling pathways enriched in amplified or lost genes were common to the different samples. Of the 28 pathways detected, four are potentially deregulated in six melanomas, and six further pathways are affected in five melanomas. Validation of these results with two large public datasets confirmed all but three of these pathways. This demonstrates the usefulness of this approach for the analysis of both small and large datasets.

Lay summary

The advent of molecular biology, particularly over the last ten years, has revolutionised research in medical genetics. Thanks to the availability of the human reference genome from 2001, new technologies such as DNA microarrays emerged and made it possible to study the genome as a whole at a so-called submicroscopic resolution, previously impossible with traditional cytogenetic techniques.
One of the most important examples is the study of structural variation in the genome, in particular the study of gene copy number. It had been established as early as 1959, with the identification of trisomy 21 by Professor Jérôme Lejeune, that the gain of an extra chromosome was the cause of a genetic syndrome with serious repercussions for the patient's health. Similar observations were made in oncology on cancer cells, which frequently accumulate copy number aberrations (such as the loss or gain of one or more chromosomes). From 2004 onwards, several research groups catalogued copy number changes in individuals from the general population (that is, without visible clinical symptoms). In 2006, Dr Richard Redon established the first map of copy number variation in the general population. These discoveries demonstrated that variations in the genome are frequent and that most of them are benign, that is, without clinical consequence for the individual's health. This generated great interest in understanding natural variation between individuals and also in better understanding genetic predisposition to certain diseases.

During my thesis, I developed new computational tools for the analysis of DNA microarrays in order to map these variations at the genomic scale. I used these tools to characterise the variations in the Swiss population, and I then devoted myself to the study of factors that could explain predisposition to diseases such as obesity. This study, in collaboration with the Centre Hospitalier Universitaire Vaudois, led to the identification of a deletion on chromosome 16 explaining 0.7% of morbid obesity cases. This study has several implications.
First, it makes it possible to carry out diagnosis in unborn children in order to determine their predisposition to obesity. Second, this locus involves about twenty genes. This makes it possible to formulate new working hypotheses and to direct research towards improving our understanding of the disease, with the hope of discovering a new treatment. Finally, our study provides an alternative to genetic association studies, which have so far had only mixed success.

In the last part of my thesis, I turned to the analysis of copy number aberrations in cancer. I chose to study melanomas, which are involved in skin cancer. Melanoma is a very aggressive tumour: it is responsible for 80% of skin cancer deaths and is often resistant to the treatments used in oncology (chemotherapy, radiotherapy). As part of a collaboration between the Ludwig Institute for Cancer Research, the Swiss Institute of Bioinformatics, the CHUV and the Universities of Lausanne and Geneva, we sequenced the exome (the genes) and the transcriptome (gene expression) of seven metastatic melanomas, and performed copy number analyses using DNA microarrays as well as karyotyping. My work enabled the development of new analysis methods adapted to cancer, the compilation of the list of cell signalling pathways recurrently affected in melanoma, and the identification of two potential therapeutic targets previously overlooked in skin cancers.
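The abstract above mentions strategies for combining CNVs from different individuals into CNV regions but does not describe them. As a minimal sketch, assuming a CNV region is simply the union of overlapping calls on the same chromosome (an illustrative assumption, not necessarily the strategy used in the thesis), the idea could look like this in Python. The function name `merge_cnvs_into_regions` and the `(sample, chrom, start, end)` tuple layout are hypothetical.

```python
from collections import defaultdict


def merge_cnvs_into_regions(cnv_calls):
    """Merge overlapping CNV calls from different individuals into regions.

    cnv_calls: iterable of (sample_id, chrom, start, end) tuples, with
    start <= end (hypothetical layout, chosen for illustration only).
    Returns a list of dicts with keys chrom, start, end and samples.
    """
    by_chrom = defaultdict(list)
    for sample, chrom, start, end in cnv_calls:
        if start > end:
            raise ValueError(f"start > end for sample {sample}")
        by_chrom[chrom].append((start, end, sample))

    regions = []
    for chrom in sorted(by_chrom):
        current = None
        # Sweep calls by start position; extend the open region while
        # the next call overlaps it, otherwise close it and start anew.
        for start, end, sample in sorted(by_chrom[chrom]):
            if current is not None and start <= current["end"]:
                current["end"] = max(current["end"], end)
                current["samples"].add(sample)
            else:
                if current is not None:
                    regions.append(current)
                current = {"chrom": chrom, "start": start,
                           "end": end, "samples": {sample}}
        if current is not None:
            regions.append(current)
    return regions


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    calls = [("A", "16", 100, 200), ("B", "16", 150, 300),
             ("C", "16", 400, 500), ("A", "1", 10, 20)]
    for region in merge_cnvs_into_regions(calls):
        print(region["chrom"], region["start"], region["end"],
              sorted(region["samples"]))
```

A plain union tends to chain loosely overlapping calls into very long regions, so a stricter rule (for example a minimum reciprocal overlap) may be preferable in practice; that choice is left open here.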
Resumo:
Context: It is now clearly established that genetic factors, in association with the environment, play a key role in obesity and eating disorders. This project studies the clinical symptoms and molecular abnormalities in patients carrying a strong hereditary predisposition to obesity and eating behavior disorders. We have previously published the association between the 16:29.5-30.1 deletion and a highly penetrant form of morbid obesity and macrocephaly. We have also demonstrated the association between the reciprocal 16:29.5-30.1 duplication and underweight and small head circumference. These two studies demonstrate that the dosage of one or several genes in this region regulates BMI as well as brain growth. At present, there are no data pointing towards particular candidate genes. We are currently investigating a second, non-overlapping recurrent CNV encompassing SH2B1, upstream of the aforementioned rearrangement. SNPs in this gene have been associated with BMI in GWAS, and mouse models have confirmed this association. Bochukova et al. have reported an association between deletions encompassing this gene and severe early-onset obesity, as well as insulin resistance. We are currently collecting and analyzing data to fully characterize the phenotype and the transcriptional patterns associated with this rearrangement.

Aims:
1. Identify carriers of any CNVs in the greater 16p11.2 region (between 16:28 Mb and 16:32 Mb) in the EGG consortium.
2. Perform association studies between SNPs in the greater 16p11.2 region (16:28-32 Mb) and anthropometric measures with adjusted "locus-wide significance", to identify or prioritize candidate genes potentially driving the association observed in patients with the CNVs (and thus worthy of further validation and sequencing).
3. Explore associations between GSV genome-wide and brain volume.
4. Explore the relationship between brain volumes (whole brain, and regional for those who underwent brain MRI), head circumference and BMI.
5. Extrapolate this procedure to other regions covered by the Metabochip.

Methods:
- Examine and collect clinical and molecular information in these patients.
- Analyze MRI data in children and adults with BMI > 2 SD. Compare changes with MRI data obtained in patients with monogenic forms of obesity (data from the Lausanne study) and in underweight (BMI < -2 SD) individuals from EGG.
- Test whether opposite extremes of the phenotypic distribution may be highly informative.

Expected results: This is a highly focused study, pertaining to approximately 1‰ (one per thousand) of the human genome. Yet it is clear that, if successful, the lessons learned from this study could be extrapolated to other segments of the genome and would need validation and replication by additional studies. Altogether, these results will help to further explore the missing heritability and point to etiologic genes and pathways underlying these important health burdens.
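Aim 2 refers to an adjusted "locus-wide significance", and the expected results describe the region as roughly one per thousand of the genome. The following is a minimal sketch of both quantities. It assumes a Bonferroni correction over the number of SNPs tested in the region (the abstract does not state which correction is intended) and an approximate haploid genome length of 3,100 Mb. The function names and the example SNP count are hypothetical.

```python
def locus_wide_threshold(n_tests, alpha=0.05):
    """Bonferroni-style p-value threshold for n_tests SNPs in one locus.

    Assumption: the tests are treated as independent, which is
    conservative when SNPs are in linkage disequilibrium.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def fraction_of_genome(start_mb, end_mb, genome_mb=3100.0):
    """Fraction of the genome covered by a region given in megabases.

    genome_mb is an approximate haploid human genome length (assumption).
    """
    if end_mb < start_mb:
        raise ValueError("end_mb must be >= start_mb")
    return (end_mb - start_mb) / genome_mb


if __name__ == "__main__":
    # Hypothetical number of SNPs tested in the 16:28-32 Mb window.
    n_snps = 1000
    print("locus-wide threshold:", locus_wide_threshold(n_snps))
    print("fraction of genome:", fraction_of_genome(28, 32))
```

Under these assumptions the 4 Mb window covers a little over one thousandth of the genome, which is consistent with the figure quoted in the expected results.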