971 results for Genetic clustering analysis
Abstract:
Studies of large sets of SNP data have proven to be a powerful tool in the analysis of the genetic structure of human populations. In this work, we analyze genotyping data for 2,841 SNPs in 12 Sub-Saharan African populations, including a previously unsampled region of south-eastern Africa (Mozambique). We show that robust results in a world-wide perspective can be obtained when analyzing only 1,000 SNPs. Our main results both confirm those of previous studies and reveal new features of Sub-Saharan African genetic complexity. There is a strong differentiation of Nilo-Saharans, far beyond what would be expected from geography. Hunter-gatherer populations (Khoisan and Pygmies) are clearly distinct, with genetic features intrinsic to Pygmies (and not only to Khoisan). Populations of West Africa show an unexpected similarity to one another, possibly the result of a population expansion. Finally, we find a strong differentiation of the south-eastern Bantu population from Mozambique, which suggests an assimilation of a pre-Bantu substrate by Bantu speakers in the region.
Abstract:
Background: Germline genetic variation is associated with the differential expression of many human genes. The phenotypic effects of this type of variation may be important when considering susceptibility to common genetic diseases. Three regions at 8q24 have recently been identified to independently confer risk of prostate cancer. Variation at 8q24 has also recently been associated with risk of breast and colorectal cancer. However, none of the risk variants map at or relatively close to known genes, with c-MYC mapping a few hundred kilobases distally. Results: This study identifies cis-regulators of germline c-MYC expression in immortalized lymphocytes of HapMap individuals. Quantitative analysis of c-MYC expression in normal prostate tissues suggests an association between overexpression and variants in Region 1 of prostate cancer risk. Somatic c-MYC overexpression correlates with prostate cancer progression and more aggressive tumor forms, which was also a pathological variable associated with Region 1. Expression profiling analysis and modeling of transcriptional regulatory networks predict a functional association between MYC and the prostate tumor suppressor KLF6. Analysis of MYC/Myc-driven cell transformation and tumorigenesis substantiates a model in which MYC overexpression promotes transformation by down-regulating KLF6. In this model, a feedback loop through E-cadherin down-regulation causes further transactivation of c-MYC. Conclusion: This study proposes that variation at putative 8q24 cis-regulator(s) of transcription can significantly alter germline c-MYC expression levels and, thus, contribute to prostate cancer susceptibility by down-regulating the prostate tumor suppressor KLF6 gene.
Abstract:
Malaria has occurred in the Cabo Verde archipelago with epidemic characteristics since its colonization. Nowadays, it occurs on Santiago Island alone, and though prophylaxis is not recommended by the World Health Organization, studies have highlighted the prospect of malaria becoming a serious public health problem as a result of antimalarial drug resistance associated with mutations in the parasite populations, and they underscore the need for tighter surveillance. Despite the presumptive weak immune status of the population, severe symptoms of malaria are not observed and many people present a subclinical course of the disease. No data on the prevalence of sickle-cell trait and red cell glucose-6-phosphate dehydrogenase deficiency (two classical genetic factors associated with resistance to severe malaria) were available for the Cabo Verde archipelago and, therefore, we studied the low morbidity from malaria in relation to the particular genetic characteristics of the human host population. We also included the analysis of the gene associated with pyruvate kinase deficiency, reported as putatively associated with resistance to the disease. Allelic frequencies of the polymorphisms examined are closer to those of European than of African populations, and no malaria selection signatures were found. No association was found between the analyzed human factors and infection, but one result is of high interest: a linkage disequilibrium test revealed an association of distant loci in the PKLR gene and adjacent regions, only in non-infected individuals. This could mean a more conserved gene region selected in association with protection against the infection and/or the disease.
Abstract:
Jeremiah Bernier-Latmani, Institute of Pathology, University of Lausanne, CHUV. CTCFL was first identified as a paralog of the ubiquitous protein CTCF because of high homology between their respective eleven zinc fingers, a DNA binding domain. Among its many roles, CTCF zinc finger-mediated binding to the unmethylated maternal Igf2/H19 imprinting control region (ICR) controls the imprinted (monoallelic) expression of Igf2 and H19 in somatic cells. Methylation of the paternal Igf2/H19 ICR is necessary for the imprinted expression of the two genes. Although the mechanism by which the ICR is methylated is incompletely understood, it is known that establishment of methylation occurs during male germ cell development and that the de novo DNA methyltransferases DNMT3A and DNMT3L are essential. Therefore, CTCFL was a good candidate for a role in methylation of the paternal Igf2/H19 ICR because its expression is restricted to cell types where ICR methylation takes place (spermatogonia and spermatocytes) and because it can bind the Igf2/H19 ICR in these cells. The first experimental work of this thesis investigated the possible role of CTCFL mutations in Silver-Russell syndrome (SRS) patients, 60% of whom have been observed to have reduced methylation of the IGF2/H19 ICR. Reasoning that CTCFL could be mutated in these patients, I screened 35 patients for mutations in CTCFL by DNA sequencing and exon copy number analysis. I did not find any mutations in these patients, suggesting that mutations of CTCFL are not associated with SRS. The next experimental work of my thesis focused on posttranslational modification of CTCFL by small ubiquitin-like modifier (SUMO) protein. SUMO modification of proteins changes their interactions with other molecules (DNA or protein). As CTCFL arguably regulates the expression of a number of genes in cancer and many transcription factors are regulated by SUMO, I conducted experiments to assess whether CTCFL is sumoylated. I found that CTCFL is sumoylated in vitro and in vivo and determined the two residues of SUMO attachment to be lysines 181 and 645. Using K181R/K645R-mutated CTCFL, for which no sumoylation can be detected, I assessed the functional consequences of SUMO modification. I found no significant changes in subcellular localization, half-life or DNA binding, but found that sumoylation modulates both CTCFL-dependent activation and repression of gene expression. This is the first posttranslational modification described for CTCFL, and possible consequences of this modification are discussed in both cancer and normal testis. With this thesis, I hope I have added important findings to the study of CTCFL and provided some ideas for future research.
Abstract:
The emergence of host-races within aphids may constitute an obstacle to pest management by means of plant resistance. There are examples of host-races within cereal aphids, but their occurrence in the Rose Grain Aphid, Metopolophium dirhodum (Walker, 1849), has not yet been reported. In this work, RAPD markers were used to assess the effects of hosts and geographic distance on the genetic diversity of M. dirhodum lineages. Twenty-three clones were collected on oats and wheat in twelve localities of southern Brazil. Of the twenty-seven primers tested, only four showed polymorphisms. Fourteen different genotypes were revealed by cluster analysis. Five genotypes were collected only on wheat, seven only on oats, and two on both hosts. Genetic and geographical distances among all clonal lineages were not correlated. Analysis of molecular variance showed that some molecular markers are not randomly distributed among clonal lineages collected on oats and on wheat. These results suggest the existence of host-races within M. dirhodum, which should be further investigated using a combination of ecological and genetic data.
Abstract:
Background/Purpose: Gout is a common and excruciatingly painful inflammatory arthritis caused by hyperuricemia. In addition to various lifestyle risk factors, a substantial genetic predisposition to gout has long been recognized. The Global Urate Genetics Consortium (GUGC) has aimed to comprehensively investigate the genetics of serum uric acid and gout using data from >140,000 individuals of European ancestry, 8,340 individuals of Indian ancestry, 5,820 African-Americans, and 15,286 Japanese. Methods: We performed discovery GWAS meta-analyses of serum urate levels (n = 110,347 individuals) followed by replication analyses (n = 32,813 different individuals). Our gout analysis involved 3,151 cases and 68,350 controls, including 1,036 incident gout cases that met the American College of Rheumatology Criteria. We also examined the association of gout with fractional excretion of uric acid (n = 6,799). A weighted genetic urate score was constructed based on the number of risk alleles across urate-associated loci, and its association with the risk of gout was evaluated. Furthermore, we examined implicated transcript expression in cis (expression quantitative trait loci databases) for potential insights into the gene underlying the association signal. Finally, in order to further identify urate-associated genomic regions, we performed functional network analyses that incorporated prior knowledge on molecular interactions in which the gene products of implicated genes operate. Results: We identified and replicated 28 genome-wide significant loci in association with serum urate (P < 5 × 10^-8), including all previously reported loci as well as 18 novel genetic loci. Unlike the majority of previously identified loci, none of the novel loci appeared to be obvious candidates for urate transport. Rather, they were mapped to genes that encode purine production, transcription, or growth factors with broad downstream responses. 
Besides SLC2A9 and ABCG2, no additional regions contained SNPs that differed significantly (P < 5 × 10^-8) between sexes. Urate-increasing alleles were associated with an increased risk of gout for all loci. The urate genetic risk score (ranging from 10 to 45) was significantly associated with increased odds of prevalent gout (OR per unit increase, 1.11; 95% CI, 1.09-1.14) and incident gout (OR, 1.10; 95% CI, 1.08-1.13). Associations for many of the loci were of similar magnitude in individuals of non-European ancestry. Detailed characterization of the loci revealed associations with transcript expression and the fractional excretion of urate. Network analyses implicated the inhibins-activins signaling pathways and glucose metabolism in systemic urate control. Conclusion: The novel genetic candidates identified in this urate/gout consortium study, the largest to date, highlight the importance of metabolic control of urate production and urate excretion. The modulation by signaling processes that influence metabolic pathways such as glycolysis and the pentose phosphate pathway appears to be a central mechanism underpinned by the novel GWAS candidates. These findings may have implications for further research into urate-lowering drugs to treat and prevent gout.
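As a rough illustration of the weighted genetic urate score described in this abstract, the sketch below sums risk-allele counts scaled by per-locus weights and then shows what an odds ratio of 1.11 per unit implies over a larger score difference. The genotype, the weights, and both function names are hypothetical; the abstract does not give the actual weights, and the compounding step assumes the usual log-linear (logistic) model in which per-unit odds ratios multiply.

```python
def weighted_risk_score(allele_counts, weights):
    """Sum of risk-allele counts (0, 1 or 2 per locus), each scaled by a per-locus weight."""
    if len(allele_counts) != len(weights):
        raise ValueError("one weight per locus is required")
    return sum(count * weight for count, weight in zip(allele_counts, weights))


def odds_multiplier(or_per_unit, score_difference):
    """Odds ratio implied by a score difference, assuming per-unit odds ratios multiply."""
    return or_per_unit ** score_difference


# Hypothetical individual genotyped at three loci, with made-up weights.
score = weighted_risk_score([2, 1, 0], [1.0, 0.5, 2.0])

# 1.11 is the reported OR per unit increase for prevalent gout.
ten_unit_or = odds_multiplier(1.11, 10)

print(score, round(ten_unit_or, 2))
```

Under that assumption, a 10-unit difference in the score corresponds to roughly 2.8 times the odds of prevalent gout.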
Abstract:
With the trend in molecular epidemiology towards both genome-wide association studies and complex modelling, the need for large sample sizes to detect small effects and to allow for the estimation of many parameters within a model continues to increase. Unfortunately, most methods of association analysis have been restricted to either a family-based or a case-control design, resulting in the lack of synthesis of data from multiple studies. Transmission disequilibrium-type methods for detecting linkage disequilibrium from family data were developed as an effective way of preventing the detection of association due to population stratification. Because these methods condition on parental genotype, however, they have precluded the joint analysis of family and case-control data, although methods for case-control data may not protect against population stratification and do not allow for familial correlations. We present here an extension of a family-based association analysis method for continuous traits that will simultaneously test for, and if necessary control for, population stratification. We further extend this method to analyse binary traits (and therefore family and case-control data together) and accurately to estimate genetic effects in the population, even when using an ascertained family sample. Finally, we present the power of this binary extension for both family-only and joint family and case-control data, and demonstrate the accuracy of the association parameter and variance components in an ascertained family sample.
Abstract:
A recurring task in the analysis of mass genome annotation data from high-throughput technologies is the identification of peaks or clusters in a noisy signal profile. Examples of such applications are the definition of promoters on the basis of transcription start site profiles, the mapping of transcription factor binding sites based on ChIP-chip data and the identification of quantitative trait loci (QTL) from whole genome SNP profiles. Input to such an analysis is a set of genome coordinates associated with counts or intensities. The output consists of a discrete number of peaks with respective volumes, extensions and center positions. We have developed for this purpose a flexible one-dimensional clustering tool, called MADAP, which we make available as a web server and as a standalone program. A set of parameters enables the user to customize the procedure to a specific problem. The web server, which returns results in textual and graphical form, is useful for small to medium-scale applications, as well as for evaluation and parameter tuning in view of large-scale applications that require a local installation. The program, written in C++, can be freely downloaded from ftp://ftp.epd.unil.ch/pub/software/unix/madap. The MADAP web server can be accessed at http://www.isrec.isb-sib.ch/madap/.
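To make the input and output described in this abstract concrete, here is a minimal sketch of one-dimensional peak clustering. It is not MADAP's algorithm, which the abstract does not specify; it swaps in a simple gap rule (a new peak starts when consecutive positions are more than `max_gap` apart). The function names, the `max_gap` parameter, and the example signal are all assumptions made for illustration.

```python
def summarize(members):
    """Describe one peak: extension (start, end), volume and count-weighted center."""
    volume = sum(count for _, count in members)
    center = sum(pos * count for pos, count in members) / volume
    return {"start": members[0][0], "end": members[-1][0],
            "volume": volume, "center": center}


def cluster_1d(signal, max_gap):
    """Group (position, count) pairs into peaks using a simple gap rule."""
    # Keep only positive counts so every peak has a non-zero volume.
    points = sorted((pos, count) for pos, count in signal if count > 0)
    peaks, current = [], []
    for pos, count in points:
        # Close the current peak when the next position is too far away.
        if current and pos - current[-1][0] > max_gap:
            peaks.append(summarize(current))
            current = []
        current.append((pos, count))
    if current:
        peaks.append(summarize(current))
    return peaks


# Hypothetical profile: genome coordinates paired with counts.
signal = [(100, 2), (102, 6), (104, 2), (300, 5), (305, 5)]
peaks = cluster_1d(signal, max_gap=10)
for peak in peaks:
    print(peak)
```

Each returned dictionary carries the three quantities the abstract lists for a peak: its volume, its extension (start and end), and its center position.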
Abstract:
This work is concerned with the development and application of novel unsupervised learning methods, with two target applications in mind: the analysis of forensic case data and the classification of remote sensing images. First, a method based on a symbolic optimization of the inter-sample distance measure is proposed to improve the flexibility of spectral clustering algorithms, and applied to the problem of forensic case data. This distance is optimized using a loss function related to the preservation of neighborhood structure between the input space and the space of principal components, and solutions are found using genetic programming. Results are compared to a variety of state-of-the-art clustering algorithms. Subsequently, a new large-scale clustering method based on a joint optimization of feature extraction and classification is proposed and applied to various databases, including two hyperspectral remote sensing images. The algorithm makes use of a functional model (e.g., a neural network) for clustering, which is trained by stochastic gradient descent. Results indicate that such a technique can easily scale to huge databases, can avoid the so-called out-of-sample problem, and can compete with or even outperform existing clustering algorithms on both artificial data and real remote sensing images. This is verified on small databases as well as very large problems.
Abstract:
HIV-1 sequence diversity is affected by selection pressures arising from host genomic factors. Using paired human and viral data from 1071 individuals, we ran >3000 genome-wide scans, testing for associations between host DNA polymorphisms, HIV-1 sequence variation and plasma viral load (VL), while considering human and viral population structure. We observed significant human SNP associations with a total of 48 HIV-1 amino acid variants (p < 2.4 × 10^-12). All associated SNPs mapped to the HLA class I region. Clinical relevance of host and pathogen variation was assessed using VL results. We identified two critical advantages to the use of viral variation for identifying host factors: (1) association signals are much stronger for HIV-1 sequence variants than for VL, reflecting the 'intermediate phenotype' nature of viral variation; (2) association testing can be run without any clinical data. The proposed genome-to-genome approach highlights sites of genomic conflict and is a strategy generally applicable to studies of host-pathogen interaction. DOI: http://dx.doi.org/10.7554/eLife.01123.001.
Abstract:
Summary: Assessment of the genetic diversity of sea buckthorn using RAPD analysis
Abstract:
Adiponectin serum concentrations are an important biomarker in cardiovascular epidemiology, with heritability estimates of 30-70%. However, known genetic variants in the adiponectin gene locus (ADIPOQ) account for only 2%-8% of its variance. As transcription factors are thought to play an under-acknowledged role in carrying functional variants, we hypothesized that genetic polymorphisms in genes coding for the main transcription factors for the ADIPOQ promoter influence adiponectin levels. Single nucleotide polymorphisms (SNPs) at these genes were selected based on the haplotype block structure and previously published evidence of association with adiponectin levels. We performed association analyses of the 24 selected SNPs at forkhead box O1 (FOXO1), sterol-regulatory-element-binding transcription factor 1 (SREBF1), sirtuin 1 (SIRT1), peroxisome-proliferator-activated receptor gamma (PPARG) and transcription factor activating enhancer binding protein 2 beta (TFAP2B) gene loci with adiponectin levels in three different European cohorts: SAPHIR (n = 1742), KORA F3 (n = 1636) and CoLaus (n = 5355). In each study population, the association of SNPs with adiponectin levels on log-scale was tested using linear regression adjusted for age, sex and body mass index, applying both an additive and a recessive genetic model. A pooled effect size was obtained by meta-analysis assuming a fixed effects model. We applied a significance threshold of 0.0033 to account for multiple testing. A significant association was only found for variants within SREBF1 applying an additive genetic model (smallest p-value for rs1889018 on log(adiponectin) = 0.002, β on original scale = -0.217 µg/ml), explaining ∼0.4% of variation of adiponectin levels. Recessive genetic models or haplotype analyses of the FOXO1, SREBF1, SIRT1 and TFAP2B genes or sex-stratified analyses did not reveal additional information on the regulation of adiponectin levels. 
The role of genetic variations at the SREBF1 gene in regulating adiponectin needs further investigation by functional studies.
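The pooled effect size mentioned in this abstract comes from a fixed effects meta-analysis. The sketch below shows the common inverse-variance form of that calculation, which is an assumption here because the abstract does not say which estimator was used. The per-cohort effect sizes and standard errors are invented for illustration and are not the study's values.

```python
import math


def fixed_effect_meta(betas, std_errors):
    """Inverse-variance weighted pooled estimate with a two-sided normal p-value."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, p_value


# Hypothetical per-cohort estimates for one SNP (three cohorts).
betas = [-0.2, -0.1, -0.3]
std_errors = [0.1, 0.1, 0.1]
pooled_beta, pooled_se, p_value = fixed_effect_meta(betas, std_errors)

# 0.0033 is the significance threshold reported in the abstract.
print(pooled_beta, pooled_se, p_value, p_value < 0.0033)
```

With equal standard errors the pooled estimate is simply the mean of the cohort estimates; cohorts with smaller standard errors would otherwise pull the pooled value toward their own estimate.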
Abstract:
Whole-grain foods are touted for multiple health benefits, including enhancing insulin sensitivity and reducing type 2 diabetes risk. Recent genome-wide association studies (GWAS) have identified several single nucleotide polymorphisms (SNPs) associated with fasting glucose and insulin concentrations in individuals free of diabetes. We tested the hypothesis that whole-grain food intake and genetic variation interact to influence concentrations of fasting glucose and insulin. Via meta-analysis of data from 14 cohorts comprising ∼48,000 participants of European descent, we studied interactions of whole-grain intake with loci previously associated in GWAS with fasting glucose (16 loci) and/or insulin (2 loci) concentrations. For tests of interaction, we considered a P value <0.0028 (0.05 divided by 18 tests) as statistically significant. Greater whole-grain food intake was associated with lower fasting glucose and insulin concentrations independent of demographics, other dietary and lifestyle factors, and BMI (β [95% CI] per 1-serving-greater whole-grain intake: -0.009 mmol/l glucose [-0.013 to -0.005], P < 0.0001 and -0.011 pmol/l [ln] insulin [-0.015 to -0.007], P = 0.0003). No interactions met our multiple testing-adjusted statistical significance threshold. The strongest SNP interaction with whole-grain intake was rs780094 (GCKR) for fasting insulin (P = 0.006), where greater whole-grain intake was associated with a smaller reduction in fasting insulin concentrations in those with the insulin-raising allele. Our results support the favorable association of whole-grain intake with fasting glucose and insulin and suggest a potential interaction between variation in GCKR and whole-grain intake in influencing fasting insulin concentrations.
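The significance threshold in this abstract can be reproduced with a short calculation. The sketch below divides the 0.05 significance level by the 18 tests, a Bonferroni-style adjustment (the abstract does not name the method), and compares the result with the reported P value for the GCKR interaction. The function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after dividing alpha by the number of tests."""
    return alpha / n_tests


# 16 glucose loci plus 2 insulin loci give the 18 interaction tests.
threshold = bonferroni_threshold(0.05, 16 + 2)

# Reported P value for rs780094 (GCKR) x whole-grain intake on fasting insulin.
p_interaction = 0.006
significant = p_interaction < threshold

print(round(threshold, 4), significant)
```

The GCKR interaction (P = 0.006) falls short of the adjusted threshold, which is why the abstract describes it only as a potential interaction.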