11 results for Whole genome mapping
at Université de Lausanne, Switzerland
Abstract:
With the availability of new-generation sequencing technologies, bacterial genome projects have undergone a major boost. Still, chromosome completion needs a costly and time-consuming gap closure, especially when the genome contains highly repetitive elements. However, incomplete genome data may be sufficiently informative to derive the pursued information. For emerging pathogens, i.e. newly identified pathogens, withholding genome data during the gap closure stage is clearly medically counterproductive. We thus investigated the feasibility of a dirty genome approach, i.e. the release of unfinished genome sequences to develop serological diagnostic tools. We showed that almost the whole genome sequence of the emerging pathogen Parachlamydia acanthamoebae was retrieved even with relatively short reads from Genome Sequencer 20 and Solexa. The bacterial proteome was analyzed to select immunogenic proteins, which were then expressed and used to develop the first steps of an ELISA. This work constitutes the proof of principle for a dirty genome approach, i.e. the use of unfinished genome sequences of pathogenic bacteria, coupled with proteomics to rapidly identify new immunogenic proteins useful for the future development of specific diagnostic tests such as ELISA, immunohistochemistry and direct antigen detection. Although applied here to an emerging pathogen, this combined dirty genome sequencing/proteomic approach may be used for any pathogen for which better diagnostics are needed. These genome sequences may also be very useful for developing DNA-based diagnostic tests. All these diagnostic tools will allow further evaluation of the pathogenic potential of this obligate intracellular bacterium.
Abstract:
Adiponectin has a variety of metabolic effects on obesity, insulin sensitivity, and atherosclerosis. To identify genes influencing variation in plasma adiponectin levels, we performed genome-wide linkage and association scans of adiponectin in two cohorts of subjects recruited in the Genetic Epidemiology of Metabolic Syndrome Study. The genome-wide linkage scan was conducted in families of Turkish and southern European (TSE, n = 789) and Northern and Western European (NWE, n = 2,280) origin. A whole genome association (WGA) analysis (500K Affymetrix platform) was carried out in a set of unrelated NWE subjects consisting of approximately 1,000 subjects with dyslipidemia and 1,000 overweight subjects with normal lipids. Peak evidence for linkage occurred at chromosome 8p23 in NWE subjects (lod = 3.10) and at chromosome 3q28 near ADIPOQ, the adiponectin structural gene, in TSE subjects (lod = 1.70). In the WGA analysis, the single-nucleotide polymorphisms (SNPs) most strongly associated with adiponectin were rs3774261 and rs6773957 (P < 10⁻⁷). These two SNPs were in high linkage disequilibrium (r² = 0.98) and located within ADIPOQ. Interestingly, our fourth strongest region of association (P < 2 × 10⁻⁵) was with an SNP within CDH13, whose protein product is a newly identified receptor for high-molecular-weight species of adiponectin. Through WGA analysis, we confirmed previous studies showing SNPs within ADIPOQ to be strongly associated with variation in adiponectin levels and further observed these to have the strongest effects on adiponectin levels throughout the genome. We additionally identified a second gene (CDH13) possibly influencing variation in adiponectin levels. The impact of these SNPs on health and disease has yet to be determined.
Abstract:
A stringent branch-site codon model was used to detect positive selection in vertebrate evolution. We show that the test is robust to the large evolutionary distances involved. Positive selection was detected in 77% of 884 genes studied. Most positive selection concerns a few sites on a single branch of the phylogenetic tree: Between 0.9% and 4.7% of sites are affected by positive selection depending on the branches. No functional category was overrepresented among genes under positive selection. Surprisingly, whole genome duplication had no effect on the prevalence of positive selection, whether the fish-specific genome duplication or the two rounds at the origin of vertebrates. Thus positive selection has not been limited to a few gene classes, or to specific evolutionary events such as duplication, but has been pervasive during vertebrate evolution.
Abstract:
Extracellular calcium participates in several key physiological functions, such as control of blood coagulation, bone calcification or muscle contraction. Calcium homeostasis in humans is regulated in part by genetic factors, as illustrated by rare monogenic diseases characterized by hypo- or hypercalcaemia. Both serum calcium and urinary calcium excretion are heritable continuous traits in humans. Serum calcium levels are tightly regulated by two main hormonal systems, i.e. parathyroid hormone and vitamin D, which are themselves also influenced by genetic factors. Recent technological advances in molecular biology allow for the screening of the human genome at an unprecedented level of detail and using hypothesis-free approaches, such as genome-wide association studies (GWAS). GWAS identified novel loci for calcium-related phenotypes (i.e. serum calcium and 25-OH vitamin D) that shed new light on the biology of calcium in humans. The substantial overlap (i.e. CYP24A1, CASR, GATA3; CYP2R1) between genes involved in rare monogenic diseases and genes located within loci identified in GWAS suggests a genetic and phenotypic continuum between monogenic diseases of calcium homeostasis and slight disturbances of calcium homeostasis in the general population. Future studies using whole-exome and whole-genome sequencing will further advance our understanding of the genetic architecture of calcium homeostasis in humans. These findings will likely provide new insight into the complex mechanisms involved in calcium homeostasis and hopefully lead to novel preventive and therapeutic approaches. Keywords: calcium, monogenic, genome-wide association studies, genetics.
Abstract:
Despite the development of novel typing methods based on whole genome sequencing, most laboratories still rely on classical molecular methods for outbreak investigation or surveillance. Reference methods for Clostridium difficile include ribotyping and pulsed-field gel electrophoresis, which are band-comparing methods often difficult to establish and which require reference strain collections. Here, we present the double locus sequence typing (DLST) scheme as a tool to analyse C. difficile isolates. Using a collection of clinical C. difficile isolates recovered during a 1-year period, we evaluated the performance of DLST and compared the results to multilocus sequence typing (MLST), a sequence-based method that has been used to study the structure of bacterial populations and highlight major clones. DLST had a higher discriminatory power compared to MLST (Simpson's index of diversity of 0.979 versus 0.965) and successfully identified all isolates of the study (100% typeability). Previous studies showed that the discriminatory power of ribotyping was comparable to that of MLST; thus, DLST might be more discriminatory than ribotyping. DLST is easy to establish and provides several advantages, including absence of DNA extraction [polymerase chain reaction (PCR) is performed on colonies], no specific instrumentation, low cost and unambiguous definition of types. Moreover, the implementation of a DLST typing scheme on an Internet database, such as that previously done for Staphylococcus aureus and Pseudomonas aeruginosa ( http://www.dlst.org ), will allow users to easily obtain the DLST type by directly submitting sequencing files and will avoid problems associated with multiple databases.
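The discriminatory power quoted above can be illustrated with a short calculation. The sketch below assumes the commonly used Hunter-Gaston form of Simpson's index of diversity; the abstract does not state which variant the authors applied. The function name `simpson_diversity` and the toy isolate labels are hypothetical and are not data from the study.

```python
from collections import Counter


def simpson_diversity(type_labels):
    """Simpson's index of diversity (Hunter-Gaston form, assumed here).

    Returns the probability that two isolates drawn at random without
    replacement belong to different types.
    """
    n = len(type_labels)
    if n < 2:
        raise ValueError("at least two isolates are required")
    counts = Counter(type_labels)
    # Ordered pairs of isolates that share the same type.
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical typing result for six isolates (illustrative only).
toy_isolates = ["A", "A", "B", "C", "C", "C"]
print(round(simpson_diversity(toy_isolates), 3))
```

A value close to 1 means that nearly every pair of isolates is assigned to different types, which is why a higher index is read as higher discriminatory power.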
Abstract:
Whole-genome sequencing (WGS) of 228 isolates was used to elucidate the origin and dynamics of a long-term outbreak of methicillin-resistant Staphylococcus aureus (MRSA) sequence type 228 (ST228) SCCmec I that involved 1,600 patients in a tertiary care hospital between 2008 and 2012. Combining the sequence data with detailed metadata on patient admission and movement confirmed that the outbreak was due to the transmission of a single clonal variant of ST228, rather than repeated introductions of this clone into the hospital. We note that this clone is significantly more frequently recovered from groin and rectal swabs than other clones (P < 0.0001) and is also significantly more transmissible between roommates (P < 0.01). Unrecognized MRSA carriers, together with movements of patients within the hospital, also seem to have played a major role. These atypical colonization and transmission dynamics can help explain how the outbreak was maintained over the long term. This "stealthy" asymptomatic colonization of the gut, combined with heightened transmissibility (potentially reflecting a role for environmental reservoirs), means the dynamics of this outbreak share some properties with enteric pathogens such as vancomycin-resistant enterococci or Clostridium difficile. IMPORTANCE: Using whole-genome sequencing, we showed that a large and prolonged outbreak of methicillin-resistant Staphylococcus aureus was due to the clonal spread of a specific strain with genetic elements adapted to the hospital environment. Unrecognized MRSA carriers, the movement of patients within the hospital, and the low detection with clinical specimens were also factors that played a role in this occurrence. The atypical colonization of the gut means the dynamics of this outbreak may share some properties with enteric pathogens.
Abstract:
Metabolic traits are molecular phenotypes that can drive clinical phenotypes and may predict disease progression. Here, we report results from a metabolome- and genome-wide association study on ¹H-NMR urine metabolic profiles. The study was conducted within an untargeted approach, employing a novel method for compound identification. From our discovery cohort of 835 Caucasian individuals who participated in the CoLaus study, we identified 139 suggestively significant (P < 5×10⁻⁸) and independent associations between single nucleotide polymorphisms (SNPs) and metabolome features. Fifty-six of these associations replicated in the TasteSensomics cohort, comprising 601 individuals from São Paulo of vastly diverse ethnic background. They correspond to eleven gene-metabolite associations, six of which had been previously identified in the urine metabolome and three in the serum metabolome. Our key novel findings are the associations of two SNPs with NMR spectral signatures pointing to fucose (rs492602, P = 6.9×10⁻⁴⁴) and lysine (rs8101881, P = 1.2×10⁻³³), respectively. Fine-mapping of the first locus pinpointed the FUT2 gene, which encodes a fucosyltransferase enzyme and has previously been associated with Crohn's disease. This implicates fucose as a potential prognostic disease marker, for which there is already published evidence from a mouse model. The second SNP lies within the SLC7A9 gene, rare mutations of which have been linked to severe kidney damage. The replication of previous associations and our new discoveries demonstrate the potential of untargeted metabolomics GWAS to robustly identify molecular disease markers.
Abstract:
Many disorders are associated with altered serum protein concentrations, including malnutrition, cancer, and cardiovascular, kidney, and inflammatory diseases. Although these protein concentrations are highly heritable, relatively little is known about their underlying genetic determinants. Through transethnic meta-analysis of European-ancestry and Japanese genome-wide association studies, we identified six loci at genome-wide significance (p < 5 × 10⁻⁸) for serum albumin (HPN-SCN1B, GCKR-FNDC4, SERPINF2-WDR81, TNFRSF11A-ZCCHC2, FRMD5-WDR76, and RPS11-FCGRT, in up to 53,190 European-ancestry and 9,380 Japanese individuals) and three loci for total protein (TNFRS13B, 6q21.3, and ELL2, in up to 25,539 European-ancestry and 10,168 Japanese individuals). We observed little evidence of heterogeneity in allelic effects at these loci between groups of European and Japanese ancestry but obtained substantial improvements in the resolution of fine mapping of potential causal variants by leveraging transethnic differences in the distribution of linkage disequilibrium. We demonstrated a functional role for the most strongly associated serum albumin locus, HPN, for which Hpn knockout mice manifest low plasma albumin concentrations. Other loci associated with serum albumin harbor genes related to ribosome function, protein translation, and proteasomal degradation, whereas those associated with serum total protein include genes related to immune function. Our results highlight the advantages of transethnic meta-analysis for the discovery and fine mapping of complex trait loci and have provided initial insights into the underlying genetic architecture of serum protein concentrations and their association with human disease.
Abstract:
Whole-grain foods are touted for multiple health benefits, including enhancing insulin sensitivity and reducing type 2 diabetes risk. Recent genome-wide association studies (GWAS) have identified several single nucleotide polymorphisms (SNPs) associated with fasting glucose and insulin concentrations in individuals free of diabetes. We tested the hypothesis that whole-grain food intake and genetic variation interact to influence concentrations of fasting glucose and insulin. Via meta-analysis of data from 14 cohorts comprising ~48,000 participants of European descent, we studied interactions of whole-grain intake with loci previously associated in GWAS with fasting glucose (16 loci) and/or insulin (2 loci) concentrations. For tests of interaction, we considered a P value <0.0028 (0.05 divided by 18 tests) as statistically significant. Greater whole-grain food intake was associated with lower fasting glucose and insulin concentrations independent of demographics, other dietary and lifestyle factors, and BMI (β [95% CI] per 1-serving-greater whole-grain intake: -0.009 mmol/l glucose [-0.013 to -0.005], P < 0.0001 and -0.011 pmol/l [ln] insulin [-0.015 to -0.007], P = 0.0003). No interactions met our multiple testing-adjusted statistical significance threshold. The strongest SNP interaction with whole-grain intake was rs780094 (GCKR) for fasting insulin (P = 0.006), where greater whole-grain intake was associated with a smaller reduction in fasting insulin concentrations in those with the insulin-raising allele. Our results support the favorable association of whole-grain intake with fasting glucose and insulin and suggest a potential interaction between variation in GCKR and whole-grain intake in influencing fasting insulin concentrations.
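The multiple-testing threshold in the abstract above can be reproduced with a simple Bonferroni-style division of the nominal significance level by the number of tests. This is a minimal sketch; the function name `bonferroni_threshold` is hypothetical, and the values 0.05, 18 and 0.006 are taken from the abstract.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold: nominal alpha divided by the number of tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# 16 fasting-glucose loci + 2 fasting-insulin loci = 18 interaction tests.
threshold = bonferroni_threshold(0.05, 16 + 2)
print(f"{threshold:.4f}")  # prints 0.0028

# The strongest reported interaction (rs780094, P = 0.006) does not pass it.
print(0.006 < threshold)  # prints False
```

This matches the abstract's statement that no interaction met the adjusted threshold, even though the GCKR result is discussed as a potential interaction.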
Abstract:
Next-generation sequencing (NGS) technologies have become the standard for data generation in studies of population genomics, such as the 1000 Genomes Project (1000G). However, these techniques are known to be problematic when applied to highly polymorphic genomic regions, such as the human leukocyte antigen (HLA) genes. Because accurate genotype calls and allele frequency estimations are crucial to population genomics analyses, it is important to assess the reliability of NGS data. Here, we evaluate the reliability of genotype calls and allele frequency estimates of the single-nucleotide polymorphisms (SNPs) reported by 1000G (phase I) at five HLA genes (HLA-A, -B, -C, -DRB1, and -DQB1). We take advantage of the availability of HLA Sanger sequencing of 930 of the 1092 1000G samples and use this as a gold standard to benchmark the 1000G data. We document that 18.6% of SNP genotype calls in HLA genes are incorrect and that allele frequencies are estimated with an error greater than ±0.1 at approximately 25% of the SNPs in HLA genes. We found a bias toward overestimation of reference allele frequency for the 1000G data, indicating mapping bias is an important cause of error in frequency estimation in this dataset. We provide a list of sites that have poor allele frequency estimates and discuss the outcomes of including those sites in different kinds of analyses. Because the HLA region is the most polymorphic in the human genome, our results provide insights into the challenges of using NGS data at other genomic regions of high diversity.
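The two benchmark quantities reported above, the genotype error rate and the allele frequency error, can be sketched for a single SNP as follows. This is an illustration under stated assumptions, not the authors' pipeline: genotypes are assumed to be coded as the number of reference alleles per sample (0, 1 or 2), and the function names and toy values are hypothetical.

```python
def genotype_error_rate(test_calls, gold_calls):
    """Fraction of samples whose test genotype differs from the gold standard."""
    if not gold_calls or len(test_calls) != len(gold_calls):
        raise ValueError("call lists must be non-empty and of equal length")
    mismatches = sum(t != g for t, g in zip(test_calls, gold_calls))
    return mismatches / len(gold_calls)


def ref_allele_frequency(genotypes):
    """Reference allele frequency from per-sample reference allele counts (0/1/2)."""
    if not genotypes:
        raise ValueError("genotype list must be non-empty")
    return sum(genotypes) / (2 * len(genotypes))


# Hypothetical calls for one SNP in five diploid samples (illustrative only).
ngs_calls = [2, 2, 1, 2, 1]     # calls from short-read data
sanger_calls = [2, 1, 1, 1, 0]  # gold-standard calls

error_rate = genotype_error_rate(ngs_calls, sanger_calls)
freq_error = ref_allele_frequency(ngs_calls) - ref_allele_frequency(sanger_calls)
print(f"genotype error rate: {error_rate:.2f}")
print(f"reference allele frequency error: {freq_error:+.2f}")
```

A positive frequency error in this toy case corresponds to the direction of bias described in the abstract, where the reference allele frequency is overestimated relative to the gold standard.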