13 results for bacteria genome nucleotide usage
at Duke University
Abstract:
BACKGROUND: There is considerable interest in the development of methods to efficiently identify all coding variants present in large sample sets of humans. There are three approaches possible: whole-genome sequencing, whole-exome sequencing using exon capture methods, and RNA-Seq. While whole-genome sequencing is the most complete, it remains sufficiently expensive that cost-effective alternatives are important. RESULTS: Here we provide a systematic exploration of how well RNA-Seq can identify human coding variants by comparing variants identified through high coverage whole-genome sequencing to those identified by high coverage RNA-Seq in the same individual. This comparison allowed us to directly evaluate the sensitivity and specificity of RNA-Seq in identifying coding variants, and to evaluate how key parameters such as the degree of coverage and the expression levels of genes interact to influence performance. We find that although only 40% of exonic variants identified by whole-genome sequencing were captured using RNA-Seq, this number rose to 81% when concentrating on genes known to be well-expressed in the source tissue. We also find that a high false positive rate can be problematic when working with RNA-Seq data, especially at higher levels of coverage. CONCLUSIONS: We conclude that as long as a tissue relevant to the trait under study is available and suitable quality control screens are implemented, RNA-Seq is a fast and inexpensive alternative approach for finding coding variants in genes with sufficiently high expression levels.
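The comparison described in this abstract can be illustrated with a minimal sketch. It assumes that the whole-genome call set is treated as the reference and that a variant is identified by a (chromosome, position, ref, alt) tuple. The function names and the toy variants are hypothetical and are not taken from the study.

```python
def sensitivity(reference_variants, test_variants):
    """Fraction of reference variants that also appear in the test call set."""
    reference = set(reference_variants)
    if not reference:
        raise ValueError("reference call set is empty")
    return len(reference & set(test_variants)) / len(reference)


def false_discovery_proportion(reference_variants, test_variants):
    """Fraction of test calls that are absent from the reference call set."""
    test = set(test_variants)
    if not test:
        raise ValueError("test call set is empty")
    return len(test - set(reference_variants)) / len(test)


# Hypothetical toy call sets: (chromosome, position, ref allele, alt allele).
wgs = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr2", 900, "T", "C"),
    ("chr3", 40, "A", "C"),
}
rnaseq = {
    ("chr1", 100, "A", "G"),
    ("chr2", 75, "G", "A"),
    ("chr4", 10, "C", "G"),  # called by RNA-Seq only
}

print(sensitivity(wgs, rnaseq))                 # 2 of 5 reference variants recovered
print(false_discovery_proportion(wgs, rnaseq))  # 1 of 3 RNA-Seq calls unsupported
```

Restricting both sets to variants in well-expressed genes before calling `sensitivity` would mirror the stratified figure the abstract reports, under the same assumptions.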
Abstract:
Genome-wide association studies (GWAS) have now identified at least 2,000 common variants that appear associated with common diseases or related traits (http://www.genome.gov/gwastudies), hundreds of which have been convincingly replicated. It is generally thought that the associated markers reflect the effect of a nearby common (minor allele frequency >0.05) causal site, which is associated with the marker, leading to extensive resequencing efforts to find causal sites. We propose as an alternative explanation that variants much less common than the associated one may create "synthetic associations" by occurring, stochastically, more often in association with one of the alleles at the common site versus the other allele. Although synthetic associations are an obvious theoretical possibility, they have never been systematically explored as a possible explanation for GWAS findings. Here, we use simple computer simulations to show the conditions under which such synthetic associations will arise and how they may be recognized. We show that they are not only possible, but inevitable, and that under simple but reasonable genetic models, they are likely to account for or contribute to many of the recently identified signals reported in genome-wide association studies. We also illustrate the behavior of synthetic associations in real datasets by showing that rare causal mutations responsible for both hearing loss and sickle cell anemia create genome-wide significant synthetic associations, in the latter case extending over a 2.5-Mb interval encompassing scores of "blocks" of associated variants. In conclusion, uncommon or rare genetic variants can easily create synthetic associations that are credited to common variants, and this possibility requires careful consideration in the interpretation and follow up of GWAS signals.
Abstract:
In many bacteria, there is a genome-wide bias towards co-orientation of replication and transcription, with essential and/or highly-expressed genes further enriched co-directionally. We previously found that reversing this bias in the bacterium Bacillus subtilis slows replication elongation, and we proposed that this effect contributes to the evolutionary pressure selecting the transcription-replication co-orientation bias. This selection might have been based purely on selection for speedy replication; alternatively, the slowed replication might actually represent an average of individual replication-disruption events, each of which is counter-selected independently because genome integrity is selected. To differentiate these possibilities and define the precise forces driving this aspect of genome organization, we generated new strains with inversions either over approximately 1/4 of the chromosome or at ribosomal RNA (rRNA) operons. Applying mathematical analysis to genomic microarray snapshots, we found that replication rates vary dramatically within the inverted genome. Replication is moderately impeded throughout the inverted region, which results in a small but significant competitive disadvantage in minimal medium. Importantly, replication is strongly obstructed at inverted rRNA loci in rich medium. This obstruction results in disruption of DNA replication, activation of DNA damage responses, loss of genome integrity, and cell death. Our results strongly suggest that preservation of genome integrity drives the evolution of co-orientation of replication and transcription, a conserved feature of genome organization.
Abstract:
To investigate the underlying mechanisms of type 2 diabetes (T2D) pathogenesis, we looked for susceptibility genes that increase the risk of T2D in a Han Chinese population. A two-stage genome-wide association (GWA) study was conducted, in which 995 patients and 894 controls were genotyped using the Illumina HumanHap550-Duo BeadChip for the first genome scan stage. This was further replicated in 1,803 patients and 1,473 controls in stage 2. We found two loci not previously associated with diabetes susceptibility in and around the genes protein tyrosine phosphatase receptor type D (PTPRD) (P = 8.54 × 10^-10; odds ratio [OR] = 1.57; 95% confidence interval [CI] = 1.36-1.82), and serine racemase (SRR) (P = 3.06 × 10^-9; OR = 1.28; 95% CI = 1.18-1.39). We also confirmed that variants in KCNQ1 were associated with T2D risk, with the strongest signal at rs2237895 (P = 9.65 × 10^-10; OR = 1.29; 95% CI = 1.19-1.40). By identifying two novel genetic susceptibility loci in a Han Chinese population and confirming the involvement of KCNQ1, which was previously reported to be associated with T2D in Japanese and European descent populations, our results may lead to a better understanding of differences in the molecular pathogenesis of T2D among various populations.
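For readers unfamiliar with the odds ratios and confidence intervals quoted in this abstract, the sketch below shows one common way such figures are derived from a 2×2 table of allele counts (a Wald interval on the log odds ratio). This is an illustration only: the abstract does not state which estimator the authors used, and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: risk-allele count in cases      b: other-allele count in cases
    c: risk-allele count in controls   d: other-allele count in controls
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts, not taken from the study.
or_value, ci_low, ci_high = odds_ratio_ci(300, 700, 200, 800)
print(f"OR = {or_value:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

An interval that excludes 1, as in the reported loci, indicates that the association is unlikely to be explained by sampling noise alone at the chosen confidence level.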
Abstract:
Lipoprotein-associated phospholipase A2 (Lp-PLA2) is an emerging risk factor and therapeutic target for cardiovascular disease. The activity and mass of this enzyme are heritable traits, but major genetic determinants have not been explored in a systematic, genome-wide fashion. We carried out a genome-wide association study of Lp-PLA2 activity and mass in 6,668 Caucasian subjects from the population-based Framingham Heart Study. Clinical data and genotypes from the Affymetrix 550K SNP array were obtained from the open-access Framingham SHARe project. Each polymorphism that passed quality control was tested for associations with Lp-PLA2 activity and mass using linear mixed models implemented in the R statistical package, accounting for familial correlations, and controlling for age, sex, smoking, lipid-lowering-medication use, and cohort. For Lp-PLA2 activity, polymorphisms at four independent loci reached genome-wide significance, including the APOE/APOC1 region on chromosome 19 (p = 6 × 10^-24); CELSR2/PSRC1 on chromosome 1 (p = 3 × 10^-15); SCARB1 on chromosome 12 (p = 1 × 10^-8) and ZNF259/BUD13 in the APOA5/APOA1 gene region on chromosome 11 (p = 4 × 10^-8). All of these remained significant after accounting for associations with LDL cholesterol, HDL cholesterol, or triglycerides. For Lp-PLA2 mass, 12 SNPs achieved genome-wide significance, all clustering in a region on chromosome 6p12.3 near the PLA2G7 gene. Our analyses demonstrate that genetic polymorphisms may contribute to inter-individual variation in Lp-PLA2 activity and mass.
Abstract:
There is great interindividual variability in HIV-1 viral setpoint after seroconversion, some of which is known to be due to genetic differences among infected individuals. Here, our focus is on determining, genome-wide, the contribution of variable gene expression to viral control, and to relate it to genomic DNA polymorphism. RNA was extracted from purified CD4+ T-cells from 137 HIV-1 seroconverters, 16 elite controllers, and 3 healthy blood donors. Expression levels of more than 48,000 mRNA transcripts were assessed by the Human-6 v3 Expression BeadChips (Illumina). Genome-wide SNP data was generated from genomic DNA using the HumanHap550 Genotyping BeadChip (Illumina). We observed two distinct profiles with 260 genes differentially expressed depending on HIV-1 viral load. There was significant upregulation of expression of interferon stimulated genes with increasing viral load, including genes of the intrinsic antiretroviral defense. Upon successful antiretroviral treatment, the transcriptome profile of previously viremic individuals reverted to a pattern comparable to that of elite controllers and of uninfected individuals. Genome-wide evaluation of cis-acting SNPs identified genetic variants modulating expression of 190 genes. Those were compared to the genes whose expression was found associated with viral load: expression of one interferon stimulated gene, OAS1, was found to be regulated by a SNP (rs3177979, p = 4.9E-12); however, we could not detect an independent association of the SNP with viral setpoint. Thus, this study represents an attempt to integrate genome-wide SNP signals with genome-wide expression profiles in the search for biological correlates of HIV-1 control. It underscores the paradox of the association between increasing levels of viral load and greater expression of antiviral defense pathways. It also shows that elite controllers do not have a fully distinctive mRNA expression pattern in CD4+ T cells. 
Overall, changes in global RNA expression reflect responses to viral replication rather than a mechanism that might explain viral control.
Abstract:
Alzheimer's disease is a complex and progressive neurodegenerative disease leading to loss of memory, cognitive impairment, and ultimately death. To date, six large-scale genome-wide association studies have been conducted to identify SNPs that influence disease predisposition. These studies have confirmed the well-known APOE epsilon4 risk allele, identified a novel variant that influences disease risk within the APOE epsilon4 population, found a SNP that modifies the age of disease onset, as well as reported the first sex-linked susceptibility variant. Here we report a genome-wide scan of Alzheimer's disease in a set of 331 cases and 368 controls, extending analyses for the first time to include assessments of copy number variation. In this analysis, no new SNPs show genome-wide significance. We also screened for effects of copy number variation, and while nothing was significant, a duplication in CHRNA7 appears interesting enough to warrant further investigation.
Abstract:
The humoral immune system plays a critical role in the clearance of numerous pathogens. In the setting of HIV-1 infection, the virus infects, integrates its genome into the host's cells, replicates, and establishes a reservoir of virus-infected cells. The initial antibody response to HIV-1 infection is targeted to non-neutralizing epitopes on HIV-1 Env gp41, and when a neutralizing response does develop months after transmission, it is specific for the autologous founder virus and the virus escapes rapidly. After continuous waves of antibody mediated neutralization and viral escape, a small subset of infected individuals eventually develop broad and potent heterologous neutralizing antibodies years after infection. In this dissertation, I have studied the ontogeny of mucosal and systemic antibody responses to HIV-1 infection by means of three distinct aims: 1. Determine the origin of the initial antibody response to HIV-1 infection. 2. Characterize the role of restricted VH and VL gene segment usage in shaping the antibody response to HIV-1 infection. 3. Determine the role of persistence of B cell clonal lineages in shaping the mutation frequencies of HIV-1 reactive antibodies.
After the introduction (Chapter 1) and methods (Chapter 2), Chapter 3 of this dissertation describes a study of the antibody response of terminal ileum B cells to HIV-1 envelope (Env) in early and chronic HIV-1 infection and provides evidence for the role of environmental antigens in shaping the repertoire of B cells that respond to HIV-1 infection. Previous work by Liao et al. demonstrated that the initial plasma cell response in the blood to acute HIV-1 infection is to gp41 and is derived from a polyreactive memory B cell pool. Many of these antibodies cross-reacted with commensal bacteria. Therefore, in Chapter 3, the relationship of intestinal B cell reactivity with commensal bacteria to the HIV-1 infection-induced antibody response was probed using single B cell sorting, reverse transcription and nested polymerase chain reaction (RT-PCR) methods, and recombinant antibody technology. The dominant B cell response in the terminal ileum was to HIV-1 envelope (Env) gp41, and 82% of gp41-reactive antibodies cross-reacted with commensal bacteria whole cell lysates. Pyrosequencing of blood B cells revealed HIV-1 antibody clonal lineages shared between ileum and blood. Mutated IgG antibodies cross-reactive with both Env gp41 and commensal bacteria could also be isolated from the terminal ileum of HIV-1 uninfected individuals. Thus, the antibody response to HIV-1 can be shaped by intestinal B cells stimulated by commensal bacteria prior to HIV-1 infection to develop a pre-infection pool of memory B cells cross-reactive with HIV-1 gp41.
Chapter 4 details the study of restricted VH and VL gene segment usage for gp41 and gp120 antibody induction following acute HIV-1 infection; mutations in gp41 lead to enhanced virus neutralization sensitivity. The B cell repertoire of antibodies induced in an HIV-1 infected African individual, CAP206, who developed broadly neutralizing antibodies (bnAbs) directed to the HIV-1 envelope gp41 membrane proximal external region (MPER), is characterized. Understanding the selection of virus mutants by neutralizing antibodies is critical to understanding the role of antibodies in control of HIV-1 replication and prevention of HIV-1 infection. Previously, an MPER neutralizing antibody, CAP206-CH12, with a binding footprint identical to that of the MPER broadly neutralizing antibody 4E10 and that, like 4E10, utilized the VH1-69 and VK3-20 variable gene segments, was isolated from this individual (Morris et al., 2011). Using single B cell sorting, RT-PCR methods, and recombinant antibody technology, Chapter 4 describes the isolation of a VH1-69, VK3-20 glycan-dependent clonal lineage from CAP206, targeted to gp120, that has the property of neutralizing a neutralization-sensitive CAP206 transmitted/founder (T/F) virus and heterologous viruses with mutations at amino acids 680 or 681 in the MPER 4E10/CH12 binding site. These data demonstrate sites within the MPER bnAb epitope (aa 680-681) in which mutations can be selected that lead to viruses with enhanced sensitivity to autologous and heterologous neutralizing antibodies.
In Chapter 5, I have completed a comparison of the evolution of B cell clonal lineages in two HIV-1 infected individuals who have a predominant VH1-69 response to HIV-1 infection--one who produces broadly neutralizing MPER-reactive mAbs and one who does not. Autologous neutralization in the plasma takes ~12 weeks to develop (Gray et al., 2007; Tomaras et al., 2008b). Only a small subset of HIV-1 infected individuals develops high plasma levels of broad and potent heterologous neutralization, and when it does occur, it typically takes 3-4 years to develop (Euler et al., 2010; Gray et al., 2007; 2011; Tomaras et al., 2011). The HIV-1 bnAbs that have been isolated to date have a number of unusual characteristics, including autoreactivity and high levels of somatic hypermutations, which are typically tightly regulated by immune control mechanisms (Haynes et al., 2005; 2012b; Kwong and Mascola, 2012; Scheid et al., 2009a). The VH mutation frequencies of bnAbs average ~15% but have been shown to be as high as 32% (reviewed in Mascola and Haynes, 2013; Kwong and Mascola, 2012). The high frequency of somatic hypermutations suggests that the B cell clonal lineages that eventually produce bnAbs undergo high levels of affinity maturation, implying prolonged germinal center (GC) reactions and high levels of T cell help. To study the duration of HIV-1-reactive B cell clonal persistence, HIV-1-reactive and non-HIV-1-reactive B cell clonal lineages were isolated from an HIV-1 infected individual who produces bnAbs, CAP206, and an HIV-1 infected individual who does not produce bnAbs, 004-0. Single B cell sorting, RT-PCR and recombinant antibody technology were used to isolate and produce monoclonal antibodies from multiple time points from each individual. B cell sequences clonally related to mAbs isolated by single cell PCR were identified within pyrosequences of longitudinal samples of these two individuals. 
Both individuals produced long-lived B cell clones that persisted from 0-232 weeks in CAP206, and 0-238 weeks in 004-0. The average length of persistence of clones containing members isolated from two separate time points was 91.5 weeks in both individuals. Examples of the continued evolution of clonal lineages were observed in both the bnAb and non-bnAb individual. These data indicated that the ability to generate persistent and evolving B cell clonal lineages occurs in both bnAb and non-bnAb individuals, suggesting that some alternative host or viral factor is critical for the generation of highly mutated broadly neutralizing antibodies.
Together, the studies described in Chapters 3-5 show that multiple factors influence the antibody response to HIV-1 infection. The initial antibody response to HIV-1 Env gp41 can be shaped by a B cell response to intestinal commensal bacteria prior to HIV-1 infection. VH and VL gene segment restriction can impact the B cell response to multiple HIV-1 antigens, and virus escape mutations in the MPER can confer enhanced neutralization sensitivity to autologous and heterologous antibodies. Finally, the ability to generate long-lived HIV-1 clonal lineages in and of itself does not confer on the host the ability to produce bnAbs.
Abstract:
BACKGROUND: While effective population size (Ne) and life history traits such as generation time are known to impact substitution rates, their potential effects on base composition evolution are less well understood. GC content increases with decreasing body mass in mammals, consistent with recombination-associated GC biased gene conversion (gBGC) more strongly impacting these lineages. However, shifts in chromosomal architecture and recombination landscapes between species may complicate the interpretation of these results. In birds, interchromosomal rearrangements are rare and the recombination landscape is conserved, suggesting that this group is well suited to assess the impact of life history on base composition. RESULTS: Employing data from 45 newly and 3 previously sequenced avian genomes covering a broad range of taxa, we found that lineages with large populations and short generations exhibit higher GC content. The effect extends to both coding and non-coding sites, indicating that it is not due to selection on codon usage. Consistent with recombination driving base composition, GC content and heterogeneity were positively correlated with the rate of recombination. Moreover, we observed ongoing increases in GC in the majority of lineages. CONCLUSIONS: Our results provide evidence that gBGC may drive patterns of nucleotide composition in avian genomes and are consistent with more effective gBGC in large populations and a greater number of meioses per unit time; that is, a shorter generation time. Thus, in accord with theoretical predictions, base composition evolution is substantially modulated by species life history.
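The quantity at the center of this abstract, GC content, is simple to compute. The sketch below is a minimal illustration, assuming plain DNA strings in which ambiguous bases such as N are ignored. The function names and the windowing scheme are hypothetical and are not the authors' pipeline.

```python
def gc_content(sequence):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / (gc + at)


def windowed_gc(sequence, window):
    """GC content of consecutive non-overlapping windows.

    A trailing fragment shorter than `window` is dropped. The spread of the
    returned values gives a rough view of GC heterogeneity along a sequence.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    full = len(sequence) - len(sequence) % window
    return [gc_content(sequence[i:i + window]) for i in range(0, full, window)]


print(gc_content("ATGCGCNNAT"))      # N bases are excluded from the denominator
print(windowed_gc("GGCCAATT", 4))    # one GC-rich window, one AT-rich window
```

Comparing such values between coding and non-coding sites, as the abstract describes, would only require applying `gc_content` to the two classes of sequence separately.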
Abstract:
OBJECTIVES: Identification of patient subpopulations susceptible to developing myocardial infarction (MI) or, conversely, those displaying intrinsic cardioprotective phenotypes or high responsiveness to protective interventions remains a high-priority knowledge gap. We sought to identify novel common genetic variants associated with perioperative MI in patients undergoing coronary artery bypass grafting using genome-wide association methodology. SETTING: 107 secondary and tertiary cardiac surgery centres across the USA. PARTICIPANTS: We conducted a stage I genome-wide association study (GWAS) in 1433 ethnically diverse patients of both genders (112 cases/1321 controls) from the Genetics of Myocardial Adverse Outcomes and Graft Failure (GeneMAGIC) study, and a stage II analysis in an expanded population of 2055 patients (225 cases/1830 controls) combined from the GeneMAGIC and Duke Perioperative Genetics and Safety Outcomes (PEGASUS) studies. Patients undergoing primary non-emergent coronary bypass grafting were included. PRIMARY AND SECONDARY OUTCOME MEASURES: The primary outcome variable was perioperative MI, defined as creatine kinase MB isoenzyme (CK-MB) values ≥10× upper limit of normal during the first postoperative day, and not attributable to preoperative MI. Secondary outcomes included postoperative CK-MB as a quantitative trait, or a dichotomised phenotype based on extreme quartiles of the CK-MB distribution. RESULTS: Following quality control and adjustment for clinical covariates, we identified 521 single nucleotide polymorphisms in the stage I GWAS analysis. Among these, 8 common variants in 3 genes or intergenic regions met p < 10^-5 in stage II. A secondary analysis using CK-MB as a quantitative trait (minimum p = 1.26 × 10^-3 for rs609418), or a dichotomised phenotype based on extreme CK-MB values (minimum p = 7.72 × 10^-6 for rs4834703) supported these findings. 
Pathway analysis revealed that genes harbouring top-scoring variants cluster in pathways of biological relevance to extracellular matrix remodelling, endoplasmic reticulum-to-Golgi transport and inflammation. CONCLUSIONS: Using a two-stage GWAS and pathway analysis, we identified and prioritised several potential susceptibility loci for perioperative MI.
Abstract:
Determination of copy number variants (CNVs) inferred in genome wide single nucleotide polymorphism arrays has shown increasing utility in genetic variant disease associations. Several CNV detection methods are available, but differences in CNV call thresholds and characteristics exist. We evaluated the relative performance of seven methods: circular binary segmentation, CNVFinder, cnvPartition, gain and loss of DNA, Nexus algorithms, PennCNV and QuantiSNP. Tested data included real and simulated Illumina HumHap 550 data from the Singapore cohort study of the risk factors for Myopia (SCORM) and simulated data from Affymetrix 6.0 and platform-independent distributions. The normalized singleton ratio (NSR) is proposed as a metric for parameter optimization before enacting full analysis. We used 10 SCORM samples for optimizing parameter settings for each method and then evaluated method performance at optimal parameters using 100 SCORM samples. The statistical power, false positive rates, and receiver operating characteristic (ROC) curve residuals were evaluated by simulation studies. Optimal parameters, as determined by NSR and ROC curve residuals, were consistent across datasets. QuantiSNP outperformed other methods based on ROC curve residuals over most datasets. Nexus Rank and SNPRank have low specificity and high power. Nexus Rank calls oversized CNVs. PennCNV detects one of the fewest numbers of CNVs.
Abstract:
Limited data are available regarding the molecular epidemiology of Mycobacterium tuberculosis (Mtb) strains circulating in Guatemala. Beijing-lineage Mtb strains have gained prevalence worldwide and are associated with increased virulence and drug resistance, but there have been only a few cases reported in Central America. Here we report the first whole genome sequencing of Central American Beijing-lineage strains of Mtb. We find that multiple Beijing-lineage strains, derived from independent founding events, are currently circulating in Guatemala, but overall still represent a relatively small proportion of disease burden. Finally, we identify a specific Beijing-lineage outbreak centered on a poor neighborhood in Guatemala City.
Abstract:
A previous genome-wide association study (GWAS) of more than 100,000 individuals identified molecular-genetic predictors of educational attainment. We undertook in-depth life-course investigation of the polygenic score derived from this GWAS using the four-decade Dunedin Study (N = 918). There were five main findings. First, polygenic scores predicted adult economic outcomes even after accounting for educational attainments. Second, genes and environments were correlated: Children with higher polygenic scores were born into better-off homes. Third, children's polygenic scores predicted their adult outcomes even when analyses accounted for their social-class origins; social-mobility analysis showed that children with higher polygenic scores were more upwardly mobile than children with lower scores. Fourth, polygenic scores predicted behavior across the life course, from early acquisition of speech and reading skills through geographic mobility and mate choice and on to financial planning for retirement. Fifth, polygenic-score associations were mediated by psychological characteristics, including intelligence, self-control, and interpersonal skill. Effect sizes were small. Factors connecting DNA sequence with life outcomes may provide targets for interventions to promote population-wide positive development.
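A polygenic score of the kind this abstract refers to is conventionally a weighted sum of a person's allele counts, with weights taken from GWAS effect-size estimates. The sketch below illustrates that general definition only. The SNP identifiers, weights, and dosages are hypothetical, and the abstract does not describe the exact scoring procedure used in the study.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages over SNPs present in both inputs.

    dosages: mapping of SNP id to effect-allele count (0, 1, or 2)
    weights: mapping of SNP id to GWAS effect-size estimate
    """
    shared = dosages.keys() & weights.keys()
    if not shared:
        raise ValueError("no SNPs in common between dosages and weights")
    return sum(weights[snp] * dosages[snp] for snp in shared)


# Hypothetical effect sizes and one hypothetical individual's genotypes.
gwas_weights = {"snp_a": 0.02, "snp_b": -0.01, "snp_c": 0.03}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0, "snp_d": 2}  # snp_d has no weight

score = polygenic_score(person, gwas_weights)
print(round(score, 4))
```

In practice such scores are usually standardized within the study sample before being related to outcomes, which is one reason the reported effect sizes can be compared across measures.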