961 results for Genome-wide linkage
Abstract:
Genome-wide association studies (GWAS) are used to discover genes underlying complex, heritable disorders for which less powerful study designs have failed in the past. The number of GWAS has skyrocketed recently, with findings reported in top journals and the mainstream media. Microarrays are the genotype calling technology of choice in GWAS as they permit exploration of more than a million single nucleotide polymorphisms (SNPs) simultaneously. The starting point for the statistical analyses used by GWAS to determine association between loci and disease is the set of genotype calls (AA, AB, or BB). However, the raw data, microarray probe intensities, are heavily processed before arriving at these calls. Various sophisticated statistical procedures have been proposed for transforming raw data into genotype calls. We find that variability in microarray output quality across different SNPs, different arrays, and different sample batches has substantial influence on the accuracy of genotype calls made by existing algorithms. By failing to account for these sources of variability, GWAS run the risk of adversely affecting the quality of reported findings. In this paper we present solutions based on a multi-level mixed model. A software implementation of the method described in this paper is available as free and open source code in the crlmm R/Bioconductor package.
Abstract:
Submicroscopic changes in chromosomal DNA copy number dosage are common and have been implicated in many heritable diseases and cancers. Recent high-throughput technologies have a resolution that permits the detection of segmental changes in DNA copy number that span thousands of basepairs across the genome. Genome-wide association studies (GWAS) may simultaneously screen for copy number-phenotype and SNP-phenotype associations as part of the analytic strategy. However, genome-wide array analyses are particularly susceptible to batch effects as the logistics of preparing DNA and processing thousands of arrays often involves multiple laboratories and technicians, or changes over calendar time to the reagents and laboratory equipment. Failure to adjust for batch effects can lead to incorrect inference and requires inefficient post-hoc quality control procedures that exclude regions that are associated with batch. Our work extends previous model-based approaches for copy number estimation by explicitly modeling batch effects and using shrinkage to improve locus-specific estimates of copy number uncertainty. Key features of this approach include the use of diallelic genotype calls from experimental data to estimate batch- and locus-specific parameters of background and signal without the requirement of training data. We illustrate these ideas using a study of bipolar disease and a study of chromosome 21 trisomy. The former has batch effects that dominate much of the observed variation in quantile-normalized intensities, while the latter illustrates the robustness of our approach to datasets where as many as 25% of the samples have altered copy number. Locus-specific estimates of copy number can be plotted on the copy-number scale to investigate mosaicism and guide the choice of appropriate downstream approaches for smoothing the copy number as a function of physical position. 
The software is open source and implemented in the R package CRLMM available at Bioconductor (http://www.bioconductor.org).
Abstract:
Simulation-based assessment is a popular and frequently necessary approach to evaluation of statistical procedures. Sometimes overlooked is the ability to take advantage of underlying mathematical relations and we focus on this aspect. We show how to take advantage of large-sample theory when conducting a simulation using the analysis of genomic data as a motivating example. The approach uses convergence results to provide an approximation to smaller-sample results, results that are available only by simulation. We consider evaluating and comparing a variety of ranking-based methods for identifying the most highly associated SNPs in a genome-wide association study, derive integral equation representations of the pre-posterior distribution of percentiles produced by three ranking methods, and provide examples comparing performance. These results are of interest in their own right and set the framework for a more extensive set of comparisons.
Abstract:
The ability to measure gene expression on a genome-wide scale is one of the most promising accomplishments in molecular biology. Microarrays, the technology that first permitted this, were riddled with problems due to unwanted sources of variability. Many of these problems are now mitigated, after a decade’s worth of statistical methodology development. The recently developed RNA sequencing (RNA-seq) technology has generated much excitement in part due to claims of reduced variability in comparison to microarrays. However, we show that RNA-seq data demonstrate unwanted and obscuring variability similar to what was first observed in microarrays. In particular, we find GC-content has a strong sample-specific effect on gene expression measurements that, if left uncorrected, leads to false positives in downstream results. We also report on commonly observed data distortions that demonstrate the need for data normalization. Here we describe statistical methodology that improves precision by 42% without loss of accuracy. Our resulting conditional quantile normalization (CQN) algorithm combines robust generalized regression to remove systematic bias introduced by deterministic features such as GC-content, and quantile normalization to correct for global distortions.
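As a rough illustration of the quantile-normalization step named in this abstract (not the CQN algorithm itself, which also includes a regression step for GC-content), the following Python sketch forces several samples onto a shared reference distribution. The function name and the toy values are assumptions made for this example and do not come from the paper.

```python
def quantile_normalize(samples):
    """Map every sample (lists of equal length) onto one shared distribution.

    Plain quantile normalization only; ties are broken by position.
    """
    n = len(samples[0])
    # Sort each sample, then average across samples at each rank
    # to obtain the reference distribution.
    sorted_samples = [sorted(s) for s in samples]
    reference = [
        sum(s[rank] for s in sorted_samples) / len(samples) for rank in range(n)
    ]
    normalized = []
    for s in samples:
        # Indices of this sample ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            # The value with a given rank receives the reference value of that rank.
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Toy expression values for two samples measured on very different scales.
toy = [[2.0, 4.0, 6.0], [30.0, 10.0, 20.0]]
result = quantile_normalize(toy)
```

In this toy input both samples end up with the same set of values (6, 12 and 18), assigned according to each value's rank within its own sample, which is how the method removes global distortions in scale.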
Abstract:
Little is known about the genes and proteins involved in the process of human memory. To identify genetic factors related to human episodic memory performance, we conducted an ultra-high-density genome-wide screen at > 500 000 single nucleotide polymorphisms (SNPs) in a sample of normal young adults stratified for performance on an episodic recall memory test. Analysis of these data identified SNPs within the calmodulin-binding transcription activator 1 (CAMTA1) gene that were significantly associated with memory performance. A follow-up study, focused on the CAMTA1 locus in an independent cohort consisting of cognitively normal young adults, singled out SNP rs4908449 with a P-value of 0.0002 as the most significantly associated SNP in the region. These validated genetic findings were further supported by the identification of CAMTA1 transcript enrichment in memory-related human brain regions and through a functional magnetic resonance imaging experiment on individuals matched for memory performance that identified CAMTA1 allele-specific upregulation of medial temporal lobe brain activity in those individuals harboring the 'at-risk' allele for poorer memory performance. The CAMTA1 locus encodes a purported transcription factor that interfaces with the calcium-calmodulin system of the cell to alter gene expression patterns. Our validated genomic and functional biological findings described herein suggest a role for CAMTA1 in human episodic memory.
Abstract:
To identify components of the copper homeostatic mechanism of Lactococcus lactis, we employed two-dimensional gel electrophoresis to detect changes in the proteome in response to copper. Three proteins upregulated by copper were identified: glyoxylase I (YaiA), a nitroreductase (YtjD), and lactate oxidase (LctO). The promoter regions of these genes feature cop boxes of consensus TACAnnTGTA, which are the binding site of CopY-type copper-responsive repressors. A genome-wide search for cop boxes revealed 28 such sequence motifs. They were tested by electrophoretic mobility shift assays for the interaction with purified CopR, the CopY-type repressor of L. lactis. Seven of the cop boxes interacted with CopR in a copper-sensitive manner. They were present in the promoter region of five genes, lctO, ytjD, copB, ydiD, and yahC; and two polycistronic operons, yahCD-yaiAB and copRZA. Induction of these genes by copper was confirmed by real-time quantitative PCR. The copRZA operon encodes the CopR repressor of the regulon; a copper chaperone, CopZ; and a putative copper ATPase, CopA. When expressed in Escherichia coli, the copRZA operon conferred copper resistance, suggesting that it functions in copper export from the cytoplasm. Other member genes of the CopR regulon may similarly be involved in copper metabolism.
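The genome-wide search for cop boxes described in this abstract can be pictured as a simple pattern scan. The Python sketch below is only an illustration of that idea, assuming the consensus TACAnnTGTA with n standing for any base; the function names and the promoter fragment are invented for the example, and the authors' actual search tool is not stated in the abstract.

```python
import re

# Consensus from the abstract: TACAnnTGTA, where n is any base.
# The lookahead lets overlapping occurrences be reported as well.
COP_BOX = re.compile(r"(?=(TACA[ACGT]{2}TGTA))")


def find_cop_boxes(sequence):
    """Return (0-based start, matched 10-mer) for each cop box candidate."""
    return [(m.start(), m.group(1)) for m in COP_BOX.finditer(sequence.upper())]


def reverse_complement(sequence):
    """Reverse-complement an upper-case DNA string."""
    return sequence.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Hypothetical promoter fragment, invented for this example.
fragment = "ggTACAAATGTAcc"
hits = find_cop_boxes(fragment)
```

Because the consensus is its own reverse complement, any match on one strand implies a match at the same location on the other, so scanning a single strand is sufficient for this motif.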
Abstract:
Hardwoods comprise about half of the biomass of forestlands in North America and present many uses including economic, ecological and aesthetic functions. Forest trees rely on the genetic variation within tree populations to overcome the many biotic, abiotic and anthropogenic factors, further worsened by climate change, that threaten their continued survival and functionality. To harness these inherent genetic variations of tree populations, informed knowledge of the genomic resources and techniques, which are currently lacking or very limited, is imperative for forest managers. The current study therefore aimed to develop genomic microsatellite markers for the leguminous tree species honey locust, Gleditsia triacanthos L., and test their applicability in assessing genetic variation, estimation of gene flow patterns and identification of a full-sib mapping population. We also aimed to test the usefulness of already developed nuclear and gene-based microsatellite markers in delineation of species and taxonomic relationships between four of the taxonomically difficult Section Lobatae species (Quercus coccinea, Q. ellipsoidalis, Q. rubra and Q. velutina). We recorded 100% amplification of G. triacanthos genomic microsatellites developed using Illumina sequencing techniques in a panel of seven unrelated individuals, with 14 of these showing high polymorphism and reproducibility. When characterized in 36 natural population samples, we recorded 20 alleles per locus with no indication for null alleles at 13 of the 14 microsatellites. This is the first report of genomic microsatellites for this species. Honey locust trees occur in fragmented populations of abandoned farmlands and pastures and are described as essentially dioecious. Pollen dispersal is the main source of gene flow within and between populations, with the ability to offset the effects of random genetic drift. 
Factors known to influence gene flow include fragmentation and degree of isolation, which makes understanding the patterns of gene flow in fragmented populations of honey locust a necessity for their sustainable management. In this follow-up study, we used a subset of nine of the 14 developed gSSRs to estimate gene flow and identify a full-sib mapping population in two isolated fragments of honey locust. Our analyses indicated that the majority of the seedlings (65-100% - at both strict and relaxed assignment thresholds) were sired by pollen from outside the two fragment populations. Only one selfing event was recorded, confirming the functional dioeciousness of honey locust and that the seed parents are almost completely outcrossed. From the Butternut Valley, TN population, pollen donor genotypes were reconstructed and used in paternity assignment analyses to identify a relatively large full-sib family comprised of 149 individuals, proving the usefulness of isolated forest fragments in identification of full-sib families. In the Ames Plantation stand, contemporary pollen dispersal followed a fat-tailed exponential-power distribution, an indication of effective gene flow. Our estimate of δ was 4,282.28 m, suggesting that insect pollinators of honey locust disperse pollen over very long distances. The high proportion of pollen influx into our sampled population implies that our fragment population forms part of a large effectively reproducing population. The high tendency of oak species to hybridize while still maintaining their species identity makes it difficult to resolve their taxonomic relationships. Oaks of the section Lobatae are famous in this regard and remain unresolved at both morphological and genetic markers. We applied 28 microsatellite markers, including outlier loci with potential roles in reproductive isolation and adaptive divergence between species, to natural populations of four known interfertile red oaks, Q. coccinea, Q. ellipsoidalis, Q. rubra and Q. velutina. 
To better resolve the taxonomic relationships in this difficult clade, we assigned individual samples to species, identified hybrids and introgressive forms and reconstructed phylogenetic relationships among the four species after exclusion of genetically intermediate individuals. Genetic assignment analyses identified four distinct species clusters, with Q. rubra most differentiated from the three other species, but also with a comparatively large number of misclassified individuals (7.14%), hybrids (7.14%) and introgressive forms (18.83%) between Q. ellipsoidalis and Q. velutina. After the exclusion of genetically intermediate individuals, Q. ellipsoidalis grouped as sister species to the largely parapatric Q. coccinea with high bootstrap support (91 %). Genetically intermediate forms in a mixed species stand were located proximate to both potential parental species, which supports recent hybridization of Q. velutina with both Q. ellipsoidalis and Q. rubra. Analyses of genome-wide patterns of interspecific differentiation can provide a better understanding of speciation processes and taxonomic relationships in this taxonomically difficult group of red oak species.
Abstract:
BACKGROUND: Microarray genome analysis is realising its promise for improving detection of genetic abnormalities in individuals with mental retardation and congenital abnormality. Copy number variations (CNVs) are now readily detectable using a variety of platforms and a major challenge is the distinction of pathogenic from ubiquitous, benign polymorphic CNVs. The aim of this study was to investigate replacement of time consuming, locus specific testing for specific microdeletion and microduplication syndromes with microarray analysis, which theoretically should detect all known syndromes with CNV aetiologies as well as new ones. METHODS: Genome wide copy number analysis was performed on 117 patients using Affymetrix 250K microarrays. RESULTS: 434 CNVs (195 losses and 239 gains) were found, including 18 pathogenic CNVs and 9 identified as "potentially pathogenic". Almost all pathogenic CNVs were larger than 500 kb, significantly larger than the median size of all CNVs detected. Segmental regions of loss of heterozygosity larger than 5 Mb were found in 5 patients. CONCLUSIONS: Genome microarray analysis has improved diagnostic success in this group of patients. Several examples of recently discovered "new syndromes" were found suggesting they are more common than previously suspected and collectively are likely to be a major cause of mental retardation. The findings have several implications for clinical practice. The study revealed the potential to make genetic diagnoses that were not evident in the clinical presentation, with implications for pretest counselling and the consent process. The importance of contributing novel CNVs to high quality databases for genotype-phenotype analysis and review of guidelines for selection of individuals for microarray analysis is emphasised.
Abstract:
Acute infection with the hepatitis C virus (HCV) induces a wide range of innate and adaptive immune responses. A total of 20-50% of acutely HCV-infected individuals permanently control the virus, referred to as 'spontaneous hepatitis C clearance', while the infection progresses to chronic hepatitis C in the majority of cases. Numerous studies have examined host genetic determinants of hepatitis C infection outcome and revealed the influence of genetic polymorphisms of human leukocyte antigens, killer immunoglobulin-like receptors, chemokines, interleukins and interferon-stimulated genes on spontaneous hepatitis C clearance. However, most genetic associations were not confirmed in independent cohorts, revealed opposing results in diverse populations or were limited by varying definitions of hepatitis C outcomes or small sample size. Coordinated efforts are needed in the search for key genetic determinants of spontaneous hepatitis C clearance that include well-conducted candidate genetic and genome-wide association studies, direct sequencing and follow-up functional studies.
Abstract:
The development of a completely annotated sheep genome sequence is a key need for understanding the phylogenetic relationships and genetic diversity among the many different sheep breeds worldwide and for identifying genes controlling economically and physiologically important traits. The ovine genome sequence assembly will be crucial for developing optimized breeding programs based on highly productive, healthy sheep phenotypes that are adapted to modern breeding and production conditions. Scientists and breeders around the globe have been contributing to this goal by generating genomic and cDNA libraries, performing genome-wide and trait-associated analyses of polymorphism, expression analysis, genome sequencing, and by developing virtual and physical comparative maps. The International Sheep Genomics Consortium (ISGC), an informal network of sheep genomics researchers, is playing a major role in coordinating many of these activities. In addition to serving as an essential tool for monitoring chromosome abnormalities in specific sheep populations, ovine molecular cytogenetics provides physical anchors which link and order genome regions, such as sequence contigs, genes and polymorphic DNA markers to ovine chromosomes. Likewise, molecular cytogenetics can contribute to the process of defining evolutionary breakpoints between related species. The selective expansion of the sheep cytogenetic map, using loci to connect maps and identify chromosome bands, can substantially contribute to improving the quality of the annotated sheep genome sequence and will also accelerate its assembly. Furthermore, identifying major morphological chromosome anomalies and micro-rearrangements, such as gene duplications or deletions, that might occur between different sheep breeds and other Ovis species will also be important to understand the diversity of sheep chromosome structure and its implications for cross-breeding. 
To date, 566 loci have been assigned to specific chromosome regions in sheep and the new cytogenetic map is presented as part of this review. This review will also summarize the current cytogenomic status of the sheep genome, describe current activities in the sheep cytogenomics research sector, and will discuss the cytogenomics data in context with other major sheep genomics projects.
Abstract:
Attention deficit/hyperactivity disorder (ADHD) is a highly heritable neurodevelopmental disorder of childhood onset. Clinical and biological evidence points to shared common central nervous system (CNS) pathology of ADHD and restless legs syndrome (RLS). It was hypothesized that variants previously found to be associated with RLS in two large genome-wide association studies (GWAS) will also be associated with ADHD. SNPs located in MEIS1 (rs2300478), BTBD9 (rs9296249, rs3923809, rs6923737), and MAP2K5 (rs12593813, rs4489954) as well as three SNPs tagging the identified haplotype in MEIS1 (rs6710341, rs12469063, rs4544423) were genotyped in a well characterized German sample of 224 families comprising one or more affected sibs (386 children) and both parents. We found no evidence for preferential transmission of the hypothesized variants to ADHD. Subsequent analyses elicited nominally significant association with haplotypes consisting of the three SNPs in BTBD9 (χ² = 14.8, df = 7, nominal p = 0.039). According to exploratory post hoc analyses, the major contribution to this finding came from the A-A-A-haplotype with a haplotype-wise nominal p-value of 0.009. However, this result did not withstand correction for multiple testing. In view of our results, RLS risk alleles may have a lower effect on ADHD than on RLS or may not be involved in ADHD. The negative findings may additionally result from genetic heterogeneity of ADHD, i.e. risk alleles for RLS may only be relevant for certain subtypes of ADHD. Genes relevant to RLS remain interesting candidates for ADHD; particularly BTBD9 needs further study, as it has been related to iron storage, a potential pathophysiological link between RLS and certain subtypes of ADHD.
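The nominal p-value quoted for the BTBD9 haplotype test can be reproduced from the reported statistic and degrees of freedom. The Python sketch below is a check of that arithmetic only, not the authors' analysis pipeline; it uses a closed form of the chi-square upper tail that holds for odd degrees of freedom, and the function name is an assumption made for the example.

```python
import math


def chi2_sf_odd_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with odd df.

    Uses the closed form
        erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_j x^j / (1*3*...*(2j+1)),
    with j running from 0 to (df - 3) / 2.
    """
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form only covers odd degrees of freedom")
    tail = math.erfc(math.sqrt(x / 2.0))
    scale = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    series, term = 0.0, 1.0
    for j in range((df - 1) // 2):
        if j > 0:
            # Extend the running product: x^j / (1*3*...*(2j+1)).
            term *= x / (2 * j + 1)
        series += term
    return tail + scale * series


# Values reported in the abstract: chi-square = 14.8 with 7 degrees of freedom.
p_value = chi2_sf_odd_df(14.8, 7)
```

The result comes out close to the reported nominal p of 0.039, which is below the conventional 0.05 threshold but, as the abstract notes, does not survive correction for multiple testing.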
Abstract:
Lung function measures are heritable, predict mortality and are relevant in diagnosis of chronic obstructive pulmonary disease (COPD). COPD and asthma are diseases of the airways with major public health impacts and each has a heritable component. Genome-wide association studies of SNPs have revealed novel genetic associations with both diseases but only account for a small proportion of the heritability. Complex copy number variation may account for some of the missing heritability. A well-characterised genomic region of complex copy number variation contains beta-defensin genes (DEFB103, DEFB104 and DEFB4), which have a role in the innate immune response. Previous studies have implicated these and related genes as being associated with asthma or COPD. We hypothesised that copy number variation of these genes may play a role in lung function in the general population and in COPD and asthma risk. We undertook copy number typing of this locus in 1149 adults and 689 children using a paralogue ratio test and investigated association with COPD, asthma and lung function. Replication of findings was assessed in a larger independent sample of COPD cases and smoking controls. We found evidence for an association of beta-defensin copy number with COPD in the adult cohort (OR = 1.4, 95% CI: 1.02-1.92, P = 0.039) but this finding, and findings from a previous study, were not replicated in a larger follow-up sample (OR = 0.89, 95% CI: 0.72-1.07, P = 0.217). No robust evidence of association with asthma in children was observed. We found no evidence for association between beta-defensin copy number and lung function in the general population. Our findings suggest that previous reports of association of beta-defensin copy number with COPD should be viewed with caution. Suboptimal measurement of copy number can lead to spurious associations. Further beta-defensin copy number measurement in larger samples of COPD cases and children with asthma is needed.
Abstract:
Coat color and pattern variations in domestic animals are frequently inherited as simple monogenic traits, but a number are known to have a complex genetic basis. While the analysis of complex trait data remains a challenge in all species, we can use the reduced haplotypic diversity in domestic animal populations to gain insight into the genomic interactions underlying complex phenotypes. White face and leg markings are examples of complex traits in horses where little is known of the underlying genetics. In this study, Franches-Montagnes (FM) horses were scored for the occurrence of white facial and leg markings using a standardized scoring system. A genome-wide association study (GWAS) was performed for several white patterning traits in 1,077 FM horses. Seven quantitative trait loci (QTL) affecting the white marking score with p-values ≤ 10⁻⁴ were identified. Three loci, MC1R and the known white spotting genes, KIT and MITF, were identified as the major loci underlying the extent of white patterning in this breed. Together, the seven loci explain 54% of the genetic variance in total white marking score, while MITF and KIT alone account for 26%. Although MITF and KIT are the major loci controlling white patterning, their influence varies according to the basic coat color of the horse and the specific body location of the white patterning. Fine mapping across the MITF and KIT loci was used to characterize haplotypes present. Phylogenetic relationships among haplotypes were calculated to assess their selective and evolutionary influences on the extent of white patterning. This novel approach shows that KIT and MITF act in an additive manner and that accumulating mutations at these loci progressively increase the extent of white markings.
Abstract:
Hereditary nasal parakeratosis (HNPK), an inherited monogenic autosomal recessive skin disorder, leads to crusts and fissures on the nasal planum of Labrador Retrievers. We performed a genome-wide association study (GWAS) using 13 HNPK cases and 23 controls. We obtained a single strong association signal on chromosome 2 (p(raw) = 4.4×10⁻¹⁴). The analysis of shared haplotypes among the 13 cases defined a critical interval of 1.6 Mb with 25 predicted genes. We re-sequenced the genome of one case at 38× coverage and detected 3 non-synonymous variants in the critical interval with respect to the reference genome assembly. We genotyped these variants in larger cohorts of dogs and only one was perfectly associated with the HNPK phenotype in a cohort of more than 500 dogs. This candidate causative variant is a missense variant in the SUV39H2 gene encoding a histone 3 lysine 9 (H3K9) methyltransferase, which mediates chromatin silencing. The variant c.972T>G is predicted to change an evolutionarily conserved asparagine into a lysine in the catalytically active domain of the enzyme (p.N324K). We further studied the histopathological alterations in the epidermis in vivo. Our data suggest that the HNPK phenotype is not caused by hyperproliferation, but rather by delayed terminal differentiation of keratinocytes. Thus, our data provide evidence that SUV39H2 is involved in the epigenetic regulation of keratinocyte differentiation, ensuring proper stratification and tight sealing of the mammalian epidermis.
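The coding and protein descriptions of the variant, c.972T>G and p.N324K, can be checked against each other with a little arithmetic on codon positions. The Python sketch below is a consistency check under stated assumptions, not a variant-annotation tool: the helper names are invented, and the reference codon AAT is inferred here (an asparagine codon whose third base is T) rather than taken from the abstract.

```python
# Minimal codon table: only the asparagine (N) and lysine (K) codons needed here.
CODONS = {"AAT": "N", "AAC": "N", "AAA": "K", "AAG": "K"}


def codon_position(cdna_pos):
    """Map a 1-based coding position to (codon number, 1-based position in codon)."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


def apply_substitution(codon, pos_in_codon, alt):
    """Return the codon with the base at the 1-based position replaced by alt."""
    i = pos_in_codon - 1
    return codon[:i] + alt + codon[i + 1:]


# c.972T>G: locate the affected codon and the position within it.
codon_number, pos_in_codon = codon_position(972)

# Inferred reference codon: asparagine with T at the affected position.
ref_codon = "AAT"
alt_codon = apply_substitution(ref_codon, pos_in_codon, "G")
protein_change = f"p.{CODONS[ref_codon]}{codon_number}{CODONS[alt_codon]}"
```

Position 972 falls on the third base of codon 324, and replacing the T of AAT with G gives AAG, a lysine codon, which agrees with the p.N324K notation in the abstract.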
Abstract:
We describe a mild form of disproportionate dwarfism in Labrador Retrievers, which is not associated with any obvious health problems such as secondary arthrosis. We designate this phenotype as skeletal dysplasia 2 (SD2). It is inherited as a monogenic autosomal recessive trait with incomplete penetrance primarily in working lines of the Labrador Retriever breed. Using 23 cases and 37 controls we mapped the causative mutation by genome-wide association and homozygosity mapping to a 4.44 Mb interval on chromosome 12. We re-sequenced the genome of one affected dog at 30x coverage and detected 92 non-synonymous variants in the critical interval. Only two of these variants, located in the lymphotoxin A (LTA) and collagen alpha-2(XI) chain gene (COL11A2), respectively, were perfectly associated with the trait. Previously described COL11A2 variants in humans or mice lead to skeletal dysplasias and/or deafness. The dog variant associated with disproportionate dwarfism, COL11A2:c.143G>C or p.R48P, probably has only a minor effect on collagen XI function, which might explain the comparatively mild phenotype seen in our study. The identification of this candidate causative mutation thus widens the known phenotypic spectrum of COL11A2 mutations. We speculate that non-pathogenic COL11A2 variants might even contribute to the heritable variation in height.