983 results for SNP- polymorphisme
Abstract:
Submicroscopic changes in chromosomal DNA copy number dosage are common and have been implicated in many heritable diseases and cancers. Recent high-throughput technologies have a resolution that permits the detection of segmental changes in DNA copy number that span thousands of basepairs across the genome. Genome-wide association studies (GWAS) may simultaneously screen for copy number-phenotype and SNP-phenotype associations as part of the analytic strategy. However, genome-wide array analyses are particularly susceptible to batch effects as the logistics of preparing DNA and processing thousands of arrays often involves multiple laboratories and technicians, or changes over calendar time to the reagents and laboratory equipment. Failure to adjust for batch effects can lead to incorrect inference and requires inefficient post-hoc quality control procedures that exclude regions that are associated with batch. Our work extends previous model-based approaches for copy number estimation by explicitly modeling batch effects and using shrinkage to improve locus-specific estimates of copy number uncertainty. Key features of this approach include the use of diallelic genotype calls from experimental data to estimate batch- and locus-specific parameters of background and signal without the requirement of training data. We illustrate these ideas using a study of bipolar disease and a study of chromosome 21 trisomy. The former has batch effects that dominate much of the observed variation in quantile-normalized intensities, while the latter illustrates the robustness of our approach to datasets where as many as 25% of the samples have altered copy number. Locus-specific estimates of copy number can be plotted on the copy-number scale to investigate mosaicism and guide the choice of appropriate downstream approaches for smoothing the copy number as a function of physical position. 
The software is open source and implemented in the R package CRLMM available at Bioconductor (http://www.bioconductor.org).
Abstract:
In most microarray technologies, a number of critical steps are required to convert raw intensity measurements into the data relied upon by data analysts, biologists and clinicians. These data manipulations, referred to as preprocessing, can influence the quality of the ultimate measurements. In the last few years, the high-throughput measurement of gene expression has been the most popular application of microarray technology. For this application, various groups have demonstrated that the use of modern statistical methodology can substantially improve accuracy and precision of gene expression measurements, relative to ad-hoc procedures introduced by designers and manufacturers of the technology. Currently, other applications of microarrays are becoming more and more popular. In this paper we describe a preprocessing methodology for a technology designed for the identification of DNA sequence variants in specific genes or regions of the human genome that are associated with phenotypes of interest such as disease. In particular we describe methodology useful for preprocessing Affymetrix SNP chips and obtaining genotype calls with the preprocessed data. We demonstrate how our procedure improves on existing approaches using data from three relatively large studies including one in which a large number of independent calls are available. Software implementing these ideas is available from the Bioconductor oligo package.
Abstract:
Simulation-based assessment is a popular and frequently necessary approach to evaluation of statistical procedures. Sometimes overlooked is the ability to take advantage of underlying mathematical relations and we focus on this aspect. We show how to take advantage of large-sample theory when conducting a simulation using the analysis of genomic data as a motivating example. The approach uses convergence results to provide an approximation to smaller-sample results, results that are available only by simulation. We consider evaluating and comparing a variety of ranking-based methods for identifying the most highly associated SNPs in a genome-wide association study, derive integral equation representations of the pre-posterior distribution of percentiles produced by three ranking methods, and provide examples comparing performance. These results are of interest in their own right and set the framework for a more extensive set of comparisons.
Abstract:
Amplifications and deletions of chromosomal DNA, as well as copy-neutral loss of heterozygosity, have been associated with disease processes. High-throughput single nucleotide polymorphism (SNP) arrays are useful for making genome-wide estimates of copy number and genotype calls. Because neighboring SNPs in high-throughput SNP arrays are likely to have dependent copy number and genotype due to the underlying haplotype structure and linkage disequilibrium, hidden Markov models (HMM) may be useful for improving genotype calls and copy number estimates that do not incorporate information from nearby SNPs. We improve previous approaches that utilize an HMM framework for inference in high-throughput SNP arrays by integrating copy number, genotype calls, and the corresponding confidence scores when available. Using simulated data, we demonstrate how confidence scores control smoothing in a probabilistic framework. Software for fitting HMMs to SNP array data is available in the R package ICE.
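The idea that confidence scores control smoothing can be illustrated with a small Viterbi decoder. This is only a sketch and not the ICE package's model: the three copy-number states, the Gaussian emission with standard deviation `sd`, the transition probability `stay_prob`, and the choice to multiply the emission log-likelihood by a confidence in [0, 1] are all assumptions made for illustration.

```python
import math

STATES = (1, 2, 3)  # hypothetical hidden copy-number states


def log_emission(obs, state, confidence, sd=0.3):
    """Gaussian log-density of an observed copy-number estimate.

    The log-density is multiplied by `confidence` (assumed to lie in [0, 1]),
    so low-confidence loci pull less on the decoded path.
    """
    z = (obs - state) / sd
    return confidence * (-0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi)))


def viterbi(observations, confidences, stay_prob=0.99):
    """Return the most probable copy-number path along the chromosome."""
    n_states = len(STATES)
    log_stay = math.log(stay_prob)
    log_move = math.log((1 - stay_prob) / (n_states - 1))

    # Uniform prior over states at the first locus.
    score = [
        math.log(1 / n_states) + log_emission(observations[0], s, confidences[0])
        for s in STATES
    ]
    back = []
    for obs, conf in zip(observations[1:], confidences[1:]):
        new_score, pointers = [], []
        for j, s in enumerate(STATES):
            # Best predecessor state for landing in state j at this locus.
            best_i = max(
                range(n_states),
                key=lambda i: score[i] + (log_stay if i == j else log_move),
            )
            trans = log_stay if best_i == j else log_move
            new_score.append(score[best_i] + trans + log_emission(obs, s, conf))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the final locus.
    state = max(range(n_states), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[i] for i in path]
```

With these assumed parameters, a run of three loci with estimates near 1 is decoded as a deletion when the calls are fully trusted, and is smoothed back to copy number 2 when the same loci carry a confidence of 0.5.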
Abstract:
Coat color dilution in several breeds of dog is characterized by a specific pigmentation phenotype and sometimes accompanied by hair loss and recurrent skin inflammation, the so-called color dilution alopecia or black hair follicular dysplasia. Coat color dilution (d) is inherited as a Mendelian autosomal recessive trait. In a previous study, MLPH polymorphisms showed perfect cosegregation with the dilute phenotype within breeds. However, different dilute haplotypes were found in different breeds, and no single polymorphism was identified in the coding sequence that was likely to be causative for the dilute phenotype. We resequenced the 5'-region of the canine MLPH gene and identified a strong candidate single nucleotide polymorphism within the nontranslated exon 1, which showed perfect association with the dilute phenotype in 65 dilute dogs from 7 different breeds. The A/G polymorphism is located at the last nucleotide of exon 1 and the mutant A-allele is predicted to reduce splicing efficiency 8-fold. An MLPH mRNA expression study using quantitative reverse transcriptase-polymerase chain reaction confirmed that dd animals had only about 25% of the MLPH transcript compared with DD animals. These results provide preliminary evidence that the reported regulatory MLPH mutation might represent a causal mutation for coat color dilution in dogs.
Abstract:
BACKGROUND: Microarray genome analysis is realising its promise for improving detection of genetic abnormalities in individuals with mental retardation and congenital abnormality. Copy number variations (CNVs) are now readily detectable using a variety of platforms and a major challenge is the distinction of pathogenic from ubiquitous, benign polymorphic CNVs. The aim of this study was to investigate replacement of time consuming, locus specific testing for specific microdeletion and microduplication syndromes with microarray analysis, which theoretically should detect all known syndromes with CNV aetiologies as well as new ones. METHODS: Genome wide copy number analysis was performed on 117 patients using Affymetrix 250K microarrays. RESULTS: 434 CNVs (195 losses and 239 gains) were found, including 18 pathogenic CNVs and 9 identified as "potentially pathogenic". Almost all pathogenic CNVs were larger than 500 kb, significantly larger than the median size of all CNVs detected. Segmental regions of loss of heterozygosity larger than 5 Mb were found in 5 patients. CONCLUSIONS: Genome microarray analysis has improved diagnostic success in this group of patients. Several examples of recently discovered "new syndromes" were found suggesting they are more common than previously suspected and collectively are likely to be a major cause of mental retardation. The findings have several implications for clinical practice. The study revealed the potential to make genetic diagnoses that were not evident in the clinical presentation, with implications for pretest counselling and the consent process. The importance of contributing novel CNVs to high quality databases for genotype-phenotype analysis and review of guidelines for selection of individuals for microarray analysis is emphasised.
Abstract:
Horses were domesticated from the Eurasian steppes 5,000-6,000 years ago. Since then, the use of horses for transportation, warfare, and agriculture, as well as selection for desired traits and fitness, has resulted in diverse populations distributed across the world, many of which have become or are in the process of becoming formally organized into closed, breeding populations (breeds). This report describes the use of a genome-wide set of autosomal SNPs and 814 horses from 36 breeds to provide the first detailed description of equine breed diversity. F(ST) calculations, parsimony, and distance analysis demonstrated relationships among the breeds that largely reflect geographic origins and known breed histories. Low levels of population divergence were observed between breeds that are relatively early on in the process of breed development, and between those with high levels of within-breed diversity, whether due to large population size, ongoing outcrossing, or large within-breed phenotypic diversity. Populations with low within-breed diversity included those which have experienced population bottlenecks, have been under intense selective pressure, or are closed populations with long breed histories. These results provide new insights into the relationships among and the diversity within breeds of horses. In addition these results will facilitate future genome-wide association studies and investigations into genomic targets of selection.
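The abstract does not state which F(ST) estimator was used. As a minimal sketch, assuming the textbook form F_ST = (H_T - H_S) / H_T for a single biallelic SNP with equally weighted populations, the calculation looks like this; the function name and the allele frequencies in the usage note are hypothetical.

```python
def fst_biallelic(allele_freqs):
    """F_ST for one biallelic SNP from per-population frequencies of one allele.

    Assumes F_ST = (H_T - H_S) / H_T with equally weighted populations:
    H_S is the mean expected heterozygosity within populations and
    H_T is the expected heterozygosity at the mean allele frequency.
    """
    if not allele_freqs:
        raise ValueError("need at least one population")
    k = len(allele_freqs)
    p_bar = sum(allele_freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # SNP is monomorphic across all populations
    h_s = sum(2 * p * (1 - p) for p in allele_freqs) / k
    return (h_t - h_s) / h_t
```

For example, two populations with allele frequencies 0.2 and 0.8 give H_T = 0.5 and H_S = 0.32, so F_ST is about 0.36; identical frequencies give 0, and fixation for opposite alleles gives 1.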
Abstract:
A wealth of genetic associations for cardiovascular and metabolic phenotypes in humans has been accumulating over the last decade, in particular a large number of loci derived from recent genome wide association studies (GWAS). True complex disease-associated loci often exert modest effects, so their delineation currently requires integration of diverse phenotypic data from large studies to ensure robust meta-analyses. We have designed a gene-centric 50 K single nucleotide polymorphism (SNP) array to assess potentially relevant loci across a range of cardiovascular, metabolic and inflammatory syndromes. The array utilizes a "cosmopolitan" tagging approach to capture the genetic diversity across approximately 2,000 loci in populations represented in the HapMap and SeattleSNPs projects. The array content is informed by GWAS of vascular and inflammatory disease, expression quantitative trait loci implicated in atherosclerosis, pathway based approaches and comprehensive literature searching. The custom flexibility of the array platform facilitated interrogation of loci at differing stringencies, according to a gene prioritization strategy that allows saturation of high priority loci with a greater density of markers than the existing GWAS tools, particularly in African HapMap samples. We also demonstrate that the IBC array can be used to complement GWAS, increasing coverage in high priority CVD-related loci across all major HapMap populations. DNA from over 200,000 extensively phenotyped individuals will be genotyped with this array with a significant portion of the generated data being released into the academic domain facilitating in silico replication attempts, analyses of rare variants and cross-cohort meta-analyses in diverse populations. These datasets will also facilitate more robust secondary analyses, such as explorations with alternative genetic models, epistasis and gene-environment interactions.
Abstract:
Recurrent airway obstruction is one of the most common airway diseases affecting mature horses. Increased bronchoalveolar mucus, neutrophil accumulation in airways, and airway obstruction are the main features of this disease. Mucociliary clearance is a key component of pulmonary defense mechanisms. Cilia are the motile part of this system and a complex linear array of dynein motors is responsible for their motility by moving along the microtubules in the axonemes of cilia and flagella. We previously detected a QTL for RAO on ECA 13 in a half-sib family of European Warmblood horses. The gene encoding DNAH3 is located in the peak of the detected QTL and encodes a dynein subunit. Therefore, we analysed this gene as a positional and functional candidate gene for RAO. In a mutation analysis of all 62 exons we detected 53 new polymorphisms including 7 non-synonymous variants. We performed an association study using 38 polymorphisms in a cohort of 422 animals. However, after correction for multiple testing we did not detect a significant association of any of these polymorphisms with RAO (P>0.05). Therefore, it seems unlikely that variants at the DNAH3 gene are responsible for the RAO QTL in European Warmblood horses.
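The abstract does not say which multiple-testing correction was applied to the 38 polymorphisms. As an illustration only, the sketch below computes Bonferroni and Holm adjusted p-values, two common choices; the function names and any p-values passed in are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is scaled by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

Under Bonferroni with 38 tests, a nominal p-value of 0.002 becomes 0.076, which would no longer pass a 0.05 threshold.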
Abstract:
As part of the global sheep Hapmap project, 24 individuals from each of seven indigenous Swiss sheep breeds (Bundner Oberländer sheep (BOS), Engadine Red sheep (ERS), Swiss Black-Brown Mountain sheep (SBS), Swiss Mirror sheep (SMS), Swiss White Alpine (SWA) sheep, Valais Blacknose sheep (VBS) and Valais Red sheep (VRS)), were genotyped using Illumina’s Ovine SNP50 BeadChip. In total, 167 animals were subjected to a detailed analysis for genetic diversity using 45 193 informative single nucleotide polymorphisms. The results of the phylogenetic analyses supported the known proximity between populations such as VBS and VRS or SMS and SWA. Average genomic relatedness within a breed was found to be 12 percent (BOS), 5 percent (ERS), 9 percent (SBS), 10 percent (SMS), 9 percent (SWA), 12 percent (VBS) and 20 percent (VRS). Furthermore, genomic relationships between breeds were found for single individuals from SWA and SMS, VRS and VBS as well as VRS and BOS. In addition, seven out of 40 indicated parent–offspring pairs could not be confirmed. These results were further supported by results from the genome-wide population cluster analysis. This study provides a better understanding of fine-scale population structures within and between Swiss sheep breeds. This relevant information will help to increase the conservation activities of the local Swiss sheep breeds.
Abstract:
Different cytokines are secreted in response to specific microbial molecules referred to as pathogen associated molecular patterns (PAMPs). Interleukin 6 (IL6) and interleukin 10 (IL10), both secreted by macrophages and lymphocytes, play a central role in the immunological response. In this work we obtained the genomic structure and complete DNA sequence of the porcine IL6 and IL10 genes and identified polymorphisms in the genomic sequences of these genes on a panel of ten different pig breeds. Comparative intra- and interbreed sequence analysis revealed a total of eight polymorphisms in the porcine IL6 gene and 21 in the porcine IL10 gene, which include single nucleotide polymorphisms (SNPs) and insertion deletion polymorphisms (indels). Additionally, the chromosomal localization of the IL10 gene was determined by FISH and RH mapping.
Abstract:
With hundreds of single nucleotide polymorphisms (SNPs) in a candidate gene and millions of SNPs across the genome, selecting an informative subset of SNPs to maximize the ability to detect genotype-phenotype association is of great interest and importance. In addition, with a large number of SNPs, analytic methods are needed that allow investigators to control the false positive rate resulting from large numbers of SNP genotype-phenotype analyses. This dissertation uses simulated data to explore methods for selecting SNPs for genotype-phenotype association studies. I examined the pattern of linkage disequilibrium (LD) across a candidate gene region and used this pattern to aid in localizing a disease-influencing mutation. The results indicate that the r² measure of linkage disequilibrium is preferred over the common D′ measure for use in genotype-phenotype association studies. Using step-wise linear regression, the best predictor of the quantitative trait was not usually the single functional mutation. Rather it was a SNP that was in high linkage disequilibrium with the functional mutation. Next, I compared three strategies for selecting SNPs for application to phenotype association studies: based on measures of linkage disequilibrium, based on a measure of haplotype diversity, and random selection. The results demonstrate that SNPs selected based on maximum haplotype diversity are more informative and yield higher power than randomly selected SNPs or SNPs selected based on low pair-wise LD. The data also indicate that for genes with small contribution to the phenotype, it is more prudent for investigators to increase their sample size than to continuously increase the number of SNPs in order to improve statistical power. When typing large numbers of SNPs, researchers are faced with the challenge of utilizing an appropriate statistical method that controls the type I error rate while maintaining adequate power.
We show that an empirical genotype-based multi-locus global test that uses permutation testing to investigate the null distribution of the maximum test statistic maintains a desired overall type I error rate while not overly sacrificing statistical power. The results also show that when the penetrance model is simple the multi-locus global test does as well as or better than the haplotype analysis. However, for more complex models, haplotype analyses offer advantages. The results of this dissertation will be of utility to human geneticists designing large-scale multi-locus genotype-phenotype association studies.
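A global test built on the permutation distribution of the maximum statistic can be sketched as follows. The per-SNP statistic is not specified in the abstract; here it is assumed to be the absolute Pearson correlation between genotype dosage (0, 1, 2) and a quantitative phenotype, and all function names are hypothetical.

```python
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def max_statistic(genotypes, phenotype):
    """Largest per-SNP statistic; `genotypes` holds one dosage list per SNP."""
    return max(abs_correlation(snp, phenotype) for snp in genotypes)


def global_permutation_test(genotypes, phenotype, n_perm=1000, seed=0):
    """Return (observed max statistic, permutation p-value).

    Phenotypes are shuffled across individuals, which keeps the LD structure
    among SNPs intact while breaking any genotype-phenotype association.
    """
    rng = random.Random(seed)
    observed = max_statistic(genotypes, phenotype)
    shuffled = list(phenotype)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_statistic(genotypes, shuffled) >= observed:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

Because a single p-value is computed for the maximum over all SNPs, the family-wise type I error is controlled without a separate per-SNP correction.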