3 results for Genome wide mapping
in Digital Commons - Michigan Tech
Abstract:
This dissertation has three separate parts. The first part deals with general pedigree association testing that incorporates continuous covariates; the second part deals with association tests under population stratification using conditional likelihood tests; the third part deals with genome-wide association studies based on the real rheumatoid arthritis (RA) data sets from Genetic Analysis Workshop 16 (GAW16) Problem 1. Many statistical tests have been developed to test linkage and association in family data using either case-control status or phenotype covariates, but not both. Such univariate analyses might not use all the information available from family members in practical studies. Moreover, complex human diseases do not have a clear inheritance pattern, and the underlying genes may interact or act independently. In part I, the newly proposed approach, MPDT, focuses on using both the case-control information and the phenotype covariates. This approach can be applied to detect multiple marker effects. Because it builds on two existing, popular statistics in family studies, one for case-control traits and one for quantitative traits, the new approach can be used in data sets with simple family structure as well as in general pedigrees. The combined statistic is calculated from the two statistics, and a permutation procedure is applied to assess the p-value, with a Bonferroni adjustment for multiple markers. We use simulation studies to evaluate the type I error rates and the power of the proposed approach. Our results show that the combined test using both case-control information and phenotype covariates not only has the correct type I error rates but is also more powerful than the other existing methods. For multiple marker interactions, our proposed method is also very powerful.
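The permutation step with a Bonferroni adjustment described in part I can be illustrated with a minimal sketch. It is not the MPDT procedure: it assumes unrelated case-control samples and a toy statistic (the absolute difference in mean genotype score), whereas a family-based test would permute within pedigrees. The names `abs_mean_difference`, `permutation_p_value` and `bonferroni`, and all data values, are hypothetical.

```python
import random


def abs_mean_difference(labels, scores):
    """Toy statistic: |mean score in cases - mean score in controls|."""
    cases = [s for lab, s in zip(labels, scores) if lab == 1]
    controls = [s for lab, s in zip(labels, scores) if lab == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_p_value(statistic, labels, scores, n_perm=999, seed=0):
    """Empirical p-value obtained by shuffling case-control labels.

    Assumption: samples are exchangeable under the null hypothesis,
    which holds for unrelated individuals but not for raw pedigree data.
    """
    rng = random.Random(seed)
    observed = statistic(labels, scores)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(shuffled, scores) >= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def bonferroni(p_values):
    """Multiply each p-value by the number of markers, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    labels = [1] * 10 + [0] * 10                   # 10 cases, 10 controls
    marker_a = [2] * 8 + [1] * 2 + [0] * 8 + [1] * 2  # strongly associated
    marker_b = [0, 1] * 10                         # no difference at all
    raw = [permutation_p_value(abs_mean_difference, labels, m)
           for m in (marker_a, marker_b)]
    print("raw p-values:     ", raw)
    print("Bonferroni-adjusted:", bonferroni(raw))
```

The add-one correction is a common choice because an empirical p-value of exactly zero cannot be justified from a finite number of permutations.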
Selective genotyping is an economical strategy for detecting and mapping quantitative trait loci in the genetic dissection of complex diseases. When the samples arise from different ethnic groups or an admixed population, all existing selective genotyping methods may produce spurious associations due to different ancestry distributions. The problem can be more serious when the sample size is large, a general requirement for obtaining sufficient power to detect modest genetic effects for most complex traits. In part II, I describe a useful strategy for selective genotyping when population stratification is present. Our procedure uses a principal-component-based approach to eliminate any effect of population stratification. We evaluate the performance of our procedure using both simulated data sets from an earlier study and the HapMap data sets in a variety of population admixture models generated from empirical data. The rheumatoid arthritis data set of Problem 1 in the Genetic Analysis Workshop 16 (GAW16) contains one binary trait and two continuous traits: RA status, AntiCCP and IgM. To allow for multiple traits, we propose a set of SNP-level F statistics, based on the concept of multiple correlation, to measure the genetic association between multiple trait values and SNP-specific genotypic scores, and we obtain their null distributions. We then perform 6 genome-wide association analyses using the novel one- and two-stage approaches, which are based on single, double and triple traits. Combining all 6 analyses, we successfully validate the SNPs that have been identified in the literature as responsible for rheumatoid arthritis and detect more disease susceptibility SNPs for future follow-up studies. Every chromosome except chromosomes 13 and 18 is found to harbour susceptibility regions for rheumatoid arthritis or related diseases, i.e., lupus erythematosus. This topic is discussed in part III.
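The idea of an F statistic built on multiple correlation, mentioned in part III, can be sketched with the textbook F test for a squared multiple correlation. This is an illustrative assumption, not the dissertation's statistic or its null distribution: the sketch regresses a SNP genotype score on two traits, uses the closed-form R² for two predictors, and forms F = (R²/k) / ((1 − R²)/(n − k − 1)). All function names and data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def multiple_r2_two_traits(genotype, trait1, trait2):
    """Squared multiple correlation of the genotype score with two traits.

    Uses the closed form for two predictors; requires that the two
    traits are not perfectly correlated with each other.
    """
    r1 = pearson(genotype, trait1)
    r2 = pearson(genotype, trait2)
    r12 = pearson(trait1, trait2)
    return (r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * r12) / (1.0 - r12 * r12)


def f_statistic(r_squared, n, k):
    """Textbook F statistic for R^2 with k predictors and n samples."""
    return (r_squared / k) / ((1.0 - r_squared) / (n - k - 1))


if __name__ == "__main__":
    # Hypothetical toy data: 8 individuals, genotype scores coded 0/1/2.
    genotype = [2, 1, 1, 0, 2, 1, 0, 1]
    trait1 = [1, -1, 1, -1, 1, -1, 1, -1]
    trait2 = [1, 1, -1, -1, 1, 1, -1, -1]
    r2 = multiple_r2_two_traits(genotype, trait1, trait2)
    print("R^2 =", r2, " F =", f_statistic(r2, n=len(genotype), k=2))
```

Under the usual normality assumptions this F would be compared with an F distribution on (k, n − k − 1) degrees of freedom; a genome-wide scan would repeat the calculation per SNP and correct for multiple testing.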
Abstract:
Hardwoods comprise about half of the biomass of forestlands in North America and serve many economic, ecological and aesthetic functions. Forest trees rely on the genetic variation within tree populations to overcome the many biotic, abiotic and anthropogenic factors, further worsened by climate change, that threaten their continued survival and functionality. To harness the inherent genetic variation of tree populations, forest managers need informed knowledge of genomic resources and techniques, which are currently lacking or very limited. The current study therefore aimed to develop genomic microsatellite markers for the leguminous tree species honey locust, Gleditsia triacanthos L., and to test their applicability in assessing genetic variation, estimating gene flow patterns and identifying a full-sib mapping population. We also aimed to test the usefulness of already developed nuclear and gene-based microsatellite markers in delineating species and taxonomic relationships among four of the taxonomically difficult Section Lobatae species (Quercus coccinea, Q. ellipsoidalis, Q. rubra and Q. velutina). We recorded 100% amplification of G. triacanthos genomic microsatellites developed using Illumina sequencing techniques in a panel of seven unrelated individuals, with 14 of these showing high polymorphism and reproducibility. When the markers were characterized in 36 natural population samples, we recorded 20 alleles per locus with no indication of null alleles at 13 of the 14 microsatellites. This is the first report of genomic microsatellites for this species. Honey locust trees occur in fragmented populations on abandoned farmlands and pastures, and the species is described as essentially dioecious. Pollen dispersal is the main source of gene flow within and between populations, with the ability to offset the effects of random genetic drift.
Factors known to influence gene flow include fragmentation and degree of isolation, which makes understanding the patterns of gene flow in fragmented populations of honey locust a necessity for their sustainable management. In this follow-up study, we used a subset of nine of the 14 developed gSSRs to estimate gene flow and identify a full-sib mapping population in two isolated fragments of honey locust. Our analyses indicated that the majority of the seedlings (65-100% at both strict and relaxed assignment thresholds) were sired by pollen from outside the two fragment populations. Only one selfing event was recorded, confirming the functional dioecy of honey locust and that the seed parents are almost completely outcrossed. From the Butternut Valley, TN population, pollen donor genotypes were reconstructed and used in paternity assignment analyses to identify a relatively large full-sib family of 149 individuals, demonstrating the usefulness of isolated forest fragments for identifying full-sib families. In the Ames Plantation stand, contemporary pollen dispersal followed a fat-tailed exponential-power distribution, an indication of effective gene flow. Our estimate of δ was 4,282.28 m, suggesting that insect pollinators of honey locust disperse pollen over very long distances. The high proportion of pollen influx into our sampled population implies that our fragment population forms part of a large, effectively reproducing population. The high tendency of oak species to hybridize while still maintaining their species identity makes it difficult to resolve their taxonomic relationships. Oaks of the section Lobatae are famous in this regard, and their relationships remain unresolved with both morphological and genetic markers. We applied 28 microsatellite markers, including outlier loci with potential roles in reproductive isolation and adaptive divergence between species, to natural populations of four known interfertile red oaks: Q. coccinea, Q. ellipsoidalis, Q. rubra and Q. velutina.
To better resolve the taxonomic relationships in this difficult clade, we assigned individual samples to species, identified hybrids and introgressive forms, and reconstructed phylogenetic relationships among the four species after excluding genetically intermediate individuals. Genetic assignment analyses identified four distinct species clusters, with Q. rubra the most differentiated from the other three species, but also a comparatively large number of misclassified individuals (7.14%), hybrids (7.14%) and introgressive forms (18.83%) between Q. ellipsoidalis and Q. velutina. After the exclusion of genetically intermediate individuals, Q. ellipsoidalis grouped as the sister species to the largely parapatric Q. coccinea with high bootstrap support (91%). Genetically intermediate forms in a mixed-species stand were located close to both potential parental species, which supports recent hybridization of Q. velutina with both Q. ellipsoidalis and Q. rubra. Analyses of genome-wide patterns of interspecific differentiation can provide a better understanding of speciation processes and taxonomic relationships in this taxonomically difficult group of red oak species.
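The fat-tailed exponential-power pollen dispersal distribution reported above for the Ames Plantation honey locust stand can be sketched as follows. The sketch assumes the two-dimensional exponential-power kernel as it is commonly parameterized in the pollen dispersal literature, with scale a and shape b, and it assumes that δ denotes the mean dispersal distance, δ = a·Γ(3/b)/Γ(2/b); the abstract confirms neither. The kernel is fat-tailed when b < 1. Function names and parameter values are hypothetical and are not the study's estimates.

```python
import math


def exp_power_kernel(r, a, b):
    """Probability density per unit area at distance r from the source.

    Two-dimensional exponential-power kernel with scale a > 0 and
    shape b > 0:  p(r) = b / (2*pi*a^2*Gamma(2/b)) * exp(-(r/a)^b).
    """
    norm = b / (2.0 * math.pi * a * a * math.gamma(2.0 / b))
    return norm * math.exp(-((r / a) ** b))


def mean_dispersal_distance(a, b):
    """Mean distance travelled under the kernel: a*Gamma(3/b)/Gamma(2/b)."""
    return a * math.gamma(3.0 / b) / math.gamma(2.0 / b)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the effect of the shape.
    for shape in (2.0, 1.0, 0.5):
        label = "fat-tailed" if shape < 1.0 else "thin-tailed"
        delta = mean_dispersal_distance(a=100.0, b=shape)
        print(f"b = {shape}: {label}, mean distance = {delta:.1f} m")
```

With the scale held fixed, lowering b below 1 moves probability mass into the tail and sharply increases the mean distance, which is why a fat-tailed fit is read as evidence of long-distance pollen movement.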
Abstract:
With the development of genotyping and next-generation sequencing technologies, multi-marker testing in genome-wide association studies and rare variant association studies has become an active research area in statistical genetics. This dissertation presents three methodologies for association studies that exploit different features of genetic data and demonstrates how to use those methods to test genetic association hypotheses. The methods fall into three scenarios: 1) multi-marker testing for regions of strong linkage disequilibrium, 2) multi-marker testing for family-based association studies, and 3) multi-marker testing for rare variant association studies. I also discuss the advantages of these methods and demonstrate their power through simulation studies and applications to real genetic data.