3 results for genotype distribution
in DigitalCommons@The Texas Medical Center
Abstract:
The distribution of the number of heterozygous loci in two randomly chosen gametes or in a random diploid zygote provides information regarding the nonrandom association of alleles among different genetic loci. Two alternative statistics may be employed to detect nonrandom association of genes at different loci when observations are made on these distributions: the observed variance of the number of heterozygous loci (s_k²) and a goodness-of-fit criterion (X²) that contrasts the observed distribution with that expected under the hypothesis of random association of genes. It is shown, by simulation, that s_k² is statistically more efficient than X² in detecting a given extent of nonrandom association. Asymptotic normality of s_k² is justified, and X² is shown to follow a chi-square (χ²) distribution with partial loss of degrees of freedom arising from the estimation of parameters from the marginal gene frequency data. Whenever direct evaluation of linkage disequilibrium values is possible, tests based on maximum likelihood estimators of linkage disequilibria require a smaller sample size (number of zygotes or gametes) to detect a given level of nonrandom association than tests based on s_k². Summarizing multilocus genotype (or haplotype) data into classes defined by the number of heterozygous loci thus entails an appreciable loss of information.
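As a rough illustration of the variance statistic described in this abstract, the Python sketch below counts heterozygous loci per individual and compares the observed variance of that count with the variance expected if loci were independent. The function names, the toy data, and the use of observed per-locus heterozygote proportions are assumptions made for illustration. The abstract estimates its parameters from marginal gene frequencies and does not give formulas, so this is not the authors' procedure. The expected value used here is the variance of a sum of independent Bernoulli variables.

```python
def heterozygous_counts(genotypes):
    """Number of heterozygous loci for each individual.

    genotypes: list of individuals, each a list of (allele, allele) pairs,
    one pair per locus (hypothetical input layout).
    """
    return [sum(1 for a, b in individual if a != b) for individual in genotypes]


def observed_variance(counts):
    """Variance of the counts, dividing by n (an assumption of this sketch)."""
    n = len(counts)
    mean = sum(counts) / n
    return sum((k - mean) ** 2 for k in counts) / n


def locus_heterozygosities(genotypes):
    """Observed proportion of heterozygotes at each locus."""
    n = len(genotypes)
    n_loci = len(genotypes[0])
    return [
        sum(1 for individual in genotypes if individual[j][0] != individual[j][1]) / n
        for j in range(n_loci)
    ]


def expected_variance_under_independence(heterozygosities):
    """Variance of a sum of independent Bernoulli(h_j) indicators."""
    return sum(h * (1 - h) for h in heterozygosities)


# Toy data: heterozygosity at the two loci always occurs together.
sample = [
    [("A", "a"), ("B", "b")],
    [("A", "a"), ("B", "b")],
    [("A", "A"), ("B", "B")],
    [("a", "a"), ("b", "b")],
]
counts = heterozygous_counts(sample)
print(observed_variance(counts))
print(expected_variance_under_independence(locus_heterozygosities(sample)))
```

In this toy sample the observed variance exceeds the independence expectation, which is the direction one would expect when heterozygosity is positively associated across loci.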
Abstract:
Enterococcus faecium has recently emerged as an important cause of nosocomial infections. We previously identified 15 predicted surface proteins with characteristics of MSCRAMMs and/or pili and demonstrated that their genes were frequently present in 30 clinical E. faecium isolates studied; one of these, acm, has been studied in further detail. To determine the prevalence of the other 14 genes among various E. faecium populations, we have now assessed 433 E. faecium isolates, including 264 isolates from human clinical infections, 69 isolates from stools of hospitalized patients, 70 isolates from stools of community volunteers, and 30 isolates from animal-related sources. A variable distribution of the 14 genes was detected, with their presence ranging from 51% to 98% of isolates. While 81% of clinical isolates carried 13 or 14 of the 14 genes tested, none of the community group isolates and only 13% of animal isolates carried 13 or 14 genes. The presence of these genes was most frequent in endocarditis isolates, with 11 genes present in all isolates, followed by isolates from other clinical sources. The number of genes significantly associated with clinical versus fecal or animal origin (P = 0.04 to <0.0001) varied from 10 to 13, depending on whether comparisons were made against individual clinical subgroups (endocarditis, blood, and other clinical isolates) or against all clinical isolates combined as one group. The strong association of these genes with clinical isolates raises the possibility that their preservation/acquisition has favored the adaptation of E. faecium to nosocomial environments and/or patients.
Abstract:
With hundreds of single nucleotide polymorphisms (SNPs) in a candidate gene and millions of SNPs across the genome, selecting an informative subset of SNPs to maximize the ability to detect genotype-phenotype association is of great interest and importance. In addition, with a large number of SNPs, analytic methods are needed that allow investigators to control the false positive rate resulting from large numbers of SNP genotype-phenotype analyses. This dissertation uses simulated data to explore methods for selecting SNPs for genotype-phenotype association studies. I examined the pattern of linkage disequilibrium (LD) across a candidate gene region and used this pattern to aid in localizing a disease-influencing mutation. The results indicate that the r² measure of linkage disequilibrium is preferred over the common D′ measure for use in genotype-phenotype association studies. Using step-wise linear regression, the best predictor of the quantitative trait was not usually the single functional mutation. Rather it was a SNP that was in high linkage disequilibrium with the functional mutation. Next, I compared three strategies for selecting SNPs for application to phenotype association studies: based on measures of linkage disequilibrium, based on a measure of haplotype diversity, and random selection. The results demonstrate that SNPs selected based on maximum haplotype diversity are more informative and yield higher power than randomly selected SNPs or SNPs selected based on low pair-wise LD. The data also indicate that for genes with small contribution to the phenotype, it is more prudent for investigators to increase their sample size than to continuously increase the number of SNPs in order to improve statistical power. When typing large numbers of SNPs, researchers are faced with the challenge of utilizing an appropriate statistical method that controls the type I error rate while maintaining adequate power.
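To make the two linkage disequilibrium measures compared above concrete, the following Python sketch computes D, D′, and the squared correlation measure for a pair of biallelic loci, using the standard textbook definitions. The function name `ld_measures` and the example frequencies are hypothetical. The dissertation's own computations are not shown in the abstract.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    # D' rescales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the allelic states at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared


# A rare allele that always sits on the B background (illustrative numbers).
d, d_prime, r_squared = ld_measures(p_ab=0.1, p_a=0.1, p_b=0.5)
print(d_prime, r_squared)
```

With these illustrative frequencies D′ is at its maximum while the squared correlation stays low, which shows how D′ can signal complete LD even when one SNP is a weak proxy for the other.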
We show that an empirical genotype-based multi-locus global test that uses permutation testing to investigate the null distribution of the maximum test statistic maintains a desired overall type I error rate while not overly sacrificing statistical power. The results also show that when the penetrance model is simple, the multi-locus global test performs as well as or better than the haplotype analysis. However, for more complex models, haplotype analyses offer advantages. The results of this dissertation will be of utility to human geneticists designing large-scale multi-locus genotype-phenotype association studies.
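The general idea of a permutation-based global test on the maximum statistic can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the per-SNP statistic (a squared Pearson correlation between genotype code and phenotype), the function names, and the toy data are all hypothetical choices. The phenotype is shuffled relative to the genotypes, the maximum statistic across SNPs is recomputed each time, and the global p-value is the share of permutations whose maximum is at least as large as the observed one.

```python
import random


def squared_correlation(x, y):
    """Squared Pearson correlation; 0.0 if either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def max_statistic(snp_columns, phenotype):
    """Largest single-SNP statistic across all SNPs."""
    return max(squared_correlation(column, phenotype) for column in snp_columns)


def global_permutation_test(snp_columns, phenotype, n_perm=1000, seed=0):
    """Return (observed maximum statistic, permutation p-value)."""
    rng = random.Random(seed)
    observed = max_statistic(snp_columns, phenotype)
    shuffled = list(phenotype)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype link
        if max_statistic(snp_columns, shuffled) >= observed:
            exceed += 1
    # Adding 1 to numerator and denominator keeps the p-value above zero.
    return observed, (exceed + 1) / (n_perm + 1)


# Toy data: genotype codes (0, 1, 2) for two SNPs in twelve individuals.
phenotype = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
snps = [
    [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],  # tracks the phenotype exactly
    [0, 0, 1, 1, 2, 2, 0, 1, 0, 2, 1, 0],  # arbitrary second SNP
]
observed, p_value = global_permutation_test(snps, phenotype, n_perm=200)
print(observed, p_value)
```

Because the null distribution is built from the maximum over all SNPs, a single comparison against it accounts for the number of SNPs tested without a separate multiple-testing correction.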