19 results for Genome-wide Search

in DigitalCommons@The Texas Medical Center


Relevance:

100.00%

Abstract:

In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences in genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously raises a multiple-testing problem and produces false-positive results. Although this problem can be dealt with effectively through approaches such as Bonferroni correction, permutation testing, and false discovery rates, these approaches may fail to reveal the joint effects of several genes that each have a weak effect. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset in large data sets where the number of feature SNPs far exceeds the number of observations.

In this study, we take two steps to achieve the goal. First, we selected 1000 SNPs through an effective filter method; then we performed feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck (sIB) method, wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis in terms of classification performance. Finally, we performed chi-square tests to examine the relationship between each SNP and disease from another point of view.

In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search of small subsets of one, two, or three SNPs based on the best 100 composite 2-SNPs can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm does not always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent the nesting effect of forward selection, it does not always outperform the latter, owing to overfitting from observing more complex subset states.

Our results also indicate that HMSS, as a criterion for evaluating the classification ability of a function, can be used on imbalanced data without modifying the original dataset, unlike classification accuracy. Our four studies suggest that the sequential information bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome, and that its ability to detect the target status is superior to that of traditional LDA in the study.

From our results we can see that the best test probability-HMSS for predicting CVD, stroke, CAD, and psoriasis through sIB is 0.59406, 0.641815, 0.645315, and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918, and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701, and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4.

A further genome-wide association study through chi-square test shows that no significant SNPs are detected at the cut-off level 9.09451E-08 in the Framingham heart study of CVD. Study results in WTCCC detect only two significant SNPs associated with CAD. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through chi-square test at the cut-off value 1.11E-07.

Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results (95% confidence intervals or statistical tests of differences) require more cost-effective methods or a more efficient computing system, neither of which is currently available for our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability; SNPs with good discriminant power are not necessarily causal markers for the disease.
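
The contrast between HMSS and plain classification accuracy on imbalanced data can be illustrated with a minimal Python sketch. Only the definition of HMSS as the harmonic mean of sensitivity and specificity comes from the abstract; the function names and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true cases that the classifier labels as cases."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn: int, fp: int) -> float:
    """Fraction of true controls that the classifier labels as controls."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def hmss(tp: int, fn: int, tn: int, fp: int) -> float:
    """Harmonic mean of sensitivity and specificity."""
    sens = sensitivity(tp, fn)
    spec = specificity(tn, fp)
    if sens + spec == 0:
        return 0.0
    return 2 * sens * spec / (sens + spec)


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Overall fraction of correctly classified samples."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical imbalanced data set: 10 cases and 90 controls,
# with a degenerate classifier that labels every sample a control.
tp, fn, tn, fp = 0, 10, 90, 0
print("accuracy:", accuracy(tp, fn, tn, fp))
print("HMSS:", hmss(tp, fn, tn, fp))
```

In this made-up example the degenerate classifier reaches an accuracy of 0.9 while its HMSS is 0, which is the behavior that makes HMSS usable on imbalanced data without resampling.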

Relevance:

100.00%

Abstract:

OBJECTIVE: To identify systemic sclerosis (SSc) susceptibility loci via a genome-wide association study. METHODS: A genome-wide association study was performed in 137 patients with SSc and 564 controls from Korea using the Affymetrix Human SNP Array 5.0. After fine-mapping studies, the results were replicated in 1,107 SSc patients and 2,747 controls from a US Caucasian population. RESULTS: The single-nucleotide polymorphisms (SNPs) (rs3128930, rs7763822, rs7764491, rs3117230, and rs3128965) of HLA-DPB1 and DPB2 on chromosome 6 formed a distinctive peak with log P values for association with SSc susceptibility (P=8.16x10(-13)). Subtyping analysis of HLA-DPB1 showed that DPB1*1301 (P=7.61x10(-8)) and DPB1*0901 (P=2.55x10(-5)) were the subtypes most susceptible to SSc in Korean subjects. In US Caucasians, 2 pairs of SNPs, rs7763822/rs7764491 and rs3117230/rs3128965, showed strong association with SSc patients who had either circulating anti-DNA topoisomerase I (P=7.58x10(-17)/4.84x10(-16)) or anticentromere autoantibodies (P=1.12x10(-3)/3.2x10(-5)), respectively. CONCLUSION: The results of our genome-wide association study in Korean subjects indicate that the region of HLA-DPB1 and DPB2 contains the loci most susceptible to SSc in a Korean population. The confirmatory studies in US Caucasians indicate that specific SNPs of HLA-DPB1 and/or DPB2 are strongly associated with US Caucasian patients with SSc who are positive for anti-DNA topoisomerase I or anticentromere autoantibodies.

Relevance:

100.00%

Abstract:

Apolipoprotein E (ApoE) plays a major role in the metabolism of high density and low density lipoproteins (HDL and LDL). Its common protein isoforms (E2, E3, E4) are risk factors for coronary artery disease (CAD) and explain between 16% and 23% of the inter-individual variation in plasma apoE levels. Linkage analysis has been completed for plasma apoE levels in the GENOA study (Genetic Epidemiology Network of Atherosclerosis). After stratification of the population by lipoprotein levels and body mass index (BMI) to create more homogeneity with regard to the biological context for apoE levels, Hispanic families showed significant linkage on chromosome 17q for two strata (LOD=2.93 at 104 cM for a low cholesterol group, LOD=3.04 at 111 cM for a low cholesterol, high HDLC group). Replication of 17q linkage was observed for apoB and apoE levels in the unstratified Hispanic and African-American populations, and for apoE levels in African-American families. Replication of this 17q linkage in different populations and strata provides strong support for the presence of gene(s) in this region with significant roles in the determination of inter-individual variation in plasma apoE levels. Through a positional and functional candidate gene approach, ten genes were identified in the 17q linked region, and 62 polymorphisms in these genes were genotyped in the GENOA families. Association analysis was performed with FBAT, GEE, and variance-component based tests, followed by conditional linkage analysis. Association studies with partial coverage of TagSNPs in the gene coding for apolipoprotein H (APOH) were performed, and significant results were found for 2 SNPs (APOH_20951 and APOH_05407) in the Hispanic low cholesterol stratum, accounting for 3.49% of the inter-individual variation in plasma apoE levels.
Among the other candidate genes, we identified a haplotype block in the ACE1 gene that contains two major haplotypes associated with apoE levels as well as total cholesterol, apoB and LDLC levels in the unstratified Hispanic population. Identifying genes responsible for the remaining 60% of inter-individual variation in plasma apoE level will yield new insights into the genetic interactions involved in lipid metabolism and a more precise understanding of the risk factors leading to CAD.

Relevance:

100.00%

Abstract:

To identify genetic susceptibility loci for severe diabetic retinopathy, 286 Mexican-Americans with type 2 diabetes from Starr County, Texas completed detailed physical and ophthalmologic examinations including fundus photography for diabetic retinopathy grading. A total of 103 individuals with moderate-to-severe non-proliferative diabetic retinopathy or proliferative diabetic retinopathy were defined as cases for this study. DNA samples extracted from study subjects were genotyped using the Affymetrix GeneChip® Human Mapping 100K Set, which includes 116,204 single nucleotide polymorphisms (SNPs) across the whole genome. Single-marker allelic tests and 2- to 8-SNP sliding-window Haplotype Trend Regression implemented in HelixTree™ were first performed with these direct genotypes to identify genes/regions contributing to the risk of severe diabetic retinopathy. An additional 1,885,781 HapMap Phase II SNPs were imputed from the direct genotypes to expand the genomic coverage for a more detailed exploration of genetic susceptibility to diabetic retinopathy. The average estimated allelic dosage and imputed genotypes with the highest posterior probabilities were subsequently analyzed for associations using logistic regression and Fisher's Exact allelic tests, respectively. To move beyond these SNP-based approaches, 104,572 directly genotyped and 333,375 well-imputed SNPs were used to construct genetic distance matrices based on 262 retinopathy candidate genes and their 112 related biological pathways. Multivariate distance matrix regression was then used to test hypotheses with genes and pathways as the units of inference in the context of susceptibility to diabetic retinopathy. This study provides a framework for genome-wide association analyses and implicates several genes involved in the regulation of oxidative stress, inflammatory processes, histidine metabolism, and pancreatic cancer pathways associated with severe diabetic retinopathy.
Many of these loci have not previously been implicated in either diabetic retinopathy or diabetes. In summary, CDC73, IL12RB2, and SULF1 had the best evidence as candidates to influence diabetic retinopathy, possibly through novel biological mechanisms related to the VEGF-mediated signaling pathway or inflammatory processes. While this study uncovered some genes for diabetic retinopathy, a comprehensive picture of the genetic architecture of diabetic retinopathy has not yet been achieved. Once fully understood, the genetics and biology of diabetic retinopathy will contribute to better strategies for diagnosis, treatment and prevention of this disease.

Relevance:

100.00%

Abstract:

Numerous studies have been carried out to better understand the genetic predisposition to cardiovascular disease. Although it is widely believed that multifactorial diseases such as cardiovascular disease result from the effects of many genes that work alone or interact with other genes, most genetic studies have focused on identifying cardiovascular disease susceptibility genes and usually ignore the effects of gene-gene interactions in the analysis. The current study applies a novel linkage disequilibrium based statistic for testing interactions between two linked loci using data from a genome-wide study of cardiovascular disease. A total of 53,394 single nucleotide polymorphisms (SNPs) are tested for pair-wise interactions, and 8,644 interactions are found to be significant with p-values less than 3.5×10⁻¹¹. Results indicate that known cardiovascular disease susceptibility genes tend not to have many significant interactions. One SNP in the CACNG1 (calcium channel, voltage-dependent, gamma subunit 1) gene and one SNP in the IL3RA (interleukin 3 receptor, alpha) gene are found to have the most significant pair-wise interactions. Findings from the current study should be replicated in other independent cohorts to eliminate potential false positive results.
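
The abstract does not say how its significance threshold was derived. One plausible reading, offered here only as an assumption, is a Bonferroni correction at a family-wise level of 0.05 over all unordered SNP pairs. A short Python sketch of that arithmetic (the variable names are illustrative):

```python
from math import comb

n_snps = 53_394          # SNP count reported in the abstract
alpha = 0.05             # assumed family-wise error rate (not stated in the abstract)

# Number of unordered SNP pairs, i.e. pair-wise interaction tests.
n_pairs = comb(n_snps, 2)

# Bonferroni-corrected per-test threshold under the assumptions above.
threshold = alpha / n_pairs

print(f"pairs tested: {n_pairs:,}")
print(f"per-test threshold: {threshold:.2e}")
```

Under these assumptions the per-test threshold rounds to the value quoted in the abstract, which suggests, but does not confirm, that the authors used this correction.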

Relevance:

100.00%

Abstract:

SNP genotyping arrays have been developed to characterize single-nucleotide polymorphisms (SNPs) and DNA copy number variations (CNVs). The quality of the inferences about copy number can be affected by many factors including batch effects, DNA sample preparation, signal processing, and analytical approach. Nonparametric and model-based statistical algorithms have been developed to detect CNVs from SNP genotyping data. However, these algorithms lack specificity to detect small CNVs due to the high false positive rate when calling CNVs based on the intensity values. Association tests based on detected CNVs therefore lack power even if the CNVs affecting disease risk are common. In this research, by combining an existing Hidden Markov Model (HMM) and the logistic regression model, a new genome-wide logistic regression algorithm was developed to detect CNV associations with diseases. We showed that the new algorithm is more sensitive and can be more powerful in detecting CNV associations with diseases than an existing popular algorithm, especially when the CNV association signal is weak and a limited number of SNPs are located in the CNV.

Relevance:

100.00%

Abstract:

Genome-wide association study (GWAS) analytical methods were applied in a large biracial sample of individuals to investigate variation across the genome for its association with a surrogate low-density lipoprotein (LDL) particle size phenotype, the ratio of LDL-cholesterol level to ApoB level. Genotyping was performed on the Affymetrix 6.0 GeneChip with approximately one million single nucleotide polymorphisms (SNPs). The ratio of LDL cholesterol to ApoB was calculated, and association tests used multivariable linear regression analysis with an additive genetic model after adjustment for the covariates sex, age and BMI. Association tests were performed separately in African Americans and Caucasians. There were 9,562 qualified individuals in the Caucasian group and 3,015 qualified individuals in the African American group. Overall, in Caucasians two statistically significant loci were identified as being associated with the ratio of LDL-cholesterol to ApoB: rs10488699 (p<5×10⁻⁸, 11q23.3 near BUD13) and rs964184 (p<5×10⁻⁸, 11q23.3 near ZNF259). We also found a suggestive association for rs12286037 (p<4×10⁻⁷, 11q23.3 near APOA5/A4/C3/A1) in the Caucasian sample. In exploratory analyses, a difference in the pattern of association between individuals taking and not taking LDL-cholesterol lowering medications was observed. Individuals who were not taking medications had smaller p-values than those taking medication. In the African-American group, there were no significant (p<5×10⁻⁸) or suggestive (p<4×10⁻⁷) associations with the ratio of LDL-cholesterol to ApoB after adjusting for age, BMI, and sex and comparing individuals with and without LDL-cholesterol lowering medication. Conclusions: There were significant and suggestive associations between SNP genotype and the ratio of LDL-cholesterol to ApoB in Caucasians, but these associations may be modified by medication treatment.

Relevance:

100.00%

Abstract:

Genome-wide association studies (GWAS) have rapidly become a standard method for disease gene discovery. Many recent GWAS indicate that for most disorders, only a few common variants are implicated and the associated SNPs explain only a small fraction of the genetic risk. The current study incorporated gene network information into gene-based analysis of GWAS data for Crohn's disease (CD). The purpose was to develop statistical models to boost the power of identifying disease-associated genes and gene subnetworks by maximizing the use of existing biological knowledge from multiple sources. The results revealed that a Markov random field (MRF) based mixture model incorporating direct neighborhood information from a single gene network is not efficient in identifying CD-related genes based on the GWAS data. The incorporation of solely direct neighborhood information might lead to the low efficiency of these models. Alternative MRF models that look beyond direct neighborhood information need to be developed in the future for the purpose of this study.

Relevance:

100.00%

Abstract:

Genome-wide association studies (GWAS) have successfully identified several genetic loci associated with inherited predisposition to primary biliary cirrhosis (PBC), the most common autoimmune disease of the liver. Pathway-based tests constitute a novel paradigm for GWAS analysis. By evaluating genetic variation across a biological pathway (gene set), these tests have the potential to determine the collective impact of variants with subtle effects that are individually too weak to be detected in traditional single variant GWAS analysis. To identify biological pathways associated with the risk of development of PBC, GWAS of PBC from Italy (449 cases and 940 controls) and Canada (530 cases and 398 controls) were independently analyzed. The linear combination test (LCT), a recently developed pathway-level statistical method, was used for this analysis. For additional validation, pathways that were replicated at the P<0.05 level of significance in both GWAS on LCT analysis were also tested for association with PBC in each dataset using two complementary GWAS pathway approaches. The complementary approaches included a modification of the gene set enrichment analysis algorithm (i-GSEA4GWAS) and Fisher's exact test for pathway enrichment ratios. Twenty-five pathways were associated with PBC risk on LCT analysis in the Italian dataset at P<0.05, of which eight had an FDR<0.25. The top pathway in the Italian dataset was the TNF/stress related signaling pathway (p=7.38×10⁻⁴, FDR=0.18). Twenty-six pathways were associated with PBC at the P<0.05 level using the LCT in the Canadian dataset, with the regulation and function of ChREBP in liver pathway (p=5.68×10⁻⁴, FDR=0.285) emerging as the most significant pathway. Two pathways, phosphatidylinositol signaling system (Italian: p=0.016, FDR=0.436; Canadian: p=0.034, FDR=0.693) and hedgehog signaling (Italian: p=0.044, FDR=0.636; Canadian: p=0.041, FDR=0.693), were replicated at LCT P<0.05 in both datasets.
Statistically significant association of both pathways with PBC genetic susceptibility was confirmed in the Italian dataset on i-GSEA4GWAS. Results for the phosphatidylinositol signaling system were also significant in both datasets on applying Fisher's exact test for pathway enrichment ratios. This study identified a combination of known and novel pathway-level associations with PBC risk. If functionally validated, the findings may yield fresh insights into the etiology of this complex autoimmune disease with possible preventive and therapeutic application.
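
As an illustration of the Fisher's exact test for pathway enrichment mentioned above, the following Python sketch computes a one-sided enrichment p-value from the hypergeometric distribution. The abstract does not describe how its contingency tables were built, so the function name, its parameters, and the toy counts are all hypothetical.

```python
from math import comb


def fisher_enrichment_p(k_hits: int, pathway_size: int,
                        total_hits: int, total_genes: int) -> float:
    """One-sided Fisher's exact (hypergeometric) p-value.

    Probability of seeing at least `k_hits` associated genes in a pathway
    of `pathway_size` genes, when `total_hits` of the `total_genes`
    genes in the background are associated.
    """
    denom = comb(total_genes, pathway_size)
    upper = min(pathway_size, total_hits)
    tail = sum(
        comb(total_hits, i) * comb(total_genes - total_hits, pathway_size - i)
        for i in range(k_hits, upper + 1)
    )
    return tail / denom


# Toy example with made-up counts: 1,000 background genes, 50 of them
# associated, and a 20-gene pathway that contains 5 associated genes.
p_value = fisher_enrichment_p(k_hits=5, pathway_size=20,
                              total_hits=50, total_genes=1000)
print(f"enrichment p-value: {p_value:.3g}")
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow for small backgrounds like this one; a statistics library would be the usual choice for genome-scale tables.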

Relevance:

100.00%

Abstract:

The ventricular system is a critical component of the central nervous system (CNS) that is formed early in the developmental stages and remains functional through the lifetime. Changes in the ventricular system can be easily discerned via neuroimaging procedures, and most of the time they reflect changes in the physiology of the CNS. In this study we attempted to identify specific genes associated with variation in ventricular volume in humans. Methods. We conducted a genome wide association (GWA) analysis of the volume of the lateral ventricles among 1605 individuals of European ancestry from two community based cohorts, the Genetics of Microangiopathic Brain Injury (GMBI; N=814) and Atherosclerosis Risk in Communities (ARIC; N=791). Significant findings from the analysis were tested for replication in both cohorts and then meta-analyzed to get an estimate of overall significance. Results. In our GWA analyses, no single nucleotide polymorphism (SNP) reached a genome-wide significance of p<10⁻⁸. There were 25 SNPs in GMBI and 9 SNPs in ARIC that reached a threshold of p<10⁻⁵. However, none of the top SNPs from each cohort were replicated in the other. In the meta-analysis, no SNP reached the genome-wide threshold of 5×10⁻⁸, but we identified five novel SNPs associated with variation in ventricular volume at the p<10⁻⁵ level. The strongest association was for rs2112536 in an intergenic region on chromosome 5q33 (Pmeta=8.46×10⁻⁷). The remaining four SNPs were located on chromosome 3q23 encompassing the gene for Calsyntenin-2 (CLSTN2). The SNPs with strongest association in this region were rs17338555 (Pmeta=5.28×10⁻⁶), rs9812091 (Pmeta=5.89×10⁻⁶), rs9812283 (Pmeta=5.97×10⁻⁶) and rs9833213 (Pmeta=6.96×10⁻⁶). Conclusions. This GWA study of ventricular volumes in community-based cohorts of European descent identifies potential loci on chromosomes 3 and 5.
Further characterization of these loci may provide insights into the pathophysiology of ventricular involvement in various neurological diseases.

Relevance:

100.00%

Abstract:

Schizophrenia (SZ) is a complex disorder with high heritability and variable phenotypes, and efforts to find causal genes associated with its development have had limited success. Pathway-based analysis is an effective approach for investigating the molecular mechanisms of susceptibility genes associated with complex diseases. The etiology of complex diseases could involve a network of genetic factors, and interactions may occur among the genes. In this work we argue that some genes may have small effects that by themselves are neither sufficient nor necessary to cause the disease; however, their effects may induce slight changes in gene expression or affect protein function. Analyzing the gene-gene interaction mechanism within the disease pathway would therefore play a crucial role in dissecting the genetic architecture of complex diseases, making pathway-based analysis a complementary approach to the GWAS technique.

In this study, we implemented three novel linkage disequilibrium based statistics, the linear combination, the quadratic, and the decorrelation test statistics, to investigate the interaction between linked and unlinked genes in two independent case-control GWAS datasets for SZ including participants of European (EA) and African (AA) ancestries. The EA population included 1,173 cases and 1,378 controls with 729,454 genotyped SNPs, while the AA population included 219 cases and 288 controls with 845,814 genotyped SNPs. We identified 17,186 interacting gene-sets at a significant level in the EA dataset and 12,691 gene-sets in the AA dataset using the gene-gene interaction method. We also identified 18,846 genes in the EA dataset and 19,431 genes in the AA dataset that were in the disease pathways. However, few genes were reported to be significantly associated with SZ.

Our research determined the pathway characteristics for schizophrenia through the gene-gene interaction and gene-pathway based approaches.
Our findings suggest that our methods can yield useful inferences in studying the molecular mechanisms of common complex diseases.

Relevance:

100.00%

Abstract:

Embryonic stem cells (ESCs) possess two unique characteristics: infinite self-renewal and the potential to differentiate into almost every cell type (pluripotency). Recently, global expression analyses of metastatic breast and lung cancers revealed an ESC-like expression program or signature, specifically for cancers that are mutant for p53 function. Surprisingly, although p53 is widely recognized as the guardian of the genome, due to its roles in cell cycle checkpoints, programmed cell death or senescence, relatively little is known about p53 functions in normal cells, especially in ESCs. My hypothesis is that p53 has specific transcription regulatory functions in human ESCs (hESCs) that a) oppose pluripotency and b) protect the stem cell genome in response to DNA damage and stress signaling. In mouse ESCs, these roles are believed to coincide, as p53 promotes differentiation in response to DNA damage, but this is unexplored in hESCs. To determine the biological roles of p53, specifically in hESCs, we mapped genome-wide chromatin interactions of p53 by chromatin immunoprecipitation and massively parallel tag sequencing (ChIP-Seq), and did so under three different conditions of hESC status: pluripotency, differentiation-initiated and DNA-damage-induced. ChIP-Seq showed that p53 is enriched at distinct, induction-specific gene loci during each of these different conditions. Microarray gene expression analysis and functional annotation of the distinct p53-target genes revealed that p53 regulates specific genes encoding developmental regulators, which are expressed in differentiation-initiated but not DNA-damaged hESCs. We further discovered that, in response to differentiation signaling, p53 binds regions of chromatin that are repressed but also poised for rapid activation by core pluripotency factors OCT4 and NANOG in pluripotent hESCs.
In response to DNA damage, genes associated with migration and motility are targeted by p53, whereas the prime targets of p53 in control of cell death are conserved for p53 regulation in both differentiation and DNA damage. Our genome-wide profiling and bioinformatics analyses show that p53 occupies a special set of developmental regulatory genes during early differentiation of hESCs and functions in an induction-specific manner. In conclusion, our research unveiled previously unknown functions of p53 in ESC biology, which augments our understanding of one of the most deregulated proteins in human cancers.

Relevance:

100.00%

Abstract:

Pathway-based genome-wide association study evolved from pathway analysis for microarray gene expression and is under rapid development as a complement to single-SNP based genome-wide association study. However, it faces new challenges, such as the summarization of SNP statistics into pathway statistics. The current study applied the ridge regularized Kernel Sliced Inverse Regression (KSIR) to achieve dimension reduction and compared this method with two other widely used methods: the minimal-p-value (minP) approach, which assigns the best test statistic of all SNPs in each pathway as the statistic of the pathway, and the principal component analysis (PCA) method, which uses PCA to calculate the principal components of each pathway. Comparison of the three methods using simulated datasets consisting of 500 cases, 500 controls and 100 SNPs demonstrated that the KSIR method outperformed the other two methods in terms of causal pathway ranking and statistical power. The PCA method showed performance similar to that of the minP method. The KSIR method also performed better than the other two methods in analyzing a real dataset, the WTCCC Ulcerative Colitis dataset, consisting of 1762 cases and 3773 controls as the discovery cohort and 591 cases and 1639 controls as the replication cohort. Several immune and non-immune pathways relevant to ulcerative colitis were identified by these methods. Results from the current study provide a reference for further methodology development and identify novel pathways that may be of importance to the development of ulcerative colitis.
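
The minP summarization described above can be sketched in a few lines of Python. This is a minimal illustration of the idea only, not the study's implementation; the function name, SNP identifiers, pathway names, and p-values are all made up.

```python
from typing import Dict, List


def min_p_pathway_statistic(pathways: Dict[str, List[str]],
                            snp_p: Dict[str, float]) -> Dict[str, float]:
    """Assign each pathway the smallest p-value among its SNPs (minP)."""
    stats: Dict[str, float] = {}
    for name, snps in pathways.items():
        # Ignore SNPs that have no p-value (for example, failed QC).
        pvals = [snp_p[s] for s in snps if s in snp_p]
        if pvals:
            stats[name] = min(pvals)
    return stats


# Hypothetical single-SNP association p-values and SNP-to-pathway mapping.
snp_p = {"snp1": 0.20, "snp2": 0.003, "snp3": 0.04, "snp4": 0.5}
pathways = {
    "pathway_A": ["snp1", "snp2"],
    "pathway_B": ["snp3", "snp4"],
}

pathway_stats = min_p_pathway_statistic(pathways, snp_p)
# Rank pathways from most to least significant.
ranked = sorted(pathway_stats, key=pathway_stats.get)
print(pathway_stats)
print(ranked)
```

A raw minimum tends to favor pathways that contain more SNPs, so it is commonly calibrated, for example by permutation; the abstract does not say how this was handled.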

Relevance:

100.00%

Abstract:

The genomic era brought about by recent advances in next-generation sequencing technology makes genome-wide scans of natural selection a reality. Currently, almost all statistical tests and analytical methods for identifying genes under selection are performed on an individual-gene basis. Although these methods have the power to identify genes subject to strong selection, they have limited power to discover genes targeted by moderate or weak selection forces, which are crucial for understanding the molecular mechanisms of complex phenotypes and diseases. The recent availability and rapidly growing completeness of many gene network and protein-protein interaction databases accompanying the genomic era open avenues for enhancing the power to discover genes under natural selection. The aim of the thesis is to explore and develop normal mixture model based methods for leveraging gene network information to enhance the power of natural selection target gene discovery. The results show that the developed statistical method, which combines the posterior log odds of the standard normal mixture model and the Guilt-By-Association score of the gene network in a naïve Bayes framework, has the power to discover genes under moderate or weak selection that bridge the genes under strong selection, and that it helps our understanding of the biology underlying complex diseases and related natural selection phenotypes.

Relevance:

100.00%

Abstract:

The discovery of expanded simple repeated sequences causing or associated with human disease has led to a new area of research devoted to elucidating how the expanded repeat causes disease and how the repeat becomes unstable.

To study the genetic basis of the (CTG)n repeat instability in the DMPK gene in myotonic dystrophy (DM1) patients, somatic cell hybrids were constructed between the lymphocytes of DM1 patients and a variety of Chinese hamster ovary (CHO) cell DNA repair gene deficient mutants. By using small pool PCR (SP-PCR), the instability of the (CTG)n repeat can be quantitated for both the frequency and sizes of length change mutations.

Additional SP-PCR analysis on 2/11 subclones generated from this original hybrid showed a marked increase in large repeat deletions, ∼50%. A bimodal distribution of repeats was seen around the progenitor allele and at a large deleted product (within the normal range) with no intermediate products present.

To determine if the repair capacity of the CHO cell led to a mutator phenotype in the hamster and hybrid clones, SP-PCR was also done on 3 hamster microsatellites in a variety of hamster cell backgrounds. No variant alleles were seen in over 2500 genome equivalents screened.

Human-hamster hybrids have long been shown to be chromosomally unstable, yet information about the stability of repeated sequences was not known. To test if repeat instability was associated with either intact or non-intact human chromosomes, more than 300 microsatellite repeats on 13 human chromosomes (intact and non-intact) were analyzed in eight hybrid cells. No variants were seen between the hybrid and patient alleles in the hybrids.

To identify whether DM1 patients have a previously undetected level of genome wide instability or if the instability is truly locus specific, SP-PCR was done on 6 human microsatellites within the patient used to make the hybrid cells. No variants were seen in over 1000 genomes screened.

These studies show that the somatic cell hybrid approach is a genetically stable system that allows for the determination of factors that could lead to changes in microsatellite instability. They also show that there is something inherent in the DM1 expanded (CTG)n repeat that causes it alone to be targeted by an as yet unknown mechanism that makes the repeat unstable. (Abstract shortened by UMI.)