10 results for Whole genome mapping
in DigitalCommons@The Texas Medical Center
Abstract:
The genomes of Fusobacterium nucleatum subspecies polymorphum strain ATCC 10953, Rickettsia typhi strain Wilmington, and Francisella tularensis subspecies holarctica strain OSU18 were sequenced, annotated, and analyzed. Each genome was then compared to the sequenced genomes of closely related bacteria. The genome of F. nucleatum ATCC 10953 was compared to two additional F. nucleatum subspecies, subspecies nucleatum and subspecies vincentii. This analysis revealed substantial evidence of horizontal gene transfer along with considerable genetic diversity within the species of F. nucleatum. R. typhi was compared to R. prowazekii and R. conorii. This analysis uncovered a hotspot for chromosomal rearrangements in the Spotted Fever Group but not the Typhus Group Rickettsia and revealed the close genetic relationship between the Typhus Group rickettsial species. F. tularensis OSU18 was compared to two additional F. tularensis strains. These comparisons uncovered significant chromosomal rearrangements between F. tularensis subspecies due to recombination between insertion sequence elements.
Abstract:
To identify genetic susceptibility loci for severe diabetic retinopathy, 286 Mexican-Americans with type 2 diabetes from Starr County, Texas completed detailed physical and ophthalmologic examinations including fundus photography for diabetic retinopathy grading. 103 individuals with moderate-to-severe non-proliferative diabetic retinopathy or proliferative diabetic retinopathy were defined as cases for this study. DNA samples extracted from study subjects were genotyped using the Affymetrix GeneChip® Human Mapping 100K Set, which includes 116,204 single nucleotide polymorphisms (SNPs) across the whole genome. Single-marker allelic tests and 2- to 8-SNP sliding-window Haplotype Trend Regression implemented in HelixTree™ were first performed with these direct genotypes to identify genes/regions contributing to the risk of severe diabetic retinopathy. An additional 1,885,781 HapMap Phase II SNPs were imputed from the direct genotypes to expand the genomic coverage for a more detailed exploration of genetic susceptibility to diabetic retinopathy. The average estimated allelic dosage and imputed genotypes with the highest posterior probabilities were subsequently analyzed for associations using logistic regression and Fisher's exact allelic tests, respectively. To move beyond these SNP-based approaches, 104,572 directly genotyped and 333,375 well-imputed SNPs were used to construct genetic distance matrices based on 262 retinopathy candidate genes and their 112 related biological pathways. Multivariate distance matrix regression was then used to test hypotheses with genes and pathways as the units of inference in the context of susceptibility to diabetic retinopathy. This study provides a framework for genome-wide association analyses and implicates several genes involved in the regulation of oxidative stress, inflammatory processes, histidine metabolism, and pancreatic cancer pathways associated with severe diabetic retinopathy. 
Many of these loci have not previously been implicated in either diabetic retinopathy or diabetes. In summary, CDC73, IL12RB2, and SULF1 had the best evidence as candidates to influence diabetic retinopathy, possibly through novel biological mechanisms related to the VEGF-mediated signaling pathway or inflammatory processes. While this study uncovered some genes for diabetic retinopathy, a comprehensive picture of the genetic architecture of diabetic retinopathy has not yet been achieved. Once fully understood, the genetics and biology of diabetic retinopathy will contribute to better strategies for diagnosis, treatment, and prevention of this disease.
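The single-marker allelic test mentioned in this abstract can be illustrated with a 1-degree-of-freedom Pearson chi-square test on a 2×2 table of allele counts (cases vs. controls, allele A vs. allele B). This is only a minimal sketch of the general technique, not the HelixTree implementation; the function name `allelic_chi_square` and the allele counts are hypothetical and are not taken from the Starr County data.

```python
import math


def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Rows are cases/controls, columns are the two alleles of one SNP.
    Returns (chi2, p_value). Hypothetical helper for illustration only.
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case = case_a + case_b
    row_ctrl = ctrl_a + ctrl_b
    col_a = case_a + ctrl_a
    col_b = case_b + ctrl_b
    if min(row_case, row_ctrl, col_a, col_b) == 0:
        raise ValueError("every row and column total must be positive")
    # Shortcut formula for a 2x2 table: n * (ad - bc)^2 / (product of margins)
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (
        row_case * row_ctrl * col_a * col_b
    )
    # For 1 df, the upper-tail probability equals erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up counts: allele A is more common among case chromosomes
chi2, p = allelic_chi_square(case_a=60, case_b=40, ctrl_a=40, ctrl_b=60)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

With small expected counts, an exact test such as Fisher's (which the abstract uses for imputed genotypes) would be the safer choice than this asymptotic approximation.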
Abstract:
Linkage and association studies are major analytical tools to search for susceptibility genes for complex diseases. With the availability of large collections of single nucleotide polymorphisms (SNPs) and rapid progress in high-throughput genotyping technologies, together with the ambitious goals of the International HapMap Project, genetic markers covering the whole genome will be available for genome-wide linkage and association studies. In order not to inflate the type I error rate in genome-wide linkage and association studies, the significance level for each independent linkage and/or association test must be adjusted for multiple testing, and this has led to the suggestion of a genome-wide significance cut-off as low as 5 × 10⁻⁷. Almost no linkage and/or association study can meet such a stringent threshold with the standard statistical methods. New statistics with high power are urgently needed to tackle this problem. This dissertation proposes and explores a class of novel test statistics that can be used in both population-based and family-based genetic data by employing a completely new strategy, which uses nonlinear transformation of the sample means to construct test statistics for linkage and association studies. Extensive simulation studies are used to illustrate the properties of the nonlinear test statistics. Power calculations are performed using both analytical and empirical methods. Finally, real data sets are analyzed with the nonlinear test statistics. Results show that the nonlinear test statistics have correct type I error rates, and most of the studied nonlinear test statistics have higher power than the standard chi-square test. This dissertation introduces a new idea for designing novel test statistics with high power and might open new ways to map susceptibility genes for complex diseases.
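The multiple-testing adjustment described in this abstract can be illustrated with a Bonferroni-style calculation. The abstract does not say how the 5 × 10⁻⁷ cut-off was derived; the sketch below only shows, as an assumption, that a family-wise level of 0.05 spread over 100,000 tests gives the same number. The function names are hypothetical, and the family-wise error formula assumes independent tests.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise level at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def familywise_error_rate(alpha_per_test: float, n_tests: int) -> float:
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha_per_test) ** n_tests


# Illustrative numbers only: 0.05 family-wise level, 100,000 markers
per_test = bonferroni_threshold(0.05, 100_000)
print(f"per-test threshold: {per_test:.1e}")

# Without adjustment, testing 100 independent markers at 0.05 each almost
# guarantees at least one false positive
print(f"unadjusted FWER for 100 tests: {familywise_error_rate(0.05, 100):.3f}")
```

The stringency of such thresholds is what motivates the search for more powerful test statistics in the dissertation.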
Abstract:
In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences of genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously raises a multiple-testing problem and will give false-positive results. Although this problem can be effectively dealt with through several approaches such as Bonferroni correction, permutation testing, and false discovery rates, patterns of the joint effects of several genes, each with a weak effect, might not be detectable. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset among big data sets where the number of feature SNPs far exceeds the number of observations.
In this study, we take two steps to achieve the goal. First, we selected 1000 SNPs through an effective filter method, and then we performed a feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck method, wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with the classical linear discriminant analysis in terms of classification performance. Finally, we performed a chi-square test to look at the relationship between each SNP and disease from another point of view. 
In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that an exhaustive search of small subsets of one, two, or three SNPs, based on the best 100 composite 2-SNPs, can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm does not always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent the nesting effect of forward selection, it does not always outperform the latter, owing to overfitting from observing more complex subset states.
Our results also indicate that HMSS, as a criterion to evaluate the classification ability of a function, can be used with imbalanced data without modifying the original dataset, in contrast to classification accuracy. Our four studies suggest that the Sequential Information Bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome, and that its ability to detect the target status is superior to that of traditional LDA in the study.
From our results we can see that the best test probability-HMSS for predicting CVD, stroke, CAD, and psoriasis through sIB is 0.59406, 0.641815, 0.645315, and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918, and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701, and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4. 
A further genome-wide association study through the chi-square test shows that no significant SNPs are detected at the cut-off level 9.09451E-08 in the Framingham heart study of CVD. Study results in WTCCC can only detect two significant SNPs that are associated with CAD. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through the chi-square test at the cut-off value 1.11E-07.
Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results (95% confidence intervals or statistical tests of differences) require more cost-effective methods or a more efficient computing system, neither of which can currently be achieved in our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability, and those SNPs with good discriminant power are not necessarily causal markers for the disease.
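The harmonic mean of sensitivity and specificity (HMSS) used as a criterion in this abstract can be sketched from confusion-matrix counts. This is a minimal illustration of the standard harmonic-mean formula, 2·Se·Sp/(Se + Sp), and is not the authors' code; the function names and the counts are hypothetical. The second example shows why the abstract prefers HMSS to plain accuracy on imbalanced data.

```python
def hmss(tp: int, fn: int, tn: int, fp: int) -> float:
    """Harmonic mean of sensitivity and specificity from confusion counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    sensitivity = tp / (tp + fn)  # fraction of cases classified as cases
    specificity = tn / (tn + fp)  # fraction of controls classified as controls
    if sensitivity + specificity == 0:
        return 0.0
    return 2 * sensitivity * specificity / (sensitivity + specificity)


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Plain classification accuracy, for comparison."""
    return (tp + tn) / (tp + fn + tn + fp)


# Made-up balanced example: sensitivity 0.4, specificity 0.9
print(round(hmss(tp=40, fn=60, tn=90, fp=10), 4))

# Made-up imbalanced example: a classifier that calls everyone a control
# looks excellent by accuracy but scores zero on HMSS
print(accuracy(tp=0, fn=10, tn=990, fp=0), hmss(tp=0, fn=10, tn=990, fp=0))
```

Because the harmonic mean is dominated by the smaller of its two inputs, a classifier cannot score well on HMSS by favoring the majority class.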
Abstract:
Human x rodent somatic cell hybrids have played an important role in human genetics research. They have been especially useful for assigning genes to chromosomes and isolating DNA markers from specific regions of the human genome.
By employing a combination of somatic cell genetic, recombinant DNA, and cytogenetic techniques, human DNA excision repair gene ERCC4 was mapped regionally to human 16p13.13-13.2, even though the gene has not been cloned. Human x Chinese hamster ovary (CHO) cell hybrids selected for human ERCC4 activity and containing 16p13.1-p13.3 as the only human genetic material were identified. These hybrids were used to order DNA markers located in 16p13.1-p13.3. New DNA markers physically close to ERCC4 were isolated from such hybrids. Using amplified human DNA from the hybrids as probe in fluorescent in situ hybridization, the short arm breakpoint in the chromosome 16 inversion associated with acute myelomonocytic leukemia (AMML) was found to be physically close to the ERCC4 gene. The physical mapping and eventually, the cloning of the ERCC4 gene, will benefit the understanding of the DNA repair system and the study of other important biomedical problems such as tumorigenesis.
To facilitate the cloning of ERCC4 gene and, in general, the cloning of genes from any defined regions of the human genome, a method was developed for the direct isolation of human transcribed genes from somatic cell hybrids. cDNA was prepared from human x rodent hybrid by using consensus 5′ splice site sequences as primers. These primers were designed to select immature, unspliced messenger RNA (still retaining species specific repeat sequences) as templates. Screening of a derived cDNA library for human repeat sequences resulted in the isolation of human clones at the anticipated frequency with characteristics expected of exons of transcribed human genes. 
The usefulness of the splice site specific primers was analyzed and the cDNA synthesis conditions with these primers were optimized. The procedure was shown to be sensitive enough to clone weakly expressed genes. Studying the expression of the represented genes with the isolated clones was shown to be feasible. Such regional specific human gene fragments will be very valuable for many human genetic studies such as the search for inherited disease genes and the construction of a cDNA map of the human genome.
Abstract:
Attention has recently been drawn to Enterococcus faecium because of an increasing number of nosocomial infections caused by this species and its resistance to multiple antibacterial agents. However, relatively little is known about the pathogenic determinants of this organism. We have previously identified a cell-wall-anchored collagen adhesin, Acm, produced by some isolates of E. faecium, and a secreted antigen, SagA, exhibiting broad-spectrum binding to extracellular matrix proteins. Here, we analysed the draft genome of strain TX0016 for potential microbial surface components recognizing adhesive matrix molecules (MSCRAMMs). Genome-based bioinformatics identified 22 predicted cell-wall-anchored E. faecium surface proteins (Fms), of which 15 (including Acm) had characteristics typical of MSCRAMMs, including predicted folding into a modular architecture with multiple immunoglobulin-like domains. Functional characterization of one [Fms10; redesignated second collagen adhesin of E. faecium (Scm)] revealed that recombinant Scm(65) (A- and B-domains) and Scm(36) (A-domain) bound to collagen type V efficiently in a concentration-dependent manner, bound considerably less to collagen type I and fibrinogen, and differed from Acm in their binding specificities to collagen types IV and V. Results from far-UV circular dichroism measurements of recombinant Scm(36) and of Acm(37) indicated that these proteins were rich in beta-sheets, supporting our folding predictions. Whole-cell ELISA and FACS analyses unambiguously demonstrated surface expression of Scm in most E. faecium isolates. Strikingly, 11 of the 15 predicted MSCRAMMs clustered in four loci, each with a class C sortase gene; nine of these showed similarity to Enterococcus faecalis Ebp pilus subunits and also contained motifs essential for pilus assembly. 
Antibodies against one of the predicted major pilus proteins, Fms9 (redesignated EbpC(fm)), detected a 'ladder' pattern of high-molecular-mass protein bands in a Western blot analysis of cell surface extracts from E. faecium, suggesting that EbpC(fm) is polymerized into a pilus structure. Further analysis of the transcripts of the corresponding gene cluster indicated that fms1 (ebpA(fm)), fms5 (ebpB(fm)) and ebpC(fm) are co-transcribed, a result consistent with those for pilus-encoding gene clusters of other Gram-positive bacteria. All 15 genes occurred frequently in 30 clinically derived diverse E. faecium isolates tested. The common occurrence of MSCRAMM- and pilus-encoding genes and the presence of a second collagen-binding protein may have important implications for our understanding of this emerging pathogen.
Abstract:
OBJECTIVE: To identify systemic sclerosis (SSc) susceptibility loci via a genome-wide association study. METHODS: A genome-wide association study was performed in 137 patients with SSc and 564 controls from Korea using the Affymetrix Human SNP Array 5.0. After fine-mapping studies, the results were replicated in 1,107 SSc patients and 2,747 controls from a US Caucasian population. RESULTS: The single-nucleotide polymorphisms (SNPs) (rs3128930, rs7763822, rs7764491, rs3117230, and rs3128965) of HLA-DPB1 and DPB2 on chromosome 6 formed a distinctive peak with log P values for association with SSc susceptibility (P = 8.16 × 10⁻¹³). Subtyping analysis of HLA-DPB1 showed that DPB1*1301 (P = 7.61 × 10⁻⁸) and DPB1*0901 (P = 2.55 × 10⁻⁵) were the subtypes most susceptible to SSc in Korean subjects. In US Caucasians, 2 pairs of SNPs, rs7763822/rs7764491 and rs3117230/rs3128965, showed strong association with SSc patients who had either circulating anti-DNA topoisomerase I (P = 7.58 × 10⁻¹⁷/4.84 × 10⁻¹⁶) or anticentromere autoantibodies (P = 1.12 × 10⁻³/3.2 × 10⁻⁵), respectively. CONCLUSION: The results of our genome-wide association study in Korean subjects indicate that the region of HLA-DPB1 and DPB2 contains the loci most susceptible to SSc in a Korean population. The confirmatory studies in US Caucasians indicate that specific SNPs of HLA-DPB1 and/or DPB2 are strongly associated with US Caucasian patients with SSc who are positive for anti-DNA topoisomerase I or anticentromere autoantibodies.
Abstract:
Radiotherapy involving the thoracic cavity and chemotherapy with the drug bleomycin are both dose limited by the development of pulmonary fibrosis. From evidence that there is variation in the population in susceptibility to pulmonary fibrosis, and animal data, it was hypothesized that individual variation in susceptibility to bleomycin-induced, or radiation-induced, pulmonary fibrosis is, in part, genetically controlled. In this thesis a three generation mouse genetic model of C57BL/6J (fibrosis prone) and C3Hf/Kam (fibrosis resistant) mouse strains and F1 and F2 (F1 intercross) progeny derived from the parental strains was developed to investigate the genetic basis of susceptibility to fibrosis. In the bleomycin studies the mice received 100 mg/kg (125 for females) of bleomycin, via mini osmotic pump. The animals were sacrificed at eight weeks following treatment or when their breathing rate indicated respiratory distress. In the radiation studies the mice were given a single dose of 14 or 16 Gy (⁶⁰Co) to the whole thorax and were sacrificed when moribund. The phenotype was defined as the percent of fibrosis area in the left lung as quantified with image analysis of histological sections. Quantitative trait loci (QTL) mapping was used to identify the chromosomal location of genes which contribute to susceptibility to bleomycin-induced pulmonary fibrosis in C57BL/6J mice compared to C3Hf/Kam mice and to determine if the QTLs which influence susceptibility to bleomycin-induced lung fibrosis in these progenitor strains could be implicated in susceptibility to radiation-induced lung fibrosis. For bleomycin, a genome wide scan revealed QTLs on chromosome 17, at the MHC, (LOD = 11.7 for males and 7.2 for females) accounting for approximately 21% of the phenotypic variance, and on chromosome 11 (LOD = 4.9), in male mice only, adding 8% of phenotypic variance. 
The bleomycin QTL on chromosome 17 was also implicated in susceptibility to radiation-induced fibrosis (LOD = 5.0) and contributes 7% of the phenotypic variance in the radiation study. In conclusion, susceptibility to bleomycin-induced and to radiation-induced pulmonary fibrosis are both heritable traits, and both are influenced by a genetic factor that maps to a genomic region containing the MHC.
Abstract:
Prostate cancer remains the second leading cause of male cancer deaths in the United States, yet the molecular mechanisms underlying this disease remain largely unknown. Cytogenetic and molecular analyses of prostate tumors suggest a consistent association with the loss of chromosome 10. Previously, we have defined a novel tumor suppressor locus PAC-1 within chromosome 10pter-q11. Introduction of the short arm of chromosome 10 into a prostatic adenocarcinoma cell line PC-3H resulted in dramatic tumor suppression and restoration of a programmed cell death pathway. Using a combined approach of comparative genomic hybridization and microsatellite analysis of PC-3H, I have identified a region of hemizygosity within 10p12-p15. This region has been shown to be involved in frequent loss of heterozygosity in gliomas and melanoma. To functionally dissect the region within chromosome 10p containing PAC-1, we developed a strategy of serial microcell fusion, a technique that allows the transfer of defined fragments of chromosome 10p into PC-3H. Serial microcell fusion was used to transfer defined 10p fragments into a mouse A9 fibrosarcoma cell line. Once characterized by FISH and microsatellite analyses, the 10p fragments were subsequently transferred into PC-3H to generate a panel of microcell hybrid clones containing overlapping deletions of chromosome 10p. In vivo and microsatellite analyses of these PC hybrids identified a small chromosome 10p fragment (an estimated 31 Mb in size inclusive of the centromere) that when transferred into the PC-3H background, resulted in significant tumor suppression and limited a region of functional tumor suppressor activity to chromosome 10p12.31-q11. This region coincides with a region of LOH demonstrated in prostate cancer. These studies demonstrate the utility of this approach as a powerful tool to limit regions of functional tumor suppressor activity. 
Furthermore, these data, used in conjunction with data generated by the Human Genome Project, lend a focused approach to identifying candidate tumor suppressor genes involved in prostate cancer.
Abstract:
Thoracic aortic aneurysms leading to aortic dissections (TAAD) are a major cause of morbidity and mortality in the United States. TAAD is a complication of some known genetic disorders, such as Marfan syndrome and Turner syndrome, but the majority of familial cases are not due to a known genetic syndrome. Previous studies by our group have established that nonsyndromic, familial TAAD is inherited in an autosomal dominant manner with decreased penetrance and variable expression. Using one large family with multiple members with TAAD for the genome wide scan, a major locus for familial TAAD was mapped to 5q13–14 (TAAD1). Nine out of 15 families studied were linked to this locus, establishing that TAAD1 was a major locus, and that there was genetic heterogeneity for the condition. Mapping of the TAAD2 locus was accomplished using a single large family with multiple members with TAAD not linked to known loci of aneurysm formation. This established a second novel locus for familial TAAD on 3p24–25 (LOD score of 4.3), termed the TAAD2 locus. Two putative loci with suggestive LOD scores were mapped on 4q and 12q through a genome scan carried out using three families. The TAAD phenotype in 12 families did not segregate with known loci, indicating further genetic heterogeneity. An STS-tagged, BAC-based contig was constructed for the 7.8 Mb and 25 Mb critical intervals of TAAD1 and TAAD2, respectively, and characterized to identify the defective gene. The hypothesis was tested that the defective genes responsible for TAAD1 and TAAD2 encode extracellular matrix (ECM) proteins, the major components of the elastic fiber system in the aortic media. Four genes encoding ECM proteins (versican, thrombospondin-3, and CRTL1 at TAAD1, and FBLN2 at TAAD2) were sequenced, but no disease-causing mutations were identified. Studies to identify the defective gene have been initiated through the positional candidate gene approach, using a combination of bioinformatics and expression studies. 
The identification of the TAAD susceptibility genes will allow for presymptomatic diagnosis of individuals at risk for this life threatening disease. The identification of the molecular defects that contribute to TAAD will also further our understanding of the proteins that provide structural integrity to the aortic wall.