8 resultados para Bioinformatics Analysis
em DigitalCommons@The Texas Medical Center
Resumo:
Attention has recently been drawn to Enterococcus faecium because of an increasing number of nosocomial infections caused by this species and its resistance to multiple antibacterial agents. However, relatively little is known about the pathogenic determinants of this organism. We have previously identified a cell-wall-anchored collagen adhesin, Acm, produced by some isolates of E. faecium, and a secreted antigen, SagA, exhibiting broad-spectrum binding to extracellular matrix proteins. Here, we analysed the draft genome of strain TX0016 for potential microbial surface components recognizing adhesive matrix molecules (MSCRAMMs). Genome-based bioinformatics identified 22 predicted cell-wall-anchored E. faecium surface proteins (Fms), of which 15 (including Acm) had characteristics typical of MSCRAMMs, including predicted folding into a modular architecture with multiple immunoglobulin-like domains. Functional characterization of one [Fms10; redesignated second collagen adhesin of E. faecium (Scm)] revealed that recombinant Scm(65) (A- and B-domains) and Scm(36) (A-domain) bound to collagen type V efficiently in a concentration-dependent manner, bound considerably less to collagen type I and fibrinogen, and differed from Acm in their binding specificities to collagen types IV and V. Results from far-UV circular dichroism measurements of recombinant Scm(36) and of Acm(37) indicated that these proteins were rich in beta-sheets, supporting our folding predictions. Whole-cell ELISA and FACS analyses unambiguously demonstrated surface expression of Scm in most E. faecium isolates. Strikingly, 11 of the 15 predicted MSCRAMMs clustered in four loci, each with a class C sortase gene; nine of these showed similarity to Enterococcus faecalis Ebp pilus subunits and also contained motifs essential for pilus assembly. Antibodies against one of the predicted major pilus proteins, Fms9 (redesignated EbpC(fm)), detected a 'ladder' pattern of high-molecular-mass protein bands in a Western blot analysis of cell surface extracts from E. faecium, suggesting that EbpC(fm) is polymerized into a pilus structure. Further analysis of the transcripts of the corresponding gene cluster indicated that fms1 (ebpA(fm)), fms5 (ebpB(fm)) and ebpC(fm) are co-transcribed, a result consistent with those for pilus-encoding gene clusters of other Gram-positive bacteria. All 15 genes occurred frequently in 30 clinically derived diverse E. faecium isolates tested. The common occurrence of MSCRAMM- and pilus-encoding genes and the presence of a second collagen-binding protein may have important implications for our understanding of this emerging pathogen.
Resumo:
Rhodobacter sphaeroides 2.4.1 is a Gram negative facultative photoheterotrophic bacterium that has been shown to have an N-acyl homoserine lactone-based quorum sensing system called cer for c&barbelow;ommunity e&barbelow;scape r&barbelow;esponse. The cer ORFs are cerR, the transcriptional regulator, cerI, the autoinducer synthase and cerA , whose function is unknown. The autoinducer molecule, 7,8- cis-N-(tetradecenoyl) homoserine lactone, has been characterized. The objective of this study was to identify an environmental stimulus that influences the regulation of cerRAI and, to characterize transcription of the cer operon. ^ A cerR::lacZ transcriptional fusion was made and β-Galactosidase assays were performed in R. sphaeroides 2.4.1 strains, wild type, AP3 (CerI−) and AP4 (CerR−). The cerR::lacZ β-Galactosidase assays were used as an initial survey of the mode of regulation of the Cer system. A cerA::lacZ translational fusion was created and was used to show that cerA can be translated. The presence of 7,8-cis-N-(tetradecenoyl) homoserine lactone was detected from R. sphaeroides strains wild type and AP4 (CerR−) using a lasR::lacZ translational fusion autoinducer bioassay. The cerR::lacZ transcriptional fusion in R. sphaeroides 2.4.1 wild type was tested under different environmental stimuli, such as various carbon sources, oxygen tensions, light intensities and culture media to determine if they influence transcription of the cer ORFs. Although lacZ assay data implicated high light intensity at 100 W/m2 to stimulate cer transcription, quantitative Northern RNA data of the cerR transcript showed that low light intensity at 3 W/m2 is at least one environmental stimulus that induces cer transcription. This finding was supported by DNA microarray analysis. Northern analysis of the cerRAI transcript provided evidence that the cer ORFs are co-transcribed, and that the cer operon contains two additional genes. Bioinformatics was used to identify genes that may be regulated by the Cer system by identifying putative lux box homologue sequences in the presumed promoter region of these genes. Genes that were identified were fliQ, celB and calsymin, all implicated in interacting with plants. Primer extension was used to help localize cis-elements in the promoter region. The cerR::lacZ transcriptional fusion was monitored in a subset of different global DNA binding transcriptional regulator mutant strains of R. sphaeroides 2.4.1. Those regulators involved in maintaining an anaerobic photosynthetic lifestyle appeared to have an effect. Collectively, the data imply that R. sphaeroides 2.4.1 activates the Cer system when grown anaerobic photosynthetically at low light intensity, 3 W/m2, and it may be involved in an interaction with plants. ^
Resumo:
Microarray technology is a high-throughput method for genotyping and gene expression profiling. Limited sensitivity and specificity are one of the essential problems for this technology. Most of existing methods of microarray data analysis have an apparent limitation for they merely deal with the numerical part of microarray data and have made little use of gene sequence information. Because it's the gene sequences that precisely define the physical objects being measured by a microarray, it is natural to make the gene sequences an essential part of the data analysis. This dissertation focused on the development of free energy models to integrate sequence information in microarray data analysis. The models were used to characterize the mechanism of hybridization on microarrays and enhance sensitivity and specificity of microarray measurements. ^ Cross-hybridization is a major obstacle factor for the sensitivity and specificity of microarray measurements. In this dissertation, we evaluated the scope of cross-hybridization problem on short-oligo microarrays. The results showed that cross hybridization on arrays is mostly caused by oligo fragments with a run of 10 to 16 nucleotides complementary to the probes. Furthermore, a free-energy based model was proposed to quantify the amount of cross-hybridization signal on each probe. This model treats cross-hybridization as an integral effect of the interactions between a probe and various off-target oligo fragments. Using public spike-in datasets, the model showed high accuracy in predicting the cross-hybridization signals on those probes whose intended targets are absent in the sample. ^ Several prospective models were proposed to improve Positional Dependent Nearest-Neighbor (PDNN) model for better quantification of gene expression and cross-hybridization. ^ The problem addressed in this dissertation is fundamental to the microarray technology. We expect that this study will help us to understand the detailed mechanism that determines sensitivity and specificity on the microarrays. Consequently, this research will have a wide impact on how microarrays are designed and how the data are interpreted. ^
Resumo:
Numerous studies have been carried out to try to better understand the genetic predisposition for cardiovascular disease. Although it is widely believed that multifactorial diseases such as cardiovascular disease is the result from effects of many genes which working alone or interact with other genes, most genetic studies have been focused on identifying of cardiovascular disease susceptibility genes and usually ignore the effects of gene-gene interactions in the analysis. The current study applies a novel linkage disequilibrium based statistic for testing interactions between two linked loci using data from a genome-wide study of cardiovascular disease. A total of 53,394 single nucleotide polymorphisms (SNPs) are tested for pair-wise interactions, and 8,644 interactions are found to be significant with p-values less than 3.5×10-11. Results indicate that known cardiovascular disease susceptibility genes tend not to have many significantly interactions. One SNP in the CACNG1 (calcium channel, voltage-dependent, gamma subunit 1) gene and one SNP in the IL3RA (interleukin 3 receptor, alpha) gene are found to have the most significant pair-wise interactions. Findings from the current study should be replicated in other independent cohort to eliminate potential false positive results.^
Resumo:
Next-generation DNA sequencing platforms can effectively detect the entire spectrum of genomic variation and is emerging to be a major tool for systematic exploration of the universe of variants and interactions in the entire genome. However, the data produced by next-generation sequencing technologies will suffer from three basic problems: sequence errors, assembly errors, and missing data. Current statistical methods for genetic analysis are well suited for detecting the association of common variants, but are less suitable to rare variants. This raises great challenge for sequence-based genetic studies of complex diseases.^ This research dissertation utilized genome continuum model as a general principle, and stochastic calculus and functional data analysis as tools for developing novel and powerful statistical methods for next generation of association studies of both qualitative and quantitative traits in the context of sequencing data, which finally lead to shifting the paradigm of association analysis from the current locus-by-locus analysis to collectively analyzing genome regions.^ In this project, the functional principal component (FPC) methods coupled with high-dimensional data reduction techniques will be used to develop novel and powerful methods for testing the associations of the entire spectrum of genetic variation within a segment of genome or a gene regardless of whether the variants are common or rare.^ The classical quantitative genetics suffer from high type I error rates and low power for rare variants. To overcome these limitations for resequencing data, this project used functional linear models with scalar response to develop statistics for identifying quantitative trait loci (QTLs) for both common and rare variants. To illustrate their applications, the functional linear models were applied to five quantitative traits in Framingham heart studies. ^ This project proposed a novel concept of gene-gene co-association in which a gene or a genomic region is taken as a unit of association analysis and used stochastic calculus to develop a unified framework for testing the association of multiple genes or genomic regions for both common and rare alleles. The proposed methods were applied to gene-gene co-association analysis of psoriasis in two independent GWAS datasets which led to discovery of networks significantly associated with psoriasis.^
Resumo:
Systemic sclerosis (SSc) or Scleroderma is a complex disease and its etiopathogenesis remains unelucidated. Fibrosis in multiple organs is a key feature of SSc and studies have shown that transforming growth factor-β (TGF-β) pathway has a crucial role in fibrotic responses. For a complex disease such as SSc, expression quantitative trait loci (eQTL) analysis is a powerful tool for identifying genetic variations that affect expression of genes involved in this disease. In this study, a multilevel model is described to perform a multivariate eQTL for identifying genetic variation (SNPs) specifically associated with the expression of three members of TGF-β pathway, CTGF, SPARC and COL3A1. The uniqueness of this model is that all three genes were included in one model, rather than one gene being examined at a time. A protein might contribute to multiple pathways and this approach allows the identification of important genetic variations linked to multiple genes belonging to the same pathway. In this study, 29 SNPs were identified and 16 of them located in known genes. Exploring the roles of these genes in TGF-β regulation will help elucidate the etiology of SSc, which will in turn help to better manage this complex disease. ^
Resumo:
Next-generation sequencing (NGS) technology has become a prominent tool in biological and biomedical research. However, NGS data analysis, such as de novo assembly, mapping and variants detection is far from maturity, and the high sequencing error-rate is one of the major problems. . To minimize the impact of sequencing errors, we developed a highly robust and efficient method, MTM, to correct the errors in NGS reads. We demonstrated the effectiveness of MTM on both single-cell data with highly non-uniform coverage and normal data with uniformly high coverage, reflecting that MTM’s performance does not rely on the coverage of the sequencing reads. MTM was also compared with Hammer and Quake, the best methods for correcting non-uniform and uniform data respectively. For non-uniform data, MTM outperformed both Hammer and Quake. For uniform data, MTM showed better performance than Quake and comparable results to Hammer. By making better error correction with MTM, the quality of downstream analysis, such as mapping and SNP detection, was improved. SNP calling is a major application of NGS technologies. However, the existence of sequencing errors complicates this process, especially for the low coverage (
Resumo:
The genomic era brought by recent advances in the next-generation sequencing technology makes the genome-wide scans of natural selection a reality. Currently, almost all the statistical tests and analytical methods for identifying genes under selection was performed on the individual gene basis. Although these methods have the power of identifying gene subject to strong selection, they have limited power in discovering genes targeted by moderate or weak selection forces, which are crucial for understanding the molecular mechanisms of complex phenotypes and diseases. Recent availability and rapid completeness of many gene network and protein-protein interaction databases accompanying the genomic era open the avenues of exploring the possibility of enhancing the power of discovering genes under natural selection. The aim of the thesis is to explore and develop normal mixture model based methods for leveraging gene network information to enhance the power of natural selection target gene discovery. The results show that the developed statistical method, which combines the posterior log odds of the standard normal mixture model and the Guilt-By-Association score of the gene network in a naïve Bayes framework, has the power to discover moderate/weak selection gene which bridges the genes under strong selection and it helps our understanding the biology under complex diseases and related natural selection phenotypes.^