10 results for Genotyping by sequencing

in DigitalCommons@The Texas Medical Center


Relevance:

90.00%

Abstract:

Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Owing to the rapid development of genotyping and sequencing technologies, we can now assess the causal effects of many genetic and environmental factors more accurately. Genome-wide association studies have localized many causal genetic variants predisposing to certain diseases. However, these studies explain only a small portion of the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in genetics and genomics research and have demonstrated superiority over some standard approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods can fully exert their advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) deriving the Bayesian variable selection framework for hierarchical gene-environment and gene-gene interactions; (2) developing the Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods, which were developed for gene-environment interaction studies, to other related types of studies, such as adaptive borrowing of historical data.
We propose a Bayesian hierarchical mixture model framework that allows us to investigate genetic and environmental effects, gene-by-gene interactions (epistasis) and gene-by-environment interactions in the same model. It is well known that, in many practical situations, there is a natural hierarchical structure between the main effects and interactions in the linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, so that irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious and powerful models. We evaluate both the 'strong hierarchical' and 'weak hierarchical' models, which specify, respectively, that both or one of the main effects of the interacting factors must be present for the interaction to be included in the model. Extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach for identifying the predisposing main effects and interactions in studies with complex gene-environment and gene-gene interactions. We also compare these two models with the 'independent' model, which does not impose this hierarchical constraint, and observe their superior performance in most of the situations considered. The proposed models are applied to real data on gene-environment interactions from lung cancer and cutaneous melanoma case-control studies. Bayesian statistical models have the advantage of allowing useful prior information to be incorporated in the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in parameter estimation and variable selection in most cases.
Our proposed models impose the hierarchical constraints, which further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions and by successfully identifying the reported associations. This is practically appealing for studies that investigate causal factors among a moderate number of candidate genetic and environmental factors along with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects were previously developed to provide an analysis framework in which the estimates of effects for a quantitative trait are statistically orthogonal regardless of whether Hardy-Weinberg equilibrium (HWE) holds within loci. Ma et al. (2012) recently developed a NOIA model for gene-environment interaction studies and showed the advantages of using the model for detecting the true main effects and interactions, compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power to detect non-null effects, with higher marginal posterior probabilities. We also review two Bayesian statistical models (the Bayesian empirical shrinkage-type estimator and Bayesian model averaging) that were developed for gene-environment interaction studies. Inspired by these Bayesian models, we develop two novel statistical methods that can handle related problems such as borrowing data from historical studies. The proposed methods are analogous to the methods for gene-environment interactions in that they balance statistical efficiency and bias in a unified model.
Through extensive simulation studies, we compare the operating characteristics of the proposed models with those of existing models, including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow the historical data in a data-driven way. These novel models may have a broad range of statistical applications in both genetic/genomic and clinical studies.
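The strong and weak hierarchy constraints described in this abstract can be illustrated with a small sketch. It is not the dissertation's Bayesian mixture model; it only shows which pairwise interactions each rule would admit, given which main effects are currently in the model. The function name `allowed_interactions`, the factor labels, and the reading of "one of the main effects" as "at least one" are assumptions made for illustration.

```python
from itertools import combinations


def allowed_interactions(main_included, rule="strong"):
    """List the pairwise interactions that a hierarchy rule admits.

    main_included: dict mapping a factor name to True/False, i.e. whether
        that factor's main effect is currently in the model.
    rule: "strong"      -> both main effects must be present,
          "weak"        -> at least one main effect must be present
                           (an assumed reading of the abstract),
          "independent" -> no hierarchical constraint.
    """
    if rule not in ("strong", "weak", "independent"):
        raise ValueError(f"unknown rule: {rule!r}")

    eligible = []
    # Sort the names so the output order is deterministic.
    for a, b in combinations(sorted(main_included), 2):
        n_present = int(main_included[a]) + int(main_included[b])
        if rule == "strong":
            ok = n_present == 2
        elif rule == "weak":
            ok = n_present >= 1
        else:
            ok = True
        if ok:
            eligible.append((a, b))
    return eligible


if __name__ == "__main__":
    # Hypothetical state: only the main effect of gene G1 is in the model.
    state = {"G1": True, "G2": False, "E": False}
    for rule in ("strong", "weak", "independent"):
        print(rule, allowed_interactions(state, rule))
```

With only one main effect in the model, the strong rule admits no interactions, the weak rule admits those involving that factor, and the independent rule admits every pair, which is why the constrained models search a smaller interaction space.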

Relevance:

80.00%

Abstract:

The basis for the recent transition of Enterococcus faecium from a primarily commensal organism to one of the leading causes of hospital-acquired infections in the United States is not yet understood. To address this, the first part of my project assessed isolates from early outbreaks in the USA and South America using sequence analysis, colony hybridizations, and minimal inhibitory concentrations (MICs), which showed that clinical isolates possess virulence and antibiotic resistance determinants that are less abundant or lacking in community isolates. I also showed that the level of ampicillin resistance increased over time in clinical strains. By sequencing the pbp5 gene, I demonstrated an ~5% difference in the pbp5 gene between strains with MICs <4 µg/ml and those with MICs >4 µg/ml, but no specific sequence changes correlated with increases in MICs within the latter group. A 3-10% nucleotide difference was also seen in three other genes analyzed, which suggested the existence of two distinct subpopulations of E. faecium. This led to the second part of my project, which analyzed concatenated core gene sequences, SNPs, the 16S rRNA, and the phylogenetics of 21 E. faecium genomes and confirmed two distinct clades: a community-associated (CA) clade and a hospital-associated (HA) clade. Molecular clock calculations indicate that these two clades likely diverged ~300,000 to >1 million years ago, long before the modern antibiotic era. Genomic analysis also showed that, in addition to core genomic differences, HA E. faecium harbor specific accessory genetic elements that may confer selective advantages over CA E. faecium. The third part of my project discovered 6 E. faecium genes with the newly identified “WxL” domain. My analyses, using RT-PCR, western blots, patient sera, whole-cell ELISA, and immunogold electron microscopy, indicated that E. faecium WxL genes exist in operons and encode proteins localized to the bacterial cell surface, and that WxL proteins are antigenic in humans and are more exposed on the surface of clinical isolates than of community isolates (even though they are ubiquitous in both clades). ELISAs and BIAcore analyses also showed that proteins encoded by these operons bind to several different host extracellular matrix proteins, as well as to each other, suggesting a novel cell-surface complex. In summary, my studies provide new insights into the evolution of E. faecium by showing that there are two distantly related clades, one of which is more successful in the hospital setting. My studies also identified operons encoding WxL proteins whose characteristics could contribute to colonization and virulence within this species.

Relevance:

30.00%

Abstract:

BACKGROUND AND PURPOSE: Familial aggregation of intracranial aneurysms (IA) strongly suggests a genetic contribution to pathogenesis. However, genetic risk factors have yet to be defined. For families affected by aortic aneurysms, specific gene variants have been identified, many affecting the receptors to transforming growth factor-beta (TGF-beta). In recent work, we found that aortic and intracranial aneurysms may share a common genetic basis in some families. We hypothesized, therefore, that mutations in TGF-beta receptors might also play a role in IA pathogenesis. METHODS: To identify genetic variants in TGF-beta and its receptors, TGFB1, TGFBR1, TGFBR2, ACVR1, TGFBR3, and ENG were directly sequenced in 44 unrelated patients with familial IA. Novel variants were confirmed by restriction digestion analyses, and allele frequencies were analyzed in cases versus individuals without known intracranial disease. Similarly, allele frequencies of a subset of known SNPs in each gene were also analyzed for association with IA. RESULTS: No mutations were found in TGFB1, TGFBR1, TGFBR2, or ACVR1. Novel variants identified in ENG (p.A60E) and TGFBR3 (p.W112R) were not detected in at least 892 reference chromosomes. ENG p.A60E showed significant association with familial IA in case-control studies (P=0.0080). No association with IA could be found for any of the known polymorphisms tested. CONCLUSIONS: Mutations in TGF-beta receptor genes are not a major cause of IA. However, we identified rare variants in ENG and TGFBR3 that may be important for IA pathogenesis in a subset of families.

Relevance:

30.00%

Abstract:

Osteopontin (OPN) is a highly phosphorylated extracellular matrix protein localized in bone, kidney, placenta, T-lymphocytes, macrophages, smooth muscle of the vascular system, milk, urine, and plasma. In ROS 17/2.8 osteoblast-like osteosarcoma cells, 1,25-dihydroxyvitamin D3 [1,25(OH)2D3] regulates OPN at the transcriptional level, resulting in increased steady-state mRNA levels and increased production of OPN protein, maximal at 48 hours. Using ROS 17/2.8 cells as an osteoblast model, OPN was purified from culture medium after three-hour treatments with either vehicle (ethanol) or 1,25(OH)2D3 via barium citrate precipitation followed by immunoaffinity chromatography.
Here, further evidence of regulation of OPN by 1,25(OH)2D3 at the posttranslational level is presented. Prior to the up-regulation of OPN at the transcriptional level, 1,25(OH)2D3 induces a shift in OPN isoelectric point (pI) detected on two-dimensional gels from pI 4.6 to pI 5.1. Loading equal amounts of [32P]-labeled OPN recovered from ROS 17/2.8 cells exposed to 1,25(OH)2D3 or vehicle alone for three hours reveals that the shift from pI 4.6 to 5.1 is the result of reduced phosphorylation. Among structural analogs of 1,25(OH)2D3, analog AT [25-(OH)-16-ene-23-yne-D3], which triggers Ca2+ influx through voltage-sensitive Ca2+ channels but does not bind to the vitamin D receptor, mimicked the OPN pI shift, while analog BT [1,25(OH)2-22-ene-24-cyclopropyl-D3], which binds to the vitamin D receptor but does not allow Ca2+ influx, did not. Inclusion of the Ca2+ channel blocker nifedipine also blocks the charge shift conversion of OPN. Further analysis of the signaling pathway initiated by 1,25(OH)2D3 reveals that inhibition of the cyclic 3′,5′-adenosine monophosphate-dependent kinase, protein kinase A, or inhibition of the cyclic 3′,5′-guanosine monophosphate-dependent kinase, protein kinase G, also prevents the charge shift conversion.
Isolation of OPN from rat femurs and tibiae provides evidence for the existence of these two OPN charge forms in vivo, shown by differential migration on isoelectric focusing gels and sodium dodecyl sulfate-polyacrylamide gels. Peptide sequencing of rat long bone fractions revealed the presence of a presumed dentin-specific protein, dentin matrix protein-1 (DMP-1). Western blot analysis confirmed the existence of DMP-1 in these fractions.
Using the OPN charge forms in functional assays, it was determined that the charge forms have differential roles in both cell surface and mineralization functions. In cell attachment assays and Ca2+ influx assays using PC-3 prostate cancer cells, the pI 5.1 charge form of OPN was found to permit binding and increase intracellular Ca2+ concentrations of PC-3 cells. The increase in intracellular Ca2+ concentration was found to be integrin αvβ3-dependent. In mineralization assays, the pI 4.6 charge form of OPN promoted hydroxyapatite formation, while the pI 5.1 charge form had improved Ca2+ binding ability.
In conclusion, these findings suggest that 1,25(OH)2D3 not only regulates OPN at the transcriptional level but also plays a role in determining the OPN phosphorylation state. The latter involves a short-term (less than three hours) treatment and is associated with membrane-initiated Ca2+ influx. Functional assays utilizing the two OPN charge forms reveal that OPN function depends on its post-translational state.

Relevance:

30.00%

Abstract:

Microarray technology is a high-throughput method for genotyping and gene expression profiling. Limited sensitivity and specificity are among the essential problems of this technology. Most existing methods of microarray data analysis have an apparent limitation: they deal only with the numerical part of microarray data and make little use of gene sequence information. Because it is the gene sequences that precisely define the physical objects being measured by a microarray, it is natural to make the gene sequences an essential part of the data analysis. This dissertation focused on the development of free energy models to integrate sequence information in microarray data analysis. The models were used to characterize the mechanism of hybridization on microarrays and to enhance the sensitivity and specificity of microarray measurements.
Cross-hybridization is a major obstacle to the sensitivity and specificity of microarray measurements. In this dissertation, we evaluated the scope of the cross-hybridization problem on short-oligo microarrays. The results showed that cross-hybridization on arrays is mostly caused by oligo fragments with a run of 10 to 16 nucleotides complementary to the probes. Furthermore, a free-energy-based model was proposed to quantify the amount of cross-hybridization signal on each probe. This model treats cross-hybridization as an integral effect of the interactions between a probe and various off-target oligo fragments. Using public spike-in datasets, the model showed high accuracy in predicting the cross-hybridization signals on those probes whose intended targets are absent in the sample.
Several prospective models were proposed to improve the Position-Dependent Nearest-Neighbor (PDNN) model for better quantification of gene expression and cross-hybridization.
The problem addressed in this dissertation is fundamental to microarray technology. We expect that this study will help us to understand the detailed mechanism that determines sensitivity and specificity on microarrays. Consequently, this research will have a wide impact on how microarrays are designed and how the data are interpreted.
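The finding that cross-hybridization is driven mostly by runs of 10 to 16 complementary nucleotides suggests a simple screening step, sketched below. The code finds the longest stretch of an off-target fragment that is exactly reverse-complementary to some stretch of a probe. It is not the free-energy model from the dissertation; the function names, the exact-match criterion, and the default threshold of 10 (the lower end of the reported range) are assumptions for illustration.

```python
# Translation table mapping each DNA base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_complementary_run(probe, fragment):
    """Length of the longest contiguous run in `fragment` that can base-pair
    perfectly with a contiguous stretch of `probe`.

    A fragment pairs with a probe where it matches the probe's reverse
    complement, so this is a longest-common-substring computation between
    `fragment` and reverse_complement(probe).
    """
    target = reverse_complement(probe)
    best = 0
    # prev[j] = length of the common run ending at target[i-2], fragment[j-1]
    prev = [0] * (len(fragment) + 1)
    for i in range(1, len(target) + 1):
        curr = [0] * (len(fragment) + 1)
        for j in range(1, len(fragment) + 1):
            if target[i - 1] == fragment[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def may_cross_hybridize(probe, fragment, min_run=10):
    """Flag a fragment whose complementary run reaches `min_run` bases.

    The default of 10 is an assumed cutoff taken from the lower end of the
    10-16 nucleotide range reported in the abstract.
    """
    return longest_complementary_run(probe, fragment) >= min_run


if __name__ == "__main__":
    probe = "ACGTACGTAC"  # hypothetical probe sequence
    print(longest_complementary_run(probe, reverse_complement(probe)))
    print(may_cross_hybridize("AAAACCCC", "CCGGGGTTAA"))
```

A screen like this only identifies candidate off-target fragments; quantifying the signal they contribute would require a binding model such as the free-energy approach the abstract describes.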

Relevance:

30.00%

Abstract:

Next-generation DNA sequencing platforms can effectively detect the entire spectrum of genomic variation and are emerging as a major tool for systematic exploration of the universe of variants and interactions in the entire genome. However, the data produced by next-generation sequencing technologies suffer from three basic problems: sequence errors, assembly errors, and missing data. Current statistical methods for genetic analysis are well suited for detecting the association of common variants, but are less suitable for rare variants. This poses a great challenge for sequence-based genetic studies of complex diseases.
This dissertation used the genome continuum model as a general principle, and stochastic calculus and functional data analysis as tools, to develop novel and powerful statistical methods for the next generation of association studies of both qualitative and quantitative traits in the context of sequencing data, ultimately shifting the paradigm of association analysis from the current locus-by-locus analysis to the collective analysis of genome regions.
In this project, functional principal component (FPC) methods coupled with high-dimensional data reduction techniques are used to develop novel and powerful methods for testing the associations of the entire spectrum of genetic variation within a segment of the genome or a gene, regardless of whether the variants are common or rare.
Classical quantitative genetics methods suffer from high type I error rates and low power for rare variants. To overcome these limitations for resequencing data, this project used functional linear models with scalar response to develop statistics for identifying quantitative trait loci (QTLs) for both common and rare variants. To illustrate their applications, the functional linear models were applied to five quantitative traits in the Framingham Heart Study.
This project proposed a novel concept of gene-gene co-association, in which a gene or a genomic region is taken as the unit of association analysis, and used stochastic calculus to develop a unified framework for testing the association of multiple genes or genomic regions for both common and rare alleles. The proposed methods were applied to gene-gene co-association analysis of psoriasis in two independent GWAS datasets, which led to the discovery of networks significantly associated with psoriasis.

Relevance:

30.00%

Abstract:

A clone of the primary Eco R1 family of human DNA sequences has been used as an indicator sequence for detecting alterations induced by a toxic agent. Specific clones of this family have been examined and compared to the consensus sequence to determine the normal variability of this family. Though variations were observed, data indicated that such clones can be used to study induced DNA modifications. This DNA was exposed to the toxic agent dimethyl sulfate under various conditions and a distinct pattern of aberrations was shown to occur. It is suggested that this approach be used to characterize patterns of damage induced by various agents in the ultimate development of a system capable of monitoring human genotoxic exposure.

Relevance:

30.00%

Abstract:

Next-generation sequencing (NGS) technology has become a prominent tool in biological and biomedical research. However, NGS data analysis, such as de novo assembly, mapping and variant detection, is far from mature, and the high sequencing error rate is one of the major problems. To minimize the impact of sequencing errors, we developed a highly robust and efficient method, MTM, to correct the errors in NGS reads. We demonstrated the effectiveness of MTM on both single-cell data with highly non-uniform coverage and normal data with uniformly high coverage, indicating that MTM’s performance does not rely on the coverage of the sequencing reads. MTM was also compared with Hammer and Quake, the best methods for correcting non-uniform and uniform data respectively. For non-uniform data, MTM outperformed both Hammer and Quake. For uniform data, MTM showed better performance than Quake and comparable results to Hammer. With the better error correction provided by MTM, the quality of downstream analysis, such as mapping and SNP detection, was improved. SNP calling is a major application of NGS technologies. However, the existence of sequencing errors complicates this process, especially for the low coverage (

Relevance:

30.00%

Abstract:

My dissertation focuses on two aspects of RNA sequencing technology. The first is the methodology for modeling the overdispersion inherent in RNA-seq data for differential expression analysis. This aspect is addressed in three sections. The second aspect is the application of RNA-seq data to identify the CpG island methylator phenotype (CIMP) by integrating datasets of mRNA expression level and DNA methylation status. Section 1: The cost of DNA sequencing has fallen dramatically in the past decade. Consequently, genomic research increasingly depends on sequencing technology. However, it remains unclear how sequencing capacity influences the accuracy of mRNA expression measurement. We observe that accuracy improves as sequencing depth increases. To model the overdispersion, we use the beta-binomial distribution with a new parameter indicating the dependency between overdispersion and sequencing depth. Our modified beta-binomial model performs better than the binomial or the pure beta-binomial model, with a lower false discovery rate. Section 2: Although a number of methods have been proposed to accurately analyze differential RNA expression at the gene level, modeling at the base pair level is required. Here, we find that the overdispersion rate decreases as the sequencing depth increases at the base pair level. We also propose four models and compare them with each other. As expected, our beta-binomial model with a dynamic overdispersion rate is shown to be superior. Section 3: We investigate biases in RNA-seq by exploring the measurement of the external control, spike-in RNA. This study is based on two datasets with spike-in controls obtained from a recent study. We observe a previously undiscovered bias in the measurement of the spike-in transcripts that arises from the influence of the sample transcripts in RNA-seq. We also find that this influence is related to the local sequence of the random hexamer that is used in priming.
We suggest a model of the inequality between samples to correct this type of bias. Section 4: The expression of a gene can be turned off when its promoter is highly methylated. Several studies have reported that a clear threshold effect exists in gene silencing that is mediated by DNA methylation. It is reasonable to assume that the thresholds are specific to each gene. It is also of interest to investigate genes that are largely controlled by DNA methylation. These genes are called “L-shaped” genes. We develop a method to determine the DNA methylation threshold and identify a new CIMP of BRCA. In conclusion, we provide a detailed understanding of the relationship between the overdispersion rate and sequencing depth. We also reveal a new bias in RNA-seq and provide a detailed understanding of the relationship between this new bias and the local sequence. Finally, we develop a powerful method to dichotomize methylation status, and consequently we identify a new CIMP of breast cancer with a distinct classification of molecular characteristics and clinical features.
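The overdispersion modeling in Sections 1 and 2 of this abstract rests on the beta-binomial distribution, whose variance exceeds the binomial variance by a factor that grows with the number of trials. The sketch below computes that standard variance and pairs it with a purely hypothetical rule in which the overdispersion parameter shrinks as sequencing depth grows. The abstract does not give the functional form of the depth dependence, so `depth_dependent_rho`, `rho0`, and `decay` are illustrative assumptions and not the author's model.

```python
def binomial_variance(n, p):
    """Variance of a Binomial(n, p) count."""
    return n * p * (1.0 - p)


def beta_binomial_variance(n, p, rho):
    """Variance of a beta-binomial count with mean n*p.

    rho is the overdispersion (intra-class correlation) parameter,
    rho = 1 / (alpha + beta + 1) for the underlying Beta(alpha, beta);
    rho = 0 recovers the binomial variance.
    """
    return n * p * (1.0 - p) * (1.0 + (n - 1) * rho)


def beta_parameters(p, rho):
    """Convert (mean p, overdispersion rho) to Beta(alpha, beta) parameters."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    scale = (1.0 - rho) / rho  # equals alpha + beta
    return p * scale, (1.0 - p) * scale


def depth_dependent_rho(n, rho0, decay):
    """Hypothetical depth dependence: overdispersion shrinks as depth n grows.

    This form is an assumption chosen only so that rho decreases with n,
    matching the qualitative trend reported in the abstract.
    """
    return rho0 / (1.0 + decay * n)


if __name__ == "__main__":
    p = 0.5
    for depth in (10, 100, 1000):
        rho = depth_dependent_rho(depth, rho0=0.05, decay=0.01)
        ratio = beta_binomial_variance(depth, p, rho) / binomial_variance(depth, p)
        print(depth, round(rho, 5), round(ratio, 3))
```

Treating the overdispersion as a function of depth rather than a constant is what distinguishes the "dynamic" model described in the abstract from the pure beta-binomial; the printed ratio shows how much the variance is inflated relative to the binomial at each depth under the assumed rule.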

Relevance:

30.00%

Abstract:

Paracrine motogenic factors, including motility cytokines and extracellular matrix molecules secreted by normal cells, can stimulate metastatic cell invasion. For extracellular matrix molecules, both the intact molecules and the degradative products may exhibit these activities, which in some cases are not shared by the intact molecules. We found that human peritumoral and lung fibroblasts secrete motility-stimulating activity for several recently established human sarcoma cell strains. The motility of lung metastasis-derived human SYN-1 sarcoma cells was preferentially stimulated by human lung and peritumoral fibroblast motility-stimulating factors (FMSFs). FMSFs were nondialyzable, susceptible to trypsin, and sensitive to dithiothreitol. Cycloheximide inhibited accumulation of FMSF activity in conditioned medium; however, addition of cycloheximide to the migration assay did not significantly affect motility-stimulating activity. Purified hepatocyte growth factor/scatter factor (HGF/SF), rabbit anti-hHGF, and RT-PCR analysis of peritumoral and lung fibroblast HGF/SF mRNA expression indicated that FMSF activity was unrelated to HGF/SF. Partial purification of FMSF by gel exclusion chromatography revealed several peaks of activity, suggesting multiple FMSF molecules or complexes.
We purified the fibroblast motility-stimulating factor from human lung fibroblast-conditioned medium to apparent homogeneity by sequential heparin affinity chromatography and DEAE anion exchange chromatography. Lysylendopeptidase C digestion of FMSF and sequencing of peptides purified by reverse phase HPLC after digestion identified it as an N-terminal fragment of human fibronectin. Purified FMSF stimulated predominantly chemotaxis, but also chemokinesis, of SYN-1 sarcoma cells and was chemotactic for a variety of human sarcoma cells, including fibrosarcoma, leiomyosarcoma, liposarcoma, synovial sarcoma and neurofibrosarcoma cells. The motility-stimulating activity present in HLF-CM was completely eliminated by either neutralization or immunodepletion with a rabbit anti-human-fibronectin antibody, further confirming that the fibronectin fragment was the FMSF responsible for the motility stimulation of human soft tissue sarcoma cells. Since human soft tissue sarcomas have a distinctive hematogenous metastatic pattern (predominantly lung), FMSF may play a role in this process.