Digital Library

29 results for Bioinformatics

in DigitalCommons@The Texas Medical Center

SURVIVAL PREDICTION FOR BRAIN TUMOR PATIENTS USING GENE EXPRESSION DATA

Relevance: 10.00%

Abstract:

Brain tumor is one of the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of the patients surviving more than 5 years after disease diagnosis. Until recently, brain tumor prognosis has been based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. In order to evaluate this hypothesis, the general goal of this research is to build models for survival prediction of glioma patients using DNA molecular profiles (U133 Affymetrix gene expression microarrays) along with clinical information. First, a predictive Random Forest model is built for binary outcomes (i.e. short vs. long-term survival) and a small subset of genes whose expression values can be used to predict survival time is selected. Following, a new statistical methodology is developed for predicting time-to-death outcomes using Bayesian ensemble trees. Due to a large heterogeneity observed within prognostic classes obtained by the Random Forest model, prediction can be improved by relating time-to-death with gene expression profile directly. We propose a Bayesian ensemble model for survival prediction which is appropriate for high-dimensional data such as gene expression data. Our approach is based on the ensemble "sum-of-trees" model which is flexible to incorporate additive and interaction effects between genes. We specify a fully Bayesian hierarchical approach and illustrate our methodology for the CPH, Weibull, and AFT survival models. We overcome the lack of conjugacy using a latent variable formulation to model the covariate effects which decreases computation time for model fitting. 
Also, our proposed models provide a model-free way to select important predictive prognostic markers based on controlling false discovery rates. We compare the performance of our methods with baseline reference survival methods and apply our methodology to an unpublished data set of brain tumor survival times and gene expression data, selecting genes potentially related to the development of the disease under study. A closing discussion compares the results obtained by the Random Forest and Bayesian ensemble methods from biological and clinical perspectives and highlights the statistical advantages and disadvantages of the new methodology in the context of DNA microarray data analysis.

FUNCTIONS OF DEADENYLATION FACTORS IN MRNA DECAY AND MRNA PROCESSING BODY FORMATION

Relevance: 10.00%

Abstract:

Most newly synthesized messenger RNAs possess a 5’ cap and a 3’ poly(A) tail. The process of poly(A) tail shortening, also termed deadenylation, is important for post-transcriptional gene regulation, because deadenylation not only leads to mRNA translational inhibition but also is the first step of major mRNA degradation. Translationally inhibited mRNAs can be stored and/or degraded in dynamic cytoplasmic foci termed mRNA processing bodies, or P bodies, which are conserved in eukaryotes. To shed new light on the mechanisms of P body formation and P body functions, I focused on the link between deadenylation factors and P bodies. I found that the two major deadenylation complexes, Pan3-Pan2 and Ccr4-Caf1, can both be enriched in P bodies. The deadenylase activity of the Ccr4-Caf1 complex is prerequisite for P body formation. Pan3, but not the deadenylase Pan2, is essential for P body formation. While the C-terminal domain of Pan3 is important for interaction with Pan2, Pan3 N-terminal domain is important for Pan3 to form cytoplasmic foci colocalizing with P bodies and to promote mRNA decay. Interestingly, Pan3 N-terminal domain may be phosphorylated to regulate Pan3 localization and functions. Aside from the functions of the two deadenylation complexes in P bodies, I also studied all reported human P body proteins as a whole using bioinformatics. This effort not only has generated a comprehensive picture of the functions of and interactions among human P body proteins, but also has predicted proteins that may regulate P body formation and/or functions. In summary, my study has established a direct link between mRNA deadenylation and P body formation and has also led to new hypotheses to guide future research on how P body dynamics are controlled.

NETWORK TOPOLOGY IN HUMAN PROTEIN INTERACTION DATA PREDICTS FUNCTIONAL ASSOCIATION

Relevance: 10.00%

Abstract:

High-throughput assays, such as the yeast two-hybrid system, have generated a huge amount of protein-protein interaction (PPI) data in the past decade. This greatly increases the need for reliable methods to systematically and automatically suggest protein functions and the relationships between them. With the available PPI data, it is now possible to study the functions and relationships in the context of a large-scale network. To date, several network-based schemes have been provided to effectively annotate protein functions on a large scale. However, due to the noise inherent in high-throughput data generation, new methods and algorithms should be developed to increase the reliability of functional annotations. Previous work in a yeast PPI network (Samanta and Liang, 2003) has shown that the local connection topology, particularly for two proteins sharing an unusually large number of neighbors, can predict functional associations between proteins, and hence suggest their functions. One advantage of the work is that their algorithm is not sensitive to noise (false positives) in high-throughput PPI data. In this study, we improved their prediction scheme by developing a new algorithm and new methods, which we applied to a human PPI network to make a genome-wide functional inference. We used the new algorithm to measure and reduce the influence of hub proteins on detecting functionally associated proteins. We used the annotations of the Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) as independent and unbiased benchmarks to evaluate our algorithms and methods within the human PPI network. We showed that, compared with the previous work from Samanta and Liang, our algorithm and methods developed in this study improved the overall quality of functional inferences for human proteins. By applying the algorithms to the human PPI network, we obtained 4,233 significant functional associations among 1,754 proteins.
Further comparisons of their KEGG and GO annotations allowed us to assign 466 KEGG pathway annotations to 274 proteins and 123 GO annotations to 114 proteins, with estimated false discovery rates of <21% for KEGG and <30% for GO. We clustered 1,729 proteins by their functional associations and performed pathway analysis to identify several subclusters that are highly enriched in certain signaling pathways. In particular, we performed a detailed analysis of a subcluster enriched in the transforming growth factor β signaling pathway (P < 10⁻⁵⁰), which is important in cell proliferation and tumorigenesis. Analysis of another four subclusters also suggested potential new players in six signaling pathways worthy of further experimental investigation. Our study gives clear insight into the common neighbor-based prediction scheme and provides a reliable method for large-scale functional annotation in this post-genomic era.
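
The common-neighbor idea in this abstract can be illustrated with a small sketch. It scores a protein pair by how unlikely its number of shared neighbors would be if the neighbors were drawn at random, using a hypergeometric tail. This is one common way to formalize "an unusually large number of neighbors" and is an assumption here; it is not necessarily the exact statistic used by Samanta and Liang or by the study. The toy network, the protein names, and the function name `shared_neighbor_pvalue` are hypothetical.

```python
from math import comb


def shared_neighbor_pvalue(n_total, n1, n2, m):
    """Probability of sharing at least m neighbors by chance.

    n_total: number of candidate neighbor proteins
    n1, n2:  neighbor counts (degrees) of the two proteins
    m:       observed number of shared neighbors
    Assumes the n2 neighbors are drawn uniformly at random (hypergeometric).
    """
    upper = min(n1, n2)
    tail = sum(comb(n1, k) * comb(n_total - n1, n2 - k) for k in range(m, upper + 1))
    return tail / comb(n_total, n2)


# Toy undirected network (hypothetical): A and B share all three neighbors.
edges = [("A", "C"), ("A", "D"), ("A", "E"), ("B", "C"), ("B", "D"), ("B", "E"),
         ("F", "G"), ("G", "H"), ("H", "I"), ("I", "J")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

shared = adj["A"] & adj["B"]
# Candidate neighbors exclude the two proteins being compared.
p_ab = shared_neighbor_pvalue(len(adj) - 2, len(adj["A"]), len(adj["B"]), len(shared))
print(sorted(shared), p_ab)
```

A small p-value flags the pair as a candidate functional association. High-degree hub proteins share neighbors with many partners by chance, which is why the abstract describes measuring and reducing their influence.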

Identification and phenotypic characterization of a second collagen adhesin, Scm, and genome-based identification and analysis of 13 other predicted MSCRAMMs, including four distinct pilus loci, in Enterococcus faecium.

Relevance: 10.00%

Abstract:

Attention has recently been drawn to Enterococcus faecium because of an increasing number of nosocomial infections caused by this species and its resistance to multiple antibacterial agents. However, relatively little is known about the pathogenic determinants of this organism. We have previously identified a cell-wall-anchored collagen adhesin, Acm, produced by some isolates of E. faecium, and a secreted antigen, SagA, exhibiting broad-spectrum binding to extracellular matrix proteins. Here, we analysed the draft genome of strain TX0016 for potential microbial surface components recognizing adhesive matrix molecules (MSCRAMMs). Genome-based bioinformatics identified 22 predicted cell-wall-anchored E. faecium surface proteins (Fms), of which 15 (including Acm) had characteristics typical of MSCRAMMs, including predicted folding into a modular architecture with multiple immunoglobulin-like domains. Functional characterization of one [Fms10; redesignated second collagen adhesin of E. faecium (Scm)] revealed that recombinant Scm(65) (A- and B-domains) and Scm(36) (A-domain) bound to collagen type V efficiently in a concentration-dependent manner, bound considerably less to collagen type I and fibrinogen, and differed from Acm in their binding specificities to collagen types IV and V. Results from far-UV circular dichroism measurements of recombinant Scm(36) and of Acm(37) indicated that these proteins were rich in beta-sheets, supporting our folding predictions. Whole-cell ELISA and FACS analyses unambiguously demonstrated surface expression of Scm in most E. faecium isolates. Strikingly, 11 of the 15 predicted MSCRAMMs clustered in four loci, each with a class C sortase gene; nine of these showed similarity to Enterococcus faecalis Ebp pilus subunits and also contained motifs essential for pilus assembly. 
Antibodies against one of the predicted major pilus proteins, Fms9 (redesignated EbpC(fm)), detected a 'ladder' pattern of high-molecular-mass protein bands in a Western blot analysis of cell surface extracts from E. faecium, suggesting that EbpC(fm) is polymerized into a pilus structure. Further analysis of the transcripts of the corresponding gene cluster indicated that fms1 (ebpA(fm)), fms5 (ebpB(fm)) and ebpC(fm) are co-transcribed, a result consistent with those for pilus-encoding gene clusters of other Gram-positive bacteria. All 15 genes occurred frequently in 30 clinically derived diverse E. faecium isolates tested. The common occurrence of MSCRAMM- and pilus-encoding genes and the presence of a second collagen-binding protein may have important implications for our understanding of this emerging pathogen.

Cytochrome P450 4F isoforms in different species: Identification, gene regulation and putative roles in inflammation

Relevance: 10.00%

Abstract:

The cytochrome P450 4F subfamily comprises a group of enzymes that metabolize derivatives of arachidonic acid such as prostaglandins, lipoxins, leukotrienes and hydroxyeicosatetraenoic acids, which are important mediators involved in the inflammatory response. Therefore, we speculate that CYP4Fs might be able to modulate the extent of inflammation by controlling the tissue levels of these inflammatory mediators, especially leukotriene B4. One way to provide support for this hypothesis is to test whether the expression of CYP4Fs changes under inflammatory conditions, since these changes are required to adjust the levels of inflammatory mediators. A lipopolysaccharide (LPS)-induced rat inflammation model was used to analyze the expression of rat CYP4F4 and CYP4F5 in liver and kidney. LPS administration did not change the constitutive expression level of CYP4F4 and CYP4F5. In liver, the expression of CYP4F4 and CYP4F5 decreased to 50–60% of the untreated level. The same effect of LPS on CYP4F4 and CYP4F5 expression can be mimicked in hepatocyte primary cultures treated with LPS, indicating a direct effect of LPS on hepatocytes. LPS treatment also decreased the activity of liver microsomes towards chlorpromazine; however, an antibody inhibition study revealed that liver CYP4Fs are not the only players in metabolizing chlorpromazine. To study the underlying mechanism further, the CYP4F5 gene was isolated and characterized, and the promoter region was defined. Accumulating evidence showed that peroxisome proliferator-activated receptors (PPARs) play an active role in inflammation. To investigate the possible role of PPARα in regulating CYP4F expression by inflammation or by clofibrate treatment, the expression of two new mouse 4F isoforms was analyzed in PPARα knockout mice upon LPS or clofibrate challenge. A novel induction of CYP4F15 by LPS and clofibrate was observed in kidney, and this effect is totally dependent on the presence of PPARα.
Renal CYP4F16 expression was not affected by LPS or clofibrate in either (+/+) or (−/−) mice. In contrast, hepatic expression of CYP4F15 and CYP4F16 was reduced significantly in (+/+) mice, but much less in (−/−) mice, suggesting that PPARα is partially responsible for this down-regulation. Clofibrate treatment reduced the expression of CYP4F16 in liver but had no effect on CYP4F15, and PPARα does not have a role in hepatic CYP4F expression regulated by clofibrate. In general, CYP4Fs are regulated in an isoform-, tissue- and species-specific manner. A human CYP4F isoform, CYP4F11, was isolated. The genomic structure was also solved by using database mining and bioinformatics tools. Localization of CYP4F11 to chromosome 19, 16 kb upstream of CYP4F2, suggests that human CYP4F genes may form a cluster on chromosome 19. This novel human 4F is highly expressed in liver, as well as in kidney, heart and skeletal muscle. Further study of the activity and gene regulation of CYP4F11 will provide more insight into the physiological functions of the CYP4F subfamily.

The genetic basis of thoracic aortic aneurysms and dissections: Genetic heterogeneity and mapping of TAAD1 and TAAD2 loci

Relevance: 10.00%

Abstract:

Thoracic aortic aneurysms leading to aortic dissections (TAAD) are a major cause of morbidity and mortality in the United States. TAAD is a complication of some known genetic disorders, such as Marfan syndrome and Turner syndrome, but the majority of familial cases are not due to a known genetic syndrome. Previous studies by our group have established that nonsyndromic, familial TAAD is inherited in an autosomal dominant manner with decreased penetrance and variable expression. Using one large family with multiple members with TAAD for the genome wide scan, a major locus for familial TAAD was mapped to 5q13–14 (TAAD1). Nine out of 15 families studied were linked to this locus, establishing that TAAD1 was a major locus, and that there was genetic heterogeneity for the condition. Mapping of TAAD2 locus was accomplished using a single large family with multiple members with TAAD not linked to known loci of aneurysm formation. This established a second novel locus for familial TAAD on 3p24–25 (LOD score of 4.3), termed the TAAD2 locus. Two putative loci with suggestive LOD scores were mapped on 4q and 12q through a genome scan carried out using three families. TAAD phenotype in 12 families did not segregate with known loci, indicating further genetic heterogeneity. An STS-tagged BAC based contig was constructed for 7.8Mb and 25Mb critical interval of TAAD1 and TAAD2 respectively and characterized to identify the defective gene. The hypothesis that the defective genes responsible for the TAAD1 and TAAD2 encoded extracellular matrix (ECM) proteins, the major components of the elastic fiber system in the aortic media was tested. Four genes encoding ECM proteins, versican, thrombospondin-3, CRTL1, on TAAD1 and FBLN2 at TAAD2 were sequenced, but no disease-causing mutations were identified. Studies to identify the defective gene are initiated through the positional candidate gene approach using combination of bioinformatics and expression studies. 
The identification of the TAAD susceptibility genes will allow for presymptomatic diagnosis of individuals at risk for this life-threatening disease. The identification of the molecular defects that contribute to TAAD will also further our understanding of the proteins that provide structural integrity to the aortic wall.

Regulation and transcript analysis of the cer operon responsible for quorum sensing in Rhodobacter sphaeroides 2.4.1

Relevance: 10.00%

Abstract:

Rhodobacter sphaeroides 2.4.1 is a Gram-negative facultative photoheterotrophic bacterium that has been shown to have an N-acyl homoserine lactone-based quorum sensing system called cer, for community escape response. The cer ORFs are cerR, the transcriptional regulator; cerI, the autoinducer synthase; and cerA, whose function is unknown. The autoinducer molecule, 7,8-cis-N-(tetradecenoyl) homoserine lactone, has been characterized. The objective of this study was to identify an environmental stimulus that influences the regulation of cerRAI and to characterize transcription of the cer operon. A cerR::lacZ transcriptional fusion was made and β-galactosidase assays were performed in the R. sphaeroides 2.4.1 strains wild type, AP3 (CerI−) and AP4 (CerR−). The cerR::lacZ β-galactosidase assays were used as an initial survey of the mode of regulation of the Cer system. A cerA::lacZ translational fusion was created and was used to show that cerA can be translated. The presence of 7,8-cis-N-(tetradecenoyl) homoserine lactone was detected in the R. sphaeroides strains wild type and AP4 (CerR−) using a lasR::lacZ translational fusion autoinducer bioassay. The cerR::lacZ transcriptional fusion in R. sphaeroides 2.4.1 wild type was tested under different environmental stimuli, such as various carbon sources, oxygen tensions, light intensities and culture media, to determine whether they influence transcription of the cer ORFs. Although lacZ assay data implicated high light intensity (100 W/m²) in stimulating cer transcription, quantitative Northern RNA data for the cerR transcript showed that low light intensity (3 W/m²) is at least one environmental stimulus that induces cer transcription. This finding was supported by DNA microarray analysis. Northern analysis of the cerRAI transcript provided evidence that the cer ORFs are co-transcribed and that the cer operon contains two additional genes.
Bioinformatics was used to identify genes that may be regulated by the Cer system by identifying putative lux box homologue sequences in the presumed promoter regions of these genes. The genes identified were fliQ, celB and calsymin, all implicated in interactions with plants. Primer extension was used to help localize cis-elements in the promoter region. The cerR::lacZ transcriptional fusion was monitored in a subset of different global DNA-binding transcriptional regulator mutant strains of R. sphaeroides 2.4.1. Those regulators involved in maintaining an anaerobic photosynthetic lifestyle appeared to have an effect. Collectively, the data imply that R. sphaeroides 2.4.1 activates the Cer system when grown under anaerobic photosynthetic conditions at low light intensity (3 W/m²), and that the system may be involved in an interaction with plants.

Application of cell line based genomic predictors to predict response to targeted therapies in breast cancer

Relevance: 10.00%

Abstract:

Cancer cell lines can be treated with a drug, and the molecular comparison of responders and non-responders may yield potential predictors that could be tested in the clinic. It is a bioinformatics challenge to apply the cell line-derived multivariable response predictors to patients who respond to therapy. Using the gene expression data from 23 breast cancer cell lines, I developed three predictors of dasatinib sensitivity by selecting differentially expressed genes and applying different classification algorithms. The performance of these predictors on independent cell lines with known dasatinib response was tested. The predictor based on the weighted voting method has the best overall performance. It correctly predicted dasatinib sensitivity in 11 out of 12 (92%) breast and 17 out of 23 (74%) lung cancer cell lines. These predictors were then applied to the gene expression data from 133 breast cancer patients in an attempt to predict how the patients might respond to dasatinib therapy. Two predictors identified 13 patients in common as dasatinib sensitive. Sixty-two percent of these cases are triple negative (ER-negative, HER2-negative and PR-negative) and 76% are double negative. The result is consistent with the findings from other studies, which identified the target population for dasatinib treatment as the triple negative or basal breast cancer subtype. In conclusion, we think that the cell line-derived dasatinib classifiers can be applied to human patients.
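
The abstract does not say which weighted voting variant was used. As a minimal sketch, the code below assumes the widely used signal-to-noise scheme: each gene gets a weight from the difference in class means divided by the sum of class standard deviations, and votes according to how far the new sample lies from the midpoint of the two class means. The expression values and function names are made up for illustration and are not taken from the study.

```python
from statistics import mean, stdev


def train_weighted_voting(sensitive, resistant):
    """Return one (weight, boundary) pair per gene.

    sensitive, resistant: lists of samples; each sample is a list of
    expression values in the same gene order. Requires at least two samples
    per class and nonzero within-class spread for every gene.
    """
    model = []
    for g in range(len(sensitive[0])):
        s = [sample[g] for sample in sensitive]
        r = [sample[g] for sample in resistant]
        weight = (mean(s) - mean(r)) / (stdev(s) + stdev(r))  # signal-to-noise
        boundary = (mean(s) + mean(r)) / 2                    # class midpoint
        model.append((weight, boundary))
    return model


def predict(model, sample):
    """Sum the per-gene votes; a positive total means 'sensitive'."""
    total = sum(w * (x - b) for (w, b), x in zip(model, sample))
    return "sensitive" if total > 0 else "resistant"


# Hypothetical training data: two genes, three cell lines per class.
sensitive_lines = [[8.0, 2.0], [9.0, 3.0], [8.5, 2.5]]
resistant_lines = [[4.0, 6.0], [5.0, 7.0], [4.5, 6.5]]
model = train_weighted_voting(sensitive_lines, resistant_lines)

print(predict(model, [8.2, 2.9]))
print(predict(model, [4.8, 6.1]))
```

In practice the gene list would first be reduced to differentially expressed genes, as the abstract describes, before the votes are computed.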

The information bottleneck method for genome-wide association studies

Relevance: 10.00%

Abstract:

In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences in genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously raises a multiple testing problem and will give false-positive results. Although this problem can be dealt with effectively through several approaches, such as Bonferroni correction, permutation testing and false discovery rates, patterns of joint effects of several genes, each with a weak effect, might not be detectable. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset in big data sets where the number of feature SNPs far exceeds the number of observations. In this study, we took two steps to achieve this goal. First, we selected 1000 SNPs through an effective filter method, and then we performed feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck method, wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis in terms of classification performance. Finally, we performed chi-square tests to look at the relationship between each SNP and disease from another point of view.
In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search over small subsets of one, two or three SNPs, based on the best 100 composite 2-SNPs, can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm does not always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to avoid the nesting effect of forward selection, it does not always outperform the latter, due to overfitting from observing more complex subset states. Our results also indicate that HMSS, as a criterion for evaluating the classification ability of a function, can be used on imbalanced data without modifying the original dataset, in contrast to classification accuracy. Our four studies suggest that Sequential Information Bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome, and that its ability to detect the target status is superior to that of traditional LDA in the study. From our results we can see that the best test probability-HMSS for predicting CVD, stroke, CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4.
A further genome-wide association study through chi-square tests shows that no significant SNPs are detected at the cut-off level of 9.09451E-08 in the Framingham heart study of CVD. The WTCCC study results detect only two significant SNPs associated with CAD. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through chi-square tests at the cut-off value of 1.11E-07. Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results (95% confidence intervals or statistical tests of differences) require more cost-effective methods or a more efficient computing system, neither of which is currently available for our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability, and that SNPs with good discriminant power are not necessarily causal markers for the disease.
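
The HMSS criterion named in this abstract is the harmonic mean of sensitivity and specificity. A short sketch shows why it behaves differently from plain accuracy on imbalanced data. The confusion-matrix counts below are invented for illustration and do not come from the study.

```python
def hmss(tp, fn, tn, fp):
    """Harmonic mean of sensitivity and specificity from confusion counts."""
    sensitivity = tp / (tp + fn)   # fraction of cases detected
    specificity = tn / (tn + fp)   # fraction of controls correctly cleared
    if sensitivity + specificity == 0:
        return 0.0
    return 2 * sensitivity * specificity / (sensitivity + specificity)


def accuracy(tp, fn, tn, fp):
    """Plain classification accuracy."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical imbalanced data set: 10 cases, 90 controls.
# Classifier 1 labels everyone as a control.
always_control = dict(tp=0, fn=10, tn=90, fp=0)
# Classifier 2 is right 70% of the time in both groups.
balanced = dict(tp=7, fn=3, tn=63, fp=27)

print(accuracy(**always_control), hmss(**always_control))
print(accuracy(**balanced), hmss(**balanced))
```

The trivial classifier reaches 90% accuracy while detecting no cases, and its HMSS is zero. The second classifier has lower accuracy but an HMSS of about 0.7, which is the sense in which HMSS can be used on imbalanced data without rebalancing the dataset.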

Integrating sequence information in microarray data analysis by free energy modeling

Relevance: 10.00%

Abstract:

Microarray technology is a high-throughput method for genotyping and gene expression profiling. Limited sensitivity and specificity are among the essential problems of this technology. Most existing methods of microarray data analysis have an apparent limitation: they deal only with the numerical part of microarray data and make little use of gene sequence information. Because it is the gene sequences that precisely define the physical objects being measured by a microarray, it is natural to make the gene sequences an essential part of the data analysis. This dissertation focused on the development of free energy models to integrate sequence information into microarray data analysis. The models were used to characterize the mechanism of hybridization on microarrays and to enhance the sensitivity and specificity of microarray measurements. Cross-hybridization is a major obstacle to the sensitivity and specificity of microarray measurements. In this dissertation, we evaluated the scope of the cross-hybridization problem on short-oligo microarrays. The results showed that cross-hybridization on arrays is mostly caused by oligo fragments with a run of 10 to 16 nucleotides complementary to the probes. Furthermore, a free-energy-based model was proposed to quantify the amount of cross-hybridization signal on each probe. This model treats cross-hybridization as an integral effect of the interactions between a probe and various off-target oligo fragments. Using public spike-in datasets, the model showed high accuracy in predicting the cross-hybridization signals on those probes whose intended targets are absent from the sample. Several prospective models were proposed to improve the Positional Dependent Nearest-Neighbor (PDNN) model for better quantification of gene expression and cross-hybridization. The problem addressed in this dissertation is fundamental to microarray technology.
We expect that this study will help us to understand the detailed mechanism that determines sensitivity and specificity on microarrays. Consequently, this research will have a wide impact on how microarrays are designed and how the data are interpreted.
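
The "run of 10 to 16 nucleotides complementary to the probes" can be made concrete with a small sketch that measures the longest contiguous stretch of an off-target fragment able to base-pair with a probe. This only illustrates the notion of a complementary run; it is not the dissertation's free energy model, and the sequences and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence that pairs perfectly with seq, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_complementary_run(probe, fragment):
    """Length of the longest contiguous stretch of fragment that is
    perfectly complementary to some stretch of probe.

    Computed as the longest common substring between the fragment and the
    probe's reverse complement, with a row-by-row dynamic program.
    """
    target = reverse_complement(probe)
    best = 0
    prev = [0] * (len(fragment) + 1)
    for i in range(1, len(target) + 1):
        cur = [0] * (len(fragment) + 1)
        for j in range(1, len(fragment) + 1):
            if target[i - 1] == fragment[j - 1]:
                cur[j] = prev[j - 1] + 1   # extend the run ending here
                best = max(best, cur[j])
        prev = cur
    return best


probe = "ATGCGTACCTGA"                  # toy 12-mer, not a real array probe
off_target = "TTTTGTACGCGGGG"           # carries a 6-nt complementary stretch
run_length = longest_complementary_run(probe, off_target)
print(run_length)
```

Under the finding quoted in the abstract, fragments whose run length falls roughly in the 10 to 16 range would be the main candidates for cross-hybridization on short-oligo arrays.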

Genome-wide gene-gene interaction analysis for cardiovascular disease

Relevance: 10.00%

Abstract:

Numerous studies have been carried out to better understand the genetic predisposition to cardiovascular disease. Although it is widely believed that multifactorial diseases such as cardiovascular disease result from the effects of many genes, which work alone or interact with other genes, most genetic studies have focused on identifying cardiovascular disease susceptibility genes and usually ignore the effects of gene-gene interactions in the analysis. The current study applies a novel linkage disequilibrium-based statistic for testing interactions between two linked loci, using data from a genome-wide study of cardiovascular disease. A total of 53,394 single nucleotide polymorphisms (SNPs) are tested for pair-wise interactions, and 8,644 interactions are found to be significant, with p-values less than 3.5×10⁻¹¹. Results indicate that known cardiovascular disease susceptibility genes tend not to have many significant interactions. One SNP in the CACNG1 (calcium channel, voltage-dependent, gamma subunit 1) gene and one SNP in the IL3RA (interleukin 3 receptor, alpha) gene are found to have the most significant pair-wise interactions. Findings from the current study should be replicated in other independent cohorts to eliminate potential false-positive results.
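
The significance threshold quoted in this abstract is numerically consistent with a Bonferroni correction over every SNP pair at a family-wise level of 0.05. The abstract does not state how the threshold was chosen, so both the correction method and the 0.05 level are assumptions in the arithmetic below.

```python
from math import comb

n_snps = 53_394                 # SNPs tested, from the abstract
n_pairs = comb(n_snps, 2)       # number of unordered SNP pairs
alpha = 0.05                    # assumed family-wise error rate
threshold = alpha / n_pairs     # Bonferroni-corrected per-test threshold

print(f"{n_pairs:,} pairs, threshold = {threshold:.2e}")
# prints "1,425,432,921 pairs, threshold = 3.51e-11"
```

With roughly 1.4 billion pairwise tests, the per-test threshold lands at about 3.5×10⁻¹¹, which matches the cut-off reported for the 8,644 significant interactions.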

Identification and characterization of mutations in SMC contractile genes involved in thoracic aortic aneurysms and dissections

Relevance: 10.00%

Abstract:

Aortic aneurysms and dissections are the 15th most common cause of death in the United States. Genetic factors contribute to the pathogenesis of thoracic aortic aneurysms and dissections (TAAD). Currently, six loci and four genes have been identified for familial TAAD. Notably, mutations in the smooth muscle cell (SMC) contractile genes ACTA2 and MYH11 are responsible for 15% of familial TAAD, suggesting that proper SMC contraction is important for normal aorta function. Therefore, we hypothesize that mutations in other genes encoding SMC contractile proteins also cause familial TAAD. To test this hypothesis, we used a candidate gene approach to identify causative mutations in SMC contractile genes for familial TAAD. By sequencing DNA from 80 TAAD patients from unrelated families, we identified putative mutations in eight contractile genes. We chose myosin light chain kinase (MLCK) S1759P for further study for the following reasons: (1) Serine 1759 is conserved between vertebrates and invertebrates. (2) S1759P is predicted to be functionally deleterious by bioinformatics. (3) Low blood pressure is observed in SMC-selective MLCK-deficient mice. In the presence of Ca²⁺/calmodulin (CaM), MLCK, which contains CaM-binding and kinase domains, is activated to phosphorylate myosin light chain, thereby initiating SMC contraction. The CaM-binding sequence of MLCK forms an α-helix structure required for CaM binding. MLCK Serine 1759 is located within the CaM-binding domain. S1759P is predicted to decrease the α-helix composition of the CaM-binding domain. Hence, we hypothesize that MLCK mutations cause TAAD by disturbing CaM binding and MLCK activity. We further sequenced MLCK in DNA samples from an additional 86 probands with familial TAAD. Two more mutations, MLCK A1754T and R1480Stop, were identified, supporting the hypothesis that MLCK mutations cause familial TAAD.
To determine whether MLCK mutations disrupted CaM binding and MLCK activity, we performed co-immunoprecipitation and kinase assays. Decreased CaM binding and kinase activity were detected in A1754T and S1759P. Moreover, R1480Stop is predicted to truncate the kinase and CaM-binding domains. We conclude that MLCK mutations disrupt CaM binding and MLCK activity. Collectively, our study is the first to show that mutations in genes regulating SMC contraction cause TAAD. This finding further highlights the importance of SMC contraction in maintaining aorta function.

Functional data analysis approaches for genotype-phenotype association studies from next-generation sequencing

Relevance:

10.00%

Publisher:

Abstract:

Next-generation DNA sequencing platforms can effectively detect the entire spectrum of genomic variation and are emerging as a major tool for systematic exploration of the universe of variants and interactions in the entire genome. However, the data produced by next-generation sequencing technologies suffer from three basic problems: sequence errors, assembly errors, and missing data. Current statistical methods for genetic analysis are well suited to detecting associations of common variants, but are less suitable for rare variants. This poses a great challenge for sequence-based genetic studies of complex diseases.

This dissertation used the genome continuum model as a general principle, and stochastic calculus and functional data analysis as tools, to develop novel and powerful statistical methods for the next generation of association studies of both qualitative and quantitative traits in the context of sequencing data, ultimately shifting the paradigm of association analysis from the current locus-by-locus analysis to the collective analysis of genomic regions.

In this project, functional principal component (FPC) methods coupled with high-dimensional data reduction techniques were used to develop novel and powerful methods for testing the association of the entire spectrum of genetic variation within a segment of the genome or a gene, regardless of whether the variants are common or rare.

Classical quantitative genetic methods suffer from high type I error rates and low power for rare variants. To overcome these limitations for resequencing data, this project used functional linear models with scalar response to develop statistics for identifying quantitative trait loci (QTLs) for both common and rare variants. To illustrate their application, the functional linear models were applied to five quantitative traits in the Framingham Heart Study.

This project also proposed a novel concept of gene-gene co-association, in which a gene or a genomic region is taken as the unit of association analysis, and used stochastic calculus to develop a unified framework for testing the association of multiple genes or genomic regions for both common and rare alleles. The proposed methods were applied to gene-gene co-association analysis of psoriasis in two independent GWAS datasets, which led to the discovery of networks significantly associated with psoriasis.
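The region-based idea in the abstract above can be illustrated with a small sketch. It is not the dissertation's method. It assumes the variants in a region sit on an evenly spaced grid, skips the basis-function smoothing a full functional analysis would use, extracts only the leading functional principal component by power iteration, and regresses a simulated quantitative trait on that single score. All names (`leading_fpc`, `regression_slope`) and all data are hypothetical, and only the Python standard library is used to keep the example self-contained.

```python
import random

def leading_fpc(genotypes, iterations=200):
    """Leading principal component of a region's genotype profiles.

    Assumption: each row is one subject's genotype profile on an evenly
    spaced grid, so a discretized FPCA reduces to ordinary PCA.
    """
    n, p = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in genotypes]
    vec = [1.0 / p ** 0.5] * p  # unit-length starting vector
    for _ in range(iterations):
        # Power iteration: apply X^T X to vec without forming the matrix.
        proj = [sum(c[j] * vec[j] for j in range(p)) for c in centered]
        new = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in new) ** 0.5
        vec = [v / norm for v in new]
    scores = [sum(c[j] * vec[j] for j in range(p)) for c in centered]
    return vec, scores

def regression_slope(scores, trait):
    """Least-squares slope of the trait on a single component score."""
    s_bar = sum(scores) / len(scores)
    y_bar = sum(trait) / len(trait)
    num = sum((s - s_bar) * (y - y_bar) for s, y in zip(scores, trait))
    den = sum((s - s_bar) ** 2 for s in scores)
    return num / den

# Simulated region: 200 subjects, 30 rare variants (carrier frequency 0.05).
random.seed(0)
n_subjects, n_variants = 200, 30
genotypes = [[1.0 if random.random() < 0.05 else 0.0 for _ in range(n_variants)]
             for _ in range(n_subjects)]
# Hypothetical trait: the first five variants each add one unit, plus noise.
trait = [sum(row[:5]) + random.gauss(0.0, 1.0) for row in genotypes]

component, scores = leading_fpc(genotypes)
slope = regression_slope(scores, trait)
```

The point of the sketch is the shift in unit of analysis: the trait is tested against one score that summarizes the whole region, not against each variant separately. A realistic analysis would retain several components and use a formal test statistic.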

Statistical Methods for Differential Expressions of Genes Detected in Multiple-Condition Experiment of Microarray

Relevance:

10.00%

Publisher:

Abstract:

Most studies of differential gene expression have been conducted between two given conditions. The two-condition experimental (TCE) approach is simple in that all genes detected display a common differential expression pattern in response to a common two-condition difference. Consequently, genes that are differentially expressed under conditions other than the given two are undetectable with the TCE approach. To address this problem, we propose a new approach called the multiple-condition experiment (MCE) without replication and develop corresponding statistical methods, including inference of pairs of conditions for genes, new t-statistics, and a generalized multiple-testing method that works with any multiple-testing procedure via a control parameter C. We applied these statistical methods to our real MCE data from breast cancer cell lines and found that 85 percent of gene-expression variation was caused by genotypic effects and genotype-ANAX1 overexpression interactions, which agrees well with our expected results. We also applied our methods to the adenoma dataset of Notterman et al. and identified 93 differentially expressed genes that could not be found with TCE. The MCE approach is a conceptual breakthrough in several respects: (a) many conditions of interest can be studied simultaneously; (b) studying the association between differential expression of genes and conditions becomes easy; (c) it can provide more precise information for molecular classification and diagnosis of tumors; (d) it can save investigators a lot of experimental resources and time.

Single nucleotide polymorphisms (SNPs) associated with TGF-beta pathway and their significance in systemic sclerosis - A multilevel analysis

Relevance:

10.00%

Publisher:

Abstract:

Systemic sclerosis (SSc), or scleroderma, is a complex disease whose etiopathogenesis remains unelucidated. Fibrosis in multiple organs is a key feature of SSc, and studies have shown that the transforming growth factor-β (TGF-β) pathway has a crucial role in fibrotic responses. For a complex disease such as SSc, expression quantitative trait loci (eQTL) analysis is a powerful tool for identifying genetic variations that affect the expression of genes involved in the disease. In this study, a multilevel model is described for performing a multivariate eQTL analysis to identify genetic variations (SNPs) specifically associated with the expression of three members of the TGF-β pathway: CTGF, SPARC, and COL3A1. What makes this model distinctive is that all three genes are included in one model, rather than one gene being examined at a time. A protein might contribute to multiple pathways, and this approach allows the identification of important genetic variations linked to multiple genes belonging to the same pathway. In this study, 29 SNPs were identified, 16 of which are located in known genes. Exploring the roles of these genes in TGF-β regulation will help elucidate the etiology of SSc, which will in turn help to better manage this complex disease.
