8 resultados para functional methods
em DigitalCommons@The Texas Medical Center
Resumo:
High-throughput assays, such as yeast two-hybrid system, have generated a huge amount of protein-protein interaction (PPI) data in the past decade. This tremendously increases the need for developing reliable methods to systematically and automatically suggest protein functions and relationships between them. With the available PPI data, it is now possible to study the functions and relationships in the context of a large-scale network. To data, several network-based schemes have been provided to effectively annotate protein functions on a large scale. However, due to those inherent noises in high-throughput data generation, new methods and algorithms should be developed to increase the reliability of functional annotations. Previous work in a yeast PPI network (Samanta and Liang, 2003) has shown that the local connection topology, particularly for two proteins sharing an unusually large number of neighbors, can predict functional associations between proteins, and hence suggest their functions. One advantage of the work is that their algorithm is not sensitive to noises (false positives) in high-throughput PPI data. In this study, we improved their prediction scheme by developing a new algorithm and new methods which we applied on a human PPI network to make a genome-wide functional inference. We used the new algorithm to measure and reduce the influence of hub proteins on detecting functionally associated proteins. We used the annotations of the Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) as independent and unbiased benchmarks to evaluate our algorithms and methods within the human PPI network. We showed that, compared with the previous work from Samanta and Liang, our algorithm and methods developed in this study improved the overall quality of functional inferences for human proteins. By applying the algorithms to the human PPI network, we obtained 4,233 significant functional associations among 1,754 proteins. Further comparisons of their KEGG and GO annotations allowed us to assign 466 KEGG pathway annotations to 274 proteins and 123 GO annotations to 114 proteins with estimated false discovery rates of <21% for KEGG and <30% for GO. We clustered 1,729 proteins by their functional associations and made pathway analysis to identify several subclusters that are highly enriched in certain signaling pathways. Particularly, we performed a detailed analysis on a subcluster enriched in the transforming growth factor β signaling pathway (P<10-50) which is important in cell proliferation and tumorigenesis. Analysis of another four subclusters also suggested potential new players in six signaling pathways worthy of further experimental investigations. Our study gives clear insight into the common neighbor-based prediction scheme and provides a reliable method for large-scale functional annotations in this post-genomic era.
Resumo:
Chondrocyte gene regulation is important for the generation and maintenance of cartilage tissues. Several regulatory factors have been identified that play a role in chondrogenesis, including the positive transacting factors of the SOX family such as SOX9, SOX5, and SOX6, as well as negative transacting factors such as C/EBP and delta EF1. However, a complete understanding of the intricate regulatory network that governs the tissue-specific expression of cartilage genes is not yet available. We have taken a computational approach to identify cis-regulatory, transcription factor (TF) binding motifs in a set of cartilage characteristic genes to better define the transcriptional regulatory networks that regulate chondrogenesis. Our computational methods have identified several TFs, whose binding profiles are available in the TRANSFAC database, as important to chondrogenesis. In addition, a cartilage-specific SOX-binding profile was constructed and used to identify both known, and novel, functional paired SOX-binding motifs in chondrocyte genes. Using DNA pattern-recognition algorithms, we have also identified cis-regulatory elements for unknown TFs. We have validated our computational predictions through mutational analyses in cell transfection experiments. One novel regulatory motif, N1, found at high frequency in the COL2A1 promoter, was found to bind to chondrocyte nuclear proteins. Mutational analyses suggest that this motif binds a repressive factor that regulates basal levels of the COL2A1 promoter.
Resumo:
Identifying and characterizing the genes responsible for inherited human diseases will ultimately lead to a more holistic understanding of disease pathogenesis, catalyze new diagnostic and treatment modalities, and provide insights into basic biological processes. This dissertation presents research aimed at delineating the genetic and molecular basis of human diseases through epigenetic and functional studies and can be divided into two independent areas of research. The first area of research describes the development of two high-throughput melting curve based methods to assay DNA methylation, referred to as McMSP and McCOBRA. The goal of this project was to develop DNA methylation methods that can be used to rapidly determine the DNA methylation status at a specific locus in a large number of samples. McMSP and McCOBRA provide several advantages over existing methods, as they are simple, accurate, robust, and high-throughput making them applicable to large-scale DNA methylation studies. McMSP and McCOBRA were then used in an epigenetic study of the complex disease Ankylosing spondylitis (AS). Specifically, I tested the hypothesis that aberrant patterns of DNA methylation in five AS candidate genes contribute to disease susceptibility. While no statistically significant methylation differences were observed between cases and controls, this is the first study to investigate the hypothesis that epigenetic variation contributes to AS susceptibility and therefore provides the conceptual framework for future studies. ^ In the second area of research, I performed experiments to better delimit the function of aryl hydrocarbon receptor-interacting protein-like 1 (AIPL1), which when mutated causes various forms of inherited blindness such as Leber congenital amaurosis. A yeast two-hybrid screen was performed to identify putative AIPL1-interacting proteins. After screening 2 × 106 bovine retinal cDNA library clones, 6 unique putative AIPL1-interacting proteins were identified. While these 6 AIPL1 protein-protein interactions must be confirmed, their identification is an important step in understanding the functional role of AIPL1 within the retina and will provide insight into the molecular mechanisms underlying inherited blindness. ^
Resumo:
The Ca2+-binding protein calmodulin (CaM) is a key transducer of Ca2+ oscillations by virtue of its ability to bind Ca 2+ selectively and then interact specifically with a large number of downstream enzymes and proteins. It remains unclear whether Ca2+ -dependent signaling alone can activate the full range of Ca 2+/CaM regulated processes or whether other regulatory schemes in the cell exist that allow specific targeting of CaM to subsets of Ca 2+/CaM binding sites or regions of the cell. Here we investigate the possibility that alterations of the availability of CaM may serve as a potential cellular mechanism for regulating the activation of CaM-dependent targets. By utilizing sensitive optical techniques with high spatial and temporal resolution, we examine the intracellular dynamics of CaM signaling at a resolution previously unattainable. After optimizing and characterizing both the optical methods and fluorescently labeled probes for intracellular measurements, the diffusion of CaM in the cytoplasm of HEK293 cells was analyzed. It was discovered that the diffusion characteristics of CaM are similar to that of a comparably sized inert molecule. Independent manipulation of experimental parameters, including increases in total concentrations of CaM and intracellular Ca2+ levels, did not change the diffusion of CaM in the cytoplasm. However, changes in diffusion were seen when the concentration of Ca2+/CaM-binding targets was increased in conjunction with elevated Ca2+. This indicates that CaM is not normally limiting for the activation of Ca 2+/CaM-dependent enzymes in HEK293 cells but reveals that the ratio of CaM to CaM-dependent targets is a potential mechanism for changing CaM availability. Next we considered whether cellular compartmentalization may act to regulate concentrations of available Ca2+/CaM in hippocampal neurons. We discovered changes in diffusion parameters of CaM under elevated Ca2+ conditions in the soma, neurite and nucleus which suggest that either the composition of cytoplasm is different in these compartments and/or they are composed of unique families of CaM-binding proteins. Finally, we return to the HEK293 cell and for the first time directly show the intracellular binding of CaM and CaMKII, an important target for CaM critical for neuronal function and plasticity. Furthermore, we analyzed the complex binding stoichiometry of this molecular interaction in the basal, activated and autophosphorylated states of CaMKII and determined the impact of this binding on CaM availability in the cell. Overall these results demonstrate that regulation of CaM availability is a viable cellular mechanism for regulating the output of CaM-dependent processes and that this process is tuned to the specific functional needs of a particular cell type and subcellular compartment. ^
Resumo:
In the rabbit retina, there are two kinds of horizontal cells (HCs). The A-type HC is a large axonless cell which contacts cones exclusively. The B-type HC is an axon bearing cell. While the somatic dendrites of B-type HCs also contact cones, the axon expands into an elaborately branched structure, the axon terminal (AT), which contacts a large number of rods. It is difficult to label the different HCs selectively by immunochemical methods. Therefore, we developed dye injection methods to label each type of HC. Then it was possible, (1) to describe the detailed structure of the AT (2) to identify the glutamate receptors mediating cone input to A and B-type HCs and rod input to ATs and (3) to test the hypothesis that the B-type HCs are coupled via Cx57 gap junctions. ^ To obtain well filled examples of single HCs, it was necessary to block gap junction coupling to stop the spread of Neurobiotin through the network. We used dye coupling in A-type HCs to screen a series of potential gap junction antagonists. One of these compounds, meclofenamic acid (MFA), was potent, water soluble and easily reversible. This compound may be a useful tool to manipulate gap junction coupling. ^ In the presence of MFA, Neurobiotin passed down the axon of B-type HCs to reveal the detailed structure of the AT. We observed that only one AT ending entered each rod spherule invagination. This observation was confirmed by calculation and two dye injections. ^ Glutamate is the neurotransmitter used by both rods and cones. AMPA receptors were colocalized with the dendrites of A and B-type HCs at each cone pedicle. In addition, AMPA receptors were located on the AT ending at each rod spherule. Thus rod and cone input to HCs is mediated by AMPA receptors. ^ A-type and B-type HCs may express different connexins because they have different dye-coupling properties. Recently, we found that connexin50 (Cx50) is expressed by A-type HCs. B-type HCs and B-type ATs are also independently coupled. Cx57 was expressed in the OPL and double label studies showed that Cx 57 was colocalized with the AT matrix but not with the somatic dendrites of B-type HCs. ^ In summary, we have identified a useful gap junction antagonist, MFA. There is one AT ending at each rod spherule, rods inputs to ATs is mediated by AMPA receptors and coupling in the AT matrix is mediated by Cx57. This confirms that HCs with different properties use distinct connexins. The properties of ATs described in this research are consistent. The connections and properties reported here suggest that ATs functions as rod HCs and provide a negative feedback signal to rods. ^
Resumo:
Two studies among college students were conducted to evaluate appropriate measurement methods for etiological research on computing-related upper extremity musculoskeletal disorders (UEMSDs). ^ A cross-sectional study among 100 graduate students evaluated the utility of symptoms surveys (a VAS scale and 5-point Likert scale) compared with two UEMSD clinical classification systems (Gerr and Moore protocols). The two symptom measures were highly concordant (Lin's rho = 0.54; Spearman's r = 0.72); the two clinical protocols were moderately concordant (Cohen's kappa = 0.50). Sensitivity and specificity, endorsed by Youden's J statistic, did not reveal much agreement between the symptoms surveys and clinical examinations. It cannot be concluded self-report symptoms surveys can be used as surrogate for clinical examinations. ^ A pilot repeated measures study conducted among 30 undergraduate students evaluated computing exposure measurement methods. Key findings are: temporal variations in symptoms, the odds of experiencing symptoms increased with every hour of computer use (adjOR = 1.1, p < .10) and every stretch break taken (adjOR = 1.3, p < .10). When measuring posture using the Computer Use Checklist, a positive association with symptoms was observed (adjOR = 1.3, p < 0.10), while measuring posture using a modified Rapid Upper Limb Assessment produced unexpected and inconsistent associations. The findings were inconclusive in identifying an appropriate posture assessment or superior conceptualization of computer use exposure. ^ A cross-sectional study of 166 graduate students evaluated the comparability of graduate students to College Computing & Health surveys administered to undergraduate students. Fifty-five percent reported computing-related pain and functional limitations. Years of computer use in graduate school and number of years in school where weekly computer use was ≥ 10 hours were associated with pain within an hour of computing in logistic regression analyses. The findings are consistent with current literature on both undergraduate and graduate students. ^
Resumo:
Next-generation DNA sequencing platforms can effectively detect the entire spectrum of genomic variation and is emerging to be a major tool for systematic exploration of the universe of variants and interactions in the entire genome. However, the data produced by next-generation sequencing technologies will suffer from three basic problems: sequence errors, assembly errors, and missing data. Current statistical methods for genetic analysis are well suited for detecting the association of common variants, but are less suitable to rare variants. This raises great challenge for sequence-based genetic studies of complex diseases.^ This research dissertation utilized genome continuum model as a general principle, and stochastic calculus and functional data analysis as tools for developing novel and powerful statistical methods for next generation of association studies of both qualitative and quantitative traits in the context of sequencing data, which finally lead to shifting the paradigm of association analysis from the current locus-by-locus analysis to collectively analyzing genome regions.^ In this project, the functional principal component (FPC) methods coupled with high-dimensional data reduction techniques will be used to develop novel and powerful methods for testing the associations of the entire spectrum of genetic variation within a segment of genome or a gene regardless of whether the variants are common or rare.^ The classical quantitative genetics suffer from high type I error rates and low power for rare variants. To overcome these limitations for resequencing data, this project used functional linear models with scalar response to develop statistics for identifying quantitative trait loci (QTLs) for both common and rare variants. To illustrate their applications, the functional linear models were applied to five quantitative traits in Framingham heart studies. ^ This project proposed a novel concept of gene-gene co-association in which a gene or a genomic region is taken as a unit of association analysis and used stochastic calculus to develop a unified framework for testing the association of multiple genes or genomic regions for both common and rare alleles. The proposed methods were applied to gene-gene co-association analysis of psoriasis in two independent GWAS datasets which led to discovery of networks significantly associated with psoriasis.^
Resumo:
Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Due to the rapid development of genotyping and sequencing technologies, we are now able to more accurately assess causal effects of many genetic and environmental factors. Genome-wide association studies have been able to localize many causal genetic variants predisposing to certain diseases. However, these studies only explain a small portion of variations in the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize some additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to the increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in the genetics/genomics researches and demonstrating superiority over some regular approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods may fully exert its functionalities and advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) Deriving the Bayesian variable selection framework for the hierarchical gene-environment and gene-gene interactions; (2) Developing the Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods which were developed for gene-environment interaction studies, to other related types of studies such as adaptive borrowing historical data. We propose a Bayesian hierarchical mixture model framework that allows us to investigate the genetic and environmental effects, gene by gene interactions (epistasis) and gene by environment interactions in the same model. It is well known that, in many practical situations, there exists a natural hierarchical structure between the main effects and interactions in the linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, such that the irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious and powerful models. We evaluate both of the 'strong hierarchical' and 'weak hierarchical' models, which specify that both or one of the main effects between interacting factors must be present for the interactions to be included in the model. The extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach to identify the predisposing main effects and interactions in the studies with complex gene-environment and gene-gene interactions. We also compare these two models with the 'independent' model that does not impose this hierarchical constraint and observe their superior performances in most of the considered situations. The proposed models are implemented in the real data analysis of gene and environment interactions in the cases of lung cancer and cutaneous melanoma case-control studies. The Bayesian statistical models enjoy the properties of being allowed to incorporate useful prior information in the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in terms of the performances on the parameter estimation and variable selection in most cases. Our proposed models hold the hierarchical constraints, that further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions and successfully identifying the reported associations. This is practically appealing for the study of investigating the causal factors from a moderate number of candidate genetic and environmental factors along with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects have previously been developed to provide an analysis framework, by which the estimates of effects for a quantitative trait are statistically orthogonal regardless of the existence of Hardy-Weinberg Equilibrium (HWE) within loci. Ma et al. (2012) recently developed a NOIA model for the gene-environment interaction studies and have shown the advantages of using the model for detecting the true main effects and interactions, compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power at detecting the non-null effects with higher marginal posterior probabilities. Also, we review two Bayesian statistical models (Bayesian empirical shrinkage-type estimator and Bayesian model averaging), which were developed for the gene-environment interaction studies. Inspired by these Bayesian models, we develop two novel statistical methods that are able to handle the related problems such as borrowing data from historical studies. The proposed methods are analogous to the methods for the gene-environment interactions on behalf of the success on balancing the statistical efficiency and bias in a unified model. By extensive simulation studies, we compare the operating characteristics of the proposed models with the existing models including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow the historical data in a data-driven way. These novel models may have a broad range of statistical applications in both of genetic/genomic and clinical studies.