12 results for causal discovery
in DigitalCommons@The Texas Medical Center
Abstract:
Nonsyndromic cleft lip with or without cleft palate (NSCLP), a common, complex orofacial birth defect that affects approximately 4,000 newborns each year in the United States, is caused by both genetic and environmental factors. Orofacial clefts affect the mouth and nose, causing severe facial deformity that requires medical, dental and speech therapies. Although NSCLP has substantial genetic liability, less than 25% of the genetic contribution to NSCLP has been identified. The studies described in this thesis were performed to identify genes that contribute to NSCLP and to demonstrate the role of these genes in normal craniofacial development. Using genome scan and candidate gene approaches, novel associations with NSCLP were identified. These include MYH9 (7 SNPs, 0.009≤p<0.05), Wnt3A (4 SNPs, 0.001≤p≤0.005), Wnt11 (2 SNPs, 0.001≤p≤0.01) and CRISPLD2 (4 SNPs, 0.001≤p<0.05). The most interesting findings were for CRISPLD2. This gene is expressed in the fused mouse palate at E17.5. In zebrafish, crispld2 localized to the craniofacial region by one day post fertilization. Morpholino knockdown of crispld2 resulted in lower survival rates and altered neural crest cell (NCC) clustering. Because NCCs form the tissues that populate the craniofacies, this NCC abnormality resulted in cartilage abnormalities of the jaw, including fewer ceratobranchial cartilages forming the lower jaw (three pairs compared to five) and broader craniofacies compared to wild-type zebrafish. These findings suggest that the CRISPLD2 gene plays an important role in normal craniofacial development and that perturbation of this gene in humans contributes to orofacial clefting. Overall, these results are important because they contribute to our understanding of normal craniofacial development and orofacial clefting etiology, information that can be used to develop better methods to diagnose, counsel and potentially treat NSCLP patients.
Abstract:
Following up genetic linkage studies to identify the underlying susceptibility gene(s) for complex disease traits is an arduous yet biologically and clinically important task. Complex traits, such as hypertension, are considered polygenic with many genes influencing risk, each with small effects. Chromosome 2 has been consistently identified as a genomic region with genetic linkage evidence suggesting that one or more loci contribute to blood pressure levels and hypertension status. Using combined positional candidate gene methods, the Family Blood Pressure Program has concentrated efforts in investigating this region of chromosome 2 in an effort to identify underlying candidate hypertension susceptibility gene(s). Initial informatics efforts identified the boundaries of the region and the known genes within it. A total of 82 polymorphic sites in eight positional candidate genes were genotyped in a large hypothesis-generating sample consisting of 1640 African Americans, 1339 whites, and 1616 Mexican Americans. To adjust for multiple comparisons, resampling-based false discovery adjustment was applied, extending traditional resampling methods to sibship samples. Following this adjustment for multiple comparisons, SLC4A5, a sodium bicarbonate transporter, was identified as a primary candidate gene for hypertension. Polymorphisms in SLC4A5 were subsequently genotyped and analyzed for validation in two populations of African Americans (N = 461; N = 778) and two of whites (N = 550; N = 967). Again, SNPs within SLC4A5 were significantly associated with blood pressure levels and hypertension status. 
While not identifying a single causal DNA sequence variation that is significantly associated with blood pressure levels and hypertension status across all samples, the results further implicate SLC4A5 as a candidate hypertension susceptibility gene, validating previous evidence for one or more genes on chromosome 2 that influence hypertension related phenotypes in the population-at-large. The methodology and results reported provide a case study of one approach for following up the results of genetic linkage analyses to identify genes influencing complex traits.
Abstract:
Few studies have investigated causal pathways linking psychosocial factors to each other and to screening mammography. Conflicting hypotheses exist in the theoretic literature regarding the role and importance of subjective norms, a person's perceived social pressure to perform the behavior and his/her motivation to comply. The Theory of Reasoned Action (TRA) hypothesizes that subjective norms directly affect intention, while the Transtheoretical Model (TTM) hypothesizes that attitudes mediate the influence of subjective norms on stage of change. No one has examined which hypothesis best predicts the effect of subjective norms on mammography intention and stage of change. Two statistical methods are available for testing mediation: sequential regression analysis (SRA) and latent variable structural equation modeling (LVSEM). However, software to apply LVSEM to dichotomous variables like intention has only recently become available. No one has compared the methods to determine whether or not they yield similar results for dichotomous variables.
Study objectives were to: (1) determine whether the effect of subjective norms on mammography intention and stage of change is mediated by pros and cons; and (2) compare mediation results from the SRA and LVSEM approaches when the outcome is dichotomous. We conducted a secondary analysis of data from a national sample of women veterans enrolled in Project H.O.M.E. (Healthy Outlook on the Mammography Experience), a behavioral intervention trial.
Results showed that the TTM model described the causal pathways better than the TRA one; however, we found support for only one of the TTM causal mechanisms. Cons was the sole mediator. The mediated effect of subjective norms on intention and stage of change by cons was very small. These findings suggest that interventionists focus their efforts on reducing negative attitudes toward mammography when resources are limited.
Both the SRA and LVSEM methods provided evidence for complete mediation, and the direction, magnitude, and standard errors of the parameter estimates were very similar. Because SRA parameter estimates were not biased toward the null, we can probably assume negligible measurement error in the independent and mediator variables. Simulation studies are needed to further our understanding of how these two methods perform under different data conditions.
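The regression-based mediation logic described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: it uses ordinary least squares on continuous toy data, whereas the study's outcomes were dichotomous and would call for logistic or probit models. The function name `mediation_paths` and the toy values are hypothetical and do not come from the thesis.

```python
def _mean(v):
    return sum(v) / len(v)

def _s(u, v):
    """Sum of cross-products about the means (the 1/n factors cancel in the ratios below)."""
    mu, mv = _mean(u), _mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def mediation_paths(x, m, y):
    """Three-regression mediation sketch for predictor x, mediator m, outcome y."""
    sxx, smm = _s(x, x), _s(m, m)
    sxm, sxy, smy = _s(x, m), _s(x, y), _s(m, y)
    total = sxy / sxx                       # regression 1: y on x (total effect)
    a = sxm / sxx                           # regression 2: m on x (path a)
    det = sxx * smm - sxm ** 2
    b = (smy * sxx - sxy * sxm) / det       # regression 3: y on x and m, slope of m (path b)
    direct = (sxy * smm - smy * sxm) / det  # regression 3: slope of x (direct effect)
    return {"total": total, "a": a, "b": b, "direct": direct, "indirect": a * b}

# Toy data (assumed): m depends on x, and y depends on x only through m.
x = [0, 1, 2, 3, 4, 5]
m = [0.5, 1.5, 4.5, 5.5, 8.5, 9.5]
y = [3 * v for v in m]
paths = mediation_paths(x, m, y)
```

In this toy case the direct effect is zero and the total effect equals the indirect effect a*b, which is the pattern usually described as complete mediation.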
Abstract:
In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences in genotype frequencies between disease and healthy groups or among different population groups. However, testing a large number of SNPs simultaneously raises the problem of multiple testing and yields false-positive results. Although this problem can be handled effectively with approaches such as Bonferroni correction, permutation testing and false discovery rates, the joint effects of several genes, each with a weak effect, may remain undetected. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset in large data sets where the number of feature SNPs far exceeds the number of observations.
In this study, we took two steps to achieve this goal. First, we selected 1000 SNPs through an effective filter method; then we performed feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck method wrapped inside different search algorithms, to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis in terms of classification performance. Finally, we performed chi-square tests to examine the relationship between each SNP and disease from another point of view.
In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search of small subsets of one, two or three SNPs, based on the best 100 composite 2-SNPs, can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm does not always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent the nesting effect of forward selection, it does not always outperform the latter, owing to overfitting from observing more complex subset states.
Our results also indicate that HMSS, as a criterion for evaluating the classification ability of a function, can be used with imbalanced data without modifying the original dataset, unlike classification accuracy. Our four studies suggest that Sequential Information Bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome and that its ability to detect the target status is superior to that of traditional LDA in this study.
From our results, the best test probability-HMSS for predicting CVD, stroke, CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4.
A further genome-wide association study using chi-square tests shows that no significant SNPs are detected at the cut-off level 9.09451E-08 in the Framingham Heart Study of CVD. The WTCCC study detects only two significant SNPs associated with CAD. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease by chi-square test at the cut-off value 1.11E-07.
Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results (95% confidence intervals or statistical tests of differences) require more cost-effective methods or a more efficient computing system, neither of which is currently available for our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability, and SNPs with good discriminant power are not necessarily causal markers for the disease.
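To make the HMSS criterion in the abstract above concrete, here is a minimal sketch. It assumes the standard harmonic-mean definition, 2 * sensitivity * specificity / (sensitivity + specificity), which the abstract names but does not spell out. The function names and the toy labels are illustrative and are not taken from the thesis.

```python
def hmss(y_true, y_pred):
    """Harmonic mean of sensitivity and specificity for binary labels (1 = case, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    if sens + spec == 0:
        return 0.0
    return 2 * sens * spec / (sens + spec)

def accuracy(y_true, y_pred):
    """Plain classification accuracy, for comparison."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Imbalanced toy data (assumed): 10 cases followed by 90 controls.
y_true = [1] * 10 + [0] * 90

# Classifier A labels everyone a control: high accuracy, but it finds no cases.
pred_all_control = [0] * 100

# Classifier B finds 7 of 10 cases and 63 of 90 controls.
pred_balanced = [1] * 7 + [0] * 3 + [0] * 63 + [1] * 27

acc_a, hmss_a = accuracy(y_true, pred_all_control), hmss(y_true, pred_all_control)
acc_b, hmss_b = accuracy(y_true, pred_balanced), hmss(y_true, pred_balanced)
```

Classifier A scores higher on accuracy yet has an HMSS of zero, which illustrates why a criterion of this kind can be applied to imbalanced data without resampling the dataset.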
Abstract:
Chromatin, composed of repeating nucleosome units, is the genetic polymer of life. To aid in DNA compaction and organized storage, the double helix wraps around a core complex of histone proteins to form the nucleosome, and is therefore no longer freely accessible to cellular proteins for the processes of transcription, replication and DNA repair. Over the course of evolution, DNA-based applications have developed routes to access DNA bound up in chromatin, and further, have actually utilized the chromatin structure to create another level of complexity and information storage. The histone molecules that DNA surrounds have free-floating tails that extend out of the nucleosome. These tails are post-translationally modified to create docking sites for the proteins involved in transcription, replication and repair, thus providing one prominent way that specific genomic sequences are accessed and manipulated. Adding another degree of information storage, histone tail-modifications paint the genome in precise manners to influence a state of transcriptional activity or repression, to generate euchromatin, containing gene-dense regions, or heterochromatin, containing repeat sequences and low-density gene regions. The work presented here is the study of histone tail modifications, how they are written and how they are read, divided into two projects. Both begin with protein microarray experiments where we discover the protein domains that can bind modified histone tails, and how multiple tail modifications can influence this binding. Project one then looks deeper into the enzymes that lay down the tail modifications. Specifically, we studied histone-tail arginine methylation by PRMT6. We found that methylation of a specific histone residue by PRMT6, arginine 2 of H3, can antagonize the binding of protein domains to the H3 tail and therefore affect transcription of genes regulated by the H3-tail binding proteins. 
Project two focuses on a protein we identified to bind modified histone tails, PHF20, and was an endeavor to discover the biological role of this protein. Thus, in total, we are looking at a complete process: (1) histone tail modification by an enzyme (here, PRMT6), (2) how this and other modifications are bound by conserved protein domains, and (3) by using PHF20 as an example, the functional outcome of binding through investigating the biological role of a chromatin reader.
Abstract:
Planning and providing health care services for the elderly represents a major challenge to the health care system. One part of that challenge is the identification of those factors which determine the utilization of services by this population. The purpose of this study is to explain the use of health care services by elderly subscribers in a prepaid group health plan, using the theoretical framework developed by Andersen and Aday. The impact of the predisposing, enabling and need factors on utilization was modelled through a structural equation approach using LISREL. The data were derived from Kaiser-Permanente's Medicare Prospective Payment Project, August 1980-December 1982. Need factors, in general, were the most significant determinants of utilization, with the predisposing and enabling factors found to be secondary but necessary links in the causal chain. The model was fitted to the data from the youngest age group (65-74 years) and then evaluated for goodness of fit in the two older groups (75-84 and 85+ years). Implications of the study's findings and suggestions for further modelling the utilization behavior of the elderly are discussed.
Abstract:
Pathway-based genome-wide association study evolved from pathway analysis of microarray gene expression and is under rapid development as a complement to single-SNP-based genome-wide association study. However, it faces new challenges, such as the summarization of SNP statistics into pathway statistics. The current study applies ridge-regularized Kernel Sliced Inverse Regression (KSIR) to achieve dimension reduction and compares this method with two other widely used methods: the minimal-p-value (minP) approach, which assigns the best test statistic of all SNPs in each pathway as the statistic of the pathway, and the principal component analysis (PCA) method, which uses PCA to calculate the principal components of each pathway. Comparison of the three methods using simulated datasets consisting of 500 cases, 500 controls and 100 SNPs demonstrated that the KSIR method outperformed the other two methods in terms of causal pathway ranking and statistical power. The PCA method showed performance similar to that of the minP method. The KSIR method also performed better than the other two methods in analyzing a real dataset, the WTCCC Ulcerative Colitis dataset, consisting of 1762 cases and 3773 controls as the discovery cohort and 591 cases and 1639 controls as the replication cohort. Several immune and non-immune pathways relevant to ulcerative colitis were identified by these methods. Results from the current study provide a reference for further methodology development and identify novel pathways that may be of importance to the development of ulcerative colitis.
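The minP summarization described in the abstract above is simple enough to sketch directly: each pathway is scored by the smallest p-value among its SNPs, and pathways are then ranked by that score. The function name `minp_rank`, the pathway names and the p-values below are hypothetical, and the sketch omits any correction for pathway size (for example by permutation), which a real analysis would likely need because a pathway with more SNPs tends to have a smaller minimum by chance.

```python
def minp_rank(pathway_pvalues):
    """Score each pathway by its smallest SNP p-value and rank pathways by that score.

    pathway_pvalues maps a pathway name to the list of p-values of its SNPs.
    Returns (pathway, min_p) pairs sorted from smallest to largest min_p.
    """
    scores = {name: min(pvals) for name, pvals in pathway_pvalues.items() if pvals}
    return sorted(scores.items(), key=lambda item: item[1])

# Toy single-SNP association p-values grouped by pathway (assumed values).
toy = {
    "pathway_A": [0.20, 0.03, 0.50],
    "pathway_B": [0.001, 0.90],
    "pathway_C": [0.40, 0.35, 0.06, 0.75],
}
ranking = minp_rank(toy)
```

Here `ranking` lists pathway_B first because it contains the single most significant SNP, regardless of how the other SNPs in each pathway behave.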
Abstract:
This thesis project is motivated by the potential problem of using observational data to draw inferences about a causal relationship in observational epidemiology research when controlled randomization is not applicable. The instrumental variable (IV) method is one of the statistical tools for overcoming this problem. Mendelian randomization studies use genetic variants as IVs in genetic association studies. In this thesis, the IV method, as well as standard logistic and linear regression models, is used to investigate the causal association between risk of pancreatic cancer and the circulating levels of soluble receptor for advanced glycation end-products (sRAGE). Higher levels of serum sRAGE were found to be associated with a lower risk of pancreatic cancer in a previous observational study (255 cases and 485 controls). However, such a novel association may be biased by unknown confounding factors. In a case-control study, we aimed to use the IV approach to confirm or refute this observation in a subset of study subjects for whom genotyping data were available (178 cases and 177 controls). A two-stage IV method using generalized method of moments-structural mean models (GMM-SMM) was conducted and the relative risk (RR) was calculated. In the first-stage analysis, we found that the single nucleotide polymorphism (SNP) rs2070600 of the receptor for advanced glycation end-products (AGER) gene meets all three general assumptions for a genetic IV in examining the causal association between sRAGE and risk of pancreatic cancer. The variant allele of SNP rs2070600 of the AGER gene was associated with lower levels of sRAGE, and it was associated neither with risk of pancreatic cancer nor with the confounding factors. It was a potentially strong IV (F statistic = 29.2). However, in the second-stage analysis, the GMM-SMM model failed to converge due to non-concavity, probably because of the small sample size.
Therefore, the IV analysis could not support the causality of the association between serum sRAGE levels and risk of pancreatic cancer. Nevertheless, these analyses suggest that rs2070600 was a potentially good genetic IV for testing the causality between the risk of pancreatic cancer and sRAGE levels. A larger sample size is required to conduct a credible IV analysis.
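The idea behind a genetic instrument can be illustrated with the simplest IV estimator, the Wald ratio. This is not the GMM-SMM model used in the thesis, and it assumes a continuous outcome rather than case-control status. The function names and the toy data are hypothetical. The data are constructed so that a confounder u affects both exposure and outcome but is uncorrelated with the instrument z.

```python
def _s(u, v):
    """Sum of cross-products about the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def ols_slope(x, y):
    """Naive regression slope of outcome y on exposure x (open to confounding)."""
    return _s(x, y) / _s(x, x)

def wald_ratio(z, x, y):
    """IV estimate: (effect of instrument on outcome) / (effect of instrument on exposure)."""
    return _s(z, y) / _s(z, x)

def first_stage_f(z, x):
    """F statistic of the first-stage regression of exposure x on a single instrument z."""
    r2 = _s(z, x) ** 2 / (_s(z, z) * _s(x, x))
    return r2 * (len(z) - 2) / (1 - r2)

# Toy data (assumed): z = variant allele count, u = unmeasured confounder.
z = [0, 0, 1, 1, 2, 2]
u = [1, -1, 1, -1, 1, -1]                            # uncorrelated with z by construction
x = [5 - 0.5 * zi + ui for zi, ui in zip(z, u)]      # exposure: lowered by the variant allele
y = [2 * xi + 3 * ui for xi, ui in zip(x, u)]        # outcome: true causal slope of x is 2

naive = ols_slope(x, y)      # pulled away from 2 by the confounder
iv = wald_ratio(z, x, y)     # recovers the causal slope in this constructed example
f_stat = first_stage_f(z, x)
```

In this constructed example the Wald ratio returns the true slope while the naive regression does not. The first-stage F here is well below the commonly cited rule-of-thumb threshold of 10, so the toy instrument would count as weak, unlike the F statistic of 29.2 reported in the abstract.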
Abstract:
Development of homology modeling methods will remain an area of active research. These methods aim to model increasingly accurate three-dimensional structures of as-yet-uncrystallized, therapeutically relevant proteins, e.g., Class A G-protein-coupled receptors (GPCRs). Incorporating protein flexibility is one way to achieve this goal. Here, I will discuss the enhancement and validation of ligand-steered modeling, originally developed by Dr. Claudio Cavasotto, via cross modeling of the newly crystallized GPCR structures. This method uses known ligands and known experimental information to optimize relevant protein binding sites by incorporating protein flexibility. The ligand-steered models were able to reasonably reproduce the binding sites and the co-crystallized native ligand poses of the β2 adrenergic and Adenosine 2A receptors using a single template structure. They also performed better than the chosen template and crude models in small-scale high-throughput docking experiments and compound selectivity studies. Next, the application of this method to develop high-quality homology models of Cannabinoid Receptor 2, an emerging non-psychotic pain management target, is discussed. These models were validated by their ability to rationalize structure-activity relationship data for two series of compounds, inverse agonists and agonists. The method was also applied to improve the virtual screening performance of the β2 adrenergic crystal structure by optimizing the binding site using β2-specific compounds. These results show the feasibility of optimizing only the pharmacologically relevant protein binding sites and the applicability of the method to structure-based drug design projects.