935 results for Genotyping


Relevance:

10.00% 10.00%

Publisher:

Abstract:

Linkage and association studies are major analytical tools for finding susceptibility genes for complex diseases. With the availability of large collections of single nucleotide polymorphisms (SNPs), rapid progress in high-throughput genotyping technologies, and the ambitious goals of the International HapMap Project, genetic markers covering the whole genome will be available for genome-wide linkage and association studies. To avoid inflating the type I error rate in genome-wide linkage and association studies, the significance level of each independent linkage and/or association test must be adjusted for multiple testing, which has led to suggested genome-wide significance cut-offs as low as 5 × 10⁻⁷. Almost no linkage and/or association study can meet such a stringent threshold using standard statistical methods. New statistics with high power are urgently needed to tackle this problem. This dissertation proposes and explores a class of novel test statistics that can be used with both population-based and family-based genetic data. They rely on a completely new strategy: nonlinear transformations of the sample means are used to construct test statistics for linkage and association studies. Extensive simulation studies are used to illustrate the properties of the nonlinear test statistics. Power calculations are performed using both analytical and empirical methods. Finally, real data sets are analyzed with the nonlinear test statistics. Results show that the nonlinear test statistics have correct type I error rates, and most of the studied nonlinear test statistics have higher power than the standard chi-square test. This dissertation introduces a new idea for designing novel test statistics with high power and might open new ways to map susceptibility genes for complex diseases.
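The multiple-testing adjustment mentioned in this abstract can be illustrated with a Bonferroni-style calculation. This is only a sketch: the abstract does not say how the 5 × 10⁻⁷ cut-off was derived, so the family-wise error rate of 0.05 and the count of 100,000 independent tests below are assumptions chosen so that the arithmetic lands on that order of magnitude. The function name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_genome_wide_significant(p_value: float, alpha: float, n_tests: int) -> bool:
    """True if a single test's p-value passes the Bonferroni-adjusted cut-off."""
    return p_value < bonferroni_threshold(alpha, n_tests)


# Assumed values for illustration only: alpha = 0.05, 100,000 independent tests.
threshold = bonferroni_threshold(0.05, 100_000)
print(f"per-test cut-off: {threshold:.1e}")
print(is_genome_wide_significant(1e-8, 0.05, 100_000))
print(is_genome_wide_significant(1e-4, 0.05, 100_000))
```

A p-value that would look convincing in a single test (for example 1e-4) fails this cut-off, which is why the abstract argues that more powerful statistics are needed.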

Abstract:

Hypertension is usually defined as systolic blood pressure ≥140 mmHg and/or diastolic blood pressure ≥90 mmHg. It is one of the main adverse effects of glucocorticoids on the cardiovascular system. Glucocorticoids are essential hormones, secreted from the adrenal glands in a circadian fashion. Their effect on blood pressure is conveyed by the glucocorticoid receptor (NR3C1), an omnipresent nuclear transcription factor. Although polymorphisms in this gene have long been implicated as a causal factor in cardiovascular diseases such as hypertension, no study has yet thoroughly interrogated the gene's polymorphisms for their effect on blood pressure levels. Therefore, I first resequenced ∼30 kb of the gene, encompassing all exons, promoter regions, 5'/3' UTRs and at least 1.5 kb of the gene's flanking regions, in 114 chromosome 5 monosomic cell lines representing three major American ethnic groups: European American, African American and Mexican American. I observed 115 polymorphisms and 14 common molecularly phased haplotypes. A subset of markers was chosen for genotyping the study populations of GENOA (Genetic Epidemiology Network of Atherosclerosis; 1022 non-Hispanic whites, 1228 African Americans and 954 Mexican Americans). Because these study populations include sibships, a family-based association test was performed on four blood pressure-related quantitative variables: pulse, systolic blood pressure, diastolic blood pressure and mean arterial pressure. These analyses showed that multiple correlated SNPs are significantly protective against high systolic blood pressure in non-Hispanic whites, including rsb198, a SNP formerly associated with beneficial body composition. Haplotype association analysis also supports this finding, and all p-values remained significant after permutation tests. I therefore conclude that multiple correlated SNPs in the gene may confer protection against high blood pressure in non-Hispanic whites.

Abstract:

In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences in genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously raises a multiple-testing problem and can give false-positive results. Although this problem can be dealt with effectively through approaches such as Bonferroni correction, permutation testing and false discovery rates, patterns of joint effects of several genes, each with a weak effect, might not be detected. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset in big data sets where the number of feature SNPs far exceeds the number of observations. In this study, we take two steps to achieve the goal. First, we selected 1000 SNPs through an effective filter method, and then we performed feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck method, wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis in terms of classification performance. Finally, we performed chi-square tests to look at the relationship between each SNP and disease from another point of view.
In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search over small subsets of one, two or three SNPs, based on the best 100 composite 2-SNPs, can find an optimal subset, and that including more SNPs through a heuristic algorithm does not always improve the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent the nesting effect of forward selection, it does not always outperform the latter because of overfitting from observing more complex subset states. Our results also indicate that HMSS, as a criterion for evaluating the classification ability of a function, can be used with imbalanced data without modifying the original dataset, unlike classification accuracy. Our four studies suggest that the Sequential Information Bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome and that its ability to detect the target status is superior to that of traditional LDA in the study. From our results we can see that the best test probability-HMSS for predicting CVD, stroke, CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4.
A further genome-wide association study using the chi-square test shows that no significant SNPs are detected at the cut-off level 9.09451E-08 in the Framingham heart study of CVD. In the WTCCC study, only two significant SNPs associated with CAD can be detected. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease by the chi-square test at the cut-off value 1.11E-07. Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results (95% confidence intervals or statistical tests of differences) require more cost-effective methods or a more efficient computing system, neither of which is currently available for our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability, and SNPs with good discriminant power are not necessarily causal markers for the disease.
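The abstract's point that HMSS works on imbalanced data where plain accuracy does not can be shown with a few lines of code. This is a minimal sketch, assuming binary labels coded 1 for cases and 0 for controls; the function names and the toy data (10 cases, 90 controls, a classifier that calls everyone a control) are illustrative and do not come from the dissertation.

```python
def hmss(y_true, y_pred):
    """Harmonic mean of sensitivity and specificity for 0/1 labels (1 = case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    if sensitivity + specificity == 0:
        return 0.0
    return 2 * sensitivity * specificity / (sensitivity + specificity)


def accuracy(y_true, y_pred):
    """Fraction of samples classified correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy imbalanced data: 10 cases, 90 controls, every sample predicted as control.
y_true = [1] * 10 + [0] * 90
y_pred = [0] * 100
print("accuracy:", accuracy(y_true, y_pred))
print("HMSS:", hmss(y_true, y_pred))
```

The trivial classifier scores well on accuracy because controls dominate, while its HMSS collapses to zero because it never detects a case.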

Abstract:

Microarray technology is a high-throughput method for genotyping and gene expression profiling. Limited sensitivity and specificity are among the essential problems of this technology. Most existing methods of microarray data analysis have an apparent limitation: they deal only with the numerical part of microarray data and make little use of gene sequence information. Because it is the gene sequences that precisely define the physical objects being measured by a microarray, it is natural to make the gene sequences an essential part of the data analysis. This dissertation focused on the development of free energy models to integrate sequence information into microarray data analysis. The models were used to characterize the mechanism of hybridization on microarrays and to enhance the sensitivity and specificity of microarray measurements. Cross-hybridization is a major obstacle to the sensitivity and specificity of microarray measurements. In this dissertation, we evaluated the scope of the cross-hybridization problem on short-oligo microarrays. The results showed that cross-hybridization on arrays is mostly caused by oligo fragments with a run of 10 to 16 nucleotides complementary to the probes. Furthermore, a free-energy-based model was proposed to quantify the amount of cross-hybridization signal on each probe. This model treats cross-hybridization as an integral effect of the interactions between a probe and various off-target oligo fragments. Using public spike-in datasets, the model showed high accuracy in predicting the cross-hybridization signals on those probes whose intended targets are absent from the sample. Several prospective models were proposed to improve the Positional Dependent Nearest-Neighbor (PDNN) model for better quantification of gene expression and cross-hybridization. The problem addressed in this dissertation is fundamental to microarray technology.
We expect that this study will help us to understand the detailed mechanism that determines sensitivity and specificity on the microarrays. Consequently, this research will have a wide impact on how microarrays are designed and how the data are interpreted.
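The finding that cross-hybridization is driven by runs of 10 to 16 complementary nucleotides suggests a simple screening step, sketched below. This is not the dissertation's free energy model: it only measures the longest exactly complementary stretch between a probe and a fragment, assuming DNA sequences over A/C/G/T and antiparallel pairing. The function names and the example sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def longest_complementary_run(probe: str, fragment: str) -> int:
    """Length of the longest stretch of the fragment that can pair perfectly
    with the probe, found as the longest common substring between the probe
    and the fragment's reverse complement."""
    target = reverse_complement(fragment)
    best = 0
    prev = [0] * (len(target) + 1)
    for i in range(1, len(probe) + 1):
        cur = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if probe[i - 1] == target[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def may_cross_hybridize(probe: str, fragment: str, min_run: int = 10) -> bool:
    """Flag a fragment whose complementary run reaches min_run nucleotides.
    The default of 10 is taken from the lower end of the range in the abstract."""
    return longest_complementary_run(probe, fragment) >= min_run


print(longest_complementary_run("GATTACA", "GTAA"))
print(may_cross_hybridize("GATTACA", "GTAA"))
```

A real model would weight each pairing by its stacking energy and position on the probe; the run length is only a coarse first filter.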

Abstract:

Infections caused by methicillin-resistant Staphylococcus aureus (MRSA) have been of great concern in hospitals due to the difficulty of treating virulent, antibiotic-resistant microorganisms in sensitive populations including children, the elderly, and immunocompromised individuals. Since the late 1990s, MRSA infections have become a problem in the general community, and the strains of S. aureus that cause infections in the community are known to be genetically different from the hospital-acquired strains. Community-acquired strains tend to be more virulent, affecting even relatively healthy individuals, and disease presentation tends to be more diverse than in patients infected with hospital-acquired strains. From the year 2000 to the present, there has been a significant increase in community-acquired infections in children, a population already particularly sensitive to S. aureus infection. Genotyping the strains of community-acquired MRSA (CA-MRSA) circulating in the pediatric population is an important step in developing better antibiotic treatment strategies. Additionally, determining the carriage status of individuals in this population and comparing these data with strain genotypes will be valuable in establishing prevention and control practices.

Abstract:

Cytochromes P450 catalyze a monooxygenase reaction in which molecular oxygen is split and one oxygen atom is incorporated into the substrate. As a whole, P450 researchers have focused most of their attention on substrate metabolism and relatively little on how these enzymes are regulated. This study focuses on the regulation of two P450 isoforms, CYP2D6 and CYP4F11. The human CYP2D gene locus contains two pseudogenes and one functional gene, CYP2D6. This locus is highly polymorphic and produces several alternatively spliced transcripts from the pseudogene CYP2D7. My objective was to understand the role of SV5-in (splice variant 5), one of several alternative splice variants transcribed from the CYP2D7 pseudogene. My results indicate that SV5-in mRNA causes an increase in CYP2D6 protein levels and suggest that SV5-in has a role in the regulation of CYP2D6 expression. Second, CYP4F11 is a recently discovered and uncharacterized isoform from the CYP4F subfamily. It metabolizes several clinically relevant drugs (e.g., erythromycin and benzphetamine) and some endogenous inflammatory mediators (e.g., LTB4). After evaluating microarray data, I observed higher CYP4F11 mRNA levels in wild-type HCT116 cells than in p53-null cells. Our objectives were to explore and understand this connection between p53 and CYP4F11. Microarray data were confirmed by Q-PCR, after which the effect was observed again at the protein level by Western blot and at the promoter level by luciferase assay and chromatin immunoprecipitation. Our results indicate that p53 protein regulates expression of CYP4F11 mRNA and protein through CYP4F11 promoter binding (note that p53 binding to CYP4F11 DNA was not shown to be direct). These results signify a whole new level of regulation of drug-metabolizing enzymes by p53.
An understanding of CYP4F11 regulation by p53 could help us understand another pathway leading to apoptosis or cell growth arrest. This can aid future drug studies and help discover new drug metabolism pathways under the control of a tumor suppressor protein. An understanding of the CYP2D6 regulation pathway could illuminate the role of non-coding RNAs in the P450 field and potentially explain several inter-individual variations in drug response observed in clinical medicine that are not yet completely explained by genotyping analysis.

Abstract:

Histo-blood group antigens (HBGAs) have been associated with susceptibility to enteric pathogens including noroviruses (NoVs), enterotoxigenic Escherichia coli (ETEC), Campylobacter jejuni, and Vibrio cholerae. We performed a retrospective cohort study to evaluate the relationship between traveler HBGA phenotypes and susceptibility to travelers' diarrhea (TD) and post-infectious complications. A total of 364 travelers to Guadalajara, Mexico were followed prospectively from June 1 to September 30, 2007 and from June 7 to July 28, 2008 for the development of TD, and at 6 months for post-infectious irritable bowel syndrome (PI-IBS). Noroviruses were detected in illness stool specimens by RT-PCR. Diarrheal stool samples were also assayed for enterotoxigenic and enteroaggregative E. coli, Salmonella species, Shigella species, Vibrio species, Campylobacter jejuni, Yersinia enterocolitica, Aeromonas species, and Plesiomonas species. Diarrheal stools were evaluated for inflammation by the presence of fecal leukocytes, mucus, and occult blood. Phenotyping for ABO and Lewis antigens by ELISA and FUT2 gene PCR genotyping for secretor status were performed on saliva. Of the 364 subjects, 171 (47%) developed TD. HBGA typing of the travelers revealed O (62.9%), A (34.6%), B (1.6%), and AB (0.8%) phenotypes. There were 7% nonsecretors and 93% secretors among the travelers. AB phenotypes were more commonly associated with Cryptosporidium species (P=0.04) and ETEC (P=0.08) as causes of TD. AB and B phenotype individuals were more likely to experience inflammatory diarrhea, particularly mucoid diarrhea (P=0.02). However, there were relatively few individuals with AB and B phenotypes. GI and GII NoV infections, Cryptosporidium species infections, and PI-IBS were identified only in secretors, but these differences were not statistically significant (P=1.00, P=1.00, and P=0.60, respectively).
Additional studies are needed to evaluate whether AB phenotype individuals may be more susceptible to developing TD associated with Cryptosporidium species or ETEC, and whether AB and B phenotype individuals may be more likely to develop inflammatory TD. Further studies are needed to investigate whether nonsecretor travelers may be at lower risk of developing infections with NoVs and Cryptosporidium species and of developing PI-IBS.

Abstract:

Neuropsychological impairment occurs in 20%-40% of childhood acute lymphoblastic leukemia (ALL) survivors, possibly mediated by folate depletion following methotrexate chemotherapy. We evaluated the relationship between two folate pathway polymorphisms and neuropsychological impairment after childhood ALL chemotherapy. Eighty-six childhood ALL survivors were recruited between 2004 and 2007 at Texas Children's Hospital after exclusion for central nervous system leukemia, cranial irradiation, and age <1 year at diagnosis. Neuropsychological evaluation at a median of 5.3 years off therapy included a parental questionnaire and the following child performance measures: Trail Making Tests A and B, Grooved Pegboard Test Dominant-Hand and Nondominant-Hand, and Digit Span subtest. We performed genotyping for polymorphisms in two folate pathway genes: reduced folate carrier (RFC1 80G>A, rs1051266) and dihydrofolate reductase (DHFR Intron-1 19bp deletion). Fisher exact test, logistic regression, Student's t-test, and ANOVA were used to compare neuropsychological test scores by genotype, using a dominant model to group genotypes. In univariate analysis, survivors with cumulative methotrexate exposure ≥9000 mg/m² had an increased risk of attention disorder (OR=6.2, 95% CI 1.2-31.3) compared with survivors with methotrexate exposure <9000 mg/m². On average, female survivors scored 8.5 points higher than males on the Digit Span subtest, a test of working memory (p=0.02). The RFC1 80G>A and DHFR Intron-1 deletion polymorphisms were not related to attention disorder or impairment on tests of attention, processing speed, fine motor speed, or memory. These data imply a strong relationship between methotrexate dose intensity and impairment in attention after childhood ALL therapy. We did not find an association between the RFC1 80G>A or DHFR Intron-1 deletion polymorphisms and long-term neuropsychological impairment in childhood ALL survivors.
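An odds ratio with a 95% confidence interval, such as the one reported for methotrexate exposure, can be computed from a 2×2 table as sketched below. The abstract does not give the underlying counts or say which interval method was used, so the counts here are invented for illustration and the Wald (log-scale) interval is an assumption; the function name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed with outcome       b: exposed without outcome
    c: unexposed with outcome     d: unexposed without outcome
    z = 1.96 corresponds to an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_, lo, hi = odds_ratio_ci(10, 5, 5, 10)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

With small cell counts the interval becomes very wide, which is consistent with the broad interval (1.2-31.3) reported for a cohort of 86 survivors.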

Abstract:

SNP genotyping arrays have been developed to characterize single-nucleotide polymorphisms (SNPs) and DNA copy number variations (CNVs). The quality of the inferences about copy number can be affected by many factors including batch effects, DNA sample preparation, signal processing, and analytical approach. Nonparametric and model-based statistical algorithms have been developed to detect CNVs from SNP genotyping data. However, these algorithms lack specificity to detect small CNVs due to the high false positive rate when calling CNVs based on the intensity values. Association tests based on detected CNVs therefore lack power even if the CNVs affecting disease risk are common. In this research, by combining an existing Hidden Markov Model (HMM) and the logistic regression model, a new genome-wide logistic regression algorithm was developed to detect CNV associations with diseases. We showed that the new algorithm is more sensitive and can be more powerful in detecting CNV associations with diseases than an existing popular algorithm, especially when the CNV association signal is weak and a limited number of SNPs are located in the CNV.

Abstract:

Genome-wide association study (GWAS) analytical methods were applied in a large biracial sample of individuals to investigate variation across the genome for its association with a surrogate low-density lipoprotein (LDL) particle size phenotype, the ratio of LDL-cholesterol level to ApoB level. Genotyping was performed on the Affymetrix 6.0 GeneChip with approximately one million single nucleotide polymorphisms (SNPs). The ratio of LDL-cholesterol to ApoB was calculated, and association tests used multivariable linear regression analysis with an additive genetic model after adjustment for the covariates sex, age and BMI. Association tests were performed separately in African Americans and Caucasians. There were 9,562 qualified individuals in the Caucasian group and 3,015 qualified individuals in the African American group. Overall, in Caucasians two statistically significant loci were identified as being associated with the ratio of LDL-cholesterol to ApoB: rs10488699 (p < 5 × 10⁻⁸, 11q23.3 near BUD13) and rs964184 (p < 5 × 10⁻⁸, 11q23.3 near ZNF259). We also found that rs12286037 (p < 4 × 10⁻⁷, 11q23.3 near APOA5/A4/C3/A1) showed a suggestive association in the Caucasian sample. In exploratory analyses, a difference in the pattern of association between individuals taking and not taking LDL-cholesterol-lowering medications was observed. Individuals who were not taking medications had smaller p-values than those taking medication. In the African American group, there were no significant (p < 5 × 10⁻⁸) or suggestive (p < 4 × 10⁻⁷) associations with the ratio of LDL-cholesterol to ApoB after adjusting for age, BMI, and sex and comparing individuals with and without LDL-cholesterol-lowering medication. Conclusions: There were significant and suggestive associations between SNP genotype and the ratio of LDL-cholesterol to ApoB in Caucasians, but these associations may be modified by medication treatment.

Abstract:

Lung cancer is the leading cause of cancer-related mortality in the US. Emerging evidence has shown that host genetic factors can interact with environmental exposures to influence patient susceptibility to the disease as well as clinical outcomes such as survival and recurrence. We aimed to identify genetic prognostic markers for non-small cell lung cancer (NSCLC), a major (85%) subtype of lung cancer, and also in other subgroups. With the fast evolution of genotyping technology, genetic association studies have progressed from the candidate gene approach, to the pathway-based approach, to the genome-wide association study (GWAS). Even in the era of GWAS, the pathway-based approach has its own advantages for studying cancer clinical outcomes: it is cost-effective, requires a smaller sample size than GWAS, and makes it easier to identify a validation population and explore gene-gene interactions. In the current study, we adopted a pathway-based approach focusing on two critical pathways, the miRNA and inflammation pathways. MicroRNAs (miRNAs) post-transcriptionally regulate around 30% of human genes. Polymorphisms within miRNA processing pathways and binding sites may influence patients' prognosis through altered gene regulation. Inflammation plays an important role in cancer initiation and progression, and has also been shown to affect patients' clinical outcomes. We first evaluated 240 single nucleotide polymorphisms (SNPs) in miRNA biogenesis genes and predicted binding sites in NSCLC patients to determine associations with clinical outcomes in early-stage (stage I and II) and late-stage (stage III and IV) lung cancer patients, respectively. First, in 535 early-stage patients, after correcting for multiple comparisons, FZD4:rs713065 (hazard ratio [HR]: 0.46, 95% confidence interval [CI]: 0.32-0.65) showed a significant inverse association with survival in early-stage surgery-only patients.
SP1:rs17695156 (HR: 2.22, 95% CI: 1.44-3.41) and DROSHA:rs6886834 (HR: 6.38, 95% CI: 2.49-16.31) conferred increased risk of progression in the all-patients and surgery-only populations, respectively. FAS:rs2234978 was significantly associated with improved survival in all patients (HR: 0.59, 95% CI: 0.44-0.77) and in the surgery plus chemotherapy population (HR: 0.19, 95% CI: 0.07-0.46). Functional genomics analysis demonstrated that this variant creates a miR-651 binding site resulting in altered miRNA regulation of FAS, providing biological plausibility for the observed association. We then analyzed these associations in 598 late-stage patients. After multiple comparison corrections, no SNPs remained significant in the late-stage group, while the top SNP NAT1:rs15561 (HR=1.98, 96%CI=1.32-2.94) conferred a significantly increased risk of death in the chemotherapy subgroup. To test the hypothesis that genetic variants in the inflammation-related pathways may be associated with survival in NSCLC patients, we first conducted a three-stage study. In the discovery phase, we investigated a comprehensive panel of 11,930 inflammation-related SNPs in three independent lung cancer populations. A missense SNP (rs2071554) in HLA-DOB was significantly associated with poor survival in the discovery population (HR: 1.46, 95% CI: 1.02-2.09), the internal validation population (HR: 1.51, 95% CI: 1.02-2.25), and the external validation population (HR: 1.52, 95% CI: 1.01-2.29). Rs2900420 in KLRK1 was significantly associated with a reduced risk of death in the discovery (HR: 0.76, 95% CI: 0.60-0.96) and internal validation (HR: 0.77, 95% CI: 0.61-0.99) populations, and the association reached borderline significance in the external validation population (HR: 0.80, 95% CI: 0.63-1.02). We also evaluated these inflammation-related SNPs in NSCLC patients who were never smokers. Lung cancer in never smokers has been increasingly recognized as a distinct disease from that in ever smokers.
A two-stage study was performed using a discovery population from MD Anderson (411 patients) and a validation population from Mayo Clinic (311 patients). Three SNPs (IL17RA:rs879576, BMP8A:rs698141, and STK:rs290229) that were significantly associated with survival were validated (pCD74:rs1056400 and CD38:rs10805347) were borderline significant (p=0.08) in the Mayo Clinic population. In the combined analysis, IL17RA:rs879576 resulted in a 40% reduction in the risk of death (p=4.1 × 10⁻⁵ [p=0.61, heterogeneity test]). We also validated, in the Mayo Clinic population, a survival tree created in the MD Anderson population. In conclusion, our results provided strong evidence that genetic variations in the specific pathways examined (the miRNA and inflammation pathways) influenced clinical outcomes in NSCLC patients, and with further functional studies, the novel loci have the potential to be translated into clinical use.

Abstract:

The interplay between obesity, physical activity, weight gain and genetic variants in the mTOR pathway has not been studied in renal cell carcinoma (RCC). We examined the associations between obesity, weight gain, physical activity and RCC risk. We also analyzed whether genetic variants in the mTOR pathway could modify these associations. Incident renal cell carcinoma cases and healthy controls were recruited from the University of Texas MD Anderson Cancer Center in Houston, Texas. Cases and controls were frequency-matched by age (±5 years), ethnicity, sex, and county of residence. Epidemiologic data were collected via in-person interview. A total of 577 cases and 593 healthy controls (all white) were included. One hundred ninety-two (192) SNPs from 22 genes were available, and their genotyping data were extracted from previous genome-wide association studies. Logistic regression and regression splines were used to obtain odds ratios. Obesity at age 20, at age 40, and 3 years prior to diagnosis/recruitment, and moderate and large weight gain from age 20 to 40, were each significantly associated with increased RCC risk. Low physical activity was associated with a 4.08-fold (95% CI: 2.92-5.70) increased risk. Five single nucleotide polymorphisms (SNPs) were significantly associated with RCC risk, and their cumulative effect increased the risk by up to 72% (95% CI: 1.20-2.46). Stratum-specific effects for weight change and cumulative genotype groups were observed. However, no interaction was suggested by our study. In conclusion, energy balance-related risk factors and genetic variants in the mTOR pathway may jointly influence susceptibility to RCC.

Abstract:

Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Due to the rapid development of genotyping and sequencing technologies, we are now able to assess the causal effects of many genetic and environmental factors more accurately. Genome-wide association studies have been able to localize many causal genetic variants predisposing to certain diseases. However, these studies explain only a small portion of the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in genetics/genomics research and have demonstrated superiority over some standard approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods may fully exert their functionalities and advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) deriving the Bayesian variable selection framework for hierarchical gene-environment and gene-gene interactions; (2) developing the Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods, which were developed for gene-environment interaction studies, to other related types of studies such as adaptive borrowing of historical data.
We propose a Bayesian hierarchical mixture model framework that allows us to investigate genetic and environmental effects, gene-by-gene interactions (epistasis) and gene-by-environment interactions in the same model. It is well known that, in many practical situations, there exists a natural hierarchical structure between the main effects and interactions in the linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, such that irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious and powerful models. We evaluate both the 'strong hierarchical' and 'weak hierarchical' models, which specify that both or one of the main effects of the interacting factors, respectively, must be present for the interaction to be included in the model. The extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach for identifying the predisposing main effects and interactions in studies with complex gene-environment and gene-gene interactions. We also compare these two models with the 'independent' model, which does not impose this hierarchical constraint, and observe their superior performance in most of the considered situations. The proposed models are applied to real data analyses of gene-environment interactions in lung cancer and cutaneous melanoma case-control studies. The Bayesian statistical models have the advantage of being able to incorporate useful prior information in the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in terms of parameter estimation and variable selection in most cases.
Our proposed models impose the hierarchical constraints, which further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions while still identifying the reported associations. This is practically appealing for studies that investigate causal factors among a moderate number of candidate genetic and environmental factors together with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects were previously developed to provide an analysis framework in which the estimates of effects for a quantitative trait are statistically orthogonal regardless of whether Hardy-Weinberg Equilibrium (HWE) holds within loci. Ma et al. (2012) recently developed a NOIA model for gene-environment interaction studies and showed its advantages over the usual functional model for detecting true main effects and interactions. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power for detecting non-null effects, with higher marginal posterior probabilities. We also review two Bayesian statistical models (the Bayesian empirical shrinkage-type estimator and Bayesian model averaging) that were developed for gene-environment interaction studies. Inspired by these Bayesian models, we develop two novel statistical methods that can handle related problems such as borrowing data from historical studies. The proposed methods are analogous to the methods for gene-environment interactions in that they balance statistical efficiency and bias in a unified model.
Through extensive simulation studies, we compare the operating characteristics of the proposed models with those of existing models, including the hierarchical meta-analysis model. The results show that the proposed approaches borrow the historical data adaptively, in a data-driven way. These novel models may have a broad range of statistical applications in both genetic/genomic and clinical studies.
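
The strong and weak hierarchical constraints described in this abstract can be illustrated with a minimal sketch. It is not the dissertation's Bayesian mixture model; it only shows which pairwise interactions would be eligible for inclusion under each rule, given a set of main effects assumed to be already in the model. The function name `allowed_interactions` and the factor labels are hypothetical.

```python
from itertools import combinations


def allowed_interactions(selected_main, candidates, rule="strong"):
    """Return the pairwise interactions eligible under a hierarchy rule.

    selected_main: factors whose main effects are in the model (assumed given).
    candidates:    all candidate factors (genes and exposures).
    rule:          "strong"      -> both main effects must be present
                   "weak"        -> at least one main effect must be present
                   "independent" -> no hierarchical constraint
    """
    if rule not in {"strong", "weak", "independent"}:
        raise ValueError(f"unknown rule: {rule}")
    selected = set(selected_main)
    allowed = []
    for a, b in combinations(candidates, 2):
        n_present = (a in selected) + (b in selected)
        if rule == "strong" and n_present == 2:
            allowed.append((a, b))
        elif rule == "weak" and n_present >= 1:
            allowed.append((a, b))
        elif rule == "independent":
            allowed.append((a, b))
    return allowed


if __name__ == "__main__":
    factors = ["G1", "G2", "E"]  # hypothetical labels: two genes, one exposure
    for rule in ("strong", "weak", "independent"):
        print(rule, allowed_interactions(["G1"], factors, rule))
```

With only the main effect of `G1` in the model, the strong rule admits no interaction, the weak rule admits the two interactions involving `G1`, and the independent rule admits all three pairs.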

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This thesis project is motivated by the potential problem of using observational data to draw inferences about a causal relationship in observational epidemiology research when controlled randomization is not applicable. The instrumental variable (IV) method is one of the statistical tools for overcoming this problem. A Mendelian randomization study uses genetic variants as IVs in a genetic association study. In this thesis, the IV method, as well as standard logistic and linear regression models, is used to investigate the causal association between the risk of pancreatic cancer and the circulating levels of soluble receptor for advanced glycation end-products (sRAGE). Higher levels of serum sRAGE were found to be associated with a lower risk of pancreatic cancer in a previous observational study (255 cases and 485 controls). However, such a novel association may be biased by unknown confounding factors. In a case-control study, we aimed to use the IV approach to confirm or refute this observation in a subset of study subjects for whom genotyping data were available (178 cases and 177 controls). A two-stage IV analysis using generalized method of moments-structural mean models (GMM-SMM) was conducted, and the relative risk (RR) was calculated. In the first-stage analysis, we found that the single nucleotide polymorphism (SNP) rs2070600 of the receptor for advanced glycation end-products (AGER) gene meets all three general assumptions for a genetic IV in examining the causal association between sRAGE and the risk of pancreatic cancer. The variant allele of SNP rs2070600 of the AGER gene was associated with lower levels of sRAGE, and it was associated neither with the risk of pancreatic cancer nor with the confounding factors. It was a potentially strong IV (F statistic = 29.2). However, in the second-stage analysis, the GMM-SMM model failed to converge due to non-concavity, probably because of the small sample size.
Therefore, the IV analysis could not support the causality of the association between serum sRAGE levels and the risk of pancreatic cancer. Nevertheless, these analyses suggest that rs2070600 was a potentially good genetic IV for testing the causality between the risk of pancreatic cancer and sRAGE levels. A larger sample size is required to conduct a credible IV analysis.
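
As a rough illustration of the two-stage idea in this abstract, the sketch below computes a first-stage F statistic (instrument strength) and a simple Wald ratio estimate. It does not implement the GMM-SMM estimator used in the thesis, it assumes a single SNP instrument and a continuous outcome for simplicity, and the function names and toy numbers are hypothetical rather than study data.

```python
import numpy as np


def first_stage_f(genotype, exposure):
    """F statistic for regressing the exposure on a single genetic instrument."""
    g = np.asarray(genotype, dtype=float)
    x = np.asarray(exposure, dtype=float)
    n = len(g)
    slope, intercept = np.polyfit(g, x, 1)  # least-squares line x ~ g
    ss_res = np.sum((x - (intercept + slope * g)) ** 2)
    ss_tot = np.sum((x - x.mean()) ** 2)
    # One instrument, so the numerator has 1 degree of freedom.
    return (ss_tot - ss_res) / (ss_res / (n - 2))


def wald_ratio(genotype, exposure, outcome):
    """Ratio of the SNP-outcome slope to the SNP-exposure slope."""
    g = np.asarray(genotype, dtype=float)
    beta_xg = np.polyfit(g, np.asarray(exposure, dtype=float), 1)[0]
    beta_yg = np.polyfit(g, np.asarray(outcome, dtype=float), 1)[0]
    return beta_yg / beta_xg


if __name__ == "__main__":
    # Toy values for illustration only (allele counts, exposure, outcome).
    g = [0, 1, 2, 0, 1, 2]
    x = [1.0, 2.1, 2.9, 1.2, 1.8, 3.1]
    y = [2.0 * xi for xi in x]
    print(first_stage_f(g, x), wald_ratio(g, x, y))
```

A common rule of thumb treats a first-stage F statistic above 10 as evidence against a weak instrument, which is consistent with the abstract describing F = 29.2 as potentially strong.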

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Pumas are one of the most studied terrestrial mammals because of their widespread distribution, substantial ecological impacts, and conflicts with humans. Extensive efforts, often employing genetic methods, are undertaken to manage this species. However, the comparison of population genetic data is difficult because few of the microsatellite loci chosen are shared across research programs. Here, we describe the development of PumaPlex, a high-throughput assay to genotype 25 single nucleotide polymorphisms in pumas. We validated PumaPlex in more than 700 North American pumas (Puma concolor couguar), and demonstrated its ability to generate reproducible genotypes and accurately identify individuals. Furthermore, we compared PumaPlex with traditional genotyping of 12 microsatellite loci in fecal DNA samples and found that PumaPlex produced significantly more genotypes with fewer false alleles. PumaPlex promotes the cross-laboratory comparison of genotypes, is easily expandable in the future, and is a valuable tool for the genetic monitoring and management of North American puma populations.