Biblioteca Digital

19 resultados para Linear analysis

em DigitalCommons@The Texas Medical Center

Bayesian generalized linear models for meta-analysis of diagnostic tests

Relevância:

40.00% 40.00%

Publicador:

Resumo:

With the recognition of the importance of evidence-based medicine, there is an emerging need for methods to systematically synthesize available data. Specifically, methods to provide accurate estimates of test characteristics for diagnostic tests are needed to help physicians make better clinical decisions. To provide more flexible approaches for meta-analysis of diagnostic tests, we developed three Bayesian generalized linear models. Two of these models, a bivariate normal and a binomial model, analyzed pairs of sensitivity and specificity values while incorporating the correlation between these two outcome variables. Noninformative independent uniform priors were used for the variance of sensitivity, specificity and correlation. We also applied an inverse Wishart prior to check the sensitivity of the results. The third model was a multinomial model where the test results were modeled as multinomial random variables. All three models can include specific imaging techniques as covariates in order to compare performance. Vague normal priors were assigned to the coefficients of the covariates. The computations were carried out using the 'Bayesian inference using Gibbs sampling' implementation of Markov chain Monte Carlo techniques. We investigated the properties of the three proposed models through extensive simulation studies. We also applied these models to a previously published meta-analysis dataset on cervical cancer as well as to an unpublished melanoma dataset. In general, our findings show that the point estimates of sensitivity and specificity were consistent among Bayesian and frequentist bivariate normal and binomial models. However, in the simulation studies, the estimates of the correlation coefficient from Bayesian bivariate models are not as good as those obtained from frequentist estimation regardless of which prior distribution was used for the covariance matrix. The Bayesian multinomial model consistently underestimated the sensitivity and specificity regardless of the sample size and correlation coefficient. In conclusion, the Bayesian bivariate binomial model provides the most flexible framework for future applications because of its following strengths: (1) it facilitates direct comparison between different tests; (2) it captures the variability in both sensitivity and specificity simultaneously as well as the intercorrelation between the two; and (3) it can be directly applied to sparse data without ad hoc correction. ^

Analysis of life expectancy using linear and nonlinear models

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Life expectancy has consistently increased over the last 150 years due to improvements in nutrition, medicine, and public health. Several studies found that in many developed countries, life expectancy continued to rise following a nearly linear trend, which was contrary to a common belief that the rate of improvement in life expectancy would decelerate and was fit with an S-shaped curve. Using samples of countries that exhibited a wide range of economic development levels, we explored the change in life expectancy over time by employing both nonlinear and linear models. We then observed if there were any significant differences in estimates between linear models, assuming an auto-correlated error structure. When data did not have a sigmoidal shape, nonlinear growth models sometimes failed to provide meaningful parameter estimates. The existence of an inflection point and asymptotes in the growth models made them inflexible with life expectancy data. In linear models, there was no significant difference in the life expectancy growth rate and future estimates between ordinary least squares (OLS) and generalized least squares (GLS). However, the generalized least squares model was more robust because the data involved time-series variables and residuals were positively correlated. ^

A cladistic analysis of apolipoprotein B gene variation and plasma lipid and apolipoprotein B levels, using familial data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Any functionally important mutation is embedded in an evolutionary matrix of other mutations. Cladistic analysis, based on this, is a method of investigating gene effects using a haplotype phylogeny to define a set of tests which localize causal mutations to branches of the phylogeny. Previous implementations of cladistic analysis have not addressed the issue of analyzing data from related individuals, though in human studies, family data are usually needed to obtain unambiguous haplotypes. In this study, a method of cladistic analysis is described in which haplotype effects are parameterized in a linear model which accounts for familial correlations. The method was used to study the effect of apolipoprotein (Apo) B gene variation on total-, LDL-, and HDL-cholesterol, triglyceride, and Apo B levels in 121 French families. Five polymorphisms defined Apo B haplotypes: the signal peptide Insertion/deletion, Bsp 1286I, XbaI, MspI, and EcoRI. Eleven haplotypes were found, and a haplotype phylogeny was constructed and used to define a set of tests of haplotype effects on lipid and apo B levels.^ This new method of cladistic analysis, the parametric method, found significant effects for single haplotypes for all variables. For HDL-cholesterol, 3 clusters of evolutionarily-related haplotypes affecting levels were found. Haplotype effects accounted for about 10% of the genetic variance of triglyceride and HDL-cholesterol levels. The results of the parametric method were compared to those of a method of cladistic analysis based on permutational testing. The permutational method detected fewer haplotype effects, even when modified to account for correlations within families. Simulation studies exploring these differences found evidence of systematic errors in the permutational method due to the process by which haplotype groups were selected for testing.^ The applicability of cladistic analysis to human data was shown. The parametric method is suggested as an improvement over the permutational method. This study has identified candidate haplotypes for sequence comparisons in order to locate the functional mutations in the Apo B gene which may influence plasma lipid levels. ^

A comparison of Markov and generalized linear models of hospital mortality

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper reports a comparison of three modeling strategies for the analysis of hospital mortality in a sample of general medicine inpatients in a Department of Veterans Affairs medical center. Logistic regression, a Markov chain model, and longitudinal logistic regression were evaluated on predictive performance as measured by the c-index and on accuracy of expected numbers of deaths compared to observed. The logistic regression used patient information collected at admission; the Markov model was comprised of two absorbing states for discharge and death and three transient states reflecting increasing severity of illness as measured by laboratory data collected during the hospital stay; longitudinal regression employed Generalized Estimating Equations (GEE) to model covariance structure for the repeated binary outcome. Results showed that the logistic regression predicted hospital mortality as well as the alternative methods but was limited in scope of application. The Markov chain provides insights into how day to day changes of illness severity lead to discharge or death. The longitudinal logistic regression showed that increasing illness trajectory is associated with hospital mortality. The conclusion is reached that for standard applications in modeling hospital mortality, logistic regression is adequate, but for new challenges facing health services research today, alternative methods are equally predictive, practical, and can provide new insights. ^

A cross-sectional analysis of physical activity and obesity in inner city Houston Texas Mexican American youth

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Obesity prevalence among children and adolescents is rising. It is one of the most attributable causes of hospitalization and death. Overweight and obese children are more likely to suffer from associated conditions such as hypertension, dyslipidemia, chronic inflammation, increased blood clotting tendency, endothelial dysfunction, hyperinsulinemia, and asthma. These children and adolescents are also more likely to be overweight and obese in adulthood. Interestingly, rates of obesity and overweight are not evenly distributed across racial and ethnic groups. Mexican American youth have higher rates of obesity and are at higher risk of becoming obese than non-Hispanic black and non-Hispanic white children. ^ Methods. This cross-sectional study describes the association between rates of obesity and physical activity in a sample of 1313 inner-city Mexican American children and adolescents (5-19 years of age) in Houston, Texas. This study is important because it will contribute to our understanding of childhood and adolescent obesity in this at-risk population. ^ Data from the Mexican American Feasibility Cohort using the Mano a Mano questionnaire are used to describe this population's status of obesity and physical activity. An initial sample taken from 5000 households in inner city Houston Texas was used as the baseline for this prospective cohort. The questionnaire was given in person to the participants to complete (or to parents for younger children) at a home visit by two specially trained bilingual interviewers. Analysis comprised prevalence estimates of obesity represented as percentile rank (<85%= normal weight, >85%= at risk, >95%= obese) by age and gender. The association between light, moderate, strenuous activity, and obesity was also examined using linear regression. ^ Results. Overall, 46% of this Mexican American Feasibility cohort is overweight or obese. The prevalence for children in the 6-11 age range (53.2%) was significantly greater than that reported from NHANES, 1999–2002 data (39.4%). Although the percentage of overweight and obese among the 12-19 year olds was greater than that reported in NHANES (38.5% versus 38.6%) this difference was not statistically significant. ^ A significant association between BMI and sit time and moderate physical activity (both p < 0.05) found in this sample. For males, this association was significant for moderate physical activity (p < 0.01). For the females, this association was significant for BMI and sit time (p < 0.05). These results need to be interpreted in the light of design and measurement limitations. ^ Conclusion. This study supports observations that the inner city Houston Texas Mexican American child and adolescent population is more overweight and obese than nationally reported figures, and that there are positive relationships between BMI, activity levels, and sit time in this population. This study supports the need for public health initiatives within the Houston Hispanic community. ^

A space-time analysis of reported shigellosis and salmonellosis cases in Texas from 2000 to 2004

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background. The purpose of this study was to describe the risk factors and demographics of persons with salmonellosis and shigellosis and to investigate both seasonal and spatial variations in the occurrence of these infections in Texas from 2000 to 2004, utilizing time series analyses and the geographic information system digital mapping methods. ^ Methods. Spatial Analysis: MapInfo software was used to map the distribution of age-adjusted rates of reported shigellosis and salmonellosis in Texas from 2000–2004 by zip codes. Census data on above or below poverty level, household income, highest level of educational attainment, race, ethnicity, and urban/rural community status was obtained from the 2000 Decennial Census for each zip code. The zip codes with the upper 10% and lower 10% were compared using t-tests and logistic regression to determine whether there were any potential risk factors. ^ Temporal analysis. Seasonal patterns in the prevalence of infections in Texas from 2000 to 2003 were determined by performing time-series analysis on the numbers of cases of salmonellosis and shigellosis. A linear regression was also performed to assess for trends in the incidence of each disease, along with auto-correlation and multi-component cosinor analysis. ^ Results. Spatial analysis: Analysis by general linear model showed a significant association between infection rates and age, with young children aged less than 5 and those aged 5–9 years having increased risk of infection for both disease conditions. The data demonstrated that those populations with high percentages of people who attained a higher than high school education were less likely to be represented in zip codes with high rates of shigellosis. However, for salmonellosis, logistic regression models indicated that when compared to populations with high percentages of non-high school graduates, having a high school diploma or equivalent increased the odds of having a high rate of infection. ^ Temporal analysis. For shigellosis, multi-component cosinor analyses were used to determine the approximated cosine curve which represented a statistically significant representation of the time series data for all age groups by sex. The shigellosis results show 2 peaks, with a major peak occurring in June and a secondary peak appearing around October. Salmonellosis results showed a single peak and trough in all age groups with the peak occurring in August and the trough occurring in February. ^ Conclusion. The results from this study can be used by public health agencies to determine the timing of public health awareness programs and interventions in order to prevent salmonellosis and shigellosis from occurring. Because young children depend on adults for their meals, it is important to increase the awareness of day-care workers and new parents about modes of transmission and hygienic methods of food preparation and storage. ^

Analysis of prognostic factors for the development of metastases and survival in renal cell carcinoma patients

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Introduction and objective. A number of prognostic factors have been reported for predicting survival in patients with renal cell carcinoma. Yet few studies have analyzed the effects of those factors at different stages of the disease process. In this study, different stages of disease progression starting from nephrectomy to metastasis, from metastasis to death, and from evaluation to death were evaluated. ^ Methods. In this retrospective follow-up study, records of 97 deceased renal cell carcinoma (RCC) patients were reviewed between September 2006 to October 2006. Patients with TNM Stage IV disease before nephrectomy or with cancer diagnoses other than RCC were excluded leaving 64 records for analysis. Patient TNM staging, Furhman Grade, age, tumor size, tumor volume, histology and patient gender were analyzed in relation to time to metastases. Time from nephrectomy to metastasis, TNM staging, Furhman Grade, age, tumor size, tumor volume, histology and patient gender were tested for significance in relation to time from metastases to death. Finally, analysis of laboratory values at time of evaluation, Eastern Cooperative Oncology Group performance status (ECOG), UCLA Integrated Staging System (UISS), time from nephrectomy to metastasis, TNM staging, Furhman Grade, age, tumor size, tumor volume, histology and patient gender were tested for significance in relation to time from evaluation to death. Linear regression and Cox Proportional Hazard (univariate and multivariate) was used for testing significance. Kaplan-Meier Log-Rank test was used to detect any significance between groups at various endpoints. ^ Results. Compared to negative lymph nodes at time of nephrectomy, a single positive lymph node had significantly shorter time to metastasis (p<0.0001). Compared to other histological types, clear cell histology had significant metastasis free survival (p=0.003). Clear cell histology compared to other types (p=0.0002 univariate, p=0.038 multivariate) and time to metastasis with log conversion (p=0.028) significantly affected time from metastasis to death. A greater than one year and greater than two year metastasis free interval, compared to patients that had metastasis before one and two years, had statistically significant survival benefit (p=0.004 and p=0.0318). Time from evaluation to death was affected by greater than one year metastasis free interval (p=0.0459), alcohol consumption (p=0.044), LDH (p=0.006), ECOG performance status (p<0.001), and hemoglobin level (p=0.0092). The UISS risk stratified the patient population in a statistically significant manner for survival (p=0.001). No other factors were found to be significant. ^ Conclusion. Clear cell histology is predictive for both time to metastasis and metastasis to death. Nodal status at time of nephrectomy may predict risk of metastasis. The time interval to metastasis significantly predicts time from metastasis to death and time from evaluation to death. ECOG performance status, and hemoglobin levels predicts survival outcome at evaluation. Finally, UISS appropriately stratifies risk in our population. ^

Evaluation of therapist and client language in Motivational Interviewing (MI) sessions: A secondary analysis of data from the Southern Methodist Alcohol Research Trial (SMART) study

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In the United States, “binge” drinking among college students is an emerging public health concern due to the significant physical and psychological effects on young adults. The focus is on identifying interventions that can help decrease high-risk drinking behavior among this group of drinkers. One such intervention is Motivational interviewing (MI), a client-centered therapy that aims at resolving client ambivalence by developing discrepancy and engaging the client in change talk. Of late, there is a growing interest in determining the active ingredients that influence the alliance between the therapist and the client. This study is a secondary analysis of the data obtained from the Southern Methodist Alcohol Research Trial (SMART) project, a dismantling trial of MI and feedback among heavy drinking college students. The present project examines the relationship between therapist and client language in MI sessions on a sample of “binge” drinking college students. Of the 126 SMART tapes, 30 tapes (‘MI with feedback’ group = 15, ‘MI only’ group = 15) were randomly selected for this study. MISC 2.1, a mutually exclusive and exhaustive coding system, was used to code the audio/videotaped MI sessions. Therapist and client language were analyzed for communication characteristics. Overall, therapists adopted a MI consistent style and clients were found to engage in change talk. Counselor acceptance, empathy, spirit, and complex reflections were all significantly related to client change talk (p-values ranged from 0.001 to 0.047). Additionally, therapist ‘advice without permission’ and MI Inconsistent therapist behaviors were strongly correlated with client sustain talk (p-values ranged from 0.006 to 0.048). Simple linear regression models showed a significant correlation between MI consistent (MICO) therapist language (independent variable) and change talk (dependent variable) and MI inconsistent (MIIN) therapist language (independent variable) and sustain talk (dependent variable). The study has several limitations such as small sample size, self-selection bias, poor inter-rater reliability for the global scales and the lack of a temporal measure of therapist and client language. Future studies might consider a larger sample size to obtain more statistical power. In addition the correlation between therapist language, client language and drinking outcome needs to be explored.^

Functional data analysis approaches for genotype-phenotype association studies from next-generation sequencing

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Next-generation DNA sequencing platforms can effectively detect the entire spectrum of genomic variation and is emerging to be a major tool for systematic exploration of the universe of variants and interactions in the entire genome. However, the data produced by next-generation sequencing technologies will suffer from three basic problems: sequence errors, assembly errors, and missing data. Current statistical methods for genetic analysis are well suited for detecting the association of common variants, but are less suitable to rare variants. This raises great challenge for sequence-based genetic studies of complex diseases.^ This research dissertation utilized genome continuum model as a general principle, and stochastic calculus and functional data analysis as tools for developing novel and powerful statistical methods for next generation of association studies of both qualitative and quantitative traits in the context of sequencing data, which finally lead to shifting the paradigm of association analysis from the current locus-by-locus analysis to collectively analyzing genome regions.^ In this project, the functional principal component (FPC) methods coupled with high-dimensional data reduction techniques will be used to develop novel and powerful methods for testing the associations of the entire spectrum of genetic variation within a segment of genome or a gene regardless of whether the variants are common or rare.^ The classical quantitative genetics suffer from high type I error rates and low power for rare variants. To overcome these limitations for resequencing data, this project used functional linear models with scalar response to develop statistics for identifying quantitative trait loci (QTLs) for both common and rare variants. To illustrate their applications, the functional linear models were applied to five quantitative traits in Framingham heart studies. ^ This project proposed a novel concept of gene-gene co-association in which a gene or a genomic region is taken as a unit of association analysis and used stochastic calculus to develop a unified framework for testing the association of multiple genes or genomic regions for both common and rare alleles. The proposed methods were applied to gene-gene co-association analysis of psoriasis in two independent GWAS datasets which led to discovery of networks significantly associated with psoriasis.^

A novel method for evaluating an interaction effect of correlated continuous covariates in a linear model

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Interaction effect is an important scientific interest for many areas of research. Common approach for investigating the interaction effect of two continuous covariates on a response variable is through a cross-product term in multiple linear regression. In epidemiological studies, the two-way analysis of variance (ANOVA) type of method has also been utilized to examine the interaction effect by replacing the continuous covariates with their discretized levels. However, the implications of model assumptions of either approach have not been examined and the statistical validation has only focused on the general method, not specifically for the interaction effect.^ In this dissertation, we investigated the validity of both approaches based on the mathematical assumptions for non-skewed data. We showed that linear regression may not be an appropriate model when the interaction effect exists because it implies a highly skewed distribution for the response variable. We also showed that the normality and constant variance assumptions required by ANOVA are not satisfied in the model where the continuous covariates are replaced with their discretized levels. Therefore, naïve application of ANOVA method may lead to an incorrect conclusion. ^ Given the problems identified above, we proposed a novel method modifying from the traditional ANOVA approach to rigorously evaluate the interaction effect. The analytical expression of the interaction effect was derived based on the conditional distribution of the response variable given the discretized continuous covariates. A testing procedure that combines the p-values from each level of the discretized covariates was developed to test the overall significance of the interaction effect. According to the simulation study, the proposed method is more powerful then the least squares regression and the ANOVA method in detecting the interaction effect when data comes from a trivariate normal distribution. The proposed method was applied to a dataset from the National Institute of Neurological Disorders and Stroke (NINDS) tissue plasminogen activator (t-PA) stroke trial, and baseline age-by-weight interaction effect was found significant in predicting the change from baseline in NIHSS at Month-3 among patients received t-PA therapy.^

A descriptive and exploratory analysis of occupational injuries at a chemical manufacturing facility

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A retrospective study of 1353 occupational injuries occurring at a chemical manufacturing facility in Houston, Texas from January, 1982 through May, 1988 was performed to investigate the etiology of the occupational injury process. Injury incidence rates were calculated for various sub-populations of workers to determine differences in the risk of injury for various groups. Linear modeling techniques were used to determine the association between certain collected independent variables and severity of an injury event. Finally, two sub-groups of the worker population, shiftworkers and injury recidivists, were examined. An injury recidivist as defined is any worker experiencing one or more injury per year. Overall, female shiftworkers evidenced the highest average injury incidence rate compared to all other worker groups analyzed. Although the female shiftworkers were younger and less experienced, the etiology of their increased risk of injury remains unclear, although the rigors of performing shiftwork itself or ergonomic factors are suspect. In general, females were injured more frequently than males, but they did not incur more severe injuries. For all workers, many injuries were caused by erroneous or foregone training, and risk taking behaviors. Injuries of these types are avoidable. The distribution of injuries by severity level was bimodal; either injuries were of minor or major severity with only a small number of cases falling in between. Of the variables collected, only the type of injury incurred and the worker's titlecode were statistically significantly associated with injury severity. Shiftworkers did not sustain more severe injuries than other worker groups. Injury to shiftworkers varied as a 24-hour pattern; the greatest number occurred between 1200-1230 hours, (p = 0.002) by Cosinor analysis. Recidivists made up 3.3% of the population (23 males and 10 females), yet suffered 17.8% of the injuries. Although past research suggests that injury recidivism is a random statistical event, analysis of the data by logistic regression implicates gender, area worked, age and job titlecode as being statistically significantly related to injury recidivism at this facility. ^

NONLINEAR TIME SERIES/SYSTEM ANALYSIS (STATE-SPACE, DYNAMICS, INTERVENTION, RESILIENCE, KALMAN)

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A discussion of nonlinear dynamics, demonstrated by the familiar automobile, is followed by the development of a systematic method of analysis of a possibly nonlinear time series using difference equations in the general state-space format. This format allows recursive state-dependent parameter estimation after each observation thereby revealing the dynamics inherent in the system in combination with random external perturbations.^ The one-step ahead prediction errors at each time period, transformed to have constant variance, and the estimated parametric sequences provide the information to (1) formally test whether time series observations y(,t) are some linear function of random errors (ELEM)(,s), for some t and s, or whether the series would more appropriately be described by a nonlinear model such as bilinear, exponential, threshold, etc., (2) formally test whether a statistically significant change has occurred in structure/level either historically or as it occurs, (3) forecast nonlinear system with a new and innovative (but very old numerical) technique utilizing rational functions to extrapolate individual parameters as smooth functions of time which are then combined to obtain the forecast of y and (4) suggest a measure of resilience, i.e. how much perturbation a structure/level can tolerate, whether internal or external to the system, and remain statistically unchanged. Although similar to one-step control, this provides a less rigid way to think about changes affecting social systems.^ Applications consisting of the analysis of some familiar and some simulated series demonstrate the procedure. Empirical results suggest that this state-space or modified augmented Kalman filter may provide interesting ways to identify particular kinds of nonlinearities as they occur in structural change via the state trajectory.^ A computational flow-chart detailing computations and software input and output is provided in the body of the text. IBM Advanced BASIC program listings to accomplish most of the analysis are provided in the appendix. ^

Motivational interviewing and feedback for college drinkers: Handling missing values with complete case analysis and multiple imputation

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Objective: In this secondary data analysis, three statistical methodologies were implemented to handle cases with missing data in a motivational interviewing and feedback study. The aim was to evaluate the impact that these methodologies have on the data analysis. ^ Methods: We first evaluated whether the assumption of missing completely at random held for this study. We then proceeded to conduct a secondary data analysis using a mixed linear model to handle missing data with three methodologies (a) complete case analysis, (b) multiple imputation with explicit model containing outcome variables, time, and the interaction of time and treatment, and (c) multiple imputation with explicit model containing outcome variables, time, the interaction of time and treatment, and additional covariates (e.g., age, gender, smoke, years in school, marital status, housing, race/ethnicity, and if participants play on athletic team). Several comparisons were conducted including the following ones: 1) the motivation interviewing with feedback group (MIF) vs. the assessment only group (AO), the motivation interviewing group (MIO) vs. AO, and the intervention of the feedback only group (FBO) vs. AO, 2) MIF vs. FBO, and 3) MIF vs. MIO.^ Results: We first evaluated the patterns of missingness in this study, which indicated that about 13% of participants showed monotone missing patterns, and about 3.5% showed non-monotone missing patterns. Then we evaluated the assumption of missing completely at random by Little's missing completely at random (MCAR) test, in which the Chi-Square test statistic was 167.8 with 125 degrees of freedom, and its associated p-value was p=0.006, which indicated that the data could not be assumed to be missing completely at random. After that, we compared if the three different strategies reached the same results. For the comparison between MIF and AO as well as the comparison between MIF and FBO, only the multiple imputation with additional covariates by uncongenial and congenial models reached different results. For the comparison between MIF and MIO, all the methodologies for handling missing values obtained different results. ^ Discussions: The study indicated that, first, missingness was crucial in this study. Second, to understand the assumptions of the model was important since we could not identify if the data were missing at random or missing not at random. Therefore, future researches should focus on exploring more sensitivity analyses under missing not at random assumption.^

Pathway analysis of genome-wide association study data in primary biliary cirrhosisa

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Genome-wide association studies (GWAS) have successfully identified several genetic loci associated with inherited predisposition to primary biliary cirrhosis (PBC), the most common autoimmune disease of the liver. Pathway-based tests constitute a novel paradigm for GWAS analysis. By evaluating genetic variation across a biological pathway (gene set), these tests have the potential to determine the collective impact of variants with subtle effects that are individually too weak to be detected in traditional single variant GWAS analysis. To identify biological pathways associated with the risk of development of PBC, GWAS of PBC from Italy (449 cases and 940 controls) and Canada (530 cases and 398 controls) were independently analyzed. The linear combination test (LCT), a recently developed pathway-level statistical method was used for this analysis. For additional validation, pathways that were replicated at the P <0.05 level of significance in both GWAS on LCT analysis were also tested for association with PBC in each dataset using two complementary GWAS pathway approaches. The complementary approaches included a modification of the gene set enrichment analysis algorithm (i-GSEA4GWAS) and Fisher's exact test for pathway enrichment ratios. Twenty-five pathways were associated with PBC risk on LCT analysis in the Italian dataset at P<0.05, of which eight had an FDR<0.25. The top pathway in the Italian dataset was the TNF/stress related signaling pathway (p=7.38×10 -4, FDR=0.18). Twenty-six pathways were associated with PBC at the P<0.05 level using the LCT in the Canadian dataset with the regulation and function of ChREBP in liver pathway (p=5.68×10-4, FDR=0.285) emerging as the most significant pathway. Two pathways, phosphatidylinositol signaling system (Italian: p=0.016, FDR=0.436; Canadian: p=0.034, FDR=0.693) and hedgehog signaling (Italian: p=0.044, FDR=0.636; Canadian: p=0.041, FDR=0.693), were replicated at LCT P<0.05 in both datasets. Statistically significant association of both pathways with PBC genetic susceptibility was confirmed in the Italian dataset on i-GSEA4GWAS. Results for the phosphatidylinositol signaling system were also significant in both datasets on applying Fisher's exact test for pathway enrichment ratios. This study identified a combination of known and novel pathway-level associations with PBC risk. If functionally validated, the findings may yield fresh insights into the etiology of this complex autoimmune disease with possible preventive and therapeutic application.^

Detecting genetic and nutritional lung cancer risk factors related to folate metabolism using Bayesian generalized linear models

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Complex diseases, such as cancer, are caused by various genetic and environmental factors, and their interactions. Joint analysis of these factors and their interactions would increase the power to detect risk factors but is statistically. Bayesian generalized linear models using student-t prior distributions on coefficients, is a novel method to simultaneously analyze genetic factors, environmental factors, and interactions. I performed simulation studies using three different disease models and demonstrated that the variable selection performance of Bayesian generalized linear models is comparable to that of Bayesian stochastic search variable selection, an improved method for variable selection when compared to standard methods. I further evaluated the variable selection performance of Bayesian generalized linear models using different numbers of candidate covariates and different sample sizes, and provided a guideline for required sample size to achieve a high power of variable selection using Bayesian generalize linear models, considering different scales of number of candidate covariates. ^ Polymorphisms in folate metabolism genes and nutritional factors have been previously associated with lung cancer risk. In this study, I simultaneously analyzed 115 tag SNPs in folate metabolism genes, 14 nutritional factors, and all possible genetic-nutritional interactions from 1239 lung cancer cases and 1692 controls using Bayesian generalized linear models stratified by never, former, and current smoking status. SNPs in MTRR were significantly associated with lung cancer risk across never, former, and current smokers. In never smokers, three SNPs in TYMS and three gene-nutrient interactions, including an interaction between SHMT1 and vitamin B12, an interaction between MTRR and total fat intake, and an interaction between MTR and alcohol use, were also identified as associated with lung cancer risk. These lung cancer risk factors are worthy of further investigation.^

«
1
2
»