995 results for Biology, Biostatistics|Philosophy|Health Sciences, Public Health
Abstract:
Complex diseases, such as cancer, are caused by various genetic and environmental factors and their interactions. Joint analysis of these factors and their interactions would increase the power to detect risk factors but is statistically challenging. Bayesian generalized linear models with Student-t prior distributions on the coefficients are a novel method for simultaneously analyzing genetic factors, environmental factors, and interactions. I performed simulation studies using three different disease models and demonstrated that the variable selection performance of Bayesian generalized linear models is comparable to that of Bayesian stochastic search variable selection, an improved method for variable selection when compared to standard methods. I further evaluated the variable selection performance of Bayesian generalized linear models using different numbers of candidate covariates and different sample sizes, and provided a guideline for the sample size required to achieve high power of variable selection using Bayesian generalized linear models, considering different scales of the number of candidate covariates.
Polymorphisms in folate metabolism genes and nutritional factors have been previously associated with lung cancer risk. In this study, I simultaneously analyzed 115 tag SNPs in folate metabolism genes, 14 nutritional factors, and all possible genetic-nutritional interactions from 1239 lung cancer cases and 1692 controls using Bayesian generalized linear models stratified by never, former, and current smoking status. SNPs in MTRR were significantly associated with lung cancer risk across never, former, and current smokers. In never smokers, three SNPs in TYMS and three gene-nutrient interactions (between SHMT1 and vitamin B12, between MTRR and total fat intake, and between MTR and alcohol use) were also identified as associated with lung cancer risk. These lung cancer risk factors are worthy of further investigation.
Abstract:
Determining the size and power of a test is a vital part of clinical trial design. This research focuses on the simulation of clinical trial data with time-to-event as the primary outcome. It investigates the impact of different recruitment patterns and time-dependent hazard structures on the size and power of the log-rank test. A non-homogeneous Poisson process is used to simulate entry times according to the different accrual patterns. A Weibull distribution is employed to simulate survival times according to the different hazard structures. The size of the log-rank test is estimated by simulating survival times with identical hazard rates in the treatment and control arms of the study, resulting in a hazard ratio of one. The power of the log-rank test at specific values of the hazard ratio (≠1) is estimated by simulating survival times with different but proportional hazard rates for the two arms of the study. Different shapes (constant, decreasing, or increasing) of the hazard function of the Weibull distribution are also considered to assess the effect of hazard structure on the size and power of the log-rank test.
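The simulation design described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis code: the function names, the power-law accrual intensity, the administrative censoring at a fixed study end, and all default parameter values are hypothetical choices made for the example.

```python
import math
import random

def weibull_time(rng, shape, scale, hazard_ratio=1.0):
    # Inverse-CDF draw from S(t) = exp(-hazard_ratio * (t / scale) ** shape),
    # so the two arms have proportional hazards.
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return scale * (-math.log(u) / hazard_ratio) ** (1.0 / shape)

def entry_times(rng, n, accrual_period, power=1.0):
    # Non-homogeneous Poisson process conditioned on n arrivals, with
    # intensity proportional to t ** (power - 1) (an assumed accrual shape):
    # arrival times are i.i.d. with CDF (t / accrual_period) ** power.
    return [accrual_period * rng.random() ** (1.0 / power) for _ in range(n)]

def logrank_chi2(times, events, groups):
    # Two-sample log-rank chi-square statistic (1 degree of freedom).
    data = sorted(zip(times, events, groups))
    n_risk = len(data)
    n1_risk = sum(g for _, _, g in data)
    obs_minus_exp, var = 0.0, 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        d = d1 = removed = removed1 = 0
        while i < len(data) and data[i][0] == t:
            _, e, g = data[i]
            d += e
            d1 += e * g
            removed += 1
            removed1 += g
            i += 1
        if d > 0 and n_risk > 1:
            frac = n1_risk / n_risk
            obs_minus_exp += d1 - d * frac
            var += d * frac * (1.0 - frac) * (n_risk - d) / (n_risk - 1)
        n_risk -= removed
        n1_risk -= removed1
    return obs_minus_exp ** 2 / var if var > 0 else 0.0

def rejection_rate(hazard_ratio, n_per_arm=100, shape=1.0, scale=2.0,
                   accrual_period=2.0, study_end=4.0, accrual_power=1.0,
                   n_sims=300, seed=1):
    # Fraction of simulated trials in which the log-rank test rejects at the
    # 5% level (3.841 is the chi-square critical value with 1 df).
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        times, events, groups = [], [], []
        for group, hr in ((0, 1.0), (1, hazard_ratio)):
            for entry in entry_times(rng, n_per_arm, accrual_period, accrual_power):
                t = weibull_time(rng, shape, scale, hr)
                follow_up = study_end - entry  # administrative censoring
                times.append(min(t, follow_up))
                events.append(1 if t <= follow_up else 0)
                groups.append(group)
        if logrank_chi2(times, events, groups) > 3.841:
            rejections += 1
    return rejections / n_sims

size = rejection_rate(hazard_ratio=1.0)   # estimated type I error rate
power = rejection_rate(hazard_ratio=0.5)  # estimated power at HR = 0.5
```

Changing `shape` below or above 1 gives a decreasing or increasing Weibull hazard, and changing `accrual_power` shifts recruitment toward the start or end of the accrual period, which is how the two factors studied in the abstract could be varied in this sketch.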
Abstract:
Objective. In 2009, the International Expert Committee recommended the use of the HbA1c test for the diagnosis of diabetes. Despite this recommendation, its precise test performance among Mexican Americans is uncertain. A strong "gold standard" would rely on repeated blood glucose measurement on different days, which is the recommended method for diagnosing diabetes in clinical practice. Our objective was to assess the test performance of HbA1c in detecting diabetes and pre-diabetes against repeated fasting blood glucose measurement for the Mexican American population living on the United States-Mexico border. Moreover, we wanted to find a specific and precise threshold value of HbA1c for diabetes mellitus (DM) and pre-diabetes for this high-risk population, which might assist in better diagnosis and better management of diabetes.
Research design and methods. We used the CCHC dataset for our study. In 2004, the Cameron County Hispanic Cohort (CCHC), now numbering 2,574, was established from randomly selected households on the basis of 2000 Census tract data. The CCHC study randomly selected a subset of people (aged 18-64 years) in CCHC cohort households to determine the influence of SES on diabetes and obesity. Among the participants in Cohort-2000, 67.15% are female; all are Hispanic.
Individuals were defined as having diabetes mellitus (fasting plasma glucose [FPG] ≥ 126 mg/dL) or pre-diabetes (100 ≤ FPG < 126 mg/dL). HbA1c test performance was evaluated using receiver operating characteristic (ROC) curves. Moreover, change-point models were used to determine HbA1c thresholds compatible with FPG thresholds for diabetes and pre-diabetes.
Results. When FPG was used as the reference to detect diabetes, the sensitivity and specificity of HbA1c ≥ 6.5% were 75% and 87% respectively (area under the curve 0.895). Additionally, when FPG was used as the reference to detect pre-diabetes, the sensitivity and specificity of HbA1c ≥ 6.0% (ADA recommended threshold) were 18% and 90% respectively. The sensitivity and specificity of HbA1c ≥ 5.7% (International Expert Committee recommended threshold) for detecting pre-diabetes were 31% and 78% respectively. ROC analyses suggest HbA1c is a sound predictor of diabetes mellitus (area under the curve 0.895) but a poorer predictor of pre-diabetes (area under the curve 0.632).
Conclusions. Our data support the current recommendations for use of HbA1c in the diagnosis of diabetes for the Mexican American population, as it has shown reasonable sensitivity, specificity and accuracy against repeated FPG measures. However, use of HbA1c may be premature for detecting pre-diabetes in this specific population because of its poor sensitivity against FPG. It might be the case that HbA1c more effectively differentiates the individuals who are at risk of developing diabetes. Following these pre-diabetic individuals over a longer term for the detection of incident diabetes may lead to more confirmatory results.
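The test-performance measures reported in this abstract (sensitivity and specificity at a fixed HbA1c cut-off, and the area under the ROC curve) can be computed as in the following sketch. The function names and the eight toy measurements are invented for illustration only; they are not CCHC data.

```python
def sensitivity_specificity(index_values, is_case, threshold):
    # A subject tests positive when the index test (e.g. HbA1c) is at or
    # above the threshold; is_case holds the reference-standard status.
    tp = fp = tn = fn = 0
    for value, case in zip(index_values, is_case):
        positive = value >= threshold
        if case and positive:
            tp += 1
        elif case:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(index_values, is_case):
    # AUC as the probability that a randomly chosen case has a higher index
    # value than a randomly chosen non-case (ties count one half).
    cases = [v for v, c in zip(index_values, is_case) if c]
    controls = [v for v, c in zip(index_values, is_case) if not c]
    score = 0.0
    for a in cases:
        for b in controls:
            if a > b:
                score += 1.0
            elif a == b:
                score += 0.5
    return score / (len(cases) * len(controls))

# Hypothetical toy data: HbA1c in percent, FPG in mg/dL.
hba1c = [5.2, 5.6, 6.1, 6.4, 6.6, 7.0, 7.8, 5.9]
fpg = [90, 98, 110, 130, 128, 140, 160, 105]
diabetes = [g >= 126 for g in fpg]  # reference definition from the abstract

sens, spec = sensitivity_specificity(hba1c, diabetes, threshold=6.5)
auc = roc_auc(hba1c, diabetes)
```

Lowering the threshold trades specificity for sensitivity, which matches the pattern the abstract reports when moving from the 6.0% to the 5.7% cut-off for pre-diabetes.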
Abstract:
The ventricular system is a critical component of the central nervous system (CNS) that forms early in development and remains functional throughout life. Changes in the ventricular system can be easily discerned via neuroimaging procedures, and most of the time they reflect changes in the physiology of the CNS. In this study we attempted to identify specific genes associated with variation in ventricular volume in humans. Methods. We conducted a genome-wide association (GWA) analysis of the volume of the lateral ventricles among 1605 individuals of European ancestry from two community-based cohorts, the Genetics of Microangiopathic Brain Injury (GMBI; N=814) and Atherosclerosis Risk in Communities (ARIC; N=791). Significant findings from the analysis were tested for replication in both cohorts and then meta-analyzed to obtain an estimate of overall significance. Results. In our GWA analyses, no single nucleotide polymorphism (SNP) reached a genome-wide significance of p < 10⁻⁸. There were 25 SNPs in GMBI and 9 SNPs in ARIC that reached a threshold of p < 10⁻⁵. However, none of the top SNPs from each cohort were replicated in the other. In the meta-analysis, no SNP reached the genome-wide threshold of 5×10⁻⁸, but we identified five novel SNPs associated with variation in ventricular volume at the p < 10⁻⁵ level. The strongest association was for rs2112536 in an intergenic region on chromosome 5q33 (Pmeta = 8.46×10⁻⁷). The remaining four SNPs were located on chromosome 3q23, encompassing the gene for Calsyntenin-2 (CLSTN2). The SNPs with the strongest association in this region were rs17338555 (Pmeta = 5.28×10⁻⁶), rs9812091 (Pmeta = 5.89×10⁻⁶), rs9812283 (Pmeta = 5.97×10⁻⁶) and rs9833213 (Pmeta = 6.96×10⁻⁶). Conclusions. This GWA study of ventricular volumes in community-based cohorts of European descent identifies potential loci on chromosomes 3 and 5. Further characterization of these loci may provide insights into the pathophysiology of ventricular involvement in various neurological diseases.
Abstract:
In biomedical studies, the general data structures have been the matched (paired) and unmatched designs. Recently, many researchers have become interested in meta-analysis as a way to gain a better understanding of a medical treatment from several sets of clinical data. The hybrid design, which combines the two data structures, raises fundamental questions for statistical methods and challenges for statistical inference. The applicable methods depend on the underlying distribution. If the outcomes are normally distributed, we would use the classic paired and two-independent-sample t-tests on the matched and unmatched cases. If not, we can apply the Wilcoxon signed rank and rank sum tests to each case.
To assess an overall treatment effect in a hybrid design, we can apply the inverse variance weighting method used in meta-analysis. In the nonparametric case, we can use a test statistic that combines the two Wilcoxon test statistics. However, these two test statistics are not on the same scale. We propose the Hybrid Test Statistic based on the Hodges-Lehmann estimates of the treatment effects, which are medians on the same scale.
As a comparator for the proposed method, we use the classic meta-analysis t-test statistic based on the combined estimates of the treatment effects from the two t-test statistics. Theoretically, the efficiency of two unbiased estimators of a parameter is the ratio of their variances. Using the concept of Asymptotic Relative Efficiency (ARE) developed by Pitman, we derive the ARE of the hybrid test statistic relative to the classic meta-analysis t-test statistic using the Hodges-Lehmann estimators associated with the two test statistics.
From several simulation studies, we calculate the empirical type I error rate and power of the test statistics. The proposed statistic would provide an effective tool to evaluate and understand the treatment effect in various public health studies as well as clinical trials.
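The two Hodges-Lehmann estimates that the hybrid statistic builds on, and the inverse variance weighting borrowed from meta-analysis, can be sketched as follows. This is only an illustration of the standard estimators; the function names are hypothetical, the variances are taken as given inputs, and the exact form of the proposed Hybrid Test Statistic is not stated in the abstract and is not reproduced here.

```python
from statistics import median

def hl_paired(differences):
    # One-sample Hodges-Lehmann estimate for matched pairs: the median of
    # all Walsh averages (d_i + d_j) / 2 with i <= j.
    d = list(differences)
    walsh = [(d[i] + d[j]) / 2.0 for i in range(len(d)) for j in range(i, len(d))]
    return median(walsh)

def hl_two_sample(control, treated):
    # Two-sample Hodges-Lehmann estimate for unmatched data: the median of
    # all pairwise differences (treated - control).
    return median([t - c for c in control for t in treated])

def inverse_variance_combine(estimates, variances):
    # Weighted mean with weights 1 / variance; also returns the variance of
    # the combined estimate under independence.
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    combined = sum(w * e for w, e in zip(weights, estimates)) / total
    return combined, 1.0 / total

# Hypothetical toy data for the matched and unmatched parts of a hybrid design.
paired_effect = hl_paired([1.0, 2.0, 3.0])
unpaired_effect = hl_two_sample([1.0, 2.0, 3.0], [4.0, 6.0, 8.0])
overall, overall_var = inverse_variance_combine(
    [paired_effect, unpaired_effect], [1.0, 1.0]  # variances assumed known here
)
```

Because both estimates are location shifts expressed in the units of the outcome, they can be averaged directly, which is the scale argument the abstract makes against combining the raw Wilcoxon statistics.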
Abstract:
Scholars have found that socioeconomic status is one of the key factors influencing early-stage lung cancer incidence rates in a variety of regions. This thesis examined the association between median household income and lung cancer incidence rates in Texas counties. Data on 254 individual Texas counties, with corresponding lung cancer incidence rates from 2004 to 2008 and median household incomes in 2006, were collected from the National Cancer Institute Surveillance System. A simple linear model and spatial linear models with two structures, the Simultaneous Autoregressive (SAR) structure and the Conditional Autoregressive (CAR) structure, were used to link median household income and lung cancer incidence rates in Texas. The residuals of the spatial linear models were analyzed with Moran's I and Geary's C statistics, and the statistical results were used to detect clusters of similar lung cancer incidence rates and disease patterns in Texas.
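The two spatial autocorrelation statistics named in this abstract can be computed from a vector of values (for example, model residuals) and a spatial weights matrix. The sketch below uses the textbook definitions; the function names and the four-region chain of neighbours are hypothetical and stand in for the county adjacency structure, which the abstract does not describe.

```python
def morans_i(values, weights):
    # Moran's I = (n / S0) * sum_ij w_ij * z_i * z_j / sum_i z_i ** 2,
    # where z_i are deviations from the mean and S0 is the sum of weights.
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)

def gearys_c(values, weights):
    # Geary's C = ((n - 1) / (2 * S0)) * sum_ij w_ij * (x_i - x_j) ** 2
    #             / sum_i z_i ** 2.
    n = len(values)
    mean = sum(values) / n
    s0 = sum(sum(row) for row in weights)
    sq_diff = sum(weights[i][j] * (values[i] - values[j]) ** 2
                  for i in range(n) for j in range(n))
    return ((n - 1) / (2.0 * s0)) * sq_diff / sum((v - mean) ** 2 for v in values)

# Hypothetical example: four regions in a row, each adjacent to its neighbours.
residuals = [1.0, 2.0, 3.0, 4.0]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
moran = morans_i(residuals, adjacency)
geary = gearys_c(residuals, adjacency)
```

Moran's I above its null expectation of -1/(n - 1), or Geary's C below 1, indicates that neighbouring regions tend to have similar values, which is the kind of clustering the thesis looks for in the model residuals.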
Abstract:
The infant mortality rate (IMR) is considered to be one of the most important indices of a country's well-being. Countries around the world and health organizations such as the World Health Organization are dedicating their resources, knowledge and energy to reducing infant mortality rates. The well-known Millennium Development Goal 4 (MDG 4), whose aim is to achieve a two-thirds reduction of the under-five mortality rate between 1990 and 2015, is an example of this commitment.
In this study our goal is to model the trends of IMR from the 1950s to the 2010s for selected countries. We would like to know how the IMR is changing over time and how it differs across countries.
IMR data collected over time form a time series. The repeated observations in an IMR time series are not statistically independent, so in modeling the trend of IMR it is necessary to account for these correlations. We proposed to use the generalized least squares method in the general linear model setting to deal with the variance-covariance structure in our model. To estimate the variance-covariance matrix, we referred to time-series models, especially the autoregressive and moving average models. Furthermore, we compared results from the general linear model with a correlation structure to those from the ordinary least squares method, which does not take the correlation structure into account, to check how much the estimates change.
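For reference, the generalized least squares estimator mentioned in this abstract can be written out as follows. The first-order autoregressive, AR(1), covariance is shown only as one illustrative special case of the autoregressive and moving average structures the abstract refers to; the symbols $y$ (vector of IMR observations), $X$ (design matrix), $\beta$ (trend coefficients), $\Sigma$ (error covariance), $\rho$ (lag-one autocorrelation) and $\sigma^{2}$ (innovation variance) are notation assumed here, not taken from the thesis.

```latex
\hat{\beta}_{\mathrm{GLS}}
  = \left(X^{\top}\Sigma^{-1}X\right)^{-1} X^{\top}\Sigma^{-1} y,
\qquad
\Sigma_{ij} = \frac{\sigma^{2}}{1-\rho^{2}}\,\rho^{\,|i-j|},
\quad |\rho| < 1 .
```

When $\Sigma = \sigma^{2} I$ the expression reduces to the ordinary least squares estimator $(X^{\top}X)^{-1}X^{\top}y$, which is the comparison the abstract describes.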
Abstract:
This thesis presents an analysis of data from the Molecular Epidemiology of Type II Diabetes Mellitus in Mexican Americans study. The study included 294 families. Among the participating families were 500 Mexican American females aged 19 to 86 who provided information on characteristics such as height, weight, and a variety of biochemical indicators. The research questions for this thesis are: (1) How strong is the association between indicators of the metabolic syndrome in study participants and their family histories of type II diabetes; and (2) How are an individual's family history of type II diabetes, age and socioeconomic status associated with the metabolic syndrome? In this thesis the education status of the participants is used as an indicator of socioeconomic status. Answers to these questions are provided through the analysis of women's responses to written questionnaires and biochemical data.
Abstract:
Life expectancy has consistently increased over the last 150 years due to improvements in nutrition, medicine, and public health. Several studies found that in many developed countries, life expectancy continued to rise following a nearly linear trend, contrary to the common belief that the rate of improvement in life expectancy would decelerate and follow an S-shaped curve. Using samples of countries that exhibited a wide range of economic development levels, we explored the change in life expectancy over time by employing both nonlinear and linear models. We then examined whether there were any significant differences in estimates between linear models, assuming an auto-correlated error structure. When the data did not have a sigmoidal shape, nonlinear growth models sometimes failed to provide meaningful parameter estimates. The existence of an inflection point and asymptotes in the growth models made them inflexible for life expectancy data. In linear models, there was no significant difference in the life expectancy growth rate and future estimates between ordinary least squares (OLS) and generalized least squares (GLS). However, the generalized least squares model was more robust because the data involved time-series variables and the residuals were positively correlated.
Abstract:
Schizophrenia (SZ) is a complex disorder with high heritability and variable phenotypes, for which there has been limited success in finding causal genes associated with disease development. Pathway-based analysis is an effective approach for investigating the molecular mechanisms of susceptibility genes associated with complex diseases. The etiology of complex diseases could involve a network of genetic factors, and interactions may occur among the genes. In this work we argue that some genes might have small effects that by themselves are neither sufficient nor necessary to cause the disease; however, their effects may induce slight changes in gene expression or affect protein function. Analyzing the gene-gene interaction mechanism within the disease pathway would therefore play a crucial role in dissecting the genetic architecture of complex diseases, making pathway-based analysis a complementary approach to the GWAS technique.
In this study, we implemented three novel linkage disequilibrium based statistics (the linear combination, the quadratic, and the decorrelation test statistics) to investigate the interaction between linked and unlinked genes in two independent case-control GWAS datasets for SZ, including participants of European (EA) and African (AA) ancestries. The EA population included 1,173 cases and 1,378 controls with 729,454 genotyped SNPs, while the AA population included 219 cases and 288 controls with 845,814 genotyped SNPs. Using the gene-gene interaction method, we identified 17,186 significantly interacting gene-sets in the EA dataset and 12,691 gene-sets in the AA dataset. We also identified 18,846 genes in the EA dataset and 19,431 genes in the AA dataset that were in the disease pathways. However, few genes were reported as significantly associated with SZ.
Our research determined the pathway characteristics for schizophrenia through the gene-gene interaction and gene-pathway based approaches. Our findings suggest that these methods can yield insightful inferences in studying the molecular mechanisms of common complex diseases.
Abstract:
Objective: The study aimed to identify the risk factors involved in initiating thromboembolism (TE) in pancreatic cancer (PC) patients, with a focus on ABO blood type.
Methods and Patients: Of the patients (n=687), 35.7% were confirmed cases of TE and 64.3% remained free of TE. Among the cases, 12.7% had pulmonary embolism (PE) only, 9% deep vein thrombosis (DVT) only, 53.5% other sites only, 3.3% combined PE and DVT, 8.6% combined PE and other sites, 9.8% combined DVT and other sites, and 3.3% all three combined.
Results: The risk factors for thrombosis identified by multivariate logistic regression were: history of previous anti-thrombotic treatment, tumor site in the pancreatic body or tail, large tumor size, and maximum glucose categories above 126 and 200 mg/dL.
The factors associated with worse overall survival in multivariate Cox regression and Kaplan-Meier analyses were: locally advanced or metastatic stage, worsening performance status, high CA 19-9 levels, and HbA1c levels above 6% at diagnosis.
Thrombosis occurred in 29.1% and 39.1% of the patients in the O and non-O blood type groups respectively. Both non-O blood type (P=0.02) and the A, B and AB blood types (P=0.007) were associated with thrombosis as compared to O type. The odds of thrombosis were nearly half in O blood type patients as compared to non-O blood type patients [OR = 0.54 (95% CI: 0.37-0.79), P<0.001].
Conclusion: A better understanding of the relationship between TE and PC and of the risk factors involved may provide insights into tumor biology and patient response to prophylactic anticoagulation therapy.
Abstract:
Maximizing data quality may be especially difficult in trauma-related clinical research. Strategies are needed to improve data quality and assess the impact of data quality on clinical predictive models. This study had two objectives. The first was to compare missing data between two multi-center trauma transfusion studies: a retrospective study (RS) using medical chart data with minimal data quality review and the PRospective Observational Multi-center Major Trauma Transfusion (PROMMTT) study with standardized quality assurance. The second objective was to assess the impact of missing data on clinical prediction algorithms by evaluating blood transfusion prediction models using PROMMTT data. RS (2005-06) and PROMMTT (2009-10) investigated trauma patients receiving ≥ 1 unit of red blood cells (RBC) from ten Level I trauma centers. Missing data were compared for 33 variables collected in both studies using mixed effects logistic regression (including random intercepts for study site). Massive transfusion (MT) patients received ≥ 10 RBC units within 24h of admission. Correct classification percentages for three MT prediction models were evaluated using complete case analysis and multiple imputation based on the multivariate normal distribution. A sensitivity analysis for missing data was conducted to estimate the upper and lower bounds of correct classification using assumptions about missing data under best and worst case scenarios. Most variables (17/33=52%) had <1% missing data in RS and PROMMTT. Of the remaining variables, 50% demonstrated less missingness in PROMMTT, 25% had less missingness in RS, and 25% were similar between studies. Missing percentages for MT prediction variables in PROMMTT ranged from 2.2% (heart rate) to 45% (respiratory rate). For variables missing >1%, study site was associated with missingness (all p≤0.021). Survival time predicted missingness for 50% of RS and 60% of PROMMTT variables. Complete case proportions for the MT models ranged from 41% to 88%. Complete case analysis and multiple imputation demonstrated similar correct classification results. Sensitivity analysis upper-lower bound ranges for the three MT models were 59-63%, 36-46%, and 46-58%. Prospective collection of ten-fold more variables with data quality assurance reduced overall missing data. Study site and patient survival were associated with missingness, suggesting that data were not missing completely at random, and complete case analysis may lead to biased results. Evaluating clinical prediction model accuracy may be misleading in the presence of missing data, especially with many predictor variables. The proposed sensitivity analysis estimating correct classification under upper (best case scenario)/lower (worst case scenario) bounds may be more informative than multiple imputation, which provided results similar to complete case analysis.
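One simple way to read the best-case and worst-case bounds described in this abstract is sketched below. It assumes, as an illustration only, that patients with missing predictors cannot be classified and that the upper bound treats all of them as correctly classified while the lower bound treats all of them as misclassified; the abstract does not spell out the exact assumptions used, and the function name and counts are hypothetical.

```python
def correct_classification_bounds(n_correct, n_complete, n_missing):
    # n_correct: correctly classified patients among complete cases
    # n_complete: patients with all predictors observed
    # n_missing: patients who cannot be classified because of missing data
    if not 0 <= n_correct <= n_complete:
        raise ValueError("n_correct must lie between 0 and n_complete")
    total = n_complete + n_missing
    if total == 0:
        raise ValueError("no patients supplied")
    lower = n_correct / total                # worst case: all missing are wrong
    upper = (n_correct + n_missing) / total  # best case: all missing are right
    complete_case = n_correct / n_complete if n_complete else float("nan")
    return lower, complete_case, upper

# Hypothetical counts, not PROMMTT results.
lower, complete_case, upper = correct_classification_bounds(70, 80, 20)
```

The width of the interval grows with the share of incomplete cases, so a model with many predictors, and therefore fewer complete cases, gets a wider and less reassuring range than its complete-case accuracy alone would suggest.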
Abstract:
Background: Follow-up care for women with breast cancer requires an understanding of disease recurrence patterns, and the follow-up visit schedule should be determined according to the times when recurrence is most likely to occur, so that preventive measures can be taken to avoid or minimize recurrence. Objective: To model breast cancer recurrence through a stochastic process, with the aim of generating a hazard function for determining a follow-up schedule. Methods: We modeled the process of disease progression as a time-transformed Wiener process, and the first hitting time was used as an approximation of the true failure time. A woman's "recurrence-free survival time", or time without a recurrence event, is modeled as the time it takes the Wiener process to cross a threshold value that represents a breast cancer recurrence event. Following development of the first-hitting-time model, we explored a threshold regression model that takes account of covariates contributing to the prognosis of breast cancer. Using real data from SEER-Medicare, we proposed models of follow-up visit schedules on the basis of a constant probability of disease recurrence between consecutive visits. Results: We demonstrated that threshold regression based on the first-hitting-time modeling approach can provide useful predictive information about breast cancer recurrence. Our results suggest that the surveillance and follow-up schedule can be determined for women based on their prognostic factors, such as tumor stage. Women with early-stage disease may be seen less frequently for follow-up visits than women with locally advanced disease. Our results from SEER-Medicare data support the idea of risk-controlled follow-up strategies for groups of women. Conclusion: The methodology we proposed in this study allows one to determine individual follow-up scheduling based on a parametric hazard function that incorporates known prognostic factors.
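The first-hitting-time idea in this abstract can be illustrated with a Wiener process that starts at a health level `y0 > 0`, drifts downward at rate `mu < 0`, and registers a recurrence when it first reaches zero; its hitting time then follows an inverse Gaussian distribution. The sketch below is a simplified reading under stated assumptions: it ignores the time transformation and the covariates, the parameter values are hypothetical, and "constant probability of recurrence between consecutive visits" is interpreted here as a constant conditional probability `p` given no recurrence by the previous visit.

```python
import math

def norm_cdf(z):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def survival(t, y0, mu, sigma=1.0):
    # Probability that the process has not yet hit zero by time t
    # (one minus the inverse Gaussian first-hitting-time CDF).
    if t <= 0:
        return 1.0
    s = sigma * math.sqrt(t)
    cdf = (norm_cdf(-(y0 + mu * t) / s)
           + math.exp(-2.0 * y0 * mu / sigma ** 2) * norm_cdf((mu * t - y0) / s))
    return 1.0 - cdf

def visit_schedule(y0, mu, p, n_visits, sigma=1.0):
    # Visit times t_1 < t_2 < ... chosen so that S(t_k) = (1 - p) ** k,
    # i.e. each interval carries the same conditional recurrence probability.
    if mu >= 0:
        raise ValueError("this sketch requires a negative drift (mu < 0)")
    times = []
    lo = 0.0
    for k in range(1, n_visits + 1):
        target = (1.0 - p) ** k
        hi = max(lo, 1e-6) * 2.0
        while survival(hi, y0, mu, sigma) > target:
            hi *= 2.0
        a, b = lo, hi
        for _ in range(100):  # bisection on the decreasing survival function
            mid = 0.5 * (a + b)
            if survival(mid, y0, mu, sigma) > target:
                a = mid
            else:
                b = mid
        lo = 0.5 * (a + b)
        times.append(lo)
    return times

# Hypothetical parameters: a higher starting level y0 stands for better prognosis.
schedule = visit_schedule(y0=2.0, mu=-0.5, p=0.1, n_visits=4)
```

In a threshold regression, `y0` and `mu` would be linked to covariates such as tumor stage, so women with a higher starting level or slower drift would get longer gaps between visits under the same `p`.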
Abstract:
Study Objective: Identify the most frequent risk factors for community-acquired MRSA (CA-MRSA) skin and soft-tissue infections (SSTIs) using a case series of patients; characterize the patients by age, race/ethnicity, gender, abscess location, drug use and intravenous drug use (IVDU), underlying medical conditions, homelessness, treatment resistance, sepsis, and whether their last healthcare visit was within the last 12 months; and describe the susceptibility pattern in this central Texas population presenting to the University Medical Center Brackenridge (UMCB) Emergency Department (ED).
Methods: This study was a retrospective case-series medical record review involving a convenience sample of patients seen in 2007 in an urban public hospital's ED in Texas who had an SSTI that tested positive for MRSA. All positive MRSA cultures underwent susceptibility testing to determine antibiotic resistance. The demographic and clinical variables that were independently associated with MRSA were determined by univariate and multivariate analysis using logistic regression to calculate odds ratios (OR), 95% confidence intervals, and significance (p≤ 0.05).
Results: In 2007, there were 857 positive MRSA cultures. The patients were 60% male and 40% female, with an average age of 36.2 years (std. dev. = 13). The study population consisted of non-Hispanic whites (42%), Hispanics (38%), and non-Hispanic blacks (18.8%). Possible risk factors addressed included using recreational drugs (not including IVDU) (27%), homelessness (13%), diabetes status (12.6%) or having an infectious disease, and IVDU (10%). The most frequent abscess location was the leg (26.6%), followed by the arm and torso (both 13.7%). Eighty-three percent of patients shared one prominent susceptibility pattern, with the following susceptibility rates: trimethoprim/sulfamethoxazole (TMP-SMX) and vancomycin 100%, gentamicin 99%, clindamycin 96%, tetracycline 96%, and erythromycin 56%.
Conclusion: The ED is becoming an important area for disease transmission between the sterile hospital environment and the outside environment. It remains important to pursue further research in the ED in an effort to better understand MRSA transmission and antibiotic resistance, as well as to maintain surveillance for the introduction of new opportunistic pathogens into the population.
Abstract:
This thesis project is motivated by the potential problem of using observational data to draw inferences about a causal relationship in observational epidemiology research when controlled randomization is not applicable. The instrumental variable (IV) method is one of the statistical tools for overcoming this problem. A Mendelian randomization study uses genetic variants as IVs in a genetic association study. In this thesis, the IV method, as well as standard logistic and linear regression models, is used to investigate the causal association between risk of pancreatic cancer and the circulating levels of soluble receptor for advanced glycation end-products (sRAGE). Higher levels of serum sRAGE were found to be associated with a lower risk of pancreatic cancer in a previous observational study (255 cases and 485 controls). However, such a novel association may be biased by unknown confounding factors. In a case-control study, we aimed to use the IV approach to confirm or refute this observation in a subset of study subjects for whom genotyping data were available (178 cases and 177 controls). A two-stage IV analysis using generalized method of moments-structural mean models (GMM-SMM) was conducted and the relative risk (RR) was calculated. In the first-stage analysis, we found that the single nucleotide polymorphism (SNP) rs2070600 of the receptor for advanced glycation end-products (AGER) gene meets all three general assumptions for a genetic IV in examining the causal association between sRAGE and risk of pancreatic cancer. The variant allele of SNP rs2070600 of the AGER gene was associated with lower levels of sRAGE, and it was associated neither with risk of pancreatic cancer nor with the confounding factors. It was a potentially strong IV (F statistic = 29.2). However, in the second-stage analysis, the GMM-SMM model failed to converge due to non-concavity, probably because of the small sample size. Therefore, the IV analysis could not support the causality of the association between serum sRAGE levels and risk of pancreatic cancer. Nevertheless, these analyses suggest that rs2070600 is a potentially good genetic IV for testing the causality between the risk of pancreatic cancer and sRAGE levels. A larger sample size is required to conduct a credible IV analysis.
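The logic of a single-instrument analysis like the one in this abstract can be illustrated with the simplest linear case. The sketch below is not the GMM-SMM estimator used in the thesis: it swaps in the ratio (Wald, or two-stage least squares with one instrument) estimator for a continuous outcome, together with the first-stage F statistic used to judge instrument strength. The function names and the six toy observations are hypothetical.

```python
def _centered_cross(a, b):
    # Sum of cross-products of deviations from the means.
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))

def first_stage_f(instrument, exposure):
    # F statistic for the slope in the regression of exposure on a single
    # instrument: F = R^2 / (1 - R^2) * (n - 2).
    n = len(instrument)
    szx = _centered_cross(instrument, exposure)
    r2 = szx ** 2 / (_centered_cross(instrument, instrument)
                     * _centered_cross(exposure, exposure))
    return r2 / (1.0 - r2) * (n - 2)

def iv_ratio_estimate(instrument, exposure, outcome):
    # Ratio estimator: cov(Z, Y) / cov(Z, X). With one instrument this is
    # identical to two-stage least squares.
    return _centered_cross(instrument, outcome) / _centered_cross(instrument, exposure)

# Hypothetical toy data: allele count (0/1/2), exposure level, and an outcome
# constructed so that the true effect of exposure on outcome is 2.
allele_count = [0, 1, 2, 0, 1, 2]
exposure_level = [1.0, 2.1, 2.9, 1.2, 1.9, 3.1]
outcome_value = [2.0 * x for x in exposure_level]

f_stat = first_stage_f(allele_count, exposure_level)
effect = iv_ratio_estimate(allele_count, exposure_level, outcome_value)
```

A first-stage F statistic above the commonly used rule-of-thumb value of 10 is usually taken to indicate that the instrument is not weak, which is the sense in which the abstract describes rs2070600 (F = 29.2) as a potentially strong IV.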