11 results for Linear regression method

in DigitalCommons@The Texas Medical Center


Relevance: 100.00%

Abstract:

Quantitative real-time polymerase chain reaction (qPCR) is a sensitive gene quantitation method that has been widely used in the biological and biomedical fields. Current methods for PCR data analysis, including the threshold cycle (CT) method and linear and non-linear model-fitting methods, all require subtracting background fluorescence. However, background removal is usually inaccurate and can therefore distort results. Here, we propose a new method, the taking-difference linear regression method, to overcome this limitation. Briefly, for each pair of consecutive PCR cycles, we subtracted the fluorescence of the earlier cycle from that of the later cycle, transforming the raw data from n cycles into n-1 differences. Linear regression was then applied to the natural logarithm of the transformed data. Finally, the amplification efficiency and the initial number of DNA molecules were calculated for each PCR run. To evaluate the new method, we compared its accuracy and precision with those of the original linear regression method combined with three background corrections: the mean of cycles 1-3, the mean of cycles 3-7, and the minimum. Three criteria (threshold identification, max R², and max slope) were used to select the target data points. Because PCR data are time series data, we also applied linear mixed models. When the threshold identification criterion was applied and when the linear mixed model was adopted, the taking-difference linear regression method was superior, giving an accurate estimate of the initial DNA amount and a reasonable estimate of PCR amplification efficiency. When the max R² and max slope criteria were used, the original linear regression method gave an accurate estimate of the initial DNA amount. Overall, because it avoids the error of subtracting an unknown background, the taking-difference linear regression method is theoretically more accurate and reliable. It is easy to perform, and the taking-difference strategy can be extended to all current methods for qPCR data analysis.
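A minimal Python sketch of the taking-difference idea, assuming an idealized exponential-phase model F_c = B + F0·E^c, where B is a constant unknown background, F0 is the initial signal (proportional to the initial DNA amount), and E is the per-cycle fold increase. Differencing consecutive cycles cancels B, so ln(F_{c+1} - F_c) = ln(F0(E - 1)) + c·ln(E) is a straight line in c. The function name and the noise-free simulated data are illustrative assumptions, not the authors' code, and the selection of cycles (threshold, max R², max slope) is omitted.

```python
import numpy as np

def taking_difference_fit(cycles, fluorescence):
    """Estimate fold-efficiency E and initial signal F0 without subtracting background.

    Assumes every supplied cycle lies in the exponential phase, where
    F_c = B + F0 * E**c with an unknown constant background B.
    """
    cycles = np.asarray(cycles, dtype=float)
    f = np.asarray(fluorescence, dtype=float)
    # Consecutive differences cancel B: D_c = F_{c+1} - F_c = F0 * E**c * (E - 1)
    diffs = np.diff(f)
    if np.any(diffs <= 0):
        raise ValueError("differences must be positive to take logarithms")
    # n cycles -> n - 1 differences, each indexed by its earlier cycle c;
    # ln D_c = ln(F0 * (E - 1)) + c * ln(E) is linear in c
    slope, intercept = np.polyfit(cycles[:-1], np.log(diffs), 1)
    efficiency = np.exp(slope)
    f0 = np.exp(intercept) / (efficiency - 1.0)
    return efficiency, f0

# Simulated noise-free exponential phase sitting on a background of 500 units
cycles = np.arange(1, 16)
raw = 500.0 + 0.01 * 1.9 ** cycles
E, F0 = taking_difference_fit(cycles, raw)
print(f"E = {E:.4f}, F0 = {F0:.6f}")
```

Here E is the fold increase per cycle (2 for perfect doubling); efficiency is often reported instead as E - 1 or as a percentage. In this idealized model the estimates do not depend on the background value at all.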

Relevance: 100.00%

Abstract:

Interaction effects are of scientific interest in many areas of research. A common approach to investigating the interaction effect of two continuous covariates on a response variable is to include a cross-product term in a multiple linear regression. In epidemiological studies, two-way analysis of variance (ANOVA)-type methods have also been used to examine the interaction effect by replacing the continuous covariates with their discretized levels. However, the implications of the model assumptions of either approach have not been examined, and statistical validation has focused on the general methods rather than on the interaction effect specifically.

In this dissertation, we investigated the validity of both approaches based on their mathematical assumptions for non-skewed data. We showed that linear regression may not be an appropriate model when an interaction effect exists, because it implies a highly skewed distribution for the response variable. We also showed that the normality and constant-variance assumptions required by ANOVA are not satisfied in the model where the continuous covariates are replaced with their discretized levels. Therefore, naïve application of the ANOVA method may lead to incorrect conclusions.

Given these problems, we proposed a novel method, modified from the traditional ANOVA approach, to rigorously evaluate the interaction effect. The analytical expression of the interaction effect was derived from the conditional distribution of the response variable given the discretized continuous covariates. A testing procedure that combines the p-values from each level of the discretized covariates was developed to test the overall significance of the interaction effect. In a simulation study, the proposed method was more powerful than least squares regression and the ANOVA method in detecting the interaction effect when the data came from a trivariate normal distribution. The proposed method was applied to a dataset from the National Institute of Neurological Disorders and Stroke (NINDS) tissue plasminogen activator (t-PA) stroke trial, and a baseline age-by-weight interaction was found to be significant in predicting the change from baseline in the NIH Stroke Scale (NIHSS) score at month 3 among patients who received t-PA therapy.
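The conventional cross-product approach mentioned at the start of this abstract can be sketched as an ordinary least-squares fit with an extra x1·x2 column. This is the model the dissertation critiques, not its proposed ANOVA-based method; the function name and the noise-free data are illustrative assumptions.

```python
import numpy as np

def fit_with_interaction(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + b3*x1*x2 (b3 is the interaction)."""
    x1, x2, y = (np.asarray(v, dtype=float) for v in (x1, x2, y))
    # Design matrix: intercept, two main effects, and the cross-product term
    X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [b0, b1, b2, b3]

x1 = np.linspace(-1.0, 1.0, 20)
x2 = np.sin(np.arange(20.0))
y = 1.0 + 2.0 * x1 + 3.0 * x2 + 4.0 * x1 * x2  # exact response with a known interaction
b = fit_with_interaction(x1, x2, y)
print(np.round(b, 6))
```

In this model the slope of y on x1 at a fixed x2 is b1 + b3·x2, which is what "interaction" means here: the effect of one covariate depends on the level of the other.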

Relevance: 100.00%

Abstract:

This thesis is motivated by the problem of drawing causal inferences from observational data in epidemiological research when controlled randomization is not feasible. The instrumental variable (IV) method is one statistical tool for overcoming this problem. Mendelian randomization studies use genetic variants as IVs in genetic association studies. In this thesis, the IV method, as well as standard logistic and linear regression models, was used to investigate the causal association between the risk of pancreatic cancer and circulating levels of the soluble receptor for advanced glycation end-products (sRAGE). A previous observational study (255 cases and 485 controls) found that higher serum sRAGE levels were associated with a lower risk of pancreatic cancer. However, such a novel association may be biased by unknown confounding factors. In a case-control study, we aimed to use the IV approach to confirm or refute this observation in the subset of study subjects for whom genotyping data were available (178 cases and 177 controls). A two-stage IV analysis using generalized method of moments structural mean models (GMM-SMM) was conducted, and the relative risk (RR) was calculated. In the first-stage analysis, we found that the single nucleotide polymorphism (SNP) rs2070600 of the receptor for advanced glycation end-products (AGER) gene met all three general assumptions for a genetic IV in examining the causal association between sRAGE and the risk of pancreatic cancer. The variant allele of rs2070600 was associated with lower sRAGE levels and was associated neither with the risk of pancreatic cancer nor with the confounding factors. It was a potentially strong IV (F statistic = 29.2). However, in the second-stage analysis, the GMM-SMM model failed to converge because of non-concavity, probably owing to the small sample size. Therefore, the IV analysis could not support a causal interpretation of the association between serum sRAGE levels and the risk of pancreatic cancer. Nevertheless, these analyses suggest that rs2070600 is a potentially good genetic IV for testing the causality of the association between sRAGE levels and the risk of pancreatic cancer. A larger sample size is required to conduct a credible IV analysis.
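The thesis used GMM-SMM to estimate a relative risk for a binary outcome. The sketch below swaps in the simpler Wald ratio estimator (the single-instrument form of two-stage least squares) on simulated data with a continuous outcome, only to illustrate the two stages and the first-stage F statistic. The genotype coding, effect sizes, and function names are hypothetical. A first-stage F above about 10 is a commonly used rule of thumb against weak-instrument bias.

```python
import numpy as np

def first_stage_f(z, x):
    """F statistic from regressing exposure x on a single instrument z (equals t**2)."""
    zc, xc = z - z.mean(), x - x.mean()
    b = (zc @ xc) / (zc @ zc)              # first-stage slope
    resid = xc - b * zc
    sigma2 = (resid @ resid) / (len(x) - 2)
    se = np.sqrt(sigma2 / (zc @ zc))
    return (b / se) ** 2

def wald_ratio(z, x, y):
    """IV estimate of the effect of x on y with one instrument: cov(z, y) / cov(z, x)."""
    zc = z - z.mean()
    return (zc @ (y - y.mean())) / (zc @ (x - x.mean()))

rng = np.random.default_rng(0)
n = 20_000
z = rng.binomial(2, 0.3, n).astype(float)  # hypothetical genotype coded 0/1/2
u = rng.normal(size=n)                     # unmeasured confounder
x = 0.5 * z + u + rng.normal(size=n)       # exposure, e.g. a biomarker level
y = 0.8 * x + u + rng.normal(size=n)       # continuous outcome; true effect is 0.8

ols = np.polyfit(x, y, 1)[0]               # naive slope, biased by u
print(f"first-stage F = {first_stage_f(z, x):.1f}")
print(f"OLS slope = {ols:.3f}, IV estimate = {wald_ratio(z, x, y):.3f}")
```

Because u affects both exposure and outcome, the naive slope is biased upward, while the instrument, which is independent of u, recovers an estimate near the simulated effect.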

Relevance: 90.00%

Abstract:

With hundreds of single nucleotide polymorphisms (SNPs) in a candidate gene and millions of SNPs across the genome, selecting an informative subset of SNPs that maximizes the ability to detect genotype-phenotype associations is of great interest and importance. In addition, with large numbers of SNPs, investigators need analytic methods that control the false positive rate arising from many SNP genotype-phenotype analyses. This dissertation uses simulated data to explore methods for selecting SNPs for genotype-phenotype association studies. I examined the pattern of linkage disequilibrium (LD) across a candidate gene region and used this pattern to help localize a disease-influencing mutation. The results indicate that the r² measure of linkage disequilibrium is preferable to the common D′ measure for genotype-phenotype association studies. In stepwise linear regression, the best predictor of the quantitative trait was usually not the single functional mutation but a SNP in high linkage disequilibrium with it. Next, I compared three strategies for selecting SNPs for phenotype association studies: selection based on measures of linkage disequilibrium, selection based on a measure of haplotype diversity, and random selection. The results demonstrate that SNPs selected for maximum haplotype diversity are more informative and yield higher power than randomly selected SNPs or SNPs selected for low pairwise LD. The data also indicate that for genes with a small contribution to the phenotype, it is more prudent for investigators to increase their sample size than to keep increasing the number of SNPs in order to improve statistical power. When typing large numbers of SNPs, researchers face the challenge of using a statistical method that controls the type I error rate while maintaining adequate power. We show that an empirical genotype-based multi-locus global test, which uses permutation testing to estimate the null distribution of the maximum test statistic, maintains the desired overall type I error rate without overly sacrificing statistical power. The results also show that when the penetrance model is simple, the multi-locus global test does as well as or better than haplotype analysis. However, for more complex models, haplotype analyses offer advantages. The results of this dissertation will be useful to human geneticists designing large-scale multi-locus genotype-phenotype association studies.
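For two biallelic loci with allele frequencies p_A and p_B and haplotype frequency p_AB, the standard definitions are D = p_AB - p_A·p_B, D′ = D / D_max (Lewontin's normalization), and r² = D² / (p_A(1 - p_A)·p_B(1 - p_B)). The sketch below (function name and frequencies hypothetical) shows how the two measures can disagree: a rare allele that always sits on one haplotype background gives D′ = 1 but a small r².

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of alleles A and B, strictly between 0 and 1
    """
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D (Lewontin's normalization)
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Rare allele A (10%) always found with allele B (50%): D' is maximal, r^2 is small
print(ld_measures(p_ab=0.1, p_a=0.1, p_b=0.5))
```

Since the sample size needed to detect an association through a marker scales roughly with 1/r², r² is the more direct guide to power, which fits the preference for r² stated in the abstract.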

Relevance: 90.00%

Abstract:

Lung damage is a common side effect of chemotherapeutic drugs such as bleomycin. This study used a bleomycin mouse model that simulates the lung damage observed in humans. Noninvasive, in vivo cone-beam computed tomography (CBCT) was used to visualize and quantify fibrotic and inflammatory damage over the entire lung volume of mice. Bleomycin was used to induce pulmonary damage in vivo, and the results from two CBCT systems, a micro-CT and a flat-panel CT (fpCT), were compared with histologic measurements, the standard method for quantifying murine lung damage. Twenty C57BL/6 mice were given either 3 U/kg of bleomycin or saline intratracheally. The mice were scanned at baseline, before the administration of bleomycin, and then 10, 14, and 21 days afterward. At each time point, a subset of mice was sacrificed for histologic analysis. The resulting CT images were used to assess lung volume. Percent lung damage (PLD) was calculated for each mouse on both the fpCT (PLDfpCT) and the micro-CT (PLDμCT). Histologic PLD (PLDH) was calculated for each histologic section at each time point (day 10, n = 4; day 14, n = 4; day 21, n = 5; control group, n = 5). Linear regression was applied to PLDfpCT vs. PLDH, PLDμCT vs. PLDH, and PLDfpCT vs. PLDμCT. This study did not demonstrate strong correlations between PLDCT and PLDH: the coefficient of determination, R², was 0.68 for PLDμCT vs. PLDH and 0.75 for PLDfpCT vs. PLDH. The experimental issues identified in this study were: (1) inconsistent lung inflation from scan to scan, (2) variable distribution of damage (a single histologic section was not representative of overall lung damage), (3) control mice were not scanned with each group of bleomycin-treated mice, (4) using two CT systems prolonged anesthesia time for the mice, and (5) respiratory gating did not keep lung volume constant throughout the scan. Addressing these issues might further improve the correlation between PLDCT and PLDH.
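The abstract reports a coefficient of determination for each pair of PLD measures. In simple linear regression this equals the square of Pearson's correlation r, which is why R² and r are easy to confuse. A minimal sketch with hypothetical paired values (not the study's data) computes R² from the residual and total sums of squares and checks it against r².

```python
import numpy as np

def simple_regression_r2(x, y):
    """Fit y = a + b*x by least squares and return (a, b, R^2)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    b, a = np.polyfit(x, y, 1)            # polyfit returns [slope, intercept]
    fitted = a + b * x
    ss_res = np.sum((y - fitted) ** 2)    # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    return a, b, 1.0 - ss_res / ss_tot

# Hypothetical paired measurements, e.g. CT-based vs. histologic percent damage
ct = [1, 2, 3, 4, 5]
histo = [2, 4, 5, 4, 5]
a, b, r2 = simple_regression_r2(ct, histo)
r = np.corrcoef(ct, histo)[0, 1]
print(f"slope = {b:.2f}, R^2 = {r2:.2f}, Pearson r = {r:.3f}")
```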

Relevance: 90.00%

Abstract:

Purpose. To determine whether changes in self-efficacy (SE) predicted total fat (TF) and total fiber (TFB) intake, and to examine the relationship between SE changes and these two dietary outcomes.

Design. A secondary analysis of baseline and first follow-up (FFU) data from NULIFE, a randomized trial.

Setting. Nutrition classes were taught at the Texas Medical Center in Houston, Texas.

Participants. Seventy-nine premenopausal African American women aged 25-45 years; the response rate at FFU was 85%.

Method. Dietary intake was assessed with the Arizona Food Frequency Questionnaire and SE with the Self Efficacy for Dietary Change Questionnaire. Analyses were done in Stata version 9, using linear and logistic regression with adjustment for confounders.

Results. Linear regression analyses showed that changes in SE for eating fruits and vegetables predicted total fiber intake in the control group in both the univariate (P = 0.001) and multivariate (P = 0.01) models, while SE for eating fruits and vegetables at first follow-up predicted total fiber intake in the intervention group in both models (P < 0.001). Logistic regression of changes in low-fat SE on achieving 30% or less of calories from total fat showed an adjusted OR of 0.22 (95% CI = 0.03, 1.48; P = 0.12) in the intervention group. Logistic regression of changes in fruit-and-vegetable SE on achieving a total fiber intake of 10 g or more showed an adjusted OR of 6.25 (95% CI = 0.53, 72.78; P = 0.14) in the control group.

Conclusion. SE for eating fruits and vegetables at first follow-up predicted TFB intake in the intervention group, and intervention women who increased their SE for eating a low-fat diet were more likely to achieve the study goal of 30% or less of calories from TF. Changes in SE for eating fruits and vegetables predicted TFB intake in the control group, and control women who increased their SE for eating fruits and vegetables were more likely to achieve the study goal of 10 g or more of TFB. Limitations include the use of self-report measures, the small sample size, and possible contamination of the control group.
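The odds ratios, confidence intervals, and P values reported above are linked if, as is typical of logistic regression output, the intervals are Wald intervals on the log-odds scale (an assumption; the abstract does not say how they were computed). Then ln(OR) is the midpoint of ln(lower) and ln(upper), SE = (ln(upper) - ln(lower)) / (2 × 1.96), and the two-sided P follows from z = ln(OR) / SE. The helper names below are hypothetical, and only the standard library is used.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution

def wald_from_ci(lower, upper, z=Z_975):
    """Recover OR, SE of ln(OR), and two-sided Wald P from a reported 95% CI.

    Assumes the CI was computed as exp(ln(OR) +/- z * SE).
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    log_or = (log_lo + log_hi) / 2        # midpoint on the log-odds scale
    se = (log_hi - log_lo) / (2 * z)      # half-width divided by z
    z_stat = log_or / se
    p_two_sided = math.erfc(abs(z_stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return math.exp(log_or), se, p_two_sided

def wald_ci(log_or, se, z=Z_975):
    """Forward direction: OR and 95% CI from a logistic coefficient and its SE."""
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

or_hat, se, p = wald_from_ci(0.03, 1.48)
print(f"OR = {or_hat:.2f}, SE(ln OR) = {se:.2f}, P = {p:.3f}")
```

Applied to the interval (0.03, 1.48), this gives an OR of about 0.21 and P of about 0.12, consistent with the reported OR of 0.22 and P = 0.12 up to rounding of the published bounds.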

Relevance: 90.00%

Abstract:

Purpose. To examine the association between living in proximity to Toxics Release Inventory (TRI) facilities and the incidence of childhood cancer in the State of Texas.

Design. A secondary data analysis of the publicly available Toxics Release Inventory (TRI), maintained by the U.S. Environmental Protection Agency, which lists the facilities that release any of the 650 TRI chemicals. County-level total childhood cancer cases and childhood cancer rates (ages 0-14 years) for 1995-2003 were obtained from the Texas Cancer Registry, available on the Texas Department of State Health Services website.

Setting. This study was limited to the child population of the State of Texas.

Method. Analyses were done in Stata version 9 and SPSS version 15.0. SaTScan was used for geographic spatial clustering of childhood cancer cases based on county centroids, using the Poisson clustering algorithm, which adjusts for population density. Maps were created with MapInfo Professional version 8.0.

Results. One hundred and twenty-five counties had no TRI facilities, while 129 counties had at least one TRI facility. Across cancer-rate quartiles, the number of facilities and total disposal showed an increasing trend, except in the highest quartile. In linear regression with the log-transformed number of facilities and total disposal as predictors of cancer rates, neither variable was a significant predictor. Seven significant geographic clusters of counties with high childhood cancer rates (p < 0.05) were identified. Binomial logistic regression, with the cancer rate dichotomized into two groups (≤150 and >150), gave an odds ratio of 1.58 (CI 1.127, 2.222) for the natural log of the number of facilities.

Conclusion. We used a unique methodology that combines GIS and spatial clustering techniques with existing statistical approaches to examine the association between living in proximity to TRI facilities and the incidence of childhood cancer in the State of Texas. Although no definitive association was found, further studies examining specific TRI chemicals are needed. This information can help researchers and the public identify potential concerns, better understand potential risks, and work with industry and government to reduce toxic chemical use, disposal, and other releases, as well as the associated risks. TRI data, in conjunction with other information, can serve as a starting point for evaluating exposures and risks.
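The odds ratio of 1.58 is reported per unit of the natural log of the number of facilities, which is easy to misread. Assuming a model of the form logit(p) = b0 + b1·ln(x), a one-unit increase in ln(x) means multiplying x by e (about 2.72), and multiplying x by any factor k multiplies the odds by OR^ln(k). How counties with zero facilities entered the log scale is not stated, and this sketch ignores that question; the function name is hypothetical.

```python
import math

def odds_multiplier(or_per_log_unit, fold_change):
    """Factor by which the odds change when x is multiplied by fold_change.

    Assumes logit(p) = b0 + b1 * ln(x), so the OR per one-unit change in ln(x) is exp(b1).
    Multiplying x by k adds ln(k) to ln(x), scaling the odds by exp(b1 * ln k) = OR ** ln(k).
    """
    return or_per_log_unit ** math.log(fold_change)

# Reported OR and CI bounds per unit of ln(number of facilities), re-expressed per doubling
for label, value in [("OR", 1.58), ("lower", 1.127), ("upper", 2.222)]:
    print(label, round(odds_multiplier(value, 2.0), 3))
```

Under this reading, a doubling of the number of facilities corresponds to roughly 1.37 times the odds of a county falling in the higher cancer-rate group; because the transformation is monotone, the CI bounds can be converted the same way.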

Relevance: 90.00%

Abstract:

Strategies are compared for developing a linear regression model with stochastic (multivariate normal) regressor variables and subsequently assessing its predictive ability. The bias and mean squared error of four estimators of predictive performance are evaluated in samples simulated from 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, C_p and S_p, each combined with either 'all possible subsets' or 'forward selection' of variables. The performance estimators include parametric (MSEP_m) and non-parametric (PRESS) assessments in the entire sample, and two data-splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures.

The subset selection techniques examined do not generally improve predictions relative to the full model. 'Forward selection' approaches give slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches, but no differences are detected between the performance of C_p and S_p. In every case, the prediction errors of models obtained by subset selection in either half split exceed those obtained using all predictors and the entire sample.

Only the random-split estimator is conditionally (on β) unbiased; however, MSEP_m is unbiased on average, and PRESS nearly so, in unselected (fixed-form) models. When subset selection techniques are used, MSEP_m and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent smaller than that of the unbiased random-split estimator. The DUPLEX split estimator suffers from large MSE as well as bias and seems of little value when the regressor variables are stochastic.

To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development and a leave-one-out statistic (e.g., PRESS) for assessment.
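PRESS, the leave-one-out statistic recommended above, is the sum of squared prediction errors when each case is predicted from a model fitted without it. For ordinary least squares it can be computed from a single fit using the leverages h_ii, since the leave-one-out residual equals e_i / (1 - h_ii). The sketch below, on simulated data (an assumption; function names hypothetical), checks that identity against brute-force refitting.

```python
import numpy as np

def press_via_hat(X, y):
    """PRESS from one fit: sum of (e_i / (1 - h_ii))**2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Leverages h_ii: diagonal of the hat matrix H = X (X'X)^-1 X'
    h = np.einsum("ij,ji->i", X, np.linalg.solve(X.T @ X, X.T))
    return np.sum((resid / (1.0 - h)) ** 2)

def press_brute_force(X, y):
    """PRESS by refitting n times, each time predicting the held-out case."""
    total = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += (y[i] - X[i] @ beta) ** 2
    return total

rng = np.random.default_rng(1)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 predictors
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(scale=0.5, size=n)
print(press_via_hat(X, y), press_brute_force(X, y))
```

The single-fit form is why PRESS is cheap enough to use on the entire sample, whereas data splitting sacrifices half of the data for validation.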

Relevance: 90.00%

Abstract:

Ovarian cancer is the leading cause of death from gynecologic cancer in women, largely because there is no specific method for its early detection. There is great interest in finding molecular biomarkers that are sensitive and specific for ovarian cancer, for use in early diagnosis, prognosis, and therapy. MicroRNAs (miRNAs) have been proposed as potential biomarkers for cancer prevention and therapy. This study analyzed miRNA and mRNA expression data extracted from The Cancer Genome Atlas (TCGA) database. Using simple linear regression and multiple regression models, we found 71 negatively associated miRNA-mRNA pairs involving 56 miRNAs and 24 genes of the PI3K/AKT pathway. Of these pairs, 9 agreed with the predictions of the most commonly used target prediction resources, including miRGen, miRDB, miRTarBase, and miR2Disease. These shared miRNA-mRNA pairs were considered the most likely to be involved in ovarian cancer. Furthermore, 4 of the 9 target genes encode cell cycle- or apoptosis-related proteins (Cyclin D1, p21, FOXO1, and Bcl2), suggesting that their regulatory miRNAs, including miR-16, miR-96, and miR-21, most likely play important roles in promoting tumor growth by dysregulating the cell cycle or apoptosis. miR-96 was also found to directly target IRS-1. In addition, the results showed that miR-17 and miR-9 may be involved in ovarian cancer by targeting JAK1. This study may provide evidence for using individual miRNAs or miRNA profiles as biomarkers.
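A rough sketch of screening miRNA-mRNA pairs for a negative association with simple linear regression. The dictionary input format, the 0.05 threshold, the normal approximation to the slope's t distribution (adequate only for large samples), and the absence of any multiple-testing correction are all assumptions for illustration; the study's multiple regression models and TCGA data handling are not shown.

```python
import math
import numpy as np

def negative_pairs(mirna, mrna, alpha=0.05):
    """Return (miRNA, gene, slope, p) for pairs with a significantly negative slope.

    mirna, mrna: dicts mapping a name to an expression vector over the same samples.
    """
    hits = []
    for m_name, m_expr in mirna.items():
        x = np.asarray(m_expr, dtype=float)
        xc = x - x.mean()                      # centered miRNA expression
        sxx = float(xc @ xc)
        if sxx == 0.0:                         # constant miRNA: slope undefined
            continue
        for g_name, g_expr in mrna.items():
            yc = np.asarray(g_expr, dtype=float)
            yc = yc - yc.mean()                # centered mRNA expression
            slope = float(xc @ yc) / sxx       # least-squares slope of mRNA on miRNA
            resid = yc - slope * xc
            se = math.sqrt(float(resid @ resid) / (len(yc) - 2) / sxx)
            if se == 0.0:                      # perfect fit
                p = 0.0 if slope != 0.0 else 1.0
            else:
                p = math.erfc(abs(slope / se) / math.sqrt(2))  # two-sided, normal approx.
            if slope < 0 and p < alpha:
                hits.append((m_name, g_name, slope, p))
    return hits

samples = np.arange(60.0)                      # 60 hypothetical tumor samples
mirna = {"miR-A": samples}
mrna = {
    "GENE_NEG": -2.0 * samples + 5.0 * np.sin(samples),  # falls as miR-A rises
    "GENE_POS": 3.0 * samples + 5.0 * np.cos(samples),   # rises with miR-A
}
hits = negative_pairs(mirna, mrna)
for m_name, g_name, slope, p in hits:
    print(m_name, g_name, round(slope, 2), p)
```

With many thousands of candidate pairs, a false discovery rate correction would usually be applied before comparing hits with target prediction resources.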

Relevance: 90.00%

Abstract:

Purpose: School districts in the U.S. regularly offer foods that compete with the USDA reimbursable meal, known as 'a la carte' foods. These foods must adhere to state nutritional regulations; however, the implementation of these regulations often differs across districts. The purpose of this study is to compare the effects of two methods of offering a la carte foods on students' lunch intake: (1) an extensive a la carte program, in which schools have a separate area for a la carte food sales that includes non-reimbursable entrees; and (2) a moderate a la carte program, which sells a la carte foods on the same serving line as reimbursable meals.

Methods: Direct observation was used to assess children's lunch consumption in six schools across two districts in Central Texas (n = 373 observations). Schools were matched on socioeconomic status. Data collectors were randomly assigned to students and recorded foods obtained, foods consumed, source of food, gender, grade, and ethnicity. Observations were entered into a nutrient database program, FIAS Millennium Edition, to obtain nutritional information. Differences in energy and nutrient intake across lunch sources and districts were assessed using ANOVA and independent t-tests. A linear regression model was applied to control for potential confounders.

Results: Students at schools with extensive a la carte programs consumed significantly more calories, carbohydrates, total fat, saturated fat, calcium, and sodium than students at schools with moderate a la carte offerings (p < .05). Students in the extensive a la carte program consumed approximately 94 calories more than students in the moderate a la carte program. Energy intake did not differ significantly between students who consumed any amount of a la carte food and students who consumed none. In both districts, students who consumed a la carte offerings were more likely to consume sugar-sweetened beverages, sweets, chips, and pizza than students who consumed no a la carte foods.

Conclusion: The amount, type, and method of offering a la carte foods can significantly affect students' dietary intake. This pilot study indicates that when a la carte foods are more available, students consume more calories. The findings underscore the need for further investigation of how the availability of a la carte foods affects children's diets. Guidelines for school a la carte offerings should be optimized to encourage the consumption of healthful foods and appropriate energy intake.

Relevance: 90.00%

Abstract:

Background: Little is known about how patient adherence is affected when the same study drug is administered at the same dose to two populations with different diseases in two different clinical trials. The Minocycline in Rheumatoid Arthritis (MIRA) trial and the NIH Exploratory Trials in Parkinson's disease (NET-PD) Futility Study I provide a unique opportunity to examine this question and to compare methods of measuring adherence. This study may increase understanding of the influence of disease and adverse events on patient adherence and provide insights for investigators selecting adherence assessment methods in future clinical trials of minocycline and other drugs.

Methods: Minocycline adherence measured by pill count, and the effect of adverse events on it, were compared between the MIRA and NET-PD FS1 trials using multivariable linear regression. Within the MIRA trial, agreement between assay and pill count was assessed. The association of adverse events with assay-measured adherence was examined using multivariable logistic regression.

Results: Adherence derived from pill count did not differ significantly between the MIRA and NET-PD FS1 trials. Adverse events potentially related to minocycline did not appear useful for predicting minocycline adherence. In the MIRA trial, adherence measured by pill count appeared higher than adherence measured by assay. Agreement between pill count and assay was poor (kappa statistic = 0.25).

Limitations: Trial and disease are completely confounded; hence, the independent effect of disease on adherence to minocycline treatment cannot be studied.

Conclusion: Simple pill count may be preferable to assay for measuring adherence in minocycline clinical trials. Assays may be less sensitive in clinical settings where appointments are not scheduled relative to the time of medication administration, because assay results depend on many pharmacokinetic and instrument-related factors. However, pill count can be manipulated by the patient. Another study suggested that self-report is more sensitive than pill count in distinguishing adherence from non-adherence. No effect of medication-related adverse events on adherence could be detected.
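The kappa statistic reported for pill count vs. assay is Cohen's kappa, which compares the observed agreement p_o with the agreement p_e expected by chance from the marginal totals: κ = (p_o - p_e) / (1 - p_e). The abstract does not say how adherence was categorized, so the sketch below assumes a hypothetical adherent/non-adherent classification with made-up counts.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] = number of subjects placed in category i by method 1
    (e.g. pill count) and category j by method 2 (e.g. assay).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n          # p_o: diagonal share
    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2  # p_e: chance
    return (observed - expected) / (1 - expected)

# Hypothetical counts: rows = pill count (adherent, non-adherent),
# columns = assay (adherent, non-adherent)
print(cohens_kappa([[20, 5], [10, 15]]))
```

Here the methods agree on 70% of subjects, but 50% agreement would be expected by chance alone, so kappa is only about 0.4; this is why raw percent agreement can overstate how well two adherence measures match.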