21 resultados para Data-driven Methods
Resumo:
Objective: In this secondary data analysis, three statistical methodologies were implemented to handle cases with missing data in a motivational interviewing and feedback study. The aim was to evaluate the impact that these methodologies have on the data analysis. ^ Methods: We first evaluated whether the assumption of missing completely at random held for this study. We then proceeded to conduct a secondary data analysis using a mixed linear model to handle missing data with three methodologies (a) complete case analysis, (b) multiple imputation with explicit model containing outcome variables, time, and the interaction of time and treatment, and (c) multiple imputation with explicit model containing outcome variables, time, the interaction of time and treatment, and additional covariates (e.g., age, gender, smoke, years in school, marital status, housing, race/ethnicity, and if participants play on athletic team). Several comparisons were conducted including the following ones: 1) the motivation interviewing with feedback group (MIF) vs. the assessment only group (AO), the motivation interviewing group (MIO) vs. AO, and the intervention of the feedback only group (FBO) vs. AO, 2) MIF vs. FBO, and 3) MIF vs. MIO.^ Results: We first evaluated the patterns of missingness in this study, which indicated that about 13% of participants showed monotone missing patterns, and about 3.5% showed non-monotone missing patterns. Then we evaluated the assumption of missing completely at random by Little's missing completely at random (MCAR) test, in which the Chi-Square test statistic was 167.8 with 125 degrees of freedom, and its associated p-value was p=0.006, which indicated that the data could not be assumed to be missing completely at random. After that, we compared if the three different strategies reached the same results. For the comparison between MIF and AO as well as the comparison between MIF and FBO, only the multiple imputation with additional covariates by uncongenial and congenial models reached different results. For the comparison between MIF and MIO, all the methodologies for handling missing values obtained different results. ^ Discussions: The study indicated that, first, missingness was crucial in this study. Second, to understand the assumptions of the model was important since we could not identify if the data were missing at random or missing not at random. Therefore, future researches should focus on exploring more sensitivity analyses under missing not at random assumption.^
Resumo:
Despite continued research and public health efforts to reduce smoking during pregnancy, prenatal cessation rates in the United States have decreased and the incidence of low birth weight has increased from 1985 to 1991. Lower socioeconomic status women who are at increased risk for poor pregnancy outcomes may be resistant to current intervention efforts during pregnancy. The purpose of this dissertation was to investigate the determinants of continued smoking and quitting among low-income pregnant women.^ Using data from cross-sectional surveys of 323 low-income pregnant smokers, the first study developed and tested measures of the pros and cons of smoking during pregnancy. The original decisional balance measure for smoking was compared with a new measure that added items thought to be more salient to the target population. Confirmatory factor analysis using structural equation modeling showed neither the original nor new measure fit the data adequately. Using behavioral science theory, content from interviews with the population, and statistical evidence, two 7-item scales representing the pros and cons were developed from a portion (n = 215) of the sample and successfully cross-validated on the remainder of the sample (n = 108). Logistic regression found only pros were significantly associated with continued smoking. In a discriminant function analysis, stage of change was significantly associated with pros and cons of smoking.^ The second study examined the structural relationships between psychosocial constructs representing some of the levels of and the pros and cons of smoking. The cross-sectional design mandates that statements made regarding prediction do not prove causation or directionality from the data or methods analysis. Structural equation modeling found the following: more stressors and family criticism were significantly more predictive of negative affect than social support; a bi-directional relationship was found between negative affect and current nicotine addiction; and negative affect, addiction, stressors, and family criticism were significant predictors of pros of smoking.^ The findings imply reversing the trend of decreasing smoking cessation during pregnancy may require supplementing current interventions for this population of pregnant smokers with programs addressing nicotine addiction, negative affect, and other psychosocial factors such as family functioning and stressors. ^
Resumo:
(1) A mathematical theory for computing the probabilities of various nucleotide configurations is developed, and the probability of obtaining the correct phylogenetic tree (model tree) from sequence data is evaluated for six phylogenetic tree-making methods (UPGMA, distance Wagner method, transformed distance method, Fitch-Margoliash's method, maximum parsimony method, and compatibility method). The number of nucleotides (m*) necessary to obtain the correct tree with a probability of 95% is estimated with special reference to the human, chimpanzee, and gorilla divergence. m* is at least 4,200, but the availability of outgroup species greatly reduces m* for all methods except UPGMA. m* increases if transitions occur more frequently than transversions as in the case of mitochondrial DNA. (2) A new tree-making method called the neighbor-joining method is proposed. This method is applicable either for distance data or character state data. Computer simulation has shown that the neighbor-joining method is generally better than UPGMA, Farris' method, Li's method, and modified Farris method on recovering the true topology when distance data are used. A related method, the simultaneous partitioning method, is also discussed. (3) The maximum likelihood (ML) method for phylogeny reconstruction under the assumption of both constant and varying evolutionary rates is studied, and a new algorithm for obtaining the ML tree is presented. This method gives a tree similar to that obtained by UPGMA when constant evolutionary rate is assumed, whereas it gives a tree similar to that obtained by the maximum parsimony tree and the neighbor-joining method when varying evolutionary rate is assumed. ^
Resumo:
The difficulty of detecting differential gene expression in microarray data has existed for many years. Several correction procedures try to avoid the family-wise error rate in multiple comparison process, including the Bonferroni and Sidak single-step p-value adjustments, Holm's step-down correction method, and Benjamini and Hochberg's false discovery rate (FDR) correction procedure. Each multiple comparison technique has its advantages and weaknesses. We studied each multiple comparison method through numerical studies (simulations) and applied the methods to the real exploratory DNA microarray data, which detect of molecular signatures in papillary thyroid cancer (PTC) patients. According to our results of simulation studies, Benjamini and Hochberg step-up FDR controlling procedure is the best process among these multiple comparison methods and we discovered 1277 potential biomarkers among 54675 probe sets after applying the Benjamini and Hochberg's method to PTC microarray data.^
Resumo:
Next-generation sequencing (NGS) technology has become a prominent tool in biological and biomedical research. However, NGS data analysis, such as de novo assembly, mapping and variants detection is far from maturity, and the high sequencing error-rate is one of the major problems. . To minimize the impact of sequencing errors, we developed a highly robust and efficient method, MTM, to correct the errors in NGS reads. We demonstrated the effectiveness of MTM on both single-cell data with highly non-uniform coverage and normal data with uniformly high coverage, reflecting that MTM’s performance does not rely on the coverage of the sequencing reads. MTM was also compared with Hammer and Quake, the best methods for correcting non-uniform and uniform data respectively. For non-uniform data, MTM outperformed both Hammer and Quake. For uniform data, MTM showed better performance than Quake and comparable results to Hammer. By making better error correction with MTM, the quality of downstream analysis, such as mapping and SNP detection, was improved. SNP calling is a major application of NGS technologies. However, the existence of sequencing errors complicates this process, especially for the low coverage (
New methods for quantification and analysis of quantitative real-time polymerase chain reaction data
Resumo:
Quantitative real-time polymerase chain reaction (qPCR) is a sensitive gene quantitation method that has been widely used in the biological and biomedical fields. The currently used methods for PCR data analysis, including the threshold cycle (CT) method, linear and non-linear model fitting methods, all require subtracting background fluorescence. However, the removal of background fluorescence is usually inaccurate, and therefore can distort results. Here, we propose a new method, the taking-difference linear regression method, to overcome this limitation. Briefly, for each two consecutive PCR cycles, we subtracted the fluorescence in the former cycle from that in the later cycle, transforming the n cycle raw data into n-1 cycle data. Then linear regression was applied to the natural logarithm of the transformed data. Finally, amplification efficiencies and the initial DNA molecular numbers were calculated for each PCR run. To evaluate this new method, we compared it in terms of accuracy and precision with the original linear regression method with three background corrections, being the mean of cycles 1-3, the mean of cycles 3-7, and the minimum. Three criteria, including threshold identification, max R2, and max slope, were employed to search for target data points. Considering that PCR data are time series data, we also applied linear mixed models. Collectively, when the threshold identification criterion was applied and when the linear mixed model was adopted, the taking-difference linear regression method was superior as it gave an accurate estimation of initial DNA amount and a reasonable estimation of PCR amplification efficiencies. When the criteria of max R2 and max slope were used, the original linear regression method gave an accurate estimation of initial DNA amount. Overall, the taking-difference linear regression method avoids the error in subtracting an unknown background and thus it is theoretically more accurate and reliable. This method is easy to perform and the taking-difference strategy can be extended to all current methods for qPCR data analysis.^