5 resultados para Mathematical chance method
em DigitalCommons@The Texas Medical Center
Resumo:
A non-parametric method was developed and tested to compare the partial areas under two correlated Receiver Operating Characteristic curves. Based on the theory of generalized U-statistics the mathematical formulas have been derived for computing ROC area, and the variance and covariance between the portions of two ROC curves. A practical SAS application also has been developed to facilitate the calculations. The accuracy of the non-parametric method was evaluated by comparing it to other methods. By applying our method to the data from a published ROC analysis of CT image, our results are very close to theirs. A hypothetical example was used to demonstrate the effects of two crossed ROC curves. The two ROC areas are the same. However each portion of the area between two ROC curves were found to be significantly different by the partial ROC curve analysis. For computation of ROC curves with large scales, such as a logistic regression model, we applied our method to the breast cancer study with Medicare claims data. It yielded the same ROC area computation as the SAS Logistic procedure. Our method also provides an alternative to the global summary of ROC area comparison by directly comparing the true-positive rates for two regression models and by determining the range of false-positive values where the models differ. ^
Resumo:
With the aim of understanding the mechanism of molecular evolution, mathematical problems on the evolutionary change of DNA sequences are studied. The problems studied and the results obtained are as follows: (1) Estimation of evolutionary distance between nucleotide sequences. Studying the pattern of nucleotide substitution for the case of unequal substitution rates, a new mathematical formula for estimating the average number of nucleotide substitutions per site between two homologous DNA sequences is developed. It is shown that this formula has a wider applicability than currently available formulae. A statistical method for estimating the number of nucleotide changes due to deletion and insertion is also developed. (2) Biases of the estimates of nucleotide substitutions obtained by the restriction enzyme method. The deviation of the estimate of nucleotide substitutions obtained by the restriction enzyme method from the true value is investigated theoretically. It is shown that the amount of the deviation depends on the nucleotides in the recognition sequence of the restriction enzyme used, unequal rates of substitution among different nucleotides, and nucleotide frequences, but the primary factor is the unequal rates of nucleotide substitution. When many different kinds of enzymes are used, however, the amount of average deviation is generally small. (3) Distribution of restriction fragment lengths. To see the effect of undetectable restriction fragments and fragment differences on the estimate of nucleotide differences, the theoretical distribution of fragment lengths is studied. This distribution depends on the type of restriction enzymes used as well as on the relative frequencies of four nucleotides. It is shown that undetectability of small fragments or fragment differences gives a serious underestimate of nucleotide substitutions when the length-difference method of estimation is used, but the extent of underestimation is small when the site-difference method is used. (4) Evolutionary relationships of DNA sequences in finite populations. A mathematical theory on the expected evolutionary relationships among DNA sequences (nucleons) randomly chosen from the same or different populations is developed under the assumption that the evolutionary change of nucleons is determined solely by mutation and random genetic drift. . . . (Author's abstract exceeds stipulated maximum length. Discontinued here with permission of author). UMI ^
Resumo:
In epidemiology literature, it is often required to investigate the relationships between means where the levels of experiment are actually monotone sets forming a partition on the range of sampling values. With this need, the analysis of these group means is generally performed using classical analysis of variance (ANOVA). However, this method has never been challenged. In this dissertation, we will formulate and present our examination of its validity. First, the classical assumptions of normality and constant variance are not always true. Second, under the null hypothesis of equal means, the test statistic for the classical ANOVA technique is still valid. Third, when the hypothesis of equal means is rejected, the classical analysis techniques for hypotheses of contrasts are not valid. Fourth, under the alternative hypothesis, we can show that the monotone property of levels leads to the conclusion that the means are monotone. Fifth, we propose an appropriate method for handing the data in this situation. ^
Resumo:
Objectives. This paper seeks to assess the effect on statistical power of regression model misspecification in a variety of situations. ^ Methods and results. The effect of misspecification in regression can be approximated by evaluating the correlation between the correct specification and the misspecification of the outcome variable (Harris 2010).In this paper, three misspecified models (linear, categorical and fractional polynomial) were considered. In the first section, the mathematical method of calculating the correlation between correct and misspecified models with simple mathematical forms was derived and demonstrated. In the second section, data from the National Health and Nutrition Examination Survey (NHANES 2007-2008) were used to examine such correlations. Our study shows that comparing to linear or categorical models, the fractional polynomial models, with the higher correlations, provided a better approximation of the true relationship, which was illustrated by LOESS regression. In the third section, we present the results of simulation studies that demonstrate overall misspecification in regression can produce marked decreases in power with small sample sizes. However, the categorical model had greatest power, ranging from 0.877 to 0.936 depending on sample size and outcome variable used. The power of fractional polynomial model was close to that of linear model, which ranged from 0.69 to 0.83, and appeared to be affected by the increased degrees of freedom of this model.^ Conclusion. Correlations between alternative model specifications can be used to provide a good approximation of the effect on statistical power of misspecification when the sample size is large. When model specifications have known simple mathematical forms, such correlations can be calculated mathematically. Actual public health data from NHANES 2007-2008 were used as examples to demonstrate the situations with unknown or complex correct model specification. Simulation of power for misspecified models confirmed the results based on correlation methods but also illustrated the effect of model degrees of freedom on power.^
Resumo:
Interaction effect is an important scientific interest for many areas of research. Common approach for investigating the interaction effect of two continuous covariates on a response variable is through a cross-product term in multiple linear regression. In epidemiological studies, the two-way analysis of variance (ANOVA) type of method has also been utilized to examine the interaction effect by replacing the continuous covariates with their discretized levels. However, the implications of model assumptions of either approach have not been examined and the statistical validation has only focused on the general method, not specifically for the interaction effect.^ In this dissertation, we investigated the validity of both approaches based on the mathematical assumptions for non-skewed data. We showed that linear regression may not be an appropriate model when the interaction effect exists because it implies a highly skewed distribution for the response variable. We also showed that the normality and constant variance assumptions required by ANOVA are not satisfied in the model where the continuous covariates are replaced with their discretized levels. Therefore, naïve application of ANOVA method may lead to an incorrect conclusion. ^ Given the problems identified above, we proposed a novel method modifying from the traditional ANOVA approach to rigorously evaluate the interaction effect. The analytical expression of the interaction effect was derived based on the conditional distribution of the response variable given the discretized continuous covariates. A testing procedure that combines the p-values from each level of the discretized covariates was developed to test the overall significance of the interaction effect. According to the simulation study, the proposed method is more powerful then the least squares regression and the ANOVA method in detecting the interaction effect when data comes from a trivariate normal distribution. The proposed method was applied to a dataset from the National Institute of Neurological Disorders and Stroke (NINDS) tissue plasminogen activator (t-PA) stroke trial, and baseline age-by-weight interaction effect was found significant in predicting the change from baseline in NIHSS at Month-3 among patients received t-PA therapy.^