9 results for statistical techniques
in DigitalCommons@The Texas Medical Center
Abstract:
The main goal of this study was to relate physical changes in image quality, measured by the Modulation Transfer Function (MTF), to diagnostic accuracy.
One hundred and fifty conventional craniocaudal mammograms, made with the Kodak Min-R screen/film combination on the Pfizer Microfocus Mammographic system, were selected from the files of the Department of Radiology at M.D. Anderson Hospital and Tumor Institute.
The mammograms included 88 cases with a variety of benign diagnoses and 62 cases with a variety of malignant biopsy diagnoses. The average age of the patient population was 55 years. Seventy cases presented calcifications, 30 of them smaller than 0.5 mm. Forty-six cases presented irregularly bordered masses larger than 1 cm. Thirty cases presented smooth-bordered masses, 20 of them larger than 1 cm.
Four separate copies of the original images were made, each with a different change in the MTF, using a defocusing technique in which the copies were obtained by light exposure through different thicknesses (spacings) of transparent film base.
The mammograms were randomized and evaluated by three experienced mammographers for the visibility of various anatomical breast structures and pathological lesions (masses and calcifications), for subjective image quality, and for mammographic interpretation.
The 3,000 separate evaluations were analyzed with several statistical techniques, including Receiver Operating Characteristic (ROC) curve analysis, the McNemar test for differences between proportions, and the weighted kappa agreement method of Landis et al. for ordinal categorical data.
The statistical analysis showed: (1) there were no statistically significant differences in diagnostic accuracy among the observers when diagnosing from mammograms with the same MTF; (2) there were no statistically significant differences in diagnostic accuracy for any observer when diagnosing from mammograms with the different MTFs used in the study; (3) there were statistically significant differences in detail visibility between the copies and the originals, with detail visibility better in the originals; (4) feature interpretations were not significantly different between the originals and the copies; (5) perception of image quality did not affect image interpretation.
This research could be continued and improved by using a case population more sensitive to MTF changes, i.e., asymptomatic women with minimal breast cancer; by having more observers (including less experienced radiologists and experienced technologists) collaborate in the study; and by using a minimum of 200 benign and 200 malignant cases.
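Two of the statistics named above can be sketched in a few lines. This is a minimal illustration, not the study's analysis: `mcnemar` is the textbook continuity-corrected McNemar test on the two discordant counts of a paired 2x2 table (for example, a lesion seen on the original but not on a copy, and vice versa), and `linear_weighted_kappa` is Cohen's weighted kappa with linear weights for two raters scoring ordinal categories. The Landis et al. method cited in the abstract is not reproduced here; function names and example counts are hypothetical.

```python
import math
from collections import Counter


def mcnemar(b, c):
    """Continuity-corrected McNemar test for paired proportions.

    b, c: the two discordant cell counts of a paired 2x2 table (hypothetical
    example: lesion seen on the original but not the copy, and vice versa).
    Returns (chi2, p_value) using a chi-square distribution with 1 df.
    """
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def linear_weighted_kappa(rater_a, rater_b, k):
    """Cohen's weighted kappa (linear weights) for two raters scoring k ordered
    categories coded 0..k-1. Not the Landis et al. multi-observer method."""
    n = len(rater_a)
    joint = Counter(zip(rater_a, rater_b))
    row, col = Counter(rater_a), Counter(rater_b)

    def weight(i, j):
        # Full credit on the diagonal, linearly less credit further away.
        return 1.0 - abs(i - j) / (k - 1)

    cells = [(i, j) for i in range(k) for j in range(k)]
    p_obs = sum(weight(i, j) * joint[(i, j)] / n for i, j in cells)
    p_exp = sum(weight(i, j) * row[i] * col[j] / n ** 2 for i, j in cells)
    return (p_obs - p_exp) / (1.0 - p_exp)
```

With 10 discordant pairs in one direction and 2 in the other, `mcnemar(10, 2)` gives a chi-square of 49/12 and a p-value of roughly 0.04. Linear weights give partial credit to near-misses on an ordinal visibility scale, which is why weighted rather than unweighted kappa suits ordinal ratings.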
Abstract:
Hierarchically clustered populations are often encountered in public health research, but the traditional methods used to analyze this type of data are not always adequate. For survival time data, more appropriate methods have only begun to emerge in the last couple of decades. These include multilevel statistical techniques, which are more complicated to implement than traditional methods but more appropriate.
One population known to exhibit a hierarchical structure is that of patients who use the health care system of the Department of Veterans Affairs, where patients are grouped not only by hospital but also by geographic network (Veterans Integrated Service Network, VISN). This project analyzes survival time data sets housed at the Houston Veterans Affairs Medical Center Research Department using two Cox proportional hazards regression models, a traditional model and a multilevel model. VISNs with significantly higher or lower survival rates than the rest are identified separately for each model.
In this particular case, although the results of the two models differ, the differences are not large enough to warrant the more complex multilevel technique. This is shown by the small variance estimates associated with levels two and three in the multilevel Cox analysis. Much of the difference in which VISNs are identified as having high or low survival rates is attributable to computer hardware difficulties rather than to any significant improvement in the model.
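The traditional (single-level) model in this comparison is the Cox proportional hazards model. The sketch below shows how its coefficient is estimated by maximizing the partial likelihood with Newton-Raphson, assuming a single covariate (for example, a hypothetical 0/1 indicator for one VISN) and no tied event times. It does not include the hospital- and VISN-level random effects of the multilevel model, and the function names are illustrative.

```python
import math


def score_and_information(beta, times, events, x):
    """Score U(beta) and observed information I(beta) of the Cox partial
    likelihood for one covariate, assuming no tied event times."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    score = 0.0
    info = 0.0
    for pos, i in enumerate(order):
        if not events[i]:
            continue  # censored subjects contribute only through risk sets
        risk_set = order[pos:]  # everyone still under observation at times[i]
        w = [math.exp(beta * x[j]) for j in risk_set]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk_set))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk_set))
        mean_x = s1 / s0  # risk-weighted mean covariate in the risk set
        score += x[i] - mean_x
        info += s2 / s0 - mean_x ** 2
    return score, info


def fit_cox_1d(times, events, x, tol=1e-10, max_iter=50):
    """Newton-Raphson maximization of the partial likelihood from beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        u, i_obs = score_and_information(beta, times, events, x)
        step = u / i_obs
        beta += step
        if abs(step) < tol:
            break
    return beta
```

The estimated hazard ratio is `math.exp(beta)`. A multilevel (frailty) version would add a random effect for each hospital and VISN to the linear predictor; the abstract reports that those variance components were small in this data.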
Abstract:
An analysis of variation in hospital inpatient charges in the greater Houston area is conducted to determine whether there are consistent differences among payers. Differences in charges are examined for 59 Composite Diagnosis Related Groups (CDRGs), and two regression equations estimating charges are specified. Simple comparisons of mean charges by diagnostic category show significant differences for 42 (71 percent) of the 59 categories examined. In 41 of the 42 significant categories, charges to Medicaid were lower than charges to private insurers. Meta-analytic statistical techniques yielded a weighted average effect size of -0.7198 across the 59 diagnostic categories, indicating an overall effect of Medicaid charges being lower than private insurance charges. A multiple regression estimating charges showed that private insurance was a significant independent variable, along with age, length of stay, and hospital variables. The results indicate consistent differential charges by payer in the present analysis.
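The abstract does not say which effect-size measure or weighting scheme produced the -0.7198 figure. As an assumption, the sketch below computes Cohen's d for each diagnostic category (Medicaid mean minus private-insurer mean, divided by the pooled standard deviation) and combines categories with fixed-effect inverse-variance weights, a common meta-analytic choice. A negative pooled value means Medicaid charges are lower on average. Function names are hypothetical.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference (group 1 minus group 2) using the pooled SD."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def d_variance(d, n1, n2):
    """Large-sample sampling variance of Cohen's d."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def weighted_mean_effect(effects):
    """Fixed-effect (inverse-variance) weighted mean of (d, variance) pairs;
    more precisely estimated categories get more weight."""
    weights = [1.0 / var for _, var in effects]
    return sum(w * d for w, (d, _) in zip(weights, effects)) / sum(weights)
```

In use, each category would contribute one `(d, d_variance(d, n1, n2))` pair, and `weighted_mean_effect` pools them into a single summary effect.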
Abstract:
The possibility of a relationship between American trypanosomiasis (Chagas' disease) and pregnancy outcome was analyzed by measuring feto-maternal morbidity and mortality in a sample of 604 pregnant women and their offspring seen at the Hospital Universitario de Maternidad y Neonatologia in Cordoba, Argentina, during 1979.
A cross-sectional, "case-comparison" investigation was employed to determine the degree of risk associated with a reactive Chagas serologic test for a negative pregnancy outcome, defined as abortion, stillbirth, or infant death before one week of age. Patients were selected on a dichotomous 0-1 scale according to the presence or absence of a reactive Machado-Guerreiro complement fixation serologic blood test result.
The data were analyzed using statistical techniques appropriate for comparing the case and control groups across demographic and socioeconomic variables such as age, marital status, educational attainment, and residence. The biological variables of birth order, maternal and fetal complications, and prematurity were also examined.
Because the study design was cross-sectional and the number of events was too small for an adequate analysis, no definite conclusions can be reached regarding the risk of an unsuccessful pregnancy outcome in the presence of a reactive serologic finding. Notwithstanding these limitations, the results obtained after statistical adjustment showed that women with a reactive test result were older, of higher parity, and less educated. Marital status and residence were not significant variables. Pregnancy wastage, however, was almost twice as frequent in the reactive group as in the non-reactive group.
Statistically significant differences in maternal morbidity involved two complications, polyhydramnios and varicosities of the lower extremities and vulva, while among newborns, infection was more frequent in infants whose mothers had a reactive serologic test result.
In summary, this study shows the need for a larger, longitudinal study for an in-depth exploration of feto-maternal morbidity and mortality, an investigation that would corroborate or refute the findings of this study.
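To make "almost twice as frequent" and "too few events" concrete, the sketch below computes a risk ratio with a log-scale Wald confidence interval, plus an odds ratio with Woolf's interval, the usual measure in a case-comparison design. Both are standard textbook formulas; the function names and the counts in the example are hypothetical, not the study's data.

```python
import math


def risk_ratio(a, n1, c, n0, z=1.96):
    """Risk ratio of a/n1 (reactive) vs c/n0 (non-reactive) with a log-scale
    Wald CI. Requires a > 0 and c > 0; with few events the interval is wide."""
    rr = (a / n1) / (c / n0)
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] (rows: reactive /
    non-reactive; columns: adverse outcome / none) with Woolf's CI."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, or_ * math.exp(-z * se), or_ * math.exp(z * se)
```

With 10 of 100 reactive and 5 of 100 non-reactive women having an adverse outcome, `risk_ratio(10, 100, 5, 100)` gives a risk ratio of 2.0 but a 95% interval of roughly 0.7 to 5.6, so a doubling of risk cannot be distinguished from no effect at such small counts.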
Abstract:
Objectives. This paper assesses the effect of regression model misspecification on statistical power in a variety of situations.
Methods and results. The effect of misspecification in regression can be approximated by evaluating the correlation between the correct specification and the misspecification of the outcome variable (Harris 2010). In this paper, three misspecified models (linear, categorical, and fractional polynomial) were considered. In the first section, a mathematical method for calculating the correlation between correct and misspecified models with simple mathematical forms was derived and demonstrated. In the second section, data from the National Health and Nutrition Examination Survey (NHANES 2007-2008) were used to examine such correlations. Our study shows that, compared with linear or categorical models, the fractional polynomial models had higher correlations and provided a better approximation of the true relationship, as illustrated by LOESS regression. In the third section, we present simulation studies demonstrating that, overall, misspecification in regression can produce marked decreases in power with small sample sizes. However, the categorical model had the greatest power, ranging from 0.877 to 0.936 depending on the sample size and outcome variable used. The power of the fractional polynomial model was close to that of the linear model, ranging from 0.69 to 0.83, and appeared to be affected by the increased degrees of freedom of this model.
Conclusion. Correlations between alternative model specifications can provide a good approximation of the effect of misspecification on statistical power when the sample size is large. When model specifications have known simple mathematical forms, such correlations can be calculated mathematically. Actual public health data from NHANES 2007-2008 were used as examples to demonstrate situations in which the correct model specification is unknown or complex. Simulation of power for misspecified models confirmed the results based on correlation methods but also illustrated the effect of model degrees of freedom on power.
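The idea that the correlation between the correct and the misspecified specification drives the loss of power can be illustrated with a simple known form. Assume, hypothetically, X ~ Uniform(0, 1), a true model y = beta*X^2 + noise, and a misspecified linear term X. Then cov(X, X^2) = 1/12, var(X) = 1/12 and var(X^2) = 1/5 - 1/9 = 4/45, so corr(X, X^2) = sqrt(135)/12, about 0.968. When the noise is independent of X, corr(y, X) = corr(X^2, X) * corr(y, X^2), so the misspecified model sees an attenuated association. The power simulation uses a Fisher z test of the correlation as a stand-in for the regression slope test; this is a sketch, not the paper's procedure.

```python
import math
import random


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


# Closed form for X ~ Uniform(0, 1): correct term X**2, misspecified term X.
CORR_LINEAR_VS_QUADRATIC = math.sqrt(135) / 12  # about 0.968


def simulated_power(transform, n, beta=1.0, sigma=0.5, reps=500, seed=0):
    """Share of simulated data sets (y = beta * x**2 + noise, hypothetical
    parameters) in which predictor transform(x) is significantly correlated
    with y under a two-sided 5% Fisher z test."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]
        ys = [beta * x ** 2 + rng.gauss(0.0, sigma) for x in xs]
        r = pearson([transform(x) for x in xs], ys)
        z = math.atanh(r) * math.sqrt(n - 3)  # approx. N(0, 1) when rho = 0
        if abs(z) > 1.959964:
            hits += 1
    return hits / reps
```

Comparing `simulated_power(lambda x: x * x, 50)` with `simulated_power(lambda x: x, 50)` shows the correct and misspecified terms side by side; with a correlation this close to 1 the power gap is small, and it widens as the correlation between specifications drops.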
Abstract:
Calcium levels in dendritic spines play a significant role in determining the sign and magnitude of synaptic plasticity. The magnitude of calcium influx into spines depends strongly on influx through N-methyl-D-aspartate (NMDA) receptors, and therefore on the number of postsynaptic NMDA receptors in each spine. We have previously calculated how the number of postsynaptic NMDA receptors determines the mean and variance of calcium transients in the postsynaptic density, and how this alters the shape of plasticity curves. However, the number of NMDA receptors in the postsynaptic density is not well known. Anatomical methods for estimating the number of NMDA receptors produce estimates very different from those produced by physiological techniques. The physiological techniques are based on the statistics of synaptic transmission, and their precision is difficult to estimate experimentally. In this paper we use stochastic simulations to test the validity of a physiological estimation technique based on failure analysis. We find that the method is likely to underestimate the number of postsynaptic NMDA receptors; we explain the source of the error and re-derive a more precise estimation technique. We also show that neither the original failure analysis nor our improved formulas are robust to small estimation errors in key parameters.
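A minimal version of failure analysis, under assumptions that may differ from the paper's: each of N postsynaptic receptors opens independently with probability p on a trial, and a failure is a trial in which no receptor opens, so P_fail = (1 - p)^N and N = ln(P_fail) / ln(1 - p). The sketch simulates finite experiments to show how widely the estimate scatters, and how a small error in p shifts it (with N = 10, using p = 0.33 instead of the true 0.3 gives about 8.9). It does not reproduce the paper's corrected formulas; all names and parameter values are hypothetical.

```python
import math
import random


def estimate_n(p_fail, p_open):
    """Invert the assumed model P_fail = (1 - p_open) ** N for N."""
    return math.log(p_fail) / math.log(1.0 - p_open)


def simulate_failure_rate(n_receptors, p_open, trials, rng):
    """Fraction of trials in which none of n_receptors independent channels opens."""
    failures = 0
    for _ in range(trials):
        if not any(rng.random() < p_open for _ in range(n_receptors)):
            failures += 1
    return failures / trials


def estimate_spread(n_receptors=10, p_open=0.3, trials=100, experiments=1000, seed=0):
    """N estimates from repeated finite experiments; runs with zero observed
    failures are skipped because log(0) is undefined."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(experiments):
        f = simulate_failure_rate(n_receptors, p_open, trials, rng)
        if f > 0:
            estimates.append(estimate_n(f, p_open))
    return estimates
```

Because the failure probability is small (about 0.028 for N = 10, p = 0.3), 100 trials yield only a handful of failures, so both sampling noise and errors in the assumed p propagate strongly into the estimate of N.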
Abstract:
Nuclear morphometry (NM) uses image analysis to measure features of the cell nucleus, which are classified as bulk properties, shape or form, and DNA distribution. Studies have used these measurements as diagnostic and prognostic indicators of disease, with inconclusive results. The distributional properties of these variables have not been systematically investigated, although much medical data exhibit non-normal distributions. Measurements are made on several hundred cells per patient, so summary measures reflecting the underlying distribution are needed.
Distributional characteristics of 34 NM variables from prostate cancer cells were investigated using graphical and analytical techniques. The number of cells per sample ranged from 52 to 458. A small sample of patients with benign prostatic hyperplasia (BPH), representing non-cancer cells, was used for general comparison with the cancer cells.
Data transformations such as log, square root, and 1/x did not yield normality as measured by the Shapiro-Wilk test for normality. A modulus transformation, used for distributions with abnormal kurtosis values, also did not produce normality.
Kernel density histograms of the 34 variables exhibited non-normality, and 18 variables also exhibited bimodality. A bimodality coefficient was calculated, and three variables (DNA concentration, shape, and elongation) showed the strongest evidence of bimodality and were studied further.
Two analytical approaches were used to obtain a summary measure of each variable for each patient: cluster analysis to determine significant clusters, and a mixture model analysis using a two-component Gaussian mixture with equal variances. The mixture component parameters were used to bootstrap the log-likelihood ratio to determine whether 1 or 2 components were significant. These summary measures were used as predictors of disease severity in several proportional odds logistic regression models. The disease severity scale had 5 levels and was constructed from 3 components, extracapsular penetration (ECP), lymph node involvement (LN+), and seminal vesicle involvement (SV+), which represent surrogate measures of prognosis. The summary measures were not strong predictors of disease severity. The mixture model results gave some indication of changes in the mean levels and proportions of the components at the lower severity levels.
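The abstract does not give the exact bimodality coefficient or mixture-fitting details, so the sketch below assumes two common choices. The first is the sample bimodality coefficient BC = (G1^2 + 1) / (G2 + 3(n-1)^2 / ((n-2)(n-3))), with bias-adjusted skewness G1 and excess kurtosis G2; values above 5/9 are often read as evidence of bimodality. The second is an EM fit of a two-component Gaussian mixture with a shared variance. The bootstrap of the log-likelihood ratio for 1 versus 2 components is not shown, and the function names are hypothetical.

```python
import math
import random


def bimodality_coefficient(xs):
    """Sample bimodality coefficient from bias-adjusted skewness and excess kurtosis."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g1, g2 = m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurt = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    return (skew ** 2 + 1) / (kurt + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def _logistic(z):
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def em_two_gaussians_equal_var(xs, iters=200):
    """EM for a two-component Gaussian mixture sharing one variance.
    Returns (pi1, mu1, mu2, var). Assumes both components keep some mass."""
    n = len(xs)
    s = sorted(xs)
    mu1, mu2 = s[n // 4], s[(3 * n) // 4]  # start the means at the quartiles
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    pi1 = 0.5
    for _ in range(iters):
        # E-step: posterior probability of component 1; with a shared variance
        # the Gaussian normalizing constants cancel in the log-odds.
        log_prior = math.log(pi1 / (1 - pi1))
        r = [_logistic(log_prior + ((x - mu2) ** 2 - (x - mu1) ** 2) / (2 * var))
             for x in xs]
        # M-step: update mixing weight, means and the common variance.
        n1 = sum(r)
        pi1 = min(max(n1 / n, 1e-9), 1 - 1e-9)
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / (n - n1)
        var = sum(ri * (x - mu1) ** 2 + (1 - ri) * (x - mu2) ** 2
                  for ri, x in zip(r, xs)) / n
    return pi1, mu1, mu2, var
```

Per patient, the fitted `pi1`, `mu1`, and `mu2` are candidate summary measures of the kind the abstract describes feeding into the proportional odds models.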
Abstract:
Mixed longitudinal designs are important study designs in many areas of medical research. Mixed longitudinal studies have several advantages over cross-sectional or purely longitudinal studies, including shorter study completion time and the ability to separate time and age effects, and are thus an attractive choice. Statistical methodology for longitudinal studies in general has developed rapidly over the last few decades. The common approach to statistical modeling in studies with mixed longitudinal designs has been the linear mixed-effects model incorporating an age or time effect. The general linear mixed-effects model is considered an appropriate choice for analyzing repeated measurements in longitudinal studies. However, common applications of the linear mixed-effects model to mixed longitudinal studies often incorporate age as the only random effect and fail to consider the cohort effect when drawing statistical inferences about age-related trajectories of outcome measurements. We believe special attention should be paid to cohort effects when analyzing data from mixed longitudinal designs with multiple overlapping cohorts, which makes this an important statistical issue to address.
This research addresses statistical issues related to mixed longitudinal studies. The study examined the existing statistical analysis methods for mixed longitudinal designs and developed an alternative analytic method that incorporates effects from multiple overlapping cohorts as well as from subjects of different ages. Simulation was used to evaluate the performance of the proposed analytic method by comparing it with the commonly used model. Finally, the proposed method was applied to data collected by an existing study, Project HeartBeat!, which had previously been evaluated using traditional analytic techniques. Project HeartBeat! is a longitudinal study of cardiovascular disease (CVD) risk factors in childhood and adolescence that uses a mixed longitudinal design. The proposed model was used to evaluate four blood lipids, adjusting for age, gender, race/ethnicity, and endocrine hormones. The results of this dissertation suggest that the proposed analytic model could be a more flexible and reliable choice than the traditional model for fitting the data and providing more accurate estimates in mixed longitudinal studies. Conceptually, the proposed model has useful features, including consideration of effects from multiple overlapping cohorts, and is an attractive approach for analyzing data in mixed longitudinal design studies.
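The cohort issue can be seen in a small simulation. Assume, hypothetically, three cohorts entering at ages 8, 11, and 14, each followed for 5 annual visits, so ages 11-12 and 14-15 are observed in two cohorts; each cohort has its own offset and each subject a random intercept. Regressing the outcome on age alone mixes between-cohort differences into the age slope, while centering within cohort (equivalent to adding cohort intercepts) recovers it. This is a fixed-effect illustration only, not the dissertation's proposed model; all names and values are hypothetical.

```python
import random


def simulate(entry_ages=(8, 11, 14), cohort_shifts=(2.0, 1.0, 0.0),
             visits=5, per_cohort=100, age_slope=0.5, seed=0):
    """(cohort, age, y) rows from a hypothetical mixed longitudinal design."""
    rng = random.Random(seed)
    rows = []
    for cohort, shift in zip(entry_ages, cohort_shifts):
        for _ in range(per_cohort):
            u = rng.gauss(0.0, 1.0)  # subject-level random intercept
            for t in range(visits):
                age = cohort + t
                y = age_slope * age + shift + u + rng.gauss(0.0, 1.0)
                rows.append((cohort, age, y))
    return rows


def ols_slope(pairs):
    """Simple least-squares slope of y on age, ignoring cohort."""
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    num = sum((a - ma) * (y - my) for a, y in pairs)
    return num / sum((a - ma) ** 2 for a, _ in pairs)


def within_cohort_slope(rows):
    """Age slope after removing cohort means (same as OLS with cohort intercepts)."""
    by_cohort = {}
    for cohort, age, y in rows:
        by_cohort.setdefault(cohort, []).append((age, y))
    centred = []
    for pairs in by_cohort.values():
        ma = sum(a for a, _ in pairs) / len(pairs)
        my = sum(y for _, y in pairs) / len(pairs)
        centred.extend((a - ma, y - my) for a, y in pairs)
    return sum(a * y for a, y in centred) / sum(a * a for a, _ in centred)
```

With the defaults, the pooled age slope comes out near 0.25 and the within-cohort slope near the true 0.5: younger-entry cohorts sit higher here, so ignoring cohort pulls the age trajectory down.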
Abstract:
Accurate quantitative estimation of exposure from retrospective data has been one of the most challenging tasks in the exposure assessment field. To improve these estimates, some models have been developed using published exposure databases with their corresponding exposure determinants. These models are designed to be applied to exposure determinants reported by study subjects or to exposure levels assigned by an industrial hygienist, so that quantitative exposure estimates can be obtained.
In an effort to improve the prediction accuracy and generalizability of these models, and taking into account that the limitations encountered in previous studies might be due to limitations in the applicability of traditional statistical methods and concepts, this study proposed and explored the use of data analysis methods derived from computer science, predominantly machine learning approaches.
The goal of this study was to develop a set of models using decision tree/ensemble and neural network methods to predict occupational outcomes from literature-derived databases, and to compare their predictive capacity with that of traditional regression models using cross-validation and data-splitting techniques. Two cases were addressed: the categorical case, where the exposure level was measured as an exposure rating following the American Industrial Hygiene Association guidelines, and the continuous case, where the exposure is expressed as a concentration value. Previously developed literature-based exposure databases for 1,1,1-trichloroethane, methylene dichloride, and trichloroethylene were used.
Compared with regression estimates, decision tree/ensemble techniques were more accurate in the categorical case, while neural networks were better for estimating continuous exposure values. Overrepresentation of classes and overfitting were the main causes of poor neural network performance and accuracy. Estimates from literature-based databases using machine learning techniques might provide an advantage when applied within other methodologies that combine "expert inputs" with current exposure measurements, such as the Bayesian Decision Analysis tool. Using machine learning techniques to estimate exposures more accurately from literature-based exposure databases might represent a starting point toward independence from expert judgment.
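The comparison described above rests on cross-validation. The sketch below shows k-fold cross-validated mean squared error for two simple learners, a depth-1 regression tree (a "stump") and a straight-line regression, on hypothetical synthetic data with a step-shaped relationship. It is not the study's models or exposure databases; libraries such as scikit-learn provide full tree, ensemble, and neural network implementations, while the standard library keeps this sketch self-contained.

```python
import random


def kfold(n, k, seed=0):
    """Yield (train, test) index lists for k shuffled folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def fit_linear(xs, ys):
    """Least-squares line; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)


def fit_stump(xs, ys):
    """Depth-1 regression tree: the single split on x minimizing squared error.
    Assumes at least two distinct x values."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, ml, mr)
    _, cut, ml, mr = best
    return lambda x: ml if x <= cut else mr


def cv_mse(fit, xs, ys, k=5):
    """Mean squared prediction error over held-out folds."""
    errors = []
    for train, test in kfold(len(xs), k):
        model = fit([xs[j] for j in train], [ys[j] for j in train])
        errors.extend((model(xs[j]) - ys[j]) ** 2 for j in test)
    return sum(errors) / len(errors)
```

Because every model is scored only on folds it did not see during fitting, the comparison penalizes overfitting, the failure mode the abstract reports for the neural networks.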