11 resultados para Missing data
em Duke University
Resumo:
BACKGROUND: Dropouts and missing data are nearly-ubiquitous in obesity randomized controlled trails, threatening validity and generalizability of conclusions. Herein, we meta-analytically evaluate the extent of missing data, the frequency with which various analytic methods are employed to accommodate dropouts, and the performance of multiple statistical methods. METHODOLOGY/PRINCIPAL FINDINGS: We searched PubMed and Cochrane databases (2000-2006) for articles published in English and manually searched bibliographic references. Articles of pharmaceutical randomized controlled trials with weight loss or weight gain prevention as major endpoints were included. Two authors independently reviewed each publication for inclusion. 121 articles met the inclusion criteria. Two authors independently extracted treatment, sample size, drop-out rates, study duration, and statistical method used to handle missing data from all articles and resolved disagreements by consensus. In the meta-analysis, drop-out rates were substantial with the survival (non-dropout) rates being approximated by an exponential decay curve (e(-lambdat)) where lambda was estimated to be .0088 (95% bootstrap confidence interval: .0076 to .0100) and t represents time in weeks. The estimated drop-out rate at 1 year was 37%. Most studies used last observation carried forward as the primary analytic method to handle missing data. We also obtained 12 raw obesity randomized controlled trial datasets for empirical analyses. Analyses of raw randomized controlled trial data suggested that both mixed models and multiple imputation performed well, but that multiple imputation may be more robust when missing data are extensive. CONCLUSION/SIGNIFICANCE: Our analysis offers an equation for predictions of dropout rates useful for future study planning. Our raw data analyses suggests that multiple imputation is better than other methods for handling missing data in obesity randomized controlled trials, followed closely by mixed models. We suggest these methods supplant last observation carried forward as the primary method of analysis.
Resumo:
Technological advances in genotyping have given rise to hypothesis-based association studies of increasing scope. As a result, the scientific hypotheses addressed by these studies have become more complex and more difficult to address using existing analytic methodologies. Obstacles to analysis include inference in the face of multiple comparisons, complications arising from correlations among the SNPs (single nucleotide polymorphisms), choice of their genetic parametrization and missing data. In this paper we present an efficient Bayesian model search strategy that searches over the space of genetic markers and their genetic parametrization. The resulting method for Multilevel Inference of SNP Associations, MISA, allows computation of multilevel posterior probabilities and Bayes factors at the global, gene and SNP level, with the prior distribution on SNP inclusion in the model providing an intrinsic multiplicity correction. We use simulated data sets to characterize MISA's statistical power, and show that MISA has higher power to detect association than standard procedures. Using data from the North Carolina Ovarian Cancer Study (NCOCS), MISA identifies variants that were not identified by standard methods and have been externally "validated" in independent studies. We examine sensitivity of the NCOCS results to prior choice and method for imputing missing data. MISA is available in an R package on CRAN.
Resumo:
OBJECTIVES: To compare the predictive performance and potential clinical usefulness of risk calculators of the European Randomized Study of Screening for Prostate Cancer (ERSPC RC) with and without information on prostate volume. METHODS: We studied 6 cohorts (5 European and 1 US) with a total of 15,300 men, all biopsied and with pre-biopsy TRUS measurements of prostate volume. Volume was categorized into 3 categories (25, 40, and 60 cc), to reflect use of digital rectal examination (DRE) for volume assessment. Risks of prostate cancer were calculated according to a ERSPC DRE-based RC (including PSA, DRE, prior biopsy, and prostate volume) and a PSA + DRE model (including PSA, DRE, and prior biopsy). Missing data on prostate volume were completed by single imputation. Risk predictions were evaluated with respect to calibration (graphically), discrimination (AUC curve), and clinical usefulness (net benefit, graphically assessed in decision curves). RESULTS: The AUCs of the ERSPC DRE-based RC ranged from 0.61 to 0.77 and were substantially larger than the AUCs of a model based on only PSA + DRE (ranging from 0.56 to 0.72) in each of the 6 cohorts. The ERSPC DRE-based RC provided net benefit over performing a prostate biopsy on the basis of PSA and DRE outcome in five of the six cohorts. CONCLUSIONS: Identifying men at increased risk for having a biopsy detectable prostate cancer should consider multiple factors, including an estimate of prostate volume.
Resumo:
BACKGROUND: Administrative or quality improvement registries may or may not contain the elements needed for investigations by trauma researchers. International Classification of Diseases Program for Injury Categorisation (ICDPIC), a statistical program available through Stata, is a powerful tool that can extract injury severity scores from ICD-9-CM codes. We conducted a validation study for use of the ICDPIC in trauma research. METHODS: We conducted a retrospective cohort validation study of 40,418 patients with injury using a large regional trauma registry. ICDPIC-generated AIS scores for each body region were compared with trauma registry AIS scores (gold standard) in adult and paediatric populations. A separate analysis was conducted among patients with traumatic brain injury (TBI) comparing the ICDPIC tool with ICD-9-CM embedded severity codes. Performance in characterising overall injury severity, by the ISS, was also assessed. RESULTS: The ICDPIC tool generated substantial correlations in thoracic and abdominal trauma (weighted κ 0.87-0.92), and in head and neck trauma (weighted κ 0.76-0.83). The ICDPIC tool captured TBI severity better than ICD-9-CM code embedded severity and offered the advantage of generating a severity value for every patient (rather than having missing data). Its ability to produce an accurate severity score was consistent within each body region as well as overall. CONCLUSIONS: The ICDPIC tool performs well in classifying injury severity and is superior to ICD-9-CM embedded severity for TBI. Use of ICDPIC demonstrates substantial efficiency and may be a preferred tool in determining injury severity for large trauma datasets, provided researchers understand its limitations and take caution when examining smaller trauma datasets.
Resumo:
Copyright © 2014 International Anesthesia Research Society.BACKGROUND: Goal-directed fluid therapy (GDFT) is associated with improved outcomes after surgery. The esophageal Doppler monitor (EDM) is widely used, but has several limitations. The NICOM, a completely noninvasive cardiac output monitor (Cheetah Medical), may be appropriate for guiding GDFT. No prospective studies have compared the NICOM and the EDM. We hypothesized that the NICOM is not significantly different from the EDM for monitoring during GDFT. METHODS: One hundred adult patients undergoing elective colorectal surgery participated in this study. Patients in phase I (n = 50) had intraoperative GDFT guided by the EDM while the NICOM was connected, and patients in phase II (n = 50) had intraoperative GDFT guided by the NICOM while the EDM was connected. Each patient's stroke volume was optimized using 250- mL colloid boluses. Agreement between the monitors was assessed, and patient outcomes (postoperative pain, nausea, and return of bowel function), complications (renal, pulmonary, infectious, and wound complications), and length of hospital stay (LOS) were compared. RESULTS: Using a 10% increase in stroke volume after fluid challenge, agreement between monitors was 60% at 5 minutes, 61% at 10 minutes, and 66% at 15 minutes, with no significant systematic disagreement (McNemar P > 0.05) at any time point. The EDM had significantly more missing data than the NICOM. No clinically significant differences were found in total LOS or other outcomes. The mean LOS was 6.56 ± 4.32 days in phase I and 6.07 ± 2.85 days in phase II, and 95% confidence limits for the difference were -0.96 to +1.95 days (P = 0.5016). CONCLUSIONS: The NICOM performs similarly to the EDM in guiding GDFT, with no clinically significant differences in outcomes, and offers increased ease of use as well as fewer missing data points. The NICOM may be a viable alternative monitor to guide GDFT.
Resumo:
Surveys can collect important data that inform policy decisions and drive social science research. Large government surveys collect information from the U.S. population on a wide range of topics, including demographics, education, employment, and lifestyle. Analysis of survey data presents unique challenges. In particular, one needs to account for missing data, for complex sampling designs, and for measurement error. Conceptually, a survey organization could spend lots of resources getting high-quality responses from a simple random sample, resulting in survey data that are easy to analyze. However, this scenario often is not realistic. To address these practical issues, survey organizations can leverage the information available from other sources of data. For example, in longitudinal studies that suffer from attrition, they can use the information from refreshment samples to correct for potential attrition bias. They can use information from known marginal distributions or survey design to improve inferences. They can use information from gold standard sources to correct for measurement error.
This thesis presents novel approaches to combining information from multiple sources that address the three problems described above.
The first method addresses nonignorable unit nonresponse and attrition in a panel survey with a refreshment sample. Panel surveys typically suffer from attrition, which can lead to biased inference when basing analysis only on cases that complete all waves of the panel. Unfortunately, the panel data alone cannot inform the extent of the bias due to attrition, so analysts must make strong and untestable assumptions about the missing data mechanism. Many panel studies also include refreshment samples, which are data collected from a random sample of new
individuals during some later wave of the panel. Refreshment samples offer information that can be utilized to correct for biases induced by nonignorable attrition while reducing reliance on strong assumptions about the attrition process. To date, these bias correction methods have not dealt with two key practical issues in panel studies: unit nonresponse in the initial wave of the panel and in the
refreshment sample itself. As we illustrate, nonignorable unit nonresponse
can significantly compromise the analyst's ability to use the refreshment samples for attrition bias correction. Thus, it is crucial for analysts to assess how sensitive their inferences---corrected for panel attrition---are to different assumptions about the nature of the unit nonresponse. We present an approach that facilitates such sensitivity analyses, both for suspected nonignorable unit nonresponse
in the initial wave and in the refreshment sample. We illustrate the approach using simulation studies and an analysis of data from the 2007-2008 Associated Press/Yahoo News election panel study.
The second method incorporates informative prior beliefs about
marginal probabilities into Bayesian latent class models for categorical data.
The basic idea is to append synthetic observations to the original data such that
(i) the empirical distributions of the desired margins match those of the prior beliefs, and (ii) the values of the remaining variables are left missing. The degree of prior uncertainty is controlled by the number of augmented records. Posterior inferences can be obtained via typical MCMC algorithms for latent class models, tailored to deal efficiently with the missing values in the concatenated data.
We illustrate the approach using a variety of simulations based on data from the American Community Survey, including an example of how augmented records can be used to fit latent class models to data from stratified samples.
The third method leverages the information from a gold standard survey to model reporting error. Survey data are subject to reporting error when respondents misunderstand the question or accidentally select the wrong response. Sometimes survey respondents knowingly select the wrong response, for example, by reporting a higher level of education than they actually have attained. We present an approach that allows an analyst to model reporting error by incorporating information from a gold standard survey. The analyst can specify various reporting error models and assess how sensitive their conclusions are to different assumptions about the reporting error process. We illustrate the approach using simulations based on data from the 1993 National Survey of College Graduates. We use the method to impute error-corrected educational attainments in the 2010 American Community Survey using the 2010 National Survey of College Graduates as the gold standard survey.
Resumo:
Background: Sickle Cell Disease (SCD) is a genetic hematological disorder that affects more than 7 million people globally (NHLBI, 2009). It is estimated that 50% of adults with SCD experience pain on most days, with 1/3 experiencing chronic pain daily (Smith et al., 2008). Persons with SCD also experience higher levels of pain catastrophizing (feelings of helplessness, pain rumination and magnification) than other chronic pain conditions, which is associated with increases in pain intensity, pain behavior, analgesic consumption, frequency and duration of hospital visits, and with reduced daily activities (Sullivan, Bishop, & Pivik, 1995; Keefe et al., 2000; Gil et al., 1992 & 1993). Therefore effective interventions are needed that can successfully be used manage pain and pain-related outcomes (e.g., pain catastrophizing) in persons with SCD. A review of the literature demonstrated limited information regarding the feasibility and efficacy of non-pharmacological approaches for pain in persons with SCD, finding an average effect size of .33 on pain reduction across measurable non-pharmacological studies. Second, a prospective study on persons with SCD that received care for a vaso-occlusive crisis (VOC; N = 95) found: (1) high levels of patient reported depression (29%) and anxiety (34%), and (2) that unemployment was significantly associated with increased frequency of acute care encounters and hospital admissions per person. Research suggests that one promising category of non-pharmacological interventions for managing both physical and affective components of pain are Mindfulness-based Interventions (MBIs; Thompson et al., 2010; Cox et al., 2013). The primary goal of this dissertation was thus to develop and test the feasibility, acceptability, and efficacy of a telephonic MBI for pain catastrophizing in persons with SCD and chronic pain.
Methods: First, a telephonic MBI was developed through an informal process that involved iterative feedback from patients, clinical experts in SCD and pain management, social workers, psychologists, and mindfulness clinicians. Through this process, relevant topics and skills were selected to adapt in each MBI session. Second, a pilot randomized controlled trial was conducted to test the feasibility, acceptability, and efficacy of the telephonic MBI for pain catastrophizing in persons with SCD and chronic pain. Acceptability and feasibility were determined by assessment of recruitment, attrition, dropout, and refusal rates (including refusal reasons), along with semi-structured interviews with nine randomly selected patients at the end of study. Participants completed assessments at baseline, Week 1, 3, and 6 to assess efficacy of the intervention on decreasing pain catastrophizing and other pain-related outcomes.
Results: A telephonic MBI is feasible and acceptable for persons with SCD and chronic pain. Seventy-eight patients with SCD and chronic pain were approached, and 76% (N = 60) were enrolled and randomized. The MBI attendance rate, approximately 57% of participants completing at least four mindfulness sessions, was deemed acceptable, and participants that received the telephonic MBI described it as acceptable, easy to access, and consume in post-intervention interviews. The amount of missing data was undesirable (MBI condition, 40%; control condition, 25%), but fell within the range of expected missing outcome data for a RCT with multiple follow-up assessments. Efficacy of the MBI on pain catastrophizing could not be determined due to small sample size and degree of missing data, but trajectory analyses conducted for the MBI condition only trended in the right direction and pain catastrophizing approached statistically significance.
Conclusion: Overall results showed that at telephonic group-based MBI is acceptable and feasible for persons with SCD and chronic pain. Though the study was not able to determine treatment efficacy nor powered to detect a statistically significant difference between conditions, participants (1) described the intervention as acceptable, and (2) the observed effect sizes for the MBI condition demonstrated large effects of the MBI on pain catastrophizing, mental health, and physical health. Replication of this MBI study with a larger sample size, active control group, and additional assessments at the end of each week (e.g., Week 1 through Week 6) is needed to determine treatment efficacy. Many lessons were learned that will guide the development of future studies including which MBI strategies were most helpful, methods to encourage continued participation, and how to improve data capture.
Resumo:
Previously developed models for predicting absolute risk of invasive epithelial ovarian cancer have included a limited number of risk factors and have had low discriminatory power (area under the receiver operating characteristic curve (AUC) < 0.60). Because of this, we developed and internally validated a relative risk prediction model that incorporates 17 established epidemiologic risk factors and 17 genome-wide significant single nucleotide polymorphisms (SNPs) using data from 11 case-control studies in the United States (5,793 cases; 9,512 controls) from the Ovarian Cancer Association Consortium (data accrued from 1992 to 2010). We developed a hierarchical logistic regression model for predicting case-control status that included imputation of missing data. We randomly divided the data into an 80% training sample and used the remaining 20% for model evaluation. The AUC for the full model was 0.664. A reduced model without SNPs performed similarly (AUC = 0.649). Both models performed better than a baseline model that included age and study site only (AUC = 0.563). The best predictive power was obtained in the full model among women younger than 50 years of age (AUC = 0.714); however, the addition of SNPs increased the AUC the most for women older than 50 years of age (AUC = 0.638 vs. 0.616). Adapting this improved model to estimate absolute risk and evaluating it in prospective data sets is warranted.
Resumo:
Abstract
Continuous variable is one of the major data types collected by the survey organizations. It can be incomplete such that the data collectors need to fill in the missingness. Or, it can contain sensitive information which needs protection from re-identification. One of the approaches to protect continuous microdata is to sum them up according to different cells of features. In this thesis, I represents novel methods of multiple imputation (MI) that can be applied to impute missing values and synthesize confidential values for continuous and magnitude data.
The first method is for limiting the disclosure risk of the continuous microdata whose marginal sums are fixed. The motivation for developing such a method comes from the magnitude tables of non-negative integer values in economic surveys. I present approaches based on a mixture of Poisson distributions to describe the multivariate distribution so that the marginals of the synthetic data are guaranteed to sum to the original totals. At the same time, I present methods for assessing disclosure risks in releasing such synthetic magnitude microdata. The illustration on a survey of manufacturing establishments shows that the disclosure risks are low while the information loss is acceptable.
The second method is for releasing synthetic continuous micro data by a nonstandard MI method. Traditionally, MI fits a model on the confidential values and then generates multiple synthetic datasets from this model. Its disclosure risk tends to be high, especially when the original data contain extreme values. I present a nonstandard MI approach conditioned on the protective intervals. Its basic idea is to estimate the model parameters from these intervals rather than the confidential values. The encouraging results of simple simulation studies suggest the potential of this new approach in limiting the posterior disclosure risk.
The third method is for imputing missing values in continuous and categorical variables. It is extended from a hierarchically coupled mixture model with local dependence. However, the new method separates the variables into non-focused (e.g., almost-fully-observed) and focused (e.g., missing-a-lot) ones. The sub-model structure of focused variables is more complex than that of non-focused ones. At the same time, their cluster indicators are linked together by tensor factorization and the focused continuous variables depend locally on non-focused values. The model properties suggest that moving the strongly associated non-focused variables to the side of focused ones can help to improve estimation accuracy, which is examined by several simulation studies. And this method is applied to data from the American Community Survey.
Resumo:
We propose a novel data-delivery method for delay-sensitive traffic that significantly reduces the energy consumption in wireless sensor networks without reducing the number of packets that meet end-to-end real-time deadlines. The proposed method, referred to as SensiQoS, leverages the spatial and temporal correlation between the data generated by events in a sensor network and realizes energy savings through application-specific in-network aggregation of the data. SensiQoS maximizes energy savings by adaptively waiting for packets from upstream nodes to perform in-network processing without missing the real-time deadline for the data packets. SensiQoS is a distributed packet scheduling scheme, where nodes make localized decisions on when to schedule a packet for transmission to meet its end-to-end real-time deadline and to which neighbor they should forward the packet to save energy. We also present a localized algorithm for nodes to adapt to network traffic to maximize energy savings in the network. Simulation results show that SensiQoS improves the energy savings in sensor networks where events are sensed by multiple nodes, and spatial and/or temporal correlation exists among the data packets. Energy savings due to SensiQoS increase with increase in the density of the sensor nodes and the size of the sensed events. © 2010 Harshavardhan Sabbineni and Krishnendu Chakrabarty.
Resumo:
BACKGROUND: The National Comprehensive Cancer Network and the American Society of Clinical Oncology have established guidelines for the treatment and surveillance of colorectal cancer (CRC), respectively. Considering these guidelines, an accurate and efficient method is needed to measure receipt of care. METHODS: The accuracy and completeness of Veterans Health Administration (VA) administrative data were assessed by comparing them with data manually abstracted during the Colorectal Cancer Care Collaborative (C4) quality improvement initiative for 618 patients with stage I-III CRC. RESULTS: The VA administrative data contained gender, marital, and birth information for all patients but race information was missing for 62.1% of patients. The percent agreement for demographic variables ranged from 98.1-100%. The kappa statistic for receipt of treatments ranged from 0.21 to 0.60 and there was a 96.9% agreement for the date of surgical resection. The percentage of post-diagnosis surveillance events in C4 also in VA administrative data were 76.0% for colonoscopy, 84.6% for physician visit, and 26.3% for carcinoembryonic antigen (CEA) test. CONCLUSIONS: VA administrative data are accurate and complete for non-race demographic variables, receipt of CRC treatment, colonoscopy, and physician visits; but alternative data sources may be necessary to capture patient race and receipt of CEA tests.