14 results for Process control -- Statistical methods

at Duke University


Abstract:

BACKGROUND: Dropouts and missing data are nearly ubiquitous in obesity randomized controlled trials, threatening the validity and generalizability of conclusions. Herein, we meta-analytically evaluate the extent of missing data, the frequency with which various analytic methods are employed to accommodate dropouts, and the performance of multiple statistical methods. METHODOLOGY/PRINCIPAL FINDINGS: We searched the PubMed and Cochrane databases (2000-2006) for articles published in English and manually searched bibliographic references. Articles of pharmaceutical randomized controlled trials with weight loss or weight gain prevention as major endpoints were included. Two authors independently reviewed each publication for inclusion; 121 articles met the inclusion criteria. Two authors independently extracted treatment, sample size, dropout rates, study duration, and the statistical method used to handle missing data from all articles, and resolved disagreements by consensus. In the meta-analysis, dropout rates were substantial, with the survival (non-dropout) rate approximated by an exponential decay curve, $e^{-\lambda t}$, where $t$ is time in weeks and $\lambda$ was estimated to be 0.0088 (95% bootstrap confidence interval: 0.0076 to 0.0100). The estimated dropout rate at 1 year was 37%. Most studies used last observation carried forward as the primary analytic method to handle missing data. We also obtained 12 raw obesity randomized controlled trial datasets for empirical analyses. Analyses of these raw data suggested that both mixed models and multiple imputation performed well, but that multiple imputation may be more robust when missing data are extensive. CONCLUSION/SIGNIFICANCE: Our analysis offers an equation for predicting dropout rates that is useful for planning future studies.
Our raw data analyses suggest that multiple imputation is better than other methods for handling missing data in obesity randomized controlled trials, followed closely by mixed models. We suggest these methods supplant last observation carried forward as the primary method of analysis.
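The exponential survival model above lends itself to a quick planning calculation. The Python sketch below plugs in the reported point estimate and confidence bounds for λ and approximates one year as 52 weeks; `required_enrollment` is a hypothetical helper of our own (not from the study) that inflates a target number of completers by the expected retention.

```python
import math

# Weekly dropout hazard reported in the meta-analysis: point estimate and 95% bootstrap CI.
LAMBDA = 0.0088
LAMBDA_CI = (0.0076, 0.0100)
WEEKS_PER_YEAR = 52  # approximation used for the 1-year figure


def retention(weeks: float, lam: float = LAMBDA) -> float:
    """Expected fraction still enrolled after `weeks`, from exp(-lam * t)."""
    return math.exp(-lam * weeks)


def predicted_dropout(weeks: float, lam: float = LAMBDA) -> float:
    """Expected cumulative fraction of participants who have dropped out."""
    return 1.0 - retention(weeks, lam)


def required_enrollment(completers: int, weeks: float, lam: float = LAMBDA) -> int:
    """Hypothetical helper: enrollment needed so that `completers` remain on average."""
    return math.ceil(completers / retention(weeks, lam))


if __name__ == "__main__":
    print(f"1-year dropout: {predicted_dropout(WEEKS_PER_YEAR):.1%}")
    low, high = (predicted_dropout(WEEKS_PER_YEAR, lam) for lam in LAMBDA_CI)
    print(f"using the CI bounds for lambda: {low:.1%} to {high:.1%}")
    print("enroll for 100 completers at 1 year:", required_enrollment(100, WEEKS_PER_YEAR))
```

With the point estimate, the model gives about 36.7% dropout at 52 weeks, consistent with the 37% quoted above.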

Abstract:

Constant technological advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for analyzing biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories. Gene expression data, depending on the quantification technology, can be continuous values or counts. With the advancement of high-throughput technology, such data have become unprecedentedly abundant. Therefore, efficient statistical approaches are crucial in this big data era.

Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, a factor analysis model assumes a latent, Gaussian-distributed multivariate vector; under this assumption, the model produces a low-rank (plus diagonal) estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, which assumes that the mixture proportions of topics are represented by a Dirichlet-distributed variable. This dissertation proposes several novel extensions of these methods to address challenges in big data. The new methods are applied in multiple real-world applications, including construction of condition-specific gene co-expression networks, estimation of shared topics among newsgroups, analysis of promoter sequences, analysis of political-economic risk data, and estimation of population structure from genotype data.
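To make the low-rank claim concrete: in a standard factor model x = Λz + ε with k latent factors, the implied covariance is ΛΛᵀ + Ψ, a rank-k matrix plus a diagonal. The sketch below assumes NumPy and uses randomly drawn, illustrative loadings (not estimates from any dataset in the dissertation) to check that rank and compare parameter counts.

```python
import numpy as np


def factor_covariance(loadings: np.ndarray, uniquenesses: np.ndarray) -> np.ndarray:
    """Covariance implied by x = loadings @ z + eps, z ~ N(0, I_k), eps ~ N(0, diag(uniquenesses))."""
    return loadings @ loadings.T + np.diag(uniquenesses)


rng = np.random.default_rng(0)
p, k = 10, 2                           # illustrative: 10 observed variables, 2 latent factors
Lambda = rng.normal(size=(p, k))       # hypothetical loading matrix
psi = rng.uniform(0.1, 0.5, size=p)    # hypothetical per-variable noise variances

Sigma = factor_covariance(Lambda, psi)
shared = Lambda @ Lambda.T             # part of the covariance explained by the factors

print("rank of shared part:", np.linalg.matrix_rank(shared))   # equals k for generic loadings
print("free covariance entries:", p * (p + 1) // 2, "vs factor-model parameters:", p * k + p)
```

The parameter count (30 versus 55 here) is what makes the factor model attractive as p grows.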

Abstract:

BACKGROUND: Guidance for appropriate utilisation of transthoracic echocardiograms (TTEs) can be incorporated into ordering prompts, potentially affecting the number of requests. METHODS: We incorporated data from the 2011 Appropriate Use Criteria for Echocardiography, the 2010 National Institute for Clinical Excellence Guideline on Chronic Heart Failure, and the American College of Cardiology's Choosing Wisely list on TTE use for dyspnoea, oedema and valvular disease into the electronic ordering systems at Durham Veterans Affairs Medical Center. Our primary outcome was TTE orders per month. Secondary outcomes included rates of outpatient TTE ordering per 100 visits and frequency of brain natriuretic peptide (BNP) ordering prior to TTE. Outcomes were measured for 20 months before and 12 months after the intervention. RESULTS: The number of TTEs ordered did not decrease (338±32 TTEs/month before vs 320±33 afterwards, p=0.12). Rates of outpatient TTE ordering decreased minimally after the intervention (2.28 per 100 primary care/cardiology visits before vs 1.99 afterwards, p<0.01). Effects on TTE ordering and ordering rate significantly interacted with time from intervention (p<0.02 for both), as the small initial effects waned after 6 months. The percentage of TTE orders with preceding BNP increased (36.5% before vs 42.2% after for inpatients, p=0.01; 10.8% before vs 14.5% after for outpatients, p<0.01). CONCLUSIONS: At a single institution, ordering prompts for TTEs initially produced a minimal reduction in the number of TTEs ordered and increased BNP measurement, but the effect on TTEs ordered was likely insignificant from a utilisation standpoint and decayed over time.
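Expressing the reported means as relative changes shows why the effect on utilisation is described as minimal. The short Python calculation below uses only figures quoted in the abstract; the helper name is ours.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Means reported in the abstract.
orders = relative_change(338, 320)             # TTE orders per month
outpatient_rate = relative_change(2.28, 1.99)  # outpatient TTEs per 100 visits
bnp_inpatient = 42.2 - 36.5                    # percentage-point change, inpatients
bnp_outpatient = 14.5 - 10.8                   # percentage-point change, outpatients

print(f"orders/month: {orders:+.1%}")
print(f"outpatient rate: {outpatient_rate:+.1%}")
print(f"BNP before TTE: {bnp_inpatient:+.1f} points (inpatient), "
      f"{bnp_outpatient:+.1f} points (outpatient)")
```

Monthly orders fell by roughly 5% (not statistically significant) and the outpatient ordering rate by roughly 13%.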

Abstract:

A steady increase in knowledge of the molecular and antigenic structure of the gp120 and gp41 HIV-1 envelope glycoproteins (Env) is yielding important new insights for vaccine design, but it has been difficult to translate this information into an immunogen that elicits broadly neutralizing antibodies. To help bridge this gap, we used phylogenetically corrected statistical methods to identify amino acid signature patterns in Envs derived from people who have made potently neutralizing antibodies, with the hypothesis that these Envs may share common features that would be useful for incorporation into a vaccine immunogen. Before attempting this, essentially as a control, we explored the utility of our computational methods for defining signatures of complex neutralization phenotypes by analyzing Env sequences from 251 clonal viruses that were differentially sensitive to neutralization by the well-characterized gp120-specific monoclonal antibody b12. We identified ten b12-neutralization signatures, including seven either in the b12-binding surface of gp120 or in the V2 region of gp120 that have been previously shown to impact b12 sensitivity. A simple algorithm based on the b12 signature pattern was predictive of b12 sensitivity/resistance in an additional blinded panel of 57 viruses. Having obtained these reassuring results, we applied the same computational methods to define signature patterns in Env from HIV-1-infected individuals who had potent, broadly neutralizing responses. We analyzed a checkerboard-style neutralization dataset with sera from 69 HIV-1-infected individuals tested against a panel of 25 different Envs. Distinct clusters of sera with high and low neutralization potencies were identified. Six signature positions in Env sequences obtained from the 69 samples were found to be strongly associated with either the high- or low-potency responses.
Five sites were in the CD4-induced coreceptor binding site of gp120, suggesting an important role for this region in the elicitation of broadly neutralizing antibody responses against HIV-1.
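The abstract does not spell out the "simple algorithm based on the b12 signature pattern". One plausible form, offered purely as an assumption, is to count how many signature residues an aligned sequence carries and call it sensitive above a threshold. The positions, residues and threshold below are hypothetical placeholders, not the reported signatures.

```python
from typing import Dict

# Hypothetical signature: alignment position (0-based) -> residue associated with sensitivity.
# These are placeholders, NOT the b12 signatures reported in the study.
SIGNATURE: Dict[int, str] = {10: "N", 25: "T", 40: "D", 57: "K"}


def signature_score(sequence: str, signature: Dict[int, str] = SIGNATURE) -> int:
    """Count signature positions where `sequence` carries the sensitivity-associated residue.

    Positions beyond the end of the sequence simply do not match.
    """
    return sum(
        1 for pos, residue in signature.items()
        if pos < len(sequence) and sequence[pos] == residue
    )


def predict_sensitive(sequence: str, threshold: int = 3) -> bool:
    """Predict sensitivity when at least `threshold` signature residues are present (threshold assumed)."""
    return signature_score(sequence) >= threshold
```

A real application would score sequences from the same alignment used to define the signatures and choose the threshold on training data before testing on a blinded panel.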

Abstract:

BACKGROUND: The inherent complexity of statistical methods and clinical phenomena compels researchers with diverse domains of expertise to work in interdisciplinary teams, where none of them has complete knowledge of their counterparts' fields. As a result, knowledge exchange may often be characterized by miscommunication leading to misinterpretation, ultimately resulting in errors in research and even in clinical practice. Although communication plays a central role in interdisciplinary collaboration and miscommunication can have a negative impact on research processes, to the best of our knowledge no study has yet explored how data analysis specialists and clinical researchers communicate over time. METHODS/PRINCIPAL FINDINGS: We conducted a qualitative analysis of encounters between clinical researchers and data analysis specialists (an epidemiologist, a clinical epidemiologist, and a data mining specialist). These encounters were recorded and systematically analyzed using a grounded theory methodology for extraction of emerging themes, followed by data triangulation and analysis of negative cases for validation. A policy analysis was then performed using a system dynamics methodology, looking for potential interventions to improve this process. Four major emerging themes were found. Definitions using lay language were frequently employed as a way to bridge the language gap between the specialties. Thought experiments presented a series of "what if" situations that helped clarify how the method or information from the other field would behave if exposed to alternative situations, ultimately aiding in explaining their main objective. Metaphors and analogies were used to translate concepts across fields, from the unfamiliar to the familiar. Prolepsis was used to anticipate study outcomes, thus helping specialists understand the current context based on an understanding of their final goal.
CONCLUSION/SIGNIFICANCE: The communication between clinical researchers and data analysis specialists presents multiple challenges that can lead to errors.

Abstract:

Complex diseases will have multiple functional sites, and it will be invaluable to understand the cross-locus interaction in terms of linkage disequilibrium (LD) between those sites (epistasis) in addition to the haplotype-LD effects. We investigated the statistical properties of a class of matrix-based statistics to assess this epistasis. These statistical methods include two LD contrast tests (Zaykin et al., 2006) and partial least squares regression (Wang et al., 2008). To estimate Type I error rates and power, we simulated multiple two-variant disease models using the SIMLA software package. SIMLA allows for the joint action of up to two disease genes in the simulated data, with all possible multiplicative interaction effects between them. Our goal was to detect an interaction between multiple disease-causing variants by means of their LD patterns with other markers. We measured the effects that marginal disease effect size, haplotype LD, disease prevalence and minor allele frequency have on cross-locus interaction (epistasis). In the setting of strong allele effects and strong interaction, the correlation between the two disease genes was weak (r = 0.2). In a complex system with multiple correlations (both marginal and interaction), it was difficult to determine the source of a significant result. Despite these complications, the partial least squares and modified LD contrast methods maintained adequate power to detect the epistatic effects; however, for many of the analyses we could not separate interaction from a strong marginal effect. While we did not exhaust the entire parameter space of possible models, we do provide guidance on the effects that population parameters have on cross-locus interaction.
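The LD statistics discussed above build on the pairwise correlation r between alleles at two biallelic loci. The sketch below computes the standard disequilibrium coefficient D and r from haplotype and allele frequencies; it is not an implementation of the LD contrast tests or the partial least squares regression cited above, and the frequencies in the usage example are illustrative.

```python
import math
from typing import Tuple


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (D, r) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2.
    p_a, p_b: marginal frequencies of A and B (strictly between 0 and 1).
    D = p_AB - p_A * p_B;  r = D / sqrt(p_A (1 - p_A) p_B (1 - p_B)).
    """
    d = p_ab - p_a * p_b
    r = d / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r


if __name__ == "__main__":
    # Illustrative frequencies: both alleles at 0.5, AB haplotype slightly enriched.
    print(ld_stats(0.30, 0.5, 0.5))
```

With these illustrative frequencies, D = 0.05 and r = 0.2, which shows how modest haplotype enrichment yields the kind of weak correlation quoted above.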

Abstract:

BACKGROUND: Integrated vector management (IVM) is increasingly being recommended as an option for sustainable malaria control. However, many malaria-endemic countries lack a policy framework to guide and promote the approach. The objective of the study was to assess knowledge and perceptions in relation to current malaria vector control policy and IVM in Uganda, and to make recommendations for consideration during future development of a specific IVM policy. METHODS: The study used a structured questionnaire to interview 34 individuals working at technical or policy-making levels in the health, environment, agriculture and fisheries sectors. Specific questions on IVM focused on the following key elements of the approach: integration of chemical and non-chemical interventions of vector control; evidence-based decision making; inter-sectoral collaboration; capacity building; legislation; advocacy and community mobilization. RESULTS: All participants were familiar with the term IVM and knew various conventional malaria vector control (MVC) methods. Only 75% thought that Uganda had an MVC policy. Eighty percent (80%) felt there was inter-sectoral collaboration towards IVM, but that it was poor due to financial constraints, difficulties in involving all possible sectors, and political differences. The health, environment and agricultural sectors were cited as key areas requiring cooperation in order for IVM to succeed. Sixty-seven percent (67%) of participants responded that communities were actively involved in MVC, while 48% felt that the use of research results for evidence-based decision making was inadequate or poor. A majority of the participants felt that malaria research in Uganda was rarely used to facilitate policy changes.
Suggestions by participants for formulating a specific and effective IVM policy included revising the MVC policy and IVM-related policies in other sectors and merging them into a single, unified IVM policy, and using legislation to enforce IVM in development projects. CONCLUSION: Integrated management of malaria vectors in Uganda remains an underdeveloped component of malaria control policy. Cooperation between the health and other sectors needs strengthening, and funding for MVC needs to be increased, in order to develop and effectively implement an appropriate IVM policy. Continuous engagement of communities by government, as well as monitoring and evaluation of vector control programmes, will be crucial for sustaining IVM in the country.

Abstract:

BACKGROUND: We analyzed the association between 53 genes related to DNA repair and p53-mediated damage response and serous ovarian cancer risk using case-control data from the North Carolina Ovarian Cancer Study (NCOCS), a population-based, case-control study. METHODS/PRINCIPAL FINDINGS: The analysis was restricted to 364 invasive serous ovarian cancer cases and 761 controls of white, non-Hispanic race. Statistical analysis had two stages: a screen using marginal Bayes factors (BFs) for 484 SNPs, and a modeling stage in which we calculated multivariate adjusted posterior probabilities of association for the 77 SNPs that passed the screen. These probabilities were conditional on subject age at diagnosis/interview, batch, a DNA quality metric and the genotypes of other SNPs, and allowed for uncertainty in the genetic parameterizations of the SNPs and in the number of associated SNPs. Six SNPs had Bayes factors greater than 10 in favor of an association with invasive serous ovarian cancer. These included rs5762746 (median per-allele odds ratio (OR) = 0.66; 95% credible interval (CI) = 0.44-1.00) and rs6005835 (median per-allele OR = 0.69; 95% CI = 0.53-0.91) in CHEK2; rs2078486 (median per-allele OR = 1.65; 95% CI = 1.21-2.25) and rs12951053 (median per-allele OR = 1.65; 95% CI = 1.20-2.26) in TP53; rs411697 (median rare-homozygote OR = 0.53; 95% CI = 0.35-0.79) in BACH1; and rs10131 (median rare-homozygote OR not estimable) in LIG4. The six most highly associated SNPs are either predicted to be functionally significant or are in LD with such a variant. The variants in TP53 were confirmed to be associated in a large follow-up study. CONCLUSIONS/SIGNIFICANCE: Based on our findings, further follow-up of the DNA repair and response pathways in a larger dataset is warranted to confirm these results.
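A Bayes factor alone does not fix the probability of association: posterior odds equal the Bayes factor times the prior odds. The sketch below shows that conversion and a stage-one threshold screen of the kind described above. It is marginal only (the study's second stage adjusts for covariates and other SNPs), and the SNP identifiers and prior probabilities used with it are hypothetical.

```python
from typing import Dict, List


def posterior_probability(bayes_factor: float, prior_prob: float) -> float:
    """Posterior probability of association: posterior odds = BF * prior odds."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def screen(bayes_factors: Dict[str, float], threshold: float) -> List[str]:
    """Stage-one screen: keep SNP ids whose marginal Bayes factor exceeds `threshold`."""
    return [snp for snp, bf in bayes_factors.items() if bf > threshold]


if __name__ == "__main__":
    # The same BF of 10 under two assumed prior probabilities of association.
    for prior in (0.5, 0.01):
        print(f"prior {prior}: posterior {posterior_probability(10, prior):.3f}")
```

Under an assumed 1% prior, a BF of 10 raises the probability of association only to roughly 9%, which is why a screen is usually followed by a more careful modeling stage.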

Abstract:

An enterprise information system (EIS) is an integrated data-applications platform characterized by diverse, heterogeneous, and distributed data sources. For many enterprises, a number of business processes still depend heavily on static rule-based methods and extensive human expertise. Enterprises face the need to optimize operation scheduling, improve resource utilization, discover useful knowledge, and make data-driven decisions.

This thesis research is focused on real-time optimization and knowledge discovery that addresses workflow optimization, resource allocation, as well as data-driven predictions of process-execution times, order fulfillment, and enterprise service-level performance. In contrast to prior work on data analytics techniques for enterprise performance optimization, the emphasis here is on realizing scalable and real-time enterprise intelligence based on a combination of heterogeneous system simulation, combinatorial optimization, machine-learning algorithms, and statistical methods.

On-demand digital-print service is a representative enterprise requiring a powerful EIS. We use real-life data from Reischling Press, Inc. (RPI), a digital-print-service provider (PSP), to evaluate our optimization algorithms.

In order to handle the increase in volume and diversity of demands, we first present a high-performance, scalable, and real-time production scheduling algorithm for production automation based on an incremental genetic algorithm (IGA). The objective of this algorithm is to optimize the order dispatching sequence and balance resource utilization. Compared to prior work, this solution is scalable for a high volume of orders and it provides fast scheduling solutions for orders that require complex fulfillment procedures. Experimental results highlight its potential benefit in reducing production inefficiencies and enhancing the productivity of an enterprise.
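The abstract does not detail the incremental genetic algorithm, so the following is only a generic permutation GA for order sequencing, written under simplifying assumptions of our own: a single shared resource, total weighted tardiness as the objective, a simple order-crossover variant plus swap mutation, and elitist selection. The optional `initial` argument is our guess at how an "incremental" variant might reuse the previous best schedule when new orders arrive.

```python
import random
from typing import List, Optional, Sequence


def total_weighted_tardiness(order: Sequence[int], proc: Sequence[float],
                             due: Sequence[float], weight: Sequence[float]) -> float:
    """Cost of running jobs in `order` back to back on one resource (simplifying assumption)."""
    clock, cost = 0.0, 0.0
    for job in order:
        clock += proc[job]
        cost += weight[job] * max(0.0, clock - due[job])
    return cost


def order_crossover(p1: List[int], p2: List[int], rng: random.Random) -> List[int]:
    """Keep a random slice of p1; fill the remaining slots with p2's other jobs in p2's order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [-1] * n
    child[i:j + 1] = p1[i:j + 1]
    kept = set(p1[i:j + 1])
    fill = iter(g for g in p2 if g not in kept)
    return [g if g != -1 else next(fill) for g in child]


def ga_schedule(proc: Sequence[float], due: Sequence[float], weight: Sequence[float],
                pop_size: int = 40, generations: int = 200, mutation_rate: float = 0.2,
                seed: int = 0, initial: Optional[List[int]] = None) -> List[int]:
    """Elitist permutation GA; `initial` (hypothetical) reseeds from a previous schedule."""
    rng = random.Random(seed)
    n = len(proc)

    def cost(s: Sequence[int]) -> float:
        return total_weighted_tardiness(s, proc, due, weight)

    population = [rng.sample(range(n), n) for _ in range(pop_size - 1)]
    population.append(list(initial) if initial is not None else list(range(n)))
    for _ in range(generations):
        population.sort(key=cost)
        elite = population[: pop_size // 2]          # best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = order_crossover(a, b, rng)
            if rng.random() < mutation_rate:         # swap mutation
                x, y = rng.sample(range(n), 2)
                child[x], child[y] = child[y], child[x]
            children.append(child)
        population = elite + children
    return min(population, key=cost)
```

Because the best schedules always survive, the returned schedule is never worse than the seeded one; a production scheduler would also need multiple resources and precedence constraints between processes.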

We next discuss the analysis and prediction of different attributes involved in the hierarchical components of an enterprise. We start with a study of the fundamental processes related to real-time prediction. Our process-execution-time and process-status prediction models integrate statistical methods with machine-learning algorithms. In addition to improving prediction accuracy compared to stand-alone machine-learning algorithms, these models also provide a probabilistic estimate of the predicted status. An order generally consists of multiple processes executed in series and in parallel. We next introduce an order-fulfillment prediction model that combines the advantages of multiple classification models by incorporating flexible decision-integration mechanisms. Experimental results show that adopting the due dates recommended by the model can significantly reduce the enterprise's late-delivery ratio. Finally, we investigate service-level attributes that reflect the overall performance of an enterprise. We analyze and decompose time-series data into different components according to their hierarchical periodic nature, perform correlation analysis, and develop univariate prediction models for each component as well as multivariate models for correlated components. Predictions for the original time series are aggregated from the predictions of its components. In addition to a significant increase in mid-term prediction accuracy, this distributed modeling strategy also improves short-term time-series prediction accuracy.
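The decompose, predict, and aggregate strategy can be sketched with a single periodic level instead of a hierarchy. The Python sketch below assumes an additive model, a least-squares linear trend, a per-phase seasonal profile and a zero-mean residual; each component is forecast separately and the component forecasts are summed. The multivariate models for correlated components are not shown, and all function names are ours.

```python
from statistics import mean
from typing import List, Tuple


def linear_trend(series: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = a + b*t with t = 0..n-1; returns (a, b)."""
    n = len(series)
    t_bar = (n - 1) / 2
    y_bar = mean(series)
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(series))
    b = sxy / sxx
    return y_bar - b * t_bar, b


def decompose(series: List[float], period: int):
    """Additive split: series = trend + seasonal + residual."""
    a, b = linear_trend(series)
    trend = [a + b * t for t in range(len(series))]
    detrended = [y - tr for y, tr in zip(series, trend)]
    profile = [mean(detrended[ph::period]) for ph in range(period)]  # mean value per phase
    seasonal = [profile[t % period] for t in range(len(series))]
    residual = [d - s for d, s in zip(detrended, seasonal)]
    return (a, b), profile, residual


def forecast(series: List[float], period: int, horizon: int) -> List[float]:
    """Forecast each component separately, then sum the component forecasts."""
    (a, b), profile, _residual = decompose(series, period)
    n = len(series)
    out = []
    for t in range(n, n + horizon):
        trend_fc = a + b * t                # extrapolate the fitted trend
        seasonal_fc = profile[t % period]   # repeat the periodic profile
        resid_fc = 0.0                      # residual assumed to be zero-mean noise
        out.append(trend_fc + seasonal_fc + resid_fc)
    return out
```

Swapping any component model (for example, an autoregressive model for the residual) leaves the aggregation step unchanged, which is the main appeal of this design.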

In summary, this thesis research has led to a set of characterization, optimization, and prediction tools for an EIS to derive insightful knowledge from data and use it as guidance for production management. It is expected to provide solutions for enterprises to increase reconfigurability, automate more procedures, and obtain data-driven recommendations for effective decisions.

Abstract:

Current state-of-the-art techniques for landmine detection in ground-penetrating radar (GPR) use statistical methods to identify characteristics of a landmine response. This research makes use of 2-D slices of data in which subsurface landmine responses have hyperbolic shapes. Various methods from the field of visual image processing are adapted to the 2-D GPR data, improving landmine detection results. This research goes on to develop a physics-based GPR augmentation method motivated by current advances in visual object detection. This GPR-specific augmentation is used to mitigate problems caused by insufficient training sets, and this work shows that it improves detection performance under training conditions that are normally very difficult. Finally, this work introduces convolutional neural networks as a way to learn feature-extraction parameters; these learned convolutional features outperform hand-designed features in GPR detection tasks. This work presents a number of methods, both borrowed from and motivated by the substantial body of work in visual image processing. The methods developed here improve overall detection performance and introduce a way to improve the robustness of statistical classification.
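The hyperbolic shape follows from geometry: as the antenna moves along the surface, the two-way travel time to a buried point target grows with horizontal offset. The Python sketch below assumes a homogeneous medium and an illustrative wave speed of 1e8 m/s, computes that travel time, and renders a bare hyperbola into a B-scan grid. It illustrates the signature that image-processing methods look for; it is not the physics-based augmentation method developed in this work.

```python
import math
from typing import List


def hyperbola_travel_time(x: float, x0: float, depth: float, velocity: float) -> float:
    """Two-way travel time (s) from an antenna at horizontal position x (m) to a point
    target at (x0, depth) (m) in a homogeneous medium with wave speed `velocity` (m/s)."""
    return 2.0 * math.sqrt((x - x0) ** 2 + depth ** 2) / velocity


def synthetic_bscan(n_traces: int, n_samples: int, dx: float, dt: float,
                    x0: float, depth: float, velocity: float) -> List[List[float]]:
    """Mark the hyperbolic response in an (n_samples x n_traces) grid of zeros.

    Illustrative only: no source wavelet, attenuation, clutter, or ground bounce.
    """
    image = [[0.0] * n_traces for _ in range(n_samples)]
    for col in range(n_traces):
        t = hyperbola_travel_time(col * dx, x0, depth, velocity)
        row = round(t / dt)          # nearest time sample
        if 0 <= row < n_samples:     # responses beyond the time window are dropped
            image[row][col] = 1.0
    return image
```

The apex of the hyperbola sits directly above the target at time 2·depth/velocity, and the flanks open more slowly for faster media.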

Abstract:

BACKGROUND: Evidence is lacking to inform providers' and patients' decisions about many common treatment strategies for patients with end-stage renal disease (ESRD). METHODS/DESIGN: The DEcIDE Patient Outcomes in ESRD Study is funded by the United States (US) Agency for Healthcare Research and Quality to study the comparative effectiveness of 1) antihypertensive therapies, 2) early versus later initiation of dialysis, and 3) intravenous iron therapies on clinical outcomes in patients with ESRD. Ongoing studies use four existing, nationally representative cohorts of patients with ESRD: (1) the Choices for Healthy Outcomes in Caring for ESRD study (1041 incident dialysis patients recruited from October 1995 to June 1999, with complete outcome ascertainment through 2009), (2) Dialysis Clinic, Inc. (45,124 incident dialysis patients initiating and receiving their care from 2003-2010, with complete outcome ascertainment through 2010), (3) the United States Renal Data System (333,308 incident dialysis patients from 2006-2009, with complete outcome ascertainment through 2010), and (4) the Cleveland Clinic Foundation Chronic Kidney Disease Registry (53,399 patients with chronic kidney disease, with outcome ascertainment from 2005 through 2009). We ascertain patient-reported outcomes (i.e., health-related quality of life), morbidity, and mortality using clinical and administrative data and data obtained from national death indices. We use advanced statistical methods (e.g., propensity scoring and marginal structural modeling) to account for potential biases of our study designs. All data are de-identified for analyses. The conduct of studies and dissemination of findings are guided by input from stakeholders in the ESRD community. DISCUSSION: The DEcIDE Patient Outcomes in ESRD Study will provide needed evidence regarding the effectiveness of common treatments employed for dialysis patients.
Carefully planned dissemination strategies for the ESRD community will enhance the studies' impact on clinical care and patient outcomes.
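Propensity scoring and marginal structural models commonly rely on inverse probability of treatment weighting (IPTW). The sketch below shows the single-time-point version, assuming propensity scores have already been estimated; marginal structural models for time-varying treatments multiply such weights across visits, which is not shown. Function names are ours, not from the study.

```python
from typing import List, Sequence


def iptw_weights(treated: Sequence[int], propensity: Sequence[float],
                 stabilized: bool = True) -> List[float]:
    """Inverse-probability-of-treatment weights from estimated propensity scores.

    Unstabilized: 1/e for treated, 1/(1 - e) for untreated.
    Stabilized: numerator replaced by the marginal probability of the treatment received.
    """
    p_treat = sum(treated) / len(treated)
    weights = []
    for a, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if a == 1:
            weights.append((p_treat if stabilized else 1.0) / e)
        else:
            weights.append(((1.0 - p_treat) if stabilized else 1.0) / (1.0 - e))
    return weights


def weighted_mean_difference(outcome: Sequence[float], treated: Sequence[int],
                             weights: Sequence[float]) -> float:
    """Weighted mean outcome among treated minus weighted mean among untreated."""
    num1 = sum(w * y for y, a, w in zip(outcome, treated, weights) if a == 1)
    den1 = sum(w for a, w in zip(treated, weights) if a == 1)
    num0 = sum(w * y for y, a, w in zip(outcome, treated, weights) if a == 0)
    den0 = sum(w for a, w in zip(treated, weights) if a == 0)
    return num1 / den1 - num0 / den0
```

Stabilized weights have the same expected behavior but smaller variance, which is why they are usually preferred when some propensity scores are near 0 or 1.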

Abstract:

Nolan and Temple Lang argue that “the ability to express statistical computations is an essential skill.” A key related capacity is the ability to conduct and present data analysis in a way that another person can understand and replicate. The copy-and-paste workflow, an artifact of antiquated user-interface design, makes reproducibility of statistical analysis more difficult, especially as data become increasingly complex and statistical methods increasingly sophisticated. R Markdown is a new technology that makes creating fully reproducible statistical analyses simple and painless. It provides a solution suitable not only for cutting-edge research but also for use in an introductory statistics course. We present experiential and statistical evidence that R Markdown can be used effectively in introductory statistics courses, and discuss its role in the rapidly changing world of statistical computation.

Abstract:

BACKGROUND: The obesity epidemic has spread to young adults, with significant public health implications later in adulthood. Intervention in early adulthood may be an effective public health strategy for reducing the long-term health impact of the epidemic. Few weight loss trials have been conducted in young adults, and it is unclear which weight loss strategies are beneficial in this population. PURPOSE: To describe the design and rationale of the NHLBI-sponsored Cell Phone Intervention for You (CITY) study, a single-center, randomized, three-arm trial that compares the impact on weight loss of 1) a behavioral intervention delivered almost entirely via cell phone technology (Cell Phone group) and 2) a behavioral intervention delivered mainly through monthly personal coaching calls enhanced by self-monitoring via cell phone (Personal Coaching group), each compared to 3) a usual care, advice-only control condition. METHODS: A total of 365 community-dwelling overweight/obese adults aged 18-35 years were randomized to receive one of these three interventions for 24 months in a parallel-group design. Study personnel assessing outcomes were blinded to group assignment. The primary outcome is weight change at 24 months. We hypothesize that each active intervention will cause more weight loss than the usual care condition. Study completion is anticipated in 2014. CONCLUSIONS: If effective, implementation of the CITY interventions could mitigate the alarming rates of obesity in young adults through promotion of weight loss. ClinicalTrials.gov: NCT01092364.

Abstract:

© 2014, The International Biometric Society.
A potential avenue to improve healthcare efficiency is to tailor individualized treatment strategies effectively by incorporating patient-level predictor information such as environmental exposures and biological and genetic marker measurements. Many useful statistical methods for deriving individualized treatment rules (ITRs) have become available in recent years. Before adopting any ITR in clinical practice, it is crucial to evaluate its value in improving patient outcomes. Existing methods for quantifying such values mainly consider either a single marker or semi-parametric methods that are subject to bias under model misspecification. In this article, we consider a general setting with multiple markers and propose a two-step robust method to derive ITRs and evaluate their values. We also propose procedures for comparing different ITRs, which can be used to quantify the incremental value of new markers in improving treatment selection. While working models are used in step I to approximate optimal ITRs, we add a layer of calibration to guard against model misspecification and further assess the value of the ITR non-parametrically, which ensures the validity of the inference. To account for the sampling variability of the estimated rules and their corresponding values, we propose a resampling procedure to provide valid confidence intervals for the value functions as well as for the incremental value of new markers for treatment selection. Our proposals are examined through extensive simulation studies and illustrated with data from a clinical trial that studies the effects of two drug combinations on HIV-1-infected patients.
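The value of an ITR is the mean outcome that would result if patients were treated according to the rule. In a randomized trial with a known assignment probability, a simple inverse-probability-weighted estimator averages outcomes over patients whose received treatment agrees with the rule. The sketch below shows that estimator and the incremental value between two rules; it is not the two-step calibrated method proposed in the article, it omits the resampling procedure for confidence intervals, and the rules and data in the tests are hypothetical.

```python
from typing import Callable, Sequence

Rule = Callable[[Sequence[float]], int]  # maps a patient's markers to treatment 0 or 1


def itr_value(outcomes: Sequence[float], treatments: Sequence[int],
              markers: Sequence[Sequence[float]], rule: Rule, p_treat: float = 0.5) -> float:
    """IPW estimate of the mean outcome if everyone were treated according to `rule`.

    Assumes a randomized trial where treatment 1 was assigned with known probability p_treat.
    """
    total = 0.0
    for y, a, x in zip(outcomes, treatments, markers):
        if rule(x) == a:  # only patients whose actual treatment agrees with the rule contribute
            prob = p_treat if a == 1 else 1.0 - p_treat
            total += y / prob
    return total / len(outcomes)


def incremental_value(outcomes: Sequence[float], treatments: Sequence[int],
                      markers: Sequence[Sequence[float]], new_rule: Rule, old_rule: Rule,
                      p_treat: float = 0.5) -> float:
    """Estimated gain in value from switching from `old_rule` to `new_rule`."""
    return (itr_value(outcomes, treatments, markers, new_rule, p_treat)
            - itr_value(outcomes, treatments, markers, old_rule, p_treat))
```

Comparing a marker-based rule against a rule that ignores the new marker gives one simple reading of the "incremental value of new markers" described above.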