7 results for big data processing
at Duke University
Abstract:
INTRODUCTION: The characterization of urinary calculi using noninvasive methods has the potential to affect clinical management. CT remains the gold standard for diagnosing urinary calculi, but it has not reliably differentiated among stone compositions. Dual-energy CT (DECT) has emerged as a technology that improves CT characterization of anatomic structures. This study assesses the ability of DECT to accurately discriminate between different types of urinary calculi in an in vitro model, using novel post-acquisition data processing techniques. METHODS: Fifty urinary calculi were assessed, 44 of which were composed of ≥60% of a single component. DECT was performed with a 64-slice multidetector CT scanner. The attenuation profiles of the lower-energy (DECT-Low) and higher-energy (DECT-High) datasets were used to investigate whether differences could be seen between stone compositions. RESULTS: Post-acquisition processing identified the main chemical compositions of urinary calculi: brushite, calcium oxalate-calcium phosphate, struvite, cystine, and uric acid. Statistical analysis showed that this processing identified all stone compositions without obvious graphical overlap. CONCLUSION: Dual-energy multidetector CT with postprocessing techniques allows accurate discrimination among the main subtypes of urinary calculi in an in vitro model. Better detection of stone composition from noninvasive, preprocedure radiological assessment may help determine the optimal clinical treatment modality for urinary calculi.
Abstract:
Constant technological advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to handle complex and heterogeneous data types. This is particularly true for biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories. Gene expression data, depending on the quantification technology, may be continuous values or counts. With the advancement of high-throughput technology, such data have become unprecedentedly abundant. Efficient statistical approaches are therefore crucial in this big data era.
Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, a factor analysis model assumes a latent, Gaussian-distributed multivariate vector; under this assumption, the model produces a low-rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, which assumes that the mixture proportions of topics follow a Dirichlet distribution. This dissertation proposes several novel extensions to these methods to address challenges in big data. The new methods are applied to multiple real-world problems, including the construction of condition-specific gene co-expression networks, estimation of shared topics among newsgroups, analysis of promoter sequences, analysis of political-economic risk data, and estimation of population structure from genotype data.
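The low-rank covariance structure of a factor model can be made concrete. A minimal sketch, assuming the standard k-factor model x = Λz + ε with z ~ N(0, I_k) and independent Gaussian noise ε with diagonal covariance Ψ: the implied covariance is Σ = ΛΛᵀ + Ψ, whose shared part ΛΛᵀ has rank at most k. The function names and example loadings are illustrative only and are not taken from the dissertation, which proposes its own extensions.

```python
import random

def implied_covariance(loadings, uniquenesses):
    """Return Sigma = Lambda Lambda^T + Psi for a k-factor model.

    loadings: p x k nested list (Lambda); uniquenesses: length-p list holding
    the diagonal of Psi (independent noise variances).
    """
    p, k = len(loadings), len(loadings[0])
    sigma = [[sum(loadings[i][f] * loadings[j][f] for f in range(k))
              for j in range(p)] for i in range(p)]
    for i in range(p):
        sigma[i][i] += uniquenesses[i]  # add the diagonal noise term Psi
    return sigma

def sample(loadings, uniquenesses, n, seed=0):
    """Draw n observations x = Lambda z + eps, z ~ N(0, I_k), eps_i ~ N(0, psi_i)."""
    rng = random.Random(seed)
    p, k = len(loadings), len(loadings[0])
    data = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        x = [sum(loadings[i][f] * z[f] for f in range(k))
             + rng.gauss(0.0, uniquenesses[i] ** 0.5) for i in range(p)]
        data.append(x)
    return data

def empirical_covariance(data):
    """Unbiased sample covariance of a list of equal-length observation vectors."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
             for j in range(p)] for i in range(p)]

def parameter_counts(p, k):
    """Free parameters: unrestricted covariance vs. k-factor model (ignoring rotation)."""
    return p * (p + 1) // 2, p * k + p

print(implied_covariance([[1.0], [2.0]], [0.5, 0.5]))  # [[1.5, 2.0], [2.0, 4.5]]
print(parameter_counts(1000, 10))                      # (500500, 11000)
```

The parameter count shows the appeal for high-dimensional data: with 1,000 observed variables, a 10-factor model needs about 11,000 parameters instead of roughly half a million for an unrestricted covariance.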
Abstract:
Plants exhibit developmental strategies different from those of animals; these are characterized by a tight linkage between environmental conditions and development. Because plants have neither specialized sensory organs nor a nervous system, intercellular regulators are essential for their development. Recently, major advances have been made in understanding how plants achieve intercellular regulation at the molecular level. Plants use a variety of molecules for intercellular regulation: hormones act as systemic signals that are interpreted at the individual-cell level; receptor peptide-ligand systems regulate local homeostasis; and mobile transcriptional regulators act in a switch-like manner over small and large distances. Together, these mechanisms coordinate developmental decisions with resource allocation and growth.
Abstract:
BACKGROUND: Historically, only partial assessments of data quality have been performed in clinical trials, for which the most common method of measuring database error rates has been to compare the case report form (CRF) to database entries and count discrepancies. Importantly, errors arising from medical record abstraction and transcription are rarely evaluated as part of such quality assessments. Electronic Data Capture (EDC) technology has complicated matters further, because the paper CRFs typically used for quality measurement are not part of EDC processes. METHODS AND PRINCIPAL FINDINGS: The National Institute on Drug Abuse Treatment Clinical Trials Network has developed, implemented, and evaluated a methodology for holistically assessing data quality in EDC trials. We characterize the average source-to-database error rate (14.3 errors per 10,000 fields) for the first year of use of the new evaluation method. This error rate was significantly lower than the average of published error rates for source-to-database audits, and was similar to CRF-to-database error rates reported in the published literature. We attribute this largely to the absence of medical record abstraction in the trials we examined, and to an outpatient setting characterized by less acute patient conditions. CONCLUSIONS: Historically, medical record abstraction has been the most significant source of error, by an order of magnitude, and should be measured and managed during the course of clinical trials. Source-to-database error rates are highly dependent on the amount of structured data collection in the clinical setting and on the complexity of the medical record; these dependencies should be considered when developing data quality benchmarks.
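The error rate above is normalized per 10,000 fields inspected. A minimal sketch of that normalization follows; the audit counts are hypothetical, not data from the study. It also contrasts a pooled rate with an unweighted mean of per-audit rates, since the two can differ substantially; which convention a given published benchmark uses is an assumption to verify against that source.

```python
def errors_per_10k(errors, fields_inspected):
    """Return errors normalized per 10,000 fields inspected."""
    if fields_inspected <= 0:
        raise ValueError("fields_inspected must be positive")
    return errors * 10_000 / fields_inspected

def pooled_rate(audits):
    """Pool (errors, fields) pairs across audits before normalizing."""
    total_errors = sum(e for e, _ in audits)
    total_fields = sum(f for _, f in audits)
    return errors_per_10k(total_errors, total_fields)

def mean_of_rates(audits):
    """Unweighted mean of per-audit rates; small audits weigh as much as large ones."""
    rates = [errors_per_10k(e, f) for e, f in audits]
    return sum(rates) / len(rates)

# Hypothetical audits: (errors found, fields inspected)
audits = [(10, 2_000), (4, 8_000)]
print(errors_per_10k(143, 100_000))  # 14.3
print(pooled_rate(audits))           # 14.0
print(mean_of_rates(audits))         # 27.5
```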
Abstract:
Cumulon is a system aimed at simplifying the development and deployment of statistical analysis of big data in public clouds. Cumulon allows users to program in the familiar language of matrices and linear algebra, without worrying about how to map data and computation to specific hardware and cloud software platforms. Given user-specified requirements in terms of time, monetary cost, and risk tolerance, Cumulon automatically makes intelligent decisions on implementation alternatives, execution parameters, and hardware provisioning and configuration settings (such as what type of machines to acquire and how many). Cumulon also supports clouds with auction-based markets: it effectively uses computing resources whose availability varies with market conditions, and suggests the best bidding strategies for them. Cumulon explores two alternative approaches to supporting such markets, with different trade-offs between system complexity and optimization complexity. An experimental study demonstrates the efficiency of Cumulon's execution engine and the optimizer's effectiveness in finding the optimal plan in a vast plan space.
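The kind of provisioning decision Cumulon automates can be illustrated with a toy search. This is not Cumulon's optimizer or API: it is a hypothetical brute-force sketch that assumes perfectly linear speedup, hourly billing rounded up, and made-up machine types, and it ignores risk tolerance and auction-based pricing. It returns the cheapest (machine type, count) pair that meets a deadline.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class MachineType:
    name: str
    speed: float           # hypothetical work units per machine-hour
    price_per_hour: float  # hypothetical dollars per machine-hour

def cheapest_plan(work, deadline_hours, machine_types, max_machines):
    """Return (cost, machine name, count, hours) for the cheapest feasible plan, or None."""
    best = None
    for m in machine_types:
        for count in range(1, max_machines + 1):
            hours = work / (m.speed * count)  # assumes perfectly linear speedup
            if hours > deadline_hours:
                continue  # this configuration misses the deadline
            cost = math.ceil(hours) * count * m.price_per_hour  # hourly billing, rounded up
            if best is None or cost < best[0]:
                best = (cost, m.name, count, hours)
    return best

types = [MachineType("small", 2.0, 0.10), MachineType("large", 10.0, 0.60)]
print(cheapest_plan(100.0, 5.0, types, 20))  # (5.0, 'small', 10, 5.0)
```

Even this toy model shows why the plan space is hard to search by hand: rounding to billed hours makes cost non-monotonic in machine count, so adding machines can either raise or lower the bill.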
Abstract:
In many important high-technology markets, including software development, data processing, communications, aeronautics, and defense, suppliers learn through experience how to provide better service at lower cost. This paper examines how a buyer designs dynamic competition among rival suppliers to exploit learning economies while minimizing the costs of becoming locked in to one producer. Strategies for controlling dynamic competition include the handicapping of more efficient suppliers in procurement competitions, the protection and allocation of intellectual property, and the sharing of information among rival suppliers. (JEL C73, D44, L10).
Abstract:
Segmentation of anatomical and pathological structures in ophthalmic images is crucial for the diagnosis and study of ocular diseases. However, manual segmentation is often time-consuming and subjective. This paper presents an automatic approach for segmenting retinal layers in Spectral Domain Optical Coherence Tomography images using graph theory and dynamic programming. Results show that the method segments eight retinal layer boundaries in normal adult eyes, agreeing more closely with an expert grader than a second expert grader does.
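The combination of graph search and dynamic programming mentioned above can be illustrated with a generic minimum-cost boundary search. This is a simplified sketch, not the authors' algorithm: it assumes the boundary crosses the image left to right with one row per column, moves at most `max_jump` rows between neighboring columns, and that low cost marks a dark-to-bright vertical edge (a hypothetical cost choice). Each column is a DP stage, and each pixel is a graph node connected to nearby rows in the previous column.

```python
def vertical_gradient_cost(image):
    """Hypothetical cost: low where intensity rises from row r to row r + 1."""
    rows, cols = len(image), len(image[0])
    grad = [[(image[r + 1][c] - image[r][c]) if r + 1 < rows else 0.0
             for c in range(cols)] for r in range(rows)]
    g_max = max(max(row) for row in grad)
    return [[g_max - g for g in row] for row in grad]  # strongest edge -> cost 0

def min_cost_boundary(cost, max_jump=1):
    """Return one row index per column tracing the minimum-cost left-to-right path."""
    rows, cols = len(cost), len(cost[0])
    acc = [[0.0] * cols for _ in range(rows)]  # accumulated cost to reach (r, c)
    back = [[0] * cols for _ in range(rows)]   # best predecessor row in column c - 1
    for r in range(rows):
        acc[r][0] = cost[r][0]
    for c in range(1, cols):
        for r in range(rows):
            lo, hi = max(0, r - max_jump), min(rows - 1, r + max_jump)
            prev = min(range(lo, hi + 1), key=lambda pr: acc[pr][c - 1])
            acc[r][c] = cost[r][c] + acc[prev][c - 1]
            back[r][c] = prev
    # Start from the cheapest endpoint in the last column and backtrack.
    r = min(range(rows), key=lambda rr: acc[rr][cols - 1])
    path = [r]
    for c in range(cols - 1, 0, -1):
        r = back[r][c]
        path.append(r)
    return path[::-1]

# Synthetic 6x5 image: the dark-to-bright edge sits below row 2, then below row 3.
img = [[0.0] * 5 for _ in range(6)]
for c in range(5):
    for r in range(3 if c < 2 else 4, 6):
        img[r][c] = 10.0
print(min_cost_boundary(vertical_gradient_cost(img)))  # [2, 2, 3, 3, 3]
```

The smoothness constraint (`max_jump`) is what lets the path follow a continuous layer boundary instead of jumping between unrelated edges; segmenting several layers would repeat the search with different cost images or restricted search regions.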