4 resultados para Data-driven modelling
em Duke University
Resumo:
An enterprise information system (EIS) is an integrated data-applications platform characterized by diverse, heterogeneous, and distributed data sources. For many enterprises, a number of business processes still depend heavily on static rule-based methods and extensive human expertise. Enterprises are faced with the need for optimizing operation scheduling, improving resource utilization, discovering useful knowledge, and making data-driven decisions.
This thesis research is focused on real-time optimization and knowledge discovery that addresses workflow optimization, resource allocation, as well as data-driven predictions of process-execution times, order fulfillment, and enterprise service-level performance. In contrast to prior work on data analytics techniques for enterprise performance optimization, the emphasis here is on realizing scalable and real-time enterprise intelligence based on a combination of heterogeneous system simulation, combinatorial optimization, machine-learning algorithms, and statistical methods.
On-demand digital-print service is a representative enterprise requiring a powerful EIS.We use real-life data from Reischling Press, Inc. (RPI), a digit-print-service provider (PSP), to evaluate our optimization algorithms.
In order to handle the increase in volume and diversity of demands, we first present a high-performance, scalable, and real-time production scheduling algorithm for production automation based on an incremental genetic algorithm (IGA). The objective of this algorithm is to optimize the order dispatching sequence and balance resource utilization. Compared to prior work, this solution is scalable for a high volume of orders and it provides fast scheduling solutions for orders that require complex fulfillment procedures. Experimental results highlight its potential benefit in reducing production inefficiencies and enhancing the productivity of an enterprise.
We next discuss analysis and prediction of different attributes involved in hierarchical components of an enterprise. We start from a study of the fundamental processes related to real-time prediction. Our process-execution time and process status prediction models integrate statistical methods with machine-learning algorithms. In addition to improved prediction accuracy compared to stand-alone machine-learning algorithms, it also performs a probabilistic estimation of the predicted status. An order generally consists of multiple series and parallel processes. We next introduce an order-fulfillment prediction model that combines advantages of multiple classification models by incorporating flexible decision-integration mechanisms. Experimental results show that adopting due dates recommended by the model can significantly reduce enterprise late-delivery ratio. Finally, we investigate service-level attributes that reflect the overall performance of an enterprise. We analyze and decompose time-series data into different components according to their hierarchical periodic nature, perform correlation analysis,
and develop univariate prediction models for each component as well as multivariate models for correlated components. Predictions for the original time series are aggregated from the predictions of its components. In addition to a significant increase in mid-term prediction accuracy, this distributed modeling strategy also improves short-term time-series prediction accuracy.
In summary, this thesis research has led to a set of characterization, optimization, and prediction tools for an EIS to derive insightful knowledge from data and use them as guidance for production management. It is expected to provide solutions for enterprises to increase reconfigurability, accomplish more automated procedures, and obtain data-driven recommendations or effective decisions.
Resumo:
Head motion during a Positron Emission Tomography (PET) brain scan can considerably degrade image quality. External motion-tracking devices have proven successful in minimizing this effect, but the associated time, maintenance, and workflow changes inhibit their widespread clinical use. List-mode PET acquisition allows for the retroactive analysis of coincidence events on any time scale throughout a scan, and therefore potentially offers a data-driven motion detection and characterization technique. An algorithm was developed to parse list-mode data, divide the full acquisition into short scan intervals, and calculate the line-of-response (LOR) midpoint average for each interval. These LOR midpoint averages, known as “radioactivity centroids,” were presumed to represent the center of the radioactivity distribution in the scanner, and it was thought that changes in this metric over time would correspond to intra-scan motion.
Several scans were taken of the 3D Hoffman brain phantom on a GE Discovery IQ PET/CT scanner to test the ability of the radioactivity to indicate intra-scan motion. Each scan incrementally surveyed motion in a different degree of freedom (2 translational and 2 rotational). The radioactivity centroids calculated from these scans correlated linearly to phantom positions/orientations. Centroid measurements over 1-second intervals performed on scans with ~1mCi of activity in the center of the field of view had standard deviations of 0.026 cm in the x- and y-dimensions and 0.020 cm in the z-dimension, which demonstrates high precision and repeatability in this metric. Radioactivity centroids are thus shown to successfully represent discrete motions on the submillimeter scale. It is also shown that while the radioactivity centroid can precisely indicate the amount of motion during an acquisition, it fails to distinguish what type of motion occurred.
Resumo:
While genome-wide gene expression data are generated at an increasing rate, the repertoire of approaches for pattern discovery in these data is still limited. Identifying subtle patterns of interest in large amounts of data (tens of thousands of profiles) associated with a certain level of noise remains a challenge. A microarray time series was recently generated to study the transcriptional program of the mouse segmentation clock, a biological oscillator associated with the periodic formation of the segments of the body axis. A method related to Fourier analysis, the Lomb-Scargle periodogram, was used to detect periodic profiles in the dataset, leading to the identification of a novel set of cyclic genes associated with the segmentation clock. Here, we applied to the same microarray time series dataset four distinct mathematical methods to identify significant patterns in gene expression profiles. These methods are called: Phase consistency, Address reduction, Cyclohedron test and Stable persistence, and are based on different conceptual frameworks that are either hypothesis- or data-driven. Some of the methods, unlike Fourier transforms, are not dependent on the assumption of periodicity of the pattern of interest. Remarkably, these methods identified blindly the expression profiles of known cyclic genes as the most significant patterns in the dataset. Many candidate genes predicted by more than one approach appeared to be true positive cyclic genes and will be of particular interest for future research. In addition, these methods predicted novel candidate cyclic genes that were consistent with previous biological knowledge and experimental validation in mouse embryos. Our results demonstrate the utility of these novel pattern detection strategies, notably for detection of periodic profiles, and suggest that combining several distinct mathematical approaches to analyze microarray datasets is a valuable strategy for identifying genes that exhibit novel, interesting transcriptional patterns.
Resumo:
To investigate the neural systems that contribute to the formation of complex, self-relevant emotional memories, dedicated fans of rival college basketball teams watched a competitive game while undergoing functional magnetic resonance imaging (fMRI). During a subsequent recognition memory task, participants were shown video clips depicting plays of the game, stemming either from previously-viewed game segments (targets) or from non-viewed portions of the same game (foils). After an old-new judgment, participants provided emotional valence and intensity ratings of the clips. A data driven approach was first used to decompose the fMRI signal acquired during free viewing of the game into spatially independent components. Correlations were then calculated between the identified components and post-scanning emotion ratings for successfully encoded targets. Two components were correlated with intensity ratings, including temporal lobe regions implicated in memory and emotional functions, such as the hippocampus and amygdala, as well as a midline fronto-cingulo-parietal network implicated in social cognition and self-relevant processing. These data were supported by a general linear model analysis, which revealed additional valence effects in fronto-striatal-insular regions when plays were divided into positive and negative events according to the fan's perspective. Overall, these findings contribute to our understanding of how emotional factors impact distributed neural systems to successfully encode dynamic, personally-relevant event sequences.