6 resultados para multiple data sources

em Duke University


Relevância:

100.00% 100.00%

Publicador:

Resumo:

An enterprise information system (EIS) is an integrated data-applications platform characterized by diverse, heterogeneous, and distributed data sources. For many enterprises, a number of business processes still depend heavily on static rule-based methods and extensive human expertise. Enterprises are faced with the need for optimizing operation scheduling, improving resource utilization, discovering useful knowledge, and making data-driven decisions.

This thesis research is focused on real-time optimization and knowledge discovery that addresses workflow optimization, resource allocation, as well as data-driven predictions of process-execution times, order fulfillment, and enterprise service-level performance. In contrast to prior work on data analytics techniques for enterprise performance optimization, the emphasis here is on realizing scalable and real-time enterprise intelligence based on a combination of heterogeneous system simulation, combinatorial optimization, machine-learning algorithms, and statistical methods.

On-demand digital-print service is a representative enterprise requiring a powerful EIS.We use real-life data from Reischling Press, Inc. (RPI), a digit-print-service provider (PSP), to evaluate our optimization algorithms.

In order to handle the increase in volume and diversity of demands, we first present a high-performance, scalable, and real-time production scheduling algorithm for production automation based on an incremental genetic algorithm (IGA). The objective of this algorithm is to optimize the order dispatching sequence and balance resource utilization. Compared to prior work, this solution is scalable for a high volume of orders and it provides fast scheduling solutions for orders that require complex fulfillment procedures. Experimental results highlight its potential benefit in reducing production inefficiencies and enhancing the productivity of an enterprise.

We next discuss analysis and prediction of different attributes involved in hierarchical components of an enterprise. We start from a study of the fundamental processes related to real-time prediction. Our process-execution time and process status prediction models integrate statistical methods with machine-learning algorithms. In addition to improved prediction accuracy compared to stand-alone machine-learning algorithms, it also performs a probabilistic estimation of the predicted status. An order generally consists of multiple series and parallel processes. We next introduce an order-fulfillment prediction model that combines advantages of multiple classification models by incorporating flexible decision-integration mechanisms. Experimental results show that adopting due dates recommended by the model can significantly reduce enterprise late-delivery ratio. Finally, we investigate service-level attributes that reflect the overall performance of an enterprise. We analyze and decompose time-series data into different components according to their hierarchical periodic nature, perform correlation analysis,

and develop univariate prediction models for each component as well as multivariate models for correlated components. Predictions for the original time series are aggregated from the predictions of its components. In addition to a significant increase in mid-term prediction accuracy, this distributed modeling strategy also improves short-term time-series prediction accuracy.

In summary, this thesis research has led to a set of characterization, optimization, and prediction tools for an EIS to derive insightful knowledge from data and use them as guidance for production management. It is expected to provide solutions for enterprises to increase reconfigurability, accomplish more automated procedures, and obtain data-driven recommendations or effective decisions.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.

We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.

We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.

Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). Through incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of protein binding landscape and that the most accurate inference comes from modeling all available datasets.

This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of protein-DNA interaction landscape through data integration.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

BACKGROUND: The National Comprehensive Cancer Network and the American Society of Clinical Oncology have established guidelines for the treatment and surveillance of colorectal cancer (CRC), respectively. Considering these guidelines, an accurate and efficient method is needed to measure receipt of care. METHODS: The accuracy and completeness of Veterans Health Administration (VA) administrative data were assessed by comparing them with data manually abstracted during the Colorectal Cancer Care Collaborative (C4) quality improvement initiative for 618 patients with stage I-III CRC. RESULTS: The VA administrative data contained gender, marital, and birth information for all patients but race information was missing for 62.1% of patients. The percent agreement for demographic variables ranged from 98.1-100%. The kappa statistic for receipt of treatments ranged from 0.21 to 0.60 and there was a 96.9% agreement for the date of surgical resection. The percentage of post-diagnosis surveillance events in C4 also in VA administrative data were 76.0% for colonoscopy, 84.6% for physician visit, and 26.3% for carcinoembryonic antigen (CEA) test. CONCLUSIONS: VA administrative data are accurate and complete for non-race demographic variables, receipt of CRC treatment, colonoscopy, and physician visits; but alternative data sources may be necessary to capture patient race and receipt of CEA tests.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Empirical studies of education programs and systems, by nature, rely upon use of student outcomes that are measurable. Often, these come in the form of test scores. However, in light of growing evidence about the long-run importance of other student skills and behaviors, the time has come for a broader approach to evaluating education. This dissertation undertakes experimental, quasi-experimental, and descriptive analyses to examine social, behavioral, and health-related mechanisms of the educational process. My overarching research question is simply, which inside- and outside-the-classroom features of schools and educational interventions are most beneficial to students in the long term? Furthermore, how can we apply this evidence toward informing policy that could effectively reduce stark social, educational, and economic inequalities?

The first study of three assesses mechanisms by which the Fast Track project, a randomized intervention in the early 1990s for high-risk children in four communities (Durham, NC; Nashville, TN; rural PA; and Seattle, WA), reduced delinquency, arrests, and health and mental health service utilization in adolescence through young adulthood (ages 12-20). A decomposition of treatment effects indicates that about a third of Fast Track’s impact on later crime outcomes can be accounted for by improvements in social and self-regulation skills during childhood (ages 6-11), such as prosocial behavior, emotion regulation and problem solving. These skills proved less valuable for the prevention of mental and physical health problems.

The second study contributes new evidence on how non-instructional investments – such as increased spending on school social workers, guidance counselors, and health services – affect multiple aspects of student performance and well-being. Merging several administrative data sources spanning the 1996-2013 school years in North Carolina, I use an instrumental variables approach to estimate the extent to which local expenditure shifts affect students’ academic and behavioral outcomes. My findings indicate that exogenous increases in spending on non-instructional services not only reduce student absenteeism and disciplinary problems (important predictors of long-term outcomes) but also significantly raise student achievement, in similar magnitude to corresponding increases in instructional spending. Furthermore, subgroup analyses suggest that investments in student support personnel such as social workers, health services, and guidance counselors, in schools with concentrated low-income student populations could go a long way toward closing socioeconomic achievement gaps.

The third study examines individual pathways that lead to high school graduation or dropout. It employs a variety of machine learning techniques, including decision trees, random forests with bagging and boosting, and support vector machines, to predict student dropout using longitudinal administrative data from North Carolina. I consider a large set of predictor measures from grades three through eight including academic achievement, behavioral indicators, and background characteristics. My findings indicate that the most important predictors include eighth grade absences, math scores, and age-for-grade as well as early reading scores. Support vector classification (with a high cost parameter and low gamma parameter) predicts high school dropout with the highest overall validity in the testing dataset at 90.1 percent followed by decision trees with boosting and interaction terms at 89.5 percent.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Integrating information from multiple sources is a crucial function of the brain. Examples of such integration include multiple stimuli of different modalties, such as visual and auditory, multiple stimuli of the same modality, such as auditory and auditory, and integrating stimuli from the sensory organs (i.e. ears) with stimuli delivered from brain-machine interfaces.

The overall aim of this body of work is to empirically examine stimulus integration in these three domains to inform our broader understanding of how and when the brain combines information from multiple sources.

First, I examine visually-guided auditory, a problem with implications for the general problem in learning of how the brain determines what lesson to learn (and what lessons not to learn). For example, sound localization is a behavior that is partially learned with the aid of vision. This process requires correctly matching a visual location to that of a sound. This is an intrinsically circular problem when sound location is itself uncertain and the visual scene is rife with possible visual matches. Here, we develop a simple paradigm using visual guidance of sound localization to gain insight into how the brain confronts this type of circularity. We tested two competing hypotheses. 1: The brain guides sound location learning based on the synchrony or simultaneity of auditory-visual stimuli, potentially involving a Hebbian associative mechanism. 2: The brain uses a ‘guess and check’ heuristic in which visual feedback that is obtained after an eye movement to a sound alters future performance, perhaps by recruiting the brain’s reward-related circuitry. We assessed the effects of exposure to visual stimuli spatially mismatched from sounds on performance of an interleaved auditory-only saccade task. We found that when humans and monkeys were provided the visual stimulus asynchronously with the sound but as feedback to an auditory-guided saccade, they shifted their subsequent auditory-only performance toward the direction of the visual cue by 1.3-1.7 degrees, or 22-28% of the original 6 degree visual-auditory mismatch. In contrast when the visual stimulus was presented synchronously with the sound but extinguished too quickly to provide this feedback, there was little change in subsequent auditory-only performance. Our results suggest that the outcome of our own actions is vital to localizing sounds correctly. Contrary to previous expectations, visual calibration of auditory space does not appear to require visual-auditory associations based on synchrony/simultaneity.

My next line of research examines how electrical stimulation of the inferior colliculus influences perception of sounds in a nonhuman primate. The central nucleus of the inferior colliculus is the major ascending relay of auditory information before it reaches the forebrain, and thus an ideal target for understanding low-level information processing prior to the forebrain, as almost all auditory signals pass through the central nucleus of the inferior colliculus before reaching the forebrain. Thus, the inferior colliculus is the ideal structure to examine to understand the format of the inputs into the forebrain and, by extension, the processing of auditory scenes that occurs in the brainstem. Therefore, the inferior colliculus was an attractive target for understanding stimulus integration in the ascending auditory pathway.

Moreover, understanding the relationship between the auditory selectivity of neurons and their contribution to perception is critical to the design of effective auditory brain prosthetics. These prosthetics seek to mimic natural activity patterns to achieve desired perceptual outcomes. We measured the contribution of inferior colliculus (IC) sites to perception using combined recording and electrical stimulation. Monkeys performed a frequency-based discrimination task, reporting whether a probe sound was higher or lower in frequency than a reference sound. Stimulation pulses were paired with the probe sound on 50% of trials (0.5-80 µA, 100-300 Hz, n=172 IC locations in 3 rhesus monkeys). Electrical stimulation tended to bias the animals’ judgments in a fashion that was coarsely but significantly correlated with the best frequency of the stimulation site in comparison to the reference frequency employed in the task. Although there was considerable variability in the effects of stimulation (including impairments in performance and shifts in performance away from the direction predicted based on the site’s response properties), the results indicate that stimulation of the IC can evoke percepts correlated with the frequency tuning properties of the IC. Consistent with the implications of recent human studies, the main avenue for improvement for the auditory midbrain implant suggested by our findings is to increase the number and spatial extent of electrodes, to increase the size of the region that can be electrically activated and provide a greater range of evoked percepts.

My next line of research employs a frequency-tagging approach to examine the extent to which multiple sound sources are combined (or segregated) in the nonhuman primate inferior colliculus. In the single-sound case, most inferior colliculus neurons respond and entrain to sounds in a very broad region of space, and many are entirely spatially insensitive, so it is unknown how the neurons will respond to a situation with more than one sound. I use multiple AM stimuli of different frequencies, which the inferior colliculus represents using a spike timing code. This allows me to measure spike timing in the inferior colliculus to determine which sound source is responsible for neural activity in an auditory scene containing multiple sounds. Using this approach, I find that the same neurons that are tuned to broad regions of space in the single sound condition become dramatically more selective in the dual sound condition, preferentially entraining spikes to stimuli from a smaller region of space. I will examine the possibility that there may be a conceptual linkage between this finding and the finding of receptive field shifts in the visual system.

In chapter 5, I will comment on these findings more generally, compare them to existing theoretical models, and discuss what these results tell us about processing in the central nervous system in a multi-stimulus situation. My results suggest that the brain is flexible in its processing and can adapt its integration schema to fit the available cues and the demands of the task.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This dissertation is a three-part analysis examining how the welfare state in advanced Western democracies has responded to recent demographic changes. Specifically, this dissertation investigates two primary relationships, beginning with the influence of government spending on poverty. I analyze two at-risk populations in particular: immigrants and children of single mothers. Next, attention is turned to the influence of individual and environmental traits on preferences for social spending. I focus specifically on religiosity, religious beliefs and religious identity. I pool data from a number of international macro- and micro-data sources including the Luxembourg Income Study (LIS), International Social Survey Program (ISSP), the World Bank Databank, and the OECD Databank. Analyses highlight the power of the welfare state to reduce poverty, but also the effectiveness of specific areas of spending focused on addressing new social risks. While previous research has touted the strength of the welfare state, my analyses highlight the need to consider new social risks and encourage closer attention to how social position affects preferences for the welfare state.