4 results for "knowledge discovery" at Duke University
Abstract:
An enterprise information system (EIS) is an integrated data-applications platform characterized by diverse, heterogeneous, and distributed data sources. In many enterprises, several business processes still depend heavily on static rule-based methods and extensive human expertise. Enterprises need to optimize operation scheduling, improve resource utilization, discover useful knowledge, and make data-driven decisions.
This thesis focuses on real-time optimization and knowledge discovery for workflow optimization, resource allocation, and data-driven prediction of process-execution times, order fulfillment, and enterprise service-level performance. In contrast to prior work on data analytics techniques for enterprise performance optimization, the emphasis here is on scalable, real-time enterprise intelligence built from a combination of heterogeneous system simulation, combinatorial optimization, machine-learning algorithms, and statistical methods.
An on-demand digital-print service is a representative enterprise that requires a powerful EIS. We use real-life data from Reischling Press, Inc. (RPI), a digital-print service provider (PSP), to evaluate our optimization algorithms.
To handle the increasing volume and diversity of demand, we first present a high-performance, scalable, real-time production-scheduling algorithm for production automation based on an incremental genetic algorithm (IGA). The algorithm optimizes the order-dispatching sequence and balances resource utilization. Compared with prior work, this solution scales to high order volumes and quickly produces schedules for orders that require complex fulfillment procedures. Experimental results highlight its potential to reduce production inefficiencies and enhance enterprise productivity.
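The abstract does not give the IGA's encoding, operators, or objective weights, so the following Python sketch is only one plausible reading under stated assumptions: each order has a single processing time and a due time (hypothetical fields), a chromosome is the dispatch sequence, orders are list-scheduled onto the earliest-free of several identical machines, and fitness adds makespan (a rough proxy for balanced utilization) to total tardiness. "Incremental" is assumed here to mean warm-starting the population from an earlier run's sequences when orders arrive or complete. All names (`Order`, `decode`, `evolve`, and so on) are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Order:
    oid: int
    proc_time: float  # hypothetical: processing time on any machine
    due: float        # hypothetical: due time

def decode(seq, orders, n_machines):
    """Dispatch orders in `seq` order, each to the earliest-free machine.
    Returns (makespan, total_tardiness)."""
    free_at = [0.0] * n_machines
    tardiness = 0.0
    for oid in seq:
        o = orders[oid]
        m = min(range(n_machines), key=free_at.__getitem__)
        free_at[m] += o.proc_time
        tardiness += max(0.0, free_at[m] - o.due)
    return max(free_at), tardiness

def cost(seq, orders, n_machines):
    makespan, tardiness = decode(seq, orders, n_machines)
    return makespan + tardiness  # assumed equal weighting of the two goals

def order_crossover(p1, p2, rng):
    """OX crossover: copy a slice of p1, fill the other slots in p2's order."""
    n = len(p1)
    if n < 2:
        return list(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j] = p1[i:j]
    kept = set(p1[i:j])
    fill = iter(g for g in p2 if g not in kept)
    return [g if g is not None else next(fill) for g in child]

def swap_mutation(seq, rng, rate=0.2):
    """With probability `rate`, swap two positions in the sequence."""
    seq = list(seq)
    if len(seq) >= 2 and rng.random() < rate:
        a, b = rng.sample(range(len(seq)), 2)
        seq[a], seq[b] = seq[b], seq[a]
    return seq

def evolve(orders, n_machines, seed_pop=(), pop_size=30, generations=100, rng=None):
    """Elitist permutation GA; `seed_pop` warm-starts from earlier sequences."""
    rng = rng or random.Random(0)
    ids = list(orders)
    pop = []
    for s in seed_pop:
        kept = [o for o in s if o in orders]   # drop orders no longer pending
        known = set(kept)
        pop.append(kept + [o for o in ids if o not in known])  # append arrivals
    pop = pop[:pop_size]
    while len(pop) < pop_size:
        pop.append(rng.sample(ids, len(ids)))

    def key(s):
        return cost(s, orders, n_machines)

    for _ in range(generations):
        pop.sort(key=key)
        elite = pop[: pop_size // 2]           # best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(swap_mutation(order_crossover(p1, p2, rng), rng))
        pop = elite + children
    return sorted(pop, key=key)
```

In use, `evolve(orders, 2)[0]` yields a dispatch sequence; after new orders are added to `orders`, `evolve(orders, 2, seed_pop=previous[:10])` resumes from earlier good sequences instead of starting from scratch, which is what keeps rescheduling fast under this reading. Elitism guarantees the best cost never gets worse from one generation to the next.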
We next analyze and predict attributes of the hierarchical components of an enterprise. We start with the fundamental processes related to real-time prediction. Our models for predicting process-execution time and process status integrate statistical methods with machine-learning algorithms. Besides improving prediction accuracy over stand-alone machine-learning algorithms, they also provide a probabilistic estimate of the predicted status. An order generally consists of multiple processes executed in series and in parallel. We next introduce an order-fulfillment prediction model that combines the advantages of multiple classification models through flexible decision-integration mechanisms. Experimental results show that adopting the due dates recommended by the model can significantly reduce the enterprise's late-delivery ratio. Finally, we investigate service-level attributes that reflect the overall performance of an enterprise. We analyze time-series data and decompose it into components according to their hierarchical periodic nature, perform correlation analysis, and develop univariate prediction models for each component as well as multivariate models for correlated components. Predictions for the original time series are aggregated from the predictions of its components. In addition to significantly increasing mid-term prediction accuracy, this distributed modeling strategy also improves short-term prediction accuracy.
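The abstract does not specify the decomposition or the per-component models, so the Python sketch below assumes an additive model fitted by least squares: a linear trend plus one phase profile per period (for example, a daily cycle within a weekly one). The trend is extrapolated linearly, each periodic component repeats its profile, the residual gets an AR(1) model, and the forecast of the original series is the sum of the component forecasts. The function names are hypothetical, and the multivariate models for correlated components are not shown.

```python
import numpy as np

def design(t, periods):
    """Columns: time index (trend), then one-hot phase indicators per period."""
    cols = [t.astype(float)]
    for p in periods:
        cols.append((t[:, None] % p == np.arange(p)).astype(float))
    return np.column_stack(cols)

def forecast(y, periods, horizon):
    """Decompose y into trend + periodic components + residual, forecast
    each component on its own, and sum the component forecasts."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = design(np.arange(n), periods)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    Xf = design(np.arange(n, n + horizon), periods)

    comps = {"trend": Xf[:, 0] * beta[0]}  # linear extrapolation of the trend
    start = 1
    for p in periods:                      # each period repeats its phase profile
        comps[f"period_{p}"] = Xf[:, start:start + p] @ beta[start:start + p]
        start += p

    # Residual component: AR(1), coefficient clipped to [-1, 1] for stability
    denom = resid[:-1] @ resid[:-1]
    phi = 0.0 if denom == 0 else float(np.clip(resid[1:] @ resid[:-1] / denom, -1.0, 1.0))
    comps["residual"] = resid[-1] * phi ** np.arange(1, horizon + 1)

    # Aggregate: the forecast of y is the sum of the component forecasts
    return sum(comps.values()), comps
```

At least one period is required, because the series level is carried by the periodic components. With nested periods such as `[7, 28]` the phase indicators are collinear, so `lstsq` returns the minimum-norm split of the level across components; the individual component forecasts are then not unique, but their sum is.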
In summary, this thesis research has produced a set of characterization, optimization, and prediction tools that let an EIS derive insightful knowledge from data and use it to guide production management. These tools are expected to help enterprises increase reconfigurability, automate more procedures, and obtain data-driven recommendations for effective decisions.
Abstract:
MOTIVATION: Technological advances that allow routine identification of high-dimensional risk factors have led to high demand for statistical techniques that make full use of these rich sources of information in genetic studies. Variable selection for censored outcome data, as well as control of false discoveries (i.e. inclusion of irrelevant variables) in the presence of high-dimensional predictors, presents serious challenges. This article develops a computationally feasible method based on boosting and stability selection. Specifically, we modified component-wise gradient boosting to improve its computational feasibility and introduced random permutation into stability selection to control false discoveries. RESULTS: We propose a high-dimensional variable selection method that incorporates stability selection to control false discoveries. Comparisons between the proposed method and the commonly used univariate and Lasso approaches for variable selection reveal that the proposed method yields fewer false discoveries. The proposed method is applied to study the associations of 2339 common single-nucleotide polymorphisms (SNPs) with overall survival among cutaneous melanoma (CM) patients. The results confirm that BRCA2 pathway SNPs are likely to be associated with overall survival, as reported in previous literature. Moreover, we have identified several new Fanconi anemia (FA) pathway SNPs that are likely to modulate survival of CM patients. AVAILABILITY AND IMPLEMENTATION: The related source code and documents are freely available at https://sites.google.com/site/bestumich/issues. CONTACT: yili@umich.edu.
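The article targets censored survival outcomes and modifies component-wise gradient boosting for that setting; those details are not in the abstract. To show the mechanics only, the Python sketch below swaps in squared-error loss on a continuous outcome (component-wise L2 boosting): each random half-subsample runs boosting until a fixed number of distinct variables has entered, selection frequencies are tallied across subsamples, and the frequency threshold is calibrated from runs on randomly permuted responses. That permutation-based threshold is an assumption about how random permutation can control false discoveries, not necessarily the authors' scheme, and all function names are hypothetical.

```python
import numpy as np

def boost_select(X, y, max_vars=3, max_steps=500, nu=0.1):
    """Component-wise L2 boosting: each step fits the residual with the single
    best univariate least-squares learner and moves a fraction nu toward it.
    Stops before a (max_vars + 1)-th distinct variable would enter.
    Returns the set of selected column indices."""
    Xc = X - X.mean(axis=0)
    r = y - y.mean()
    norms = (Xc ** 2).sum(axis=0)
    norms[norms == 0] = 1.0                 # constant columns get zero gain
    selected = set()
    for _ in range(max_steps):
        b = Xc.T @ r / norms                # univariate LS slope per column
        j = int(np.argmax(b ** 2 * norms))  # largest drop in residual sum of squares
        if j not in selected and len(selected) == max_vars:
            break
        selected.add(j)
        r = r - nu * b[j] * Xc[:, j]
    return selected

def selection_frequencies(X, y, n_subsamples, rng, **boost_kw):
    """Fraction of random half-subsamples in which each variable is selected."""
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        counts[list(boost_select(X[idx], y[idx], **boost_kw))] += 1
    return counts / n_subsamples

def stability_select(X, y, n_subsamples=50, n_perm=20, alpha=0.05, rng=None, **boost_kw):
    """Keep variables whose selection frequency reaches the (1 - alpha)
    quantile of the maximum frequency seen with permuted responses."""
    rng = rng if rng is not None else np.random.default_rng(0)
    freq = selection_frequencies(X, y, n_subsamples, rng, **boost_kw)
    null_max = [selection_frequencies(X, rng.permutation(y), n_subsamples, rng, **boost_kw).max()
                for _ in range(n_perm)]
    threshold = float(np.quantile(null_max, 1 - alpha))
    return np.flatnonzero(freq >= threshold), freq, threshold
```

Permuting `y` breaks any real association while keeping the correlation structure among predictors, so the null maxima show how often a pure-noise variable can be selected by chance; a variable passes only if it is selected at least that often.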
Abstract:
Intratumoral B lymphocytes are an integral part of the lung tumor microenvironment. Interrogation of the antibodies they express may improve our understanding of the host response to cancer and could be useful in elucidating novel molecular targets. We used two strategies to explore the repertoire of intratumoral B cell antibodies. First, we cloned VH and VL genes from single intratumoral B lymphocytes isolated from one lung tumor, expressed the genes as recombinant mAbs, and used the mAbs to identify the cognate tumor antigens. The Igs derived from intratumoral B cells demonstrated class switching, with a mean VH mutation frequency of 4%. Although there was no evidence for clonal expansion, these data are consistent with antigen-driven somatic hypermutation. Individual recombinant antibodies were polyreactive, although one clone demonstrated preferential immunoreactivity with tropomyosin 4 (TPM4). We found that higher levels of TPM4 antibodies were more common in cancer patients, but measurement of TPM4 antibody levels was not a sensitive test for detecting cancer. Second, in an effort to focus our recombinant antibody expression efforts on those B cells that displayed evidence of clonal expansion driven by antigen stimulation, we performed deep sequencing of the Ig genes of B cells collected from seven different tumors. Deep sequencing demonstrated somatic hypermutation but no dominant clones. These strategies may be useful for the study of B cell antibody expression, although identification of a dominant clone and unique therapeutic targets may require extensive investigation.
Abstract:
Constitutive biosynthesis of lipid A via the Raetz pathway is essential for the viability and fitness of Gram-negative bacteria, including Chlamydia trachomatis. Although nearly all of the enzymes in the lipid A biosynthetic pathway are highly conserved across Gram-negative bacteria, the cleavage of the pyrophosphate group of UDP-2,3-diacyl-GlcN (UDP-DAGn) to form lipid X is carried out by two unrelated enzymes: LpxH in beta- and gammaproteobacteria and LpxI in alphaproteobacteria. The intracellular pathogen C. trachomatis lacks an ortholog for either of these two enzymes, and yet it synthesizes lipid A and exhibits conservation of genes encoding other lipid A enzymes. Employing a complementation screen against a C. trachomatis genomic library using a conditional-lethal lpxH mutant Escherichia coli strain, we have identified an open reading frame (Ct461, renamed lpxG) encoding a previously uncharacterized enzyme that complements the UDP-DAGn hydrolase function in E. coli and catalyzes the conversion of UDP-DAGn to lipid X in vitro. LpxG shows little sequence similarity to either LpxH or LpxI, highlighting LpxG as the founding member of a third class of UDP-DAGn hydrolases. Overexpression of LpxG results in toxic accumulation of lipid X and profoundly reduces the infectivity of C. trachomatis, validating LpxG as the long-sought-after UDP-DAGn pyrophosphatase in this prominent human pathogen. The complementation approach presented here overcomes the lack of suitable genetic tools for C. trachomatis and should be broadly applicable for the functional characterization of other essential C. trachomatis genes.
IMPORTANCE: Chlamydia trachomatis is a leading cause of infectious blindness and sexually transmitted disease. Due to the lack of robust genetic tools, the functions of many Chlamydia genes remain uncharacterized, including the essential gene encoding the UDP-DAGn pyrophosphatase activity for the biosynthesis of lipid A, the membrane anchor of lipooligosaccharide and the predominant lipid species of the outer leaflet of the bacterial outer membrane. We designed a complementation screen against the C. trachomatis genomic library using a conditional-lethal mutant of E. coli and identified the missing essential gene in the lipid A biosynthetic pathway, which we designated lpxG. We show that LpxG is a member of the calcineurin-like phosphatases and displays robust UDP-DAGn pyrophosphatase activity in vitro. Overexpression of LpxG in C. trachomatis leads to the accumulation of the predicted lipid intermediate and reduces bacterial infectivity, validating the in vivo function of LpxG and highlighting the importance of regulated lipid A biosynthesis in C. trachomatis.