8 results for Semantic neighbour discovery
at Duke University
Abstract:
An enterprise information system (EIS) is an integrated data-applications platform characterized by diverse, heterogeneous, and distributed data sources. For many enterprises, a number of business processes still depend heavily on static rule-based methods and extensive human expertise. Enterprises are faced with the need for optimizing operation scheduling, improving resource utilization, discovering useful knowledge, and making data-driven decisions.
This thesis research is focused on real-time optimization and knowledge discovery that addresses workflow optimization, resource allocation, as well as data-driven predictions of process-execution times, order fulfillment, and enterprise service-level performance. In contrast to prior work on data analytics techniques for enterprise performance optimization, the emphasis here is on realizing scalable and real-time enterprise intelligence based on a combination of heterogeneous system simulation, combinatorial optimization, machine-learning algorithms, and statistical methods.
On-demand digital-print service is a representative enterprise requiring a powerful EIS. We use real-life data from Reischling Press, Inc. (RPI), a digital-print-service provider (PSP), to evaluate our optimization algorithms.
In order to handle the increase in volume and diversity of demands, we first present a high-performance, scalable, and real-time production scheduling algorithm for production automation based on an incremental genetic algorithm (IGA). The objective of this algorithm is to optimize the order dispatching sequence and balance resource utilization. Compared to prior work, this solution is scalable for a high volume of orders and it provides fast scheduling solutions for orders that require complex fulfillment procedures. Experimental results highlight its potential benefit in reducing production inefficiencies and enhancing the productivity of an enterprise.
We next discuss analysis and prediction of different attributes involved in hierarchical components of an enterprise. We start from a study of the fundamental processes related to real-time prediction. Our process-execution time and process status prediction models integrate statistical methods with machine-learning algorithms. In addition to improving prediction accuracy over stand-alone machine-learning algorithms, the integrated approach also provides a probabilistic estimate of the predicted status. An order generally consists of multiple series and parallel processes. We next introduce an order-fulfillment prediction model that combines advantages of multiple classification models by incorporating flexible decision-integration mechanisms. Experimental results show that adopting due dates recommended by the model can significantly reduce enterprise late-delivery ratio. Finally, we investigate service-level attributes that reflect the overall performance of an enterprise. We analyze and decompose time-series data into different components according to their hierarchical periodic nature, perform correlation analysis, and develop univariate prediction models for each component as well as multivariate models for correlated components. Predictions for the original time series are aggregated from the predictions of its components. In addition to a significant increase in mid-term prediction accuracy, this distributed modeling strategy also improves short-term time-series prediction accuracy.
In summary, this thesis research has led to a set of characterization, optimization, and prediction tools for an EIS to derive insightful knowledge from data and use them as guidance for production management. It is expected to provide solutions for enterprises to increase reconfigurability, accomplish more automated procedures, and obtain data-driven recommendations or effective decisions.
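The component-wise forecasting idea in this abstract (decompose a series by its periodic structure, model each component separately, then aggregate the component predictions) can be illustrated with a minimal sketch. The function names, the single-period additive decomposition, and the naive per-component models below are assumptions made for illustration; they are not the models developed in the thesis.

```python
from statistics import mean


def decompose(series, period):
    """Split a series into level + periodic component + residual (additive).

    Hypothetical helper: a single fixed period stands in for the
    hierarchical periodic structure described in the abstract.
    """
    level = mean(series)
    # Average deviation from the level at each phase of the period.
    seasonal = [
        mean(series[i] - level for i in range(phase, len(series), period))
        for phase in range(period)
    ]
    residual = [x - level - seasonal[i % period] for i, x in enumerate(series)]
    return level, seasonal, residual


def forecast(series, period, horizon):
    """Forecast each component with a naive model, then sum the forecasts."""
    level, seasonal, residual = decompose(series, period)
    residual_hat = mean(residual)  # naive residual model: its mean
    n = len(series)
    return [
        level + seasonal[(n + h) % period] + residual_hat
        for h in range(horizon)
    ]


# Usage: a perfectly periodic toy series is continued by the forecast.
history = [10, 12, 14, 10, 12, 14]
print(forecast(history, period=3, horizon=3))
```

In a fuller version each component would get its own fitted model, and correlated components would share a multivariate one. The aggregation step, summing the component forecasts, would stay the same.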
Abstract:
Researchers currently debate whether new semantic knowledge can be learned and retrieved despite extensive damage to medial temporal lobe (MTL) structures. The authors explored whether H. M., a patient with amnesia, could acquire new semantic information in the context of his lifelong hobby of solving crossword puzzles. First, H. M. was tested on a series of word-skills tests believed important in solving crosswords. He also completed 3 new crosswords: 1 puzzle testing pre-1953 knowledge, another testing post-1953 knowledge, and another combining the 2 by giving postoperative semantic clues for preoperative answers. From the results, the authors concluded that H. M. can acquire new semantic knowledge, at least temporarily, when he can anchor it to mental representations established preoperatively.
Abstract:
Undergraduates were asked to generate a name for a hypothetical new exemplar of a category. They produced names that had the same numbers of syllables, the same endings, and the same types of word stems as existing exemplars of that category. In addition, novel exemplars, each consisting of a nonsense syllable root and a prototypical ending, were accurately assigned to categories. The data demonstrate the abstraction and use of surface properties of words.
Abstract:
Cognitive neuroscience, as a discipline, links the biological systems studied by neuroscience to the processing constructs studied by psychology. By mapping these relations throughout the literature of cognitive neuroscience, we visualize the semantic structure of the discipline and point to directions for future research that will advance its integrative goal. For this purpose, network text analyses were applied to an exhaustive corpus of abstracts collected from five major journals over a 30-month period, including every study that used fMRI to investigate psychological processes. From this, we generate network maps that illustrate the relationships among psychological and anatomical terms, along with centrality statistics that guide inferences about network structure. Three terms--prefrontal cortex, amygdala, and anterior cingulate cortex--dominate the network structure with their high frequency in the literature and the density of their connections with other neuroanatomical terms. From network statistics, we identify terms that are understudied compared with their importance in the network (e.g., insula and thalamus), are underspecified in the language of the discipline (e.g., terms associated with executive function), or are imperfectly integrated with other concepts (e.g., subdisciplines like decision neuroscience that are disconnected from the main network). Taking these results as the basis for prescriptive recommendations, we conclude that semantic analyses provide useful guidance for cognitive neuroscience as a discipline, both by illustrating systematic biases in the conduct and presentation of research and by identifying directions that may be most productive for future research.
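As a rough illustration of the network text analysis described in this abstract, the sketch below builds a term co-occurrence network from a handful of abstracts and computes degree centrality for each term. The term list, the substring matching, and the choice of degree centrality are assumptions for illustration only; the study's actual term extraction and centrality statistics are not specified here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(abstracts, terms):
    """Count how often each pair of terms appears in the same abstract."""
    edges = Counter()
    for text in abstracts:
        lowered = text.lower()
        # Simple substring matching; a real pipeline would tokenize.
        present = sorted(t for t in terms if t in lowered)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges, terms):
    """Fraction of the other terms that each term is connected to."""
    neighbours = {t: set() for t in terms}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    denom = max(len(terms) - 1, 1)
    return {t: len(n) / denom for t, n in neighbours.items()}


# Usage with made-up abstracts and a hypothetical term list.
terms = ["amygdala", "prefrontal cortex", "insula", "thalamus"]
abstracts = [
    "Amygdala and prefrontal cortex responses during fear learning",
    "Prefrontal cortex and insula activity in decision making",
    "Thalamus activation at rest",
]
edges = cooccurrence_network(abstracts, terms)
print(degree_centrality(edges, terms))
```

A term with high frequency but low centrality, or the reverse, is the kind of mismatch the abstract uses to flag understudied or poorly integrated concepts.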
Abstract:
MOTIVATION: Technological advances that allow routine identification of high-dimensional risk factors have led to high demand for statistical techniques that enable full utilization of these rich sources of information for genetics studies. Variable selection for censored outcome data as well as control of false discoveries (i.e. inclusion of irrelevant variables) in the presence of high-dimensional predictors present serious challenges. This article develops a computationally feasible method based on boosting and stability selection. Specifically, we modified the component-wise gradient boosting to improve the computational feasibility and introduced random permutation in stability selection for controlling false discoveries. RESULTS: We have proposed a high-dimensional variable selection method by incorporating stability selection to control false discovery. Comparisons between the proposed method and the commonly used univariate and Lasso approaches for variable selection reveal that the proposed method yields fewer false discoveries. The proposed method is applied to study the associations of 2339 common single-nucleotide polymorphisms (SNPs) with overall survival among cutaneous melanoma (CM) patients. The results have confirmed that BRCA2 pathway SNPs are likely to be associated with overall survival, as reported by previous literature. Moreover, we have identified several new Fanconi anemia (FA) pathway SNPs that are likely to modulate survival of CM patients. AVAILABILITY AND IMPLEMENTATION: The related source code and documents are freely available at https://sites.google.com/site/bestumich/issues. CONTACT: yili@umich.edu.
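The general shape of stability selection with a permutation-based threshold can be sketched as follows. This is a simplified illustration under several assumptions: a top-k absolute-correlation screen stands in for component-wise gradient boosting, the outcome is uncensored, and a single permutation of the outcome sets the threshold. All function names are hypothetical, and none of this is taken from the authors' released code.

```python
import random


def abs_corr(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def selection_frequencies(columns, y, top_k, n_subsamples, rng):
    """Fraction of half-size subsamples in which each variable is selected."""
    n, p = len(y), len(columns)
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        ys = [y[i] for i in idx]
        scores = [abs_corr([col[i] for i in idx], ys) for col in columns]
        ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
        for j in ranked[:top_k]:
            counts[j] += 1
    return [c / n_subsamples for c in counts]


def stable_variables(columns, y, top_k=1, n_subsamples=50, seed=0):
    """Keep variables selected more often than any variable under a permuted outcome."""
    rng = random.Random(seed)
    observed = selection_frequencies(columns, y, top_k, n_subsamples, rng)
    permuted = y[:]
    rng.shuffle(permuted)  # breaks any real association with the predictors
    null = selection_frequencies(columns, permuted, top_k, n_subsamples, rng)
    threshold = max(null)
    selected = [j for j, f in enumerate(observed) if f > threshold]
    return selected, observed


# Usage: variable 0 carries the signal, the other four are noise.
rng = random.Random(1)
y = [rng.gauss(0, 1) for _ in range(40)]
columns = [y[:]] + [[rng.gauss(0, 1) for _ in range(40)] for _ in range(4)]
selected, observed = stable_variables(columns, y)
print(selected, observed)
```

The permuted run estimates how often a variable is selected when no true association exists, which is the role the abstract assigns to random permutation in controlling false discoveries.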
Abstract:
Intratumoral B lymphocytes are an integral part of the lung tumor microenvironment. Interrogation of the antibodies they express may improve our understanding of the host response to cancer and could be useful in elucidating novel molecular targets. We used two strategies to explore the repertoire of intratumoral B cell antibodies. First, we cloned VH and VL genes from single intratumoral B lymphocytes isolated from one lung tumor, expressed the genes as recombinant mAbs, and used the mAbs to identify the cognate tumor antigens. The Igs derived from intratumoral B cells demonstrated class switching, with a mean VH mutation frequency of 4%. Although there was no evidence for clonal expansion, these data are consistent with antigen-driven somatic hypermutation. Individual recombinant antibodies were polyreactive, although one clone demonstrated preferential immunoreactivity with tropomyosin 4 (TPM4). We found that higher levels of TPM4 antibodies were more common in cancer patients, but measurement of TPM4 antibody levels was not a sensitive test for detecting cancer. Second, in an effort to focus our recombinant antibody expression efforts on those B cells that displayed evidence of clonal expansion driven by antigen stimulation, we performed deep sequencing of the Ig genes of B cells collected from seven different tumors. Deep sequencing demonstrated somatic hypermutation but no dominant clones. These strategies may be useful for the study of B cell antibody expression, although identification of a dominant clone and unique therapeutic targets may require extensive investigation.
Abstract:
Constitutive biosynthesis of lipid A via the Raetz pathway is essential for the viability and fitness of Gram-negative bacteria, including Chlamydia trachomatis. Although nearly all of the enzymes in the lipid A biosynthetic pathway are highly conserved across Gram-negative bacteria, the cleavage of the pyrophosphate group of UDP-2,3-diacyl-GlcN (UDP-DAGn) to form lipid X is carried out by two unrelated enzymes: LpxH in beta- and gammaproteobacteria and LpxI in alphaproteobacteria. The intracellular pathogen C. trachomatis lacks an ortholog for either of these two enzymes, and yet, it synthesizes lipid A and exhibits conservation of genes encoding other lipid A enzymes. Employing a complementation screen against a C. trachomatis genomic library using a conditional-lethal lpxH mutant Escherichia coli strain, we have identified an open reading frame (Ct461, renamed lpxG) encoding a previously uncharacterized enzyme that complements the UDP-DAGn hydrolase function in E. coli and catalyzes the conversion of UDP-DAGn to lipid X in vitro. LpxG shows little sequence similarity to either LpxH or LpxI, highlighting LpxG as the founding member of a third class of UDP-DAGn hydrolases. Overexpression of LpxG results in toxic accumulation of lipid X and profoundly reduces the infectivity of C. trachomatis, validating LpxG as the long-sought-after UDP-DAGn pyrophosphatase in this prominent human pathogen. The complementation approach presented here overcomes the lack of suitable genetic tools for C. trachomatis and should be broadly applicable for the functional characterization of other essential C. trachomatis genes.
IMPORTANCE: Chlamydia trachomatis is a leading cause of infectious blindness and sexually transmitted disease. Due to the lack of robust genetic tools, the functions of many Chlamydia genes remain uncharacterized, including the essential gene encoding the UDP-DAGn pyrophosphatase activity for the biosynthesis of lipid A, the membrane anchor of lipooligosaccharide and the predominant lipid species of the outer leaflet of the bacterial outer membrane. We designed a complementation screen against the C. trachomatis genomic library using a conditional-lethal mutant of E. coli and identified the missing essential gene in the lipid A biosynthetic pathway, which we designated lpxG. We show that LpxG is a member of the calcineurin-like phosphatases and displays robust UDP-DAGn pyrophosphatase activity in vitro. Overexpression of LpxG in C. trachomatis leads to the accumulation of the predicted lipid intermediate and reduces bacterial infectivity, validating the in vivo function of LpxG and highlighting the importance of regulated lipid A biosynthesis in C. trachomatis.
Abstract:
Anecdotal evidence suggests that people make naming errors even with individuals they know well (such as a daughter). Here, we investigate the conditions surrounding these types of errors, or misnamings, in which a person (the misnamer) incorrectly calls a familiar individual (the misnamed) by someone else's name (the named). Across 5 studies including over 1,700 participants, we investigated the prevalence of the phenomenon of misnaming, identified factors underlying why it may occur, and tested potential mechanisms. We included undergraduates and MTurk workers and asked questions of both the misnamed and the misnamer. We find that familiar individuals are often misnamed with the name of another member of the same semantic category; family members are misnamed with another family member's name and friends are misnamed with another friend's name. Phonetic similarity between names also leads to misnamings; however, the size of this effect was smaller than that of the semantic category effect. Overall, the misnaming of familiar individuals is driven by the relationship between the misnamer, misnamed, and named; phonetic similarity between the incorrect name used by the misnamer and the correct name also plays a role in misnaming.