985 results for INFORMATION DISCOVERY


Relevance:

30.00% 30.00%

Publisher:

Abstract:

Automated process discovery techniques aim at extracting process models from information system logs. Existing techniques in this space are effective when applied to relatively small or regular logs, but generate spaghetti-like and sometimes inaccurate models when confronted with logs with high variability. In previous work, trace clustering has been applied in an attempt to reduce the size and complexity of automatically discovered process models. The idea is to split the log into clusters and to discover one model per cluster. This leads to a collection of process models – each one representing a variant of the business process – as opposed to an all-encompassing model. Still, models produced in this way may exhibit unacceptably high complexity and low fitness. In this setting, this paper presents a two-way divide-and-conquer process discovery technique, wherein the discovered process models are split on the one hand by variants and on the other hand hierarchically using subprocess extraction. Splitting is performed in a controlled manner in order to achieve user-defined complexity or fitness thresholds. Experiments on real-life logs show that the technique produces collections of models substantially smaller than those extracted by applying existing trace clustering techniques, while allowing the user to control the fitness of the resulting models.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

More and more traditional manufacturing companies form or join inter-organizational networks to bundle their physical products with related services to offer superior value propositions to their customers. Some of these product-related services can be digitized completely and thus fully delivered electronically. Other services require the physical integration of external factors, but can still be coordinated electronically. In both cases companies and consumers face the problem of discovering appropriate product-related service offerings in the network or market. Based on ideas from the web service discovery discipline we propose a meet-in-the-middle approach between heavy-weight semantic technologies and simple Boolean search to address this issue. Our approach is able to consider semantic relations in service descriptions and queries and thus delivers better results than syntax-based search. However – unlike most semantic approaches – it does not require the use of any formal language for semantic markup and thus requires fewer resources and skills for both service providers and consumers. To fully realize the potential of the proposed approach a domain ontology is needed. In this research-in-progress paper we construct such an ontology for the domain of product-service bundles through analysis and synthesis of related work on service description. This will serve as an anchor for future research to iteratively improve and evaluate the ontology through collaborative design efforts and practical application.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Evolutionary algorithms are playing an increasingly important role as search methods in cognitive science domains. In this study, methodological issues in the use of evolutionary algorithms were investigated via simulations in which procedures were systematically varied to modify the selection pressures on populations of evolving agents. Traditional roulette wheel, tournament, and variations of these selection algorithms were compared on the “needle-in-a-haystack” problem developed by Hinton and Nowlan in their 1987 study of the Baldwin effect. The task is an important one for cognitive science, as it demonstrates the power of learning as a local search technique in smoothing a fitness landscape that lacks gradient information. One aspect that has continued to foster interest in the problem is the observation of residual learning ability in simulated populations even after long periods of time. Effective evolutionary algorithms balance their search effort between broad exploration of the search space and in-depth exploitation of promising solutions already found. Issues discussed include the differential effects of rank and proportional selection, the tradeoff between migration of populations towards good solutions and maintenance of diversity, and the development of measures that illustrate how each selection algorithm affects the search process over generations. We show that both roulette wheel and tournament algorithms can be modified to appropriately balance search between exploration and exploitation, and effectively eliminate residual learning in this problem.
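The two baseline selection schemes named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function names, the tournament size `k`, and the handling of an all-zero fitness population are assumptions made for the example.

```python
import random


def roulette_wheel_select(population, fitnesses, rng):
    """Fitness-proportional selection: each individual is chosen with
    probability fitness / total fitness (fitnesses assumed non-negative)."""
    total = sum(fitnesses)
    if total <= 0:
        # Assumption: with no fitness signal at all, fall back to a uniform pick.
        return rng.choice(population)
    pick = rng.uniform(0, total)
    cumulative = 0.0
    for individual, fitness in zip(population, fitnesses):
        cumulative += fitness
        # Zero-fitness individuals occupy no slice of the wheel.
        if fitness > 0 and pick <= cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off


def tournament_select(population, fitnesses, rng, k=2):
    """Tournament selection: draw k distinct contenders uniformly at random
    and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]


rng = random.Random(0)
population = ["a", "b", "c"]
parent_rw = roulette_wheel_select(population, [0.0, 0.0, 1.0], rng)
parent_t = tournament_select(population, [1.0, 5.0, 3.0], rng, k=3)
```

Roulette wheel selection depends on the absolute fitness values, whereas tournament selection depends only on their rank order. This is one way to see the difference between proportional and rank-based selection pressure that the abstract discusses.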

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Collections of biological specimens are fundamental to scientific understanding and characterization of natural diversity - past, present and future. This paper presents a system for liberating useful information from physical collections by bringing specimens into the digital domain so they can be more readily shared, analyzed, annotated and compared. It focuses on insects and is strongly motivated by the desire to accelerate and augment current practices in insect taxonomy which predominantly use text, 2D diagrams and images to describe and characterize species. While these traditional kinds of descriptions are informative and useful, they cannot cover insect specimens "from all angles" and precious specimens are still exchanged between researchers and collections for this reason. Furthermore, insects can be complex in structure and pose many challenges to computer vision systems. We present a new prototype for a practical, cost-effective system of off-the-shelf components to acquire natural-colour 3D models of insects from around 3 mm to 30 mm in length. ("Natural-colour" is used to contrast with "false-colour", i.e., colour generated from, or applied to, gray-scale data post-acquisition.) Colour images are captured from different angles and focal depths using a digital single lens reflex (DSLR) camera rig and two-axis turntable. These 2D images are processed into 3D reconstructions using software based on a visual hull algorithm. The resulting models are compact (around 10 megabytes), afford excellent optical resolution, and can be readily embedded into documents and web pages, as well as viewed on mobile devices. The system is portable, safe, relatively affordable, and complements the sort of volumetric data that can be acquired by computed tomography. This system provides a new way to augment the description and documentation of insect species holotypes, reducing the need to handle or ship specimens. 
It opens up new opportunities to collect data for research, education, art, entertainment, biodiversity assessment and biosecurity control. © 2014 Nguyen et al.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This paper discusses the following key messages. Taxonomy is (and taxonomists are) more important than ever in times of global change. Taxonomic endeavour is not occurring fast enough: in 250 years since the creation of the Linnean Systema Naturae, only about 20% of Earth's species have been named. We need fundamental changes to the taxonomic process and paradigm to increase taxonomic productivity by orders of magnitude. Currently, taxonomic productivity is limited principally by the rate at which we capture and manage morphological information to enable species discovery. Many recent (and welcomed) initiatives in managing and delivering biodiversity information and accelerating the taxonomic process do not address this bottleneck. Development of computational image analysis and feature extraction methods is a crucial missing capacity needed to enable taxonomists to overcome the taxonomic impediment in a meaningful time frame. Copyright © 2009 Magnolia Press.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Existing techniques for automated discovery of process models from event logs generally produce flat process models. Thus, they fail to exploit the notion of subprocess as well as error handling and repetition constructs provided by contemporary process modeling notations, such as the Business Process Model and Notation (BPMN). This paper presents a technique for automated discovery of hierarchical BPMN models containing interrupting and non-interrupting boundary events and activity markers. The technique employs functional and inclusion dependency discovery techniques in order to elicit a process-subprocess hierarchy from the event log. Given this hierarchy and the projected logs associated with each node in the hierarchy, parent process and subprocess models are then discovered using existing techniques for flat process model discovery. Finally, the resulting models and logs are heuristically analyzed in order to identify boundary events and markers. By employing approximate dependency discovery techniques, it is possible to filter out noise in the event log arising for example from data entry errors or missing events. A validation with one synthetic and two real-life logs shows that process models derived by the proposed technique are more accurate and less complex than those derived with flat process discovery techniques. Meanwhile, a validation on a family of synthetically generated logs shows that the technique is resilient to varying levels of noise.
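To make the idea of approximate functional dependency discovery concrete, the sketch below tests whether one event attribute determines another while tolerating a bounded share of noisy rows. It is an illustrative assumption, not the paper's algorithm: the attribute names, the error measure (the fraction of rows that would have to be removed for the dependency to hold exactly) and the threshold `max_error` are all chosen for the example.

```python
from collections import Counter, defaultdict


def fd_error(rows, lhs, rhs):
    """Fraction of rows that must be dropped so that `lhs -> rhs` holds exactly.

    For every value of `lhs` the most frequent `rhs` value is kept;
    all other rows in that group count as violations.
    """
    if not rows:
        return 0.0
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    kept = sum(max(counter.values()) for counter in groups.values())
    return 1 - kept / len(rows)


def holds_approximately(rows, lhs, rhs, max_error=0.1):
    """True if the dependency holds once at most `max_error` of the rows are ignored."""
    return fd_error(rows, lhs, rhs) <= max_error


# Hypothetical event log: each order belongs to one customer, with one noisy entry.
events = (
    [{"order": "o1", "customer": "c1"}] * 4
    + [{"order": "o2", "customer": "c2"}] * 5
    + [{"order": "o1", "customer": "cX"}]  # simulated data entry error
)
noisy_but_accepted = holds_approximately(events, "order", "customer", max_error=0.1)
```

With an exact check (`max_error=0`), the single erroneous row would hide the dependency. Tolerating a small error lets the dependency survive the kind of noise the abstract mentions.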

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Human brain connectivity is disrupted in a wide range of disorders from Alzheimer's disease to autism but little is known about which specific genes affect it. Here we conducted a genome-wide association for connectivity matrices that capture information on the density of fiber connections between 70 brain regions. We scanned a large twin cohort (N=366) with 4-Tesla high angular resolution diffusion imaging (105-gradient HARDI). Using whole brain HARDI tractography, we extracted a relatively sparse 70×70 matrix representing fiber density between all pairs of cortical regions automatically labeled in co-registered anatomical scans. Additive genetic factors accounted for 1-58% of the variance in connectivity between 90 (of 122) tested nodes. We discovered genome-wide significant associations between variants and connectivity. GWAS permutations at various levels of heritability, and split-sample replication, validated our genetic findings. The resulting genes may offer new leads for mechanisms influencing aberrant connectivity and neurodegeneration. © 2012 IEEE.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

With the ever increasing amount of eHealth data available from various eHealth systems and sources, Health Big Data Analytics promises enticing benefits such as enabling the discovery of new treatment options and improved decision making. However, concerns over the privacy of information have hindered the aggregation of this information. To address these concerns, we propose the use of Information Accountability protocols to provide patients with the ability to decide how and when their data can be shared and aggregated for use in big data research. In this paper, we discuss the issues surrounding Health Big Data Analytics and propose a consent-based model to address privacy concerns to aid in achieving the promised benefits of Big Data in eHealth.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

A major challenge in human genetics is to devise a systematic strategy to integrate disease-associated variants with diverse genomic and biological data sets to provide insight into disease pathogenesis and guide drug discovery for complex traits such as rheumatoid arthritis (RA) [1]. Here we performed a genome-wide association study meta-analysis in a total of >100,000 subjects of European and Asian ancestries (29,880 RA cases and 73,758 controls), by evaluating ~10 million single-nucleotide polymorphisms. We discovered 42 novel RA risk loci at a genome-wide level of significance, bringing the total to 101 [2, 3, 4]. We devised an in silico pipeline using established bioinformatics methods based on functional annotation [5], cis-acting expression quantitative trait loci [6] and pathway analyses [7, 8, 9], as well as novel methods based on genetic overlap with human primary immunodeficiency, haematological cancer somatic mutations and knockout mouse phenotypes, to identify 98 biological candidate genes at these 101 risk loci. We demonstrate that these genes are the targets of approved therapies for RA, and further suggest that drugs approved for other indications may be repurposed for the treatment of RA. Together, this comprehensive genetic study sheds light on fundamental genes, pathways and cell types that contribute to RA pathogenesis, and provides empirical evidence that the genetics of RA can provide important information for drug discovery.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The work is based on the assumption that words with similar syntactic usage have similar meaning, which was proposed by Zellig S. Harris (1954, 1968). We study his assumption from two aspects: Firstly, different meanings (word senses) of a word should manifest themselves in different usages (contexts), and secondly, similar usages (contexts) should lead to similar meanings (word senses). If we start with the different meanings of a word, we should be able to find distinct contexts for the meanings in text corpora. We separate the meanings by grouping and labeling contexts in an unsupervised or weakly supervised manner (Publication 1, 2 and 3). We are confronted with the question of how best to represent contexts in order to induce effective classifiers of contexts, because differences in context are the only means we have to separate word senses. If we start with words in similar contexts, we should be able to discover similarities in meaning. We can do this monolingually or multilingually. In the monolingual material, we find synonyms and other related words in an unsupervised way (Publication 4). In the multilingual material, we find translations by supervised learning of transliterations (Publication 5). In both the monolingual and multilingual case, we first discover words with similar contexts, i.e., synonym or translation lists. In the monolingual case we also aim at finding structure in the lists by discovering groups of similar words, e.g., synonym sets. In this introduction to the publications of the thesis, we consider the larger background issues of how meaning arises, how it is quantized into word senses, and how it is modeled. We also consider how to define, collect and represent contexts. We discuss how to evaluate the trained context classifiers and discovered word sense classifications, and finally we present the word sense discovery and disambiguation methods of the publications.
This work supports Harris' hypothesis by implementing three new methods modeled on his hypothesis. The methods have practical consequences for creating thesauruses and translation dictionaries, e.g., for information retrieval and machine translation purposes. Keywords: Word senses, Context, Evaluation, Word sense disambiguation, Word sense discovery.
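Harris' hypothesis, that words occurring in similar contexts tend to have similar meanings, can be illustrated with a small sketch. It is not one of the thesis's methods: the bag-of-words window representation, the window size and the cosine measure are assumptions chosen to keep the example short.

```python
import math
from collections import Counter


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            vectors.setdefault(word, Counter()).update(left + right)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus, already tokenized (hypothetical data).
corpus = [
    ["the", "cat", "sleeps"],
    ["the", "dog", "sleeps"],
    ["a", "car", "drives"],
]
vectors = context_vectors(corpus, window=1)
similar = cosine(vectors["cat"], vectors["dog"])    # shared contexts: "the", "sleeps"
unrelated = cosine(vectors["cat"], vectors["car"])  # no shared contexts
```

In this toy corpus "cat" and "dog" share all of their contexts while "cat" and "car" share none, so the first pair scores close to 1 and the second scores 0. Real corpora would need weighting and far more data before such scores could suggest synonym candidates.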

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This research is a step forward in discovering knowledge from databases with complex structure, such as trees or graphs. Several data mining algorithms are developed, based on a novel representation called Balanced Optimal Search, for extracting implicit, unknown and potentially useful information such as patterns, similarities and various relationships from tree data; these algorithms are also shown to be advantageous in analysing big data. This thesis focuses on analysing unordered tree data, a model that is robust to data inconsistency, irregularity and swift information changes and that has therefore become popular and widely used in the era of big data.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Termites play a major role in foraging and degradation of plant biomass as well as cultivating bioactive microorganisms for their defense. Current advances in “omics” sciences are revealing insights into the function-related presence of these symbionts, and the related biosynthetic activities and genes identified in gut symbiotic bacteria might offer significant potential for biotechnology and biodiscovery. Actinomycetes have been the major producers of bioactive compounds with an extraordinary range of biological activities. These metabolites have been in use as anticancer agents, immune suppressants, and most notably, as antibiotics. Insect-associated actinomycetes have also been reported to produce a range of antibiotics such as dentigerumycin and mycangimycin. Advances in genomics targeting a single species of the unculturable microbial members are currently aiding an improved understanding of the symbiotic interrelationships among the gut microorganisms as well as revealing the taxonomical identity and functions of the complex multilayered symbiotic actinofloral layers. If combined with target-directed approaches, these molecular advances can provide guidance towards the design of highly selective culturing methods to generate further information related to the physiology and growth requirements of these bioactive actinomycetes associated with the termite guts. This chapter provides an overview of the termite gut symbiotic actinoflora in the light of current advances in the “omics” science, with examples of their detection and selective isolation from the guts of the Sunshine Coast regional termite Coptotermes lacteus in Queensland, Australia.