995 resultados para neighbor discovery


Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper evaluates the efficiency of a number of popular corpus-based distributional models in performing discovery on very large document sets, including online collections. Literature-based discovery is the process of identifying previously unknown connections from text, often published literature, that could lead to the development of new techniques or technologies. Literature-based discovery has attracted growing research interest ever since Swanson's serendipitous discovery of the therapeutic effects of fish oil on Raynaud's disease in 1986. The successful application of distributional models in automating the identification of indirect associations underpinning literature-based discovery has been heavily demonstrated in the medical domain. However, we wish to investigate the computational complexity of distributional models for literature-based discovery on much larger document collections, as they may provide computationally tractable solutions to tasks including, predicting future disruptive innovations. In this paper we perform a computational complexity analysis on four successful corpus-based distributional models to evaluate their fit for such tasks. Our results indicate that corpus-based distributional models that store their representations in fixed dimensions provide superior efficiency on literature-based discovery tasks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In vivo small molecules as necessary intermediates are involved in numerous critical metabolic pathways and biological processes associated with many essential biological functions and events. There is growing evidence that MS-based metabolomics is emerging as a powerful tool to facilitate the discovery of functional small molecules that can better our understanding of development, infection, nutrition, disease, toxicity, drug therapeutics, gene modifications and host-pathogen interaction from metabolic perspectives. However, further progress must still be made in MS-based metabolomics because of the shortcomings in the current technologies and knowledge. This technique-driven review aims to explore the discovery of in vivo functional small molecules facilitated by MS-based metabolomics and to highlight the analytic capabilities and promising applications of this discovery strategy. Moreover, the biological significance of the discovery of in vivo functional small molecules with different biological contexts is also interrogated at a metabolic perspective.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Guaranteeing the quality of extracted features that describe relevant knowledge to users or topics is a challenge because of the large number of extracted features. Most popular existing term-based feature selection methods suffer from noisy feature extraction, which is irrelevant to the user needs (noisy). One popular method is to extract phrases or n-grams to describe the relevant knowledge. However, extracted n-grams and phrases usually contain a lot of noise. This paper proposes a method for reducing the noise in n-grams. The method first extracts more specific features (terms) to remove noisy features. The method then uses an extended random set to accurately weight n-grams based on their distribution in the documents and their terms distribution in n-grams. The proposed approach not only reduces the number of extracted n-grams but also improves the performance. The experimental results on Reuters Corpus Volume 1 (RCV1) data collection and TREC topics show that the proposed method significantly outperforms the state-of-art methods underpinned by Okapi BM25, tf*idf and Rocchio.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

To enhance the performance of the k-nearest neighbors approach in forecasting short-term traffic volume, this paper proposed and tested a two-step approach with the ability of forecasting multiple steps. In selecting k-nearest neighbors, a time constraint window is introduced, and then local minima of the distances between the state vectors are ranked to avoid overlappings among candidates. Moreover, to control extreme values’ undesirable impact, a novel algorithm with attractive analytical features is developed based on the principle component. The enhanced KNN method has been evaluated using the field data, and our comparison analysis shows that it outperformed the competing algorithms in most cases.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Event report on the Open Access and Research 2013 conference which focused on recent developments and the strategic advantages they bring to the research sector.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Automated process discovery techniques aim at extracting process models from information system logs. Existing techniques in this space are effective when applied to relatively small or regular logs, but generate spaghetti-like and sometimes inaccurate models when confronted to logs with high variability. In previous work, trace clustering has been applied in an attempt to reduce the size and complexity of automatically discovered process models. The idea is to split the log into clusters and to discover one model per cluster. This leads to a collection of process models – each one representing a variant of the business process – as opposed to an all-encompassing model. Still, models produced in this way may exhibit unacceptably high complexity and low fitness. In this setting, this paper presents a two-way divide-and-conquer process discovery technique, wherein the discovered process models are split on the one hand by variants and on the other hand hierarchically using subprocess extraction. Splitting is performed in a controlled manner in order to achieve user-defined complexity or fitness thresholds. Experiments on real-life logs show that the technique produces collections of models substantially smaller than those extracted by applying existing trace clustering techniques, while allowing the user to control the fitness of the resulting models.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

More and more traditional manufacturing companies form or join inter-organizational networks to bundle their physical products with related services to offer superior value propositions to their customers. Some of these product-related services can be digitized completely and thus fully delivered electronically. Other services require the physical integration of external factors, but can still be coordinated electronically. In both cases companies and consumers face the problem of discovering appropriate product-related service offerings in the network or market. Based on ideas from the web service discovery discipline we propose a meet-in-the-middle approach between heavy-weight semantic technologies and simple boolean search to address this issue. Our approach is able to consider semantic relations in service descriptions and queries and thus delivers better results than syntax-based search. However – unlike most semantic approaches – it does not require the use of any formal language for semantic markup and thus requires less resources and skills for both service providers and consumers. To fully realize the potentials of the proposed approach a domain ontology is needed. In this research-in-progress paper we construct such an ontology for the domain of product-service bundles through analysis and synthesis of related work on service description. This will serve as an anchor for future research to iteratively improve and evaluate the ontology through collaborative design efforts and practical application.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background Kiwifruit (Actinidia spp.) are a relatively new, but economically important crop grown in many different parts of the world. Commercial success is driven by the development of new cultivars with novel consumer traits including flavor, appearance, healthful components and convenience. To increase our understanding of the genetic diversity and gene-based control of these key traits in Actinidia, we have produced a collection of 132,577 expressed sequence tags (ESTs). Results The ESTs were derived mainly from four Actinidia species (A. chinensis, A. deliciosa, A. arguta and A. eriantha) and fell into 41,858 non redundant clusters (18,070 tentative consensus sequences and 23,788 EST singletons). Analysis of flavor and fragrance-related gene families (acyltransferases and carboxylesterases) and pathways (terpenoid biosynthesis) is presented in comparison with a chemical analysis of the compounds present in Actinidia including esters, acids, alcohols and terpenes. ESTs are identified for most genes in color pathways controlling chlorophyll degradation and carotenoid biosynthesis. In the health area, data are presented on the ESTs involved in ascorbic acid and quinic acid biosynthesis showing not only that genes for many of the steps in these pathways are represented in the database, but that genes encoding some critical steps are absent. In the convenience area, genes related to different stages of fruit softening are identified. Conclusion This large EST resource will allow researchers to undertake the tremendous challenge of understanding the molecular basis of genetic diversity in the Actinidia genus as well as provide an EST resource for comparative fruit genomics. The various bioinformatics analyses we have undertaken demonstrates the extent of coverage of ESTs for genes encoding different biochemical pathways in Actinidia.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The striped catfish (Pangasianodon hypophthalmus) culture industry in the Mekong Delta in Vietnam has developed rapidly over the past decade. The culture industry now however, faces some significant challenges, especially related to climate change impacts notably from predicted extensive saltwater intrusion into many low topographical coastal provinces across the Mekong Delta. This problem highlights a need for development of culture stocks that can tolerate more saline culture environments as a response to expansion of saline water-intruded land. While a traditional artificial selection program can potentially address this need, understanding the genomic basis of salinity tolerance can assist development of more productive culture lines. The current study applied a transcriptomic approach using Ion PGM technology to generate expressed sequence tag (EST) resources from the intestine and swim bladder from striped catfish reared at a salinity level of 9 ppt which showed best growth performance. Total sequence data generated was 467.8 Mbp, consisting of 4,116,424 reads with an average length of 112 bp. De novo assembly was employed that generated 51,188 contigs, and allowed identification of 16,116 putative genes based on the GenBank non-redundant database. GO annotation, KEGG pathway mapping, and functional annotation of the EST sequences recovered with a wide diversity of biological functions and processes. In addition, more than 11,600 simple sequence repeats were also detected. This is the first comprehensive analysis of a striped catfish transcriptome, and provides a valuable genomic resource for future selective breeding programs and functional or evolutionary studies of genes that influence salinity tolerance in this important culture species.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We consider a discrete agent-based model on a one-dimensional lattice, where each agent occupies L sites and attempts movements over a distance of d lattice sites. Agents obey a strict simple exclusion rule. A discrete-time master equation is derived using a mean-field approximation and careful probability arguments. In the continuum limit, nonlinear diffusion equations that describe the average agent occupancy are obtained. Averaged discrete simulation data are generated and shown to compare very well with the solution to the derived nonlinear diffusion equations. This framework allows us to approach a lattice-free result using all the advantages of lattice methods. Since different cell types have different shapes and speeds of movement, this work offers insight into population-level behavior of collective cellular motion.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Bacteria have mechanisms to export proteins for diverse purposes, including colonization of hosts and pathogenesis. A small number of archetypal bacterial secretion machines have been found in several groups of bacteria and mediate a fundamentally distinct secretion process. Perhaps erroneously, proteins called 'autotransporters' have long been thought to be one of these protein secretion systems. Mounting evidence suggests that autotransporters might be substrates to be secreted, not an autonomous transporter system. We have discovered a new translocation and assembly module (TAM) that promotes efficient secretion of autotransporters in proteobacteria. Functional analysis of the TAM in Citrobacter rodentium, Salmonella enterica and Escherichia coli showed that it consists of an Omp85-family protein, TamA, in the outer membrane and TamB in the inner membrane of diverse bacterial species. The discovery of the TAM provides a new target for the development of therapies to inhibit colonization by bacterial pathogens.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Angiogenesis is indispensable for solid tumor expansion, and thus it has become a major target of cancer research and anti-cancer therapies. Deciphering the arcane actions of various cell populations during tumor angiogenesis requires sophisticated research models, which could capture the dynamics and complexity of the process. There is a continuous need for improvement of existing research models, which engages interdisciplinary approaches of tissue engineering with life sciences. Tireless efforts to develop a new model to study tumor angiogenesis result in innovative solutions, which bring us one step closer to decipher the dubious nature of cancer. This review aims to overview the recent developments, current limitations and future challenges in three-dimensional tissue-engineered models for the study of tumor angiogenesis and for the purpose of elucidating novel targets aimed at anti-cancer drug discovery.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Evolutionary algorithms are playing an increasingly important role as search methods in cognitive science domains. In this study, methodological issues in the use of evolutionary algorithms were investigated via simulations in which procedures were systematically varied to modify the selection pressures on populations of evolving agents. Traditional roulette wheel, tournament, and variations of these selection algorithms were compared on the “needle-in-a-haystack” problem developed by Hinton and Nowlan in their 1987 study of the Baldwin effect. The task is an important one for cognitive science, as it demonstrates the power of learning as a local search technique in smoothing a fitness landscape that lacks gradient information. One aspect that has continued to foster interest in the problem is the observation of residual learning ability in simulated populations even after long periods of time. Effective evolutionary algorithms balance their search effort between broad exploration of the search space and in-depth exploitation of promising solutions already found. Issues discussed include the differential effects of rank and proportional selection, the tradeoff between migration of populations towards good solutions and maintenance of diversity, and the development of measures that illustrate how each selection algorithm affects the search process over generations. We show that both roulette wheel and tournament algorithms can be modified to appropriately balance search between exploration and exploitation, and effectively eliminate residual learning in this problem.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Weta possess typical Ensifera ears. Each ear comprises three functional parts: two equally sized tympanal membranes, an underlying system of modified tracheal chambers, and the auditory sensory organ, the crista acustica. This organ sits within an enclosed fluid-filled channel-previously presumed to be hemolymph. The role this channel plays in insect hearing is unknown. We discovered that the fluid within the channel is not actually hemolymph, but a medium composed principally of lipid from a new class. Three-dimensional imaging of this lipid channel revealed a previously undescribed tissue structure within the channel, which we refer to as the olivarius organ. Investigations into the function of the olivarius reveal de novo lipid synthesis indicating that it is producing these lipids in situ from acetate. The auditory role of this lipid channel was investigated using Laser Doppler vibrometry of the tympanal membrane, which shows that the displacement of the membrane is significantly increased when the lipid is removed from the auditory system. Neural sensitivity of the system, however, decreased upon removal of the lipid-a surprising result considering that in a typical auditory system both the mechanical and auditory sensitivity are positively correlated. These two results coupled with 3D modelling of the auditory system lead us to hypothesize a model for weta audition, relying strongly on the presence of the lipid channel. This is the first instance of lipids being associated with an auditory system outside of the Odentocete cetaceans, demonstrating convergence for the use of lipids in hearing.