992 results for similarity search





Web databases are now pervasive. Such a database can be accessed only via its query interface (usually an HTML query form). Extracting Web query interfaces is a critical step in data integration across multiple Web databases: it creates a formal representation of a query form by extracting the set of query conditions in it. This paper presents a novel approach to extracting Web query interfaces. In this approach, a generic set of query condition rules is created to define query conditions that are semantically equivalent to SQL search conditions. Query condition rules represent the semantic roles that labels and form elements play in query conditions, and how they are hierarchically grouped into constructs of query conditions. To group labels and form elements in a query form, we explore both their structural proximity in the hierarchy of structures in the query form, which is captured by a tree of nested tags in the HTML code of the form, and their semantic similarity, which is captured by various short texts used in labels, form elements and their properties. We have implemented the proposed approach, and our experimental results show that the approach is highly effective.





An important feature of a database management system (DBMS) is its client/server architecture, where managing shared memory among the clients and the server is always a tough issue. Similarity queries are especially sensitive to this kind of architecture, since their answer sizes vary widely. Usually, the answer to a similarity query is fully processed and sent in full to the user, who is often interested in just part of it, e.g. just a few elements closer to or farther from the query reference. Compelling the DBMS to retrieve the full answer, only to ignore most of it, is at the very least a waste of server processing power. Paging is a technique that splits the answer into several pages, delivered following client requests. Despite the success of paging for traditional queries, little work has been done to support it in similarity queries. In this work, we present a technique that not only provides paging in similarity range or k-nearest neighbor queries, but also supports them in two variations: the forward similarity query and the backward similarity query. They return elements either increasingly farther from or increasingly closer to the query reference. The reported experiments show that, depending on the proportion of the interesting part over the full answer, both techniques allow answering queries much faster than the non-paged approach. (C) 2010 Elsevier Inc. All rights reserved.
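The idea of paging a similarity answer can be illustrated with a minimal sketch. It is not the technique from the abstract above: it assumes an in-memory list of points, Euclidean distance, and a linear scan instead of a metric index, and the function name `paged_similarity_query` and its parameters are hypothetical. It only shows how a forward (nearest first) or backward (farthest first) answer can be delivered page by page, so that the ordering work is done only for the pages the client actually requests.

```python
import heapq
import math
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def paged_similarity_query(
    data: Sequence[Point],
    query: Point,
    page_size: int,
    backward: bool = False,
) -> Iterator[List[Tuple[float, Point]]]:
    """Yield pages of (distance, point) pairs.

    Forward mode returns elements increasingly farther from the query;
    backward mode returns elements increasingly closer to it.
    Assumption: Euclidean distance and a full scan of `data`.
    """
    # Negating the distance turns the min-heap into a max-heap for backward mode.
    sign = -1.0 if backward else 1.0
    heap = [(sign * math.dist(p, query), i) for i, p in enumerate(data)]
    heapq.heapify(heap)  # linear time; the full sort is never performed up front

    while heap:
        page = []
        for _ in range(min(page_size, len(heap))):
            key, i = heapq.heappop(heap)
            page.append((sign * key, data[i]))
        yield page  # the next page is computed only if the client asks for it


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
    pages = paged_similarity_query(points, (0.0, 0.0), page_size=2)
    print(next(pages))  # first forward page only
```

Because the function is a generator, a client that stops after the first page never pays for ordering the rest of the answer, which is the saving the abstract describes.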





Schistosomiasis affects more than 200 million people worldwide; another 600 million are at risk of infection. The schistosomulum stage is believed to be the target of protective immunity in the attenuated cercaria vaccine model. In an attempt to identify genes up-regulated in the schistosomulum stage in relation to cercaria, we explored the Schistosoma mansoni transcriptome by looking at the relative frequency of reads in EST libraries from both stages. The 400 genes potentially up-regulated in schistosomula were analyzed as to their Gene Ontology categorization, and we have focused on those encoding predicted proteins with no similarity to proteins of other organisms, assuming they could be parasite-specific proteins important for survival in the host. Up-regulation in schistosomulum relative to cercaria was validated with real-time reverse transcription polymerase chain reaction (RT-PCR) for five out of nine selected genes (56%). We tested their protective potential in mice through immunization with DNA vaccines followed by a parasite challenge. Worm burden reductions of 16-17% were observed for one of them, indicating its protective potential. Our results demonstrate the value and caveats of using stage-associated frequency of ESTs as an indication of differential expression coupled to DNA vaccine screening in the identification of novel proteins to be further investigated as potential vaccine candidates.





Optimum subwindow search for object detection aims to find a subwindow such that the contained subimage is most similar to the query object. This problem can be formulated as a four-dimensional (4D) maximum entry search problem wherein each entry corresponds to the quality score of the subimage contained in a subwindow. For n x n images, a naive exhaustive search requires O(n^4) sequential computations of the quality scores for all subwindows. To reduce the time complexity, we prove that, for some typical similarity functions such as the Euclidean metric and the χ² metric on image histograms, the associated 4D array carries some Monge structures. We utilise these properties to speed up the optimum subwindow search, reducing the time complexity to O(n^3). Furthermore, we propose a locally optimal alternating column and row search method with typical quadratic time complexity O(n^2). Experiments on PASCAL VOC 2006 demonstrate that the alternating method is significantly faster than the well-known efficient subwindow search (ESS) method, whilst the performance loss due to the local maxima problem is negligible.
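As a point of reference for the complexities quoted above, the naive exhaustive baseline can be sketched as follows. This is an illustration under a simplifying assumption, not the method of the abstract: the quality score of a subwindow is taken to be the sum of per-pixel scores (so that a summed-area table gives each score in constant time), whereas the abstract uses histogram-based similarity functions. The Monge-based O(n^3) search and the alternating O(n^2) search are not shown. The names `integral_image` and `best_subwindow` are hypothetical.

```python
from typing import List, Tuple

Window = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def integral_image(scores: List[List[float]]) -> List[List[float]]:
    """Summed-area table with one extra zero row and column."""
    h, w = len(scores), len(scores[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        for c in range(w):
            ii[r + 1][c + 1] = scores[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii


def best_subwindow(scores: List[List[float]]) -> Tuple[float, Window]:
    """Enumerate every subwindow: O(n^4) windows for an n x n score map."""
    h, w = len(scores), len(scores[0])
    ii = integral_image(scores)
    best_score = float("-inf")
    best_window: Window = (0, 0, 0, 0)
    for top in range(h):
        for bottom in range(top, h):
            for left in range(w):
                for right in range(left, w):
                    # Window sum from four table lookups (assumed additive score).
                    s = (ii[bottom + 1][right + 1] - ii[top][right + 1]
                         - ii[bottom + 1][left] + ii[top][left])
                    if s > best_score:
                        best_score, best_window = s, (top, left, bottom, right)
    return best_score, best_window


if __name__ == "__main__":
    score_map = [[-1, -1, -1],
                 [-1, 2, 3],
                 [-1, -1, 4]]
    print(best_subwindow(score_map))
```

The four nested loops make the O(n^4) cost explicit; any faster method has to avoid visiting every (top, bottom, left, right) combination.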





In this paper, the zero-order Sugeno Fuzzy Inference System (FIS) that preserves the monotonicity property is studied. The sufficient conditions for the zero-order Sugeno FIS model to satisfy the monotonicity property are exploited as a set of useful governing equations to facilitate the FIS modelling process. The sufficient conditions suggest a fuzzy partition (at the rule antecedent part) and a monotonically-ordered rule base (at the rule consequent part) that can preserve the monotonicity property. The investigation focuses on the use of two Similarity Reasoning (SR)-based methods, i.e., Analogical Reasoning (AR) and Fuzzy Rule Interpolation (FRI), to deduce each conclusion separately. It is shown that AR and FRI may not be a direct solution to modelling a multi-input FIS model that fulfils the monotonicity property, owing to the difficulty in obtaining a set of monotonically-ordered conclusions. As such, a Non-Linear Programming (NLP)-based SR scheme for constructing a monotonicity-preserving multi-input FIS model is proposed. In the proposed scheme, AR or FRI is first used to predict the rule conclusion of each observation. Then, a search algorithm is adopted to look for a set of consequents with minimized root mean square error as compared with the predicted conclusions. A constraint imposed by the sufficient conditions is also included in the search process. Applicability of the proposed scheme to undertaking fuzzy Failure Mode and Effect Analysis (FMEA) tasks is demonstrated. The results indicate that the proposed NLP-based SR scheme is useful for preserving the monotonicity property when building a multi-input FIS model with an incomplete rule base.





A complete and monotonically-ordered fuzzy rule base is necessary to maintain the monotonicity property of a Fuzzy Inference System (FIS). In this paper, a new monotone fuzzy rule relabeling technique to relabel a non-monotone fuzzy rule base provided by domain experts is proposed. Even though the Genetic Algorithm (GA)-based monotone fuzzy rule relabeling technique has been investigated in our previous work [7], the optimality of the approach could not be guaranteed. The new fuzzy rule relabeling technique adopts a simple brute-force search, and it can produce an optimal result. We also formulate a new two-stage framework that encompasses a GA-based rule selection scheme, the optimization-based Similarity Reasoning (SR) scheme, and the proposed monotone fuzzy rule relabeling technique for preserving the monotonicity property of the FIS model. Applicability of the two-stage framework to a real-world problem, i.e., failure mode and effect analysis, is further demonstrated. The results clearly demonstrate the usefulness of the proposed framework.





The classification of texts has become a major endeavor with so much electronic material available, for it is an essential task in several applications, including search engines and information retrieval. There are different ways to define similarity for grouping similar texts into clusters, as the concept of similarity may depend on the purpose of the task. For instance, in topic extraction similar texts mean those within the same semantic field, whereas in author recognition stylistic features should be considered. In this study, we introduce ways to classify texts employing concepts of complex networks, which may be able to capture syntactic, semantic and even pragmatic features. The interplay between various metrics of the complex networks is analyzed with three applications, namely identification of machine translation (MT) systems, evaluation of quality of machine translated texts and authorship recognition. We shall show that topological features of the networks representing texts can enhance the ability to identify MT systems in particular cases. For evaluating the quality of MT texts, on the other hand, high correlation was obtained with methods capable of capturing the semantics. This was expected because the gold standards used are themselves based on word co-occurrence. Notwithstanding, the Katz similarity, which involves semantics and structure in the comparison of texts, achieved the highest correlation with the NIST measurement, indicating that in some cases the combination of both approaches can improve the ability to quantify quality in MT. In authorship recognition, again the topological features were relevant in some contexts, though for the books and authors analyzed good results were obtained with semantic features as well.
Because hybrid approaches encompassing semantic and topological features have not been extensively used, we believe that the methodology proposed here may be useful to enhance text classification considerably, as it combines well-established strategies. (c) 2012 Elsevier B.V. All rights reserved.





Effective techniques for organizing and visualizing large image collections are in growing demand as visual search becomes increasingly popular. iMap is a treemap representation for visualizing and navigating image search and clustering results based on the evaluation of image similarity using both visual and textual information. iMap not only makes effective use of the available display area to arrange images but also maintains stable updates when images are inserted or removed during the query. A key challenge of using iMap lies in the difficulty of following and tracking the changes when the image arrangement is updated as the query image changes. For many information visualization applications, showing the transition when interacting with the data is critically important, as it can help users better perceive the changes and understand the underlying data. This work investigates the effectiveness of animated transition in a tiled image layout where the spiral arrangement of the images is based on their similarity. Three aspects of animated transition are considered: animation steps, animation actions, and flying paths. Exploring and weighing the advantages and disadvantages of different methods for each aspect, in conjunction with the characteristics of the spiral image layout, we present an integrated solution, called AniMap, for animating the transition from an old layout to a new layout when a different image is selected as the query image. To smooth the animation and reduce the overlap among images during the transition, we explore different factors that might have an impact on the animation and propose our solution accordingly. We show the effectiveness of our animated transition solution by demonstrating experimental results and conducting a comparative user study.





A tandem mass spectral database system consists of a library of reference spectra and a search program. State-of-the-art search programs show a high tolerance for variability in compound-specific fragmentation patterns produced by collision-induced decomposition and enable sensitive and specific 'identity search'. In this communication, performance characteristics of two search algorithms combined with the 'Wiley Registry of Tandem Mass Spectral Data, MSforID' (Wiley Registry MSMS, John Wiley and Sons, Hoboken, NJ, USA) were evaluated. The search algorithms tested were the MSMS search algorithm implemented in the NIST MS Search program 2.0g (NIST, Gaithersburg, MD, USA) and the MSforID algorithm (John Wiley and Sons, Hoboken, NJ, USA). Sample spectra were either acquired on different instruments, thus covering a broad range of possible experimental conditions, or generated in silico. For each algorithm, more than 30,000 matches were performed. Statistical evaluation of the library search results revealed that, in principle, both search algorithms can be combined with the Wiley Registry MSMS to create a reliable identification tool. It appears, however, that a higher degree of spectral similarity is necessary to obtain a correct match with the NIST MS Search program. This characteristic of the NIST MS Search program has a positive effect on specificity, as it helps to avoid false positive matches (type I errors), but reduces sensitivity. Thus, particularly with sample spectra acquired on instruments differing in their setup from tandem-in-space type fragmentation, a comparatively higher number of false negative matches (type II errors) was observed when searching the Wiley Registry MSMS.





Today's digital libraries (DLs) archive vast amounts of information in the form of text, videos, images, data measurements, etc. User access to DL content can rely on similarity between metadata elements, or similarity between the data itself (content-based similarity). We consider the problem of exploratory search in large DLs of time-oriented data. We propose a novel approach for overview-first exploration of data collections based on user-selected metadata properties. In a 2D layout, entities of the selected property are laid out based on their similarity with respect to the underlying data content. The display is enhanced by compact summarizations of underlying data elements, and forms the basis for exploratory navigation of users in the data space. The approach is proposed as an interface for visual exploration, leading the user to discover interesting relationships between data items relying on content-based similarity between data items and their respective metadata labels. We apply the method to real data sets from the earth observation community, showing its applicability and usefulness.





To initiate homologous recombination, sequence similarity between two DNA molecules must be searched for and homology recognized. How the search for and recognition of homology occurs remains unproven. We have examined the influences of DNA topology and the polarity of RecA–single-stranded (ss)DNA filaments on the formation of synaptic complexes promoted by RecA. Using two complementary methods and various ssDNA and duplex DNA molecules as substrates, we demonstrate that topological constraints on a small circular RecA–ssDNA filament prevent it from interwinding with its duplex DNA target at the homologous region. We were unable to detect homologous pairing between a circular RecA–ssDNA filament and its relaxed or supercoiled circular duplex DNA targets. However, the formation of synaptic complexes between an invading linear RecA–ssDNA filament and covalently closed circular duplex DNAs is promoted by supercoiling of the duplex DNA. The results imply that a triplex structure formed by non-Watson–Crick hydrogen bonding is unlikely to be an intermediate in homology searching promoted by RecA. Rather, a model in which RecA-mediated homology searching requires unwinding of the duplex DNA coupled with local strand exchange is the likely mechanism. Furthermore, we show that polarity of the invading RecA–ssDNA does not affect its ability to pair and interwind with its circular target duplex DNA.





Previous research in visual search indicates that animal fear-relevant deviants (snakes/spiders) are found faster among non-fear-relevant backgrounds (flowers/mushrooms) than vice versa. Moreover, deviant absence was indicated faster among snakes/spiders, and detection time for flower/mushroom deviants, but not for snake/spider deviants, increased in larger arrays. The current research indicates that the latter two results do not reflect fear-relevance, but are found only with flower/mushroom controls. These findings may reflect factors such as background homogeneity, deviant homogeneity, or background-deviant similarity. The current research removes contradictions between previous studies that used animal and social fear-relevant stimuli and indicates that apparent search advantages for fear-relevant deviants are likely to reflect delayed attentional disengagement from fear-relevance on control trials.





Music similarity query based on acoustic content is becoming important with the ever-increasing growth of music information from emerging applications such as digital libraries and the WWW. However, the relevant techniques are still in their infancy and far from satisfactory. In this paper, we present a novel index structure, called the Composite Feature tree (CF-tree), to facilitate efficient content-based music search adopting multiple musical features. Before constructing the tree structure, we use PCA to transform the extracted features into a new space sorted by the importance of acoustic features. The CF-tree is a balanced multi-way tree structure where each level represents the data space at a different dimensionality. The PCA-transformed data and the reduced dimensions in the upper levels can alleviate the curse of dimensionality. To accurately mimic human perception, an extension, named the CF+-tree, is proposed, which further applies multivariable regression to determine the weight of each individual feature. We conduct extensive experiments to evaluate the proposed structures against state-of-the-art techniques. The experimental results demonstrate the superiority of our technique.





In April 2009, Google Images added a filter for narrowing search results by colour. Several other systems for searching image databases by colour were also released around this time. These colour-based image retrieval systems enable users to search image databases either by selecting colours from a graphical palette (i.e., query-by-colour), by drawing a representation of the colour layout sought (i.e., query-by-sketch), or both. It was comments left by readers of online articles describing these colour-based image retrieval systems that provided us with the inspiration for this research. We were surprised to learn that the underlying query-based technology used in colour-based image retrieval systems today remains remarkably similar to that of systems developed nearly two decades ago. Discovering this ageing retrieval approach, as well as uncovering a large user demographic requiring image search by colour, made us eager to research more effective approaches for colour-based image retrieval. In this thesis, we detail two user studies designed to compare the effectiveness of systems adopting similarity-based visualisations, query-based approaches, or a combination of both, for colour-based image retrieval. In contrast to query-based approaches, similarity-based visualisations display and arrange database images so that images with similar content are located closer together on screen than images with dissimilar content. This removes the need for queries, as users can instead visually explore the database using interactive navigation tools to retrieve images from the database. 
As we found existing evaluation approaches to be unreliable, we describe how we assessed and compared systems adopting similarity-based visualisations, query-based approaches, or both, meaningfully and systematically using our Mosaic Test - a user-based evaluation approach in which evaluation study participants complete an image mosaic of a predetermined target image using the colour-based image retrieval system under evaluation.