960 resultados para similarity queries
Resumo:
An important feature of a database management systems (DBMS) is its client/server architecture, where managing shared memory among the clients and the server is always an tough issue. However, similarity queries are specially sensitive to this kind of architecture, since the answer sizes vary widely. Usually, the answers of similarity query are fully processed to be sent in full to the user, who often is interested in just parts of the answer, e.g. just few elements closer or farther to the query reference. Compelling the DBMS to retrieve the full answer, further ignoring its majority is at least a waste of server processing power. Paging the answer is a technique that splits the answer onto several pages, following client requests. Despite the success of paging on traditional queries, little work has been done to support it in similarity queries. In this work, we present a technique that not only provides paging in similarity range or k-nearest neighbor queries, but also supports them in two variations: the forward similarity query and the backward similarity query. They return elements either increasingly farther of increasingly closer to the query reference. The reported experiments show that, depending on the proportion of the interesting part over the full answer, both techniques allow answering queries much faster than it is obtained in the non-paged way. (C) 2010 Elsevier Inc. All rights reserved.
Resumo:
Modern database applications are increasingly employing database management systems (DBMS) to store multimedia and other complex data. To adequately support the queries required to retrieve these kinds of data, the DBMS need to answer similarity queries. However, the standard structured query language (SQL) does not provide effective support for such queries. This paper proposes an extension to SQL that seamlessly integrates syntactical constructions to express similarity predicates to the existing SQL syntax and describes the implementation of a similarity retrieval engine that allows posing similarity queries using the language extension in a relational DBM. The engine allows the evaluation of every aspect of the proposed extension, including the data definition language and data manipulation language statements, and employs metric access methods to accelerate the queries. Copyright (c) 2008 John Wiley & Sons, Ltd.
Resumo:
In this paper, we present a novel approach to perform similarity queries over medical images, maintaining the semantics of a given query posted by the user. Content-based image retrieval systems relying on relevance feedback techniques usually request the users to label relevant/irrelevant images. Thus, we present a highly effective strategy to survey user profiles, taking advantage of such labeling to implicitly gather the user perceptual similarity. The profiles maintain the settings desired for each user, allowing tuning of the similarity assessment, which encompasses the dynamic change of the distance function employed through an interactive process. Experiments on medical images show that the method is effective and can improve the decision making process during analysis.
Resumo:
In this work, we take advantage of association rule mining to support two types of medical systems: the Content-based Image Retrieval (CBIR) systems and the Computer-Aided Diagnosis (CAD) systems. For content-based retrieval, association rules are employed to reduce the dimensionality of the feature vectors that represent the images and to improve the precision of the similarity queries. We refer to the association rule-based method to improve CBIR systems proposed here as Feature selection through Association Rules (FAR). To improve CAD systems, we propose the Image Diagnosis Enhancement through Association rules (IDEA) method. Association rules are employed to suggest a second opinion to the radiologist or a preliminary diagnosis of a new image. A second opinion automatically obtained can either accelerate the process of diagnosing or to strengthen a hypothesis, increasing the probability of a prescribed treatment be successful. Two new algorithms are proposed to support the IDEA method: to pre-process low-level features and to propose a preliminary diagnosis based on association rules. We performed several experiments to validate the proposed methods. The results indicate that association rules can be successfully applied to improve CBIR and CAD systems, empowering the arsenal of techniques to support medical image analysis in medical systems. (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
Searching in a dataset for elements that are similar to a given query element is a core problem in applications that manage complex data, and has been aided by metric access methods (MAMs). A growing number of applications require indices that must be built faster and repeatedly, also providing faster response for similarity queries. The increase in the main memory capacity and its lowering costs also motivate using memory-based MAMs. In this paper. we propose the Onion-tree, a new and robust dynamic memory-based MAM that slices the metric space into disjoint subspaces to provide quick indexing of complex data. It introduces three major characteristics: (i) a partitioning method that controls the number of disjoint subspaces generated at each node; (ii) a replacement technique that can change the leaf node pivots in insertion operations; and (iii) range and k-NN extended query algorithms to support the new partitioning method, including a new visit order of the subspaces in k-NN queries. Performance tests with both real-world and synthetic datasets showed that the Onion-tree is very compact. Comparisons of the Onion-tree with the MM-tree and a memory-based version of the Slim-tree showed that the Onion-tree was always faster to build the index. The experiments also showed that the Onion-tree significantly improved range and k-NN query processing performance and was the most efficient MAM, followed by the MM-tree, which in turn outperformed the Slim-tree in almost all the tests. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
With hundreds of millions of users reporting locations and embracing mobile technologies, Location Based Services (LBSs) are raising new challenges. In this dissertation, we address three emerging problems in location services, where geolocation data plays a central role. First, to handle the unprecedented growth of generated geolocation data, existing location services rely on geospatial database systems. However, their inability to leverage combined geographical and textual information in analytical queries (e.g. spatial similarity joins) remains an open problem. To address this, we introduce SpsJoin, a framework for computing spatial set-similarity joins. SpsJoin handles combined similarity queries that involve textual and spatial constraints simultaneously. LBSs use this system to tackle different types of problems, such as deduplication, geolocation enhancement and record linkage. We define the spatial set-similarity join problem in a general case and propose an algorithm for its efficient computation. Our solution utilizes parallel computing with MapReduce to handle scalability issues in large geospatial databases. Second, applications that use geolocation data are seldom concerned with ensuring the privacy of participating users. To motivate participation and address privacy concerns, we propose iSafe, a privacy preserving algorithm for computing safety snapshots of co-located mobile devices as well as geosocial network users. iSafe combines geolocation data extracted from crime datasets and geosocial networks such as Yelp. In order to enhance iSafe's ability to compute safety recommendations, even when crime information is incomplete or sparse, we need to identify relationships between Yelp venues and crime indices at their locations. To achieve this, we use SpsJoin on two datasets (Yelp venues and geolocated businesses) to find venues that have not been reviewed and to further compute the crime indices of their locations. Our results show a statistically significant dependence between location crime indices and Yelp features. Third, review centered LBSs (e.g., Yelp) are increasingly becoming targets of malicious campaigns that aim to bias the public image of represented businesses. Although Yelp actively attempts to detect and filter fraudulent reviews, our experiments showed that Yelp is still vulnerable. Fraudulent LBS information also impacts the ability of iSafe to provide correct safety values. We take steps toward addressing this problem by proposing SpiDeR, an algorithm that takes advantage of the richness of information available in Yelp to detect abnormal review patterns. We propose a fake venue detection solution that applies SpsJoin on Yelp and U.S. housing datasets. We validate the proposed solutions using ground truth data extracted by our experiments and reviews filtered by Yelp.
Resumo:
With hundreds of millions of users reporting locations and embracing mobile technologies, Location Based Services (LBSs) are raising new challenges. In this dissertation, we address three emerging problems in location services, where geolocation data plays a central role. First, to handle the unprecedented growth of generated geolocation data, existing location services rely on geospatial database systems. However, their inability to leverage combined geographical and textual information in analytical queries (e.g. spatial similarity joins) remains an open problem. To address this, we introduce SpsJoin, a framework for computing spatial set-similarity joins. SpsJoin handles combined similarity queries that involve textual and spatial constraints simultaneously. LBSs use this system to tackle different types of problems, such as deduplication, geolocation enhancement and record linkage. We define the spatial set-similarity join problem in a general case and propose an algorithm for its efficient computation. Our solution utilizes parallel computing with MapReduce to handle scalability issues in large geospatial databases. Second, applications that use geolocation data are seldom concerned with ensuring the privacy of participating users. To motivate participation and address privacy concerns, we propose iSafe, a privacy preserving algorithm for computing safety snapshots of co-located mobile devices as well as geosocial network users. iSafe combines geolocation data extracted from crime datasets and geosocial networks such as Yelp. In order to enhance iSafe's ability to compute safety recommendations, even when crime information is incomplete or sparse, we need to identify relationships between Yelp venues and crime indices at their locations. To achieve this, we use SpsJoin on two datasets (Yelp venues and geolocated businesses) to find venues that have not been reviewed and to further compute the crime indices of their locations. Our results show a statistically significant dependence between location crime indices and Yelp features. Third, review centered LBSs (e.g., Yelp) are increasingly becoming targets of malicious campaigns that aim to bias the public image of represented businesses. Although Yelp actively attempts to detect and filter fraudulent reviews, our experiments showed that Yelp is still vulnerable. Fraudulent LBS information also impacts the ability of iSafe to provide correct safety values. We take steps toward addressing this problem by proposing SpiDeR, an algorithm that takes advantage of the richness of information available in Yelp to detect abnormal review patterns. We propose a fake venue detection solution that applies SpsJoin on Yelp and U.S. housing datasets. We validate the proposed solutions using ground truth data extracted by our experiments and reviews filtered by Yelp.
Resumo:
In April 2009, Google Images added a filter for narrowing search results by colour. Several other systems for searching image databases by colour were also released around this time. These colour-based image retrieval systems enable users to search image databases either by selecting colours from a graphical palette (i.e., query-by-colour), by drawing a representation of the colour layout sought (i.e., query-by-sketch), or both. It was comments left by readers of online articles describing these colour-based image retrieval systems that provided us with the inspiration for this research. We were surprised to learn that the underlying query-based technology used in colour-based image retrieval systems today remains remarkably similar to that of systems developed nearly two decades ago. Discovering this ageing retrieval approach, as well as uncovering a large user demographic requiring image search by colour, made us eager to research more effective approaches for colour-based image retrieval. In this thesis, we detail two user studies designed to compare the effectiveness of systems adopting similarity-based visualisations, query-based approaches, or a combination of both, for colour-based image retrieval. In contrast to query-based approaches, similarity-based visualisations display and arrange database images so that images with similar content are located closer together on screen than images with dissimilar content. This removes the need for queries, as users can instead visually explore the database using interactive navigation tools to retrieve images from the database. As we found existing evaluation approaches to be unreliable, we describe how we assessed and compared systems adopting similarity-based visualisations, query-based approaches, or both, meaningfully and systematically using our Mosaic Test - a user-based evaluation approach in which evaluation study participants complete an image mosaic of a predetermined target image using the colour-based image retrieval system under evaluation.
Resumo:
Background: Microarray techniques have become an important tool to the investigation of genetic relationships and the assignment of different phenotypes. Since microarrays are still very expensive, most of the experiments are performed with small samples. This paper introduces a method to quantify dependency between data series composed of few sample points. The method is used to construct gene co-expression subnetworks of highly significant edges. Results: The results shown here are for an adapted subset of a Saccharomyces cerevisiae gene expression data set with low temporal resolution and poor statistics. The method reveals common transcription factors with a high confidence level and allows the construction of subnetworks with high biological relevance that reveals characteristic features of the processes driving the organism adaptations to specific environmental conditions. Conclusion: Our method allows a reliable and sophisticated analysis of microarray data even under severe constraints. The utilization of systems biology improves the biologists ability to elucidate the mechanisms underlying celular processes and to formulate new hypotheses.
Resumo:
The dynamics of a dissipative vibro-impact system called impact-pair is investigated. This system is similar to Fermi-Ulam accelerator model and consists of an oscillating one-dimensional box containing a point mass moving freely between successive inelastic collisions with the rigid walls of the box. In our numerical simulations, we observed multistable regimes, for which the corresponding basins of attraction present a quite complicated structure with smooth boundary. In addition, we characterize the system in a two-dimensional parameter space by using the largest Lyapunov exponents, identifying self-similar periodic sets. Copyright (C) 2009 Silvio L.T. de Souza et al.
Resumo:
We define a new type of self-similarity for one-parameter families of stochastic processes, which applies to certain important families of processes that are not self-similar in the conventional sense. This includes Hougaard Levy processes such as the Poisson processes, Brownian motions with drift and the inverse Gaussian processes, and some new fractional Hougaard motions defined as moving averages of Hougaard Levy process. Such families have many properties in common with ordinary self-similar processes, including the form of their covariance functions, and the fact that they appear as limits in a Lamperti-type limit theorem for families of stochastic processes.
Resumo:
The ability to discriminate nestmates from non-nestmates is critical to the maintenance of the integrity of social insect colonies. Guard workers compare the chemical cues of an incoming individual with their internal template to determine whether the entrant belongs to their colony. In contrast to honeybees, Apis mellifera, stingless bees have singly mated queens and, therefore, are expected to have a higher chemical homogeneity in their colonies. We tested whether aggressive behaviour of Frieseomelitta varia guards towards nestmate and non-nestmate foragers reflects chemical similarities and dissimilarities, respectively, of cuticular hydrocarbon profiles. We also introduced individuals of Lestrimelitta limao, an obligatory robber species, to test the ability of guards to react effectively to intruders from other taxa. We verified that foraging nestmates were almost invariably accepted, while heterospecific and conspecific non-nestmates were rejected at relatively high rates. However, non-nestmate individuals with higher chemical profile similarity were likely to be accepted by guards. We conclude that guards compare the chemical cuticular blend of incoming individuals and make acceptance decisions according to the similarity of the compounds between the colonies. (c) 2007 The Association for the Study of Animal Behaviour. Published by Elsevier Ltd. All rights reserved.
Resumo:
Skimming flows on stepped spillways are characterised by a significant rate of turbulent dissipation on the chute. Herein an advanced signal processing of traditional conductivity probe signals is developed to provide further details on the turbulent time and length scales. The technique is applied to a 22° stepped chute operating with flow Reynolds numbers between 3.8 and 7.1 E+5. The new correlation analyses yielded a characterisation of large eddies advecting the bubbles. The turbulent length scales were related to the characteristic depth Y90. Some self-similar relationships were observed systematically at both macroscopic and microscopic levels. These included the distributions of void fraction, bubble count rate, interfacial velocity and turbulence level, and turbulence time and length scales. The self-similarity results were significant because they provided a picture general enough to be used to characterise the air-water flow field in prototype spillways.
Resumo:
In an open channel, the transition from super- to sub-critical flow is a flow singularity (the hydraulic jump) characterised by a sharp rise in free-surface elevation, strong turbulence and air entrainment in the roller. A key feature of the hydraulic jump flow is the strong free-surface aeration and air-water flow turbulence. In the present study, similar experiments were conducted with identical inflow Froude numbers Fr1 using a geometric scaling ratio of 2:1. The results of the Froude-similar experiments showed some drastic scale effects in the smaller hydraulic jumps in terms of void fraction, bubble count rate and bubble chord time distributions. Void fraction distributions implied comparatively greater detrainment at low Reynolds numbers yielding some lesser aeration of the jump roller. The dimensionless bubble count rates were significantly lower in the smaller channel, especially in the mixing layer. The bubble chord time distributions were quantitatively close in both channels, and they were not scaled according to a Froude similitude. Simply the hydraulic jump remains a fascinating two-phase flow motion that is still poorly understood.
Resumo:
Two experiments examined the effects of interpersonal and group-based similarity on perceived self-other differences in persuasibility (i.e. on third-person effects, Davison, 1983). Results of Experiment 1 (N=121), based on experimentally-created groups, indicated that third-person perceptions with respect to the impact of televised product ads were accentuated when the comparison was made with interpersonally different others. Contrary to predictions, third-person perceptions were not affected by group-based similarity (i.e. ingroup or outgroup other). Results of Experiment 2 (N = 102), based an an enduring social identity, indicated that both interpersonal and group-based similarity moderated perceptions of the impact on self and other of least-liked product ads. Overall, third-person effects were more pronounced with respect to interpersonally dissimilar others. However, when social identity was salient, information about interpersonal similarity of the target did not affect perceived self-other differences with respect to ingroup targets. Results also highlighted significant differences in third-person perceptions according to the perceiver's affective evaluation of the persuasive message. (C) 1998 John Wiley & Sons, Ltd.