132 results for similarity queries
in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Abstract:
An important feature of a database management system (DBMS) is its client/server architecture, where managing shared memory among the clients and the server is always a tough issue. Similarity queries are especially sensitive to this kind of architecture, since the answer sizes vary widely. Usually, the answer to a similarity query is fully processed and sent in full to the user, who is often interested in just part of it, e.g. only the few elements closest to or farthest from the query reference. Compelling the DBMS to retrieve the full answer and then ignoring most of it is, at the least, a waste of server processing power. Paging is a technique that splits the answer into several pages, delivered as the client requests them. Despite the success of paging on traditional queries, little work has been done to support it in similarity queries. In this work, we present a technique that not only provides paging in similarity range or k-nearest neighbor queries, but also supports them in two variations: the forward similarity query and the backward similarity query. They return elements either increasingly farther from or increasingly closer to the query reference. The reported experiments show that, depending on the proportion of the interesting part over the full answer, both techniques answer queries much faster than the non-paged approach. (C) 2010 Elsevier Inc. All rights reserved.
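The forward and backward variations described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' technique: it assumes an in-memory list, a caller-supplied distance function, and a binary heap, and it still computes every distance up front (an index-based method would avoid that). The names `paged_similarity`, `page_size`, and `backward` are hypothetical. What it shows is only that ordering work can be deferred until a page is actually requested.

```python
import heapq

def paged_similarity(data, query, dist, page_size, backward=False):
    """Yield pages of (distance, element) pairs ordered by distance to query.

    Forward (default) returns increasingly farther elements;
    backward returns increasingly closer ones.
    """
    sign = -1.0 if backward else 1.0
    # The index i breaks ties, so the elements themselves are never compared.
    heap = [(sign * dist(query, x), i, x) for i, x in enumerate(data)]
    heapq.heapify(heap)  # O(n); the ordering cost is paid page by page
    while heap:
        page = []
        for _step in range(min(page_size, len(heap))):
            d, _i, x = heapq.heappop(heap)
            page.append((sign * d, x))
        yield page

# Toy one-dimensional data with the absolute difference as the metric.
points = [1.0, 4.0, 9.0, 2.5, 7.0]
pages = paged_similarity(points, 3.0, lambda a, b: abs(a - b), page_size=2)
first_page = next(pages)  # only the two nearest elements have been ordered
```

A client that stops after the first page never pays for ordering the rest of the answer, which is the saving the abstract attributes to paging.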
Abstract:
Modern database applications are increasingly employing database management systems (DBMS) to store multimedia and other complex data. To adequately support the queries required to retrieve these kinds of data, the DBMS needs to answer similarity queries. However, the standard structured query language (SQL) does not provide effective support for such queries. This paper proposes an extension to SQL that seamlessly integrates syntactical constructions for expressing similarity predicates into the existing SQL syntax, and describes the implementation of a similarity retrieval engine that allows posing similarity queries using the language extension in a relational DBMS. The engine allows the evaluation of every aspect of the proposed extension, including the data definition language and data manipulation language statements, and employs metric access methods to accelerate the queries. Copyright (c) 2008 John Wiley & Sons, Ltd.
Abstract:
In this work, we take advantage of association rule mining to support two types of medical systems: the Content-based Image Retrieval (CBIR) systems and the Computer-Aided Diagnosis (CAD) systems. For content-based retrieval, association rules are employed to reduce the dimensionality of the feature vectors that represent the images and to improve the precision of the similarity queries. We refer to the association rule-based method to improve CBIR systems proposed here as Feature selection through Association Rules (FAR). To improve CAD systems, we propose the Image Diagnosis Enhancement through Association rules (IDEA) method. Association rules are employed to suggest a second opinion to the radiologist or a preliminary diagnosis of a new image. A second opinion obtained automatically can either accelerate the process of diagnosing or strengthen a hypothesis, increasing the probability of a prescribed treatment being successful. Two new algorithms are proposed to support the IDEA method: one to pre-process low-level features and one to propose a preliminary diagnosis based on association rules. We performed several experiments to validate the proposed methods. The results indicate that association rules can be successfully applied to improve CBIR and CAD systems, strengthening the arsenal of techniques to support medical image analysis in medical systems. (C) 2009 Elsevier B.V. All rights reserved.
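For readers unfamiliar with association rules, the two standard quantities that rank a rule, support and confidence, can be sketched as below. This is a generic textbook illustration, not the FAR or IDEA algorithms, whose details the abstract does not give. The transactions, item names, and function names are made up for the example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0

# Hypothetical toy data: one discretized image feature plus a class label.
transactions = [
    frozenset({"texture=high", "class=A"}),
    frozenset({"texture=high", "class=B"}),
    frozenset({"texture=high", "class=B"}),
    frozenset({"texture=low", "class=A"}),
]
rule_conf = confidence(transactions, {"texture=high"}, {"class=B"})
```

Here the rule "texture=high implies class=B" holds in two of the three transactions that contain its antecedent.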
Abstract:
Searching in a dataset for elements that are similar to a given query element is a core problem in applications that manage complex data, and has been aided by metric access methods (MAMs). A growing number of applications require indices that must be built faster and repeatedly, also providing faster response for similarity queries. The increase in the main memory capacity and its lowering costs also motivate using memory-based MAMs. In this paper, we propose the Onion-tree, a new and robust dynamic memory-based MAM that slices the metric space into disjoint subspaces to provide quick indexing of complex data. It introduces three major characteristics: (i) a partitioning method that controls the number of disjoint subspaces generated at each node; (ii) a replacement technique that can change the leaf node pivots in insertion operations; and (iii) range and k-NN extended query algorithms to support the new partitioning method, including a new visit order of the subspaces in k-NN queries. Performance tests with both real-world and synthetic datasets showed that the Onion-tree is very compact. Comparisons of the Onion-tree with the MM-tree and a memory-based version of the Slim-tree showed that the Onion-tree was always faster to build the index. The experiments also showed that the Onion-tree significantly improved range and k-NN query processing performance and was the most efficient MAM, followed by the MM-tree, which in turn outperformed the Slim-tree in almost all the tests. (C) 2010 Elsevier B.V. All rights reserved.
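Metric access methods rely on the triangle inequality to skip distance computations. The sketch below shows that general idea with a single pivot and a flat list. It is not the Onion-tree's partitioning or its query algorithms, and `range_query`, `pivot_dists`, and the toy data are assumptions made for illustration.

```python
def range_query(data, pivot, pivot_dists, query, radius, dist):
    """Return (hits, distance computations) for a pivot-pruned range query.

    pivot_dists[i] must hold the precomputed distance dist(pivot, data[i]).
    """
    d_qp = dist(query, pivot)
    hits, computed = [], 1  # one computation already spent on the pivot
    for x, d_px in zip(data, pivot_dists):
        # Triangle inequality: dist(query, x) >= |dist(query, pivot) - dist(pivot, x)|.
        if abs(d_qp - d_px) > radius:
            continue  # x cannot be within the radius, so skip the computation
        computed += 1
        if dist(query, x) <= radius:
            hits.append(x)
    return hits, computed

dist = lambda a, b: abs(a - b)
data = [0.0, 1.0, 5.0, 6.0, 10.0]
pivot = 0.0
pivot_dists = [dist(pivot, x) for x in data]  # computed once, at index build time
hits, n_dist = range_query(data, pivot, pivot_dists, 5.5, 1.0, dist)
```

In this toy case three of the five candidates are discarded without computing their distance to the query.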
Abstract:
Background: Microarray techniques have become an important tool for the investigation of genetic relationships and the assignment of different phenotypes. Since microarrays are still very expensive, most of the experiments are performed with small samples. This paper introduces a method to quantify dependency between data series composed of few sample points. The method is used to construct gene co-expression subnetworks of highly significant edges. Results: The results shown here are for an adapted subset of a Saccharomyces cerevisiae gene expression data set with low temporal resolution and poor statistics. The method reveals common transcription factors with a high confidence level and allows the construction of subnetworks with high biological relevance that reveal characteristic features of the processes driving the organism's adaptations to specific environmental conditions. Conclusion: Our method allows a reliable and sophisticated analysis of microarray data even under severe constraints. The utilization of systems biology improves the biologists' ability to elucidate the mechanisms underlying cellular processes and to formulate new hypotheses.
Abstract:
The dynamics of a dissipative vibro-impact system called impact-pair is investigated. This system is similar to the Fermi-Ulam accelerator model and consists of an oscillating one-dimensional box containing a point mass moving freely between successive inelastic collisions with the rigid walls of the box. In our numerical simulations, we observed multistable regimes, for which the corresponding basins of attraction present a quite complicated structure with smooth boundary. In addition, we characterize the system in a two-dimensional parameter space by using the largest Lyapunov exponents, identifying self-similar periodic sets. Copyright (C) 2009 Silvio L.T. de Souza et al.
Abstract:
We define a new type of self-similarity for one-parameter families of stochastic processes, which applies to certain important families of processes that are not self-similar in the conventional sense. This includes Hougaard Lévy processes such as the Poisson processes, Brownian motions with drift and the inverse Gaussian processes, and some new fractional Hougaard motions defined as moving averages of Hougaard Lévy processes. Such families have many properties in common with ordinary self-similar processes, including the form of their covariance functions, and the fact that they appear as limits in a Lamperti-type limit theorem for families of stochastic processes.
Abstract:
The ability to discriminate nestmates from non-nestmates is critical to the maintenance of the integrity of social insect colonies. Guard workers compare the chemical cues of an incoming individual with their internal template to determine whether the entrant belongs to their colony. In contrast to honeybees, Apis mellifera, stingless bees have singly mated queens and, therefore, are expected to have a higher chemical homogeneity in their colonies. We tested whether aggressive behaviour of Frieseomelitta varia guards towards nestmate and non-nestmate foragers reflects chemical similarities and dissimilarities, respectively, of cuticular hydrocarbon profiles. We also introduced individuals of Lestrimelitta limao, an obligatory robber species, to test the ability of guards to react effectively to intruders from other taxa. We verified that foraging nestmates were almost invariably accepted, while heterospecific and conspecific non-nestmates were rejected at relatively high rates. However, non-nestmate individuals with higher chemical profile similarity were likely to be accepted by guards. We conclude that guards compare the chemical cuticular blend of incoming individuals and make acceptance decisions according to the similarity of the compounds between the colonies. (c) 2007 The Association for the Study of Animal Behaviour. Published by Elsevier Ltd. All rights reserved.
Abstract:
We introduced a spectral clustering algorithm based on the bipartite graph model for the Manufacturing Cell Formation problem in [Oliveira S, Ribeiro JFF, Seok SC. A spectral clustering algorithm for manufacturing cell formation. Computers and Industrial Engineering. 2007 [submitted for publication]]. It constructs two similarity matrices: one for parts and one for machines. The algorithm executes a spectral clustering algorithm on each separately to find families of parts and cells of machines. The similarity measure in that approach used only limited information between parts and between machines. This paper reviews several well-known similarity measures which have been used for Group Technology. Computational clustering results are compared by various performance measures. (C) 2008 The Society of Manufacturing Engineers. Published by Elsevier Ltd. All rights reserved.
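The two similarity matrices mentioned in this abstract can be built from a machine-part incidence matrix. The sketch below uses the Jaccard coefficient, a measure commonly used in Group Technology. The abstract does not say which measures the paper reviews, so this choice, the toy incidence matrix, and the names `jaccard`, `machine_sim`, and `part_sim` are assumptions. The spectral clustering step itself is not shown.

```python
def jaccard(row_a, row_b):
    """Jaccard coefficient between two binary rows (shared / used by either)."""
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return both / either if either else 0.0

# Hypothetical incidence matrix: rows are machines, columns are parts,
# and a 1 means the part is processed on that machine.
incidence = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]

# Machine-by-machine similarity compares rows.
machine_sim = [[jaccard(r, s) for s in incidence] for r in incidence]

# Part-by-part similarity compares columns, i.e. rows of the transpose.
parts = list(zip(*incidence))
part_sim = [[jaccard(p, q) for q in parts] for p in parts]
```

Each matrix could then be passed to a clustering routine to group machines into cells and parts into families.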
Abstract:
A long-standing challenge of content-based image retrieval (CBIR) systems is the definition of a suitable distance function to measure the similarity between images in an application context which complies with the human perception of similarity. In this paper, we present a new family of distance functions, called attribute concurrence influence distances (AID), which serve to retrieve images by similarity. These distances address an important aspect of the psychophysical notion of similarity in comparisons of images: the effect of concurrent variations in the values of different image attributes. The AID functions allow for comparisons of feature vectors by choosing one of two parameterized expressions: one targeting weak attribute concurrence influence and the other for strong concurrence influence. This paper presents the mathematical definition and implementation of the AID family for a two-dimensional feature space and its extension to any dimension. The composition of the AID family with the L_p distance family is considered to propose a procedure to determine the best distance for a specific application. Experimental results involving several sets of medical images demonstrate that, taking as reference the perception of the specialist in the field (radiologist), the AID functions perform better than the general distance functions commonly used in CBIR.
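The general-purpose distance family that this abstract compares against can be sketched as follows. Only the standard Minkowski (L_p) definition is shown. The AID expressions are not given in the abstract, so they are not reproduced, and the function name `lp_distance` is an assumption.

```python
def lp_distance(u, v, p):
    """Minkowski distance of order p between two equal-length feature vectors.

    p = 1 gives the Manhattan distance, p = 2 the Euclidean distance,
    and p = float("inf") the Chebyshev (maximum) distance.
    """
    if p == float("inf"):
        return max(abs(a - b) for a, b in zip(u, v))
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

u, v = (0.0, 0.0), (3.0, 4.0)
distances = {p: lp_distance(u, v, p) for p in (1, 2, float("inf"))}
```

The same pair of vectors is 7, 5, or 4 units apart depending on p, which is why the choice of distance function can change which images a query retrieves.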
Abstract:
Analysis of floristic similarity relationships between plant communities can detect patterns of species occurrence and also explain conditioning factors. Searching for such patterns, floristic similarity relationships among Atlantic Forest sites situated at Ibiúna Plateau, São Paulo state, Brazil, were analyzed by multivariate techniques. Twenty-one forest fragments and six sites within a continuous Forest Reserve were included in the analyses. Floristic composition and structure of the tree community (minimum dbh 5 cm) were assessed using the point centered quarter method. Two methods were used for multivariate analysis: Detrended Correspondence Analysis (DCA) and Two-Way Indicator Species Analysis (TWINSPAN). Similarity relationships among the study areas were based on the successional stage of the community and also on spatial proximity. The more similar the successional stage of the communities, the higher the floristic similarity between them, especially if the communities are geographically close. A floristic gradient from north to south was observed, suggesting a transition between biomes, since northern indicator species are mostly heliophytes, occurring also in cerrado vegetation and seasonal semideciduous forest, while southern indicator species are mostly typical ombrophilous and climax species from typical dense evergreen Atlantic Forest.
Abstract:
Only a small fraction of spectra acquired in LC-MS/MS runs matches peptides from target proteins upon database searches. The remaining, operationally termed background, spectra originate from a variety of poorly controlled sources and affect the throughput and confidence of database searches. Here, we report an algorithm and its software implementation that rapidly removes background spectra, regardless of their precise origin. The method estimates the dissimilarity distance between screened MS/MS spectra and unannotated spectra from a partially redundant background library compiled from several control and blank runs. Filtering MS/MS queries enhanced the protein identification capacity when searches lacked spectrum to sequence matching specificity. In sequence-similarity searches it reduced by, on average, 30-fold the number of orphan hits, which were not explicitly related to background protein contaminants and required manual validation. Removing high quality background MS/MS spectra, while preserving in the data set the genuine spectra from target proteins, decreased the false positive rate of stringent database searches and improved the identification of low-abundance proteins.
Abstract:
Each square complex matrix is unitarily similar to an upper triangular matrix with diagonal entries in any prescribed order. Let $A = [a_{ij}]$ and $B = [b_{ij}]$ be upper triangular $n \times n$ matrices that are not similar to direct sums of square matrices of smaller sizes, or are in general position and have the same main diagonal. We prove that $A$ and $B$ are unitarily similar if and only if $\|h(A_k)\| = \|h(B_k)\|$ for all $h \in \mathbb{C}[x]$ and $k = 1, \dots, n$, where $A_k := [a_{ij}]_{i,j=1}^{k}$ and $B_k := [b_{ij}]_{i,j=1}^{k}$ are the leading principal $k \times k$ submatrices of $A$ and $B$, and $\|\cdot\|$ is the Frobenius norm. (C) 2011 Elsevier Inc. All rights reserved.
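One direction of the criterion in this abstract can be checked numerically in a special case. The sketch below conjugates an upper triangular matrix A by a diagonal unitary matrix D, so that B = D A D* is again upper triangular with the same diagonal. It then compares the Frobenius norms of h applied to the leading principal submatrices for a single polynomial h. The choice of D, the polynomial, and all function names are assumptions made for illustration. A finite check like this does not prove the theorem and says nothing about the converse.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_at(coeffs, X):
    """Evaluate h(X) by Horner's rule; coeffs are ordered highest degree first."""
    n = len(X)
    result = [[0j] * n for _row in range(n)]
    for c in coeffs:
        result = matmul(result, X)
        for i in range(n):
            result[i][i] += c  # add c times the identity
    return result

def frobenius(X):
    """Frobenius norm: square root of the sum of squared absolute entries."""
    return sum(abs(x) ** 2 for row in X for x in row) ** 0.5

def leading(X, k):
    """Leading principal k-by-k submatrix."""
    return [row[:k] for row in X[:k]]

A = [[1, 2, 3], [0, 4, 5], [0, 0, 6]]
d = [1, 1j, -1]  # diagonal of a unitary matrix D (each entry has modulus 1)
B = [[d[i] * A[i][j] * d[j].conjugate() for j in range(3)] for i in range(3)]

h = [1, -2, 3]  # the polynomial x^2 - 2x + 3
gaps = [abs(frobenius(poly_at(h, leading(A, k))) -
            frobenius(poly_at(h, leading(B, k)))) for k in (1, 2, 3)]
```

All three gaps are zero up to rounding, because conjugating by a diagonal unitary matrix carries each leading principal submatrix of A to the corresponding one of B.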
Abstract:
A square matrix is nonderogatory if its Jordan blocks have distinct eigenvalues. We give canonical forms for (1) nonderogatory complex matrices up to unitary similarity, and (2) pairs of complex matrices up to similarity, in which one matrix has distinct eigenvalues. The types of these canonical forms are given by undirected and, respectively, directed graphs with no undirected cycles. (C) 2011 Elsevier Inc. All rights reserved.
2D QSAR and similarity studies on cruzain inhibitors aimed at improving selectivity over cathepsin L
Abstract:
Hologram quantitative structure-activity relationships (HQSAR) were applied to a data set of 41 cruzain inhibitors. The best HQSAR model (Q^2 = 0.77; R^2 = 0.90), employing Surflex-Sim as the training and test set generator, was obtained using atoms, bonds, and connections as fragment distinctions and 4-7 as fragment size. This model was then used to predict the potencies of 12 test set compounds, giving a satisfactory predictive R^2 value of 0.88. The contribution maps obtained from the best HQSAR model are in agreement with the biological activities of the study compounds. The Trypanosoma cruzi cruzain shares high similarity with the mammalian homolog cathepsin L. The selectivity toward cruzain was checked against a database of 123 compounds, comprising the 41 cruzain inhibitors used in the HQSAR model development plus 82 cathepsin L inhibitors. We screened these compounds by ROCS (Rapid Overlay of Chemical Structures), a Gaussian-shape volume overlap filter that can rapidly identify shapes that match the query molecule. Remarkably, the first 37 hits ranked by ROCS were all cruzain inhibitors. In addition, the area under the curve (AUC) obtained with ROCS was 0.96, indicating that the method was very efficient at distinguishing between cruzain and cathepsin L inhibitors. (c) 2007 Elsevier Ltd. All rights reserved.
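The AUC figure reported in this abstract summarizes how well a ranking separates two classes. A minimal sketch of that computation, using the pairwise (Mann-Whitney) formulation, is given below. The scores and labels are hypothetical and are not ROCS output, and the function `roc_auc` is an illustrative helper that does not call ROCS.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive item is scored
    above a randomly chosen negative item; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical similarity scores; label 1 marks the class of interest.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 1, 0, 1, 0]
auc = roc_auc(scores, labels)
```

An AUC of 1.0 means every positive outranks every negative, and 0.5 corresponds to a random ranking.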