882 resultados para query rewriting
Resumo:
We present external memory data structures for efficiently answering range-aggregate queries. The range-aggregate problem is defined as follows: Given a set of weighted points in R-d, compute the aggregate of the weights of the points that lie inside a d-dimensional orthogonal query rectangle. The aggregates we consider in this paper include COUNT, sum, and MAX. First, we develop a structure for answering two-dimensional range-COUNT queries that uses O(N/B) disk blocks and answers a query in O(log(B) N) I/Os, where N is the number of input points and B is the disk block size. The structure can be extended to obtain a near-linear-size structure for answering range-sum queries using O(log(B) N) I/Os, and a linear-size structure for answering range-MAX queries in O(log(B)(2) N) I/Os. Our structures can be made dynamic and extended to higher dimensions. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
The problem of identifying user intent has received considerable attention in recent years, particularly in the context of improving the search experience via query contextualization. Intent can be characterized by multiple dimensions, which are often not observed from query words alone. Accurate identification of Intent from query words remains a challenging problem primarily because it is extremely difficult to discover these dimensions. The problem is often significantly compounded due to lack of representative training sample. We present a generic, extensible framework for learning the multi-dimensional representation of user intent from the query words. The approach models the latent relationships between facets using tree structured distribution which leads to an efficient and convergent algorithm, FastQ, for identifying the multi-faceted intent of users based on just the query words. We also incorporated WordNet to extend the system capabilities to queries which contain words that do not appear in the training data. Empirical results show that FastQ yields accurate identification of intent when compared to a gold standard.
Resumo:
This paper considers the problem of identifying the footprints of communication of multiple transmitters in a given geographical area. To do this, a number of sensors are deployed at arbitrary but known locations in the area, and their individual decisions regarding the presence or absence of the transmitters' signal are combined at a fusion center to reconstruct the spatial spectral usage map. One straightforward scheme to construct this map is to query each of the sensors and cluster the sensors that detect the primary's signal. However, using the fact that a typical transmitter footprint map is a sparse image, two novel compressive sensing based schemes are proposed, which require significantly fewer number of transmissions compared to the querying scheme. A key feature of the proposed schemes is that the measurement matrix is constructed from a pseudo-random binary phase shift applied to the decision of each sensor prior to transmission. The measurement matrix is thus a binary ensemble which satisfies the restricted isometry property. The number of measurements needed for accurate footprint reconstruction is determined using compressive sampling theory. The three schemes are compared through simulations in terms of a performance measure that quantifies the accuracy of the reconstructed spatial spectral usage map. It is found that the proposed sparse reconstruction technique-based schemes significantly outperform the round-robin scheme.
Resumo:
Background: Development of sensitive sequence search procedures for the detection of distant relationships between proteins at superfamily/fold level is still a big challenge. The intermediate sequence search approach is the most frequently employed manner of identifying remote homologues effectively. In this study, examination of serine proteases of prolyl oligopeptidase, rhomboid and subtilisin protein families were carried out using plant serine proteases as queries from two genomes including A. thaliana and O. sativa and 13 other families of unrelated folds to identify the distant homologues which could not be obtained using PSI-BLAST. Methodology/Principal Findings: We have proposed to start with multiple queries of classical serine protease members to identify remote homologues in families, using a rigorous approach like Cascade PSI-BLAST. We found that classical sequence based approaches, like PSI-BLAST, showed very low sequence coverage in identifying plant serine proteases. The algorithm was applied on enriched sequence database of homologous domains and we obtained overall average coverage of 88% at family, 77% at superfamily or fold level along with specificity of similar to 100% and Mathew's correlation coefficient of 0.91. Similar approach was also implemented on 13 other protein families representing every structural class in SCOP database. Further investigation with statistical tests, like jackknifing, helped us to better understand the influence of neighbouring protein families. Conclusions/Significance: Our study suggests that employment of multiple queries of a family for the Cascade PSI-BLAST searches is useful for predicting distant relationships effectively even at superfamily level. We have proposed a generalized strategy to cover all the distant members of a particular family using multiple query sequences. Our findings reveal that prior selection of sequences as query and the presence of neighbouring families can be important for covering the search space effectively in minimal computational time. This study also provides an understanding of the `bridging' role of related families.
Resumo:
Identifying symmetry in scalar fields is a recent area of research in scientific visualization and computer graphics communities. Symmetry detection techniques based on abstract representations of the scalar field use only limited geometric information in their analysis. Hence they may not be suited for applications that study the geometric properties of the regions in the domain. On the other hand, methods that accumulate local evidence of symmetry through a voting procedure have been successfully used for detecting geometric symmetry in shapes. We extend such a technique to scalar fields and use it to detect geometrically symmetric regions in synthetic as well as real-world datasets. Identifying symmetry in the scalar field can significantly improve visualization and interactive exploration of the data. We demonstrate different applications of the symmetry detection method to scientific visualization: query-based exploration of scalar fields, linked selection in symmetric regions for interactive visualization, and classification of geometrically symmetric regions and its application to anomaly detection.
Resumo:
In the process of service provisioning, providing required service to the user without user intervention, with reduction of the cognitive over loading is a real challenge. In this paper we propose a user centred context aware collaborative service provisioning system, which make use of context along with collaboration to provide the required service to the user dynamically. The system uses a novel approach of query expansion along with interactive and rating matrix based collaboration. Performance of the system is evaluated in Mobile-Commerce environment. The results show that the system is time efficient and perform with better precision and recall in comparison with context aware system.
Resumo:
Approximate Nearest Neighbour Field maps are commonly used by computer vision and graphics community to deal with problems like image completion, retargetting, denoising, etc. In this paper, we extend the scope of usage of ANNF maps to medical image analysis, more specifically to optic disk detection in retinal images. In the analysis of retinal images, optic disk detection plays an important role since it simplifies the segmentation of optic disk and other retinal structures. The proposed approach uses FeatureMatch, an ANNF algorithm, to find the correspondence between a chosen optic disk reference image and any given query image. This correspondence provides a distribution of patches in the query image that are closest to patches in the reference image. The likelihood map obtained from the distribution of patches in query image is used for optic disk detection. The proposed approach is evaluated on five publicly available DIARETDB0, DIARETDB1, DRIVE, STARE and MESSIDOR databases, with total of 1540 images. We show, experimentally, that our proposed approach achieves an average detection accuracy of 99% and an average computation time of 0.2 s per image. (C) 2013 Elsevier Ltd. All rights reserved.
Resumo:
The complexity in visualizing volumetric data often limits the scope of direct exploration of scalar fields. Isocontour extraction is a popular method for exploring scalar fields because of its simplicity in presenting features in the data. In this paper, we present a novel representation of contours with the aim of studying the similarity relationship between the contours. The representation maps contours to points in a high-dimensional transformation-invariant descriptor space. We leverage the power of this representation to design a clustering based algorithm for detecting symmetric regions in a scalar field. Symmetry detection is a challenging problem because it demands both segmentation of the data and identification of transformation invariant segments. While the former task can be addressed using topological analysis of scalar fields, the latter requires geometry based solutions. Our approach combines the two by utilizing the contour tree for segmenting the data and the descriptor space for determining transformation invariance. We discuss two applications, query driven exploration and asymmetry visualization, that demonstrate the effectiveness of the approach.
Resumo:
The structural annotation of proteins with no detectable homologs of known 3D structure identified using sequence-search methods is a major challenge today. We propose an original method that computes the conditional probabilities for the amino-acid sequence of a protein to fit to known protein 3D structures using a structural alphabet, known as Protein Blocks (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. It is used to encode 3D protein structures into 1D PB sequences and to capture sequence to structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences to protein folds encoded into PB sequences. It does not use any information from residue contacts or sequence-search methods or explicit incorporation of hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z-score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivity of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in PDB: 38 compatible templates were identified by our approach and 66% of these hits yielded correctly predicted structures. This method scales-up well and offers promising perspectives for structural annotations at genomic level. It has been implemented in the form of a web-server that is freely available at http://www.bo-protscience.fr/forsa.
Resumo:
Given a Boolean function , we say a triple (x, y, x + y) is a triangle in f if . A triangle-free function contains no triangle. If f differs from every triangle-free function on at least points, then f is said to be -far from triangle-free. In this work, we analyze the query complexity of testers that, with constant probability, distinguish triangle-free functions from those -far from triangle-free. Let the canonical tester for triangle-freeness denotes the algorithm that repeatedly picks x and y uniformly and independently at random from , queries f(x), f(y) and f(x + y), and checks whether f(x) = f(y) = f(x + y) = 1. Green showed that the canonical tester rejects functions -far from triangle-free with constant probability if its query complexity is a tower of 2's whose height is polynomial in . Fox later improved the height of the tower in Green's upper bound to . A trivial lower bound of on the query complexity is immediate. In this paper, we give the first non-trivial lower bound for the number of queries needed. We show that, for every small enough , there exists an integer such that for all there exists a function depending on all n variables which is -far from being triangle-free and requires queries for the canonical tester. We also show that the query complexity of any general (possibly adaptive) one-sided tester for triangle-freeness is at least square root of the query complexity of the corresponding canonical tester. Consequently, this means that any one-sided tester for triangle-freeness must make at least queries.
Resumo:
Multicast in wireless sensor networks (WSNs) is an efficient way to spread the same data to multiple sensor nodes. It becomes more effective due to the broadcast nature of wireless link, where a message transmitted from one source is inherently received by all one-hop receivers, and therefore, there is no need to transmit the message one by one. Reliable multicast in WSNs is desirable for critical tasks like code updation and query based data collection. The erroneous nature of wireless medium coupled with limited resource of sensor nodes, makes the design of reliable multicast protocol a challenging task. In this work, we propose a time division multiple access (TDMA) based energy aware media access and control (TEA-MAC) protocol for reliable multicast in WSNs. The TDMA eliminates collisions, overhearing and idle listening, which are the main sources of reliability degradation and energy consumption. Furthermore, the proposed protocol is parametric in the sense that it can be used to trade-off reliability with energy and delay as per the requirement of the underlying applications. The performance of TEA-MAC has been evaluated by simulating it using Castalia network simulator. Simulation results show that TEA-MAC is able to considerably improve the performance of multicast communication in WSNs.
Resumo:
The problem of scaling up data integration, such that new sources can be quickly utilized as they are discovered, remains elusive: Global schemas for integrated data are difficult to develop and expand, and schema and record matching techniques are limited by the fact that data and metadata are often under-specified and must be disambiguated by data experts. One promising approach is to avoid using a global schema, and instead to develop keyword search-based data integration-where the system lazily discovers associations enabling it to join together matches to keywords, and return ranked results. The user is expected to understand the data domain and provide feedback about answers' quality. The system generalizes such feedback to learn how to correctly integrate data. A major open challenge is that under this model, the user only sees and offers feedback on a few ``top-'' results: This result set must be carefully selected to include answers of high relevance and answers that are highly informative when feedback is given on them. Existing systems merely focus on predicting relevance, by composing the scores of various schema and record matching algorithms. In this paper, we show how to predict the uncertainty associated with a query result's score, as well as how informative feedback is on a given result. We build upon these foundations to develop an active learning approach to keyword search-based data integration, and we validate the effectiveness of our solution over real data from several very different domains.
Resumo:
Integran este número de la revista ponencias presentadas en Studia Hispanica Medievalia VIII : Actas de las X Jornadas Internacionales de Literatura Española Medieval, 2011, y de Homenaje al Quinto Centenario del Cancionero General de Hernando del Castillo.
Resumo:
Integran este número de la revista ponencias presentadas en Studia Hispanica Medievalia VIII: Actas de las IX Jornadas Internacionales de Literatura Española Medieval, 2008, y de Homenaje al Quinto Centenario de Amadis de Gaula
Resumo:
Resumen: En el dominio de la educación existe gran cantidad y diversidad de material multimedial que puede ser utilizado en la enseñanza y que constituye una importante contribución al proceso enseñanzaaprendizaje. Mucho de este material es accesible a través de diferentes repositorios de objetos de aprendizaje, donde cada objeto tiene metadatos descriptivos. Estos metadatos permiten recuperar aquellos objetos que satisfagan no sólo el tema de la consulta, sino también el perfil de usuario, teniendo en cuenta sus características y preferencias. En este trabajo se presenta una propuesta de un sistema recomendador de objetos de aprendizaje que ayuda a un usuario a encontrar