139 results for query
Abstract:
Estimates of predicate selectivities by database query optimizers often differ significantly from those actually encountered during query execution, leading to poor plan choices and inflated response times. In this paper, we investigate mitigating this problem by replacing selectivity error-sensitive plan choices with alternative plans that provide robust performance. Our approach is based on the recent observation that even the complex and dense "plan diagrams" associated with industrial-strength optimizers can be efficiently reduced to "anorexic" equivalents featuring only a few plans, without materially impacting query processing quality. Extensive experimentation with a rich set of TPC-H and TPC-DS-based query templates in a variety of database environments indicates that plan diagram reduction typically retains plans that are substantially resistant to selectivity errors on the base relations. However, it can sometimes also be severely counter-productive, with the replacements performing much worse. We address this problem through a generalized mathematical characterization of plan cost behavior over the parameter space, which lends itself to efficient criteria for deciding when it is safe to reduce. Our strategies are fully non-invasive and have been implemented in the Picasso optimizer visualization tool.
Abstract:
Fully structured and mature open source spatial and temporal analysis technology seems to be the way forward for natural resource planning, especially in developing nations. This technology has gained enormous momentum because of its technical superiority, affordability and ability to draw expertise from all sections of society. Sustainable development of a region depends on the integrated planning approaches adopted in decision making, which require timely and accurate spatial data. With the growth in developmental programmes, the need for appropriate decision support systems has increased in order to analyse and visualise the decisions associated with the spatial and temporal aspects of natural resources. In this regard, Geographic Information Systems (GIS) along with remote sensing data support applications that involve spatial and temporal analysis on digital thematic maps and remotely sensed images. Open source GIS would help in wide-scale applications involving decisions at various hierarchical levels (for example, from village panchayat to planning commission) on economic viability and social acceptance, apart from technical feasibility. GRASS (Geographic Resources Analysis Support System, http://wgbis.ces.iisc.ernet.in/grass) is an open source GIS that runs on the Linux platform (free of cost), but most of its functionality is driven through command-line arguments, necessitating a user-friendly and cost-effective graphical user interface (GUI). Keeping these aspects in mind, the Geographic Resources Decision Support System (GRDSS) has been developed with functionality such as raster, topological vector, image processing, statistical analysis, geographical analysis and graphics production. It operates through a GUI developed in Tcl/Tk (Tool Command Language / Toolkit) under Linux, as well as through a shell in X-Windows. GRDSS includes options such as import/export of different data formats, display, digital image processing, map editing, raster analysis, vector analysis, point analysis and spatial query, which are required for regional planning tasks such as watershed analysis and landscape analysis. It is customised to the Indian context with an option to extract individual bands from IRS (Indian Remote Sensing Satellites) data, which is in BIL (Band Interleaved by Lines) format. The integration of PostgreSQL (also free and open source) into GRDSS provides an efficient database management system.
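As a rough illustration of the BIL (Band Interleaved by Lines) layout mentioned above, a single band can be pulled out of a raw file once the image dimensions, band count and pixel type are known (for IRS scenes these come from the accompanying header). This is a minimal sketch, not GRDSS code; the file name, dimensions and data type below are assumed for the example.

```python
import numpy as np

def extract_band(bil_path, rows, cols, n_bands, band_index, dtype=np.uint8):
    """Extract one band from a BIL (Band Interleaved by Lines) file.

    In BIL layout every image row stores one scan line for band 0, then one
    for band 1, and so on, before moving to the next image row.  Dimensions,
    band count and pixel type must be known in advance (for IRS data they
    come from the accompanying header file).
    """
    raw = np.fromfile(bil_path, dtype=dtype, count=rows * n_bands * cols)
    raw = raw.reshape(rows, n_bands, cols)   # (row, band, column)
    return raw[:, band_index, :]             # single-band image, shape (rows, cols)

# Example (hypothetical scene): extract the third band of a 4-band image
# band = extract_band("scene.bil", rows=5000, cols=5000, n_bands=4, band_index=2)
```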
Abstract:
An information system with user-friendly GUIs (Graphical User Interfaces) has been developed to maintain the flora data and generate reports for the Sharavathi River Basin. The database consists of information related to trees, herbs, shrubs and climbers. The data are based on the primary field survey and the information available in the Flora of Shimoga, Karnataka, and the Hassan flora. User-friendly query options based on dichotomous keys are provided to help users retrieve the data, while data entry options aid in updating and editing the database at the family, genus and species levels.
Abstract:
A "plan diagram" is a pictorial enumeration of the execution plan choices of a database query optimizer over the relational selectivity space. We have shown recently that, for industrial-strength database engines, these diagrams are often remarkably complex and dense, with a large number of plans covering the space. However, they can often be reduced to much simpler pictures, featuring significantly fewer plans, without materially affecting the query processing quality. Plan reduction has useful implications for the design and usage of query optimizers, including quantifying redundancy in the plan search space, enhancing useability of parametric query optimization, identifying error-resistant and least-expected-cost plans, and minimizing the overheads of multi-plan approaches. We investigate here the plan reduction issue from theoretical, statistical and empirical perspectives. Our analysis shows that optimal plan reduction, w.r.t. minimizing the number of plans, is an NP-hard problem in general, and remains so even for a storage-constrained variant. We then present a greedy reduction algorithm with tight and optimal performance guarantees, whose complexity scales linearly with the number of plans in the diagram for a given resolution. Next, we devise fast estimators for locating the best tradeoff between the reduction in plan cardinality and the impact on query processing quality. Finally, extensive experimentation with a suite of multi-dimensional TPCH-based query templates on industrial-strength optimizers demonstrates that complex plan diagrams easily reduce to "anorexic" (small absolute number of plans) levels incurring only marginal increases in the estimated query processing costs.
Abstract:
Ulam's problem is a two-person game in which one of the players tries to determine, in the minimum number of queries, a number thought of by the other player. Classically, the problem scales polynomially with the size of the number. The quantum version of Ulam's problem has a query complexity that is independent of the dimension of the search space. The experimental implementation of the quantum Ulam's problem in a Nuclear Magnetic Resonance information processor with 3 quantum bits is reported here.
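For context, the classical version of the guessing game can be won with comparison queries via binary search, so the number of queries grows with the number of bits of the hidden number. The sketch below illustrates only this classical strategy, not the quantum protocol reported in the paper.

```python
def guess_number(answer_is_at_most, lo, hi):
    """Classical query strategy for the guessing game.

    answer_is_at_most(m) is the oracle: True iff the hidden number is <= m.
    Roughly log2(hi - lo + 1) queries suffice, i.e. the query count is
    polynomial in the number of bits of the hidden number.
    """
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if answer_is_at_most(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo, queries

# secret = 42
# found, q = guess_number(lambda m: secret <= m, 0, 1023)   # found == 42, q == 10
```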
Abstract:
Fragment Finder 2.0 is a web-based interactive computing server which can be used to retrieve structurally similar protein fragments from 25 and 90% nonredundant data sets. The computing server identifies structurally similar fragments using the protein backbone C alpha angles. In addition, the identified fragments can be superimposed using either of the two structural superposition programs, STAMP and PROFIT, provided in the server. The freely available Java plug-in Jmol has been interfaced with the server for the visualization of the query and superposed fragments. The server is the updated version of a previously developed search engine and employs an in-house-developed fast pattern matching algorithm. This server can be accessed freely over the World Wide Web through the URL http://cluster.physics.iisc.ernet.in/ff/.
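A brute-force sketch of matching by backbone geometry: represent each fragment by the pseudo bond angles at its interior C-alpha atoms and report database fragments whose angle profile stays within a tolerance of the query's. The server's in-house pattern matching algorithm and its exact angle definition may differ; this is only an illustration, with the tolerance value assumed.

```python
import numpy as np

def ca_angles(ca_coords):
    """Pseudo bond angles (degrees) at each interior C-alpha of an (N, 3) array."""
    a, b, c = ca_coords[:-2], ca_coords[1:-1], ca_coords[2:]
    u, v = a - b, c - b
    cosang = np.einsum('ij,ij->i', u, v) / (
        np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def find_similar_fragments(query_ca, database, tol=10.0):
    """Return ids of database fragments whose C-alpha angle profile stays
    within `tol` degrees of the query's profile at every position."""
    q = ca_angles(query_ca)
    hits = []
    for name, frag_ca in database:          # database: list of (id, (N, 3) arrays)
        if len(frag_ca) == len(query_ca):
            if np.max(np.abs(ca_angles(frag_ca) - q)) <= tol:
                hits.append(name)
    return hits
```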
Abstract:
A computational pipeline, PocketAnnotate, for functional annotation of proteins at the level of binding sites has been proposed in this study. The pipeline integrates three in-house algorithms for site-based function annotation: PocketDepth, for prediction of binding sites in protein structures; PocketMatch, for rapid comparison of binding sites; and PocketAlign, to obtain a detailed alignment between a pair of binding sites. A novel scheme has been developed to rapidly generate a database of non-redundant binding sites. For a given input protein structure, putative ligand-binding sites are identified, matched in real time against the database, and the query substructure is aligned with the promising hits to obtain a set of possible ligands that the given protein could bind. The input can be either whole protein structures or merely the substructures corresponding to possible binding sites. Structure-based function annotation at the level of binding sites thus achieved could prove very useful for cases where no obvious functional inference can be obtained purely from sequence- or fold-level analyses. An attempt has also been made to analyse proteins of no known function from the Protein Data Bank. PocketAnnotate would be a valuable tool for the scientific community and contribute towards structure-based functional inference. The web server can be freely accessed at http://proline.biochem.iisc.ernet.in/pocketannotate/.
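The pipeline's flow can be sketched as below. The three stage functions are placeholders standing in for PocketDepth-, PocketMatch- and PocketAlign-like steps; their real interfaces, scores and cut-offs are not described in the abstract and are assumed here.

```python
def annotate_structure(structure, site_db,
                       predict_sites, match_sites, align_sites,
                       match_cutoff=0.5):
    """Sketch of a site-based annotation pipeline in the spirit of PocketAnnotate.

    predict_sites, match_sites and align_sites are hypothetical callables for
    the three stages; site_db is an iterable of entries with .site and .ligand.
    """
    annotations = []
    for site in predict_sites(structure):              # putative binding sites
        for entry in site_db:                           # non-redundant site database
            score = match_sites(site, entry.site)       # rapid site comparison
            if score >= match_cutoff:
                alignment = align_sites(site, entry.site)   # detailed alignment
                annotations.append((entry.ligand, score, alignment))
    return sorted(annotations, key=lambda t: -t[1])     # best-scoring ligands first
```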
Abstract:
We present external memory data structures for efficiently answering range-aggregate queries. The range-aggregate problem is defined as follows: given a set of weighted points in R^d, compute the aggregate of the weights of the points that lie inside a d-dimensional orthogonal query rectangle. The aggregates we consider in this paper include COUNT, SUM, and MAX. First, we develop a structure for answering two-dimensional range-COUNT queries that uses O(N/B) disk blocks and answers a query in O(log_B N) I/Os, where N is the number of input points and B is the disk block size. The structure can be extended to obtain a near-linear-size structure for answering range-SUM queries using O(log_B N) I/Os, and a linear-size structure for answering range-MAX queries in O(log_B^2 N) I/Os. Our structures can be made dynamic and extended to higher dimensions. (C) 2012 Elsevier B.V. All rights reserved.
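For readers unfamiliar with range-aggregate queries, a naive in-memory baseline for two-dimensional range-COUNT is sketched below. It is emphatically not the I/O-efficient external-memory structure from the paper; it only shows what the query computes.

```python
from bisect import bisect_left, bisect_right

class RangeCount2D:
    """Naive in-memory 2D range-COUNT baseline: points are kept sorted by x,
    and a query scans only the points whose x lies in the query rectangle."""

    def __init__(self, points):                 # points: iterable of (x, y)
        self.pts = sorted(points)
        self.xs = [x for x, _ in self.pts]

    def count(self, x1, x2, y1, y2):
        """Number of points inside the closed rectangle [x1, x2] x [y1, y2]."""
        lo = bisect_left(self.xs, x1)
        hi = bisect_right(self.xs, x2)
        return sum(1 for _, y in self.pts[lo:hi] if y1 <= y <= y2)

# rc = RangeCount2D([(1, 2), (3, 4), (5, 1)])
# rc.count(0, 4, 0, 5)   # -> 2
```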
Abstract:
The problem of identifying user intent has received considerable attention in recent years, particularly in the context of improving the search experience via query contextualization. Intent can be characterized by multiple dimensions, which are often not observed from query words alone. Accurate identification of intent from query words remains a challenging problem, primarily because it is extremely difficult to discover these dimensions. The problem is often significantly compounded by the lack of representative training samples. We present a generic, extensible framework for learning the multi-dimensional representation of user intent from query words. The approach models the latent relationships between facets using a tree-structured distribution, which leads to an efficient and convergent algorithm, FastQ, for identifying the multi-faceted intent of users based on just the query words. We also incorporate WordNet to extend the system's capabilities to queries containing words that do not appear in the training data. Empirical results show that FastQ yields accurate identification of intent when compared to a gold standard.
Abstract:
This paper considers the problem of identifying the footprints of communication of multiple transmitters in a given geographical area. To do this, a number of sensors are deployed at arbitrary but known locations in the area, and their individual decisions regarding the presence or absence of the transmitters' signal are combined at a fusion center to reconstruct the spatial spectral usage map. One straightforward scheme for constructing this map is to query each of the sensors and cluster the sensors that detect the primary's signal. However, exploiting the fact that a typical transmitter footprint map is a sparse image, two novel compressive sensing based schemes are proposed, which require significantly fewer transmissions compared to the querying scheme. A key feature of the proposed schemes is that the measurement matrix is constructed from a pseudo-random binary phase shift applied to the decision of each sensor prior to transmission. The measurement matrix is thus a binary ensemble which satisfies the restricted isometry property. The number of measurements needed for accurate footprint reconstruction is determined using compressive sampling theory. The three schemes are compared through simulations in terms of a performance measure that quantifies the accuracy of the reconstructed spatial spectral usage map. It is found that the proposed sparse reconstruction technique-based schemes significantly outperform the round-robin querying scheme.
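A hedged sketch of the measurement model: each sensor multiplies its binary decision by a pseudo-random +/-1 phase before transmission, so M rounds give y = Phi x with a +/-1 Bernoulli matrix Phi. Recovery below uses a generic orthogonal matching pursuit purely for illustration; the reconstruction algorithm used in the paper may differ.

```python
import numpy as np

def compressive_measurements(decisions, num_measurements, seed=0):
    """y = Phi @ x, where x holds the 0/1 sensor decisions and each row of
    Phi holds the pseudo-random +/-1 phase shifts of one transmission round."""
    rng = np.random.default_rng(seed)
    phi = rng.choice([-1.0, 1.0], size=(num_measurements, decisions.size))
    return phi @ decisions, phi

def omp(phi, y, sparsity):
    """Generic orthogonal matching pursuit for recovering a sparse decision map."""
    residual, support = y.copy(), []
    x_hat = np.zeros(phi.shape[1])
    for _ in range(sparsity):
        support.append(int(np.argmax(np.abs(phi.T @ residual))))
        coef, *_ = np.linalg.lstsq(phi[:, support], y, rcond=None)
        residual = y - phi[:, support] @ coef
    x_hat[support] = coef
    return x_hat

# decisions = np.zeros(100); decisions[[12, 13, 47]] = 1.0   # sparse usage map
# y, phi = compressive_measurements(decisions, num_measurements=40)
# estimate = omp(phi, y, sparsity=3)
```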
Abstract:
Background: Development of sensitive sequence search procedures for the detection of distant relationships between proteins at the superfamily/fold level is still a big challenge. The intermediate sequence search approach is the most frequently employed way of identifying remote homologues effectively. In this study, an examination of serine proteases of the prolyl oligopeptidase, rhomboid and subtilisin protein families was carried out, using plant serine proteases from two genomes, A. thaliana and O. sativa, as queries, along with 13 other families of unrelated folds, to identify distant homologues that could not be obtained using PSI-BLAST. Methodology/Principal Findings: We propose starting with multiple queries drawn from classical serine protease members to identify remote homologues in families, using a rigorous approach like Cascade PSI-BLAST. We found that classical sequence-based approaches, like PSI-BLAST, showed very low sequence coverage in identifying plant serine proteases. The algorithm was applied on an enriched sequence database of homologous domains, and we obtained an overall average coverage of 88% at the family level and 77% at the superfamily or fold level, along with a specificity of ~100% and a Matthews correlation coefficient of 0.91. A similar approach was also implemented on 13 other protein families representing every structural class in the SCOP database. Further investigation with statistical tests, like jackknifing, helped us to better understand the influence of neighbouring protein families. Conclusions/Significance: Our study suggests that the use of multiple queries from a family for Cascade PSI-BLAST searches is useful for predicting distant relationships effectively, even at the superfamily level. We propose a generalized strategy to cover all the distant members of a particular family using multiple query sequences. Our findings reveal that the prior selection of query sequences and the presence of neighbouring families can be important for covering the search space effectively in minimal computational time. This study also provides an understanding of the `bridging' role of related families.
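The cascading logic itself is simple to sketch: hits recovered by one generation of PSI-BLAST searches become the queries of the next generation, until no new homologues appear. `run_psiblast` below is a placeholder for the actual PSI-BLAST invocation and database, not part of the published protocol.

```python
def cascade_search(initial_queries, run_psiblast, max_generations=3):
    """Cascade-style search sketch: homologues found by one generation of
    searches are used as fresh queries in the next generation.

    run_psiblast(query) is a hypothetical callable expected to return a set
    of homologue identifiers for the given query sequence.
    """
    covered = set(initial_queries)
    frontier = list(initial_queries)
    for _ in range(max_generations):
        new_hits = set()
        for query in frontier:
            new_hits |= set(run_psiblast(query)) - covered
        if not new_hits:
            break                      # no new remote homologues found
        covered |= new_hits
        frontier = list(new_hits)      # next generation of queries
    return covered
```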
Abstract:
Identifying symmetry in scalar fields is a recent area of research in the scientific visualization and computer graphics communities. Symmetry detection techniques based on abstract representations of the scalar field use only limited geometric information in their analysis. Hence, they may not be suited for applications that study the geometric properties of regions in the domain. On the other hand, methods that accumulate local evidence of symmetry through a voting procedure have been used successfully to detect geometric symmetry in shapes. We extend such a technique to scalar fields and use it to detect geometrically symmetric regions in synthetic as well as real-world datasets. Identifying symmetry in the scalar field can significantly improve visualization and interactive exploration of the data. We demonstrate different applications of the symmetry detection method to scientific visualization: query-based exploration of scalar fields, linked selection in symmetric regions for interactive visualization, and classification of geometrically symmetric regions and its application to anomaly detection.
Abstract:
In the process of service provisioning, providing the required service to the user without user intervention, while reducing cognitive overload, is a real challenge. In this paper we propose a user-centred, context-aware, collaborative service provisioning system, which makes use of context along with collaboration to provide the required service to the user dynamically. The system uses a novel approach of query expansion along with interactive and rating-matrix-based collaboration. The performance of the system is evaluated in a mobile-commerce environment. The results show that the system is time efficient and performs with better precision and recall in comparison with a context-aware system.
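As an illustration of rating-matrix based collaboration (not the paper's exact formulation; the similarity measure and scoring are assumed), a simple user-based collaborative filtering step over a user-by-service rating matrix might look like this:

```python
import numpy as np

def recommend(ratings, user, top_k=3):
    """Illustrative user-based collaborative filtering over a rating matrix.

    ratings : (num_users, num_services) array, 0 meaning 'not rated'
    Returns indices of unrated services ranked by similarity-weighted votes.
    """
    sims = np.zeros(ratings.shape[0])
    for other in range(ratings.shape[0]):
        if other != user:
            mask = (ratings[user] > 0) & (ratings[other] > 0)   # co-rated services
            if mask.any():
                a, b = ratings[user, mask], ratings[other, mask]
                sims[other] = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    scores = sims @ ratings                                     # weighted votes
    scores[ratings[user] > 0] = -np.inf                         # only unrated items
    return np.argsort(scores)[::-1][:top_k]
```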
Abstract:
Approximate Nearest Neighbour Field (ANNF) maps are commonly used by the computer vision and graphics communities to deal with problems like image completion, retargetting, denoising, etc. In this paper, we extend the scope of ANNF maps to medical image analysis, more specifically to optic disk detection in retinal images. In the analysis of retinal images, optic disk detection plays an important role since it simplifies the segmentation of the optic disk and other retinal structures. The proposed approach uses FeatureMatch, an ANNF algorithm, to find the correspondence between a chosen optic disk reference image and any given query image. This correspondence provides a distribution of patches in the query image that are closest to patches in the reference image. The likelihood map obtained from the distribution of patches in the query image is used for optic disk detection. The proposed approach is evaluated on five publicly available databases, DIARETDB0, DIARETDB1, DRIVE, STARE and MESSIDOR, with a total of 1540 images. We show, experimentally, that our proposed approach achieves an average detection accuracy of 99% and an average computation time of 0.2 s per image. (C) 2013 Elsevier Ltd. All rights reserved.
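To make the patch-correspondence idea concrete, the sketch below is a brute-force stand-in for FeatureMatch: every query patch votes according to how well it matches its nearest reference patch, producing a per-pixel likelihood map. The real ANNF algorithm is far faster and its scoring differs; the patch size and stride here are arbitrary.

```python
import numpy as np

def likelihood_map(query, reference, patch=8, stride=4):
    """Brute-force stand-in for an ANNF step: for every patch of the query
    image, find its nearest patch in the reference image and accumulate a
    vote weighted by match quality.  Returns a per-pixel likelihood map."""
    def patches(img):
        out = []
        for r in range(0, img.shape[0] - patch + 1, stride):
            for c in range(0, img.shape[1] - patch + 1, stride):
                out.append(((r, c), img[r:r + patch, c:c + patch].ravel().astype(float)))
        return out

    ref = patches(reference)
    votes = np.zeros(query.shape, dtype=float)
    for (r, c), qp in patches(query):
        best = min(np.linalg.norm(qp - rp) for _, rp in ref)
        votes[r:r + patch, c:c + patch] += 1.0 / (1.0 + best)   # strong matches vote more
    return votes / votes.max()
```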
Abstract:
The complexity in visualizing volumetric data often limits the scope of direct exploration of scalar fields. Isocontour extraction is a popular method for exploring scalar fields because of its simplicity in presenting features in the data. In this paper, we present a novel representation of contours with the aim of studying the similarity relationship between the contours. The representation maps contours to points in a high-dimensional transformation-invariant descriptor space. We leverage the power of this representation to design a clustering based algorithm for detecting symmetric regions in a scalar field. Symmetry detection is a challenging problem because it demands both segmentation of the data and identification of transformation invariant segments. While the former task can be addressed using topological analysis of scalar fields, the latter requires geometry based solutions. Our approach combines the two by utilizing the contour tree for segmenting the data and the descriptor space for determining transformation invariance. We discuss two applications, query driven exploration and asymmetry visualization, that demonstrate the effectiveness of the approach.