812 resultados para Efficient query
Resumo:
...Diese Dissertation zeigt, wie wir Datenbankmanagementsysteme bauen können, die heterogene Prozessoren effizient und zuverlässig zur Beschleunigung der Anfrageverarbeitung nutzen können. Daher untersuchen wir typische Entwurfsentscheidungen von coprozessorbeschleunigten Datenbankmanagementsystemen und leiten darauf aufbauend eine generische Architektur für solche Systeme ab. Unsere Untersuchungen zeigen, dass eines der wichtigsten Probleme für solche Datenbankmanagementsysteme die Entscheidung ist, welche Operatoren einer Anfrage auf welchem Prozessor ausgeführt werden sollen...
Resumo:
Current technology permits connecting local networks via high-bandwidth telephone lines. Central coordinator nodes may use Intelligent Networks to manage data flow over dialed data lines, e.g. ISDN, and to establish connections between LANs. This dissertation focuses on cost minimization and on establishing operational policies for query distribution over heterogeneous, geographically distributed databases. Based on our study of query distribution strategies, public network tariff policies, and database interface standards we propose methods for communication cost estimation, strategies for the reduction of bandwidth allocation, and guidelines for central to node communication protocols. Our conclusion is that dialed data lines offer a cost effective alternative for the implementation of distributed database query systems, and that existing commercial software may be adapted to support query processing in heterogeneous distributed database systems. ^
Resumo:
In many advanced applications, data are described by multiple high-dimensional features. Moreover, different queries may weight these features differently; some may not even specify all the features. In this paper, we propose our solution to support efficient query processing in these applications. We devise a novel representation that compactly captures f features into two components: The first component is a 2D vector that reflects a distance range ( minimum and maximum values) of the f features with respect to a reference point ( the center of the space) in a metric space and the second component is a bit signature, with two bits per dimension, obtained by analyzing each feature's descending energy histogram. This representation enables two levels of filtering: The first component prunes away points that do not share similar distance ranges, while the bit signature filters away points based on the dimensions of the relevant features. Moreover, the representation facilitates the use of a single index structure to further speed up processing. We employ the classical B+-tree for this purpose. We also propose a KNN search algorithm that exploits the access orders of critical dimensions of highly selective features and partial distances to prune the search space more effectively. Our extensive experiments on both real-life and synthetic data sets show that the proposed solution offers significant performance advantages over sequential scan and retrieval methods using single and multiple VA-files.
Resumo:
Spatial data warehouses (SDWs) allow for spatial analysis together with analytical multidimensional queries over huge volumes of data. The challenge is to retrieve data related to ad hoc spatial query windows according to spatial predicates, avoiding the high cost of joining large tables. Therefore, mechanisms to provide efficient query processing over SDWs are essential. In this paper, we propose two efficient indices for SDW: the SB-index and the HSB-index. The proposed indices share the following characteristics. They enable multidimensional queries with spatial predicate for SDW and also support predefined spatial hierarchies. Furthermore, they compute the spatial predicate and transform it into a conventional one, which can be evaluated together with other conventional predicates by accessing a star-join Bitmap index. While the SB-index has a sequential data structure, the HSB-index uses a hierarchical data structure to enable spatial objects clustering and a specialized buffer-pool to decrease the number of disk accesses. The advantages of the SB-index and the HSB-index over the DBMS resources for SDW indexing (i.e. star-join computation and materialized views) were investigated through performance tests, which issued roll-up operations extended with containment and intersection range queries. The performance results showed that improvements ranged from 68% up to 99% over both the star-join computation and the materialized view. Furthermore, the proposed indices proved to be very compact, adding only less than 1% to the storage requirements. Therefore, both the SB-index and the HSB-index are excellent choices for SDW indexing. Choosing between the SB-index and the HSB-index mainly depends on the query selectivity of spatial predicates. While low query selectivity benefits the HSB-index, the SB-index provides better performance for higher query selectivity.
Resumo:
Ontology-based data access (OBDA) systems use ontologies to provide views over relational databases. Most of these systems work with ontologies implemented in description logic families of reduced expressiveness, what allows applying efficient query rewriting techniques for query answering. In this paper we describe a set of optimisations that are applicable with one of the most expressive families used in this context (ELHIO¬). Our resulting system exhibits a behaviour that is comparable to the one shown by systems that handle less expressive logics.
Resumo:
A k-NN query finds the k nearest-neighbors of a given point from a point database. When it is sufficient to measure object distance using the Euclidian distance, the key to efficient k-NN query processing is to fetch and check the distances of a minimum number of points from the database. For many applications, such as vehicle movement along road networks or rover and animal movement along terrain surfaces, the distance is only meaningful when it is along a valid movement path. For this type of k-NN queries, the focus of efficient query processing is to minimize the cost of computing distances using the environment data (such as the road network data and the terrain data), which can be several orders of magnitude larger than that of the point data. Efficient processing of k-NN queries based on the Euclidian distance or the road network distance has been investigated extensively in the past. In this paper, we investigate the problem of surface k-NN query processing, where the distance is calculated from the shortest path along a terrain surface. This problem is very challenging, as the terrain data can be very large and the computational cost of finding shortest paths is very high. We propose an efficient solution based on multiresolution terrain models. Our approach eliminates the need of costly process of finding shortest paths by ranking objects using estimated lower and upper bounds of distance on multiresolution terrain models.
Resumo:
Moving objects database systems are the most challenging sub-category among Spatio-Temporal database systems. A database system that updates in real-time the location information of GPS-equipped moving vehicles has to meet even stricter requirements. Currently existing data storage models and indexing mechanisms work well only when the number of moving objects in the system is relatively small. This dissertation research aimed at the real-time tracking and history retrieval of massive numbers of vehicles moving on road networks. A total solution has been provided for the real-time update of the vehicles' location and motion information, range queries on current and history data, and prediction of vehicles' movement in the near future. ^ To achieve these goals, a new approach called Segmented Time Associated to Partitioned Space (STAPS) was first proposed in this dissertation for building and manipulating the indexing structures for moving objects databases. ^ Applying the STAPS approach, an indexing structure of associating a time interval tree to each road segment was developed for real-time database systems of vehicles moving on road networks. The indexing structure uses affordable storage to support real-time data updates and efficient query processing. The data update and query processing performance it provides is consistent without restrictions such as a time window or assuming linear moving trajectories. ^ An application system design based on distributed system architecture with centralized organization was developed to maximally support the proposed data and indexing structures. The suggested system architecture is highly scalable and flexible. Finally, based on a real-world application model of vehicles moving in region-wide, main issues on the implementation of such a system were addressed. ^
Resumo:
Moving objects database systems are the most challenging sub-category among Spatio-Temporal database systems. A database system that updates in real-time the location information of GPS-equipped moving vehicles has to meet even stricter requirements. Currently existing data storage models and indexing mechanisms work well only when the number of moving objects in the system is relatively small. This dissertation research aimed at the real-time tracking and history retrieval of massive numbers of vehicles moving on road networks. A total solution has been provided for the real-time update of the vehicles’ location and motion information, range queries on current and history data, and prediction of vehicles’ movement in the near future. To achieve these goals, a new approach called Segmented Time Associated to Partitioned Space (STAPS) was first proposed in this dissertation for building and manipulating the indexing structures for moving objects databases. Applying the STAPS approach, an indexing structure of associating a time interval tree to each road segment was developed for real-time database systems of vehicles moving on road networks. The indexing structure uses affordable storage to support real-time data updates and efficient query processing. The data update and query processing performance it provides is consistent without restrictions such as a time window or assuming linear moving trajectories. An application system design based on distributed system architecture with centralized organization was developed to maximally support the proposed data and indexing structures. The suggested system architecture is highly scalable and flexible. Finally, based on a real-world application model of vehicles moving in region-wide, main issues on the implementation of such a system were addressed.
Resumo:
Multiresolution Triangular Mesh (MTM) models are widely used to improve the performance of large terrain visualization by replacing the original model with a simplified one. MTM models, which consist of both original and simplified data, are commonly stored in spatial database systems due to their size. The relatively slow access speed of disks makes data retrieval the bottleneck of such terrain visualization systems. Existing spatial access methods proposed to address this problem rely on main-memory MTM models, which leads to significant overhead during query processing. In this paper, we approach the problem from a new perspective and propose a novel MTM called direct mesh that is designed specifically for secondary storage. It supports available indexing methods natively and requires no modification to MTM structure. Experiment results, which are based on two real-world data sets, show an average performance improvement of 5-10 times over the existing methods.
Resumo:
Music similarity query based on acoustic content is becoming important with the ever-increasing growth of the music information from emerging applications such as digital libraries and WWW. However, relative techniques are still in their infancy and much less than satisfactory. In this paper, we present a novel index structure, called Composite Feature tree, CF-tree, to facilitate efficient content-based music search adopting multiple musical features. Before constructing the tree structure, we use PCA to transform the extracted features into a new space sorted by the importance of acoustic features. The CF-tree is a balanced multi-way tree structure where each level represents the data space at different dimensionalities. The PCA transformed data and reduced dimensions in the upper levels can alleviate suffering from dimensionality curse. To accurately mimic human perception, an extension, named CF+-tree, is proposed, which further applies multivariable regression to determine the weight of each individual feature. We conduct extensive experiments to evaluate the proposed structures against state-of-art techniques. The experimental results demonstrate superiority of our technique.
Resumo:
The development of new technologies that use peer-to-peer networks grows every day, with the object to supply the need of sharing information, resources and services of databases around the world. Among them are the peer-to-peer databases that take advantage of peer-to-peer networks to manage distributed knowledge bases, allowing the sharing of information semantically related but syntactically heterogeneous. However, it is a challenge to ensure the efficient search for information without compromising the autonomy of each node and network flexibility, given the structural characteristics of these networks. On the other hand, some studies propose the use of ontology semantics by assigning standardized categorization of information. The main original contribution of this work is the approach of this problem with a proposal for optimization of queries supported by the Ant Colony algorithm and classification though ontologies. The results show that this strategy enables the semantic support to the searches in peer-to-peer databases, aiming to expand the results without compromising network performance. © 2011 IEEE.
Resumo:
In vielen Bereichen der industriellen Fertigung, wie zum Beispiel in der Automobilindustrie, wer- den digitale Versuchsmodelle (sog. digital mock-ups) eingesetzt, um die Entwicklung komplexer Maschinen m ̈oglichst gut durch Computersysteme unterstu ̈tzen zu k ̈onnen. Hierbei spielen Be- wegungsplanungsalgorithmen eine wichtige Rolle, um zu gew ̈ahrleisten, dass diese digitalen Pro- totypen auch kollisionsfrei zusammengesetzt werden k ̈onnen. In den letzten Jahrzehnten haben sich hier sampling-basierte Verfahren besonders bew ̈ahrt. Diese erzeugen eine große Anzahl von zuf ̈alligen Lagen fu ̈r das ein-/auszubauende Objekt und verwenden einen Kollisionserken- nungsmechanismus, um die einzelnen Lagen auf Gu ̈ltigkeit zu u ̈berpru ̈fen. Daher spielt die Kollisionserkennung eine wesentliche Rolle beim Design effizienter Bewegungsplanungsalgorith- men. Eine Schwierigkeit fu ̈r diese Klasse von Planern stellen sogenannte “narrow passages” dar, schmale Passagen also, die immer dort auftreten, wo die Bewegungsfreiheit der zu planenden Objekte stark eingeschr ̈ankt ist. An solchen Stellen kann es schwierig sein, eine ausreichende Anzahl von kollisionsfreien Samples zu finden. Es ist dann m ̈oglicherweise n ̈otig, ausgeklu ̈geltere Techniken einzusetzen, um eine gute Performance der Algorithmen zu erreichen.rnDie vorliegende Arbeit gliedert sich in zwei Teile: Im ersten Teil untersuchen wir parallele Kollisionserkennungsalgorithmen. Da wir auf eine Anwendung bei sampling-basierten Bewe- gungsplanern abzielen, w ̈ahlen wir hier eine Problemstellung, bei der wir stets die selben zwei Objekte, aber in einer großen Anzahl von unterschiedlichen Lagen auf Kollision testen. Wir im- plementieren und vergleichen verschiedene Verfahren, die auf Hu ̈llk ̈operhierarchien (BVHs) und hierarchische Grids als Beschleunigungsstrukturen zuru ̈ckgreifen. Alle beschriebenen Verfahren wurden auf mehreren CPU-Kernen parallelisiert. Daru ̈ber hinaus vergleichen wir verschiedene CUDA Kernels zur Durchfu ̈hrung BVH-basierter Kollisionstests auf der GPU. Neben einer un- terschiedlichen Verteilung der Arbeit auf die parallelen GPU Threads untersuchen wir hier die Auswirkung verschiedener Speicherzugriffsmuster auf die Performance der resultierenden Algo- rithmen. Weiter stellen wir eine Reihe von approximativen Kollisionstests vor, die auf den beschriebenen Verfahren basieren. Wenn eine geringere Genauigkeit der Tests tolerierbar ist, kann so eine weitere Verbesserung der Performance erzielt werden.rnIm zweiten Teil der Arbeit beschreiben wir einen von uns entworfenen parallelen, sampling- basierten Bewegungsplaner zur Behandlung hochkomplexer Probleme mit mehreren “narrow passages”. Das Verfahren arbeitet in zwei Phasen. Die grundlegende Idee ist hierbei, in der er- sten Planungsphase konzeptionell kleinere Fehler zuzulassen, um die Planungseffizienz zu erh ̈ohen und den resultierenden Pfad dann in einer zweiten Phase zu reparieren. Der hierzu in Phase I eingesetzte Planer basiert auf sogenannten Expansive Space Trees. Zus ̈atzlich haben wir den Planer mit einer Freidru ̈ckoperation ausgestattet, die es erlaubt, kleinere Kollisionen aufzul ̈osen und so die Effizienz in Bereichen mit eingeschr ̈ankter Bewegungsfreiheit zu erh ̈ohen. Optional erlaubt unsere Implementierung den Einsatz von approximativen Kollisionstests. Dies setzt die Genauigkeit der ersten Planungsphase weiter herab, fu ̈hrt aber auch zu einer weiteren Perfor- mancesteigerung. Die aus Phase I resultierenden Bewegungspfade sind dann unter Umst ̈anden nicht komplett kollisionsfrei. Um diese Pfade zu reparieren, haben wir einen neuartigen Pla- nungsalgorithmus entworfen, der lokal beschr ̈ankt auf eine kleine Umgebung um den bestehenden Pfad einen neuen, kollisionsfreien Bewegungspfad plant.rnWir haben den beschriebenen Algorithmus mit einer Klasse von neuen, schwierigen Metall- Puzzlen getestet, die zum Teil mehrere “narrow passages” aufweisen. Unseres Wissens nach ist eine Sammlung vergleichbar komplexer Benchmarks nicht ̈offentlich zug ̈anglich und wir fan- den auch keine Beschreibung von vergleichbar komplexen Benchmarks in der Motion-Planning Literatur.
Resumo:
Industrial software systems are large and complex, both in terms of the software entities and their relationships. Consequently, understanding how a software system works requires the ability to pose queries over the design-level entities of the system. Traditionally, this task has been supported by simple tools (e.g., grep) combined with the programmer's intuition and experience. Recently, however, specialized code query technologies have matured to the point where they can be used in industrial situations, providing more intelligent, timely, and efficient responses to developer queries. This working session aims to explore the state of the art in code query technologies, and discover new ways in which these technologies may be useful in program comprehension. The session brings together researchers and practitioners. We survey existing techniques and applications, trying to understand the strengths and weaknesses of the various approaches, and sketch out new frontiers that hold promise.
Resumo:
With the rapid increase in both centralized video archives and distributed WWW video resources, content-based video retrieval is gaining its importance. To support such applications efficiently, content-based video indexing must be addressed. Typically, each video is represented by a sequence of frames. Due to the high dimensionality of frame representation and the large number of frames, video indexing introduces an additional degree of complexity. In this paper, we address the problem of content-based video indexing and propose an efficient solution, called the Ordered VA-File (OVA-File) based on the VA-file. OVA-File is a hierarchical structure and has two novel features: 1) partitioning the whole file into slices such that only a small number of slices are accessed and checked during k Nearest Neighbor (kNN) search and 2) efficient handling of insertions of new vectors into the OVA-File, such that the average distance between the new vectors and those approximations near that position is minimized. To facilitate a search, we present an efficient approximate kNN algorithm named Ordered VA-LOW (OVA-LOW) based on the proposed OVA-File. OVA-LOW first chooses possible OVA-Slices by ranking the distances between their corresponding centers and the query vector, and then visits all approximations in the selected OVA-Slices to work out approximate kNN. The number of possible OVA-Slices is controlled by a user-defined parameter delta. By adjusting delta, OVA-LOW provides a trade-off between the query cost and the result quality. Query by video clip consisting of multiple frames is also discussed. Extensive experimental studies using real video data sets were conducted and the results showed that our methods can yield a significant speed-up over an existing VA-file-based method and iDistance with high query result quality. Furthermore, by incorporating temporal correlation of video content, our methods achieved much more efficient performance.
Resumo:
Spatial data are particularly useful in mobile environments. However, due to the low bandwidth of most wireless networks, developing large spatial database applications becomes a challenging process. In this paper, we provide the first attempt to combine two important techniques, multiresolution spatial data structure and semantic caching, towards efficient spatial query processing in mobile environments. Based on the study of the characteristics of multiresolution spatial data (MSD) and multiresolution spatial query, we propose a new semantic caching model called Multiresolution Semantic Caching (MSC) for caching MSD in mobile environments. MSC enriches the traditional three-category query processing in semantic cache to five categories, thus improving the performance in three ways: 1) a reduction in the amount and complexity of the remainder queries; 2) the redundant transmission of spatial data already residing in a cache is avoided; 3) a provision for satisfactory answers before 100% query results have been transmitted to the client side. Our extensive experiments on a very large and complex real spatial database show that MSC outperforms the traditional semantic caching models significantly