863 resultados para Query processing


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Current technology permits connecting local networks via high-bandwidth telephone lines. Central coordinator nodes may use Intelligent Networks to manage data flow over dialed data lines, e.g. ISDN, and to establish connections between LANs. This dissertation focuses on cost minimization and on establishing operational policies for query distribution over heterogeneous, geographically distributed databases. Based on our study of query distribution strategies, public network tariff policies, and database interface standards we propose methods for communication cost estimation, strategies for the reduction of bandwidth allocation, and guidelines for central to node communication protocols. Our conclusion is that dialed data lines offer a cost effective alternative for the implementation of distributed database query systems, and that existing commercial software may be adapted to support query processing in heterogeneous distributed database systems. ^

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Homomorphic encryption is a particular type of encryption method that enables computing over encrypted data. This has a wide range of real world ramifications such as being able to blindly compute a search result sent to a remote server without revealing its content. In the first part of this thesis, we discuss how database search queries can be made secure using a homomorphic encryption scheme based on the ideas of Gahi et al. Gahi’s method is based on the integer-based fully homomorphic encryption scheme proposed by Dijk et al. We propose a new database search scheme called the Homomorphic Query Processing Scheme, which can be used with the ring-based fully homomorphic encryption scheme proposed by Braserski. In the second part of this thesis, we discuss the cybersecurity of the smart electric grid. Specifically, we use the Homomorphic Query Processing scheme to construct a keyword search technique in the smart grid. Our work is based on the Public Key Encryption with Keyword Search (PEKS) method introduced by Boneh et al. and a Multi-Key Homomorphic Encryption scheme proposed by L´opez-Alt et al. A summary of the results of this thesis (specifically the Homomorphic Query Processing Scheme) is published at the 14th Canadian Workshop on Information Theory (CWIT).

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Conventional web search engines are centralised in that a single entity crawls and indexes the documents selected for future retrieval, and the relevance models used to determine which documents are relevant to a given user query. As a result, these search engines suffer from several technical drawbacks such as handling scale, timeliness and reliability, in addition to ethical concerns such as commercial manipulation and information censorship. Alleviating the need to rely entirely on a single entity, Peer-to-Peer (P2P) Information Retrieval (IR) has been proposed as a solution, as it distributes the functional components of a web search engine – from crawling and indexing documents, to query processing – across the network of users (or, peers) who use the search engine. This strategy for constructing an IR system poses several efficiency and effectiveness challenges which have been identified in past work. Accordingly, this thesis makes several contributions towards advancing the state of the art in P2P-IR effectiveness by improving the query processing and relevance scoring aspects of a P2P web search. Federated search systems are a form of distributed information retrieval model that route the user’s information need, formulated as a query, to distributed resources and merge the retrieved result lists into a final list. P2P-IR networks are one form of federated search in routing queries and merging result among participating peers. The query is propagated through disseminated nodes to hit the peers that are most likely to contain relevant documents, then the retrieved result lists are merged at different points along the path from the relevant peers to the query initializer (or namely, customer). However, query routing in P2P-IR networks is considered as one of the major challenges and critical part in P2P-IR networks; as the relevant peers might be lost in low-quality peer selection while executing the query routing, and inevitably lead to less effective retrieval results. This motivates this thesis to study and propose query routing techniques to improve retrieval quality in such networks. Cluster-based semi-structured P2P-IR networks exploit the cluster hypothesis to organise the peers into similar semantic clusters where each such semantic cluster is managed by super-peers. In this thesis, I construct three semi-structured P2P-IR models and examine their retrieval effectiveness. I also leverage the cluster centroids at the super-peer level as content representations gathered from cooperative peers to propose a query routing approach called Inverted PeerCluster Index (IPI) that simulates the conventional inverted index of the centralised corpus to organise the statistics of peers’ terms. The results show a competitive retrieval quality in comparison to baseline approaches. Furthermore, I study the applicability of using the conventional Information Retrieval models as peer selection approaches where each peer can be considered as a big document of documents. The experimental evaluation shows comparative and significant results and explains that document retrieval methods are very effective for peer selection that brings back the analogy between documents and peers. Additionally, Learning to Rank (LtR) algorithms are exploited to build a learned classifier for peer ranking at the super-peer level. The experiments show significant results with state-of-the-art resource selection methods and competitive results to corresponding classification-based approaches. Finally, I propose reputation-based query routing approaches that exploit the idea of providing feedback on a specific item in the social community networks and manage it for future decision-making. The system monitors users’ behaviours when they click or download documents from the final ranked list as implicit feedback and mines the given information to build a reputation-based data structure. The data structure is used to score peers and then rank them for query routing. I conduct a set of experiments to cover various scenarios including noisy feedback information (i.e, providing positive feedback on non-relevant documents) to examine the robustness of reputation-based approaches. The empirical evaluation shows significant results in almost all measurement metrics with approximate improvement more than 56% compared to baseline approaches. Thus, based on the results, if one were to choose one technique, reputation-based approaches are clearly the natural choices which also can be deployed on any P2P network.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Spatial data has now been used extensively in the Web environment, providing online customized maps and supporting map-based applications. The full potential of Web-based spatial applications, however, has yet to be achieved due to performance issues related to the large sizes and high complexity of spatial data. In this paper, we introduce a multiresolution approach to spatial data management and query processing such that the database server can choose spatial data at the right resolution level for different Web applications. One highly desirable property of the proposed approach is that the server-side processing cost and network traffic can be reduced when the level of resolution required by applications are low. Another advantage is that our approach pushes complex multiresolution structures and algorithms into the spatial database engine. That is, the developer of spatial Web applications needs not to be concerned with such complexity. This paper explains the basic idea, technical feasibility and applications of multiresolution spatial databases.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Database query languages on relations (for example SQL) make it possible to join two relations. This operation is very common in desktop/server database systems but unfortunately query processing systems in networked embedded computer systems currently do not support this operation; specifically, the query processing systems TAG, TinyDB, Cougar do not support this. We show how a prioritized medium access control (MAC) protocol can be used to efficiently execute the database operation join for networked embedded computer systems where all computer nodes are in a single broadcast domain.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper, we present a P2P-based database sharing system that provides information sharing capabilities through keyword-based search techniques. Our system requires neither a global schema nor schema mappings between different databases, and our keyword-based search algorithms are robust in the presence of frequent changes in the content and membership of peers. To facilitate data integration, we introduce keyword join operator to combine partial answers containing different keywords into complete answers. We also present an efficient algorithm that optimize the keyword join operations for partial answer integration. Our experimental study on both real and synthetic datasets demonstrates the effectiveness of our algorithms, and the efficiency of the proposed query processing strategies.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Searching in a dataset for elements that are similar to a given query element is a core problem in applications that manage complex data, and has been aided by metric access methods (MAMs). A growing number of applications require indices that must be built faster and repeatedly, also providing faster response for similarity queries. The increase in the main memory capacity and its lowering costs also motivate using memory-based MAMs. In this paper. we propose the Onion-tree, a new and robust dynamic memory-based MAM that slices the metric space into disjoint subspaces to provide quick indexing of complex data. It introduces three major characteristics: (i) a partitioning method that controls the number of disjoint subspaces generated at each node; (ii) a replacement technique that can change the leaf node pivots in insertion operations; and (iii) range and k-NN extended query algorithms to support the new partitioning method, including a new visit order of the subspaces in k-NN queries. Performance tests with both real-world and synthetic datasets showed that the Onion-tree is very compact. Comparisons of the Onion-tree with the MM-tree and a memory-based version of the Slim-tree showed that the Onion-tree was always faster to build the index. The experiments also showed that the Onion-tree significantly improved range and k-NN query processing performance and was the most efficient MAM, followed by the MM-tree, which in turn outperformed the Slim-tree in almost all the tests. (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Spatial data warehouses (SDWs) allow for spatial analysis together with analytical multidimensional queries over huge volumes of data. The challenge is to retrieve data related to ad hoc spatial query windows according to spatial predicates, avoiding the high cost of joining large tables. Therefore, mechanisms to provide efficient query processing over SDWs are essential. In this paper, we propose two efficient indices for SDW: the SB-index and the HSB-index. The proposed indices share the following characteristics. They enable multidimensional queries with spatial predicate for SDW and also support predefined spatial hierarchies. Furthermore, they compute the spatial predicate and transform it into a conventional one, which can be evaluated together with other conventional predicates by accessing a star-join Bitmap index. While the SB-index has a sequential data structure, the HSB-index uses a hierarchical data structure to enable spatial objects clustering and a specialized buffer-pool to decrease the number of disk accesses. The advantages of the SB-index and the HSB-index over the DBMS resources for SDW indexing (i.e. star-join computation and materialized views) were investigated through performance tests, which issued roll-up operations extended with containment and intersection range queries. The performance results showed that improvements ranged from 68% up to 99% over both the star-join computation and the materialized view. Furthermore, the proposed indices proved to be very compact, adding only less than 1% to the storage requirements. Therefore, both the SB-index and the HSB-index are excellent choices for SDW indexing. Choosing between the SB-index and the HSB-index mainly depends on the query selectivity of spatial predicates. While low query selectivity benefits the HSB-index, the SB-index provides better performance for higher query selectivity.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents the 2005 Miracle’s team approach to the Ad-Hoc Information Retrieval tasks. The goal for the experiments this year was twofold: to continue testing the effect of combination approaches on information retrieval tasks, and improving our basic processing and indexing tools, adapting them to new languages with strange encoding schemes. The starting point was a set of basic components: stemming, transforming, filtering, proper nouns extraction, paragraph extraction, and pseudo-relevance feedback. Some of these basic components were used in different combinations and order of application for document indexing and for query processing. Second-order combinations were also tested, by averaging or selective combination of the documents retrieved by different approaches for a particular query. In the multilingual track, we concentrated our work on the merging process of the results of monolingual runs to get the overall multilingual result, relying on available translations. In both cross-lingual tracks, we have used available translation resources, and in some cases we have used a combination approach.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The main goal of the bilingual and monolingual participation of the MIRACLE team in CLEF 2004 was to test the effect of combination approaches on information retrieval. The starting point was a set of basic components: stemming, transformation, filtering, generation of n-grams, weighting and relevance feedback. Some of these basic components were used in different combinations and order of application for document indexing and for query processing. A second order combination was also tested, mainly by averaging or selective combination of the documents retrieved by different approaches for a particular query.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Two complementary benchmarks have been proposed so far for the evaluation and continuous improvement of RDF stream processors: SRBench and LSBench. They put a special focus on different features of the evaluated systems, including coverage of the streaming extensions of SPARQL supported by each processor, query processing throughput, and an early analysis of query evaluation correctness, based on comparing the results obtained by different processors for a set of queries. However, none of them has analysed the operational semantics of these processors in order to assess the correctness of query evaluation results. In this paper, we propose a characterization of the operational semantics of RDF stream processors, adapting well-known models used in the stream processing engine community: CQL and SECRET. Through this formalization, we address correctness in RDF stream processor benchmarks, allowing to determine the multiple answers that systems should provide. Finally, we present CSRBench, an extension of SRBench to address query result correctness verification using an automatic method.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

La gestión del conocimiento (KM) es el proceso de recolectar datos en bruto para su análisis y filtrado, con la finalidad de obtener conocimiento útil a partir de dichos datos. En este proyecto se pretende hacer un estudio sobre la gestión de la información en las redes de sensores inalámbricos como inicio para sentar las bases para la gestión del conocimiento en las mismas. Las redes de sensores inalámbricos (WSN) son redes compuestas por sensores (también conocidos como motas) distribuidos sobre un área, cuya misión es monitorizar una o varias condiciones físicas del entorno. Las redes de sensores inalámbricos se caracterizan por tener restricciones de consumo para los sensores que utilizan baterías, por su capacidad para adaptarse a cambios y ser escalables, y también por su habilidad para hacer frente a fallos en los sensores. En este proyecto se hace un estudio sobre la gestión de la información en redes de sensores inalámbricos. Se comienza introduciendo algunos conceptos básicos: arquitectura, pila de protocolos, topologías de red, etc.… Después de esto, se ha enfocado el estudio hacia TinyDB, el cual puede ser considerado como parte de las tecnologías más avanzadas en el estado del arte de la gestión de la información en redes de sensores inalámbricos. TinyDB es un sistema de procesamiento de consultas para extraer información de una red de sensores. Proporciona una interfaz similar a SQL y permite trabajar con consultas contra la red de sensores inalámbricos como si se tratara de una base de datos tradicional. Además, TinyDB implementa varias optimizaciones para manejar los datos eficientemente. En este proyecto se describe también la implementación de una sencilla aplicación basada en redes de sensores inalámbricos. Las motas en la aplicación son capaces de medir la corriente a través de un cable. El objetivo de esta aplicación es monitorizar el consumo de energía en diferentes zonas de un área industrial o doméstico, utilizando redes de sensores inalámbricas. Además, se han implementado las optimizaciones más importantes que se han aprendido en el análisis de la plataforma TinyDB. Para desarrollar esta aplicación se ha utilizado como sensores la plataforma open-source de creación de prototipos electrónicos Arduino, y el ordenador de placa reducida Raspberry Pi como coordinador. ABSTRACT. Knowledge management (KM) is the process of collecting raw data for analysis and filtering, to get a useful knowledge from this data. In this project the information management in wireless sensor networks is studied as starting point before knowledge management. Wireless sensor networks (WSN) are networks which consists of sensors (also known as motes) distributed over an area, to monitor some physical conditions of the environment. Wireless sensor networks are characterized by power consumption constrains for sensors which are using batteries, by the ability to be adaptable to changes and to be scalable, and by the ability to cope sensor failures. In this project it is studied information management in wireless sensor networks. The document starts introducing basic concepts: architecture, stack of protocols, network topology… After this, the study has been focused on TinyDB, which can be considered as part of the most advanced technologies in the state of the art of information management in wireless sensor networks. TinyDB is a query processing system for extracting information from a network of sensors. It provides a SQL-like interface and it lets us to work with queries against the wireless sensor network like if it was a traditional database. In addition, TinyDB implements a lot of optimizations to manage data efficiently. In this project, it is implemented a simple wireless sensor network application too. Application’s motes are able to measure amperage through a cable. The target of the application is, by using a wireless sensor network and these sensors, to monitor energy consumption in different areas of a house. Additionally, it is implemented the most important optimizations that we have learned from the analysis of TinyDB platform. To develop this application it is used Arduino open-source electronics prototyping platform as motes, and Raspberry Pi single-board computer as coordinator.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Summarizing topological relations is fundamental to many spatial applications including spatial query optimization. In this article, we present several novel techniques to effectively construct cell density based spatial histograms for range (window) summarizations restricted to the four most important level-two topological relations: contains, contained, overlap, and disjoint. We first present a novel framework to construct a multiscale Euler histogram in 2D space with the guarantee of the exact summarization results for aligned windows in constant time. To minimize the storage space in such a multiscale Euler histogram, an approximate algorithm with the approximate ratio 19/12 is presented, while the problem is shown NP-hard generally. To conform to a limited storage space where a multiscale histogram may be allowed to have only k Euler histograms, an effective algorithm is presented to construct multiscale histograms to achieve high accuracy in approximately summarizing aligned windows. Then, we present a new approximate algorithm to query an Euler histogram that cannot guarantee the exact answers; it runs in constant time. We also investigate the problem of nonaligned windows and the problem of effectively partitioning the data space to support nonaligned window queries. Finally, we extend our techniques to 3D space. Our extensive experiments against both synthetic and real world datasets demonstrate that the approximate multiscale histogram techniques may improve the accuracy of the existing techniques by several orders of magnitude while retaining the cost efficiency, and the exact multiscale histogram technique requires only a storage space linearly proportional to the number of cells for many popular real datasets.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Spatial data has now been used extensively in the Web environment, providing online customized maps and supporting map-based applications. The full potential of Web-based spatial applications, however, has yet to be achieved due to performance issues related to the large sizes and high complexity of spatial data. In this paper, we introduce a multiresolution approach to spatial data management and query processing such that the database server can choose spatial data at the right resolution level for different Web applications. One highly desirable property of the proposed approach is that the server-side processing cost and network traffic can be reduced when the level of resolution required by applications are low. Another advantage is that our approach pushes complex multiresolution structures and algorithms into the spatial database engine. That is, the developer of spatial Web applications needs not to be concerned with such complexity. This paper explains the basic idea, technical feasibility and applications of multiresolution spatial databases.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Terrain can be approximated by a triangular mesh consisting millions of 3D points. Multiresolution triangular mesh (MTM) structures are designed to support applications that use terrain data at variable levels of detail (LOD). Typically, an MTM adopts a tree structure where a parent node represents a lower-resolution approximation of its descendants. Given a region of interest (ROI) and a LOD, the process of retrieving the required terrain data from the database is to traverse the MTM tree from the root to reach all the nodes satisfying the ROI and LOD conditions. This process, while being commonly used for multiresolution terrain visualization, is inefficient as either a large number of sequential I/O operations or fetching a large amount of extraneous data is incurred. Various spatial indexes have been proposed in the past to address this problem, however level-by-level tree traversal remains a common practice in order to obtain topological information among the retrieved terrain data. A new MTM data structure called direct mesh is proposed. We demonstrate that with direct mesh the amount of data retrieval can be substantially reduced. Comparing with existing MTM indexing methods, a significant performance improvement has been observed for real-life terrain data.