516 resultados para queries
Resumo:
The data structure of an information system can significantly impact the ability of end users to efficiently and effectively retrieve the information they need. This research develops a methodology for evaluating, ex ante, the relative desirability of alternative data structures for end user queries. This research theorizes that the data structure that yields the lowest weighted average complexity for a representative sample of information requests is the most desirable data structure for end user queries. The theory was tested in an experiment that compared queries from two different relational database schemas. As theorized, end users querying the data structure associated with the less complex queries performed better Complexity was measured using three different Halstead metrics. Each of the three metrics provided excellent predictions of end user performance. This research supplies strong evidence that organizations can use complexity metrics to evaluate, ex ante, the desirability of alternate data structures. Organizations can use these evaluations to enhance the efficient and effective retrieval of information by creating data structures that minimize end user query complexity.
Resumo:
Quantile computation has many applications including data mining and financial data analysis. It has been shown that an is an element of-approximate summary can be maintained so that, given a quantile query d (phi, is an element of), the data item at rank [phi N] may be approximately obtained within the rank error precision is an element of N over all N data items in a data stream or in a sliding window. However, scalable online processing of massive continuous quantile queries with different phi and is an element of poses a new challenge because the summary is continuously updated with new arrivals of data items. In this paper, first we aim to dramatically reduce the number of distinct query results by grouping a set of different queries into a cluster so that they can be processed virtually as a single query while the precision requirements from users can be retained. Second, we aim to minimize the total query processing costs. Efficient algorithms are developed to minimize the total number of times for reprocessing clusters and to produce the minimum number of clusters, respectively. The techniques are extended to maintain near-optimal clustering when queries are registered and removed in an arbitrary fashion against whole data streams or sliding windows. In addition to theoretical analysis, our performance study indicates that the proposed techniques are indeed scalable with respect to the number of input queries as well as the number of items and the item arrival rate in a data stream.
Resumo:
Many queries sent to search engines refer to specific locations in the world. Location-based queries try to find local services and facilities around the user’s environment or in a particular area. This paper reviews the specifications of geospatial queries and discusses the similarities and differences between location-based queries and other queries. We introduce nine patterns for location-based queries containing either a service name alone or a service name accompanied by a location name. Our survey indicates that at least 22% of the Web queries have a geospatial dimension and most of these can be considered as location-based queries. We propose that location-based queries should be treated different from general queries to produce more relevant results.
Perception, intuition and database queries: Personality factors affecting database query performance
Resumo:
The Meta-Object Facility (MOF) provides a standardized framework for object-oriented models. An instance of a MOF model contains objects and links whose interfaces are entirely derived from that model. Information contained in these objects can be accessed directly, however, in order to realize the Model-Driven Architecture@trade; (MDA), we must have a mechanism for representing and evaluating structured queries on these instances. The MOF Query Language (MQL) is a language that extends the UML's Object Constraint Language (OCL) to provide more expressive power, such as higher-order queries, parametric polymorphism and argument polymorphism. Not only do these features allow more powerful queries, but they also encourage a greater degree of modularization and re-use, resulting in faster prototyping and facilitating automated integrity analysis. This paper presents an overview of the motivations for developing MQL and also discusses its abstract syntax, presented as a MOF model, and its semantics
Resumo:
This paper aims to identify the communication goal(s) of a user's information-seeking query out of a finite set of within-domain goals in natural language queries. It proposes using Tree-Augmented Naive Bayes networks (TANs) for goal detection. The problem is formulated as N binary decisions, and each is performed by a TAN. Comparative study has been carried out to compare the performance with Naive Bayes, fully-connected TANs, and multi-layer neural networks. Experimental results show that TANs consistently give better results when tested on the ATIS and DARPA Communicator corpora.
Resumo:
The problem of finding the optimal join ordering executing a query to a relational database management system is a combinatorial optimization problem, which makes deterministic exhaustive solution search unacceptable for queries with a great number of joined relations. In this work an adaptive genetic algorithm with dynamic population size is proposed for optimizing large join queries. The performance of the algorithm is compared with that of several classical non-deterministic optimization algorithms. Experiments have been performed optimizing several random queries against a randomly generated data dictionary. The proposed adaptive genetic algorithm with probabilistic selection operator outperforms in a number of test runs the canonical genetic algorithm with Elitist selection as well as two common random search strategies and proves to be a viable alternative to existing non-deterministic optimization approaches.
Resumo:
Large read-only or read-write transactions with a large read set and a small write set constitute an important class of transactions used in such applications as data mining, data warehousing, statistical applications, and report generators. Such transactions are best supported with optimistic concurrency, because locking of large amounts of data for extended periods of time is not an acceptable solution. The abort rate in regular optimistic concurrency algorithms increases exponentially with the size of the transaction. The algorithm proposed in this dissertation solves this problem by using a new transaction scheduling technique that allows a large transaction to commit safely with significantly greater probability that can exceed several orders of magnitude versus regular optimistic concurrency algorithms. A performance simulation study and a formal proof of serializability and external consistency of the proposed algorithm are also presented.^ This dissertation also proposes a new query optimization technique (lazy queries). Lazy Queries is an adaptive query execution scheme which optimizes itself as the query runs. Lazy queries can be used to find an intersection of sub-queries in a very efficient way, which does not require full execution of large sub-queries nor does it require any statistical knowledge about the data.^ An efficient optimistic concurrency control algorithm used in a massively parallel B-tree with variable-length keys is introduced. B-trees with variable-length keys can be effectively used in a variety of database types. In particular, we show how such a B-tree was used in our implementation of a semantic object-oriented DBMS. The concurrency control algorithm uses semantically safe optimistic virtual "locks" that achieve very fine granularity in conflict detection. This algorithm ensures serializability and external consistency by using logical clocks and backward validation of transactional queries. A formal proof of correctness of the proposed algorithm is also presented. ^
Resumo:
This project is about retrieving data in range without allowing the server to read it, when the database is stored in the server. Basically, our goal is to build a database that allows the client to maintain the confidentiality of the data stored, despite all the data is stored in a different location from the client's hard disk. This means that all the information written on the hard disk can be easily read by another person who can do anything with it. Given that, we need to encrypt that data from eavesdroppers or other people. This is because they could sell it or log into accounts and use them for stealing money or identities. In order to achieve this, we need to encrypt the data stored in the hard drive, so that only the possessor of the key can easily read the information stored, while all the others are going to read only encrypted data. Obviously, according to that, all the data management must be done by the client, otherwise any malicious person can easily retrieve it and use it for any malicious intention. All the methods analysed here relies on encrypting data in transit. In the end of this project we analyse 2 theoretical and practical methods for the creation of the above databases and then we tests them with 3 datasets and with 10, 100 and 1000 queries. The scope of this work is to retrieve a trend that can be useful for future works based on this project.
Resumo:
This thesis focuses on the private membership test (PMT) problem and presents three single server protocols to resolve this problem. In the presented solutions, a client can perform an inclusion test for some record x in a server's database, without revealing his record. Moreover after executing the protocols, the contents of server's database remain secret. In each of these solutions, a different cryptographic protocol is utilized to construct a privacy preserving variant of Bloom filter. The three suggested solutions are slightly different from each other, from privacy perspective and also from complexity point of view. Therefore, their use cases are different and it is impossible to choose one that is clearly the best between all three. We present the software developments of the three protocols by utilizing various pseudocodes. The performance of our implementation is measured based on a real case scenario. This thesis is a spin-off from the Academy of Finland research project "Cloud Security Services".
Bioqueries: a collaborative environment to create, explore and share SPARQL queries in Life Sciences
Resumo:
Bioqueries provides a collaborative environment to create, explore, execute, clone and share SPARQL queries (including Federated Queries). Federated SPARQL queries can retrieve information from more than one data source.
Bioqueries: a collaborative environment to create, explore and share SPARQL queries in Life Sciences
Resumo:
Bioqueries provides a collaborative environment to create, explore, execute, clone and share SPARQL queries (including Federated Queries). Federated SPARQL queries can retrieve information from more than one data source.
Resumo:
Universidade Estadual de Campinas. Faculdade de Educação Física
Resumo:
Geographic Data Warehouses (GDW) are one of the main technologies used in decision-making processes and spatial analysis, and the literature proposes several conceptual and logical data models for GDW. However, little effort has been focused on studying how spatial data redundancy affects SOLAP (Spatial On-Line Analytical Processing) query performance over GDW. In this paper, we investigate this issue. Firstly, we compare redundant and non-redundant GDW schemas and conclude that redundancy is related to high performance losses. We also analyze the issue of indexing, aiming at improving SOLAP query performance on a redundant GDW. Comparisons of the SB-index approach, the star-join aided by R-tree and the star-join aided by GiST indicate that the SB-index significantly improves the elapsed time in query processing from 25% up to 99% with regard to SOLAP queries defined over the spatial predicates of intersection, enclosure and containment and applied to roll-up and drill-down operations. We also investigate the impact of the increase in data volume on the performance. The increase did not impair the performance of the SB-index, which highly improved the elapsed time in query processing. Performance tests also show that the SB-index is far more compact than the star-join, requiring only a small fraction of at most 0.20% of the volume. Moreover, we propose a specific enhancement of the SB-index to deal with spatial data redundancy. This enhancement improved performance from 80 to 91% for redundant GDW schemas.
Resumo:
Orthodox teaching and practice on nutrition and health almost always focuses on nutrients, or else on foods and drinks. Thus, diets that are high in folate and in green leafy vegetables are recommended, whereas diets high in saturated fat and in full-fat milk and other dairy products are not recommended. Food guides such as the US Food Guide Pyramid are designed to encourage consumption of healthier foods, by which is usually meant those higher in vitamins, minerals and other nutrients seen as desirable.What is generally overlooked in such approaches, which currently dominate official and other authoritative information and education programmes, and also food and nutrition public health policies, is food processing. It is now generally acknowledged that the current pandemic of obesity and related chronic diseases has as one of its important causes increased consumption of convenience including pre-prepared foods(1,2). However, the issue of food processing is largely ignored or minimised in education and information about food, nutrition and health, and also in public health policies.A short commentary cannot be comprehensive, and a general proposal such as that made here is bound to have some problems and exceptions. Also, the social, cultural, economic and environmental consequences of food processing are not discussed here. Readers comments and queries are invited