865 results for Query Optimization


Relevance:

100.00%

Publisher:

Abstract:

Multiresolution (or multi-scale) techniques make it possible for Web-based GIS applications to access large datasets. The performance of such systems relies on data transmission over the network and on multiresolution query processing. In the literature, the latter has received little research attention so far, and the existing methods are not capable of processing large datasets. In this paper, we aim to improve multiresolution query processing in an online environment. A cost model for such queries is proposed first, followed by three strategies for their optimization. Significant theoretical improvement can be observed when comparing against available methods. Application of these strategies is also discussed, and similar performance enhancement can be expected if they are implemented in online GIS applications.

Relevance:

60.00%

Publisher:

Abstract:

Dissertation submitted for the Master's degree in Informatics Engineering

Relevance:

60.00%

Publisher:

Abstract:

The need for a convergence between semi-structured data management and Information Retrieval techniques is manifest to the scientific community. To meet this growing demand, W3C has recently proposed XQuery Full Text, an IR-oriented extension of XQuery. However, query optimization requires the study of important properties such as query equivalence and containment; to this end, a formal representation of documents and queries is needed. The goal of this thesis is to establish such a formal background. We define a data model for XML documents and propose an algebra able to represent most XQuery Full-Text expressions. We show how an XQuery Full-Text expression can be translated into an algebraic expression and how an algebraic expression can be optimized.

Relevance:

60.00%

Publisher:

Abstract:

Non-failure analysis aims at inferring that predicate calls in a program will never fail. This type of information has many applications in functional/logic programming. It is essential for determining lower bounds on the computational cost of calls, useful in the context of program parallelization, instrumental in partial evaluation and other program transformations, and has also been used in query optimization. In this paper, we re-cast the non-failure analysis proposed by Debray et al. as an abstract interpretation, which not only allows us to investigate it from a standard and well-understood theoretical framework, but also has several practical advantages. It allows us to incorporate non-failure analysis into a standard, generic abstract interpretation engine. The analysis thus benefits from the fixpoint propagation algorithm, which leads to improved information propagation. Also, the analysis takes advantage of the multi-variance of the generic engine, so that it is now able to infer separate non-failure information for different call patterns. Moreover, the implementation is simpler, and allows us to perform non-failure and covering analyses alongside other analyses, such as those for modes and types, in the same framework. Finally, besides the precision improvements and the additional simplicity, our implementation (in the Ciao/CiaoPP multiparadigm programming system) also shows better efficiency.

Relevance:

60.00%

Publisher:

Abstract:

It is generally recognized that information about the runtime cost of computations can be useful for a variety of applications, including program transformation, granularity control during parallel execution, and query optimization in deductive databases. Most of the work to date on compile-time cost estimation of logic programs has focused on the estimation of upper bounds on costs. However, in many applications, such as parallel implementations on distributed-memory machines, one would prefer to work with lower bounds instead. The problem with estimating lower bounds is that, in general, it is necessary to account for the possibility of failure of head unification, leading to a trivial lower bound of 0. In this paper, we show how, given type and mode information about procedures in a logic program, it is possible to (semi-automatically) derive nontrivial lower bounds on their computational costs. We also discuss the cost analysis for the special and frequent case of divide-and-conquer programs and show how, as a pragmatic short-term solution, it may be possible to obtain useful results simply by identifying and treating divide-and-conquer programs specially.

Relevance:

60.00%

Publisher:

Abstract:

Information about the computational cost of programs is potentially useful for a variety of purposes, including selecting among different algorithms, guiding program transformations, informing granularity control and mapping decisions in parallelizing compilers, and query optimization in deductive databases. Cost analysis of logic programs is complicated by nondeterminism: on the one hand, procedures can return multiple solutions, making it necessary to estimate the number of solutions in order to give nontrivial upper bound cost estimates; on the other hand, the possibility of failure has to be taken into account while estimating lower bounds. Here we discuss techniques to address these problems to some extent.

Relevance:

60.00%

Publisher:

Abstract:

Summarizing topological relations is fundamental to many spatial applications including spatial query optimization. In this article, we present several novel techniques to effectively construct cell-density-based spatial histograms for range (window) summarizations restricted to the four most important level-two topological relations: contains, contained, overlap, and disjoint. We first present a novel framework to construct a multiscale Euler histogram in 2D space with the guarantee of exact summarization results for aligned windows in constant time. To minimize the storage space in such a multiscale Euler histogram, an approximation algorithm with approximation ratio 19/12 is presented, while the problem is shown to be NP-hard in general. To conform to a limited storage space where a multiscale histogram may be allowed to have only k Euler histograms, an effective algorithm is presented to construct multiscale histograms that achieve high accuracy in approximately summarizing aligned windows. Then, we present a new approximate algorithm to query an Euler histogram that cannot guarantee exact answers; it runs in constant time. We also investigate the problem of nonaligned windows and the problem of effectively partitioning the data space to support nonaligned window queries. Finally, we extend our techniques to 3D space. Our extensive experiments against both synthetic and real-world datasets demonstrate that the approximate multiscale histogram techniques may improve the accuracy of the existing techniques by several orders of magnitude while retaining the cost efficiency, and that the exact multiscale histogram technique requires only a storage space linearly proportional to the number of cells for many popular real datasets.
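
As a rough illustration of the Euler histogram idea behind the abstract above (not the authors' multiscale construction), the Python sketch below keeps one counter per grid cell, per interior edge, and per interior vertex. For grid-aligned rectangles, the number of objects intersecting an aligned window is the sum of cell counters minus edge counters plus vertex counters inside the window, because the part of each object that falls inside the window contributes exactly 1 by Euler's formula. The class name, method names, and the restriction to grid-aligned objects are assumptions made for this example. The window sum is computed by direct iteration; precomputed prefix sums could make it constant time.

```python
class EulerHistogram:
    """Toy Euler histogram over an nx-by-ny grid (hypothetical helper class)."""

    def __init__(self, nx, ny):
        self.face = [[0] * ny for _ in range(nx)]
        # vedge[i][j]: edge shared by cells (i, j) and (i + 1, j)
        self.vedge = [[0] * ny for _ in range(nx - 1)]
        # hedge[i][j]: edge shared by cells (i, j) and (i, j + 1)
        self.hedge = [[0] * (ny - 1) for _ in range(nx)]
        # vertex[i][j]: corner shared by cells (i, j) .. (i + 1, j + 1)
        self.vertex = [[0] * (ny - 1) for _ in range(nx - 1)]

    def insert(self, x0, y0, x1, y1):
        """Add an object covering cells x0..x1 by y0..y1 (inclusive)."""
        for i in range(x0, x1 + 1):
            for j in range(y0, y1 + 1):
                self.face[i][j] += 1
                if i < x1:
                    self.vedge[i][j] += 1
                if j < y1:
                    self.hedge[i][j] += 1
                if i < x1 and j < y1:
                    self.vertex[i][j] += 1

    def count_intersecting(self, x0, y0, x1, y1):
        """Number of inserted objects that intersect the aligned window."""
        total = 0
        for i in range(x0, x1 + 1):
            for j in range(y0, y1 + 1):
                total += self.face[i][j]
                if i < x1:
                    total -= self.vedge[i][j]
                if j < y1:
                    total -= self.hedge[i][j]
                if i < x1 and j < y1:
                    total += self.vertex[i][j]
        return total


hist = EulerHistogram(4, 4)
hist.insert(0, 0, 1, 1)
hist.insert(1, 1, 3, 2)
hist.insert(3, 3, 3, 3)
print(hist.count_intersecting(1, 1, 1, 1))
```

This sketch only separates intersecting objects from disjoint ones; distinguishing contains, contained, and overlap, as the abstract describes, needs additional machinery that is not shown here.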

Relevance:

60.00%

Publisher:

Abstract:

Summarizing topological relations is fundamental to many spatial applications including spatial query optimization. In this paper, we present several novel techniques to effectively construct cell-density-based spatial histograms for range (window) summarizations restricted to the four most important topological relations: contains, contained, overlap, and disjoint. We first present a novel framework to construct a multiscale histogram composed of multiple Euler histograms with the guarantee of exact summarization results for aligned windows in constant time. Then we present an approximation algorithm, with approximation ratio 19/12, to minimize the storage space of such multiscale Euler histograms, although the problem is generally NP-hard. To conform to a limited storage space where only k Euler histograms are allowed, an effective algorithm is presented to construct multiscale histograms that achieve high accuracy. Finally, we present a new approximate algorithm to query an Euler histogram that cannot guarantee exact answers; it runs in constant time. Our extensive experiments against both synthetic and real-world datasets demonstrated that the approximate multiscale histogram techniques may improve the accuracy of the existing techniques by several orders of magnitude while retaining the cost efficiency, and that the exact multiscale histogram technique requires only a storage space linearly proportional to the number of cells for the real datasets.

Relevance:

60.00%

Publisher:

Abstract:

The problem of finding the optimal join ordering when executing a query against a relational database management system is a combinatorial optimization problem, which makes deterministic exhaustive solution search unacceptable for queries with a large number of joined relations. In this work, an adaptive genetic algorithm with dynamic population size is proposed for optimizing large join queries. The performance of the algorithm is compared with that of several classical non-deterministic optimization algorithms. Experiments have been performed optimizing several random queries against a randomly generated data dictionary. In a number of test runs, the proposed adaptive genetic algorithm with a probabilistic selection operator outperforms the canonical genetic algorithm with elitist selection as well as two common random search strategies, and proves to be a viable alternative to existing non-deterministic optimization approaches.
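
To make the search space concrete, here is a minimal Python sketch of a plain genetic algorithm over left-deep join orders. It is closer to the canonical elitist baseline the abstract compares against than to the proposed adaptive algorithm with dynamic population size. The cost model (sum of intermediate result sizes), the function names, the crossover and mutation operators, and all parameter values are assumptions chosen for illustration only.

```python
import itertools
import random


def plan_cost(order, card, sel):
    """Toy cost: sum of intermediate result sizes of a left-deep plan."""
    joined = [order[0]]
    size = card[order[0]]
    cost = 0.0
    for rel in order[1:]:
        size *= card[rel]
        for prev in joined:
            # Apply a join predicate's selectivity if one exists (else 1.0).
            size *= sel.get(frozenset((prev, rel)), 1.0)
        joined.append(rel)
        cost += size
    return cost


def order_crossover(a, b, rng):
    """Keep a slice of parent a in place, fill the rest in parent b's order."""
    i, j = sorted(rng.sample(range(len(a)), 2))
    middle = a[i:j + 1]
    rest = [r for r in b if r not in middle]
    return rest[:i] + middle + rest[i:]


def genetic_join_order(card, sel, pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    rels = list(card)
    pop = [rng.sample(rels, len(rels)) for _ in range(pop_size)]

    def key(order):
        return plan_cost(order, card, sel)

    for _ in range(generations):
        pop.sort(key=key)
        survivors = pop[:pop_size // 2]  # elitist truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = order_crossover(a, b, rng)
            if rng.random() < 0.2:  # swap mutation
                i, j = rng.sample(range(len(child)), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = survivors + children
    return min(pop, key=key)


if __name__ == "__main__":
    # Hypothetical cardinalities and join selectivities.
    card = {"A": 1000, "B": 10, "C": 500, "D": 50, "E": 200}
    sel = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.05,
           frozenset(("C", "D")): 0.02, frozenset(("D", "E")): 0.1}
    best = genetic_join_order(card, sel)
    exhaustive = min(itertools.permutations(card),
                     key=lambda p: plan_cost(p, card, sel))
    print(best, plan_cost(best, card, sel), plan_cost(exhaustive, card, sel))
```

With five relations the exhaustive search over all 120 orders is still cheap, which makes it a convenient check; the number of orders grows factorially with the number of relations, which is the motivation for randomized search given in the abstract.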

Relevance:

60.00%

Publisher:

Abstract:

Large read-only or read-write transactions with a large read set and a small write set constitute an important class of transactions used in such applications as data mining, data warehousing, statistical applications, and report generators. Such transactions are best supported with optimistic concurrency, because locking large amounts of data for extended periods of time is not an acceptable solution. The abort rate in regular optimistic concurrency algorithms increases exponentially with the size of the transaction. The algorithm proposed in this dissertation solves this problem by using a new transaction scheduling technique that allows a large transaction to commit safely with a significantly greater probability, which can exceed that of regular optimistic concurrency algorithms by several orders of magnitude. A performance simulation study and a formal proof of serializability and external consistency of the proposed algorithm are also presented. This dissertation also proposes a new query optimization technique (lazy queries). Lazy queries is an adaptive query execution scheme that optimizes itself as the query runs. Lazy queries can be used to find an intersection of sub-queries in a very efficient way, which requires neither full execution of large sub-queries nor any statistical knowledge about the data. An efficient optimistic concurrency control algorithm used in a massively parallel B-tree with variable-length keys is introduced. B-trees with variable-length keys can be effectively used in a variety of database types. In particular, we show how such a B-tree was used in our implementation of a semantic object-oriented DBMS. The concurrency control algorithm uses semantically safe optimistic virtual "locks" that achieve very fine granularity in conflict detection. This algorithm ensures serializability and external consistency by using logical clocks and backward validation of transactional queries. A formal proof of correctness of the proposed algorithm is also presented.
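
The abstract does not spell out how lazy queries work, so the following Python sketch is only a generic illustration of intersecting sub-query results without fully executing the larger one. It assumes each sub-query can be exposed as an iterator that yields keys in ascending order; that assumption and the function name `lazy_intersection` are not taken from the dissertation.

```python
def lazy_intersection(a, b):
    """Yield keys present in both ascending iterators, pulling items on demand."""
    a, b = iter(a), iter(b)
    try:
        x, y = next(a), next(b)
        while True:
            if x == y:
                yield x
                x, y = next(a), next(b)
            elif x < y:
                x = next(a)
            else:
                y = next(b)
    except StopIteration:
        # One side is exhausted, so no further matches are possible.
        return


def large_subquery():
    """Stand-in for an expensive sub-query producing many sorted keys."""
    for key in range(1_000_000):
        yield key


print(list(lazy_intersection([3, 10, 42], large_subquery())))
```

Because the generator stops as soon as the small input is exhausted, the large sub-query is only advanced as far as the largest key of the small one.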

Relevance:

60.00%

Publisher:

Abstract:

In today's fast-paced and interconnected digital world, the data generated by an increasing number of applications is being modeled as dynamic graphs. The graph structure encodes relationships among data items, while the structural changes to the graphs as well as the continuous stream of information produced by the entities in these graphs make them dynamic in nature. Examples include social networks where users post status updates, images, videos, etc.; phone call networks where nodes may send text messages or place phone calls; road traffic networks where the traffic behavior of the road segments changes constantly, and so on. There is tremendous value in storing, managing, and analyzing such dynamic graphs and deriving meaningful insights in real time. However, a majority of the work in graph analytics assumes a static setting, and there is a lack of systematic study of the various dynamic scenarios, the complexity they impose on the analysis tasks, and the challenges in building efficient systems that can support such tasks at a large scale. In this dissertation, I design a unified streaming graph data management framework, and develop prototype systems to support increasingly complex tasks on dynamic graphs. In the first part, I focus on the management and querying of distributed graph data. I develop a hybrid replication policy that monitors the read-write frequencies of the nodes to decide dynamically what data to replicate, and whether to do eager or lazy replication in order to minimize network communication and support low-latency querying. In the second part, I study parallel execution of continuous neighborhood-driven aggregates, where each node aggregates the information generated in its neighborhoods. I build my system around the notion of an aggregation overlay graph, a pre-compiled data structure that enables sharing of partial aggregates across different queries, and also allows partial pre-computation of the aggregates to minimize the query latencies and increase throughput. Finally, I extend the framework to support continuous detection and analysis of activity-based subgraphs, where subgraphs could be specified using both graph structure and activity conditions on the nodes. The query specification tasks in my system are expressed using a set of active structural primitives, which allows the query evaluator to use a set of novel optimization techniques, thereby achieving high throughput. Overall, in this dissertation, I define and investigate a set of novel tasks on dynamic graphs, design scalable optimization techniques, build prototype systems, and show the effectiveness of the proposed techniques through extensive evaluation using large-scale real and synthetic datasets.

Relevance:

40.00%

Publisher:

Abstract:

The development of new technologies that use peer-to-peer networks grows every day, with the aim of meeting the need to share information, resources and database services around the world. Among them are peer-to-peer databases, which take advantage of peer-to-peer networks to manage distributed knowledge bases, allowing the sharing of information that is semantically related but syntactically heterogeneous. However, given the structural characteristics of these networks, it is a challenge to ensure efficient search for information without compromising the autonomy of each node and the flexibility of the network. On the other hand, some studies propose the use of ontology semantics to assign a standardized categorization to information. The main original contribution of this work is to approach this problem with a proposal for query optimization supported by the Ant Colony algorithm and classification through ontologies. The results show that this strategy enables semantic support for searches in peer-to-peer databases, aiming to expand the results without compromising network performance. © 2011 IEEE.

Relevance:

30.00%

Publisher:

Abstract:

Purpose - The purpose of this paper is to identify the most popular techniques used to rank a web page highly in Google. Design/methodology/approach - The paper presents the results of a study into 50 highly optimized web pages that were created as part of a Search Engine Optimization competition. The study focuses on the most popular techniques that were used to rank highest in this competition, and includes an analysis of the use of PageRank, number of pages, number of in-links, domain age and the use of third-party sites such as directories and social bookmarking sites. A separate study was made into 50 non-optimized web pages for comparison. Findings - The paper provides insight into the techniques that successful Search Engine Optimizers use to ensure a page ranks highly in Google. It recognizes the importance of PageRank and links as well as directories and social bookmarking sites. Research limitations/implications - Only the top 50 web sites for a specific query were analyzed. Analysing more web sites and comparing with similar studies in different competitions would provide more concrete results. Practical implications - The paper offers a revealing insight into the techniques used by industry experts to rank highly in Google, and the success or otherwise of those techniques. Originality/value - This paper fulfils an identified need for web sites and e-commerce sites keen to attract a wider web audience.

Relevance:

30.00%

Publisher:

Abstract:

While designing systems and products requires a deep understanding of the influences that achieve desirable performance, the need for an efficient and systematic decision-making approach drives the need for optimization strategies. This paper provides the motivation for this topic as well as a description of applications in the Computing Center of Madrid City Council. Optimization applications can be found in almost all areas of engineering. Typical problems in processes that work with a database arise in query design, entity model design and concurrent processes. This paper proposes a solution to optimize a night process dealing with millions of records, with an overall improvement of about eight times in computation time.

Relevance:

20.00%

Publisher:

Abstract:

Insulin was used as a model protein to develop innovative Solid Lipid Nanoparticles (SLNs) for the delivery of hydrophilic biotech drugs, with potential use in medicinal chemistry. SLNs were prepared by double emulsion with the purpose of promoting stability and enhancing protein bioavailability. Softisan®100 was selected as the solid lipid matrix. The surfactants (Tween®80, Span®80 and Lipoid®S75) and insulin were chosen by applying a 2² factorial design with a triplicate central point, evaluating their influence on dependent variables such as polydispersity index (PI), mean particle size (z-AVE), zeta potential (ZP) and encapsulation efficiency (EE) using the ANOVA test. In addition, thermodynamic stability, polymorphism and matrix crystallinity were checked by Differential Scanning Calorimetry (DSC) and Wide Angle X-ray Diffraction (WAXD), whereas the toxicity of SLNs was checked in HepG2 and Caco-2 cells. Results showed a mean particle size (z-AVE) between 294.6 nm and 627.0 nm, a PI in the range of 0.425-0.750, a ZP of about -3 mV, and an EE between 38.39% and 81.20%. After tempering the bulk lipid (mimicking the end process of production), the lipid showed amorphous characteristics, with a melting point of ca. 30 °C. The toxicity of SLNs was evaluated in two distinct cell lines (HepG2 and Caco-2) and was found to depend on the concentration of particles in HepG2 cells, while no toxicity was reported in Caco-2 cells. SLNs were stable for 24 h in in vitro human serum albumin (HSA) solution. The resulting SLNs fabricated by double emulsion may provide a promising approach for the administration of protein therapeutics and antigens.