93 results for Query Processing
Abstract:
Query processing over the Internet involving autonomous data sources is a major task in data integration. It requires estimating the costs of candidate queries in order to select the one with the minimum cost. In this context, the cost of a query is affected by three factors: network congestion, the server's contention state, and the complexity of the query. In this paper, we study the effects of network congestion and server contention state on the cost of a query, referring to these two factors together as the system contention state. We present a new approach that determines the system contention states by clustering the observed costs of a sample query. For each system contention state, we construct two cost formulas, for unary and join queries respectively, using multiple regression. When a new query is submitted, its system contention state is first estimated using either the time-slides method or the statistical method; the cost of the query is then calculated using the corresponding cost formulas and further adjusted to improve its accuracy. Our experiments show that these methods produce accurate cost estimates for queries submitted to remote data sources over the Internet.
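The abstract gives no code, but the two-stage idea (cluster probe costs into contention states, then fit one regression per state) lends itself to a short illustration. Below is a minimal sketch assuming scikit-learn; the features (result cardinality, operand size), the synthetic training data, and all numbers are illustrative assumptions, not the authors' implementation.

```python
# Sketch of the clustering-plus-regression cost model described above.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Observed costs of one sample query, probed repeatedly over time.
# Clustering these one-dimensional costs reveals the system contention
# states (e.g., lightly vs. heavily loaded periods).
rng = np.random.default_rng(0)
probe_costs = np.concatenate([rng.normal(1.0, 0.1, 50),   # low contention
                              rng.normal(4.0, 0.5, 50)])  # high contention

states = KMeans(n_clusters=2, n_init=10).fit(probe_costs.reshape(-1, 1))

# One regression per contention state: cost ~ f(query complexity features).
# Hypothetical unary-query features: result cardinality and operand size.
models = {}
for s in range(2):
    X = rng.uniform(1e3, 1e6, size=(30, 2))        # (cardinality, tuple count)
    base = probe_costs[states.labels_ == s].mean()
    y = base + 1e-6 * X @ np.array([2.0, 0.5])     # synthetic training costs
    models[s] = LinearRegression().fit(X, y)

def estimate_cost(state, cardinality, tuples):
    """Estimated cost of a unary query under a given contention state."""
    return models[state].predict([[cardinality, tuples]])[0]

print(estimate_cost(0, 5e5, 1e5))
```

At query time, the estimated contention state selects which per-state formula to apply, mirroring the paper's two-phase estimation.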
Abstract:
An RkNN query returns all objects whose k nearest neighbors contain the query object. In this paper, we consider RkNN query processing in the case where the distances between attribute values are not necessarily metric. Dissimilarities between objects may then be a monotonic aggregate of dissimilarities between their values, with the aggregation function specified at query time. We outline real-world cases that motivate RkNN processing in such scenarios. We consider the AL-Tree index and its applicability to RkNN query processing, and develop an approach that exploits the group-level reasoning enabled by the AL-Tree. We evaluate our approach against a naive approach that performs sequential scans on contiguous data, as well as an improved block-based approach that we provide, using real-world datasets and synthetic data with varying characteristics. This extensive empirical evaluation shows that our approach outperforms the existing methods in computational and disk-access costs, leading to significantly better response times.
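To make the query semantics concrete, here is a brute-force sketch in the spirit of the naive baseline the abstract mentions: an object qualifies if the query ranks among its k nearest neighbors under a user-supplied monotonic aggregate of per-attribute dissimilarities. The dataset, the per-attribute dissimilarities, and the aggregate below are illustrative assumptions.

```python
# Brute-force RkNN under query-time, possibly non-metric dissimilarities.

def dissim(a, b, per_attr, aggregate):
    """Aggregate per-attribute dissimilarities (need not be metric)."""
    return aggregate(d(x, y) for d, x, y in zip(per_attr, a, b))

def rknn(query, data, k, per_attr, aggregate=sum):
    result = []
    for o in data:
        d_to_q = dissim(o, query, per_attr, aggregate)
        # Count objects strictly closer to o than the query is.
        closer = sum(
            1 for other in data
            if other is not o and dissim(o, other, per_attr, aggregate) < d_to_q
        )
        if closer < k:          # the query is within o's k nearest neighbors
            result.append(o)
    return result

# Two numeric attributes with different (non-metric) dissimilarities.
per_attr = [lambda x, y: abs(x - y), lambda x, y: (x - y) ** 2]
data = [(1, 2), (2, 3), (8, 9), (9, 8)]
print(rknn((2, 2), data, k=2, per_attr=per_attr))
```

The AL-Tree approach in the paper avoids these O(n²) dissimilarity evaluations by reasoning over groups of objects sharing value prefixes.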
Abstract:
Web databases are now pervasive. Such a database can be accessed only via its query interface, usually an HTML query form. Extracting Web query interfaces, which creates a formal representation of a query form by extracting the set of query conditions it contains, is a critical step in data integration across multiple Web databases. This paper presents a novel approach to extracting Web query interfaces. In this approach, a generic set of query condition rules is created to define query conditions that are semantically equivalent to SQL search conditions. Query condition rules represent the semantic roles that labels and form elements play in query conditions, and how they are hierarchically grouped into the constructs of query conditions. To group labels and form elements in a query form, we exploit both their structural proximity in the hierarchy of structures in the query form, captured by the tree of nested tags in the form's HTML code, and their semantic similarity, captured by the various short texts used in labels, form elements, and their properties. We have implemented the proposed approach, and our experimental results show that it is highly effective.
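One ingredient of such extraction, structural proximity in the tag tree, can be illustrated with a crude stand-in: pair each form field with the label whose lowest common ancestor in the HTML tree is deepest. This sketch assumes BeautifulSoup and a hypothetical two-row form; it omits the paper's condition rules and semantic-similarity signals entirely.

```python
# Pair labels with form fields by structural proximity in the tag tree.
from bs4 import BeautifulSoup

html = """
<form>
  <table>
    <tr><td>Author</td><td><input name="au" type="text"></td></tr>
    <tr><td>Year</td><td><select name="yr"><option>2024</option></select></td></tr>
  </table>
</form>
"""
soup = BeautifulSoup(html, "html.parser")

def ancestors(tag):
    """Ancestor chain from the document root down to the tag."""
    return list(reversed(list(tag.parents)))

def proximity(label, field):
    """Depth of the lowest common ancestor: deeper means structurally closer."""
    return sum(1 for a, b in zip(ancestors(label), ancestors(field)) if a is b)

fields = soup.find_all(["input", "select"])
labels = [td for td in soup.find_all("td")
          if not td.find(["input", "select"]) and td.get_text(strip=True)]

for f in fields:
    best = max(labels, key=lambda l: proximity(l, f))
    print(best.get_text(strip=True), "->", f.get("name"))   # Author -> au, Year -> yr
```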
Abstract:
We propose a novel admission control policy for database queries. Our methodology uses system measurements of CPU utilization and query backlogs to determine interference between queries executing on the same database server. Query interference may arise from concurrent access to hardware and software resources and can affect performance both positively and negatively. Specifically, our admission control considers the mix of jobs in service and prioritizes the query classes that consume CPU resources most efficiently. The policy ignores I/O subsystems and is therefore well suited to in-memory databases. We validate our approach in trace-driven simulation and show improvements in query slowdowns and throughput compared to first-come-first-served and shortest-expected-processing-time-first scheduling. The simulation experiments are parameterized from system traces of a SAP HANA in-memory database installation running TPC-H-type workloads.
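A small sketch can convey the mix-aware flavor of such a policy: admit the backlogged query class with the best measured CPU efficiency, subject to a utilization cap. The classes, efficiency numbers, and threshold below are illustrative assumptions, not the paper's actual policy.

```python
# Mix-aware admission control: prefer CPU-efficient backlogged classes.
from collections import deque

CPU_CAP = 0.85                            # don't admit beyond this utilization

class QueryClass:
    def __init__(self, name, cpu_per_query, efficiency):
        self.name = name
        self.cpu_per_query = cpu_per_query   # measured CPU share per query
        self.efficiency = efficiency         # useful work per CPU-second
        self.backlog = deque()

def admit_next(classes, cpu_utilization):
    """Pick the most CPU-efficient backlogged class that fits under the cap."""
    candidates = [c for c in classes
                  if c.backlog and cpu_utilization + c.cpu_per_query <= CPU_CAP]
    if not candidates:
        return None                       # hold everything back: server saturated
    return max(candidates, key=lambda c: c.efficiency)

short = QueryClass("short-scan", cpu_per_query=0.05, efficiency=9.0)
heavy = QueryClass("heavy-join", cpu_per_query=0.30, efficiency=2.5)
short.backlog.extend(["q1", "q2"]); heavy.backlog.append("q3")

chosen = admit_next([short, heavy], cpu_utilization=0.70)
if chosen:
    print("admit", chosen.backlog.popleft(), "from", chosen.name)
```

Unlike FCFS or shortest-expected-processing-time-first, the decision here depends on the current mix and measured efficiency rather than arrival order or size alone.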
Abstract:
With the proliferation of geo-positioning and geo-tagging techniques, spatio-textual objects that possess both a geographical location and a textual description are gaining in prevalence, and spatial keyword queries that exploit both location and textual description are gaining in prominence. However, the queries studied so far generally focus on finding individual objects that each satisfy a query rather than finding groups of objects where the objects in a group together satisfy a query.
We define the problem of retrieving a group of spatio-textual objects such that the group's keywords cover the query's keywords, the objects are nearest to the query location, and the inter-object distances are smallest. Specifically, we study three instantiations of this problem, all of which are NP-hard. We devise exact solutions as well as approximate solutions with provable approximation bounds. In addition, we solve the problem of retrieving the top-k groups for each of the three instantiations, and study a weighted version of the problem that incorporates object weights. We present empirical studies that offer insight into the efficiency of the solutions, as well as the accuracy of the approximate solutions.
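Since the problems are NP-hard, a greedy heuristic conveys the flavor of the approximate solutions: repeatedly pick the object with the lowest cost per newly covered keyword, where cost blends distance to the query point and spread relative to the group so far. The data, the cost weighting, and the heuristic itself are illustrative assumptions, not the paper's bounded algorithms.

```python
# Greedy keyword-cover sketch for group spatio-textual retrieval.
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def greedy_group(q_loc, q_keywords, objects, alpha=0.5):
    uncovered, group = set(q_keywords), []
    while uncovered:
        def score(o):
            loc, kws = o
            gain = len(uncovered & kws)
            if gain == 0:
                return math.inf
            spread = max((dist(loc, g[0]) for g in group), default=0.0)
            # Cost per newly covered keyword: query distance + group spread.
            return (alpha * dist(loc, q_loc) + (1 - alpha) * spread) / gain
        best = min(objects, key=score)
        if score(best) == math.inf:
            return None               # the query keywords cannot be covered
        group.append(best)
        uncovered -= best[1]
    return group

objects = [((1, 1), {"coffee"}), ((1, 2), {"wifi"}), ((9, 9), {"coffee", "wifi"})]
print(greedy_group((0, 0), {"coffee", "wifi"}, objects))
```

Here the nearby pair is preferred over the single distant object that covers both keywords, reflecting the tension between proximity to the query and inter-object compactness.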
Abstract:
Massive amounts of data that are geo-tagged and associated with text information are being generated at an unprecedented scale. These geo-textual data cover a wide range of topics. Users are interested in receiving up-to-date geo-textual objects (e.g., geo-tagged tweets) whose locations meet their needs and whose texts they find interesting. For example, a user may want to be updated with tweets near her home on the topic “dengue fever headache.” In this demonstration, we present SOPS, the Spatial-Keyword Publish/Subscribe System, which is capable of efficiently processing spatial keyword continuous queries. SOPS supports two types of queries: (1) the Boolean Range Continuous (BRC) query, which subscribes to geo-textual objects that satisfy a boolean keyword expression and fall in a specified spatial region; and (2) the Temporal Spatial-Keyword Top-k Continuous (TaSK) query, which continuously maintains up-to-date top-k most relevant results over a stream of geo-textual objects. SOPS enables users to formulate their queries and view real-time results over a stream of geo-textual objects through browser-based user interfaces. On the server side, we propose solutions to efficiently process a large number of BRC queries (tens of millions) and TaSK queries over a stream of geo-textual objects.
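The BRC matching semantics can be shown in a few lines: each subscription pairs a conjunctive keyword set with a rectangular region, and every incoming geo-textual object is delivered to the subscriptions it satisfies. SOPS indexes tens of millions of such queries; the linear scan below only illustrates the semantics, and all names and data are assumptions.

```python
# Boolean Range Continuous (BRC) matching over a geo-textual stream.
from dataclasses import dataclass, field

@dataclass
class BRCQuery:
    keywords: frozenset            # conjunction: all keywords must appear
    region: tuple                  # (min_x, min_y, max_x, max_y)
    matches: list = field(default_factory=list)

def publish(obj_loc, obj_text, queries):
    x, y = obj_loc
    terms = set(obj_text.lower().split())
    for q in queries:
        min_x, min_y, max_x, max_y = q.region
        if min_x <= x <= max_x and min_y <= y <= max_y and q.keywords <= terms:
            q.matches.append((obj_loc, obj_text))

queries = [BRCQuery(frozenset({"dengue", "fever"}), (0, 0, 10, 10))]
publish((3, 4), "Dengue fever reported near the river", queries)
publish((30, 4), "dengue fever outside the region", queries)
print(queries[0].matches)          # only the first object matches
```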
Abstract:
A search query, being a very concise grounding of user intent, can have many possible interpretations. Search engines hedge their bets by diversifying top results to cover multiple such possibilities, so that the user is likely to be satisfied whatever her intended interpretation. Diversified query expansion is the problem of diversifying query expansion suggestions so that the user can specialize the query to better suit her intent even before perusing search results. We propose a method, Select-Link-Rank (SLR), that exploits semantic information from Wikipedia to generate diversified query expansions. SLR performs collective processing of terms and Wikipedia entities in an integrated framework, simultaneously diversifying query expansions and entity recommendations. SLR starts by selecting informative terms from the search results of the initial query, links them to Wikipedia entities, performs diversity-conscious entity scoring, and transfers that scoring to the term space to arrive at query expansion suggestions. Through an extensive empirical analysis and a user study, we show that our method outperforms state-of-the-art diversified query expansion and diversified entity recommendation techniques.
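The final, diversity-conscious step can be illustrated with an MMR-style greedy selection: pick expansion terms that are relevant yet linked to entities not already covered. The candidate terms, relevance scores, and entity links below are illustrative assumptions in the spirit of SLR, not the method itself.

```python
# Greedy diversity-conscious selection of query expansion terms.
def diversified_expansions(candidates, k, lam=0.7):
    """candidates: list of (term, relevance, linked_entities)."""
    chosen, covered = [], set()
    while candidates and len(chosen) < k:
        def mmr(c):
            term, rel, ents = c
            # Novelty: fraction of the term's entities not yet covered.
            novelty = len(ents - covered) / max(len(ents), 1)
            return lam * rel + (1 - lam) * novelty
        best = max(candidates, key=mmr)
        candidates.remove(best)
        chosen.append(best[0])
        covered |= best[2]
    return chosen

candidates = [
    ("python snake",    0.9, {"Pythonidae"}),
    ("python boa",      0.8, {"Pythonidae"}),
    ("python language", 0.7, {"Python_(programming_language)"}),
    ("monty python",    0.6, {"Monty_Python"}),
]
print(diversified_expansions(candidates, k=3))
```

With the diversity term active, the second snake-related suggestion is skipped in favor of expansions linked to uncovered entities, which is the intended effect of transferring entity-level diversity back to the term space.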
Abstract:
Poly(L-lactide) (PLLA) is one of the most significant members of a group of polymers regarded as bioabsorbable. Degradation of PLLA proceeds through hydrolysis of the ester bonds in the polymer chains and is influenced significantly by the polymer's molecular weight and crystallinity. To evaluate the effects of processing and sterilisation on these properties, PLLA pellets were either compression moulded or extruded, annealed at 120°C for 4 h, and sterilised with ethylene oxide (EtO) gas. The mechanical properties, molecular weight and crystallinity were evaluated at each stage. Upon processing, the crystallinity of the material fell from 61% for the PLLA pellets to 12% and 20% for the compression-moulded and extruded components, respectively. After annealing, crystallinity increased to 43% for the compression-moulded material and 40% for the extruded material, and it increased further upon EtO sterilisation. A slight decrease in molecular weight was observed for the extruded material through processing, annealing and sterilisation. Young's modulus generally increased with increasing crystallinity, while extension at break and tensile strength decreased. These results suggest that PLLA is sensitive to processing and sterilisation, which alter properties critical to its degradation rate.