924 resultados para XML, Information, Retrieval, Query, Language


Relevância:

40.00% 40.00%

Publicador:

Resumo:

Integrating information in the molecular biosciences involves more than the cross-referencing of sequences or structures. Experimental protocols, results of computational analyses, annotations and links to relevant literature form integral parts of this information, and impart meaning to sequence or structure. In this review, we examine some existing approaches to integrating information in the molecular biosciences. We consider not only technical issues concerning the integration of heterogeneous data sources and the corresponding semantic implications, but also the integration of analytical results. Within the broad range of strategies for integration of data and information, we distinguish between platforms and developments. We discuss two current platforms and six current developments, and identify what we believe to be their strengths and limitations. We identify key unsolved problems in integrating information in the molecular biosciences, and discuss possible strategies for addressing them including semantic integration using ontologies, XML as a data model, and graphical user interfaces as integrative environments.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Multiresolution Triangular Mesh (MTM) models are widely used to improve the performance of large terrain visualization by replacing the original model with a simplified one. MTM models, which consist of both original and simplified data, are commonly stored in spatial database systems due to their size. The relatively slow access speed of disks makes data retrieval the bottleneck of such terrain visualization systems. Existing spatial access methods proposed to address this problem rely on main-memory MTM models, which leads to significant overhead during query processing. In this paper, we approach the problem from a new perspective and propose a novel MTM called direct mesh that is designed specifically for secondary storage. It supports available indexing methods natively and requires no modification to MTM structure. Experiment results, which are based on two real-world data sets, show an average performance improvement of 5-10 times over the existing methods.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

With the rapid increase in both centralized video archives and distributed WWW video resources, content-based video retrieval is gaining its importance. To support such applications efficiently, content-based video indexing must be addressed. Typically, each video is represented by a sequence of frames. Due to the high dimensionality of frame representation and the large number of frames, video indexing introduces an additional degree of complexity. In this paper, we address the problem of content-based video indexing and propose an efficient solution, called the Ordered VA-File (OVA-File) based on the VA-file. OVA-File is a hierarchical structure and has two novel features: 1) partitioning the whole file into slices such that only a small number of slices are accessed and checked during k Nearest Neighbor (kNN) search and 2) efficient handling of insertions of new vectors into the OVA-File, such that the average distance between the new vectors and those approximations near that position is minimized. To facilitate a search, we present an efficient approximate kNN algorithm named Ordered VA-LOW (OVA-LOW) based on the proposed OVA-File. OVA-LOW first chooses possible OVA-Slices by ranking the distances between their corresponding centers and the query vector, and then visits all approximations in the selected OVA-Slices to work out approximate kNN. The number of possible OVA-Slices is controlled by a user-defined parameter delta. By adjusting delta, OVA-LOW provides a trade-off between the query cost and the result quality. Query by video clip consisting of multiple frames is also discussed. Extensive experimental studies using real video data sets were conducted and the results showed that our methods can yield a significant speed-up over an existing VA-file-based method and iDistance with high query result quality. Furthermore, by incorporating temporal correlation of video content, our methods achieved much more efficient performance.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In many advanced applications, data are described by multiple high-dimensional features. Moreover, different queries may weight these features differently; some may not even specify all the features. In this paper, we propose our solution to support efficient query processing in these applications. We devise a novel representation that compactly captures f features into two components: The first component is a 2D vector that reflects a distance range ( minimum and maximum values) of the f features with respect to a reference point ( the center of the space) in a metric space and the second component is a bit signature, with two bits per dimension, obtained by analyzing each feature's descending energy histogram. This representation enables two levels of filtering: The first component prunes away points that do not share similar distance ranges, while the bit signature filters away points based on the dimensions of the relevant features. Moreover, the representation facilitates the use of a single index structure to further speed up processing. We employ the classical B+-tree for this purpose. We also propose a KNN search algorithm that exploits the access orders of critical dimensions of highly selective features and partial distances to prune the search space more effectively. Our extensive experiments on both real-life and synthetic data sets show that the proposed solution offers significant performance advantages over sequential scan and retrieval methods using single and multiple VA-files.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this paper, we compare a well-known semantic spacemodel, Latent Semantic Analysis (LSA) with another model, Hyperspace Analogue to Language (HAL) which is widely used in different area, especially in automatic query refinement. We conduct this comparative analysis to prove our hypothesis that with respect to ability of extracting the lexical information from a corpus of text, LSA is quite similar to HAL. We regard HAL and LSA as black boxes. Through a Pearsonrsquos correlation analysis to the outputs of these two black boxes, we conclude that LSA highly co-relates with HAL and thus there is a justification that LSA and HAL can potentially play a similar role in the area of facilitating automatic query refinement. This paper evaluates LSA in a new application area and contributes an effective way to compare different semantic space models.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A progressive spatial query retrieves spatial data based on previous queries (e.g., to fetch data in a more restricted area with higher resolution). A direct query, on the other side, is defined as an isolated window query. A multi-resolution spatial database system should support both progressive queries and traditional direct queries. It is conceptually challenging to support both types of query at the same time, as direct queries favour location-based data clustering, whereas progressive queries require fragmented data clustered by resolutions. Two new scaleless data structures are proposed in this paper. Experimental results using both synthetic and real world datasets demonstrate that the query processing time based on the new multiresolution approaches is comparable and often better than multi-representation data structures for both types of queries.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A k-NN query finds the k nearest-neighbors of a given point from a point database. When it is sufficient to measure object distance using the Euclidian distance, the key to efficient k-NN query processing is to fetch and check the distances of a minimum number of points from the database. For many applications, such as vehicle movement along road networks or rover and animal movement along terrain surfaces, the distance is only meaningful when it is along a valid movement path. For this type of k-NN queries, the focus of efficient query processing is to minimize the cost of computing distances using the environment data (such as the road network data and the terrain data), which can be several orders of magnitude larger than that of the point data. Efficient processing of k-NN queries based on the Euclidian distance or the road network distance has been investigated extensively in the past. In this paper, we investigate the problem of surface k-NN query processing, where the distance is calculated from the shortest path along a terrain surface. This problem is very challenging, as the terrain data can be very large and the computational cost of finding shortest paths is very high. We propose an efficient solution based on multiresolution terrain models. Our approach eliminates the need of costly process of finding shortest paths by ranking objects using estimated lower and upper bounds of distance on multiresolution terrain models.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Music similarity query based on acoustic content is becoming important with the ever-increasing growth of the music information from emerging applications such as digital libraries and WWW. However, relative techniques are still in their infancy and much less than satisfactory. In this paper, we present a novel index structure, called Composite Feature tree, CF-tree, to facilitate efficient content-based music search adopting multiple musical features. Before constructing the tree structure, we use PCA to transform the extracted features into a new space sorted by the importance of acoustic features. The CF-tree is a balanced multi-way tree structure where each level represents the data space at different dimensionalities. The PCA transformed data and reduced dimensions in the upper levels can alleviate suffering from dimensionality curse. To accurately mimic human perception, an extension, named CF+-tree, is proposed, which further applies multivariable regression to determine the weight of each individual feature. We conduct extensive experiments to evaluate the proposed structures against state-of-art techniques. The experimental results demonstrate superiority of our technique.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Retrieving large amounts of information over wide area networks, including the Internet, is problematic due to issues arising from latency of response, lack of direct memory access to data serving resources, and fault tolerance. This paper describes a design pattern for solving the issues of handling results from queries that return large amounts of data. Typically these queries would be made by a client process across a wide area network (or Internet), with one or more middle-tiers, to a relational database residing on a remote server. The solution involves implementing a combination of data retrieval strategies, including the use of iterators for traversing data sets and providing an appropriate level of abstraction to the client, double-buffering of data subsets, multi-threaded data retrieval, and query slicing. This design has recently been implemented and incorporated into the framework of a commercial software product developed at Oracle Corporation.