930 results for Similarity queries
Abstract:
The development of an information retrieval interface for online public access catalogues (CDS/ISIS platform) is presented, based on the concept of similarity, to generate search results ranked by likely relevance. The theoretical foundations involved are set out, followed by a detailed account of the technological implementation, made explicit at the programming level. Finally, the implementation problems posed by the environment are outlined.
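A minimal sketch of the similarity-based ranking idea described above, assuming a simple cosine measure over the text of catalogue records; the function names and the choice of measure are illustrative and not the interface's actual CDS/ISIS implementation:

    import math
    from collections import Counter

    def cosine_similarity(query: str, record_text: str) -> float:
        # Score a catalogue record against a query by cosine similarity of term counts.
        q = Counter(query.lower().split())
        r = Counter(record_text.lower().split())
        dot = sum(q[t] * r[t] for t in q)
        norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in r.values()))
        return dot / norm if norm else 0.0

    def rank_records(query: str, records: list) -> list:
        # Order catalogue records by estimated relevance (highest similarity first).
        return sorted(records, key=lambda rec: cosine_similarity(query, rec), reverse=True)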
Abstract:
The data structure of an information system can significantly impact the ability of end users to efficiently and effectively retrieve the information they need. This research develops a methodology for evaluating, ex ante, the relative desirability of alternative data structures for end user queries. This research theorizes that the data structure that yields the lowest weighted average complexity for a representative sample of information requests is the most desirable data structure for end user queries. The theory was tested in an experiment that compared queries from two different relational database schemas. As theorized, end users querying the data structure associated with the less complex queries performed better. Complexity was measured using three different Halstead metrics. Each of the three metrics provided excellent predictions of end user performance. This research supplies strong evidence that organizations can use complexity metrics to evaluate, ex ante, the desirability of alternate data structures. Organizations can use these evaluations to enhance the efficient and effective retrieval of information by creating data structures that minimize end user query complexity.
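Halstead's metrics are defined from operator and operand counts (volume V = N log2 n, difficulty D = (n1/2)(N2/n2), effort E = D * V). The sketch below applies them to a SQL query string; the token classification is a deliberate simplification, since the study's exact counting rules are not given in the abstract:

    import math
    import re

    # Treat SQL keywords, comparison operators and punctuation as Halstead "operators";
    # every other token (identifiers, numbers, string literals) counts as an "operand".
    OPERATORS = {"select", "from", "where", "join", "on", "and", "or", "group", "by",
                 "order", "having", "=", "<", ">", "<=", ">=", "<>", ",", "(", ")", "*"}

    def halstead(query: str) -> dict:
        tokens = re.findall(r"'[^']*'|[A-Za-z_][A-Za-z_0-9]*|\d+|[<>=]+|[(),*]", query.lower())
        ops = [t for t in tokens if t in OPERATORS]
        operands = [t for t in tokens if t not in OPERATORS]
        n1, n2 = len(set(ops)), len(set(operands))
        N1, N2 = len(ops), len(operands)
        n, N = n1 + n2, N1 + N2
        volume = N * math.log2(n) if n > 1 else 0.0
        difficulty = (n1 / 2) * (N2 / n2) if n2 else 0.0
        return {"volume": volume, "difficulty": difficulty, "effort": volume * difficulty}

    print(halstead("SELECT name FROM customers WHERE region = 'EU' AND sales > 100"))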
Abstract:
Applications of the axisymmetric Boussinesq equation to groundwater hydrology and reservoir engineering have long been recognised. An archetypal example is invasion by drilling fluid into a permeable bed where there is initially no such fluid present, a circumstance of some importance in the oil industry. It is well known that the governing Boussinesq model can be reduced to a nonlinear ordinary differential equation using a similarity variable, a transformation that is valid for a certain time-dependent flux at the origin. Here, a new analytical approximation is obtained for this case. The new solution, which has a simple form, is demonstrated to be highly accurate. (c) 2005 Elsevier Ltd. All rights reserved.
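A sketch of the kind of similarity reduction referred to above, written for a common dimensionless radial form of the Boussinesq equation; the paper's exact scaling, exponents and boundary conditions may differ:

    \[
    \frac{\partial h}{\partial t}
      = \frac{1}{r}\frac{\partial}{\partial r}\!\left( r\, h\, \frac{\partial h}{\partial r} \right),
    \qquad
    h(r,t) = t^{a} f(\eta), \quad \eta = r\, t^{-(a+1)/2},
    \]
    which reduces the partial differential equation to the ordinary differential equation
    \[
    a f - \frac{a+1}{2}\,\eta\, f' = \frac{1}{\eta}\left( \eta\, f f' \right)',
    \]
    with the corresponding flux at the origin proportional to \( t^{2a} \).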
Abstract:
With the rapid increase in both centralized video archives and distributed WWW video resources, content-based video retrieval is gaining importance. To support such applications efficiently, content-based video indexing must be addressed. Typically, each video is represented by a sequence of frames. Due to the high dimensionality of frame representation and the large number of frames, video indexing introduces an additional degree of complexity. In this paper, we address the problem of content-based video indexing and propose an efficient solution, called the Ordered VA-File (OVA-File), based on the VA-file. OVA-File is a hierarchical structure and has two novel features: 1) partitioning the whole file into slices such that only a small number of slices are accessed and checked during k Nearest Neighbor (kNN) search, and 2) efficient handling of insertions of new vectors into the OVA-File, such that the average distance between the new vectors and the approximations near that position is minimized. To facilitate search, we present an efficient approximate kNN algorithm named Ordered VA-LOW (OVA-LOW) based on the proposed OVA-File. OVA-LOW first chooses candidate OVA-Slices by ranking the distances between their corresponding centers and the query vector, and then visits all approximations in the selected OVA-Slices to compute the approximate kNN. The number of candidate OVA-Slices is controlled by a user-defined parameter delta. By adjusting delta, OVA-LOW provides a trade-off between query cost and result quality. Querying by a video clip consisting of multiple frames is also discussed. Extensive experimental studies using real video data sets were conducted, and the results showed that our methods yield a significant speed-up over an existing VA-file-based method and iDistance, with high query result quality. Furthermore, by incorporating the temporal correlation of video content, our methods achieved much more efficient performance.
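A highly simplified sketch of the OVA-LOW search pattern described above: slices are ranked by the distance from their centres to the query, and only the delta closest slices are scanned. A real OVA-File holds quantised vector approximations rather than the raw vectors used here:

    import heapq
    import numpy as np

    def ova_low_knn(query, slices, k, delta):
        # Each slice is modelled as {"center": ndarray, "vectors": list of ndarray};
        # rank slices by centre-to-query distance, then scan only the delta closest ones.
        ranked = sorted(slices, key=lambda s: np.linalg.norm(s["center"] - query))
        candidates = []
        for s in ranked[:delta]:
            for vec in s["vectors"]:
                candidates.append((float(np.linalg.norm(vec - query)), vec))
        # Return the k nearest candidates found in the visited slices (an approximate kNN).
        return heapq.nsmallest(k, candidates, key=lambda c: c[0])

Increasing delta visits more slices, trading higher query cost for better result quality, which mirrors the trade-off described in the abstract.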
Abstract:
Quantile computation has many applications including data mining and financial data analysis. It has been shown that an ε-approximate summary can be maintained so that, given a quantile query (φ, ε), the data item at rank ⌈φN⌉ may be approximately obtained within the rank error precision εN over all N data items in a data stream or in a sliding window. However, scalable online processing of massive continuous quantile queries with different φ and ε poses a new challenge because the summary is continuously updated with new arrivals of data items. In this paper, first we aim to dramatically reduce the number of distinct query results by grouping a set of different queries into a cluster so that they can be processed virtually as a single query while the precision requirements from users can be retained. Second, we aim to minimize the total query processing costs. Efficient algorithms are developed to minimize the total number of cluster reprocessings and to produce the minimum number of clusters, respectively. The techniques are extended to maintain near-optimal clustering when queries are registered and removed in an arbitrary fashion against whole data streams or sliding windows. In addition to theoretical analysis, our performance study indicates that the proposed techniques are indeed scalable with respect to the number of input queries as well as the number of items and the item arrival rate in a data stream.
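One natural way to group such (φ, ε) queries, sketched below, is to treat each query as the rank-fraction interval [φ - ε, φ + ε] and greedily merge queries whose intervals share a common point, so that each cluster can be answered by a single representative rank; this interval-stabbing greedy only illustrates the clustering idea and is not necessarily the paper's algorithm:

    def cluster_quantile_queries(queries):
        # queries: list of (phi, eps) pairs; each defines the interval [phi - eps, phi + eps].
        intervals = sorted((phi - eps, phi + eps) for phi, eps in queries)
        clusters = []          # one representative rank fraction per cluster
        current_right = None
        for lo, hi in intervals:
            if current_right is None or lo > current_right:
                clusters.append(hi)            # start a new cluster
                current_right = hi
            else:
                current_right = min(current_right, hi)
                clusters[-1] = current_right   # tighten the cluster's common point
        return clusters

    # Example: the first two queries overlap and collapse into one cluster.
    print(cluster_quantile_queries([(0.50, 0.01), (0.505, 0.01), (0.90, 0.005)]))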
Abstract:
Scorpion toxins are important experimental tools for the characterization of a vast array of ion channels and serve as scaffolds for drug design. Entries in general public databases contain limited annotation, so the rich structure-function information from mutation studies is typically not available. SCORPION2 contains more than 800 records of native and mutant toxin sequences enriched with binding affinity and toxicity information, 624 three-dimensional structures and some 500 references. SCORPION2 has a set of search and prediction tools that allow users to extract and perform specific queries: text searches of scorpion toxin records, sequence similarity search, extraction of sequences, visualization of scorpion toxin structures, analysis of toxic activity, and functional annotation of previously uncharacterized scorpion toxins. The SCORPION2 database is available at http://sdmc.i2r.a-star.edu.sg/scorpion/. (c) 2006 Elsevier Ltd. All rights reserved.
Abstract:
One way to achieve the large sample sizes required for genetic studies of complex traits is to combine samples collected by different groups. It is not often clear, however, whether this practice is reasonable from a genetic perspective. To assess the comparability of samples from the Australian and the Netherlands twin studies, we estimated F_ST (the proportion of total genetic variability attributable to genetic differences between cohorts) based on 359 short tandem repeat polymorphisms in 1068 individuals. F_ST was estimated to be 0.30% between the Australian and the Netherlands cohorts, a smaller value than between many European groups. We conclude that it is reasonable to combine the Australian and the Netherlands samples for joint genetic analyses.
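For reference, in its simplest variance-components form the quantity being estimated is (the study may use a specific estimator, which the abstract does not name):

    \[
    F_{ST} = \frac{\sigma^{2}_{\text{between}}}{\sigma^{2}_{\text{between}} + \sigma^{2}_{\text{within}}},
    \]

so the reported 0.30% means that only about 0.003 of the total genetic variance lies between the two cohorts.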
Abstract:
Humans play a role in deciding the fate of species in the current extinction wave. Because of the Similarity Principle, physical attractiveness and likeability, it has been argued that public choice favours the survival of species that satisfy these criteria at the expense of other species. This paper empirically tests this argument by considering a hypothetical ‘Ark’ situation. Surveys of 204 members of the Australian public asked whether they were in favour of the survival of each of 24 native mammal, bird and reptile species (both before and after information was provided about each species). The species were ranked by the percentage of ‘yes’ votes received, and the species composition by taxon in various fractions of the ranking was determined. If the Similarity Principle holds, mammals should rank highly and dominate the top fractions of animals saved in the hierarchical list. We find that although mammals would be over-represented in the ‘Ark’, birds and reptiles are unlikely to be excluded when social choice is based on the numbers ‘voting’ for the survival of each species. Support for the Similarity Principle is apparent, particularly after information provision. Public policy implications are noted and recommendations are given.
Abstract:
In this paper, we propose a novel high-dimensional index method, the BM+-tree, to support efficient processing of similarity search queries in high-dimensional spaces. The main idea of the proposed index is to improve data partitioning efficiency in a high-dimensional space by using a rotary binary hyperplane, which further partitions a subspace and can also take advantage of the twin-node concept used in the M+-tree. Compared with the key dimension concept in the M+-tree, the binary hyperplane is more effective in data filtering. High space utilization is achieved by dynamically performing data reallocation between twin nodes. In addition, a post-processing step is used after index building to ensure effective filtering. Experimental results using two types of real data sets illustrate a significantly improved filtering efficiency.
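A minimal sketch of binary-hyperplane partitioning, the general idea behind the rotary hyperplane above; the BM+-tree's actual construction of the hyperplane, its twin-node reallocation and its filtering bounds are more involved:

    import numpy as np

    def split_by_hyperplane(vectors, normal, offset):
        # Partition a node's vectors into two halves by the sign of (normal . x - offset).
        side = vectors @ normal - offset
        return vectors[side <= 0], vectors[side > 0]

    rng = np.random.default_rng(0)
    points = rng.random((1000, 16))                    # 16-dimensional data
    normal = rng.standard_normal(16)
    normal /= np.linalg.norm(normal)
    offset = float(np.median(points @ normal))         # median offset gives a balanced split
    left, right = split_by_hyperplane(points, normal, offset)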
Abstract:
Document ranking is an important process in information retrieval (IR). It presents retrieved documents in an order of their estimated degrees of relevance to the query. Traditional document ranking methods are mostly based on similarity computations between documents and the query. In this paper we argue that similarity-based document ranking is insufficient in some cases, for two reasons. The first is the increased variety of information: there are far too many different types of documents available now for users to search. The second is the variety of users: in many cases users may want to retrieve documents that are not only similar but also general or broad regarding a certain topic. This is particularly the case in some domains such as bio-medical IR. In this paper we propose a novel approach that re-ranks the retrieved documents by combining their similarity with their generality. By an ontology-based analysis of the semantic cohesion of text, document generality can be quantified. The retrieved documents are then re-ranked by their combined scores of similarity and the closeness of the documents’ generality to the query’s. Our experiments have shown encouraging performance on a large bio-medical document collection, OHSUMED, containing 348,566 medical journal references and 101 test queries.
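A minimal sketch of the re-ranking step described above, assuming similarity and generality scores already normalised to [0, 1]; the linear combination and the weight alpha are illustrative, since the abstract does not give the exact combination function:

    def rerank(docs, query_generality, alpha=0.5):
        # Combine query-document similarity with the closeness of each document's
        # generality to the query's generality, then sort by the combined score.
        def combined(doc):
            closeness = 1.0 - abs(doc["generality"] - query_generality)
            return alpha * doc["similarity"] + (1.0 - alpha) * closeness
        return sorted(docs, key=combined, reverse=True)

    docs = [{"id": "d1", "similarity": 0.82, "generality": 0.30},
            {"id": "d2", "similarity": 0.78, "generality": 0.55}]
    print(rerank(docs, query_generality=0.50))   # d2 ranks first once generality is considered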
Abstract:
Many queries sent to search engines refer to specific locations in the world. Location-based queries try to find local services and facilities around the user’s environment or in a particular area. This paper reviews the specifications of geospatial queries and discusses the similarities and differences between location-based queries and other queries. We introduce nine patterns for location-based queries containing either a service name alone or a service name accompanied by a location name. Our survey indicates that at least 22% of Web queries have a geospatial dimension and most of these can be considered location-based queries. We propose that location-based queries should be treated differently from general queries to produce more relevant results.
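A sketch of recognising one of the simplest location-based forms, "<service> in <location>"; the nine patterns themselves are not enumerated in the abstract, and the tiny gazetteer below is a hypothetical stand-in for a real list of place names:

    import re

    GAZETTEER = {"sydney", "melbourne", "amsterdam"}          # hypothetical place list
    PATTERN = re.compile(r"^(?P<service>.+?)\s+(?:in|near|around)\s+(?P<location>.+)$", re.I)

    def parse_location_query(query: str):
        # Return the service/location split if the query matches the pattern and the
        # location is a known place name; otherwise treat it as a general query.
        m = PATTERN.match(query.strip())
        if m and m.group("location").lower() in GAZETTEER:
            return {"service": m.group("service"), "location": m.group("location")}
        return None

    print(parse_location_query("24 hour pharmacy in Sydney"))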