20 results for Similarity Query
in Aston University Research Archive
Abstract:
In April 2009, Google Images added a filter for narrowing search results by colour. Several other systems for searching image databases by colour were also released around this time. These colour-based image retrieval systems enable users to search image databases either by selecting colours from a graphical palette (i.e., query-by-colour), by drawing a representation of the colour layout sought (i.e., query-by-sketch), or both. It was comments left by readers of online articles describing these colour-based image retrieval systems that provided us with the inspiration for this research. We were surprised to learn that the underlying query-based technology used in colour-based image retrieval systems today remains remarkably similar to that of systems developed nearly two decades ago. Discovering this ageing retrieval approach, as well as uncovering a large user demographic requiring image search by colour, made us eager to research more effective approaches for colour-based image retrieval. In this thesis, we detail two user studies designed to compare the effectiveness of systems adopting similarity-based visualisations, query-based approaches, or a combination of both, for colour-based image retrieval. In contrast to query-based approaches, similarity-based visualisations display and arrange database images so that images with similar content are located closer together on screen than images with dissimilar content. This removes the need for queries, as users can instead visually explore the database using interactive navigation tools to retrieve images from the database. 
As we found existing evaluation approaches to be unreliable, we describe how we assessed and compared systems adopting similarity-based visualisations, query-based approaches, or both, meaningfully and systematically using our Mosaic Test - a user-based evaluation approach in which evaluation study participants complete an image mosaic of a predetermined target image using the colour-based image retrieval system under evaluation.
Abstract:
PowerAqua is a Question Answering system, which takes as input a natural language query and is able to return answers drawn from relevant semantic resources found anywhere on the Semantic Web. In this paper we provide two novel contributions: First, we detail a new component of the system, the Triple Similarity Service, which is able to match queries effectively to triples found in different ontologies on the Semantic Web. Second, we provide a first evaluation of the system, which in addition to providing data about PowerAqua's competence, also gives us important insights into the issues related to using the Semantic Web as the target answer set in Question Answering. In particular, we show that, despite the problems related to the noisy and incomplete conceptualizations, which can be found on the Semantic Web, good results can already be obtained.
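The abstract does not describe how the Triple Similarity Service works internally, but the core idea of matching query terms against candidate triples can be sketched. Everything below is a hypothetical toy illustration, not PowerAqua's actual component: the `label_sim` helper, the 0.6 threshold, and the example triples are all assumptions.

```python
from difflib import SequenceMatcher

def label_sim(a, b):
    """Approximate string similarity between two labels (toy stand-in
    for the lexical/ontological matching a real system would use)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_triples(query_terms, triples, threshold=0.6):
    """Return the triples in which every query term approximately
    matches some element of the triple."""
    hits = []
    for t in triples:
        if all(max(label_sim(q, e) for e in t) >= threshold
               for q in query_terms):
            hits.append(t)
    return hits
```

A real service would also disambiguate across ontologies and rank candidate triples; this sketch only shows the matching step.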
Abstract:
Jaccard has been the similarity metric of choice in ecology and forensic psychology for comparing sites or offences by species or behaviour. This paper applies a more powerful hierarchical measure - taxonomic similarity (s), recently developed in marine ecology - to the task of behaviourally linking serial crime. Forensic case linkage attempts to identify behaviourally similar offences committed by the same unknown perpetrator (called linked offences). s considers progressively higher-level taxa, such that two sites show some similarity even without shared species. We apply this index by analysing 55 specific offence behaviours classified hierarchically. The behaviours are taken from 16 sexual offences by seven juveniles, where each offender committed two or more offences. We demonstrate that both Jaccard and s show linked offences to be significantly more similar than unlinked offences. In simulations with up to 20% of the specific behaviours removed, s distinguishes linked offences at least as effectively as Jaccard applied to the full data set. Moreover, s retains a significant difference between linked and unlinked pairs with up to 50% of the specific behaviours removed. As police decision-making often depends upon incomplete data, s has clear advantages, and its application may extend to other crime types. Copyright © 2007 John Wiley & Sons, Ltd.
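The contrast between the two indices can be sketched in code. The behaviour names, the two-level taxonomy, and the 0.5 credit for sharing only a higher-level category are all hypothetical simplifications; the actual taxonomic similarity index s is more elaborate than this.

```python
# Hypothetical two-level behaviour taxonomy (illustrative names only).
TAXON = {
    "grabs victim": "physical control",
    "binds victim": "physical control",
    "verbal threat": "verbal control",
    "demands silence": "verbal control",
}

def jaccard(a, b):
    """Jaccard similarity between two offences, each a set of behaviours."""
    return len(a & b) / len(a | b) if a | b else 0.0

def pair_sim(x, y):
    """Full credit for the same behaviour, partial credit for the same
    higher-level category, none otherwise (a crude hierarchical score)."""
    if x == y:
        return 1.0
    if TAXON.get(x) is not None and TAXON.get(x) == TAXON.get(y):
        return 0.5
    return 0.0

def taxonomic_sim(a, b):
    """Symmetrised average best-match similarity between two offences."""
    def one_way(src, dst):
        return sum(max(pair_sim(x, y) for y in dst) for x in src) / len(src)
    return (one_way(a, b) + one_way(b, a)) / 2
```

Note that two offences with no shared behaviours but matching categories score zero under Jaccard yet remain partially similar under the hierarchical measure, which is the property the paper exploits.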
Abstract:
There is evidence for both advantages and disadvantages in normal recognition of living over nonliving things. This paradox has been attributed to high levels of perceptual similarity within living categories having a different effect on performance in different contexts. However, since living things are intrinsically more similar to each other, previous studies could not determine whether the various category effects were due to perceptual similarity, or to other characteristics of living things. We used novel animal and vehicle stimuli that were matched for similarity to measure the influence of perceptual similarity in different contexts. We found that displaying highly similar objects in blocked sets reduced their perceived similarity, eliminating the detrimental effect on naming performance. Experiment 1 demonstrated a disadvantage for highly similar objects in name learning and name verification using mixed groups of similar and dissimilar animals and vehicles. Experiment 2 demonstrated no disadvantage for the same highly similar objects when they were blocked, e.g., similar animals presented alone. Thus, perceptual similarity, rather than other characteristics particular to living things, is affected by context, and could create apparent category effects under certain testing conditions.
Abstract:
Database systems have a user interface one of the components of which will normally be a query language which is based on a particular data model. Typically data models provide primitives to define, manipulate and query databases. Often these primitives are designed to form self-contained query languages. This thesis describes a prototype implementation of a system which allows users to specify queries against the database in a query language whose primitives are not those provided by the actual model on which the database system is based, but those provided by a different data model. The implementation chosen is the Functional Query Language Front End (FQLFE). This uses the Daplex functional data model and query language. Using FQLFE, users can specify the underlying database (based on the relational model) in terms of Daplex. Queries against this specified view can then be made in Daplex. FQLFE transforms these queries into the query language (Quel) of the underlying target database system (Ingres). The automation of part of the Daplex function definition phase is also described and its implementation discussed.
Abstract:
We address the question of how to communicate among distributed processes values such as real numbers, continuous functions and geometrical solids with arbitrary precision, yet efficiently. We extend the established concept of lazy communication using streams of approximants by introducing explicit queries. We formalise this approach using protocols of a query-answer nature. Such protocols enable processes to provide valid approximations with certain accuracy and focusing on certain locality as demanded by the receiving processes through queries. A lattice-theoretic denotational semantics of channel and process behaviour is developed. The query space is modelled as a continuous lattice in which the top element denotes the query demanding all the information, whereas other elements denote queries demanding partial and/or local information. Answers are interpreted as elements of lattices constructed over suitable domains of approximations to the exact objects. An unanswered query is treated as an error and denoted using the top element. The major novel characteristic of our semantic model is that it reflects the dependency of answers on queries. This enables the definition and analysis of an appropriate concept of convergence rate, by assigning an effort indicator to each query and a measure of information content to each answer. Thus we capture not only what function a process computes, but also how a process transforms the convergence rates from its inputs to its outputs. In future work these indicators can be used to capture further computational complexity measures. A robust prototype implementation of our model is available.
Abstract:
We develop and study the concept of dataflow process networks as used for example by Kahn to suit exact computation over data types related to real numbers, such as continuous functions and geometrical solids. Furthermore, we consider communicating these exact objects among processes using protocols of a query-answer nature as introduced in our earlier work. This enables processes to provide valid approximations with certain accuracy and focusing on certain locality as demanded by the receiving processes through queries. We define domain-theoretical denotational semantics of our networks in two ways: (1) directly, i.e. by viewing the whole network as a composite process and applying the process semantics introduced in our earlier work; and (2) compositionally, i.e. by a fixed-point construction similar to that used by Kahn from the denotational semantics of individual processes in the network. The direct semantics closely corresponds to the operational semantics of the network (i.e. it is correct) but is very difficult to study for concrete networks. The compositional semantics enables compositional analysis of concrete networks, assuming it is correct. We prove that the compositional semantics is a safe approximation of the direct semantics. We also provide a method that can be used in many cases to establish that the two semantics fully coincide, i.e. safety is not achieved through inactivity or meaningless answers. The results are extended to cover recursively-defined infinite networks as well as nested finite networks. A robust prototype implementation of our model is available.
Abstract:
Modelling class B G-protein-coupled receptors (GPCRs) using class A GPCR structural templates is difficult due to lack of homology. The plant GPCR, GCR1, has homology to both class A and class B GPCRs. We have used this to generate a class A-class B alignment, and by incorporating maximum lagged correlation of entropy and hydrophobicity into a consensus score, we have been able to align receptor transmembrane regions. We have applied this analysis to generate active and inactive homology models of the class B calcitonin gene-related peptide (CGRP) receptor, and have supported it with site-directed mutagenesis data using 122 CGRP receptor residues and 144 published mutagenesis results on other class B GPCRs. The variation of sequence variability with structure, the analysis of polarity violations, the alignment of group-conserved residues and the mutagenesis results at 27 key positions were particularly informative in distinguishing between the proposed and plausible alternative alignments. Furthermore, we have been able to associate the key molecular features of the class B GPCR signalling machinery with their class A counterparts for the first time. These include the [K/R]KLH motif in intracellular loop 1, [I/L]xxxL and KxxK at the intracellular end of TM5 and TM6, the NPXXY/VAVLY motif on TM7 and small group-conserved residues in TM1, TM2, TM3 and TM7. The equivalent of the class A DRY motif is proposed to involve Arg(2.39), His(2.43) and Glu(3.46), which makes a polar lock with T(6.37). These alignments and models provide useful tools for understanding class B GPCR function.
Abstract:
In this paper, we propose a new similarity measure to compute the pair-wise similarity of text-based documents based on patterns of the words in the documents. First we develop a kappa measure for pair-wise comparison of documents then we use ordered weighting averaging operator to define a document similarity measure for a set of documents.
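A minimal sketch of the two ingredients follows, under the assumption that "patterns of the words" means word presence/absence over a shared vocabulary; the paper's exact kappa formulation and OWA weight choice may differ.

```python
def pairwise_kappa(doc_a, doc_b, vocab):
    """Cohen's kappa on word presence/absence: raw agreement between two
    documents, corrected for the agreement expected by chance."""
    n11 = sum(1 for w in vocab if w in doc_a and w in doc_b)
    n00 = sum(1 for w in vocab if w not in doc_a and w not in doc_b)
    p_o = (n11 + n00) / len(vocab)          # observed agreement
    pa = sum(w in doc_a for w in vocab) / len(vocab)
    pb = sum(w in doc_b for w in vocab) / len(vocab)
    p_e = pa * pb + (1 - pa) * (1 - pb)     # chance agreement
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

def owa(values, weights):
    """Ordered weighted averaging: weights attach to ranks (the largest
    value gets the first weight), not to particular inputs."""
    return sum(w * v for w, v in zip(weights, sorted(values, reverse=True)))
```

An OWA over the pairwise kappas of a document set then yields a single set-level similarity, with the weight vector controlling how much the most (or least) similar pairs dominate.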
Abstract:
Formulating complex queries is hard, especially when users cannot understand all the data structures of multiple complex knowledge bases. We see a gap between simplistic but user friendly tools and formal query languages. Building on an example comparison search, we propose an approach in which reusable search components take an intermediary role between the user interface and formal query languages.
Abstract:
This dissertation studies the caching of queries and how to cache in an efficient way, so that retrieving previously accessed data does not need any intermediary nodes between the data-source peer and the querying peer in a super-peer P2P network. A precise algorithm was devised that demonstrated how queries can be deconstructed to provide greater flexibility for reusing their constituent elements. It showed how subsequent queries can make use of more than one previous query, and any part of those queries, to reconstruct direct data communication with one or more source peers that have supplied data previously. In effect, a new query can search and exploit the entire cached list of queries to construct the list of the data locations it requires that might match any locations previously accessed. The new method increases the likelihood of repeat queries being able to reuse earlier queries and provides a viable way of by-passing shared data indexes in structured networks. It could also increase the efficiency of unstructured networks by reducing traffic and the propensity for network flooding. In addition, a method for predicting query routing performance using a UML sequence diagram is introduced. This new method of performance evaluation provides designers with information about when it is most beneficial to use caching and how peer connections can be optimised to exploit it.
Abstract:
The expansion of the Internet has made searching a crucial task. Internet users, however, must make a great effort to formulate a search query that returns the required results. Many methods have been devised to assist in this task by helping users modify their query to give better results. In this paper we propose an interactive method for query expansion. It is based on the observation that documents often contain terms with high information content, which can summarise their subject matter. We present experimental results demonstrating that our approach significantly shortens the time required to accomplish a given task through web searches.
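One common way to realise this idea is to score the terms of the retrieved documents with a TF-IDF-style measure of information content and offer the top-scoring terms as expansion candidates. The function below is a generic sketch of that pattern under that assumption, not the paper's actual algorithm.

```python
import math
from collections import Counter

def suggest_expansion_terms(query, docs, k=3):
    """Suggest up to k expansion terms: rank words in the retrieved
    documents by summed tf * idf, skipping words already in the query."""
    query_terms = set(query.lower().split())
    tokenised = [d.lower().split() for d in docs]
    n = len(tokenised)
    # Document frequency of each word across the retrieved set.
    df = Counter(w for toks in tokenised for w in set(toks))
    scores = Counter()
    for toks in tokenised:
        for w, tf in Counter(toks).items():
            scores[w] += tf * math.log((n + 1) / df[w])
    return [w for w, _ in scores.most_common() if w not in query_terms][:k]
```

In an interactive setting the user would pick from these suggestions and re-run the search with the expanded query.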
Abstract:
Background - Modelling the interaction between potentially antigenic peptides and Major Histocompatibility Complex (MHC) molecules is a key step in identifying potential T-cell epitopes. For Class II MHC alleles, the binding groove is open at both ends, causing ambiguity in the positional alignment between the groove and peptide, as well as creating uncertainty as to what parts of the peptide interact with the MHC. Moreover, the antigenic peptides have variable lengths, making naive modelling methods difficult to apply. This paper introduces a kernel method that can handle variable length peptides effectively by quantifying similarities between peptide sequences and integrating these into the kernel. Results - The kernel approach presented here shows increased prediction accuracy with a significantly higher number of true positives and negatives on multiple MHC class II alleles, when testing data sets from MHCPEP [1], MHCBN [2], and MHCBench [3]. Evaluation by cross validation, when segregating binders and non-binders, produced an average of 0.824 AROC for the MHCBench data sets (up from 0.756), and an average of 0.96 AROC for multiple alleles of the MHCPEP database. Conclusion - The method improves performance over existing state-of-the-art methods of MHC class II peptide binding predictions by using a custom, knowledge-based representation of peptides. Similarity scores, in contrast to a fixed-length, pocket-specific representation of amino acids, provide a flexible and powerful way of modelling MHC binding, and can easily be applied to other dynamic sequence problems.
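The paper's kernel is built from knowledge-based similarity scores between peptides. As a self-contained stand-in, the spectrum (k-mer) kernel below shows how a kernel can compare sequences of different lengths without padding; the choice of k and the plain count inner product are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter

def spectrum_kernel(s, t, k=3):
    """Inner product of k-mer count vectors; defined for sequences of
    any length >= k, so variable-length peptides need no alignment."""
    def kmers(seq):
        return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    cs, ct = kmers(s), kmers(t)
    return sum(c * ct[m] for m, c in cs.items())

def normalised_kernel(s, t, k=3):
    """Cosine-normalised version, in [0, 1] for non-empty k-mer sets."""
    return spectrum_kernel(s, t, k) / math.sqrt(
        spectrum_kernel(s, s, k) * spectrum_kernel(t, t, k))
```

A kernel of this shape can be plugged directly into an SVM, which is the usual vehicle for kernel-based binding prediction.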
Abstract:
A set of 38 epitopes and 183 non-epitopes, which bind to alleles of the HLA-A3 supertype, was subjected to a combination of comparative molecular similarity indices analysis (CoMSIA) and soft independent modeling of class analogy (SIMCA). During the process of T cell recognition, T cell receptors (TCR) interact with the central section of the bound nonamer peptide; thus only positions 4−8 were considered in the study. The derived model distinguished 82% of the epitopes and 73% of the non-epitopes after cross-validation in five groups. The overall preference from the model is for polar amino acids with high electron density and the ability to form hydrogen bonds. These so-called “aggressive” amino acids are flanked by small-sized residues, which enable such residues to protrude from the binding cleft and take an active role in TCR-mediated T cell recognition. Combinations of “aggressive” and “passive” amino acids in the middle part of epitopes constitute a putative TCR binding motif.
Abstract:
Epitope identification is the basis of modern vaccine design. The present paper studied the supermotif of the HLA-A3 superfamily, using comparative molecular similarity indices analysis (CoMSIA). Four alleles with high phenotype frequencies were used: A*1101, A*0301, A*3101 and A*6801. Five physicochemical properties—steric bulk, electrostatic potential, local hydrophobicity, hydrogen-bond donor and acceptor abilities—were considered and ‘all fields’ models were produced for each of the alleles. The models have a moderate level of predictivity and there is a good correlation between the data. A revised HLA-A3 supermotif was defined based on the comparison of favoured and disfavoured properties for each position of the MHC bound peptide. The present study demonstrated that CoMSIA is an effective tool for studying peptide–MHC interactions.