919 resultados para query reformulation, search pattern, search strategy


Relevância:

50.00% 50.00%

Publicador:

Resumo:

In the last few years, there has been a wide development in the research on textual information systems. The goal is to improve these systems in order to allow an easy localization, treatment and access to the information stored in digital format (Digital Databases, Documental Databases, and so on). There are lots of applications focused on information access (for example, Web-search systems like Google or Altavista). However, these applications have problems when they must access to cross-language information, or when they need to show information in a language different from the one of the query. This paper explores the use of syntactic-sematic patterns as a method to access to multilingual information, and revise, in the case of Information Retrieval, where it is possible and useful to employ patterns when it comes to the multilingual and interactive aspects. On the one hand, the multilingual aspects that are going to be studied are the ones related to the access to documents in different languages from the one of the query, as well as the automatic translation of the document, i.e. a machine translation system based on patterns. On the other hand, this paper is going to go deep into the interactive aspects related to the reformulation of a query based on the syntactic-semantic pattern of the request.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

This paper aims to identify the Mediterranean States’ potential in adopting a regional strategy on climate change adaptation. The author proposes a Mediterranean Strategy on Adaptation to Climate Change as the first step to a political/legal regional approach to climate change issues that would supplement the multilateral process under the United Nations Framework Convention on Climate Change and the Kyoto Protocol. According to the author such a strategy would enhance cooperation between the EU and other Mediterranean states in various ways. The experience of the EU in regulating climate change and its ever growing knowledge-base on its impacts could serve to guide the other Mediterranean states’ and help bridge their knowledge-base gap on the topic. On the other hand, the support and cooperation of the EU’s Mediterranean partners would provide an opportunity for the EU to address better the challenges the climate change threatens to bring in its southernmost regions. The strategy could eventually even pave the way for the very first regional treaty on climate change that could be negotiated under the auspices of the Regional Seas Programme and the Union for the Mediterranean.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

With the rapid increase in both centralized video archives and distributed WWW video resources, content-based video retrieval is gaining its importance. To support such applications efficiently, content-based video indexing must be addressed. Typically, each video is represented by a sequence of frames. Due to the high dimensionality of frame representation and the large number of frames, video indexing introduces an additional degree of complexity. In this paper, we address the problem of content-based video indexing and propose an efficient solution, called the Ordered VA-File (OVA-File) based on the VA-file. OVA-File is a hierarchical structure and has two novel features: 1) partitioning the whole file into slices such that only a small number of slices are accessed and checked during k Nearest Neighbor (kNN) search and 2) efficient handling of insertions of new vectors into the OVA-File, such that the average distance between the new vectors and those approximations near that position is minimized. To facilitate a search, we present an efficient approximate kNN algorithm named Ordered VA-LOW (OVA-LOW) based on the proposed OVA-File. OVA-LOW first chooses possible OVA-Slices by ranking the distances between their corresponding centers and the query vector, and then visits all approximations in the selected OVA-Slices to work out approximate kNN. The number of possible OVA-Slices is controlled by a user-defined parameter delta. By adjusting delta, OVA-LOW provides a trade-off between the query cost and the result quality. Query by video clip consisting of multiple frames is also discussed. Extensive experimental studies using real video data sets were conducted and the results showed that our methods can yield a significant speed-up over an existing VA-file-based method and iDistance with high query result quality. Furthermore, by incorporating temporal correlation of video content, our methods achieved much more efficient performance.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

In multimedia retrieval, a query is typically interactively refined towards the ‘optimal’ answers by exploiting user feedback. However, in existing work, in each iteration, the refined query is re-evaluated. This is not only inefficient but fails to exploit the answers that may be common between iterations. In this paper, we introduce a new approach called SaveRF (Save random accesses in Relevance Feedback) for iterative relevance feedback search. SaveRF predicts the potential candidates for the next iteration and maintains this small set for efficient sequential scan. By doing so, repeated candidate accesses can be saved, hence reducing the number of random accesses. In addition, efficient scan on the overlap before the search starts also tightens the search space with smaller pruning radius. We implemented SaveRF and our experimental study on real life data sets show that it can reduce the I/O cost significantly.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

For a submitted query to multiple search engines finding relevant results is an important task. This paper formulates the problem of aggregation and ranking of multiple search engines results in the form of a minimax linear programming model. Besides the novel application, this study detects the most relevant information among a return set of ranked lists of documents retrieved by distinct search engines. Furthermore, two numerical examples aree used to illustrate the usefulness of the proposed approach.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Over recent years, evidence has been accumulating in favour of the importance of long-term information as a variable which can affect the success of short-term recall. Lexicality, word frequency, imagery and meaning have all been shown to augment short term recall performance. Two competing theories as to the causes of this long-term memory influence are outlined and tested in this thesis. The first approach is the order-encoding account, which ascribes the effect to the usage of resources at encoding, hypothesising that word lists which require less effort to process will benefit from increased levels of order encoding, in turn enhancing recall success. The alternative view, trace redintegration theory, suggests that order is automatically encoded phonologically, and that long-term information can only influence the interpretation of the resultant memory trace. The free recall experiments reported here attempted to determine the importance of order encoding as a facilitatory framework and to determine the locus of the effects of long-term information in free recall. Experiments 1 and 2 examined the effects of word frequency and semantic categorisation over a filled delay, and experiments 3 and 4 did the same for immediate recall. Free recall was improved by both long-term factors tested. Order information was not used over a short filled delay, but was evident in immediate recall. Furthermore, it was found that both long-term factors increased the amount of order information retained. Experiment 5 induced an order encoding effect over a filled delay, leaving a picture of short-term processes which are closely associated with long-term processes, and which fit conceptions of short-term memory being part of language processes rather better than either the encoding or the retrieval-based models. Experiments 6 and 7 aimed to determine to what extent phonological processes were responsible for the pattern of results observed. Articulatory suppression affected the encoding of order information where speech rate had no direct influence, suggesting that it is ease of lexical access which is the most important factor in the influence of long-term memory on immediate recall tasks. The evidence presented in this thesis does not offer complete support for either the retrieval-based account or the order encoding account of long-term influence. Instead, the evidence sits best with models that are based upon language-processing. The path urged for future research is to find ways in which this diffuse model can be better specified, and which can take account of the versatility of the human brain.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Difficulties in visual attention are increasingly being linked to dyslexia. To date, the majority of studies have inferred functionality of attention from response times to stimuli presented for an indefinite duration. However, in paradigms that use reaction times to investigate the ability to orient attention, a delayed reaction time could also indicate difficulties in signal enhancement or noise exclusion once oriented. Thus, in order to investigate attention modulation and visual crowding effects in dyslexia, this study measured stimulus discrimination accuracy to rapidly presented displays. Adults with dyslexia (AwD) and controls discriminated the orientation of a target in an array of different numbers of - and differently spaced - vertically orientated distractors. Results showed that AwD: were disproportionately impacted by (i) close spacing and (ii) increased numbers of stimuli, (iii) did use pre-cues to modulate attention, but (iv) used cues less successfully to counter effects of increasing numbers of distractors. A greater dependence on pre-cues, larger effects of crowding and the impact of increased numbers of distractors all correlated significantly with measures of literacy. These findings extend previous studies of visual crowding of letters in dyslexia to non-complex stimuli. Overall, AwD do not use cues less, but they do use cues less successfully. We conclude that visual attention is an important factor to consider in the aetiology of dyslexia. The results challenge existing theoretical accounts of visual attention deficits, which alone are unable to comprehensively explain the pattern of findings demonstrated here.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

When a query is passed to multiple search engines, each search engine returns a ranked list of documents. Researchers have demonstrated that combining results, in the form of a "metasearch engine", produces a significant improvement in coverage and search effectiveness. This paper proposes a linear programming mathematical model for optimizing the ranked list result of a given group of Web search engines for an issued query. An application with a numerical illustration shows the advantages of the proposed method. © 2011 Elsevier Ltd. All rights reserved.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

While semantic search technologies have been proven to work well in specific domains, they still have to confront two main challenges to scale up to the Web in its entirety. In this work we address this issue with a novel semantic search system that a) provides the user with the capability to query Semantic Web information using natural language, by means of an ontology-based Question Answering (QA) system [14] and b) complements the specific answers retrieved during the QA process with a ranked list of documents from the Web [3]. Our results show that ontology-based semantic search capabilities can be used to complement and enhance keyword search technologies.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Existing semantic search tools have been primarily designed to enhance the performance of traditional search technologies but with little support for ordinary end users who are not necessarily familiar with domain specific semantic data, ontologies, or SQL-like query languages. This paper presents SemSearch, a search engine, which pays special attention to this issue by providing several means to hide the complexity of semantic search from end users and thus make it easy to use and effective.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Background Qualitative research makes an important contribution to our understanding of health and healthcare. However, qualitative evidence can be difficult to search for and identify, and the effectiveness of different types of search strategies is unknown. Methods Three search strategies for qualitative research in the example area of support for breast-feeding were evaluated using six electronic bibliographic databases. The strategies were based on using thesaurus terms, free-text terms and broad-based terms. These strategies were combined with recognised search terms for support for breast-feeding previously used in a Cochrane review. For each strategy, we evaluated the recall (potentially relevant records found) and precision (actually relevant records found). Results A total yield of 7420 potentially relevant records was retrieved by the three strategies combined. Of these, 262 were judged relevant. Using one strategy alone would miss relevant records. The broad-based strategy had the highest recall and the thesaurus strategy the highest precision. Precision was generally poor: 96% of records initially identified as potentially relevant were deemed irrelevant. Searching for qualitative research involves trade-offs between recall and precision. Conclusions These findings confirm that strategies that attempt to maximise the number of potentially relevant records found are likely to result in a large number of false positives. The findings also suggest that a range of search terms is required to optimise searching for qualitative evidence. This underlines the problems of current methods for indexing qualitative research in bibliographic databases and indicates where improvements need to be made.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Graph-structured databases are widely prevalent, and the problem of effective search and retrieval from such graphs has been receiving much attention recently. For example, the Web can be naturally viewed as a graph. Likewise, a relational database can be viewed as a graph where tuples are modeled as vertices connected via foreign-key relationships. Keyword search querying has emerged as one of the most effective paradigms for information discovery, especially over HTML documents in the World Wide Web. One of the key advantages of keyword search querying is its simplicity—users do not have to learn a complex query language, and can issue queries without any prior knowledge about the structure of the underlying data. The purpose of this dissertation was to develop techniques for user-friendly, high quality and efficient searching of graph structured databases. Several ranked search methods on data graphs have been studied in the recent years. Given a top-k keyword search query on a graph and some ranking criteria, a keyword proximity search finds the top-k answers where each answer is a substructure of the graph containing all query keywords, which illustrates the relationship between the keyword present in the graph. We applied keyword proximity search on the web and the page graph of web documents to find top-k answers that satisfy user’s information need and increase user satisfaction. Another effective ranking mechanism applied on data graphs is the authority flow based ranking mechanism. Given a top- k keyword search query on a graph, an authority-flow based search finds the top-k answers where each answer is a node in the graph ranked according to its relevance and importance to the query. We developed techniques that improved the authority flow based search on data graphs by creating a framework to explain and reformulate them taking in to consideration user preferences and feedback. We also applied the proposed graph search techniques for Information Discovery over biological databases. Our algorithms were experimentally evaluated for performance and quality. The quality of our method was compared to current approaches by using user surveys.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Reverse engineering is usually the stepping stone of a variety of at-tacks aiming at identifying sensitive information (keys, credentials, data, algo-rithms) or vulnerabilities and flaws for broader exploitation. Software applica-tions are usually deployed as identical binary code installed on millions of com-puters, enabling an adversary to develop a generic reverse-engineering strategy that, if working on one code instance, could be applied to crack all the other in-stances. A solution to mitigate this problem is represented by Software Diversity, which aims at creating several structurally different (but functionally equivalent) binary code versions out of the same source code, so that even if a successful attack can be elaborated for one version, it should not work on a diversified ver-sion. In this paper, we address the problem of maximizing software diversity from a search-based optimization point of view. The program to protect is subject to a catalogue of transformations to generate many candidate versions. The problem of selecting the subset of most diversified versions to be deployed is formulated as an optimisation problem, that we tackle with different search heuristics. We show the applicability of this approach on some popular Android apps.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Nurse rostering is a complex scheduling problem that affects hospital personnel on a daily basis all over the world. This paper presents a new component-based approach with adaptive perturbations, for a nurse scheduling problem arising at a major UK hospital. The main idea behind this technique is to decompose a schedule into its components (i.e. the allocated shift pattern of each nurse), and then mimic a natural evolutionary process on these components to iteratively deliver better schedules. The worthiness of all components in the schedule has to be continuously demonstrated in order for them to remain there. This demonstration employs a dynamic evaluation function which evaluates how well each component contributes towards the final objective. Two perturbation steps are then applied: the first perturbation eliminates a number of components that are deemed not worthy to stay in the current schedule; the second perturbation may also throw out, with a low level of probability, some worthy components. The eliminated components are replenished with new ones using a set of constructive heuristics using local optimality criteria. Computational results using 52 data instances demonstrate the applicability of the proposed approach in solving real-world problems.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

This thesis investigates how web search evaluation can be improved using historical interaction data. Modern search engines combine offline and online evaluation approaches in a sequence of steps that a tested change needs to pass through to be accepted as an improvement and subsequently deployed. We refer to such a sequence of steps as an evaluation pipeline. In this thesis, we consider the evaluation pipeline to contain three sequential steps: an offline evaluation step, an online evaluation scheduling step, and an online evaluation step. In this thesis we show that historical user interaction data can aid in improving the accuracy or efficiency of each of the steps of the web search evaluation pipeline. As a result of these improvements, the overall efficiency of the entire evaluation pipeline is increased. Firstly, we investigate how user interaction data can be used to build accurate offline evaluation methods for query auto-completion mechanisms. We propose a family of offline evaluation metrics for query auto-completion that represents the effort the user has to spend in order to submit their query. The parameters of our proposed metrics are trained against a set of user interactions recorded in the search engine’s query logs. From our experimental study, we observe that our proposed metrics are significantly more correlated with an online user satisfaction indicator than the metrics proposed in the existing literature. Hence, fewer changes will pass the offline evaluation step to be rejected after the online evaluation step. As a result, this would allow us to achieve a higher efficiency of the entire evaluation pipeline. Secondly, we state the problem of the optimised scheduling of online experiments. We tackle this problem by considering a greedy scheduler that prioritises the evaluation queue according to the predicted likelihood of success of a particular experiment. This predictor is trained on a set of online experiments, and uses a diverse set of features to represent an online experiment. Our study demonstrates that a higher number of successful experiments per unit of time can be achieved by deploying such a scheduler on the second step of the evaluation pipeline. Consequently, we argue that the efficiency of the evaluation pipeline can be increased. Next, to improve the efficiency of the online evaluation step, we propose the Generalised Team Draft interleaving framework. Generalised Team Draft considers both the interleaving policy (how often a particular combination of results is shown) and click scoring (how important each click is) as parameters in a data-driven optimisation of the interleaving sensitivity. Further, Generalised Team Draft is applicable beyond domains with a list-based representation of results, i.e. in domains with a grid-based representation, such as image search. Our study using datasets of interleaving experiments performed both in document and image search domains demonstrates that Generalised Team Draft achieves the highest sensitivity. A higher sensitivity indicates that the interleaving experiments can be deployed for a shorter period of time or use a smaller sample of users. Importantly, Generalised Team Draft optimises the interleaving parameters w.r.t. historical interaction data recorded in the interleaving experiments. Finally, we propose to apply the sequential testing methods to reduce the mean deployment time for the interleaving experiments. We adapt two sequential tests for the interleaving experimentation. We demonstrate that one can achieve a significant decrease in experiment duration by using such sequential testing methods. The highest efficiency is achieved by the sequential tests that adjust their stopping thresholds using historical interaction data recorded in diagnostic experiments. Our further experimental study demonstrates that cumulative gains in the online experimentation efficiency can be achieved by combining the interleaving sensitivity optimisation approaches, including Generalised Team Draft, and the sequential testing approaches. Overall, the central contributions of this thesis are the proposed approaches to improve the accuracy or efficiency of the steps of the evaluation pipeline: the offline evaluation frameworks for the query auto-completion, an approach for the optimised scheduling of online experiments, a general framework for the efficient online interleaving evaluation, and a sequential testing approach for the online search evaluation. The experiments in this thesis are based on massive real-life datasets obtained from Yandex, a leading commercial search engine. These experiments demonstrate the potential of the proposed approaches to improve the efficiency of the evaluation pipeline.