876 resultados para Keyword search
Resumo:
The problem of scaling up data integration, such that new sources can be quickly utilized as they are discovered, remains elusive: Global schemas for integrated data are difficult to develop and expand, and schema and record matching techniques are limited by the fact that data and metadata are often under-specified and must be disambiguated by data experts. One promising approach is to avoid using a global schema, and instead to develop keyword search-based data integration-where the system lazily discovers associations enabling it to join together matches to keywords, and return ranked results. The user is expected to understand the data domain and provide feedback about answers' quality. The system generalizes such feedback to learn how to correctly integrate data. A major open challenge is that under this model, the user only sees and offers feedback on a few ``top-'' results: This result set must be carefully selected to include answers of high relevance and answers that are highly informative when feedback is given on them. Existing systems merely focus on predicting relevance, by composing the scores of various schema and record matching algorithms. In this paper, we show how to predict the uncertainty associated with a query result's score, as well as how informative feedback is on a given result. We build upon these foundations to develop an active learning approach to keyword search-based data integration, and we validate the effectiveness of our solution over real data from several very different domains.
Resumo:
We present a system for keyword search on Cantonese conversational telephony audio, collected for the IARPA Babel program, that achieves good performance by combining postings lists produced by diverse speech recognition systems from three different research groups. We describe the keyword search task, the data on which the work was done, four different speech recognition systems, and our approach to system combination for keyword search. We show that the combination of four systems outperforms the best single system by 7%, achieving an actual term-weighted value of 0.517. © 2013 IEEE.
Resumo:
In this paper, we present a P2P-based database sharing system that provides information sharing capabilities through keyword-based search techniques. Our system requires neither a global schema nor schema mappings between different databases, and our keyword-based search algorithms are robust in the presence of frequent changes in the content and membership of peers. To facilitate data integration, we introduce keyword join operator to combine partial answers containing different keywords into complete answers. We also present an efficient algorithm that optimize the keyword join operations for partial answer integration. Our experimental study on both real and synthetic datasets demonstrates the effectiveness of our algorithms, and the efficiency of the proposed query processing strategies.
Resumo:
We consider the following problem: members in a dynamic group retrieve their encrypted data from an untrusted server based on keywords and without any loss of data confidentiality and member’s privacy. In this paper, we investigate common secure indices for conjunctive keyword-based retrieval over encrypted data, and construct an efficient scheme from Wang et al. dynamic accumulator, Nyberg combinatorial accumulator and Kiayias et al. public-key encryption system. The proposed scheme is trapdoorless and keyword-field free. The security is proved under the random oracle, decisional composite residuosity and extended strong RSA assumptions.
Resumo:
We consider the following problem: users of an organization wish to outsource the storage of sensitive data to a large database server. It is assumed that the server storing the data is untrusted so the data stored have to be encrypted. We further suppose that the manager of the organization has the right to access all data, but a member of the organization can not access any data alone. The member must collaborate with other members to search for the desired data. In this paper, we investigate the notion of threshold privacy preserving keyword search (TPPKS) and define its security requirements. We construct a TPPKS scheme and show the proof of security under the assumptions of intractability of discrete logarithm, decisional Diffie-Hellman and computational Diffie-Hellman problems.
Resumo:
This study examines the efficiency of search engine advertising strategies employed by firms. The research setting is the online retailing industry, which is characterized by extensive use of Web technologies and high competition for market share and profitability. For Internet retailers, search engines are increasingly serving as an information gateway for many decision-making tasks. In particular, Search engine advertising (SEA) has opened a new marketing channel for retailers to attract new customers and improve their performance. In addition to natural (organic) search marketing strategies, search engine advertisers compete for top advertisement slots provided by search brokers such as Google and Yahoo! through keyword auctions. The rationale being that greater visibility on a search engine during a keyword search will capture customers' interest in a business and its product or service offerings. Search engines account for most online activities today. Compared with the slow growth of traditional marketing channels, online search volumes continue to grow at a steady rate. According to the Search Engine Marketing Professional Organization, spending on search engine marketing by North American firms in 2008 was estimated at $13.5 billion. Despite the significant role SEA plays in Web retailing, scholarly research on the topic is limited. Prior studies in SEA have focused on search engine auction mechanism design. In contrast, research on the business value of SEA has been limited by the lack of empirical data on search advertising practices. Recent advances in search and retail technologies have created datarich environments that enable new research opportunities at the interface of marketing and information technology. This research uses extensive data from Web retailing and Google-based search advertising and evaluates Web retailers' use of resources, search advertising techniques, and other relevant factors that contribute to business performance across different metrics. The methods used include Data Envelopment Analysis (DEA), data mining, and multivariate statistics. This research contributes to empirical research by analyzing several Web retail firms in different industry sectors and product categories. One of the key findings is that the dynamics of sponsored search advertising vary between multi-channel and Web-only retailers. While the key performance metrics for multi-channel retailers include measures such as online sales, conversion rate (CR), c1ick-through-rate (CTR), and impressions, the key performance metrics for Web-only retailers focus on organic and sponsored ad ranks. These results provide a useful contribution to our organizational level understanding of search engine advertising strategies, both for multi-channel and Web-only retailers. These results also contribute to current knowledge in technology-driven marketing strategies and provide managers with a better understanding of sponsored search advertising and its impact on various performance metrics in Web retailing.
Resumo:
Custom coded module for TikiWiki v4.2, allowing keyword search by Category. Auto-populates categories as new ones are created.
Resumo:
Our research project develops an intranet search engine with concept- browsing functionality, where the user is able to navigate the conceptual level in an interactive, automatically generated knowledge map. This knowledge map visualizes tacit, implicit knowledge, extracted from the intranet, as a network of semantic concepts. Inductive and deductive methods are combined; a text ana- lytics engine extracts knowledge structures from data inductively, and the en- terprise ontology provides a backbone structure to the process deductively. In addition to performing conventional keyword search, the user can browse the semantic network of concepts and associations to find documents and data rec- ords. Also, the user can expand and edit the knowledge network directly. As a vision, we propose a knowledge-management system that provides concept- browsing, based on a knowledge warehouse layer on top of a heterogeneous knowledge base with various systems interfaces. Such a concept browser will empower knowledge workers to interact with knowledge structures.
Resumo:
While semantic search technologies have been proven to work well in specific domains, they still have to confront two main challenges to scale up to the Web in its entirety. In this work we address this issue with a novel semantic search system that a) provides the user with the capability to query Semantic Web information using natural language, by means of an ontology-based Question Answering (QA) system [14] and b) complements the specific answers retrieved during the QA process with a ranked list of documents from the Web [3]. Our results show that ontology-based semantic search capabilities can be used to complement and enhance keyword search technologies.
Resumo:
Graph-structured databases are widely prevalent, and the problem of effective search and retrieval from such graphs has been receiving much attention recently. For example, the Web can be naturally viewed as a graph. Likewise, a relational database can be viewed as a graph where tuples are modeled as vertices connected via foreign-key relationships. Keyword search querying has emerged as one of the most effective paradigms for information discovery, especially over HTML documents in the World Wide Web. One of the key advantages of keyword search querying is its simplicity—users do not have to learn a complex query language, and can issue queries without any prior knowledge about the structure of the underlying data. The purpose of this dissertation was to develop techniques for user-friendly, high quality and efficient searching of graph structured databases. Several ranked search methods on data graphs have been studied in the recent years. Given a top-k keyword search query on a graph and some ranking criteria, a keyword proximity search finds the top-k answers where each answer is a substructure of the graph containing all query keywords, which illustrates the relationship between the keyword present in the graph. We applied keyword proximity search on the web and the page graph of web documents to find top-k answers that satisfy user’s information need and increase user satisfaction. Another effective ranking mechanism applied on data graphs is the authority flow based ranking mechanism. Given a top- k keyword search query on a graph, an authority-flow based search finds the top-k answers where each answer is a node in the graph ranked according to its relevance and importance to the query. We developed techniques that improved the authority flow based search on data graphs by creating a framework to explain and reformulate them taking in to consideration user preferences and feedback. We also applied the proposed graph search techniques for Information Discovery over biological databases. Our algorithms were experimentally evaluated for performance and quality. The quality of our method was compared to current approaches by using user surveys.
Resumo:
Background: Work-related injuries in Australia are estimated to cost around $57.5 billion annually, however there are currently insufficient surveillance data available to support an evidence-based public health response. Emergency departments (ED) in Australia are a potential source of information on work-related injuries though most ED’s do not have an ‘Activity Code’ to identify work-related cases with information about the presenting problem recorded in a short free text field. This study compared methods for interrogating text fields for identifying work-related injuries presenting at emergency departments to inform approaches to surveillance of work-related injury.---------- Methods: Three approaches were used to interrogate an injury description text field to classify cases as work-related: keyword search, index search, and content analytic text mining. Sensitivity and specificity were examined by comparing cases flagged by each approach to cases coded with an Activity code during triage. Methods to improve the sensitivity and/or specificity of each approach were explored by adjusting the classification techniques within each broad approach.---------- Results: The basic keyword search detected 58% of cases (Specificity 0.99), an index search detected 62% of cases (Specificity 0.87), and the content analytic text mining (using adjusted probabilities) approach detected 77% of cases (Specificity 0.95).---------- Conclusions The findings of this study provide strong support for continued development of text searching methods to obtain information from routine emergency department data, to improve the capacity for comprehensive injury surveillance.
Resumo:
This paper gives an overview of the INEX 2008 Ad Hoc Track. The main goals of the Ad Hoc Track were two-fold. The first goal was to investigate the value of the internal document structure (as provided by the XML mark-up) for retrieving relevant information. This is a continuation of INEX 2007 and, for this reason, the retrieval results are liberalized to arbitrary passages and measures were chosen to fairly compare systems retrieving elements, ranges of elements, and arbitrary passages. The second goal was to compare focused retrieval to article retrieval more directly than in earlier years. For this reason, standard document retrieval rankings have been derived from all runs, and evaluated with standard measures. In addition, a set of queries targeting Wikipedia have been derived from a proxy log, and the runs are also evaluated against the clicked Wikipedia pages. The INEX 2008 Ad Hoc Track featured three tasks: For the Focused Task a ranked-list of nonoverlapping results (elements or passages) was needed. For the Relevant in Context Task non-overlapping results (elements or passages) were returned grouped by the article from which they came. For the Best in Context Task a single starting point (element start tag or passage start) for each article was needed. We discuss the results for the three tasks, and examine the relative effectiveness of element and passage retrieval. This is examined in the context of content only (CO, or Keyword) search as well as content and structure (CAS, or structured) search. Finally, we look at the ability of focused retrieval techniques to rank articles, using standard document retrieval techniques, both against the judged topics as well as against queries and clicks from a proxy log.
Resumo:
The DADAISM project brings together researchers from the diverse fields of archaeology, human computer interaction, image processing, image search and retrieval, and text mining to create a rich interactive system to address the problems of researchers finding images relevant to their research. In the age of digital photography, thousands of images are taken of archaeological artefacts. These images could help archaeologists enormously in their tasks of classification and identification if they could be related to one another effectively. They would yield many new insights on a range of archaeological problems. However, these images are currently greatly underutilized for two key reasons. Firstly, the current paradigm for interaction with image collections is basic keyword search or, at best, simple faceted search. Secondly, even if these interactions are possible, the metadata related to the majority of images of archaeological artefacts is scarce in information relating to the content of the image and the nature of the artefact, and is time intensive to enter manually. DADAISM will transform the way in which archaeologists interact with online image collections. It will deploy user-centred design methodologies to create an interactive system that goes well beyond current systems for working with images, and will support archaeologists’ tasks of finding, organising, relating and labelling images as well as other relevant sources of information such as grey literature documents.
Resumo:
Spoken content in languages of emerging importance needs to be searchable to provide access to the underlying information. In this paper, we investigate the problem of extending data fusion methodologies from Information Retrieval for Spoken Term Detection on low-resource languages in the framework of the IARPA Babel program. We describe a number of alternative methods improving keyword search performance. We apply these methods to Cantonese, a language that presents some new issues in terms of reduced resources and shorter query lengths. First, we show score normalization methodology that improves in average by 20% keyword search performance. Second, we show that properly combining the outputs of diverse ASR systems performs 14% better than the best normalized ASR system. © 2013 IEEE.