999 results for range query


Relevance:

20.00%

Publisher:

Abstract:

Language Modeling (LM) has been successfully applied to Information Retrieval (IR). However, most of the existing LM approaches only rely on term occurrences in documents, queries and document collections. In traditional unigram based models, terms (or words) are usually considered to be independent. In some recent studies, dependence models have been proposed to incorporate term relationships into LM, so that links can be created between words in the same sentence, and term relationships (e.g. synonymy) can be used to expand the document model. In this study, we further extend this family of dependence models in the following two ways: (1) term relationships are used to expand the query model instead of the document model, so that the query expansion process can be naturally implemented; (2) we exploit more sophisticated inferential relationships extracted with Information Flow (IF). Information flow relationships are not simply pairwise term relationships as those used in previous studies, but are between a set of terms and another term. They allow for context-dependent query expansion. Our experiments conducted on TREC collections show that we can obtain large and significant improvements with our approach. This study shows that LM is an appropriate framework to implement effective query expansion.
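As an illustration of expanding the query model rather than the document model, the following minimal sketch interpolates a maximum-likelihood query model with a distribution over related terms. The function names, the `related_terms` input and the interpolation weight `lam` are assumptions made for this example; the abstract does not specify how Information Flow scores are combined with the original query.

```python
from collections import Counter


def query_model(query_terms):
    """Maximum-likelihood unigram model of the original query."""
    counts = Counter(query_terms)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def expand_query_model(query_terms, related_terms, lam=0.5):
    """Interpolate the original query model with an expansion distribution.

    related_terms maps a candidate term to a non-negative association
    score with the query (hypothetical input; in the abstract such scores
    would be derived from Information Flow). lam is the weight kept on
    the original query model.
    """
    original = query_model(query_terms)
    norm = sum(related_terms.values())
    if norm <= 0:
        # Nothing to expand with: fall back to the original model.
        return original
    expansion = {term: score / norm for term, score in related_terms.items()}
    vocabulary = set(original) | set(expansion)
    return {
        term: lam * original.get(term, 0.0) + (1 - lam) * expansion.get(term, 0.0)
        for term in vocabulary
    }


expanded = expand_query_model(
    ["java", "island"], {"indonesia": 3.0, "travel": 1.0}, lam=0.6
)
```

The result is still a probability distribution over terms, so it can be scored against document models in the usual way (for example with KL divergence).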

Relevance:

20.00%

Publisher:

Abstract:

In information retrieval, a user's query is often not a complete representation of their real information need. The user's information need is a cognitive construction; however, the use of cognitive models to perform query expansion has received little study. In this paper, we present a cognitively motivated query expansion technique that uses semantic features for ad hoc retrieval. This model is evaluated against a state-of-the-art query expansion technique. The results show our approach provides significant improvements in retrieval effectiveness for the TREC data sets tested.

Relevance:

20.00%

Publisher:

Abstract:

Data processing for information extraction is increasingly important for Web databases. Due to the sheer size and volume of databases, retrieval of relevant information as needed by users has become a cumbersome process. Information seekers face information overload: too many results are returned for their queries. Conversely, too few or no results are returned when a highly specific query is asked. This paper proposes a ranking algorithm that gives higher preference to a user’s current search and also utilizes profile information in order to obtain the relevant results for a user’s query.

Relevance:

20.00%

Publisher:

Abstract:

Usability is a multi-dimensional characteristic of a computer system. This paper focuses on usability as a measurement of interaction between the user and the system. The research employs a task-oriented approach to evaluate the usability of a meta search engine. This engine encourages and accepts queries of unlimited size expressed in natural language. A variety of conventional metrics developed by academic and industrial research, including ISO standards, are applied to the information retrieval process consisting of sequential tasks. Tasks range from formulating (long) queries to interpreting and retaining search results. Results of the evaluation and analysis of the operation log indicate that obtaining advanced search engine results can be accomplished simultaneously with enhancing the usability of the interactive process. In conclusion, we discuss implications for interactive information retrieval system design and directions for future usability research. © 2008 Academy Publisher.

Relevance:

20.00%

Publisher:

Abstract:

Major Web search engines, such as AltaVista, are essential tools in the quest to locate online information. This article reports research that used transaction log analysis to examine the characteristics and changes in AltaVista Web searching that occurred from 1998 to 2002. The research questions we examined are (1) What are the changes in AltaVista Web searching from 1998 to 2002? (2) What are the current characteristics of AltaVista searching, including the duration and frequency of search sessions? (3) What changes in the information needs of AltaVista users occurred between 1998 and 2002? The results of our research show (1) a move toward more interactivity with increases in session and query length, (2) with 70% of session durations at 5 minutes or less, the frequency of interaction is increasing, but it is happening very quickly, and (3) a broadening range of Web searchers' information needs, with the most frequent terms accounting for less than 1% of total term usage. We discuss the implications of these findings for the development of Web search engines. © 2005 Wiley Periodicals, Inc.

Relevance:

20.00%

Publisher:

Abstract:

Purpose – The work presented in this paper aims to provide an approach to classifying web logs by personal properties of users. Design/methodology/approach – The authors describe an iterative system that begins with a small set of manually labeled terms, which are used to label queries from the log. A set of background knowledge related to these labeled queries is acquired by combining web search results on these queries. This background set is used to obtain many terms that are related to the classification task. The system then ranks each of the related terms, choosing those that most fit the personal properties of the users. These terms are then used to begin the next iteration. Findings – The authors identify the difficulties of classifying web logs, by approaching this problem from a machine learning perspective. By applying the approach developed, the authors are able to show that many queries in a large query log can be classified. Research limitations/implications – Testing results in this type of classification work is difficult, as the true personal properties of web users are unknown. Evaluation of the classification results in terms of the comparison of classified queries to well known age-related sites is a direction that is currently being explored. Practical implications – This research is background work that can be incorporated in search engines or other web-based applications, to help marketing companies and advertisers. Originality/value – This research enhances the current state of knowledge in short-text classification and query log learning.

Keywords: Classification schemes, Computer networks, Information retrieval, Man-machine systems, User interfaces

Relevance:

20.00%

Publisher:

Abstract:

Nowadays, everyone can effortlessly access a range of information on the World Wide Web (WWW). As information resources on the web continue to grow tremendously, it becomes progressively more difficult to meet the high expectations of users and find relevant information. Although existing search engine technologies can find valuable information, they suffer from the problems of information overload and information mismatch. This paper presents a hybrid Web Information Retrieval approach allowing personalised search using ontology, user profile and collaborative filtering. This approach uses an ontology to find the context of the user query with minimal user involvement. Simultaneously, it applies time-based automatic user profile updating to follow the user’s changing behaviour. Subsequently, it uses recommendations from similar users through a collaborative filtering technique. The proposed method is evaluated with the FIRE 2010 dataset and a manually generated dataset. Empirical analysis reveals that the Precision, Recall and F-Score of most of the queries for many users are improved with the proposed method.

Relevance:

20.00%

Publisher:

Abstract:

Purpose – To investigate and identify the patterns of interaction between searchers and search engine during web searching. Design/methodology/approach – The authors examined 2,465,145 interactions from 534,507 users of Dogpile.com submitted on May 6, 2005, and compared query reformulation patterns. They investigated the type of query modifications and query modification transitions within sessions. Findings – The paper identifies three strong query reformulation transition patterns: between specialization and generalization; between video and audio; and between content change and system assistance. In addition, the findings show that web and image content were the most popular media collections. Originality/value – This research sheds light on the more complex aspects of web searching involving query modifications.
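To make the notions of specialization and generalization concrete, the sketch below labels a query modification by comparing the term sets of two consecutive queries in a session. This is a common term-overlap heuristic and not necessarily the coding scheme used in the study; the function name and the labels other than "specialization" and "generalization" are hypothetical.

```python
def classify_reformulation(previous_query, current_query):
    """Label the transition between two consecutive queries in a session.

    Heuristic (an assumption for illustration): adding terms narrows the
    query, removing terms broadens it.
    """
    prev_terms = set(previous_query.lower().split())
    curr_terms = set(current_query.lower().split())

    if prev_terms == curr_terms:
        return "repeat"
    if prev_terms < curr_terms:
        # Every old term is kept and at least one new term is added.
        return "specialization"
    if curr_terms < prev_terms:
        # Only a proper subset of the old terms remains.
        return "generalization"
    if prev_terms & curr_terms:
        # Some terms shared, some replaced.
        return "reformulation"
    return "new"


session = ["jaguar", "jaguar car price", "jaguar car", "used jaguar car", "cat food"]
transitions = [
    classify_reformulation(a, b) for a, b in zip(session, session[1:])
]
```

Counting consecutive pairs of these labels across many sessions would give a transition matrix of the kind the abstract refers to as query modification transitions.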

Relevance:

20.00%

Publisher:

Abstract:

In information retrieval (IR) research, more and more focus has been placed on optimizing a query language model by detecting and estimating the dependencies between the query and the observed terms occurring in the selected relevance feedback documents. In this paper, we propose a novel Aspect Language Modeling framework featuring term association acquisition, document segmentation, query decomposition, and an Aspect Model (AM) for parameter optimization. Through the proposed framework, we advance the theory and practice of applying high-order and context-sensitive term relationships to IR. We first decompose a query into subsets of query terms. Then we segment the relevance feedback documents into chunks using multiple sliding windows. Finally we discover the higher order term associations, that is, the terms in these chunks with a high degree of association to the subsets of the query. In this process, we adopt an approach that combines the AM with Association Rule (AR) mining. In our approach, the AM not only considers the subsets of a query as “hidden” states and estimates their prior distributions, but also evaluates the dependencies between the subsets of a query and the observed terms extracted from the chunks of feedback documents. The AR provides a reasonable initial estimation of the high-order term associations by discovering the associated rules from the document chunks. Experimental results on various TREC collections verify the effectiveness of our approach, which significantly outperforms a baseline language model and two state-of-the-art query language models, namely the Relevance Model and the Information Flow model.
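The segmentation step, splitting feedback documents into chunks with multiple sliding windows, can be sketched as follows. The window sizes, the half-window step and the function names are assumptions chosen for illustration; the abstract does not state the actual parameters.

```python
def sliding_windows(tokens, size, step):
    """Split a token list into overlapping chunks of `size` tokens.

    A final window is added when the regular steps would leave trailing
    tokens uncovered, so every token appears in at least one chunk.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    n = len(tokens)
    if n == 0:
        return []
    if n <= size:
        return [list(tokens)]
    starts = list(range(0, n - size + 1, step))
    if starts[-1] + size < n:
        starts.append(n - size)
    return [list(tokens[s:s + size]) for s in starts]


def multi_window_chunks(tokens, sizes):
    """Apply several window sizes, each with a step of half the window."""
    return {size: sliding_windows(tokens, size, max(1, size // 2)) for size in sizes}


tokens = "a b c d e f g".split()
chunks = multi_window_chunks(tokens, sizes=[3, 4])
```

Each chunk can then be treated as a small transaction of terms, which is the form of input that association rule mining typically expects.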

Relevance:

20.00%

Publisher:

Abstract:

This thesis provides a query model suitable for context sensitive access to a wide range of distributed linked datasets which are available to scientists using the Internet. The model is designed based on scientific research standards which require scientists to provide replicable methods in their publications. Although there are query models available that provide limited replicability, they do not contextualise the process whereby different scientists select dataset locations based on their trust and physical location. In different contexts, scientists need to perform different data cleaning actions, independent of the overall query, and the model was designed to accommodate this function. The query model was implemented as a prototype web application and its features were verified through its use as the engine behind a major scientific data access site, Bio2RDF.org. The prototype showed that it was possible to have context sensitive behaviour for each of the three mirrors of Bio2RDF.org using a single set of configuration settings. The prototype provided executable query provenance that could be attached to scientific publications to fulfil replicability requirements. The model was designed to make it simple to independently interpret and execute the query provenance documents using context specific profiles, without modifying the original provenance documents. Experiments using the prototype as the data access tool in workflow management systems confirmed that the design of the model made it possible to replicate results in different contexts with minimal additions, and no deletions, to query provenance documents.

Relevance:

20.00%

Publisher:

Abstract:

In this study, we explore the population genetics of the Russian wheat aphid (RWA) (Diuraphis noxia), one of the world’s most invasive agricultural pests, in north-western China. We have analysed the data of 10 microsatellite loci and mitochondrial sequences from 27 populations sampled over 2 years in China. The results confirm that the RWAs are holocyclic in China with high genetic diversity indicating widespread sexual reproduction. Distinct differences in microsatellite genetic diversity and distribution revealed clear geographic isolation between RWA populations in northern and southern Xinjiang, China, with gene flow interrupted across extensive desert regions. Despite frequent grain transportation from north to south in this region, little evidence for RWA translocation as a result of human agricultural activities was found. Consequently, frequent gene flow among northern populations most likely resulted from natural dispersal, potentially facilitated by wind currents. We also found evidence for the long-term existence and expansion of RWAs in China, despite local opinion that it is an exotic species only present in China since 1975. Our estimated date of RWA expansion throughout China coincides with the debut of wheat domestication and cultivation practices in western Asia in the Holocene. We conclude that western China represents the limit of the far eastern native range of this species. This study is the most comprehensive molecular genetic investigation of the RWA in its native range undertaken to date and provides valuable insights into the history of the association of this aphid with domesticated cereals and wild grasses.

Relevance:

20.00%

Publisher:

Abstract:

In the recent past, social issues have arisen when sensitive personal data in medical databases were exposed. Sensitive personal data should be protected, and access to it must be accounted for. Such information can be protected by encrypting it. The challenge is querying the encrypted information when making decisions. Querying encrypted data is, in practice, a somewhat tedious task. We therefore present a more effective method using bucket index and Bloom filter technology. We find that our proposed method shows comparatively low memory use and fast query performance. Simulation approaches on data encryption techniques to improve health care decision making processes are presented in this paper as a case scenario.
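For readers unfamiliar with Bloom filters, the following is a minimal sketch of the data structure itself: a fixed-size bit array that answers membership queries with possible false positives but no false negatives. The sizes, the use of salted SHA-256 as the hash family and the class name are assumptions for illustration; the abstract does not describe how the filter is built or combined with the bucket index.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over strings."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function
        # with a different seed per position.
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(seed.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


bloom = BloomFilter()
for keyword in ["diabetes", "asthma"]:
    bloom.add(keyword)
```

Because the filter stores only bit positions, it can be checked without decrypting the underlying records, which is one plausible reason for pairing it with encrypted data.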

Relevance:

20.00%

Publisher:

Abstract:

In the medical and healthcare arena, patients' data is not just their own personal history but also a valuable large dataset for finding solutions for diseases. While electronic medical records are becoming popular and are used in healthcare work places like hospitals, as well as insurance companies, and by major stakeholders such as physicians and their patients, the accessibility of such information should be dealt with in a way that preserves privacy and security. Thus, finding the best way to keep the data secure has become an important issue in the area of database security. Sensitive medical data should be encrypted in databases. There are many encryption/decryption techniques and algorithms for preserving privacy and security. Their performance is currently an important factor when medical data is managed in databases. Stakeholders also need to find cost-effective ways to reduce the total cost of ownership. As an alternative, DAS (Data as Service) is a popular outsourcing model that satisfies the cost-effectiveness requirement, but it requires that the encryption/decryption modules be handled by trustworthy stakeholders. This research project focuses on the query response times in a DAS model (AES-DAS) and compares the outsourcing model with the in-house model, which incorporates the Microsoft built-in encryption scheme in SQL Server. The project includes building a prototype of medical database schemas and is carried out in two simulation stages. The first stage includes 6 databases used to measure the performance of plain-text, Microsoft built-in encryption and AES-DAS (Data as Service). In particular, the AES-DAS incorporates implementations of symmetric key encryption such as AES (Advanced Encryption Standard) and a Bucket indexing processor using a Bloom filter.
The results are categorised by character type, numeric type, range queries, range queries using the Bucket Index, and aggregate queries. The second stage runs a scalability test from 5K to 2560K records. The main result of these simulations is that, as an outsourcing model, AES-DAS using the Bucket Index is around 3.32 times faster than a normal AES-DAS with 70 partitions and databases of 10K records. Retrieving numeric data takes less time than retrieving character data in AES-DAS. The aggregation query response time in AES-DAS is not as consistent as that in the MS built-in encryption scheme. The scalability test shows that once the DBMS reaches a certain threshold, the query response time degrades rapidly. However, there is more to investigate in order to bring about other outcomes and to construct a secured EMR (Electronic Medical Record) more efficiently from these simulations.
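The idea behind a range query over a bucket index can be sketched as follows: each encrypted value is stored alongside a coarse bucket number, the server selects rows by bucket only, and the client decrypts the candidates and discards those outside the requested range. The equi-width partitioning, the function names and the `decrypt` callback are assumptions for this example; the identity function used below is a stand-in and performs no encryption, whereas a real deployment would use AES as the abstract describes.

```python
def bucket_id(value, low, high, partitions):
    """Map a value in [low, high] to an equi-width bucket number."""
    if not low <= value <= high:
        raise ValueError("value outside the indexed domain")
    width = (high - low) / partitions
    # The upper bound itself belongs to the last bucket.
    return min(int((value - low) / width), partitions - 1)


def buckets_for_range(start, end, low, high, partitions):
    """Bucket numbers that may hold values in [start, end]."""
    start, end = max(start, low), min(end, high)
    if start > end:
        return set()
    first = bucket_id(start, low, high, partitions)
    last = bucket_id(end, low, high, partitions)
    return set(range(first, last + 1))


def range_query(rows, start, end, low, high, partitions, decrypt):
    """rows is a list of (bucket, ciphertext) pairs.

    Step 1 (server side): keep rows whose bucket overlaps the range.
    Step 2 (client side): decrypt and drop false positives.
    """
    wanted = buckets_for_range(start, end, low, high, partitions)
    candidates = [cipher for bucket, cipher in rows if bucket in wanted]
    return [v for v in map(decrypt, candidates) if start <= v <= end]


# Stand-in for decryption: values are stored unencrypted in this demo.
identity = lambda value: value
values = [5, 18, 25, 31, 47, 99]
rows = [(bucket_id(v, 0, 100, 10), v) for v in values]
result = range_query(rows, 26, 35, 0, 100, 10, decrypt=identity)
```

In this demo the value 25 shares a bucket with part of the requested range, so it is fetched as a candidate and then filtered out on the client. More partitions reduce such false positives at the cost of revealing finer-grained information about the stored values.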