356 resultados para Similarity queries
Resumo:
This paper gives an overview of the INEX 2009 Ad Hoc Track. The main goals of the Ad Hoc Track were three-fold. The first goal was to investigate the impact of the collection scale and markup, by using a new collection that is again based on a the Wikipedia but is over 4 times larger, with longer articles and additional semantic annotations. For this reason the Ad Hoc track tasks stayed unchanged, and the Thorough Task of INEX 2002–2006 returns. The second goal was to study the impact of more verbose queries on retrieval effectiveness, by using the available markup as structural constraints—now using both the Wikipedia’s layout-based markup, as well as the enriched semantic markup—and by the use of phrases. The third goal was to compare different result granularities by allowing systems to retrieve XML elements, ranges of XML elements, or arbitrary passages of text. This investigates the value of the internal document structure (as provided by the XML mark-up) for retrieving relevant information. The INEX 2009 Ad Hoc Track featured four tasks: For the Thorough Task a ranked-list of results (elements or passages) by estimated relevance was needed. For the Focused Task a ranked-list of non-overlapping results (elements or passages) was needed. For the Relevant in Context Task non-overlapping results (elements or passages) were returned grouped by the article from which they came. For the Best in Context Task a single starting point (element start tag or passage start) for each article was needed. We discuss the setup of the track, and the results for the four tasks.
Resumo:
Experimental results for a reactive non-buoyant plume of nitric oxide (NO) in a turbulent grid flow doped with ozone (O3) are presented. The Damkohler number (Nd) for the experiment is of order unity indicating the turbulence and chemistry have similar timescales and both affect the chemical reaction rate. Continuous measurements of two components of velocity using hot-wire anemometry and the two reactants using chemiluminescent analysers have been made. A spatial resolution for the reactants of four Kolmogorov scales has been possible because of the novel design of the experiment. Measurements at this resolution for a reactive plume are not found in the literature. The experiment has been conducted relatively close to the grid in the region where self-similarity of the plume has not yet developed. Statistics of a conserved scalar, deduced from both reactive and non-reactive scalars by conserved scalar theory, are used to establish the mixing field of the plume, which is found to be consistent with theoretical considerations and with those found by other investigators in non-reative flows. Where appropriate the reactive species means and higher moments, probability density functions, joint statistics and spectra are compared with their respective frozen, equilibrium and reaction-dominated limits deduced from conserved scalar theory. The theoretical limits bracket reactive scalar statistics where this should be so according to conserved scalar theory. Both reactants approach their equilibrium limits with greater distance downstream. In the region of measurement, the plume reactant behaves as the reactant not in excess and the ambient reactant behaves as the reactant in excess. The reactant covariance lies outside its frozen and equilibrium limits for this value of Vd. The reaction rate closure of Toor (1969) is compared with the measured reaction rate. The gradient model is used to obtain turbulent diffusivities from turbulent fluxes. Diffusivity of a non-reactive scalar is found to be close to that measured in non-reactive flows by others.
Resumo:
This paper proposes an innovative instance similarity based evaluation metric that reduces the search map for clustering to be performed. An aggregate global score is calculated for each instance using the novel idea of Fibonacci series. The use of Fibonacci numbers is able to separate the instances effectively and, in hence, the intra-cluster similarity is increased and the inter-cluster similarity is decreased during clustering. The proposed FIBCLUS algorithm is able to handle datasets with numerical, categorical and a mix of both types of attributes. Results obtained with FIBCLUS are compared with the results of existing algorithms such as k-means, x-means expected maximization and hierarchical algorithms that are widely used to cluster numeric, categorical and mix data types. Empirical analysis shows that FIBCLUS is able to produce better clustering solutions in terms of entropy, purity and F-score in comparison to the above described existing algorithms.
Resumo:
Most recommendation methods employ item-item similarity measures or use ratings data to generate recommendations. These methods use traditional two dimensional models to find inter relationships between alike users and products. This paper proposes a novel recommendation method using the multi-dimensional model, tensor, to group similar users based on common search behaviour, and then finding associations within such groups for making effective inter group recommendations. Web log data is multi-dimensional data. Unlike vector based methods, tensors have the ability to highly correlate and find latent relationships between such similar instances, consisting of users and searches. Non redundant rules from such associations of user-searches are then used for making recommendations to the users.
Resumo:
The growing importance and need of data processing for information extraction is vital for Web databases. Due to the sheer size and volume of databases, retrieval of relevant information as needed by users has become a cumbersome process. Information seekers are faced by information overloading - too many result sets are returned for their queries. Moreover, too few or no results are returned if a specific query is asked. This paper proposes a ranking algorithm that gives higher preference to a user’s current search and also utilizes profile information in order to obtain the relevant results for a user’s query.
Resumo:
We propose to use the Tensor Space Modeling (TSM) to represent and analyze the user’s web log data that consists of multiple interests and spans across multiple dimensions. Further we propose to use the decomposition factors of the Tensors for clustering the users based on similarity of search behaviour. Preliminary results show that the proposed method outperforms the traditional Vector Space Model (VSM) based clustering.
Resumo:
As business process management technology matures, organisations acquire more and more business process models. The resulting collections can consist of hundreds, even thousands of models and their management poses real challenges. One of these challenges concerns model retrieval where support should be provided for the formulation and efficient execution of business process model queries. As queries based on only structural information cannot deal with all querying requirements in practice, there should be support for queries that require knowledge of process model semantics. In this paper we formally define a process model query language that is based on semantic relationships between tasks. This query language is independent of the particular process modelling notation used, but we will demonstrate how it can be used in the context of Petri nets by showing how the semantic relationships can be determined for these nets in such a way that state space explosion is avoided as much as possible. An experiment with three large process model repositories shows that queries expressed in our language can be evaluated efficiently.
Resumo:
The Web has become a worldwide repository of information which individuals, companies, and organizations utilize to solve or address various information problems. Many of these Web users utilize automated agents to gather this information for them. Some assume that this approach represents a more sophisticated method of searching. However, there is little research investigating how Web agents search for online information. In this research, we first provide a classification for information agent using stages of information gathering, gathering approaches, and agent architecture. We then examine an implementation of one of the resulting classifications in detail, investigating how agents search for information on Web search engines, including the session, query, term, duration and frequency of interactions. For this temporal study, we analyzed three data sets of queries and page views from agents interacting with the Excite and AltaVista search engines from 1997 to 2002, examining approximately 900,000 queries submitted by over 3,000 agents. Findings include: (1) agent sessions are extremely interactive, with sometimes hundreds of interactions per second (2) agent queries are comparable to human searchers, with little use of query operators, (3) Web agents are searching for a relatively limited variety of information, wherein only 18% of the terms used are unique, and (4) the duration of agent-Web search engine interaction typically spans several hours. We discuss the implications for Web information agents and search engines.
Resumo:
The goals of this research were to answer three questions. How predominant is religious searching online? How do people interact with Web search engines when searching for religious information? How effective are these interactions in locating relevant information? Specifically, referring to a US demographic, we analyzed five data sets from Web search engine, collected between 1997 and 2005, of over a million queries each in order to investigate religious searching on the Web. Results point to four key findings. First, there is no evidence of a decrease in religious Web-searching behaviors. Religious interest is a persistent topic of Web searching. Second, those seeking religious information on the Web are becoming slightly more interactive in their searching. Third, there is no evidence for a move away from mainstream religions toward non-mainstream religions since the majority of the search terms are associated with established religions. Fourth, our work does not support the hypothesis that traditional religious affiliation is associated with lower adoption of or sophistication with technology. These factors point to the Web as a potentially usefully communication medium for a variety of religious organizations. © 2009 Elsevier Ltd. All rights reserved.
Resumo:
Usability is a multi-dimensional characteristic of a computer system. This paper focuses on usability as a measurement of interaction between the user and the system. The research employs a task-oriented approach to evaluate the usability of a meta search engine. This engine encourages and accepts queries of unlimited size expressed in natural language. A variety of conventional metrics developed by academic and industrial research, including ISO standards,, are applied to the information retrieval process consisting of sequential tasks. Tasks range from formulating (long) queries to interpreting and retaining search results. Results of the evaluation and analysis of the operation log indicate that obtaining advanced search engine results can be accomplished simultaneously with enhancing the usability of the interactive process. In conclusion, we discuss implications for interactive information retrieval system design and directions for future usability research. © 2008 Academy Publisher.
Resumo:
Purpose - The web is now a significant component of the recruitment and job search process. However, very little is known about how companies and job seekers use the web, and the ultimate effectiveness of this process. The specific research questions guiding this study are: how do people search for job-related information on the web? How effective are these searches? And how likely are job seekers to find an appropriate job posting or application? Design/methodology/approach - The data used to examine these questions come from job seekers submitting job-related queries to a major web search engine at three points in time over a five-year period. Findings - Results indicate that individuals seeking job information generally submit only one query with several terms and over 45 percent of job-seeking queries contain a specific location reference. Of the documents retrieved, findings suggest that only 52 percent are relevant and only 40 percent of job-specific searches retrieve job postings. Research limitations/implications - This study provides an important contribution to web research and online recruiting literature. The data come from actual web searches, providing a realistic glimpse into how job seekers are actually using the web. Practical implications - The results of this research can assist organizations in seeking to use the web as part of their recruiting efforts, in designing corporate recruiting web sites, and in developing web systems to support job seeking and recruiting. Originality/value - This research is one of the first studies to investigate job searching on the web using longitudinal real world data. © Emerald Group Publishing Limited.
Resumo:
Metasearch engines are an intuitive method for improving the performance of Web search by increasing coverage, returning large numbers of results with a focus on relevance, and presenting alternative views of information needs. However, the use of metasearch engines in an operational environment is not well understood. In this study, we investigate the usage of Dogpile.com, a major Web metasearch engine, with the aim of discovering how Web searchers interact with metasearch engines. We report results examining 2,465,145 interactions from 534,507 users of Dogpile.com on May 6, 2005 and compare these results with findings from other Web searching studies. We collect data on geographical location of searchers, use of system feedback, content selection, sessions, queries, and term usage. Findings show that Dogpile.com searchers are mainly from the USA (84% of searchers), use about 3 terms per query (mean = 2.85), implement system feedback moderately (8.4% of users), and generally (56% of users) spend less than one minute interacting with the Web search engine. Overall, metasearchers seem to have higher degrees of interaction than searchers on non-metasearch engines, but their sessions are for a shorter period of time. These aspects of metasearching may be what define the differences from other forms of Web searching. We discuss the implications of our findings in relation to metasearch for Web searchers, search engines, and content providers.
Resumo:
Purpose – The work presented in this paper aims to provide an approach to classifying web logs by personal properties of users. Design/methodology/approach – The authors describe an iterative system that begins with a small set of manually labeled terms, which are used to label queries from the log. A set of background knowledge related to these labeled queries is acquired by combining web search results on these queries. This background set is used to obtain many terms that are related to the classification task. The system then ranks each of the related terms, choosing those that most fit the personal properties of the users. These terms are then used to begin the next iteration. Findings – The authors identify the difficulties of classifying web logs, by approaching this problem from a machine learning perspective. By applying the approach developed, the authors are able to show that many queries in a large query log can be classified. Research limitations/implications – Testing results in this type of classification work is difficult, as the true personal properties of web users are unknown. Evaluation of the classification results in terms of the comparison of classified queries to well known age-related sites is a direction that is currently being exploring. Practical implications – This research is background work that can be incorporated in search engines or other web-based applications, to help marketing companies and advertisers. Originality/value – This research enhances the current state of knowledge in short-text classification and query log learning. Classification schemes, Computer networks, Information retrieval, Man-machine systems, User interfaces
Resumo:
Detecting query reformulations within a session by a Web searcher is an important area of research for designing more helpful searching systems and targeting content to particular users. Methods explored by other researchers include both qualitative (i.e., the use of human judges to manually analyze query patterns on usually small samples) and nondeterministic algorithms, typically using large amounts of training data to predict query modification during sessions. In this article, we explore three alternative methods for detection of session boundaries. All three methods are computationally straightforward and therefore easily implemented for detection of session changes. We examine 2,465,145 interactions from 534,507 users of Dogpile.com on May 6, 2005. We compare session analysis using (a) Internet Protocol address and cookie; (b) Internet Protocol address, cookie, and a temporal limit on intrasession interactions; and (c) Internet Protocol address, cookie, and query reformulation patterns. Overall, our analysis shows that defining sessions by query reformulation along with Internet Protocol address and cookie provides the best measure, resulting in an 82% increase in the count of sessions. Regardless of the method used, the mean session length was fewer than three queries, and the mean session duration was less than 30 min. Searchers most often modified their query by changing query terms (nearly 23% of all query modifications) rather than adding or deleting terms. Implications are that for measuring searching traffic, unique sessions may be a better indicator than the common metric of unique visitors. This research also sheds light on the more complex aspects of Web searching involving query modifications and may lead to advances in searching tools.
Resumo:
Existing recommendation systems often recommend products to users by capturing the item-to-item and user-to-user similarity measures. These types of recommendation systems become inefficient in people-to-people networks for people to people recommendation that require two way relationship. Also, existing recommendation methods use traditional two dimensional models to find inter relationships between alike users and items. It is not efficient enough to model the people-to-people network with two-dimensional models as the latent correlations between the people and their attributes are not utilized. In this paper, we propose a novel tensor decomposition-based recommendation method for recommending people-to-people based on users profiles and their interactions. The people-to-people network data is multi-dimensional data which when modeled using vector based methods tend to result in information loss as they capture either the interactions or the attributes of the users but not both the information. This paper utilizes tensor models that have the ability to correlate and find latent relationships between similar users based on both information, user interactions and user attributes, in order to generate recommendations. Empirical analysis is conducted on a real-life online dating dataset. As demonstrated in results, the use of tensor modeling and decomposition has enabled the identification of latent correlations between people based on their attributes and interactions in the network and quality recommendations have been derived using the 'alike' users concept.