919 resultados para query reformulation, search pattern, search strategy
Resumo:
User-Web interactions have emerged as an important research in the field of information science. In this study, we examine extensively the Web searching performed by general users. Our goal is to investigate the effects of users’ cognitive styles on their Web search behavior in relation to two broad components: Information Searching and Information Processing Approaches. We use questionnaires, a measure of cognitive style, Web session logs and think-aloud as the data collection instruments. Our study findings show wholistic Web users tend to adopt a top-down approach to Web searching, where the users searched for a generic topic, and then reformulate their queries to search for specific information. They tend to prefer reading to process information. Analytic users tend to prefer a bottom-up approach to information searching and they process information by scanning search result pages.
Resumo:
The Web has become a worldwide repository of information which individuals, companies, and organizations utilize to solve or address various information problems. Many of these Web users utilize automated agents to gather this information for them. Some assume that this approach represents a more sophisticated method of searching. However, there is little research investigating how Web agents search for online information. In this research, we first provide a classification for information agent using stages of information gathering, gathering approaches, and agent architecture. We then examine an implementation of one of the resulting classifications in detail, investigating how agents search for information on Web search engines, including the session, query, term, duration and frequency of interactions. For this temporal study, we analyzed three data sets of queries and page views from agents interacting with the Excite and AltaVista search engines from 1997 to 2002, examining approximately 900,000 queries submitted by over 3,000 agents. Findings include: (1) agent sessions are extremely interactive, with sometimes hundreds of interactions per second (2) agent queries are comparable to human searchers, with little use of query operators, (3) Web agents are searching for a relatively limited variety of information, wherein only 18% of the terms used are unique, and (4) the duration of agent-Web search engine interaction typically spans several hours. We discuss the implications for Web information agents and search engines.
Resumo:
The quality of discovered features in relevance feedback (RF) is the key issue for effective search query. Most existing feedback methods do not carefully address the issue of selecting features for noise reduction. As a result, extracted noisy features can easily contribute to undesirable effectiveness. In this paper, we propose a novel feature extraction method for query formulation. This method first extract term association patterns in RF as knowledge for feature extraction. Negative RF is then used to improve the quality of the discovered knowledge. A novel information filtering (IF) model is developed to evaluate the proposed method. The experimental results conducted on Reuters Corpus Volume 1 and TREC topics confirm that the proposed model achieved encouraging performance compared to state-of-the-art IF models.
Resumo:
Entity-oriented search has become an essential component of modern search engines. It focuses on retrieving a list of entities or information about the specific entities instead of documents. In this paper, we study the problem of finding entity related information, referred to as attribute-value pairs, that play a significant role in searching target entities. We propose a novel decomposition framework combining reduced relations and the discriminative model, Conditional Random Field (CRF), for automatically finding entity-related attribute-value pairs from free text documents. This decomposition framework allows us to locate potential text fragments and identify the hidden semantics, in the form of attribute-value pairs for user queries. Empirical analysis shows that the decomposition framework outperforms pattern-based approaches due to its capability of effective integration of syntactic and semantic features.
Resumo:
Entity-oriented retrieval aims to return a list of relevant entities rather than documents to provide exact answers for user queries. The nature of entity-oriented retrieval requires identifying the semantic intent of user queries, i.e., understanding the semantic role of query terms and determining the semantic categories which indicate the class of target entities. Existing methods are not able to exploit the semantic intent by capturing the semantic relationship between terms in a query and in a document that contains entity related information. To improve the understanding of the semantic intent of user queries, we propose concept-based retrieval method that not only automatically identifies the semantic intent of user queries, i.e., Intent Type and Intent Modifier but introduces concepts represented by Wikipedia articles to user queries. We evaluate our proposed method on entity profile documents annotated by concepts from Wikipedia category and list structure. Empirical analysis reveals that the proposed method outperforms several state-of-the-art approaches.
Resumo:
The aim of spoken term detection (STD) is to find all occurrences of a specified query term in a large audio database. This process is usually divided into two steps: indexing and search. In a previous study, it was shown that knowing the topic of an audio document would help to improve the accuracy of indexing step which results in a better performance for STD system. In this paper, we propose the use of topic information not only in the indexing step, but also in the search step. Results of our experiments show that topic information could also be used in search step to improve the STD accuracy.
Resumo:
For people with cognitive disabilities, technology is more often thought of as a support mechanism, rather than a source of division that may require intervention to equalize access across the cognitive spectrum. This paper presents a first attempt at formalizing the digital gap created by the generalization of search engines. This was achieved through the development of a mapping of cognitive abilities required by users to execute low- level tasks during a standard Web search task. The mapping demonstrates how critical these abilities are to successfully use search engines with an adequate level of independence. It will lead to a set of design guidelines for search engine interfaces that will allow for the engagement of users of all abilities, and also, more importantly, in search algorithms such as query suggestion and measure of relevance (i.e. ranking).
Resumo:
The implementation of three-phase sinusoidal pulse-width-modulated inverter control strategy using microprocessor is discussed in this paper. To save CPU time, the DMA technique is used for transferring the switching pattern from memory to the pulse amplifier and isolation circuits of individual thyristors in the inverter bridge. The method of controlling both voltage and frequency is discussed here.
Resumo:
Buffer zones are vegetated strip-edges of agricultural fields along watercourses. As linear habitats in agricultural ecosystems, buffer strips dominate and play a leading ecological role in many areas. This thesis focuses on the plant species diversity of the buffer zones in a Finnish agricultural landscape. The main objective of the present study is to identify the determinants of floral species diversity in arable buffer zones from local to regional levels. This study was conducted in a watershed area of a farmland landscape of southern Finland. The study area, Lepsämänjoki, is situated in the Nurmijärvi commune 30 km to the north of Helsinki, Finland. The biotope mosaics were mapped in GIS. A total of 59 buffer zones were surveyed, of which 29 buffer strips surveyed were also sampled by plot. Firstly, two diversity components (species richness and evenness) were investigated to determine whether the relationship between the two is equal and predictable. I found no correlation between species richness and evenness. The relationship between richness and evenness is unpredictable in a small-scale human-shaped ecosystem. Ordination and correlation analyses show that richness and evenness may result from different ecological processes, and thus should be considered separately. Species richness correlated negatively with phosphorus content, and species evenness correlated negatively with the ratio of organic carbon to total nitrogen in soil. The lack of a consistent pattern in the relationship between these two components may be due to site-specific variation in resource utilization by plant species. Within-habitat configuration (width, length, and area) were investigated to determine which is more effective for predicting species richness. More species per unit area increment could be obtained from widening the buffer strip than from lengthening it. The width of the strips is an effective determinant of plant species richness. The increase in species diversity with an increase in the width of buffer strips may be due to cross-sectional habitat gradients within the linear patches. This result can serve as a reference for policy makers, and has application value in agricultural management. In the framework of metacommunity theory, I found that both mass effect(connectivity) and species sorting (resource heterogeneity) were likely to explain species composition and diversity on a local and regional scale. The local and regional processes were interactively dominated by the degree to which dispersal perturbs local communities. In the lowly and intermediately connected regions, species sorting was of primary importance to explain species diversity, while the mass effect surpassed species sorting in the highly connected region. Increasing connectivity in communities containing high habitat heterogeneity can lead to the homogenization of local communities, and consequently, to lower regional diversity, while local species richness was unrelated to the habitat connectivity. Of all species found, Anthriscus sylvestris, Phalaris arundinacea, and Phleum pretense significantly responded to connectivity, and showed high abundance in the highly connected region. We suggest that these species may play a role in switching the force from local resources to regional connectivity shaping the community structure. On the landscape context level, the different responses of local species richness and evenness to landscape context were investigated. Seven landscape structural parameters served to indicate landscape context on five scales. On all scales but the smallest scales, the Shannon-Wiener diversity of land covers (H') correlated positively with the local richness. The factor (H') showed the highest correlation coefficients in species richness on the second largest scale. The edge density of arable field was the only predictor that correlated with species evenness on all scales, which showed the highest predictive power on the second smallest scale. The different predictive power of the factors on different scales showed a scaledependent relationship between the landscape context and local plant species diversity, and indicated that different ecological processes determine species richness and evenness. The local richness of species depends on a regional process on large scales, which may relate to the regional species pool, while species evenness depends on a fine- or coarse-grained farming system, which may relate to the patch quality of the habitats of field edges near the buffer strips. My results suggested some guidelines of species diversity conservation in the agricultural ecosystem. To maintain a high level of species diversity in the strips, a high level of phosphorus in strip soil should be avoided. Widening the strips is the most effective mean to improve species richness. Habitat connectivity is not always favorable to species diversity because increasing connectivity in communities containing high habitat heterogeneity can lead to the homogenization of local communities (beta diversity) and, consequently, to lower regional diversity. Overall, a synthesis of local and regional factors emerged as the model that best explain variations in plant species diversity. The studies also suggest that the effects of determinants on species diversity have a complex relationship with scale.
Resumo:
The implementation of three-phase sinusoidal pulse-width-modulated inverter control strategy using microprocessor is discussed in this paper. To save CPU time, the DMA technique is used for transferring the switching pattern from memory to the pulse amplifier and isolation circuits of individual thyristors in the inverter bridge. The method of controlling both voltage and frequency is discussed here.
Resumo:
Current smartphones have a storage capacity of several gigabytes. More and more information is stored on mobile devices. To meet the challenge of information organization, we turn to desktop search. Users often possess multiple devices, and synchronize (subsets of) information between them. This makes file synchronization more important. This thesis presents Dessy, a desktop search and synchronization framework for mobile devices. Dessy uses desktop search techniques, such as indexing, query and index term stemming, and search relevance ranking. Dessy finds files by their content, metadata, and context information. For example, PDF files may be found by their author, subject, title, or text. EXIF data of JPEG files may be used in finding them. User–defined tags can be added to files to organize and retrieve them later. Retrieved files are ranked according to their relevance to the search query. The Dessy prototype uses the BM25 ranking function, used widely in information retrieval. Dessy provides an interface for locating files for both users and applications. Dessy is closely integrated with the Syxaw file synchronizer, which provides efficient file and metadata synchronization, optimizing network usage. Dessy supports synchronization of search results, individual files, and directory trees. It allows finding and synchronizing files that reside on remote computers, or the Internet. Dessy is designed to solve the problem of efficient mobile desktop search and synchronization, also supporting remote and Internet search. Remote searches may be carried out offline using a downloaded index, or while connected to the remote machine on a weak network. To secure user data, transmissions between the Dessy client and server are encrypted using symmetric encryption. Symmetric encryption keys are exchanged with RSA key exchange. Dessy emphasizes extensibility. Also the cryptography can be extended. Users may tag their files with context tags and control custom file metadata. Adding new indexed file types, metadata fields, ranking methods, and index types is easy. Finding files is done with virtual directories, which are views into the user’s files, browseable by regular file managers. On mobile devices, the Dessy GUI provides easy access to the search and synchronization system. This thesis includes results of Dessy synchronization and search experiments, including power usage measurements. Finally, Dessy has been designed with mobility and device constraints in mind. It requires only MIDP 2.0 Mobile Java with FileConnection support, and Java 1.5 on desktop machines.
Resumo:
In this paper, we first describe a framework to model the sponsored search auction on the web as a mechanism design problem. Using this framework, we describe two well-known mechanisms for sponsored search auction-Generalized Second Price (GSP) and Vickrey-Clarke-Groves (VCG). We then derive a new mechanism for sponsored search auction which we call optimal (OPT) mechanism. The OPT mechanism maximizes the search engine's expected revenue, while achieving Bayesian incentive compatibility and individual rationality of the advertisers. We then undertake a detailed comparative study of the mechanisms GSP, VCG, and OPT. We compute and compare the expected revenue earned by the search engine under the three mechanisms when the advertisers are symmetric and some special conditions are satisfied. We also compare the three mechanisms in terms of incentive compatibility, individual rationality, and computational complexity. Note to Practitioners-The advertiser-supported web site is one of the successful business models in the emerging web landscape. When an Internet user enters a keyword (i.e., a search phrase) into a search engine, the user gets back a page with results, containing the links most relevant to the query and also sponsored links, (also called paid advertisement links). When a sponsored link is clicked, the user is directed to the corresponding advertiser's web page. The advertiser pays the search engine in some appropriate manner for sending the user to its web page. Against every search performed by any user on any keyword, the search engine faces the problem of matching a set of advertisers to the sponsored slots. In addition, the search engine also needs to decide on a price to be charged to each advertiser. Due to increasing demands for Internet advertising space, most search engines currently use auction mechanisms for this purpose. These are called sponsored search auctions. A significant percentage of the revenue of Internet giants such as Google, Yahoo!, MSN, etc., comes from sponsored search auctions. In this paper, we study two auction mechanisms, GSP and VCG, which are quite popular in the sponsored auction context, and pursue the objective of designing a mechanism that is superior to these two mechanisms. In particular, we propose a new mechanism which we call the OPT mechanism. This mechanism maximizes the search engine's expected revenue subject to achieving Bayesian incentive compatibility and individual rationality. Bayesian incentive compatibility guarantees that it is optimal for each advertiser to bid his/her true value provided that all other agents also bid their respective true values. Individual rationality ensures that the agents participate voluntarily in the auction since they are assured of gaining a non-negative payoff by doing so.
Resumo:
For many, particularly in the Anglophone world and Western Europe, it may be obvious that Google has a monopoly over online search and advertising and that this is an undesirable state of affairs, due to Google's ability to mediate information flows online. The baffling question may be why governments and regulators are doing little to nothing about this situation, given the increasingly pivotal importance of the internet and free flowing communications in our lives. However, the law concerning monopolies, namely antitrust or competition law, works in what may be seen as a less intuitive way by the general public. Monopolies themselves are not illegal. Conduct that is unlawful, i.e. abuses of that market power, is defined by a complex set of rules and revolves principally around economic harm suffered due to anticompetitive behavior. However the effect of information monopolies over search, such as Google’s, is more than just economic, yet competition law does not address this. Furthermore, Google’s collection and analysis of user data and its portfolio of related services make it difficult for others to compete. Such a situation may also explain why Google’s established search rivals, Bing and Yahoo, have not managed to provide services that are as effective or popular as Google’s own (on this issue see also the texts by Dirk Lewandowski and Astrid Mager in this reader). Users, however, are not entirely powerless. Google's business model rests, at least partially, on them – especially the data collected about them. If they stop using Google, then Google is nothing.
Resumo:
Because of limited sensor and communication ranges, designing efficient mechanisms for cooperative tasks is difficult. In this article, several negotiation schemes for multiple agents performing a cooperative task are presented. The negotiation schemes provide suboptimal solutions, but have attractive features of fast decision-making, and scalability to large number of agents without increasing the complexity of the algorithm. A software agent architecture of the decision-making process is also presented. The effect of the magnitude of information flow during the negotiation process is studied by using different models of the negotiation scheme. The performance of the various negotiation schemes, using different information structures, is studied based on the uncertainty reduction achieved for a specified number of search steps. The negotiation schemes perform comparable to that of optimal strategy in terms of uncertainty reduction and also require very low computational time, similar to 7 per cent to that of optimal strategy. Finally, analysis on computational and communication requirement for the negotiation schemes is carried out.
Resumo:
A large fraction of an XML document typically consists of text data. The XPath query language allows text search via the equal, contains, and starts-with predicates. Such predicates can be efficiently implemented using a compressed self-index of the document's text nodes. Most queries, however, contain some parts querying the text of the document, plus some parts querying the tree structure. It is therefore a challenge to choose an appropriate evaluation order for a given query, which optimally leverages the execution speeds of the text and tree indexes. Here the SXSI system is introduced. It stores the tree structure of an XML document using a bit array of opening and closing brackets plus a sequence of labels, and stores the text nodes of the document using a global compressed self-index. On top of these indexes sits an XPath query engine that is based on tree automata. The engine uses fast counting queries of the text index in order to dynamically determine whether to evaluate top-down or bottom-up with respect to the tree structure. The resulting system has several advantages over existing systems: (1) on pure tree queries (without text search) such as the XPathMark queries, the SXSI system performs on par or better than the fastest known systems MonetDB and Qizx, (2) on queries that use text search, SXSI outperforms the existing systems by 1-3 orders of magnitude (depending on the size of the result set), and (3) with respect to memory consumption, SXSI outperforms all other systems for counting-only queries.