936 results for Search Engines
Abstract:
Peer-to-peer systems have been widely used on the Internet. However, most peer-to-peer information systems still lack some important features, for example cross-language IR (Information Retrieval) and collection selection/fusion. Cross-language IR is a state-of-the-art research area in the IR research community, and it has not yet been used in any real-world IR system. Cross-language IR makes it possible to issue a query in one language and receive documents in other languages. In a typical peer-to-peer environment, users come from multiple countries, so their collections are in multiple languages. Cross-language IR can help users find documents more easily. For example, many Chinese researchers search for research papers in both Chinese and English; with cross-language IR, they can issue a single query in Chinese and retrieve documents in both languages. The Out Of Vocabulary (OOV) problem is one of the key research areas in cross-language information retrieval. In recent years, web mining has been shown to be an effective approach to solving this problem. However, how to extract Multiword Lexical Units (MLUs) from web content and how to select the correct translations from the extracted candidate MLUs remain two difficult problems in web-mining-based automated translation approaches. Discovering resource descriptions and merging results obtained from remote search engines are two key issues in distributed information retrieval. In uncooperative environments, query-based sampling and normalized-score-based merging strategies are well-known approaches to these problems. However, such approaches consider only the content of the remote database, not the retrieval performance of the remote search engine. This thesis presents research on building a peer-to-peer IR system with cross-language IR and an advanced collection profiling technique for result fusion.
In particular, this thesis first presents a new Chinese term measurement and a new Chinese MLU extraction process that work well on small corpora. An approach to selecting MLUs more accurately is also presented. The thesis then proposes a collection profiling strategy that can discover not only the content of a collection but also the retrieval performance of the remote search engine. Based on collection profiling, a web-based query classification method and two collection fusion approaches are developed and presented. Our experiments show that the proposed strategies are effective in merging results in uncooperative peer-to-peer environments. Here, an uncooperative environment is one in which each peer in the system is autonomous: peers are willing to share documents but do not share collection statistics. This is a typical peer-to-peer IR environment. Finally, all of these approaches are combined to build a secure peer-to-peer multilingual IR system in which peers cooperate through X.509 certificates and an email system.
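The normalized-score merging baseline mentioned above can be sketched as min-max normalization of each peer's scores followed by a single merge on the normalized values. This is a minimal sketch of that generic baseline, not the thesis's profiling-based fusion method; the peer names, document ids and scores are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical result lists: peer name -> list of (doc_id, raw_score).
Results = Dict[str, List[Tuple[str, float]]]


def min_max_normalize(scores: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Rescale one peer's raw scores into [0, 1] so that peers become comparable."""
    if not scores:
        return []
    values = [s for _, s in scores]
    lo, hi = min(values), max(values)
    if hi == lo:
        # All scores equal: treat every document as maximally relevant.
        return [(doc, 1.0) for doc, _ in scores]
    return [(doc, (s - lo) / (hi - lo)) for doc, s in scores]


def merge_normalized(results: Results, k: int = 10) -> List[Tuple[str, str, float]]:
    """Merge per-peer result lists into one ranking by normalized score."""
    merged: List[Tuple[str, str, float]] = []
    for peer, scores in results.items():
        for doc, norm in min_max_normalize(scores):
            merged.append((peer, doc, norm))
    # Highest normalized score first; ties broken by peer name, then doc id.
    merged.sort(key=lambda t: (-t[2], t[0], t[1]))
    return merged[:k]


if __name__ == "__main__":
    results = {
        "peerA": [("a1", 12.0), ("a2", 8.0), ("a3", 4.0)],
        "peerB": [("b1", 0.9), ("b2", 0.3)],
    }
    for peer, doc, score in merge_normalized(results, k=4):
        print(peer, doc, round(score, 2))
```

Because normalization looks only at each peer's own score range, a weak peer's best document is ranked as highly as a strong peer's best document; this is exactly the limitation the abstract points out, since the remote engine's retrieval performance is ignored.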
Abstract:
This paper reports findings from a study investigating the effect of integrating sponsored and nonsponsored search engine links into a single web listing. The premise underlying this research is that web searchers are chiefly interested in relevant results. Given the reported negative bias that web searchers have concerning sponsored links, separate listings may be a disservice to web searchers, as they might not direct them to relevant websites. Some web meta-search engines integrate sponsored and nonsponsored links into a single listing. Using a web search engine log of over 7 million interactions from hundreds of thousands of users of a major web meta-search engine, we analysed the click-through patterns for both sponsored and nonsponsored links. We also classified web queries as informational, navigational and transactional based on the expected type of content and analysed the click-through patterns of each classification. The findings show that for more than 35% of queries, there are no clicks on any result. More than 80% of web queries are informational in nature, approximately 10% are transactional, and 10% are navigational. Sponsored links account for approximately 15% of all clicks. Integrating sponsored and nonsponsored links does not appear to increase the clicks on sponsored listings. We discuss how these research results could enhance future sponsored search platforms.
Abstract:
This paper investigates self-Googling by monitoring users' search engine activities, and adds to the few existing quantitative studies on this topic. We explore this phenomenon by answering the following questions: To what extent is self-Googling visible in the usage of search engines? Is there any significant, measurable difference between queries related to self-Googling and generic search queries? To what extent do self-Googling search requests match the selected personalised Web pages? To address these questions, we draw on the theory of narcissism to help define self-Googling, and present the results of a 14-month online experiment using Google search engine usage data.
Abstract:
Current multimedia Web search engines still use keywords as the primary means of search. Because of the richness of multimedia content, general users frequently have difficulty formulating textual queries that adequately represent their needs. As a result, query reformulation becomes an inevitable part of most multimedia searches. Previous Web query formulation studies did not investigate modification sequences and thus could report only limited findings on reformulation behavior. In this study, we propose an automatic approach to examining multimedia query reformulation using large-scale transaction logs. The key findings show that search term replacement is the dominant type of modification in visual searches but is less important in audio searches. Image search users prefer the specified search strategy more than video and audio users do. There is also a clear tendency to replace terms with synonyms or associated terms in visual queries. The analysis of search strategies in different types of multimedia searching provides insights into users' searching behavior, which can contribute to the design of future query formulation assistance for keyword-based Web multimedia retrieval systems.
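Modification sequences of the kind described above can be derived automatically by comparing the term sets of consecutive queries in a session. The following is a minimal sketch, assuming a simple set-based taxonomy (specification = terms only added, generalization = terms only removed, replacement = some terms swapped); the labels and example session are hypothetical, and the study's own coding scheme may be finer-grained (for example, distinguishing synonym replacements).

```python
from typing import List


def classify_reformulation(prev: str, curr: str) -> str:
    """Label the change from one query to the next by comparing term sets.

    Hypothetical labels used for illustration:
      "repeat"          same set of terms
      "specification"   terms only added (query narrowed)
      "generalization"  terms only removed (query broadened)
      "replacement"     some terms removed and others added
      "new"             no terms in common
    """
    a = set(prev.lower().split())
    b = set(curr.lower().split())
    if a == b:
        return "repeat"
    if not a & b:
        return "new"
    if a < b:
        return "specification"
    if b < a:
        return "generalization"
    return "replacement"


def session_sequence(queries: List[str]) -> List[str]:
    """Return the sequence of modification labels for one search session."""
    return [classify_reformulation(p, c) for p, c in zip(queries, queries[1:])]


if __name__ == "__main__":
    session = ["red car", "red sports car", "blue sports car", "sports car", "beach sunset"]
    print(session_sequence(session))
```

Counting these labels per media type (image, video, audio) across a log would give the kind of comparison reported in the abstract.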
Abstract:
Searching for multimedia is an important activity for users of Web search engines. Studying users' interactions with Web search engine multimedia buttons, including image, audio, and video, is important for the development of multimedia Web search systems. This article provides results from a Weblog analysis study of multimedia Web searching by Dogpile users in 2006. The study analyzes the (a) duration, size, and structure of Web search queries and sessions; (b) user demographics; (c) most popular multimedia Web searching terms; and (d) use of advanced Web search techniques including Boolean and natural language. The current study findings are compared with results from previous multimedia Web searching studies. The key findings are: (a) since 1997, image search has consistently been the dominant media type searched, followed by audio and video; (b) multimedia search duration is still short (>50% of searching episodes are <1 min), using few search terms; (c) many multimedia searches are for information about people, especially in audio search; and (d) multimedia search has begun to shift from entertainment to other categories such as medical, sports, and technology (based on the most repeated terms). Implications for the design of Web multimedia search engines are discussed.
Abstract:
Tagging has become one of the key activities on next-generation websites, which allow users to select short labels to annotate, manage, and share multimedia information such as photos, videos and bookmarks. Tagging does not require any prior training before users participate in annotation, as they can freely choose whichever terms best represent the semantics of the content without worrying about any formal structure or ontology. However, the practice of free-form tagging can lead to several problems, such as synonymy, polysemy and ambiguity, which potentially increase the complexity of managing tags and retrieving information. To solve these problems, this research aims to construct a lightweight indexing scheme that structures tags by identifying and disambiguating the meanings of terms, and to construct a knowledge base or dictionary. News has been chosen as the primary application domain to demonstrate the benefits of using structured tags for managing the rapidly changing and dynamic nature of news information. One of the main outcomes of this work is an automatically constructed vocabulary that defines the meaning of each named entity tag that can be extracted from a news article (including person, location and organisation), based on expert suggestions from major search engines and knowledge from public databases such as Wikipedia. To demonstrate the potential applications of the vocabulary, we have used it to provide more functionality in an online news website, including topic-based news reading, intuitive tagging, clipping and sharing of interesting news, and news filtering or searching based on named entity tags. The evaluation results on the impact of disambiguating tags show that the vocabulary can significantly improve news searching performance.
The preliminary results from our user study demonstrate that users benefit from the additional functionality on the news website, as they are able to retrieve more relevant news, and to clip and share news with friends and family effectively.
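The disambiguation idea can be illustrated with a simple context-overlap heuristic in the spirit of the Lesk algorithm: each ambiguous named-entity tag maps to several candidate senses in a vocabulary, each described by keywords, and the sense sharing the most words with the article is chosen. This is a hypothetical sketch only; the vocabulary entries below are invented, and the research builds its vocabulary from search-engine suggestions and Wikipedia with a disambiguation method that may differ.

```python
import re
from typing import Dict, List, Set

# Hypothetical vocabulary: tag -> {sense label: descriptive keywords}.
VOCABULARY: Dict[str, Dict[str, List[str]]] = {
    "jaguar": {
        "Jaguar (animal)": ["cat", "species", "rainforest", "wildlife", "predator"],
        "Jaguar Cars": ["car", "vehicle", "engine", "manufacturer", "model"],
    },
}


def tokenize(text: str) -> Set[str]:
    """Lower-case word tokens of an article."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(tag: str, article: str) -> str:
    """Pick the sense whose keywords overlap most with the article text."""
    senses = VOCABULARY.get(tag.lower())
    if not senses:
        return tag  # unknown tag: leave it unstructured
    words = tokenize(article)
    # max() keeps the first sense on ties, so dictionary order is the fallback.
    return max(senses, key=lambda sense: len(words & set(senses[sense])))


if __name__ == "__main__":
    article = "The manufacturer unveiled a new engine for its flagship car model."
    print(disambiguate("Jaguar", article))
```

Once a tag is resolved to a sense label, filtering or searching by that label avoids mixing, for example, automotive and wildlife news under one ambiguous tag.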
Abstract:
In this paper, we define and present a comprehensive classification of user intent for Web searching. The classification consists of three hierarchical levels of informational, navigational, and transactional intent. After deriving the attributes of each, we developed a software application that automatically classified queries using a Web search engine log of over a million and a half queries submitted by several hundred thousand users. Our findings show that more than 80% of Web queries are informational in nature, with about 10% each being navigational and transactional. To validate the accuracy of our algorithm, we manually coded 400 queries and compared the results of this manual classification with those of the automated method. This comparison showed that the automatic classification has an accuracy of 74%. Of the remaining 25% of the queries, the user intent is vague or multi-faceted, pointing to the need for probabilistic classification. We discuss how search engines can use knowledge of user intent to provide more targeted and relevant results in Web searching.
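The automatic classification and its validation against manually coded queries can be sketched with a few hand-written rules and a simple agreement measure. The cue lists below are hypothetical simplifications chosen for illustration, not the attributes derived in the paper; accuracy here is just the fraction of queries where the automatic label equals the manual code.

```python
from typing import List

# Hypothetical cue lists; the paper derives its own, richer attributes.
NAVIGATIONAL_CUES = (".com", ".org", "www", "homepage", "login")
TRANSACTIONAL_CUES = ("download", "buy", "order", "free", "software", "lyrics")


def classify_intent(query: str) -> str:
    """Assign informational, navigational or transactional intent by simple rules."""
    q = query.lower()
    if any(cue in q for cue in NAVIGATIONAL_CUES):
        return "navigational"
    if any(cue in q.split() for cue in TRANSACTIONAL_CUES):
        return "transactional"
    # Default class: most web queries are informational.
    return "informational"


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of queries where the automatic label matches the manual code."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    queries = ["www.ebay.com", "download itunes", "symptoms of flu", "history of rome"]
    manual = ["navigational", "transactional", "informational", "navigational"]
    auto = [classify_intent(q) for q in queries]
    print(auto, accuracy(auto, manual))
```

A rule-based classifier like this must commit to one label per query, which is why vague or multi-faceted queries end up as errors; the probabilistic classification the abstract calls for would instead assign a distribution over the three intents.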
Abstract:
Purpose – This paper aims to report findings from an exploratory study investigating the web interactions and technoliteracy of children in the early childhood years. Previous research has studied aspects of older children’s technoliteracy and web searching; however, few studies have analyzed web search data from children younger than six years of age. Design/methodology/approach – The study explored the Google web searching and technoliteracy of young children who are enrolled in a “preparatory classroom” or kindergarten (the year before young children begin compulsory schooling in Queensland, Australia). Young children were video- and audio-taped while conducting Google web searches in the classroom. The data were qualitatively analysed to understand the young children’s web search behaviour. Findings – The findings show that young children engage in complex web searches, including keyword searching and browsing, query formulation and reformulation, relevance judgments, successive searches, information multitasking and collaborative behaviours. The study results provide significant initial insights into young children’s web searching and technoliteracy. Practical implications – The use of web search engines by young children is an important research area with implications for educators and web technologies developers. Originality/value – This is the first study of young children’s interaction with a web search engine.
Abstract:
Experimental/pilot online journalistic publication. EUAustralia Online (www.euaustralia.com) is a pilot niche publication identifying and demonstrating the dynamics of online journalism. The editor, an experienced senior journalist and academic specialising in European studies, commenced publication on 28 August 2006 during a year of “industry immersion”, with media accreditation to the European Commission, Brussels. Reporting is now from Australia, and from Europe during field trips. Student editors participate, making it partly a training operation. EUAustralia demonstrates an adaptation of conventional, universal, “Western” liberal journalistic practices. Its first premise is to fill a knowledge gap in Australia about the European Union: its institutions, functions and directions. The second premise is to test the communications capacity of the online format, in which the publication sets a strong standard of journalistic credibility; hence its transparency in sourcing and in signposting “commentary” or “opinion”. EUAustralia uses modified, enhanced weblog software that allows future allocation of closed pages to subscribers. An early exemplar of its kind, with a modest upload rate (2010-13 average: 16 postings per month), it is well regarded and attracts over 180,000 site visits per year (half of them unique visitors; AWB Statistics). It is strongly rated by search engines; see its page-one Google placements for “EU Australia”. Comment by the ISP (SeventhVision, Broadbeach, Queensland): “The site has good search engine recognition because seen as credible; can be used to generate revenue”. This journalistic exercise has been analysed in a theoretical context twice, in published refereed conference proceedings (Communication and Media Policy Forum, Sydney; 2007, 2009).
Abstract:
In the terminology of logic programming, current search engines answer $\Sigma_1$ queries (formulas of the form $\exists x\,\varphi(x)$, where $\varphi$ is a boolean combination of attributes). Such a query is determined by a particular sequence of keywords input by a user. To give users more control, search engines will have to tackle more expressive queries, namely $\Sigma_2$ queries (formulas of the form $\exists x\,\forall y\,\varphi(x,y)$). The talk examines which directions could be explored to move towards more expressive languages and more powerful search engines, and what benefits users should expect.
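The difference between the two query classes can be made concrete over a finite collection. In one possible reading, a $\Sigma_1$ keyword query asks for pages $x$ satisfying a boolean condition on their attributes, while a $\Sigma_2$ query nests a universal quantifier, for example "pages $x$ such that every page $y$ linked from $x$ mentions some term". The toy pages, link relation and example formulas below are hypothetical illustrations, not taken from the talk.

```python
from typing import Callable, Dict, Iterable, Set

# Hypothetical toy collection: page -> set of attributes (keywords).
PAGES: Dict[str, Set[str]] = {
    "p1": {"jaguar", "car"},
    "p2": {"jaguar", "cat"},
    "p3": {"car", "review"},
}
# Hypothetical link relation: page -> pages it links to.
LINKS: Dict[str, Set[str]] = {"p1": {"p3"}, "p2": {"p1", "p3"}, "p3": set()}


def sigma1(domain: Iterable[str], phi: Callable[[str], bool]) -> Set[str]:
    """Witnesses x for the Sigma_1 query: exists x . phi(x)."""
    return {x for x in domain if phi(x)}


def sigma2(domain: Iterable[str], psi: Callable[[str, str], bool]) -> Set[str]:
    """Witnesses x for the Sigma_2 query: exists x . forall y . psi(x, y)."""
    xs = list(domain)  # materialise so the domain can be scanned once per x
    return {x for x in xs if all(psi(x, y) for y in xs)}


if __name__ == "__main__":
    # Sigma_1: pages containing both "jaguar" and "car" (an ordinary keyword query).
    print(sigma1(PAGES, lambda x: {"jaguar", "car"} <= PAGES[x]))
    # Sigma_2: pages x such that every page y that x links to mentions "review".
    print(sigma2(PAGES, lambda x, y: y not in LINKS[x] or "review" in PAGES[y]))
```

Note that `p3`, which links to nothing, satisfies the $\Sigma_2$ query vacuously; handling such cases sensibly is one of the design questions a more expressive search engine would face.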
Abstract:
The Australian National Data Service (ANDS) was established in 2008 and aims to influence national policy on data management in the Australian research community; inform best practice for the curation of data; and transform the disparate collections of research data around Australia into a cohesive collection of research resources. One high-profile ANDS activity is populating Research Data Australia, a set of web pages describing data collections produced by or relevant to Australian researchers. It is designed to promote the visibility of research data collections in search engines in order to encourage their re-use. As part of activities associated with the Australian National Data Service, an increasing number of Australian universities are choosing to implement VIVO, not as a platform for profiling researchers, but as a 'metadata store' platform for profiling institutional research data sets, both locally and as part of a national data commons. To date, the University of Melbourne, Griffith University, the Queensland University of Technology, and the University of Western Australia have all chosen to implement VIVO, and interest from other universities is growing.
Abstract:
Performance comparisons between file signatures and inverted files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as language and probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these have not so far been linked to general-purpose, signature-file-based search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We demonstrate significant improvements in the performance of signature-file-based indexing and retrieval, performance that is comparable to that of state-of-the-art inverted-file-based systems, including language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings, and they position the file signature model in the class of vector space retrieval models.
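A signature file built on dimensionality reduction can be sketched as random indexing: each term gets a fixed pseudo-random ±1 vector, a document's term vectors are summed with term-frequency weights, and the sign of each component becomes one bit of the signature; documents are then ranked by Hamming distance to the query signature. This is a generic sketch in the spirit of the approach described, not TopSig's actual weighting, signature width or parameters; the 64-bit width and hash-seeded term vectors are assumptions.

```python
import hashlib
import random
from collections import Counter
from typing import Dict, List, Tuple

BITS = 64  # assumed signature width; practical systems use much wider signatures


def term_vector(term: str) -> List[int]:
    """Deterministic pseudo-random +1/-1 vector for a term, seeded by its hash."""
    seed = int.from_bytes(hashlib.sha1(term.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(BITS)]


def signature(text: str) -> int:
    """Sum tf-weighted term vectors, then keep one sign bit per dimension."""
    acc = [0] * BITS
    for term, tf in Counter(text.lower().split()).items():
        for i, v in enumerate(term_vector(term)):
            acc[i] += tf * v
    sig = 0
    for i, value in enumerate(acc):
        if value > 0:
            sig |= 1 << i
    return sig


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


def search(query: str, index: Dict[str, int]) -> List[Tuple[str, int]]:
    """Rank documents by Hamming distance to the query signature (smaller is closer)."""
    q = signature(query)
    scored = [(doc, hamming(q, sig)) for doc, sig in index.items()]
    return sorted(scored, key=lambda t: (t[1], t[0]))


if __name__ == "__main__":
    docs = {"d1": "signature files for text retrieval", "d2": "chronic heart failure telemonitoring"}
    index = {doc: signature(text) for doc, text in docs.items()}
    print(search("text retrieval with signature files", index))
```

Comparing fixed-width bit strings with XOR and a popcount is what makes signature files cheap to scan; the quality question the paper addresses is whether such compact codes can rank as well as inverted-file models like BM25.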
Abstract:
Background: Specialised disease management programmes for chronic heart failure (CHF) improve survival and quality of life and reduce healthcare utilisation. The overall efficacy of structured telephone support or telemonitoring as an individual component of a CHF disease management strategy remains inconclusive. Objectives: To review randomised controlled trials (RCTs) of structured telephone support or telemonitoring compared with standard practice for patients with CHF, in order to quantify the effects of these interventions over and above usual care for these patients. Search strategy: Databases (the Cochrane Central Register of Controlled Trials (CENTRAL), Database of Abstracts of Reviews of Effects (DARE) and Health Technology Assessment Database (HTA) on The Cochrane Library, MEDLINE, EMBASE, CINAHL, AMED and Science Citation Index Expanded and Conference Citation Index on ISI Web of Knowledge) and various search engines were searched from 2006 to November 2008 to update a previously published non-Cochrane review. Bibliographies of relevant studies and systematic reviews, and abstracts of conference proceedings, were handsearched. No language limits were applied. Selection criteria: Only peer-reviewed, published RCTs comparing structured telephone support or telemonitoring with usual care of CHF patients were included. Unpublished abstract data were included in sensitivity analyses. The intervention or usual care could not include a home visit or more than the usual (four to six weeks) clinic follow-up. Data collection and analysis: Data were presented as risk ratios (RR) with 95% confidence intervals (CI). Primary outcomes included all-cause mortality and all-cause and CHF-related hospitalisations, which were meta-analysed using fixed-effect models. Other outcomes included length of stay, quality of life, acceptability and cost; these were described and tabulated. Main results: Twenty-five studies and five published abstracts were included.
Of the 25 full peer-reviewed studies meta-analysed, 16 evaluated structured telephone support (5613 participants), 11 evaluated telemonitoring (2710 participants), and two tested both interventions (included in counts). Telemonitoring reduced all-cause mortality (RR 0.66, 95% CI 0.54 to 0.81, P < 0.0001), with structured telephone support demonstrating a non-significant positive effect (RR 0.88, 95% CI 0.76 to 1.01, P = 0.08). Both structured telephone support (RR 0.77, 95% CI 0.68 to 0.87, P < 0.0001) and telemonitoring (RR 0.79, 95% CI 0.67 to 0.94, P = 0.008) reduced CHF-related hospitalisations. For both interventions, several studies reported improved quality of life and reduced healthcare costs, and the interventions were acceptable to patients. Improvements in prescribing, patient knowledge and self-care, and New York Heart Association (NYHA) functional class were observed. Authors' conclusions: Structured telephone support and telemonitoring are effective in reducing the risk of all-cause mortality and CHF-related hospitalisations in patients with CHF; they improve quality of life, reduce costs, and improve evidence-based prescribing.
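The risk ratios and confidence intervals reported above follow standard formulas. For one trial with $a$ events among $n_1$ intervention patients and $c$ events among $n_2$ controls, $\mathrm{RR} = (a/n_1)/(c/n_2)$, and the 95% CI is computed on the log scale with $\mathrm{SE}(\log \mathrm{RR}) = \sqrt{1/a - 1/n_1 + 1/c - 1/n_2}$. A fixed-effect pooled estimate can then be obtained by inverse-variance weighting of the log RRs. This is a sketch only: Cochrane reviews often pool dichotomous outcomes with the Mantel-Haenszel fixed-effect method instead, inverse variance is shown here as a simpler stand-in, and the trial counts are made up.

```python
import math
from typing import List, Tuple

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_rr(events_t: int, n_t: int, events_c: int, n_c: int) -> Tuple[float, float]:
    """Log risk ratio and its standard error for one trial (assumes no zero cells)."""
    lrr = math.log((events_t / n_t) / (events_c / n_c))
    se = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return lrr, se


def to_ci(lrr: float, se: float) -> Tuple[float, float, float]:
    """Back-transform a log RR and its SE into (RR, lower 95% CI, upper 95% CI)."""
    return math.exp(lrr), math.exp(lrr - Z95 * se), math.exp(lrr + Z95 * se)


def pooled_fixed_effect(trials: List[Tuple[int, int, int, int]]) -> Tuple[float, float, float]:
    """Inverse-variance fixed-effect pooled RR with its 95% CI."""
    weights, estimates = [], []
    for trial in trials:
        lrr, se = log_rr(*trial)
        weights.append(1 / se ** 2)  # more precise trials get more weight
        estimates.append(lrr)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return to_ci(pooled, pooled_se)


if __name__ == "__main__":
    # Hypothetical trials: (deaths intervention, n intervention, deaths control, n control).
    trials = [(20, 200, 30, 200), (15, 150, 22, 150)]
    rr, lo, hi = pooled_fixed_effect(trials)
    print(f"RR {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

An RR below 1 with an upper CI limit below 1, as for telemonitoring and all-cause mortality above, indicates a statistically significant reduction; an interval crossing 1, as for structured telephone support (0.76 to 1.01), does not.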
Abstract:
Purpose: Web search engines are frequently used by people to locate information on the Internet. However, not all queries have an informational goal. Instead of information, some people may be looking for specific web sites or may wish to conduct transactions with web services. This paper aims to focus on automatically classifying the different user intents behind web queries. Design/methodology/approach: For the research reported in this paper, 130,000 web search engine queries are categorized as informational, navigational, or transactional using a k-means clustering approach based on a variety of query traits. Findings: The research findings show that more than 75 percent of web queries (clustered into eight classifications) are informational in nature, with about 12 percent each for navigational and transactional. Results also show that web queries fall into eight clusters, six primarily informational, and one each of primarily transactional and navigational. Research limitations/implications: This study makes an important contribution to the web search literature because it provides information about the goals of searchers and a method for automatically classifying the intents of user queries. Automatic classification of user intent can lead to improved web search engines by tailoring results to specific user needs. Practical implications: The paper discusses how web search engines can use automatically classified user queries to provide more targeted and relevant results in web searching by implementing a real-time classification method as presented in this research. Originality/value: This research investigates a new application of a method for automatically classifying the intent of user queries. There has been limited research to date on automatically classifying the user intent of web queries, even though the pay-off for web search engines could be substantial. © Emerald Group Publishing Limited.
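The clustering step can be sketched with plain k-means (Lloyd's algorithm) over numeric query traits. The three traits used here (term count, presence of a URL-like token, presence of a transactional cue word) are hypothetical stand-ins for the paper's trait set, and the example uses far fewer queries and clusters than the eight reported; the seeded random initialization is also an assumption.

```python
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]

TRANSACTIONAL_WORDS = {"download", "buy", "order"}  # hypothetical cue words


def traits(query: str) -> Vector:
    """Hypothetical numeric traits: term count, URL-like token, transactional cue."""
    terms = query.lower().split()
    has_url = any(t.startswith("www") or t.endswith(".com") for t in terms)
    has_cue = any(t in TRANSACTIONAL_WORDS for t in terms)
    return (float(len(terms)), float(has_url), float(has_cue))


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two trait vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points: List[Vector], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's algorithm: assign points to the nearest centroid, recompute centroids, repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = mean(members)
    return labels


if __name__ == "__main__":
    queries = ["www.ebay.com", "www.cnn.com", "download free music",
               "buy cheap flights online", "history of the roman empire", "how do vaccines work"]
    print(kmeans([traits(q) for q in queries], k=3))
```

Each resulting cluster would still need to be inspected and mapped to a dominant intent, which is how a result such as "six primarily informational clusters, and one each of primarily transactional and navigational" can arise.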