896 resultados para Web, Search Engine, Overlap
Resumo:
Document clustering is one of the prominent methods for mining important information from the vast amount of data available on the web. However, document clustering generally suffers from the curse of dimensionality. Providentially in high dimensional space, data points tend to be more concentrated in some areas of clusters. We take advantage of this phenomenon by introducing a novel concept of dynamic cluster representation named as loci. Clusters’ loci are efficiently calculated using documents’ ranking scores generated from a search engine. We propose a fast loci-based semi-supervised document clustering algorithm that uses clusters’ loci instead of conventional centroids for assigning documents to clusters. Empirical analysis on real-world datasets shows that the proposed method produces cluster solutions with promising quality and is substantially faster than several benchmarked centroid-based semi-supervised document clustering methods.
Resumo:
This paper investigates the effect that text pre-processing approaches have on the estimation of the readability of web pages. Readability has been highlighted as an important aspect of web search result personalisation in previous work. The most widely used text readability measures rely on surface level characteristics of text, such as the length of words and sentences. We demonstrate that different tools for extracting text from web pages lead to very different estimations of readability. This has an important implication for search engines because search result personalisation strategies that consider users reading ability may fail if incorrect text readability estimations are computed.
Resumo:
For many, particularly in the Anglophone world and Western Europe, it may be obvious that Google has a monopoly over online search and advertising and that this is an undesirable state of affairs, due to Google's ability to mediate information flows online. The baffling question may be why governments and regulators are doing little to nothing about this situation, given the increasingly pivotal importance of the internet and free flowing communications in our lives. However, the law concerning monopolies, namely antitrust or competition law, works in what may be seen as a less intuitive way by the general public. Monopolies themselves are not illegal. Conduct that is unlawful, i.e. abuses of that market power, is defined by a complex set of rules and revolves principally around economic harm suffered due to anticompetitive behavior. However the effect of information monopolies over search, such as Google’s, is more than just economic, yet competition law does not address this. Furthermore, Google’s collection and analysis of user data and its portfolio of related services make it difficult for others to compete. Such a situation may also explain why Google’s established search rivals, Bing and Yahoo, have not managed to provide services that are as effective or popular as Google’s own (on this issue see also the texts by Dirk Lewandowski and Astrid Mager in this reader). Users, however, are not entirely powerless. Google's business model rests, at least partially, on them – especially the data collected about them. If they stop using Google, then Google is nothing.
Resumo:
The legality of the operation of Google’s search engine, and its liability as an Internet intermediary, has been tested in various jurisdictions on various grounds. In Australia, there was an ultimately unsuccessful case against Google under the Australian Consumer Law relating to how it presents results from its search engine. Despite this failed claim, several complex issues were not adequately addressed in the case including whether Google sufficiently distinguishes between the different parts of its search results page, so as not to mislead or deceive consumers. This article seeks to address this question of consumer confusion by drawing on empirical survey evidence of Australian consumers’ understanding of Google’s search results layout. This evidence, the first of its kind in Australia, indicates some level of consumer confusion. The implications for future legal proceedings in against Google in Australia and in other jurisdictions are discussed.
Resumo:
Fragment Finder 2.0 is a web-based interactive computing server which can be used to retrieve structurally similar protein fragments from 25 and 90% nonredundant data sets. The computing server identifies structurally similar fragments using the protein backbone C alpha angles. In addition, the identified fragments can be superimposed using either of the two structural superposition programs, STAMP and PROFIT, provided in the server. The freely available Java plug-in Jmol has been interfaced with the server for the visualization of the query and superposed fragments. The server is the updated version of a previously developed search engine and employs an in-house-developed fast pattern matching algorithm. This server can be accessed freely over the World Wide Web through the URL http://cluster.physics.iisc.ernet.in/ff/.
Resumo:
In pay-per-click sponsored search auctions which are currently extensively used by search engines, the auction for a keyword involves a certain number of advertisers (say k) competing for available slots (say m) to display their advertisements (ads for short). A sponsored search auction for a keyword is typically conducted for a number of rounds (say T). There are click probabilities mu(ij) associated with each agent slot pair (agent i and slot j). The search engine would like to maximize the social welfare of the advertisers, that is, the sum of values of the advertisers for the keyword. However, the search engine does not know the true values advertisers have for a click to their respective advertisements and also does not know the click probabilities. A key problem for the search engine therefore is to learn these click probabilities during the initial rounds of the auction and also to ensure that the auction mechanism is truthful. Mechanisms for addressing such learning and incentives issues have recently been introduced. These mechanisms, due to their connection to the multi-armed bandit problem, are aptly referred to as multi-armed bandit (MAB) mechanisms. When m = 1, exact characterizations for truthful MAB mechanisms are available in the literature. Recent work has focused on the more realistic but non-trivial general case when m > 1 and a few promising results have started appearing. In this article, we consider this general case when m > 1 and prove several interesting results. Our contributions include: (1) When, mu(ij)s are unconstrained, we prove that any truthful mechanism must satisfy strong pointwise monotonicity and show that the regret will be Theta T7) for such mechanisms. (2) When the clicks on the ads follow a certain click precedence property, we show that weak pointwise monotonicity is necessary for MAB mechanisms to be truthful. (3) If the search engine has a certain coarse pre-estimate of mu(ij) values and wishes to update them during the course of the T rounds, we show that weak pointwise monotonicity and type-I separatedness are necessary while weak pointwise monotonicity and type-II separatedness are sufficient conditions for the MAB mechanisms to be truthful. (4) If the click probabilities are separable into agent-specific and slot-specific terms, we provide a characterization of MAB mechanisms that are truthful in expectation.
Resumo:
Query suggestion is an important feature of the search engine with the explosive and diverse growth of web contents. Different kind of suggestions like query, image, movies, music and book etc. are used every day. Various types of data sources are used for the suggestions. If we model the data into various kinds of graphs then we can build a general method for any suggestions. In this paper, we have proposed a general method for query suggestion by combining two graphs: (1) query click graph which captures the relationship between queries frequently clicked on common URLs and (2) query text similarity graph which finds the similarity between two queries using Jaccard similarity. The proposed method provides literally as well as semantically relevant queries for users' need. Simulation results show that the proposed algorithm outperforms heat diffusion method by providing more number of relevant queries. It can be used for recommendation tasks like query, image, and product suggestion.
Resumo:
This paper investigates how people return to information in a dynamic information environment. For example, a person might want to return to Web content via a link encountered earlier on a Web page, only to learn that the link has since been removed. Changes can benefit users by providing new information, but they hinder returning to previously viewed information. The observational study presented here analyzed instances, collected via a Web search, where people expressed difficulty re-finding information because of changes to the information or its environment. A number of interesting observations arose from this analysis, including that the path originally taken to get to the information target appeared important in its re-retrieval, whereas, surprisingly, the temporal aspects of when the information was seen before were not. While people expressed frustration when problems arose, an explanation of why the change had occurred was often sufficient to allay that frustration, even in the absence of a solution. The implications of these observations for systems that support re-finding in dynamic environments are discussed.
Resumo:
Urquhart, C., Lonsdale, R.,Thomas, R., Spink, S., Yeoman, A., Armstrong, C. & Fenton, R. (2003). Uptake and use of electronic information services: trends in UK higher education from the JUSTEIS project. Program, 37(3), 167-180. Sponsorship: JISC
Resumo:
ImageRover is a search by image content navigation tool for the world wide web. To gather images expediently, the image collection subsystem utilizes a distributed fleet of WWW robots running on different computers. The image robots gather information about the images they find, computing the appropriate image decompositions and indices, and store this extracted information in vector form for searches based on image content. At search time, users can iteratively guide the search through the selection of relevant examples. Search performance is made efficient through the use of an approximate, optimized k-d tree algorithm. The system employs a novel relevance feedback algorithm that selects the distance metrics appropriate for a particular query.
Resumo:
Se analizan y describen las principales líneas de trabajo de la Web Semántica en el ámbito de los archivos de televisión. Para ello, se analiza y contextualiza la web semántica desde una perspectiva general para posteriormente analizar las principales iniciativas que trabajan con lo audiovisual: Proyecto MuNCH, Proyecto S5T, Semantic Television y VideoActive.
Resumo:
E-poltergeist takes over the user’s internet browser, automatically initiating Web searches without their permission. Web-based artwork which explores issues of user control when confronted with complex technological systems, questioning the limits of digital interactive arts as consensual reciprocal systems. e-poltergeist was a major web commission that marked an early stage of research in a larger enquiry by Craighead and Thomson into the relationship between live virtual data, global communications networks and instruction-based art, exploring how such systems can be re-contextualised within gallery environments. e-poltergeist presented the 'viewer' with a singular narrative by using live internet search-engine data that aimed to create a perpetual and virtually unstoppable cycle of search engine results, banner ads and moving windows as an interruption into the normal use of an internet browser. The work also addressed the ‘de-personalisation’ of internet use by sending a series of messages from the live search engine data that seemed to address the user directly: 'Is anyone there?'; 'Can anyone hear me?', 'Please help me!'; 'Nobody cares!' e-poltergeist makes a significant contribution to the taxonomy of new media art by dealing with the way that new media art can re-address notions of existing traditions in art such as appropriation and manipulation, instruction-based art and conceptual art. e-poltergeist was commissioned ($12,000) for 010101: Art in Technological Times, a landmark international exhibition presented by the San Francisco Museum of Modern Art, which bought together leading international practitioners working with emergent technologies, including Tatsuo Miyajima, Janet Cardiff, Brian Eno. Peer recognition of the project in the form of reviews include: Curating New Media. Gateshead: Baltic Centre for Contemporary Art. Cook, Sarah, Beryl Graham and Sarah Martin ISBN: 1093655064; The Wire; http://www.wired.com/culture/lifestyle/news/2000/12/40464 (review by Reena Jana); Leonardo (review Barbara Lee Williams and Sonya Rapoport) http://www.leonardo.info/reviews/feb2001/ex_010101_willrapop.html All the work is developed jointly and equally between Craighead and her collaborator, Jon Thomson, Slade School of Fine Art.
Resumo:
Grande parte do tráfego online tem origem em páginas de resultados de motores de de pesquisa. Estes constituem hoje uma ferramenta fundamental de que os turistas se socorrem para pesquisar e filtrar a informação necessária ao planeamento das suas viagens, sendo, por isso, bastante tidos em conta pelas entidades ligadas ao turismo no momento da definição das suas estratégias de marketing. No presente documento é descrita a investigação feita em torno do modo de funcionamento do motor de pesquisa Google e das métricas que utiliza para avaliação de websites e páginas web. Desta investigação resultou a implementação de um website de conteúdos afetos ao mercado de turismo e viagens em Portugal, focado no mercado do turismo externo – All About Portugal. A implementação do website pretende provar, sustentando-se em orientações da área do SEO, que a propagação de conteúdos baseada unicamente nos motores de pesquisa é viável, confirmando, deste modo, a sua importância. Os dados de utilização desse mesmo website introduzem novos elementos que poderão servir de base a novos estudos.
Resumo:
Introduction: Coordination through CVHL/BVCS gives Canadian health libraries access to information technology they could not offer individually, thereby enhancing the library services offered to Canadian health professionals. An example is the portal being developed. Portal best practices are of increasing interest (usability.gov; Wikipedia portals; JISC subject portal project; Stanford clinical portals) but conclusive research is not yet available. This paper will identify best practices for a portal bringing together knowledge for Canadian health professionals supported through a network of libraries. Description: The portal for Canadian health professionals will include capabilities such as: • Authentication • Question referral • Specialist “branch libraries” • Integration of commercial resources, web resources and health systems data • Cross-resource search engine • Infrastructure to enable links from EHR and decision support systems • Knowledge translation tools, such as highlighting of best evidence Best practices will be determined by studying the capabilities of existing portals, including consortia/networks and individual institutions, and through a literature review. Outcomes: Best practices in portals will be reviewed. The collaboratively developed Virtual Library, currently the heart of cvhl.ca, is a unique database collecting high quality, free web documents and sites relevant to Canadian health care. The evident strengths of the Virtual Library will be discussed in light of best practices. Discussion: Identification of best practices will support cost-benefit analysis of options and provide direction for CVHL/BVCS. Open discussion with stakeholders (libraries and professionals) informed by this review will lead to adoption of the best technical solutions supporting Canadian health libraries and their users.
Resumo:
Le web et les images qui y foisonnent font désormais partie de notre quotidien et ils façonnent notre manière de penser le monde. Certaines œuvres d’art permettent, semble-t-il, de réfléchir à la fois sur l’image, les technologies web, les relations qu’elles entretiennent et les enjeux sociopolitiques qui les sous-tendent. C’est dans cette perspective que ce mémoire s’intéresse aux travaux de la série des Googlegrams (2004-2006) de Joan Fontcuberta, particulièrement à deux œuvres qui reprennent les photographies de torture de la prison d’Abu Ghraib devenues iconiques. Ce sont des photomosaïques utilisant ces images comme matrices dans lesquelles viennent s’insérer des milliers de petites images qui ont été trouvées dans le web grâce au moteur de recherche d’images de Google, selon certains mots-clés choisis par l’artiste de façon à faire écho à ces photographies-matrices. Ces œuvres sont ici considérées en tant qu’outils d’études actifs nous permettant de déployer les assemblages d’images et de technologies qu’elles font interagir. Il s’agit de suivre les acteurs et les réseaux qui se superposent et s’entremêlent dans les Googlegrams : d’abord les photographies d’Abu Ghraib et leur iconisation ; ensuite le moteur de recherche et sa relation aux images ; finalement les effets de la photomosaïque. Cette étude s’effectue donc à partir des interactions entre ces différents éléments qui constituent les œuvres afin de réfléchir sur leurs rôles dans le façonnement de la représentation de l’information.