940 resultados para 080505 Web Technologies (excl. Web Search)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The rapid growth of visual information on Web has led to immense interest in multimedia information retrieval (MIR). While advancement in MIR systems has achieved some success in specific domains, particularly the content-based approaches, general Web users still struggle to find the images they want. Despite the success in content-based object recognition or concept extraction, the major problem in current Web image searching remains in the querying process. Since most online users only express their needs in semantic terms or objects, systems that utilize visual features (e.g., color or texture) to search images create a semantic gap which hinders general users from fully expressing their needs. In addition, query-by-example (QBE) retrieval imposes extra obstacles for exploratory search because users may not always have the representative image at hand or in mind when starting a search (i.e. the page zero problem). As a result, the majority of current online image search engines (e.g., Google, Yahoo, and Flickr) still primarily use textual queries to search. The problem with query-based retrieval systems is that they only capture users’ information need in terms of formal queries;; the implicit and abstract parts of users’ information needs are inevitably overlooked. Hence, users often struggle to formulate queries that best represent their needs, and some compromises have to be made. Studies of Web search logs suggest that multimedia searches are more difficult than textual Web searches, and Web image searching is the most difficult compared to video or audio searches. Hence, online users need to put in more effort when searching multimedia contents, especially for image searches. Most interactions in Web image searching occur during query reformulation. While log analysis provides intriguing views on how the majority of users search, their search needs or motivations are ultimately neglected. User studies on image searching have attempted to understand users’ search contexts in terms of users’ background (e.g., knowledge, profession, motivation for search and task types) and the search outcomes (e.g., use of retrieved images, search performance). However, these studies typically focused on particular domains with a selective group of professional users. General users’ Web image searching contexts and behaviors are little understood although they represent the majority of online image searching activities nowadays. We argue that only by understanding Web image users’ contexts can the current Web search engines further improve their usefulness and provide more efficient searches. In order to understand users’ search contexts, a user study was conducted based on university students’ Web image searching in News, Travel, and commercial Product domains. The three search domains were deliberately chosen to reflect image users’ interests in people, time, event, location, and objects. We investigated participants’ Web image searching behavior, with the focus on query reformulation and search strategies. Participants’ search contexts such as their search background, motivation for search, and search outcomes were gathered by questionnaires. The searching activity was recorded with participants’ think aloud data for analyzing significant search patterns. The relationships between participants’ search contexts and corresponding search strategies were discovered by Grounded Theory approach. Our key findings include the following aspects: - Effects of users' interactive intents on query reformulation patterns and search strategies - Effects of task domain on task specificity and task difficulty, as well as on some specific searching behaviors - Effects of searching experience on result expansion strategies A contextual image searching model was constructed based on these findings. The model helped us understand Web image searching from user perspective, and introduced a context-aware searching paradigm for current retrieval systems. A query recommendation tool was also developed to demonstrate how users’ query reformulation contexts can potentially contribute to more efficient searching.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. Towards this, Webput utilizes the available information in an incomplete database in conjunction with the data consistency principle. Moreover, WebPut extends effective Information Extraction (IE) methods for the purpose of formulating web search queries that are capable of effectively retrieving missing values with high accuracy. WebPut employs a confidence-based scheme that efficiently leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. A greedy iterative algorithm is proposed to schedule the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, for improving the accuracy and efficiency of WebPut. Moreover, several optimization techniques are also proposed to reduce the cost of estimating the confidence of imputation queries at both the tuple-level and the database-level. Experiments based on several real-world data collections demonstrate not only the effectiveness of WebPut compared to existing approaches, but also the efficiency of our proposed algorithms and optimization techniques.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In a pilot application based on web search engine calledWeb-based Relation Completion (WebRC), we propose to join two columns of entities linked by a predefined relation by mining knowledge from the web through a web search engine. To achieve this, a novel retrieval task Relation Query Expansion (RelQE) is modelled: given an entity (query), the task is to retrieve documents containing entities in predefined relation to the given one. Solving this problem entails expanding the query before submitting it to a web search engine to ensure that mostly documents containing the linked entity are returned in the top K search results. In this paper, we propose a novel Learning-based Relevance Feedback (LRF) approach to solve this retrieval task. Expansion terms are learned from training pairs of entities linked by the predefined relation and applied to new entity-queries to find entities linked by the same relation. After describing the approach, we present experimental results on real-world web data collections, which show that the LRF approach always improves the precision of top-ranked search results to up to 8.6 times the baseline. Using LRF, WebRC also shows performances way above the baseline.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine grained clustering has not been previously demonstrated. Previous approaches clustered a sample that limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. Fine grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show an improved cluster quality when assessed with two novel evaluations using ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and unfeasible.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Information available on company websites can help people navigate to the offices of groups and individuals within the company. Automatically retrieving this within-organisation spatial information is a challenging AI problem This paper introduces a novel unsupervised pattern-based method to extract within-organisation spatial information by taking advantage of HTML structure patterns, together with a novel Conditional Random Fields (CRF) based method to identify different categories of within-organisation spatial information. The results show that the proposed method can achieve a high performance in terms of F-Score, indicating that this purely syntactic method based on web search and an analysis of HTML structure is well-suited for retrieving within-organisation spatial information.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper investigates the effect that text pre-processing approaches have on the estimation of the readability of web pages. Readability has been highlighted as an important aspect of web search result personalisation in previous work. The most widely used text readability measures rely on surface level characteristics of text, such as the length of words and sentences. We demonstrate that different tools for extracting text from web pages lead to very different estimations of readability. This has an important implication for search engines because search result personalisation strategies that consider users reading ability may fail if incorrect text readability estimations are computed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper investigates how people return to information in a dynamic information environment. For example, a person might want to return to Web content via a link encountered earlier on a Web page, only to learn that the link has since been removed. Changes can benefit users by providing new information, but they hinder returning to previously viewed information. The observational study presented here analyzed instances, collected via a Web search, where people expressed difficulty re-finding information because of changes to the information or its environment. A number of interesting observations arose from this analysis, including that the path originally taken to get to the information target appeared important in its re-retrieval, whereas, surprisingly, the temporal aspects of when the information was seen before were not. While people expressed frustration when problems arose, an explanation of why the change had occurred was often sufficient to allay that frustration, even in the absence of a solution. The implications of these observations for systems that support re-finding in dynamic environments are discussed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

ImageRover is a search by image content navigation tool for the world wide web. To gather images expediently, the image collection subsystem utilizes a distributed fleet of WWW robots running on different computers. The image robots gather information about the images they find, computing the appropriate image decompositions and indices, and store this extracted information in vector form for searches based on image content. At search time, users can iteratively guide the search through the selection of relevant examples. Search performance is made efficient through the use of an approximate, optimized k-d tree algorithm. The system employs a novel relevance feedback algorithm that selects the distance metrics appropriate for a particular query.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Search engines - such as Google - have been characterized as "Databases of intentions". This class will focus on different aspects of intentionality on the web, including goal mining, goal modeling and goal-oriented search. Readings: M. Strohmaier, M. Lux, M. Granitzer, P. Scheir, S. Liaskos, E. Yu, How Do Users Express Goals on the Web? - An Exploration of Intentional Structures in Web Search, We Know'07 International Workshop on Collaborative Knowledge Management for Web Information Systems in conjunction with WISE'07, Nancy, France, 2007. [Web link] Readings: Automatic identification of user goals in web search, U. Lee and Z. Liu and J. Cho WWW '05: Proceedings of the 14th International World Wide Web Conference 391--400 (2005) [Web link]

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Tenint en compte l’evolució a Internet dels portals d’informació dels mitjans de comunicació, sorgeix la idea d’un motor de cerca orientat a la recaptació de notícies dispersades per les diferents pàgines web dels grans mitjans de comunicació espanyols, que permetés obtenir informació sobre “descriptors contractats” pels usuaris d’un portal. El primer objectiu és l’anàlisi de les necessitats que es volen cobrir per a un hipotètic client de l’aplicació, el segon és en l’àmbit algorítmic, cal obtenir una metodologia de treball que permeti l’obtenció de la notícia. En l’àmbit de la programació es consideren tres etapes: descarregar les pàgines web necessàries, que es farà mitjançant les eines que proporciona la llibreria cUrl; l’anàlisi de les notícies (obtenir tots els enllaços que corresponen a notícies, filtrar els descriptors per decidir si cal guardar la notícia, analitzar l’estructura interna de les notícies seleccionades per guardar-ne només les parts establertes), i la base de dades que ens ha de permetre organitzar i gestionar les notícies escollides

Relevância:

100.00% 100.00%

Publicador:

Resumo:

El projecte iSAC (Servei Intel·ligent d’Atenció Ciutadana via web) es va iniciar el mes de gener de 2006 amb l’ajut del nou coneixement científic en agents intel·ligents, junt amb l’aplicació de les Tecnologies de la Informació i la Comunicació (TIC) i els cercadors. Actualment, el servei actual d’atenció al ciutadà està composat per dues àrees: l’atenció directa a les oficines i l’atenció telefònica a través del Call Center. Les limitacions de personal i horari d’atenció fan que aquest servei perdi eficàcia. Es vol desenvolupar un producte amb una tecnologia capaç d’ampliar i millorar la capacitat i la qualitat de l’atenció ciutadana en les administracions públiques, sigui quina sigui la seva dimensió. Tot i això, aquest projecte l’explotaran especialment els ajuntaments, als quals la ciutadania s'acosta amb tot tipus de preguntes i dubtes, habitualment no restringides a l'àmbit local. Més concretament, es vol automatitzar a través d’un portal web l’atenció al ciutadà per tal d’obtenir un servei més efectiu

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents two hyperlink analysis-based algorithms to find relevant pages for a given Web page (URL). The first algorithm comes from the extended cocitation analysis of the Web pages. It is intuitive and easy to implement. The second one takes advantage of linear algebra theories to reveal deeper relationships among the Web pages and to identify relevant pages more precisely and effectively. The experimental results show the feasibility and effectiveness of the algorithms. These algorithms could be used for various Web applications, such as enhancing Web search. The ideas and techniques in this work would be helpful to other Web-related researches.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

To efficiently and yet accurately cluster Web documents is of great interests to Web users and is a key component of the searching accuracy of a Web search engine. To achieve this, this paper introduces a new approach for the clustering of Web documents, which is called maximal frequent itemset (MFI) approach. Iterative clustering algorithms, such as K-means and expectation-maximization (EM), are sensitive to their initial conditions. MFI approach firstly locates the center points of high density clusters precisely. These center points then are used as initial points for the K-means algorithm. Our experimental results tested on 3 Web document sets show that our MFI approach outperforms the other methods we compared in most cases, particularly in the case of large number of categories in Web document sets.