Major Web search engines, such as AltaVista, are essential tools in the quest to locate online information. This article reports research that used transaction log analysis to examine the characteristics and changes in AltaVista Web searching that occurred from 1998 to 2002. The research questions we examined are (1) What are the changes in AltaVista Web searching from 1998 to 2002? (2) What are the current characteristics of AltaVista searching, including the duration and frequency of search sessions? (3) What changes in the information needs of AltaVista users occurred between 1998 and 2002? The results of our research show (1) a move toward more interactivity, with increases in session and query length, (2) an increasing frequency of interaction that nevertheless happens very quickly, with 70% of session durations at 5 minutes or less, and (3) a broadening range of Web searchers' information needs, with the most frequent terms accounting for less than 1% of total term usage. We discuss the implications of these findings for the development of Web search engines. © 2005 Wiley Periodicals, Inc.
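The kind of transaction log analysis described in this abstract can be illustrated with a minimal sketch. The record layout, the sample records, and the choice to treat all queries from one user identifier as a single session are assumptions made for illustration; they are not the AltaVista data or the authors' procedure.

```python
from collections import defaultdict

# Hypothetical log records: (user_id, timestamp_in_seconds, query_text).
LOG = [
    ("u1", 0, "altavista"),
    ("u1", 120, "altavista search tips"),
    ("u2", 30, "weather"),
    ("u3", 10, "cheap flights"),
    ("u3", 400, "cheap flights london"),
    ("u3", 700, "cheap flights london paris"),
]


def session_stats(log):
    """Group records by user and compute per-session duration and query statistics."""
    sessions = defaultdict(list)
    for user, ts, query in log:
        sessions[user].append((ts, query))

    stats = {}
    for user, events in sessions.items():
        events.sort()  # order each session's queries by timestamp
        times = [ts for ts, _ in events]
        terms = [len(query.split()) for _, query in events]
        stats[user] = {
            "queries": len(events),                # session length in queries
            "duration_s": times[-1] - times[0],    # first to last query
            "mean_terms": sum(terms) / len(terms), # query length in terms
        }
    return stats


def share_within(stats, limit_s=300):
    """Fraction of sessions whose duration is at most limit_s seconds."""
    durations = [s["duration_s"] for s in stats.values()]
    return sum(d <= limit_s for d in durations) / len(durations)


stats = session_stats(LOG)
print(stats["u3"])
print(share_within(stats))  # share of sessions lasting 5 minutes or less
```

A real log would also need a rule for splitting one user's activity into separate sessions, for example by an inactivity timeout; that step is omitted here.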


Entity-oriented search has become an essential component of modern search engines. It focuses on retrieving a list of entities or information about the specific entities instead of documents. In this paper, we study the problem of finding entity related information, referred to as attribute-value pairs, that play a significant role in searching target entities. We propose a novel decomposition framework combining reduced relations and the discriminative model, Conditional Random Field (CRF), for automatically finding entity-related attribute-value pairs from free text documents. This decomposition framework allows us to locate potential text fragments and identify the hidden semantics, in the form of attribute-value pairs for user queries. Empirical analysis shows that the decomposition framework outperforms pattern-based approaches due to its capability of effective integration of syntactic and semantic features.


The rapid growth of visual information on the Web has led to immense interest in multimedia information retrieval (MIR). While advancement in MIR systems has achieved some success in specific domains, particularly the content-based approaches, general Web users still struggle to find the images they want. Despite the success in content-based object recognition or concept extraction, the major problem in current Web image searching remains in the querying process. Since most online users only express their needs in semantic terms or objects, systems that utilize visual features (e.g., color or texture) to search images create a semantic gap which hinders general users from fully expressing their needs. In addition, query-by-example (QBE) retrieval imposes extra obstacles for exploratory search because users may not always have the representative image at hand or in mind when starting a search (i.e., the page zero problem). As a result, the majority of current online image search engines (e.g., Google, Yahoo, and Flickr) still primarily use textual queries to search. The problem with query-based retrieval systems is that they only capture users’ information needs in terms of formal queries; the implicit and abstract parts of users’ information needs are inevitably overlooked. Hence, users often struggle to formulate queries that best represent their needs, and some compromises have to be made. Studies of Web search logs suggest that multimedia searches are more difficult than textual Web searches, and that Web image searching is more difficult than video or audio searching. Hence, online users need to put in more effort when searching multimedia contents, especially for image searches. Most interactions in Web image searching occur during query reformulation. While log analysis provides intriguing views on how the majority of users search, their search needs and motivations are ultimately neglected.
User studies on image searching have attempted to understand users’ search contexts in terms of users’ background (e.g., knowledge, profession, motivation for search and task types) and the search outcomes (e.g., use of retrieved images, search performance). However, these studies typically focused on particular domains with a selective group of professional users. General users’ Web image searching contexts and behaviors are little understood although they represent the majority of online image searching activities nowadays. We argue that only by understanding Web image users’ contexts can the current Web search engines further improve their usefulness and provide more efficient searches. In order to understand users’ search contexts, a user study was conducted based on university students’ Web image searching in News, Travel, and commercial Product domains. The three search domains were deliberately chosen to reflect image users’ interests in people, time, event, location, and objects. We investigated participants’ Web image searching behavior, with the focus on query reformulation and search strategies. Participants’ search contexts such as their search background, motivation for search, and search outcomes were gathered by questionnaires. The searching activity was recorded together with participants’ think-aloud data for analyzing significant search patterns. The relationships between participants’ search contexts and corresponding search strategies were identified using a Grounded Theory approach. Our key findings include the following aspects:
- Effects of users' interactive intents on query reformulation patterns and search strategies
- Effects of task domain on task specificity and task difficulty, as well as on some specific searching behaviors
- Effects of searching experience on result expansion strategies
A contextual image searching model was constructed based on these findings.
The model helped us understand Web image searching from the user perspective, and introduced a context-aware searching paradigm for current retrieval systems. A query recommendation tool was also developed to demonstrate how users’ query reformulation contexts can potentially contribute to more efficient searching.


With the rapid growth of information on the Web, the study of information searching has attracted increased interest. Information behaviour (IB) researchers and information systems (IS) developers are continuously exploring user-Web search interactions to understand users and to assist them with their information searching. In attempting to develop models of IB, several studies have identified various factors that govern users' information searching and information retrieval (IR), such as age, gender, prior knowledge and task complexity. However, how users' contextual factors, such as cognitive styles, affect Web search interactions has not been clearly explained by the current models of Web searching and IR. This study explores the influence of users' cognitive styles on their Web search behaviour. The main goal of the study is to enhance Web search models with a better understanding of how these cognitive styles affect Web searching. Modelling Web search behaviour with a greater understanding of users' cognitive styles can help information science researchers and IS designers to bridge the semantic gap between the user and the IS. To achieve the aims of the study, a user study with 50 participants was conducted. The study adopted a mixed method approach incorporating several data collection strategies to gather a range of qualitative and quantitative data. The study utilised pre-search and post-search questionnaires to collect the participants' demographic information and their level of satisfaction with the search interactions. Riding's (1991) Cognitive Style Analysis (CSA) test was used to assess the participants' cognitive styles. Participants completed three predesigned search tasks, and the entire user-Web search interaction, including think-aloud data, was captured using a monitoring program.
Data analysis involved several qualitative and quantitative techniques: the quantitative data gave rise to detailed findings about users' Web searching and cognitive styles, and the qualitative data enriched the findings with illustrative examples. The study results provide valuable insights into Web searching behaviour among users with different cognitive styles. The findings of the study extend our understanding of Web search behaviour and how users search for information on the Web. Three key study findings emerged:
• Users' Web search behaviour was demonstrated through information searching strategies, Web navigation styles, query reformulation behaviour and information processing approaches while performing Web searches. The manner in which these Web search patterns were demonstrated varied among users in different cognitive style groups.
• Users' cognitive styles influenced their information searching strategies, query reformulation behaviour, Web navigational styles and information processing approaches. Users with particular cognitive styles followed certain Web search patterns.
• Fundamental relationships were evident between users' cognitive styles and their Web search behaviours, and these relationships can be illustrated through modelling Web search behaviour.
Two models that depict the associations between Web search interactions, user characteristics and users' cognitive styles were developed. These models provide a greater understanding of Web search behaviour from the user perspective, particularly how users' cognitive styles influence their Web search behaviour. The significance of this research is twofold: it will provide insights for information science researchers, information system designers, academics, educators, trainers and librarians who want to better understand how users with different cognitive styles perform information searching on the Web; at the same time, it will provide assistance and support to the users.
The major outcomes of this study are 1) a comprehensive analysis of how users search the Web; 2) extensive discussion on the implications of the models developed in this study for future work; and 3) a theoretical framework to bridge high-level search models and cognitive models.


Previous studies have shown that users’ cognitive styles play an important role during Web searching. However, only a few studies have shown the relationship between cognitive styles and Web search behavior. Most importantly, it is not clear which components of Web search behavior are influenced by cognitive styles. This paper examines the relationships between users’ cognitive styles and their Web searching and develops a model that portrays the relationship. The study uses qualitative and quantitative analyses to inform the study results based on data gathered from 50 participants. A questionnaire was utilised to collect participants’ demographic information, and Riding’s (1991) Cognitive Style Analysis (CSA) test was used to assess their cognitive styles. Results show that users’ cognitive styles influenced their information searching strategies, query reformulation behaviour, Web navigational styles and information processing approaches. The user model developed in this study depicts the fundamental relationships between users’ Web search behavior and their cognitive styles. Modeling Web search behavior with a greater understanding of users’ cognitive styles can help information science researchers and information systems designers to bridge the semantic gap between the user and the systems. Implications of the research for theory and practice, and future work are discussed.


Images represent a valuable source of information for the construction industry. Due to technological advancements in digital imaging, the increasing use of digital cameras is leading to an ever-increasing volume of images being stored in construction image databases, which makes it hard for engineers to retrieve useful information from them. Content-Based Search Engines are tools that utilize the rich image content and apply pattern recognition methods in order to retrieve similar images. In this paper, we illustrate several project management tasks and show how Content-Based Search Engines can facilitate automatic retrieval and indexing of construction images in image databases.


ImageRover is a search-by-image-content navigation tool for the World Wide Web. To gather images expediently, the image collection subsystem utilizes a distributed fleet of WWW robots running on different computers. The image robots gather information about the images they find, compute the appropriate image decompositions and indices, and store this extracted information in vector form for searches based on image content. At search time, users can iteratively guide the search through the selection of relevant examples. Search performance is made efficient through the use of an approximate, optimized k-d tree algorithm. The system employs a novel relevance feedback algorithm that selects the distance metrics appropriate for a particular query.
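To illustrate how a k-d tree speeds up search over image feature vectors, here is a minimal sketch. It performs an exact nearest-neighbour search with Euclidean distance, not the approximate, optimized variant the abstract mentions, and the two-dimensional feature vectors are hypothetical; it should not be read as ImageRover's implementation.

```python
import math


def build_kdtree(points, depth=0):
    """Recursively build a k-d tree as nested dicts; points are equal-length tuples."""
    if not points:
        return None
    axis = depth % len(points[0])  # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2         # the median point splits this subtree
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return (distance, point) for the stored vector closest to query."""
    if node is None:
        return best
    dist = math.dist(node["point"], query)
    if best is None or dist < best[0]:
        best = (dist, node["point"])
    axis = node["axis"]
    diff = query[axis] - node["point"][axis]
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit so far.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


# Hypothetical 2-D feature vectors standing in for image indices.
FEATURES = [(0.1, 0.2), (0.9, 0.8), (0.4, 0.4), (0.5, 0.9), (0.8, 0.1)]
tree = build_kdtree(FEATURES)
print(nearest(tree, (0.45, 0.5)))
```

The pruning test on the splitting plane is what lets the search skip whole subtrees; an approximate variant would typically relax that test to trade accuracy for speed.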


We consider the problem of linking web search queries to entities from a knowledge base such as Wikipedia. Such linking enables converting a user’s web search session to a footprint in the knowledge base that could be used to enrich the user profile. Traditional methods for entity linking have been directed towards finding entity mentions in text documents such as news reports, each of which is possibly linked to multiple entities, enabling the usage of measures like entity set coherence. Since web search queries are very small text fragments, criteria that rely on the existence of a multitude of mentions do not work well on them. We propose a three-phase method for linking web search queries to Wikipedia entities. The first phase does IR-style scoring of entities against the search query to narrow down to a subset of entities, which the second phase expands to a larger set using hyperlink information. Lastly, we use a graph traversal approach to identify the top entities to link the query to. Through an empirical evaluation on real-world web search queries, we illustrate that our methods significantly enhance the linking accuracy over state-of-the-art methods.
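The three phases described in this abstract can be sketched on a toy knowledge base. The abstract does not specify the scoring function or the traversal, so the term-overlap score and the random-walk-with-restart ranking below are stand-ins chosen for illustration, and the entities, texts, and links are hypothetical.

```python
# Hypothetical toy knowledge base: entity -> (description text, outgoing links).
KB = {
    "Jaguar_(animal)": ("jaguar big cat animal americas", ["Big_cat", "Amazon_rainforest"]),
    "Jaguar_Cars": ("jaguar british car manufacturer", ["Car", "Land_Rover"]),
    "Big_cat": ("big cat lion tiger jaguar", ["Jaguar_(animal)"]),
    "Amazon_rainforest": ("amazon rainforest habitat", ["Jaguar_(animal)"]),
    "Car": ("car vehicle road", ["Jaguar_Cars"]),
    "Land_Rover": ("land rover british car brand", ["Jaguar_Cars", "Car"]),
}


def score_entities(query, kb, top_k=2):
    """Phase 1: rank entities by term overlap with the query (a stand-in for IR scoring)."""
    terms = set(query.lower().split())
    scores = {e: len(terms & set(text.split())) for e, (text, _) in kb.items()}
    ranked = sorted((e for e in scores if scores[e] > 0), key=lambda e: (-scores[e], e))
    return {e: scores[e] for e in ranked[:top_k]}


def expand(seeds, kb):
    """Phase 2: add every entity hyperlinked from a seed entity."""
    expanded = set(seeds)
    for entity in seeds:
        expanded.update(link for link in kb[entity][1] if link in kb)
    return expanded


def rank_by_walk(seeds, candidates, kb, steps=20, damping=0.85):
    """Phase 3: spread seed scores along links inside the candidate set and rank."""
    if not seeds:
        return []
    total = sum(seeds.values())
    restart = {e: seeds.get(e, 0) / total for e in candidates}
    rank = dict(restart)
    for _ in range(steps):
        nxt = {e: (1 - damping) * restart[e] for e in candidates}
        for e in candidates:
            targets = [t for t in kb[e][1] if t in candidates]
            for t in targets:
                nxt[t] += damping * rank[e] / len(targets)
        rank = nxt
    return sorted(rank, key=lambda e: (-rank[e], e))


seeds = score_entities("jaguar big cat", KB)
candidates = expand(seeds, KB)
print(rank_by_walk(seeds, candidates, KB))
```

In this toy run the query terms alone cannot separate the two top-scoring entities, and the link structure in the third phase is what settles the ranking.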


Given the evolution of the media's information portals on the Internet, the idea arises of a search engine for gathering news items scattered across the websites of the major Spanish media outlets, which would make it possible to obtain information on the “contracted descriptors” of a portal's users. The first objective is to analyse the needs to be covered for a hypothetical client of the application; the second is algorithmic: a working methodology is needed that makes it possible to obtain the news item. On the programming side, three stages are considered: downloading the necessary web pages, which will be done with the tools provided by the cUrl library; analysing the news items (obtaining all the links that correspond to news items, filtering by descriptor to decide whether a news item should be stored, and analysing the internal structure of the selected items in order to store only the established parts); and the database that must allow us to organise and manage the chosen news items.
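The link-extraction and descriptor-filtering stage described above can be sketched as follows. This is a minimal illustration using Python's standard `html.parser` module rather than the cUrl library named in the abstract; the download stage and the database are omitted, and the sample page, the descriptor list, and the rule of matching descriptors against anchor text are assumptions.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")  # None when the anchor has no href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def matching_news(html, descriptors):
    """Return the links whose anchor text mentions at least one descriptor."""
    parser = LinkExtractor()
    parser.feed(html)
    wanted = [d.lower() for d in descriptors]
    return [(href, text) for href, text in parser.links
            if any(d in text.lower() for d in wanted)]


# Hypothetical front page of a news site.
PAGE = (
    '<a href="/news/1">Elections in Catalonia</a>'
    '<a href="/sport/2">Football results</a>'
    '<a>no href</a>'
)
print(matching_news(PAGE, ["elections"]))
```

A fuller version would fetch each matching link and parse the article body so that only the established parts are stored.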


The iSAC project (Servei Intel·ligent d’Atenció Ciutadana via web, an intelligent web-based citizen information service) began in January 2006, drawing on new scientific knowledge in intelligent agents together with the application of Information and Communication Technologies (ICT) and search engines. At present, the citizen information service consists of two areas: face-to-face service at the offices and telephone service through the Call Center. Limitations in staffing and opening hours reduce the effectiveness of this service. The aim is to develop a product with technology capable of extending and improving the capacity and quality of citizen services in public administrations, whatever their size. Even so, the project will be used mainly by town councils, which citizens approach with all kinds of questions and queries, usually not restricted to the local sphere. More specifically, the aim is to automate citizen services through a web portal in order to provide a more effective service.


BACKGROUND: The WHO framework for non-communicable disease (NCD) describes risks and outcomes comprising the majority of the global burden of disease. These factors are complex and interact at biological, behavioural, environmental and policy levels, presenting challenges for population monitoring and intervention evaluation. This paper explores the utility of machine learning methods applied to population-level web search activity behaviour as a proxy for chronic disease risk factors. METHODS: Web activity output for each element of the WHO's Causes of NCD framework was used as a basis for identifying relevant web search activity from 2004 to 2013 for the USA. Multiple linear regression models with regularisation were used to generate predictive algorithms, mapping web search activity to Centers for Disease Control and Prevention (CDC) measured risk factor/disease prevalence. Predictions for subsequent target years not included in the model derivation were tested against CDC data from population surveys using Pearson correlation and Spearman's r. RESULTS: For 2011 and 2012, predicted prevalence was very strongly correlated with measured risk data, ranging from fruits and vegetables consumed (r=0.81; 95% CI 0.68 to 0.89) to alcohol consumption (r=0.96; 95% CI 0.93 to 0.98). The mean difference between predicted and measured values by state ranged from 0.03 to 2.16. Spearman's r for state-wise predicted versus measured prevalence varied from 0.82 to 0.93. CONCLUSIONS: The high predictive validity of web search activity for NCD risk has potential to provide real-time information on population risk during policy implementation and other population-level NCD prevention efforts.
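The validation step in this abstract compares predicted and measured prevalence using Pearson correlation and Spearman's r. The sketch below shows how the two coefficients are computed; the prevalence values are hypothetical, and the regularised regression that would produce the predictions is not shown.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical state-level prevalence (%): model prediction vs. survey measurement.
predicted = [10.0, 12.5, 15.0, 20.0]
measured = [11.0, 12.0, 16.0, 19.0]
print(pearson(predicted, measured), spearman(predicted, measured))
```

Pearson measures how close the relationship is to a straight line, whereas Spearman only asks whether the states are ordered the same way, which is why the two coefficients can differ on the same data.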