244 resultados para Search engine
em Queensland University of Technology - ePrints Archive
Resumo:
Search engines have forever changed the way people access and discover knowledge, allowing information about almost any subject to be quickly and easily retrieved within seconds. As increasingly more material becomes available electronically the influence of search engines on our lives will continue to grow. This presents the problem of how to find what information is contained in each search engine, what bias a search engine may have, and how to select the best search engine for a particular information need. This research introduces a new method, search engine content analysis, in order to solve the above problem. Search engine content analysis is a new development of traditional information retrieval field called collection selection, which deals with general information repositories. Current research in collection selection relies on full access to the collection or estimations of the size of the collections. Also collection descriptions are often represented as term occurrence statistics. An automatic ontology learning method is developed for the search engine content analysis, which trains an ontology with world knowledge of hundreds of different subjects in a multilevel taxonomy. This ontology is then mined to find important classification rules, and these rules are used to perform an extensive analysis of the content of the largest general purpose Internet search engines in use today. Instead of representing collections as a set of terms, which commonly occurs in collection selection, they are represented as a set of subjects, leading to a more robust representation of information and a decrease of synonymy. The ontology based method was compared with ReDDE (Relevant Document Distribution Estimation method for resource selection) using the standard R-value metric, with encouraging results. ReDDE is the current state of the art collection selection method which relies on collection size estimation. The method was also used to analyse the content of the most popular search engines in use today, including Google and Yahoo. In addition several specialist search engines such as Pubmed and the U.S. Department of Agriculture were analysed. In conclusion, this research shows that the ontology based method mitigates the need for collection size estimation.
Resumo:
In this paper, we use time series analysis to evaluate predictive scenarios using search engine transactional logs. Our goal is to develop models for the analysis of searchers’ behaviors over time and investigate if time series analysis is a valid method for predicting relationships between searcher actions. Time series analysis is a method often used to understand the underlying characteristics of temporal data in order to make forecasts. In this study, we used a Web search engine transactional log and time series analysis to investigate users’ actions. We conducted our analysis in two phases. In the initial phase, we employed a basic analysis and found that 10% of searchers clicked on sponsored links. However, from 22:00 to 24:00, searchers almost exclusively clicked on the organic links, with almost no clicks on sponsored links. In the second and more extensive phase, we used a one-step prediction time series analysis method along with a transfer function method. The period rarely affects navigational and transactional queries, while rates for transactional queries vary during different periods. Our results show that the average length of a searcher session is approximately 2.9 interactions and that this average is consistent across time periods. Most importantly, our findings shows that searchers who submit the shortest queries (i.e., in number of terms) click on highest ranked results. We discuss implications, including predictive value, and future research.
Resumo:
In the present paper, we introduce BioPatML.NET, an application library for the Microsoft Windows .NET framework [2] that implements the BioPatML pattern definition language and sequence search engine. BioPatML.NET is integrated with the Microsoft Biology Foundation (MBF) application library [3], unifying the parsers and annotation services supported or emerging through MBF with the language, search framework and pattern repository of BioPatML. End users who wish to exploit the BioPatML.NET engine and repository without engaging the services of a programmer may do so via the freely accessible web-based BioPatML Editor, which we describe below.
Resumo:
Usability is a multi-dimensional characteristic of a computer system. This paper focuses on usability as a measurement of interaction between the user and the system. The research employs a task-oriented approach to evaluate the usability of a meta search engine. This engine encourages and accepts queries of unlimited size expressed in natural language. A variety of conventional metrics developed by academic and industrial research, including ISO standards,, are applied to the information retrieval process consisting of sequential tasks. Tasks range from formulating (long) queries to interpreting and retaining search results. Results of the evaluation and analysis of the operation log indicate that obtaining advanced search engine results can be accomplished simultaneously with enhancing the usability of the interactive process. In conclusion, we discuss implications for interactive information retrieval system design and directions for future usability research. © 2008 Academy Publisher.
Resumo:
This paper presents the prototype of an information retrieval system for medical records that utilises visualisation techniques, namely word clouds and timelines. The system simplifies and assists information seeking tasks within the medical domain. Access to patient medical information can be time consuming as it requires practitioners to review a large number of electronic medical records to find relevant information. Presenting a summary of the content of a medical document by means of a word cloud may permit information seekers to decide upon the relevance of a document to their information need in a simple and time effective manner. We extend this intuition, by mapping word clouds of electronic medical records onto a timeline, to provide temporal information to the user. This allows exploring word clouds in the context of a patient’s medical history. To enhance the presentation of word clouds, we also provide the means for calculating aggregations and differences between patient’s word clouds.
Resumo:
The Web has become a worldwide repository of information which individuals, companies, and organizations utilize to solve or address various information problems. Many of these Web users utilize automated agents to gather this information for them. Some assume that this approach represents a more sophisticated method of searching. However, there is little research investigating how Web agents search for online information. In this research, we first provide a classification for information agent using stages of information gathering, gathering approaches, and agent architecture. We then examine an implementation of one of the resulting classifications in detail, investigating how agents search for information on Web search engines, including the session, query, term, duration and frequency of interactions. For this temporal study, we analyzed three data sets of queries and page views from agents interacting with the Excite and AltaVista search engines from 1997 to 2002, examining approximately 900,000 queries submitted by over 3,000 agents. Findings include: (1) agent sessions are extremely interactive, with sometimes hundreds of interactions per second (2) agent queries are comparable to human searchers, with little use of query operators, (3) Web agents are searching for a relatively limited variety of information, wherein only 18% of the terms used are unique, and (4) the duration of agent-Web search engine interaction typically spans several hours. We discuss the implications for Web information agents and search engines.
Resumo:
This paper reports results from a study exploring the multimedia search functionality of Chinese language search engines. Web searching in Chinese (Mandarin) is a growing research area and a technical challenge for popular commercial Web search engines. Few studies have been conducted on Chinese language search engines. We investigate two research questions: which Chinese language search engines provide multimedia searching, and what multimedia search functionalities are available in Chinese language Web search engines. Specifically, we examine each Web search engine's (1) features permitting Chinese language multimedia searches, (2) extent of search personalization and user control of multimedia search variables, and (3) the relationships between Web search engines and their features in the Chinese context. Key findings show that Chinese language Web search engines offer limited multimedia search functionality, and general search engines provide a wider range of features than specialized multimedia search engines. Study results have implications for Chinese Web users, Website designers and Web search engine developers. © 2009 Elsevier Ltd. All rights reserved.
Resumo:
Metasearch engines are an intuitive method for improving the performance of Web search by increasing coverage, returning large numbers of results with a focus on relevance, and presenting alternative views of information needs. However, the use of metasearch engines in an operational environment is not well understood. In this study, we investigate the usage of Dogpile.com, a major Web metasearch engine, with the aim of discovering how Web searchers interact with metasearch engines. We report results examining 2,465,145 interactions from 534,507 users of Dogpile.com on May 6, 2005 and compare these results with findings from other Web searching studies. We collect data on geographical location of searchers, use of system feedback, content selection, sessions, queries, and term usage. Findings show that Dogpile.com searchers are mainly from the USA (84% of searchers), use about 3 terms per query (mean = 2.85), implement system feedback moderately (8.4% of users), and generally (56% of users) spend less than one minute interacting with the Web search engine. Overall, metasearchers seem to have higher degrees of interaction than searchers on non-metasearch engines, but their sessions are for a shorter period of time. These aspects of metasearching may be what define the differences from other forms of Web searching. We discuss the implications of our findings in relation to metasearch for Web searchers, search engines, and content providers.
Resumo:
Nowadays, everyone can effortlessly access a range of information on the World Wide Web (WWW). As information resources on the web continue to grow tremendously, it becomes progressively more difficult to meet high expectations of users and find relevant information. Although existing search engine technologies can find valuable information, however, they suffer from the problems of information overload and information mismatch. This paper presents a hybrid Web Information Retrieval approach allowing personalised search using ontology, user profile and collaborative filtering. This approach finds the context of user query with least user’s involvement, using ontology. Simultaneously, this approach uses time-based automatic user profile updating with user’s changing behaviour. Subsequently, this approach uses recommendations from similar users using collaborative filtering technique. The proposed method is evaluated with the FIRE 2010 dataset and manually generated dataset. Empirical analysis reveals that Precision, Recall and F-Score of most of the queries for many users are improved with proposed method.
Resumo:
Expert searchers engage with information as information brokers, researchers, reference librarians, information architects, faculty who teach advanced search, and in a variety of other information-intensive professions. Their experiences are characterized by a profound understanding of information concepts and skills and they have an agile ability to apply this knowledge to interacting with and having an impact on the information environment. This study explored the learning experiences of searchers to understand the acquisition of search expertise. The research question was: What can be learned about becoming an expert searcher from the learning experiences of proficient novice searchers and highly experienced searchers? The key objectives were: (1) to explore the existence of threshold concepts in search expertise; (2) to improve our understanding of how search expertise is acquired and how novice searchers, intent on becoming experts, can learn to search in more expertlike ways. The participant sample drew from two population groups: (1) highly experienced searchers with a minimum of 20 years of relevant professional experience, including LIS faculty who teach advanced search, information brokers, and search engine developers (11 subjects); and (2) MLIS students who had completed coursework in information retrieval and online searching and demonstrated exceptional ability (9 subjects). Using these two groups allowed a nuanced understanding of the experience of learning to search in expertlike ways, with data from those who search at a very high level as well as those who may be actively developing expertise. The study used semi-structured interviews, search tasks with think-aloud narratives, and talk-after protocols. Searches were screen-captured with simultaneous audio-recording of the think-aloud narrative. Data were coded and analyzed using NVivo9 and manually. Grounded theory allowed categories and themes to emerge from the data. Categories represented conceptual knowledge and attributes of expert searchers. In accord with grounded theory method, once theoretical saturation was achieved, during the final stage of analysis the data were viewed through lenses of existing theoretical frameworks. For this study, threshold concept theory (Meyer & Land, 2003) was used to explore which concepts might be threshold concepts. Threshold concepts have been used to explore transformative learning portals in subjects ranging from economics to mathematics. A threshold concept has five defining characteristics: transformative (causing a shift in perception), irreversible (unlikely to be forgotten), integrative (unifying separate concepts), troublesome (initially counter-intuitive), and may be bounded. Themes that emerged provided evidence of four concepts which had the characteristics of threshold concepts. These were: information environment: the total information environment is perceived and understood; information structures: content, index structures, and retrieval algorithms are understood; information vocabularies: fluency in search behaviors related to language, including natural language, controlled vocabulary, and finesse using proximity, truncation, and other language-based tools. The fourth threshold concept was concept fusion, the integration of the other three threshold concepts and further defined by three properties: visioning (anticipating next moves), being light on one's 'search feet' (dancing property), and profound ontological shift (identity as searcher). In addition to the threshold concepts, findings were reported that were not concept-based, including praxes and traits of expert searchers. A model of search expertise is proposed with the four threshold concepts at its core that also integrates the traits and praxes elicited from the study, attributes which are likewise long recognized in LIS research as present in professional searchers. The research provides a deeper understanding of the transformative learning experiences involved in the acquisition of search expertise. It adds to our understanding of search expertise in the context of today's information environment and has implications for teaching advanced search, for research more broadly within library and information science, and for methodologies used to explore threshold concepts.
Resumo:
This thesis developed new search engine models that elicit the meaning behind the words found in documents and queries, rather than simply matching keywords. These new models were applied to searching medical records: an area where search is particularly challenging yet can have significant benefits to our society.
Resumo:
In today’s world of information-driven society, many studies are exploring usefulness and ease of use of the technology. The research into personalizing next-generation user interface is also ever increasing. A better understanding of factors that influence users’ perception of web search engine performance would contribute in achieving this. This study measures and examines how users’ perceived level of prior knowledge and experience influence their perceived level of satisfaction of using the web search engines, and how their perceived level of satisfaction affects their perceived intention to reuse the system. 50 participants from an Australian university participated in the current study, where they performed three search tasks and completed survey questionnaires. A research model was constructed to test the proposed hypotheses. Correlation and regression analyses results indicated a significant correlation between (1) users’ prior level of experience and their perceived level of satisfaction in using the web search engines, and (2) their perceived level of satisfaction in using the systems and their perceived intention to reuse the systems. A theoretical model is proposed to illustrate the causal relationships. The implications and limitations of the study are also discussed.