994 resultados para information engine
Resumo:
The Web has become a worldwide repository of information which individuals, companies, and organizations utilize to solve or address various information problems. Many of these Web users utilize automated agents to gather this information for them. Some assume that this approach represents a more sophisticated method of searching. However, there is little research investigating how Web agents search for online information. In this research, we first provide a classification for information agent using stages of information gathering, gathering approaches, and agent architecture. We then examine an implementation of one of the resulting classifications in detail, investigating how agents search for information on Web search engines, including the session, query, term, duration and frequency of interactions. For this temporal study, we analyzed three data sets of queries and page views from agents interacting with the Excite and AltaVista search engines from 1997 to 2002, examining approximately 900,000 queries submitted by over 3,000 agents. Findings include: (1) agent sessions are extremely interactive, with sometimes hundreds of interactions per second (2) agent queries are comparable to human searchers, with little use of query operators, (3) Web agents are searching for a relatively limited variety of information, wherein only 18% of the terms used are unique, and (4) the duration of agent-Web search engine interaction typically spans several hours. We discuss the implications for Web information agents and search engines.
Resumo:
Usability is a multi-dimensional characteristic of a computer system. This paper focuses on usability as a measurement of interaction between the user and the system. The research employs a task-oriented approach to evaluate the usability of a meta search engine. This engine encourages and accepts queries of unlimited size expressed in natural language. A variety of conventional metrics developed by academic and industrial research, including ISO standards,, are applied to the information retrieval process consisting of sequential tasks. Tasks range from formulating (long) queries to interpreting and retaining search results. Results of the evaluation and analysis of the operation log indicate that obtaining advanced search engine results can be accomplished simultaneously with enhancing the usability of the interactive process. In conclusion, we discuss implications for interactive information retrieval system design and directions for future usability research. © 2008 Academy Publisher.
Resumo:
Metasearch engines are an intuitive method for improving the performance of Web search by increasing coverage, returning large numbers of results with a focus on relevance, and presenting alternative views of information needs. However, the use of metasearch engines in an operational environment is not well understood. In this study, we investigate the usage of Dogpile.com, a major Web metasearch engine, with the aim of discovering how Web searchers interact with metasearch engines. We report results examining 2,465,145 interactions from 534,507 users of Dogpile.com on May 6, 2005 and compare these results with findings from other Web searching studies. We collect data on geographical location of searchers, use of system feedback, content selection, sessions, queries, and term usage. Findings show that Dogpile.com searchers are mainly from the USA (84% of searchers), use about 3 terms per query (mean = 2.85), implement system feedback moderately (8.4% of users), and generally (56% of users) spend less than one minute interacting with the Web search engine. Overall, metasearchers seem to have higher degrees of interaction than searchers on non-metasearch engines, but their sessions are for a shorter period of time. These aspects of metasearching may be what define the differences from other forms of Web searching. We discuss the implications of our findings in relation to metasearch for Web searchers, search engines, and content providers.
Resumo:
Nowadays, everyone can effortlessly access a range of information on the World Wide Web (WWW). As information resources on the web continue to grow tremendously, it becomes progressively more difficult to meet high expectations of users and find relevant information. Although existing search engine technologies can find valuable information, however, they suffer from the problems of information overload and information mismatch. This paper presents a hybrid Web Information Retrieval approach allowing personalised search using ontology, user profile and collaborative filtering. This approach finds the context of user query with least user’s involvement, using ontology. Simultaneously, this approach uses time-based automatic user profile updating with user’s changing behaviour. Subsequently, this approach uses recommendations from similar users using collaborative filtering technique. The proposed method is evaluated with the FIRE 2010 dataset and manually generated dataset. Empirical analysis reveals that Precision, Recall and F-Score of most of the queries for many users are improved with proposed method.
Resumo:
This paper demonstrates an experimental study that examines the accuracy of various information retrieval techniques for Web service discovery. The main goal of this research is to evaluate algorithms for semantic web service discovery. The evaluation is comprehensively benchmarked using more than 1,700 real-world WSDL documents from INEX 2010 Web Service Discovery Track dataset. For automatic search, we successfully use Latent Semantic Analysis and BM25 to perform Web service discovery. Moreover, we provide linking analysis which automatically links possible atomic Web services to meet the complex requirements of users. Our fusion engine recommends a final result to users. Our experiments show that linking analysis can improve the overall performance of Web service discovery. We also find that keyword-based search can quickly return results but it has limitation of understanding users’ goals.
Resumo:
The feasibility of real-time calculation of parameters for an internal combustion engine via reconfigurable hardware implementation is investigated as an alternative to software computation. A detailed in-hardware field programmable gate array (FPGA)-based design is developed and evaluated using input crank angle and in-cylinder pressure data from fully instrumented diesel engines in the QUT Biofuel Engine Research Facility (BERF). Results indicate the feasibility of employing a hardware-based implementation for real-time processing for speeds comparable to the data sampling rate currently used in the facility, with acceptably low level of discrepancies between hardware and software-based calculation of key engine parameters.
Resumo:
Nowadays people heavily rely on the Internet for information and knowledge. Wikipedia is an online multilingual encyclopaedia that contains a very large number of detailed articles covering most written languages. It is often considered to be a treasury of human knowledge. It includes extensive hypertext links between documents of the same language for easy navigation. However, the pages in different languages are rarely cross-linked except for direct equivalent pages on the same subject in different languages. This could pose serious difficulties to users seeking information or knowledge from different lingual sources, or where there is no equivalent page in one language or another. In this thesis, a new information retrieval task—cross-lingual link discovery (CLLD) is proposed to tackle the problem of the lack of cross-lingual anchored links in a knowledge base such as Wikipedia. In contrast to traditional information retrieval tasks, cross language link discovery algorithms actively recommend a set of meaningful anchors in a source document and establish links to documents in an alternative language. In other words, cross-lingual link discovery is a way of automatically finding hypertext links between documents in different languages, which is particularly helpful for knowledge discovery in different language domains. This study is specifically focused on Chinese / English link discovery (C/ELD). Chinese / English link discovery is a special case of cross-lingual link discovery task. It involves tasks including natural language processing (NLP), cross-lingual information retrieval (CLIR) and cross-lingual link discovery. To justify the effectiveness of CLLD, a standard evaluation framework is also proposed. The evaluation framework includes topics, document collections, a gold standard dataset, evaluation metrics, and toolkits for run pooling, link assessment and system evaluation. With the evaluation framework, performance of CLLD approaches and systems can be quantified. This thesis contributes to the research on natural language processing and cross-lingual information retrieval in CLLD: 1) a new simple, but effective Chinese segmentation method, n-gram mutual information, is presented for determining the boundaries of Chinese text; 2) a voting mechanism of name entity translation is demonstrated for achieving a high precision of English / Chinese machine translation; 3) a link mining approach that mines the existing link structure for anchor probabilities achieves encouraging results in suggesting cross-lingual Chinese / English links in Wikipedia. This approach was examined in the experiments for better, automatic generation of cross-lingual links that were carried out as part of the study. The overall major contribution of this thesis is the provision of a standard evaluation framework for cross-lingual link discovery research. It is important in CLLD evaluation to have this framework which helps in benchmarking the performance of various CLLD systems and in identifying good CLLD realisation approaches. The evaluation methods and the evaluation framework described in this thesis have been utilised to quantify the system performance in the NTCIR-9 Crosslink task which is the first information retrieval track of this kind.
Resumo:
This thesis introduced Bayesian statistics as an analysis technique to isolate resonant frequency information in in-cylinder pressure signals taken from internal combustion engines. Applications of these techniques are relevant to engine design (performance and noise), energy conservation (fuel consumption) and alternative fuel evaluation. The use of Bayesian statistics, over traditional techniques, allowed for a more in-depth investigation into previously difficult to isolate engine parameters on a cycle-by-cycle basis. Specifically, these techniques facilitated the determination of the start of pre-mixed and diffusion combustion and for the in-cylinder temperature profile to be resolved on individual consecutive engine cycles. Dr Bodisco further showed the utility of the Bayesian analysis techniques by applying them to in-cylinder pressure signals taken from a compression ignition engine run with fumigated ethanol.
Resumo:
A fear of imminent information overload predates the World Wide Web by decades. Yet, that fear has never abated. Worse, as the World Wide Web today takes the lion’s share of the information we deal with, both in amount and in time spent gathering it, the situation has only become more precarious. This chapter analyses new issues in information overload that have emerged with the advent of the Web, which emphasizes written communication, defined in this context as the exchange of ideas expressed informally, often casually, as in verbal language. The chapter focuses on three ways to mitigate these issues. First, it helps us, the users, to be more specific in what we ask for. Second, it helps us amend our request when we don't get what we think we asked for. And third, since only we, the human users, can judge whether the information received is what we want, it makes retrieval techniques more effective by basing them on how humans structure information. This chapter reports on extensive experiments we conducted in all three areas. First, to let users be more specific in describing an information need, they were allowed to express themselves in an unrestricted conversational style. This way, they could convey their information need as if they were talking to a fellow human instead of using the two or three words typically supplied to a search engine. Second, users were provided with effective ways to zoom in on the desired information once potentially relevant information became available. Third, a variety of experiments focused on the search engine itself as the mediator between request and delivery of information. All examples that are explained in detail have actually been implemented. The results of our experiments demonstrate how a human-centered approach can reduce information overload in an area that grows in importance with each day that passes. By actually having built these applications, I present an operational, not just aspirational approach.
Resumo:
This paper presents the prototype of an information retrieval system for medical records that utilises visualisation techniques, namely word clouds and timelines. The system simplifies and assists information seeking tasks within the medical domain. Access to patient medical information can be time consuming as it requires practitioners to review a large number of electronic medical records to find relevant information. Presenting a summary of the content of a medical document by means of a word cloud may permit information seekers to decide upon the relevance of a document to their information need in a simple and time effective manner. We extend this intuition, by mapping word clouds of electronic medical records onto a timeline, to provide temporal information to the user. This allows exploring word clouds in the context of a patient’s medical history. To enhance the presentation of word clouds, we also provide the means for calculating aggregations and differences between patient’s word clouds.
Resumo:
This thesis presents new methods for classification and thematic grouping of billions of web pages, at scales previously not achievable. This process is also known as document clustering, where similar documents are automatically associated with clusters that represent various distinct topic. These automatically discovered topics are in turn used to improve search engine performance by only searching the topics that are deemed relevant to particular user queries.
Resumo:
Expert searchers engage with information in a variety of professional settings, as information brokers, reference librarians, information architects and faculty who teach advanced searching. As my recent research shows, the expert searcher’s information experience is defined by profound discernment of critical concepts about information, and a fluid ability to apply this knowledge to their engagement with the information environment. The information experience of the expert searcher means active and intentional participation with the processes and players that created that information environment. Expert searchers become an integral and seamless part of their information environment and also play a role in facilitating the information experiences of others. In this chapter, after discussing my understanding of the concept of information experience, I outline how I used threshold concept theory to explore the information experience of expert searchers. Through the findings, I identify four threshold concepts in the acquisition of search expertise that provide new perspectives on the information experience of the expert searcher. These new perspectives have implications for search engine design and how advanced search skills are taught. Finally, I consider how the fresh insights about the expert searcher’s experiences contribute to wider understanding about information experience.
Resumo:
Information available on company websites can help people navigate to the offices of groups and individuals within the company. Automatically retrieving this within-organisation spatial information is a challenging AI problem This paper introduces a novel unsupervised pattern-based method to extract within-organisation spatial information by taking advantage of HTML structure patterns, together with a novel Conditional Random Fields (CRF) based method to identify different categories of within-organisation spatial information. The results show that the proposed method can achieve a high performance in terms of F-Score, indicating that this purely syntactic method based on web search and an analysis of HTML structure is well-suited for retrieving within-organisation spatial information.
Resumo:
Developing countries in Asia and the Pacific are rapidly reaching middle income economic status. Their competitive advantage is shifting from labor-intensive industries and natural resource-based economies to knowledge-based economies that innovate and create new products and services. Early adoption of information and communication technology (ICT) can allow countries to leapfrog over the traditional development pathway into production of knowledge-based products and services. Since higher education institutions (HEIs) are considered a primary engine of economic growth, adoption of ICT is imperative for securing competitive advantage. ICT is thought to be one of the fastest growing industries and is frequently heralded as a transforming influence on higher education systems globally and, consequently, is enhancing the competitive advantage of countries. It is increasingly becoming evident that an institution-wide ICT strategy covering all evolving functions of competitive HEIs is necessary. Such a system may be designed as an integrated platform but implemented in phases.
Resumo:
The keyword based search technique suffers from the problem of synonymic and polysemic queries. Current approaches address only theproblem of synonymic queries in which different queries might have the same information requirement. But the problem of polysemic queries,i.e., same query having different intentions, still remains unaddressed. In this paper, we propose the notion of intent clusters, the members of which will have the same intention. We develop a clustering algorithm that uses the user session information in query logs in addition to query URL entries to identify cluster of queries having the same intention. The proposed approach has been studied through case examples from the actual log data from AOL, and the clustering algorithm is shown to be successful in discerning the user intentions.