989 results for information filtering


Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper we present an information filtering agent called the sharable instructable information filtering agent (SIIFA). It adopts the approach of sharable instructable agents. SIIFA provides comprehensible and flexible interaction to represent and filter documents. The representation scheme in SIIFA is personalized. It can be shared, either fully or partly, among the users of the stream without revealing their interests, and it can be easily edited. SIIFA is evaluated on documents from the comp.ai.neural-nets Usenet newsgroup and compared with the vector space method.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents methods based on Information Filters for solving matching problems with emphasis on real-time, or effectively real-time, applications. Both applications discussed in this work deal with ultrasound-based rigid registration in computer-assisted orthopedic surgery. In the first application, the usual workflow of rigid registration is reformulated such that registration algorithms iterate while the surgeon is acquiring ultrasound images of the anatomy to be operated on. Using this effectively real-time approach to registration, the surgeon receives feedback in order to better gauge the quality of the final registration outcome. The second application considered in this paper circumvents the need to attach physical markers to bones for anatomical referencing. Experiments using anatomical objects immersed in water are performed in order to evaluate and compare the different methods presented herein, using both 2D and real-time 3D ultrasound.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In Information Filtering (IF) a user may be interested in several topics in parallel. But IF systems have been built on representational models derived from Information Retrieval and Text Categorization, which assume independence between terms. The linearity of these models results in user profiles that can only represent one topic of interest. We present a methodology that takes into account term dependencies to construct a single profile representation for multiple topics, in the form of a hierarchical term network. We also introduce a series of non-linear functions for evaluating documents against the profile. Initial experiments produced positive results.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Adaptive information filtering is a challenging research problem. It requires the adaptation of a representation of a user’s multiple interests to various changes in them. We investigate the application of an immune-inspired approach to this problem. Nootropia is a user profiling model that has many properties in common with computational models of the immune system that have been based on Francisco Varela’s work. In this paper we concentrate on Nootropia’s evaluation. We define an evaluation methodology that uses virtual users to simulate various interest changes. The results show that Nootropia exhibits the desirable adaptive behaviour.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Postprint

Relevance:

70.00% 70.00%

Publisher:

Abstract:

As the average number of spam messages received continues to increase exponentially, both the Internet service provider (ISP) and the end user suffer. The lack of an efficient solution may threaten the usability of email as a means of communication. In this paper we present a filtering mechanism applying the idea of preference ranking. This filtering mechanism distinguishes spam emails from other emails on the Internet. The preference ranking gives the similarity values for nominated emails and spam emails specified by users, so that the ISP or end users can deal with spam emails at filtering points. We designed three filtering points to classify nominated emails into spam email, unsure email and legitimate email. This filtering mechanism can be applied both in middleware and at the client side. The experiments show that high precision, recall and TCR (total cost ratio) of spam emails can be predicted for the preference-based filtering mechanisms.
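
The three-way split into spam, unsure and legitimate email can be illustrated with a minimal sketch. It assumes that each nominated email already has a similarity value in [0, 1] with respect to user-specified spam, and that two cut-off values separate the three classes. The function name `classify_email` and the thresholds `lower` and `upper` are hypothetical; the abstract does not state how the filtering points are actually defined.

```python
def classify_email(similarity_to_spam: float,
                   lower: float = 0.3,
                   upper: float = 0.7) -> str:
    """Three-way decision on a similarity score (thresholds are assumptions)."""
    if not 0.0 <= similarity_to_spam <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    if similarity_to_spam >= upper:
        return "spam"          # very similar to known spam
    if similarity_to_spam <= lower:
        return "legitimate"    # clearly unlike known spam
    return "unsure"            # left for the user or a later stage to decide


if __name__ == "__main__":
    for score in (0.9, 0.5, 0.1):
        print(score, classify_email(score))
```

Keeping an explicit "unsure" class lets borderline messages be deferred instead of forcing a risky binary decision.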

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Spam is commonly defined as unsolicited email messages, and the goal of spam filtering is to differentiate spam from legitimate email. Much work has been done to filter spam from legitimate emails using machine learning algorithms, and substantial performance has been achieved with some amount of false positive (FP) tradeoffs. In this paper, an architecture for spam filtering is proposed based on the support vector machine (SVM), which achieves better accuracy by reducing FP problems. In this architecture, an innovative technique for feature selection called dynamic feature selection (DFS) is proposed, which enhances the overall performance of the architecture while reducing FP problems. The experimental results show that the proposed technique gives better performance compared to similar existing techniques.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Using the vast amount of information on Web sites efficiently and effectively is very important for making informed decisions. There are, however, still many problems that need to be overcome in the information gathering research arena to enable the delivery of relevant information required by users. In this paper, an information gathering system is developed by means of multiple agents to solve those problems. We employed some ideas from the Gaia methodology and an open agent architecture to analyze and design the system. The system consists of a query preprocessing agent, an information retrieval agent, an information filtering agent, and an information management agent. The filtering agent is trained with categorized documents and can provide users with the necessary information. The experimental results show that all agents in the system can work cooperatively to retrieve relevant information from the World Wide Web environment.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

While the important role of family carers has been increasingly recognized in healthcare service provision, particularly for patients with acute or chronic illnesses, the family carer's information needs have not been well understood or adequately supported by health information systems. In this study, we explore the information needs of a family carer by analyzing the extensive online diary of a Vietnamese family carer supporting his wife, who was a lung cancer patient. The study provides a deep understanding of the information needs of the family carer and suggests a four-stage information journey model including identification, searching, interpretation and information sharing, and collaboration. A number of themes emerge from the study including the key role of the carer, information filtering by the carer, information sharing and collaboration, and the influence of Vietnamese culture. The paper concludes with a discussion of the requirements for health information systems that meet the needs of family carers.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

The recent emergence of intelligent agent technology and advances in information gathering have been important steps forward in efficiently managing and using the vast amount of information now available on the Web to make informed decisions. There are, however, still many problems that need to be overcome in the information gathering research arena to enable the delivery of relevant information required by end users. Good decisions cannot be made without sufficient, timely, and correct information. Traditionally it is said that knowledge is power; nowadays, however, sufficient, timely, and correct information is power. So gathering relevant information to meet user information needs is the crucial step for making good decisions. The ideal goal of information gathering is to obtain only the information that users need (no more and no less). However, the volume of information available, the diverse formats of information, the uncertainties of information, and the distributed locations of information (e.g. the World Wide Web) hinder the process of gathering the right information to meet user needs. Specifically, two fundamental issues with regard to the efficiency of information gathering are mismatch and overload. Mismatch means that some information that meets user needs has not been gathered (or has been missed), whereas overload means that some gathered information is not what users need. Traditional information retrieval has been developed well in the past twenty years. The introduction of the Web has changed people's perceptions of information retrieval. Usually, the task of information retrieval is considered to have the function of leading the user to those documents that are relevant to his/her information needs. A similar function in information retrieval is to filter out the irrelevant documents (also called information filtering). Research into traditional information retrieval has provided many retrieval models and techniques to represent documents and queries.
Nowadays, information is becoming highly distributed and increasingly difficult to gather. On the other hand, people have found many uncertainties in user information needs. These motivate the need for research in agent-based information gathering, and agent-based information systems have emerged in response. In these kinds of systems, intelligent agents get commitments from their users and act on the users' behalf to gather the required information. They can easily retrieve the relevant information from highly distributed uncertain environments because of their merits of intelligence, autonomy and distribution. The current research on agent-based information gathering systems is divided into single agent gathering systems and multi-agent gathering systems. In both research areas, there are still open problems to be solved so that agent-based information gathering systems can retrieve uncertain information more effectively from highly distributed environments. The aim of this thesis is to research the theoretical framework for intelligent agents to gather information from the Web. This research integrates the areas of information retrieval and intelligent agents. The specific research areas in this thesis are the development of an information filtering model for single agent systems, and the development of a dynamic belief model for information fusion for multi-agent systems. The research results are also supported by the construction of real information gathering agents (e.g., Job Agent) for the Internet to help users gather useful information stored in Web sites. In such a framework, information gathering agents have the ability to describe (or learn) the user information needs, and act like users to retrieve, filter, and/or fuse the information. A rough set based information filtering model is developed to address the problem of overload.
The new approach allows users to describe their information needs on user concept spaces rather than on document spaces, and it views a user information need as a rough set over the document space. The rough set decision theory is used to classify new documents into three regions: positive region, boundary region, and negative region. Two experiments are presented to verify this model, and they show that the rough set based model provides an efficient approach to the overload problem. In this research, a dynamic belief model for information fusion in multi-agent environments is also developed. This model has a polynomial time complexity, and it has been proven that the fusion results are belief (mass) functions. By using this model, a collection fusion algorithm for information gathering agents is presented. The difficult problem for this research is the case where collections may be used by more than one agent. This algorithm, however, uses the technique of cooperation between agents, and provides a solution for this difficult problem in distributed information retrieval systems. This thesis presents solutions to the theoretical problems in agent-based information gathering systems, including information filtering models, agent belief modeling, and collection fusion. It also presents solutions to some of the technical problems in agent-based information systems, such as document classification, the architecture for agent-based information gathering systems, and decision making in multiple agent environments. Such information gathering agents will gather relevant information from highly distributed uncertain environments.
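
The three regions mentioned above can be illustrated with the textbook rough set construction. The sketch below is not the thesis's decision-theoretic model. It assumes documents are already grouped into indiscernibility classes (documents that look identical under the chosen description) and that a target set of relevant documents is known. The function name `rough_regions` and the document labels are hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple


def rough_regions(
    classes: Iterable[Iterable[Hashable]],
    target: Iterable[Hashable],
) -> Tuple[Set[Hashable], Set[Hashable], Set[Hashable]]:
    """Split documents into positive, boundary and negative regions.

    classes: a partition of the documents into indiscernibility classes.
    target:  the documents known to satisfy the information need.
    """
    target_set = set(target)
    positive: Set[Hashable] = set()
    boundary: Set[Hashable] = set()
    negative: Set[Hashable] = set()
    for block in classes:
        block_set = set(block)
        if block_set <= target_set:      # class lies entirely inside the target
            positive |= block_set
        elif block_set & target_set:     # class only partly overlaps the target
            boundary |= block_set
        else:                            # class does not touch the target
            negative |= block_set
    return positive, boundary, negative


if __name__ == "__main__":
    classes = [{"d1", "d2"}, {"d3", "d4"}, {"d5"}]
    relevant = {"d1", "d2", "d3"}
    print(rough_regions(classes, relevant))
```

Documents in the positive region can be accepted outright and those in the negative region rejected, which leaves only the boundary region for closer inspection.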

Relevance:

70.00% 70.00%

Publisher:

Abstract:

In this article, we propose a framework, namely Prediction-Learning-Distillation (PLD), for interactive document classification and for distilling misclassified documents. Whenever a user points out misclassified documents, the PLD learns from the mistakes and identifies the same mistakes in all other classified documents. The PLD then enforces this learning for future classifications. If the classifier fails to accept relevant documents or reject irrelevant documents in certain categories, then PLD will assign those documents as new positive/negative training instances. The classifier can then address its weaknesses by learning from these new training instances. Our experimental results have demonstrated that the proposed algorithm can learn from user-identified misclassified documents, and then distil the rest successfully.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Traditional content-based filtering methods usually utilize text extraction and classification techniques for building user profiles as well as for representations of contents, i.e. item profiles. These methods have some disadvantages, e.g. a mismatch between user profile terms and item profile terms, leading to low performance. Some of the disadvantages can be overcome by incorporating a common ontology which enables representing both the users' and the items' profiles with concepts taken from the same vocabulary. We propose a new content-based method for filtering and ranking the relevancy of items for users, which utilizes a hierarchical ontology. The method measures the similarity of the user's profile to the items' profiles, considering the existence of mutual concepts in the two profiles, as well as the existence of "related" concepts, according to their position in the ontology. The proposed filtering algorithm computes the similarity between the users' profiles and the items' profiles, and rank-orders the relevant items according to their relevancy to each user. The method is being implemented in ePaper, a personalized electronic newspaper project, utilizing a hierarchical ontology designed specifically for classification of news items. It can, however, be utilized in other domains and extended to other ontologies.
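
The idea of scoring both mutual and "related" concepts can be sketched as follows. This is an illustrative assumption, not the method's actual scoring scheme: the toy ontology `PARENT`, the partial-match weight of 0.5 for parent/child concepts, and all function names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy ontology: each concept maps to its parent concept.
PARENT: Dict[str, str] = {
    "football": "sport",
    "tennis": "sport",
    "sport": "news",
    "politics": "news",
}


def concept_match(a: str, b: str) -> float:
    """1.0 for the same concept, 0.5 for a direct parent/child pair, else 0.0."""
    if a == b:
        return 1.0
    if PARENT.get(a) == b or PARENT.get(b) == a:
        return 0.5  # assumed weight for "related" concepts
    return 0.0


def profile_similarity(user: Set[str], item: Set[str]) -> float:
    """Average, over the user's concepts, of the best match found in the item."""
    if not user or not item:
        return 0.0
    return sum(max(concept_match(u, i) for i in item) for u in user) / len(user)


def rank_items(user: Set[str], items: Dict[str, Set[str]]) -> List[str]:
    """Order item names by decreasing similarity to the user profile."""
    return sorted(items, key=lambda name: profile_similarity(user, items[name]),
                  reverse=True)


if __name__ == "__main__":
    user_profile = {"football", "politics"}
    item_profiles = {"a": {"football"}, "b": {"sport"}, "c": {"weather"}}
    print(rank_items(user_profile, item_profiles))
```

Because both profiles draw on the same vocabulary, an item tagged only with "sport" still receives partial credit for a user interested in "football", which a pure term match would miss.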

Relevance:

70.00% 70.00%

Publisher:

Abstract:

While news stories are an important traditional medium to broadcast and consume news, microblogging has recently emerged as a place where people can discuss, disseminate, collect or report information about news. However, the massive information in the microblogosphere makes it hard for readers to keep up with these real-time updates. This is especially a problem when it comes to breaking news, where people are more eager to know “what is happening”. Therefore, this dissertation is intended as an exploratory effort to investigate computational methods to augment human effort when monitoring the development of breaking news on a given topic from a microblog stream by extractively summarizing the updates in a timely manner. More specifically, given an interest in a topic, either entered as a query or presented as an initial news report, a microblog temporal summarization system is proposed to filter microblog posts from a stream with three primary concerns: topical relevance, novelty, and salience. Considering the relatively high arrival rate of microblog streams, a cascade framework consisting of three stages is proposed to progressively reduce quantity of posts. For each step in the cascade, this dissertation studies methods that improve over current baselines. In the relevance filtering stage, query and document expansion techniques are applied to mitigate sparsity and vocabulary mismatch issues. The use of word embedding as a basis for filtering is also explored, using unsupervised and supervised modeling to characterize lexical and semantic similarity. In the novelty filtering stage, several statistical ways of characterizing novelty are investigated and ensemble learning techniques are used to integrate results from these diverse techniques. These results are compared with a baseline clustering approach using both standard and delay-discounted measures.
In the salience filtering stage, because of the real-time prediction requirement a method of learning verb phrase usage from past relevant news reports is used in conjunction with some standard measures for characterizing writing quality. Following a Cranfield-like evaluation paradigm, this dissertation includes a series of experiments to evaluate the proposed methods for each step, and for the end-to-end system. New microblog novelty and salience judgments are created, building on existing relevance judgments from the TREC Microblog track. The results point to future research directions at the intersection of social media, computational journalism, information retrieval, automatic summarization, and machine learning.
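
The three-stage cascade (relevance, then novelty, then salience) can be sketched in a few lines. Every stage below is a deliberately simple stand-in, not the dissertation's technique: relevance is plain term overlap with the query, novelty is a Jaccard check against posts already selected, and salience is a minimum token count. All names and thresholds are hypothetical.

```python
from typing import Iterable, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens of a post."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize_stream(posts: Iterable[str], query: str, min_overlap: int = 1,
                     max_sim: float = 0.5, min_tokens: int = 4) -> List[str]:
    """Pass each post through relevance, novelty and salience filters in turn."""
    query_terms = tokens(query)
    selected: List[str] = []
    for post in posts:
        terms = tokens(post)
        # Stage 1, relevance: the post must share enough terms with the query.
        if len(terms & query_terms) < min_overlap:
            continue
        # Stage 2, novelty: drop posts too similar to anything already selected.
        if any(jaccard(terms, tokens(prev)) > max_sim for prev in selected):
            continue
        # Stage 3, salience: crude stand-in that rejects very short posts.
        if len(terms) < min_tokens:
            continue
        selected.append(post)
    return selected


if __name__ == "__main__":
    stream = [
        "earthquake hits city centre today",
        "earthquake hits city centre today again",
        "football scores tonight",
        "earthquake wow",
        "rescue teams reach earthquake survivors overnight",
    ]
    for line in summarize_stream(stream, "earthquake"):
        print(line)
```

Ordering the stages from cheapest to most selective means that later, potentially costlier checks only see the posts that survived the earlier ones, which is the motivation for a cascade on a high-rate stream.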