528 resultados para Information Search
Resumo:
The World Wide Web has become a medium for people to share information. People use Web-based collaborative tools such as question answering (QA) portals, blogs/forums, email and instant messaging to acquire information and to form online-based communities. In an online QA portal, a user asks a question and other users can provide answers based on their knowledge, with the question usually being answered by many users. It can become overwhelming and/or time/resource consuming for a user to read all of the answers provided for a given question. Thus, there exists a need for a mechanism to rank the provided answers so users can focus on only reading good quality answers. The majority of online QA systems use user feedback to rank users’ answers and the user who asked the question can decide on the best answer. Other users who didn’t participate in answering the question can also vote to determine the best answer. However, ranking the best answer via this collaborative method is time consuming and requires an ongoing continuous involvement of users to provide the needed feedback. The objective of this research is to discover a way to recommend the best answer as part of a ranked list of answers for a posted question automatically, without the need for user feedback. The proposed approach combines both a non-content-based reputation method and a content-based method to solve the problem of recommending the best answer to the user who posted the question. The non-content method assigns a score to each user which reflects the users’ reputation level in using the QA portal system. Each user is assigned two types of non-content-based reputations cores: a local reputation score and a global reputation score. The local reputation score plays an important role in deciding the reputation level of a user for the category in which the question is asked. The global reputation score indicates the prestige of a user across all of the categories in the QA system. Due to the possibility of user cheating, such as awarding the best answer to a friend regardless of the answer quality, a content-based method for determining the quality of a given answer is proposed, alongside the non-content-based reputation method. Answers for a question from different users are compared with an ideal (or expert) answer using traditional Information Retrieval and Natural Language Processing techniques. Each answer provided for a question is assigned a content score according to how well it matched the ideal answer. To evaluate the performance of the proposed methods, each recommended best answer is compared with the best answer determined by one of the most popular link analysis methods, Hyperlink-Induced Topic Search (HITS). The proposed methods are able to yield high accuracy, as shown by correlation scores: Kendall correlation and Spearman correlation. The reputation method outperforms the HITS method in terms of recommending the best answer. The inclusion of the reputation score with the content score improves the overall performance, which is measured through the use of Top-n match scores.
Resumo:
Over the last decade, the rapid growth and adoption of the World Wide Web has further exacerbated user needs for e±cient mechanisms for information and knowledge location, selection, and retrieval. How to gather useful and meaningful information from the Web becomes challenging to users. The capture of user information needs is key to delivering users' desired information, and user pro¯les can help to capture information needs. However, e®ectively acquiring user pro¯les is di±cult. It is argued that if user background knowledge can be speci¯ed by ontolo- gies, more accurate user pro¯les can be acquired and thus information needs can be captured e®ectively. Web users implicitly possess concept models that are obtained from their experience and education, and use the concept models in information gathering. Prior to this work, much research has attempted to use ontologies to specify user background knowledge and user concept models. However, these works have a drawback in that they cannot move beyond the subsumption of super - and sub-class structure to emphasising the speci¯c se- mantic relations in a single computational model. This has also been a challenge for years in the knowledge engineering community. Thus, using ontologies to represent user concept models and to acquire user pro¯les remains an unsolved problem in personalised Web information gathering and knowledge engineering. In this thesis, an ontology learning and mining model is proposed to acquire user pro¯les for personalised Web information gathering. The proposed compu- tational model emphasises the speci¯c is-a and part-of semantic relations in one computational model. The world knowledge and users' Local Instance Reposito- ries are used to attempt to discover and specify user background knowledge. From a world knowledge base, personalised ontologies are constructed by adopting au- tomatic or semi-automatic techniques to extract user interest concepts, focusing on user information needs. A multidimensional ontology mining method, Speci- ¯city and Exhaustivity, is also introduced in this thesis for analysing the user background knowledge discovered and speci¯ed in user personalised ontologies. The ontology learning and mining model is evaluated by comparing with human- based and state-of-the-art computational models in experiments, using a large, standard data set. The experimental results are promising for evaluation. The proposed ontology learning and mining model in this thesis helps to develop a better understanding of user pro¯le acquisition, thus providing better design of personalised Web information gathering systems. The contributions are increasingly signi¯cant, given both the rapid explosion of Web information in recent years and today's accessibility to the Internet and the full text world.
Resumo:
Tagging has become one of the key activities in next generation websites which allow users selecting short labels to annotate, manage, and share multimedia information such as photos, videos and bookmarks. Tagging does not require users any prior training before participating in the annotation activities as they can freely choose any terms which best represent the semantic of contents without worrying about any formal structure or ontology. However, the practice of free-form tagging can lead to several problems, such as synonymy, polysemy and ambiguity, which potentially increase the complexity of managing the tags and retrieving information. To solve these problems, this research aims to construct a lightweight indexing scheme to structure tags by identifying and disambiguating the meaning of terms and construct a knowledge base or dictionary. News has been chosen as the primary domain of application to demonstrate the benefits of using structured tags for managing the rapidly changing and dynamic nature of news information. One of the main outcomes of this work is an automatically constructed vocabulary that defines the meaning of each named entity tag, which can be extracted from a news article (including person, location and organisation), based on experts suggestions from major search engines and the knowledge from public database such as Wikipedia. To demonstrate the potential applications of the vocabulary, we have used it to provide more functionalities in an online news website, including topic-based news reading, intuitive tagging, clipping and sharing of interesting news, as well as news filtering or searching based on named entity tags. The evaluation results on the impact of disambiguating tags have shown that the vocabulary can help to significantly improve news searching performance. The preliminary results from our user study have demonstrated that users can benefit from the additional functionalities on the news websites as they are able to retrieve more relevant news, clip and share news with friends and families effectively.