995 resultados para factual information


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Over the last decade, the rapid growth and adoption of the World Wide Web has further exacerbated user needs for e±cient mechanisms for information and knowledge location, selection, and retrieval. How to gather useful and meaningful information from the Web becomes challenging to users. The capture of user information needs is key to delivering users' desired information, and user pro¯les can help to capture information needs. However, e®ectively acquiring user pro¯les is di±cult. It is argued that if user background knowledge can be speci¯ed by ontolo- gies, more accurate user pro¯les can be acquired and thus information needs can be captured e®ectively. Web users implicitly possess concept models that are obtained from their experience and education, and use the concept models in information gathering. Prior to this work, much research has attempted to use ontologies to specify user background knowledge and user concept models. However, these works have a drawback in that they cannot move beyond the subsumption of super - and sub-class structure to emphasising the speci¯c se- mantic relations in a single computational model. This has also been a challenge for years in the knowledge engineering community. Thus, using ontologies to represent user concept models and to acquire user pro¯les remains an unsolved problem in personalised Web information gathering and knowledge engineering. In this thesis, an ontology learning and mining model is proposed to acquire user pro¯les for personalised Web information gathering. The proposed compu- tational model emphasises the speci¯c is-a and part-of semantic relations in one computational model. The world knowledge and users' Local Instance Reposito- ries are used to attempt to discover and specify user background knowledge. From a world knowledge base, personalised ontologies are constructed by adopting au- tomatic or semi-automatic techniques to extract user interest concepts, focusing on user information needs. A multidimensional ontology mining method, Speci- ¯city and Exhaustivity, is also introduced in this thesis for analysing the user background knowledge discovered and speci¯ed in user personalised ontologies. The ontology learning and mining model is evaluated by comparing with human- based and state-of-the-art computational models in experiments, using a large, standard data set. The experimental results are promising for evaluation. The proposed ontology learning and mining model in this thesis helps to develop a better understanding of user pro¯le acquisition, thus providing better design of personalised Web information gathering systems. The contributions are increasingly signi¯cant, given both the rapid explosion of Web information in recent years and today's accessibility to the Internet and the full text world.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The increasing diversity of the Internet has created a vast number of multilingual resources on the Web. A huge number of these documents are written in various languages other than English. Consequently, the demand for searching in non-English languages is growing exponentially. It is desirable that a search engine can search for information over collections of documents in other languages. This research investigates the techniques for developing high-quality Chinese information retrieval systems. A distinctive feature of Chinese text is that a Chinese document is a sequence of Chinese characters with no space or boundary between Chinese words. This feature makes Chinese information retrieval more difficult since a retrieved document which contains the query term as a sequence of Chinese characters may not be really relevant to the query since the query term (as a sequence Chinese characters) may not be a valid Chinese word in that documents. On the other hand, a document that is actually relevant may not be retrieved because it does not contain the query sequence but contains other relevant words. In this research, we propose two approaches to deal with the problems. In the first approach, we propose a hybrid Chinese information retrieval model by incorporating word-based techniques with the traditional character-based techniques. The aim of this approach is to investigate the influence of Chinese segmentation on the performance of Chinese information retrieval. Two ranking methods are proposed to rank retrieved documents based on the relevancy to the query calculated by combining character-based ranking and word-based ranking. Our experimental results show that Chinese segmentation can improve the performance of Chinese information retrieval, but the improvement is not significant if it incorporates only Chinese segmentation with the traditional character-based approach. In the second approach, we propose a novel query expansion method which applies text mining techniques in order to find the most relevant words to extend the query. Unlike most existing query expansion methods, which generally select the highly frequent indexing terms from the retrieved documents to expand the query. In our approach, we utilize text mining techniques to find patterns from the retrieved documents that highly correlate with the query term and then use the relevant words in the patterns to expand the original query. This research project develops and implements a Chinese information retrieval system for evaluating the proposed approaches. There are two stages in the experiments. The first stage is to investigate if high accuracy segmentation can make an improvement to Chinese information retrieval. In the second stage, a text mining based query expansion approach is implemented and a further experiment has been done to compare its performance with the standard Rocchio approach with the proposed text mining based query expansion method. The NTCIR5 Chinese collections are used in the experiments. The experiment results show that by incorporating the text mining based query expansion with the hybrid model, significant improvement has been achieved in both precision and recall assessments.

Relevância:

20.00% 20.00%

Publicador: