2 resultados para Yi li (Title)
em University of Queensland eSpace - Australia
Resumo:
Document classification is a supervised machine learning process, where predefined category labels are assigned to documents based on the hypothesis derived from training set of labelled documents. Documents cannot be directly interpreted by a computer system unless they have been modelled as a collection of computable features. Rogati and Yang [M. Rogati and Y. Yang, Resource selection for domain-specific cross-lingual IR, in SIGIR 2004: Proceedings of the 27th annual international conference on Research and Development in Information Retrieval, ACM Press, Sheffied: United Kingdom, pp. 154-161.] pointed out that the effectiveness of document classification system may vary in different domains. This implies that the quality of document model contributes to the effectiveness of document classification. Conventionally, model evaluation is accomplished by comparing the effectiveness scores of classifiers on model candidates. However, this kind of evaluation methods may encounter either under-fitting or over-fitting problems, because the effectiveness scores are restricted by the learning capacities of classifiers. We propose a model fitness evaluation method to determine whether a model is sufficient to distinguish positive and negative instances while still competent to provide satisfactory effectiveness with a small feature subset. Our experiments demonstrated how the fitness of models are assessed. The results of our work contribute to the researches of feature selection, dimensionality reduction and document classification.
Resumo:
Collaborative filtering is regarded as one of the most promising recommendation algorithms. The item-based approaches for collaborative filtering identify the similarity between two items by comparing users' ratings on them. In these approaches, ratings produced at different times are weighted equally. That is to say, changes in user purchase interest are not taken into consideration. For example, an item that was rated recently by a user should have a bigger impact on the prediction of future user behaviour than an item that was rated a long time ago. In this paper, we present a novel algorithm to compute the time weights for different items in a manner that will assign a decreasing weight to old data. More specifically, the users' purchase habits vary. Even the same user has quite different attitudes towards different items. Our proposed algorithm uses clustering to discriminate between different kinds of items. To each item cluster, we trace each user's purchase interest change and introduce a personalized decay factor according to the user own purchase behaviour. Empirical studies have shown that our new algorithm substantially improves the precision of item-based collaborative filtering without introducing higher order computational complexity.