97 results for WIC


Relevance:

10.00% 10.00%

Publisher:

Abstract:

With the overwhelming increase in the amount of text on the web, it is almost impossible for people to keep abreast of up-to-date information. Text mining is a process by which interesting information is derived from text through the discovery of patterns and trends. Text mining algorithms are used to guarantee the quality of extracted knowledge. However, patterns extracted using text or data mining algorithms or methods are often noisy and inconsistent. Thus, different challenges arise, such as the question of how to understand these patterns, whether the model that has been used is suitable, and whether all the patterns that have been extracted are relevant. Furthermore, the research raises the question of how to give a correct weight to the extracted knowledge. To address these issues, this paper presents a text post-processing method, which uses a pattern co-occurrence matrix to find the relation between extracted patterns in order to reduce noisy patterns. The main objective of this paper is not only to reduce the number of closed sequential patterns but also to improve the performance of pattern mining. The experimental results on the Reuters Corpus Volume 1 data collection and TREC filtering topics show that the proposed method is promising.
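
The abstract does not say how the pattern co-occurrence matrix is built or used, so the following is only a minimal sketch of the general idea. It assumes each document is represented by the list of patterns found in it, counts how many documents each pair of patterns shares, and drops patterns that never co-occur often enough with any other pattern. The function names, the pruning rule, the `min_cooccurrence` threshold and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_matrix(doc_patterns):
    """Count, for every unordered pair of patterns, the documents containing both."""
    counts = Counter()
    for patterns in doc_patterns:
        # Sort so each pair is always stored under the same (a, b) key.
        for a, b in combinations(sorted(set(patterns)), 2):
            counts[(a, b)] += 1
    return counts


def prune_isolated(doc_patterns, min_cooccurrence=2):
    """Keep patterns that share at least min_cooccurrence documents with
    some other pattern (an assumed pruning rule, not the paper's)."""
    kept = set()
    for (a, b), n in cooccurrence_matrix(doc_patterns).items():
        if n >= min_cooccurrence:
            kept.update((a, b))
    return kept


# Made-up example: patterns extracted from three documents.
docs = [
    ["oil price", "crude oil", "stock market"],
    ["oil price", "crude oil"],
    ["oil price", "football"],
]
matrix = cooccurrence_matrix(docs)
kept = prune_isolated(docs)
```

With this toy input, only "oil price" and "crude oil" survive, because they are the only pair that appears together in two documents.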

Abstract:

Finding and labelling semantic feature patterns of documents in a large, spatial corpus is a challenging problem. Text documents have characteristics that make semantic labelling difficult; the rapidly increasing volume of online documents creates a bottleneck in finding meaningful textual patterns. To deal with these issues, we propose an unsupervised document labelling approach based on semantic content and feature patterns. A world ontology with extensive topic coverage is exploited to supply controlled, structured subjects for labelling. An algorithm is also introduced to reduce dimensionality based on the study of ontological structure. The proposed approach was evaluated by comparison with typical machine learning methods including SVMs, Rocchio, and kNN, and the results were promising.

Abstract:

As e-commerce becomes more and more popular, the number of customer reviews that a product receives grows rapidly. In order to enhance customer satisfaction and the shopping experience, it has become important to analyze customer reviews to extract opinions on the products that customers buy. Thus, Opinion Mining is becoming more important than before, especially for analyzing and forecasting customers’ behavior for business purposes. The right decision in producing new products or services, based on data about customers’ characteristics, means profit for the organization or company. This paper proposes a new architecture for Opinion Mining, which uses a multidimensional model to integrate customers’ characteristics and their comments about products (or services). The key step to achieve this objective is to transfer comments (opinions) to a fact table that includes several dimensions, such as customers, products, time and locations. This research presents a comprehensive way to calculate customers’ orientation for all possible product attributes.
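
As a rough illustration of the fact-table idea, the sketch below stores each extracted opinion as one row keyed by the four dimensions named in the abstract (customer, product, time, location) and sums orientation along any chosen dimensions. The `OpinionFact` class, the `attribute` field, the +1/-1 orientation encoding and the sample rows are assumptions made for illustration; the abstract does not describe the actual schema or orientation formula.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OpinionFact:
    """One row of a hypothetical opinion fact table."""
    customer_id: str   # customer dimension
    product_id: str    # product dimension
    date: str          # time dimension (ISO date string)
    location: str      # location dimension
    attribute: str     # product attribute the comment is about
    orientation: int   # assumed encoding: +1 positive, -1 negative


def orientation_by(facts, *dimensions):
    """Sum opinion orientation grouped by the given fact-table columns."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(getattr(fact, d) for d in dimensions)
        totals[key] += fact.orientation
    return dict(totals)


# Made-up rows, for illustration only.
facts = [
    OpinionFact("c1", "phone", "2024-01-05", "Brisbane", "battery", +1),
    OpinionFact("c2", "phone", "2024-01-07", "Sydney", "battery", -1),
    OpinionFact("c3", "phone", "2024-02-01", "Brisbane", "battery", +1),
    OpinionFact("c3", "phone", "2024-02-01", "Brisbane", "screen", -1),
]
by_attribute = orientation_by(facts, "product_id", "attribute")
by_location = orientation_by(facts, "location", "attribute")
```

Grouping by different column combinations plays the role of slicing the multidimensional model, for example orientation per product attribute or per location.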

Abstract:

In order to comprehend user information needs by concepts, this paper introduces a novel method to match relevance features with ontological concepts. The method first discovers relevance features from user local instances. Then, a concept matching approach is developed for matching these features to accurate concepts in a global knowledge base. This approach is significant for the transition between informative descriptors and conceptual descriptors. The proposed method is thoroughly evaluated by comparing it against three information gathering baseline models. The experimental results show that the matching approach is successful and achieves a series of remarkable improvements in search effectiveness.

Abstract:

In people-to-people matching systems, filtering is widely applied to find the most suitable matches. The results returned are either too many when the search is generic or too few when it is specific. The use of a sophisticated recommendation approach therefore becomes necessary. Traditionally, the object of recommendation is an item, which is inanimate. In online dating systems, reciprocal recommendation is required, so that a partner is suggested only when both the user and the recommended candidate are satisfied. In this paper, an innovative reciprocal collaborative method is developed based on the idea of similarity and common neighbors, utilizing the information of relevance feedback and feature importance. Extensive experiments are carried out using data gathered from a real online dating service. Compared to benchmarking methods, our results show the proposed method can achieve noticeably better performance.
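
The requirement that both sides be satisfied can be illustrated with a small sketch. It assumes that one-directional preference scores in [0, 1] are already available (in the paper these would come from the collaborative method, which is not reproduced here) and combines them with a harmonic mean, so that a low score in either direction pulls the result down. The harmonic mean, the function names and the sample scores are illustrative assumptions, not the paper's method.

```python
def reciprocal_score(a_to_b, b_to_a):
    """Harmonic mean of two one-directional preference scores in [0, 1]."""
    if a_to_b <= 0 or b_to_a <= 0:
        return 0.0  # no recommendation unless both sides show some interest
    return 2 * a_to_b * b_to_a / (a_to_b + b_to_a)


def recommend(user, candidates, preference, top_n=2):
    """Rank candidates by reciprocal score and drop one-sided matches."""
    scored = [
        (reciprocal_score(preference[(user, c)], preference[(c, user)]), c)
        for c in candidates
    ]
    scored.sort(reverse=True)
    return [c for score, c in scored[:top_n] if score > 0]


# Made-up preference scores: (from, to) -> score.
preference = {
    ("ann", "bob"): 0.9, ("bob", "ann"): 0.1,
    ("ann", "cal"): 0.6, ("cal", "ann"): 0.6,
    ("ann", "dan"): 0.8, ("dan", "ann"): 0.0,
}
ranked = recommend("ann", ["bob", "cal", "dan"], preference)
```

Here "cal" ranks above "bob" even though "ann" prefers "bob", because the interest is mutual; "dan" is never recommended because the interest is one-sided.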

Abstract:

A new community and communication type of social network, online dating, is gaining momentum. With many people joining dating networks, users become overwhelmed by choices for an ideal partner. A solution to this problem is to provide users with partner recommendations based on their interests and activities. Traditional recommendation methods ignore the users’ needs and provide recommendations equally to all users. In this paper, we propose a recommendation approach that applies different recommendation strategies to different groups of members. A segmentation method using the Gaussian Mixture Model (GMM) is proposed to customize users’ needs. Then a targeted recommendation strategy is applied to each identified segment. Empirical results show that the proposed approach outperforms several existing recommendation methods.

Abstract:

The rapid development of the World Wide Web has created massive amounts of information, leading to the information overload problem. Under this circumstance, personalization techniques have been introduced to help users find content that meets their personalized interests or needs out of massively increasing information. User profiling techniques have played the core role in this research. Traditionally, most user profiling techniques create user representations in a static way. However, user interests may change over time in real-world applications. In this research we develop algorithms for mining user interests by integrating time decay mechanisms into topic-based user interest profiling. Time forgetting functions will be integrated into the calculation of topic interest measurements at an in-depth level. The experimental study shows that considering the temporal effects of user interests by integrating time forgetting mechanisms yields better recommendation performance.
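
The abstract does not give the form of the time forgetting functions. As one plausible sketch, the code below assumes an exponential forgetting function with a configurable half-life and uses it to weight each user interaction when accumulating per-topic interest. The function names, the `half_life_days` parameter and the sample events are hypothetical.

```python
def forgetting_weight(age_days, half_life_days=30.0):
    """Assumed exponential forgetting: the weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def topic_interest(events, now_day, half_life_days=30.0):
    """Accumulate time-decayed interest per topic.

    events: iterable of (topic, day) pairs, one per user interaction.
    """
    interest = {}
    for topic, day in events:
        weight = forgetting_weight(now_day - day, half_life_days)
        interest[topic] = interest.get(topic, 0.0) + weight
    return interest


# Two old interactions with "sports", one recent interaction with "music".
events = [("sports", 0), ("sports", 0), ("music", 60)]
profile = topic_interest(events, now_day=60)
```

A static profile would rank "sports" first (two interactions versus one). With decay, the two 60-day-old interactions contribute 0.25 each, so the single recent "music" interaction dominates.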

Abstract:

Most recommender systems attempt to use collaborative filtering, content-based filtering or a hybrid approach to recommend items to new users. Collaborative filtering recommends items to new users based on their similar neighbours, and the content-based filtering approach tries to recommend items that are similar to new users' profiles. The fundamental issues include how to profile new users, and how to deal with over-specialization in content-based recommender systems. Indeed, the terms used to describe items can be formed as a concept hierarchy. Therefore, we aim to describe user profiles or information needs by using concept vectors. This paper presents a new method to acquire user information needs, which allows new users to describe their preferences on a concept hierarchy rather than rating items. It also develops a new ranking function to recommend items to new users based on their information needs. The proposed approach is evaluated on Amazon book datasets. The experimental results demonstrate that the proposed approach can substantially improve the effectiveness of recommender systems.

Abstract:

Different reputation models are used on the web in order to generate reputation values for products using users' review data. Most of the current reputation models use review ratings and neglect users' textual reviews, because they are more difficult to process. However, we argue that the overall reputation score for an item does not reflect the actual reputation of all of its features, which is why the use of users' textual reviews is necessary. In our work we introduce a new reputation model that defines a new aggregation method for opinions about product features extracted from users' text. Our model uses a feature ontology in order to define general features and sub-features of a product. It also reflects the frequencies of positive and negative opinions. We provide a case study to show how our results compare with other reputation models.
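
The aggregation method itself is not given in the abstract. The sketch below only illustrates the two ingredients it names: a feature ontology that groups sub-features under general features, and scores that reflect the frequencies of positive and negative opinions. The roll-up rule (summing sub-feature counts into the general feature), the score formula (share of positive opinions) and the sample counts are assumptions.

```python
def feature_score(pos, neg):
    """Share of positive opinions; None when the feature has no opinions."""
    total = pos + neg
    return pos / total if total else None


def rollup(ontology, counts):
    """Sum (positive, negative) opinion counts of each general feature
    and its sub-features (an assumed aggregation rule)."""
    rolled = {}
    for general, subs in ontology.items():
        members = [general, *subs]
        pos = sum(counts.get(f, (0, 0))[0] for f in members)
        neg = sum(counts.get(f, (0, 0))[1] for f in members)
        rolled[general] = (pos, neg)
    return rolled


# Hypothetical ontology: general feature -> sub-features.
ontology = {"camera": ["lens", "flash"], "battery": []}
# Made-up opinion counts per feature: (positive, negative).
counts = {"camera": (1, 0), "lens": (8, 2), "flash": (1, 4), "battery": (3, 1)}

rolled = rollup(ontology, counts)
scores = {feature: feature_score(*pn) for feature, pn in rolled.items()}
```

Per-feature scores like these make visible what a single overall rating hides: in the toy data the lens is well regarded while the flash is not, even though both roll up into one "camera" score.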

Abstract:

Textual document sets have become an important and rapidly growing information source on the web. Text classification is one of the crucial technologies for information organisation and management. It has become more and more important and has attracted wide attention from researchers in different research fields. In this paper, many feature selection methods, implementation algorithms and applications of text classification are first introduced. However, because there is much noise in the knowledge extracted by current data-mining techniques for text classification, much uncertainty arises in the process of text classification, produced by both knowledge extraction and knowledge usage. Therefore, more innovative techniques and methods are needed to improve the performance of text classification. Further improving the process of knowledge extraction and the effective utilization of the extracted knowledge has been a critical and highly challenging step. A Rough Set decision making approach is proposed to more precisely classify the textual documents that are difficult to separate with classic text classification methods. The purpose of this paper is to give an overview of existing text classification technologies; to demonstrate the Rough Set concepts and the decision making approach based on Rough Set theory for building a more reliable and effective text classification framework with higher precision; to set up an innovative evaluation metric named CEI, which is very effective for the performance assessment of similar research; and to propose a promising research direction for addressing the challenging problems in text classification, text mining and other related fields.

Abstract:

With the overwhelming increase in the amount of data on the web and in databases, many text mining techniques have been proposed for mining useful patterns in text documents. Extracting closed sequential patterns using the Pattern Taxonomy Model (PTM) is one of the pruning methods to remove noisy, inconsistent, and redundant patterns. However, the PTM model treats each extracted pattern as a whole without considering the terms it includes, which could affect the quality of extracted patterns. This paper proposes an innovative and effective method that extends the random set to accurately weigh patterns based on their distribution in the documents and the distribution of their terms in patterns. Then, the proposed approach finds the specific closed sequential patterns (SCSP) based on the newly calculated weight. The experimental results on the Reuters Corpus Volume 1 (RCV1) data collection and TREC topics show that the proposed method significantly outperforms other state-of-the-art methods in different popular measures.
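
To make the idea of weighting patterns by both their own distribution and their terms' distribution concrete, here is a simplified sketch. It is not the random-set extension described in the abstract, whose details are not given. It assumes each pattern comes with a support value (the fraction of documents containing it), spreads that support evenly over the pattern's terms, and then scores each pattern by the accumulated weights of its terms. All names and sample values are hypothetical.

```python
from collections import defaultdict


def term_weights(patterns):
    """Spread each pattern's support evenly over its terms and accumulate.

    patterns: dict mapping a tuple of terms to its support in the documents.
    """
    weights = defaultdict(float)
    for terms, support in patterns.items():
        for term in terms:
            weights[term] += support / len(terms)
    return dict(weights)


def pattern_weights(patterns):
    """Score each pattern by the summed weights of the terms it contains."""
    tw = term_weights(patterns)
    return {terms: sum(tw[t] for t in terms) for terms in patterns}


# Made-up patterns with their document support.
patterns = {("oil", "price"): 0.5, ("oil",): 0.75, ("market",): 0.25}
tw = term_weights(patterns)
pw = pattern_weights(patterns)
```

Under this scheme the pattern ("oil", "price") outranks ("oil",) despite lower support, because its terms are reinforced by other patterns; a threshold on such weights could then select the more specific patterns.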

Abstract:

The Semantic Web offers many possibilities for future Web technologies. Therefore, there is a need to search for ways to bring the huge amount of unstructured documents from the current Web to the Semantic Web automatically. One big challenge in searching for such ways is how to make patterns understandable to both humans and machines. To address this issue, we present an innovative model which interprets patterns as high-level concepts. These concepts can explain the patterns' meanings in a human-understandable way while improving the information filtering performance. The model is evaluated by comparing it against one state-of-the-art benchmark model using a standard Reuters dataset. The results show that the proposed model is successful. The significance of this model is threefold. It gives a way to interpret text mining output, provides a technique to find concepts relevant to the whole set of patterns, which is an essential feature for understanding the topic, and to some extent overcomes the information mismatch and overload problems of existing models. This model will be very useful for knowledge-based applications.