310 resultados para Topic modeling

em Queensland University of Technology - ePrints Archive


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Topic modeling has been widely utilized in the fields of information retrieval, text mining, text classification etc. Most existing statistical topic modeling methods such as LDA and pLSA generate a term based representation to represent a topic by selecting single words from multinomial word distribution over this topic. There are two main shortcomings: firstly, popular or common words occur very often across different topics that bring ambiguity to understand topics; secondly, single words lack coherent semantic meaning to accurately represent topics. In order to overcome these problems, in this paper, we propose a two-stage model that combines text mining and pattern mining with statistical modeling to generate more discriminative and semantic rich topic representations. Experiments show that the optimized topic representations generated by the proposed methods outperform the typical statistical topic modeling method LDA in terms of accuracy and certainty.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Topic modelling, such as Latent Dirichlet Allocation (LDA), was proposed to generate statistical models to represent multiple topics in a collection of documents, which has been widely utilized in the fields of machine learning and information retrieval, etc. But its effectiveness in information filtering is rarely known. Patterns are always thought to be more representative than single terms for representing documents. In this paper, a novel information filtering model, Pattern-based Topic Model(PBTM) , is proposed to represent the text documents not only using the topic distributions at general level but also using semantic pattern representations at detailed specific level, both of which contribute to the accurate document representation and document relevance ranking. Extensive experiments are conducted to evaluate the effectiveness of PBTM by using the TREC data collection Reuters Corpus Volume 1. The results show that the proposed model achieves outstanding performance.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This research falls in the area of enhancing the quality of tag-based item recommendation systems. It aims to achieve this by employing a multi-dimensional user profile approach and by analyzing the semantic aspects of tags. Tag-based recommender systems have two characteristics that need to be carefully studied in order to build a reliable system. Firstly, the multi-dimensional correlation, called as tag assignment , should be appropriately modelled in order to create the user profiles [1]. Secondly, the semantics behind the tags should be considered properly as the flexibility with their design can cause semantic problems such as synonymy and polysemy [2]. This research proposes to address these two challenges for building a tag-based item recommendation system by employing tensor modeling as the multi-dimensional user profile approach, and the topic model as the semantic analysis approach. The first objective is to optimize the tensor model reconstruction and to improve the model performance in generating quality rec-ommendation. A novel Tensor-based Recommendation using Probabilistic Ranking (TRPR) method [3] has been developed. Results show this method to be scalable for large datasets and outperforming the benchmarking methods in terms of accuracy. The memory efficient loop implements the n-mode block-striped (matrix) product for tensor reconstruction as an approximation of the initial tensor. The probabilistic ranking calculates the probabil-ity of users to select candidate items using their tag preference list based on the entries generated from the reconstructed tensor. The second objective is to analyse the tag semantics and utilize the outcome in building the tensor model. This research proposes to investigate the problem using topic model approach to keep the tags nature as the “social vocabulary” [4]. For the tag assignment data, topics can be generated from the occurrences of tags given for an item. However there is only limited amount of tags availa-ble to represent items as collection of topics, since an item might have only been tagged by using several tags. Consequently, the generated topics might not able to represent the items appropriately. Furthermore, given that each tag can belong to any topics with various probability scores, the occurrence of tags cannot simply be mapped by the topics to build the tensor model. A standard weighting technique will not appropriately calculate the value of tagging activity since it will define the context of an item using a tag instead of a topic.

Relevância:

30.00% 30.00%

Publicador:

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Quantum theory has recently been employed to further advance the theory of information retrieval (IR). A challenging research topic is to investigate the so called quantum-like interference in users’ relevance judgement process, where users are involved to judge the relevance degree of each document with respect to a given query. In this process, users’ relevance judgement for the current document is often interfered by the judgement for previous documents, due to the interference on users’ cognitive status. Research from cognitive science has demonstrated some initial evidence of quantum-like cognitive interference in human decision making, which underpins the user’s relevance judgement process. This motivates us to model such cognitive interference in the relevance judgement process, which in our belief will lead to a better modeling and explanation of user behaviors in relevance judgement process for IR and eventually lead to more user-centric IR models. In this paper, we propose to use probabilistic automaton(PA) and quantum finite automaton (QFA), which are suitable to represent the transition of user judgement states, to dynamically model the cognitive interference when the user is judging a list of documents.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Software development and Web site development techniques have evolved significantly over the past 20 years. The relatively young Web Application development area has borrowed heavily from traditional software development methodologies primarily due to the similarities in areas of data persistence and User Interface (UI) design. Recent developments in this area propose a new Web Modeling Language (WebML) to facilitate the nuances specific to Web development. WebML is one of a number of implementations designed to enable modeling of web site interaction flows while being extendable to accommodate new features in Web site development into the future. Our research aims to extend WebML with a focus on stigmergy which is a biological term originally used to describe coordination between insects. We see design features in existing Web sites that mimic stigmergic mechanisms as part of the UI. We believe that we can synthesize and embed stigmergy in Web 2.0 sites. This paper focuses on the sub-topic of site UI design and stigmergic mechanism designs required to achieve this.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The SimCalc Vision and Contributions Advances in Mathematics Education 2013, pp 419-436 Modeling as a Means for Making Powerful Ideas Accessible to Children at an Early Age Richard Lesh, Lyn English, Serife Sevis, Chanda Riggs … show all 4 hide » Look Inside » Get Access Abstract In modern societies in the 21st century, significant changes have been occurring in the kinds of “mathematical thinking” that are needed outside of school. Even in the case of primary school children (grades K-2), children not only encounter situations where numbers refer to sets of discrete objects that can be counted. Numbers also are used to describe situations that involve continuous quantities (inches, feet, pounds, etc.), signed quantities, quantities that have both magnitude and direction, locations (coordinates, or ordinal quantities), transformations (actions), accumulating quantities, continually changing quantities, and other kinds of mathematical objects. Furthermore, if we ask, what kind of situations can children use numbers to describe? rather than restricting attention to situations where children should be able to calculate correctly, then this study shows that average ability children in grades K-2 are (and need to be) able to productively mathematize situations that involve far more than simple counts. Similarly, whereas nearly the entire K-16 mathematics curriculum is restricted to situations that can be mathematized using a single input-output rule going in one direction, even the lives of primary school children are filled with situations that involve several interacting actions—and which involve feedback loops, second-order effects, and issues such as maximization, minimization, or stabilizations (which, many years ago, needed to be postponed until students had been introduced to calculus). …This brief paper demonstrates that, if children’s stories are used to introduce simulations of “real life” problem solving situations, then average ability primary school children are quite capable of dealing productively with 60-minute problems that involve (a) many kinds of quantities in addition to “counts,” (b) integrated collections of concepts associated with a variety of textbook topic areas, (c) interactions among several different actors, and (d) issues such as maximization, minimization, and stabilization.