34 results for Probabilistic choice models

in Deakin Research Online - Australia


Relevance:

100.00%

Publisher:

Abstract:

One of the most cited studies within the field of binary choice models is that of Klein and Spady (1993), in which the authors propose a semiparametric estimator for use when the distribution of the error term is unknown. However, although theoretically appealing, the estimator has been found to be difficult to implement, and therefore not very attractive from an applied point of view. The current study offers an indirect inference-based solution to this problem. The new estimator is not only simple with good small-sample properties, but also consistent and asymptotically normal.

Relevance:

100.00%

Publisher:

Abstract:

We introduce Neural Choice by Elimination, a new framework that integrates deep neural networks into probabilistic sequential choice models for learning to rank. Given a set of items to choose from, the elimination strategy starts with the whole item set and iteratively eliminates the least worthy item in the remaining subset. We prove that the choice by elimination is equivalent to marginalizing out the random Gompertz latent utilities. Coupled with the choice model is the recently introduced Neural Highway Networks for approximating arbitrarily complex rank functions. We evaluate the proposed framework on a large-scale public dataset with over 425K items, drawn from the Yahoo! learning to rank challenge. It is demonstrated that the proposed method is competitive against state-of-the-art learning to rank methods.
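The elimination strategy described in this abstract can be illustrated with a small sketch. It assumes, purely for illustration, that at each step an item is eliminated from the remaining set with probability proportional to exp(-score); the abstract does not state the exact form. The function name `elimination_log_likelihood` and the fixed `scores` dictionary are hypothetical, and the fixed scores stand in for the output of a neural network.

```python
import math

def elimination_log_likelihood(scores, elimination_order):
    """Log-probability of removing items in the given order.

    scores: dict mapping item -> real-valued worth (higher = more worthy).
    elimination_order: items from first eliminated (least worthy) to last.
    Assumption of this sketch: item i leaves the remaining set S with
    probability exp(-scores[i]) / sum_{j in S} exp(-scores[j]).
    """
    remaining = list(elimination_order)
    log_lik = 0.0
    for item in elimination_order:
        negated = [-scores[j] for j in remaining]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(negated)
        log_norm = m + math.log(sum(math.exp(v - m) for v in negated))
        log_lik += -scores[item] - log_norm
        remaining.remove(item)
    return log_lik

# Hypothetical scores: "a" is the most worthy item, "c" the least.
scores = {"a": 2.0, "b": 1.0, "c": 0.0}
worst_first = elimination_log_likelihood(scores, ["c", "b", "a"])
best_first = elimination_log_likelihood(scores, ["a", "b", "c"])
```

Under this assumed form, removing the least worthy items first is the more probable sequence, so `worst_first` exceeds `best_first`.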

Relevance:

90.00%

Publisher:

Abstract:

In discrete choice models the marginal effect of a variable of interest that is interacted with another variable differs from the marginal effect of a variable that is not interacted with any variable. The magnitude of the interaction effect is also not equal to the marginal effect of the interaction term. I present consistent estimators of both marginal and interaction effects in ordered response models. This procedure is general and can easily be extended to other discrete choice models.
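The distinction drawn in this abstract can be made concrete with a worked derivation. The abstract concerns ordered response models; the sketch below uses the simpler binary probit as an illustrative assumption, with hypothetical symbols $x_1$, $x_2$, $\beta_1$, $\beta_2$, $\beta_{12}$ that do not appear in the source.

```latex
% Binary probit with an interaction term (illustrative assumption)
P(y = 1 \mid x) = \Phi(u), \qquad
u = \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 .

% Marginal effect of x_1 depends on x_2 through the interaction
\frac{\partial \Phi(u)}{\partial x_1}
  = (\beta_1 + \beta_{12} x_2)\,\phi(u) .

% Interaction effect: the cross-partial derivative
\frac{\partial^2 \Phi(u)}{\partial x_1\,\partial x_2}
  = \beta_{12}\,\phi(u)
  + (\beta_1 + \beta_{12} x_2)(\beta_2 + \beta_{12} x_1)\,\phi'(u),
\qquad \phi'(u) = -u\,\phi(u) .

% Marginal effect of the interaction term treated as a separate regressor
\frac{\partial \Phi(u)}{\partial (x_1 x_2)} = \beta_{12}\,\phi(u) .
```

The last two expressions differ by the term involving $\phi'(u)$, so in this probit illustration the interaction effect can be non-zero even when $\beta_{12} = 0$.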

Relevance:

80.00%

Publisher:

Abstract:

The amount of multimedia content available online constantly increases, and this leads to problems for users who search for content or similar communities. Users in Flickr often self-organize in user communities through Flickr Groups. These groups are particularly interesting as they are a natural instantiation of the content + relations social media paradigm. We propose a novel approach to group searching through hypergroup discovery. Starting from roughly 11,000 Flickr groups' content and membership information, we create three different bag-of-word representations for groups, on which we learn probabilistic topic models. Finally, we cast the hypergroup discovery as a clustering problem that is solved via probabilistic affinity propagation. We show that hypergroups so found are generally consistent and can be described through topic-based and similarity-based measures. Our proposed solution could be relatively easily implemented as an application to enrich Flickr's traditional group search.

Relevance:

80.00%

Publisher:

Abstract:

Traditional learning techniques learn from flat data files with the assumption that each class has a similar number of examples. However, the majority of real-world data are stored as relational systems with imbalanced data distribution, where one class of data is over-represented as compared with other classes. We propose to extend a relational learning technique called Probabilistic Relational Models (PRMs) to deal with the imbalanced class problem. We address learning from imbalanced relational data using an ensemble of PRMs and propose a new model: the PRMs-IM. We show the performance of PRMs-IM on a real university relational database to identify students at risk.

Relevance:

80.00%

Publisher:

Abstract:

Probabilistic topic models have become a standard in modern machine learning with wide applications in organizing and summarizing ‘documents’ in high-dimensional data such as images, videos, texts, gene expression data, and so on. Representing data by dimensional reduction of mixture proportion extracted from topic models is not only richer in semantics than bag-of-word interpretation, but also more informative for classification tasks. This paper describes the Topic Model Kernel (TMK), a high dimensional mapping for Support Vector Machine classification of data generated from probabilistic topic models. The applicability of our proposed kernel is demonstrated in several classification tasks from real world datasets. We outperform existing kernels on the distributional features and give the comparative results on non-probabilistic data types.

Relevance:

80.00%

Publisher:

Abstract:

Probabilistic topic models have become a standard in modern machine learning to deal with a wide range of applications. Representing data by dimensional reduction of mixture proportion extracted from topic models is not only richer in semantic interpretation, but could also be informative for classification tasks. In this paper, we describe the Topic Model Kernel (TMK), a topic-based kernel for Support Vector Machine classification on data being processed by probabilistic topic models. The applicability of our proposed kernel is demonstrated in several classification tasks with real world datasets. TMK outperforms existing kernels on the distributional features and gives comparative results on non-probabilistic data types.

Relevance:

80.00%

Publisher:

Abstract:

Intertemporal labour–leisure choice models typically assume agents have a very low degree of impatience. Yet there is a lot of empirical evidence indicating a high degree of impatience. Using a life-cycle model of consumption–saving and labour–leisure choice, we show that even if an agent displays a relatively moderate degree of impatience, his labour supply choice delivers highly counterfactual patterns. We resolve this counterfactual finding by augmenting the standard model with a time-dependent marginal utility of leisure assumption that is consistent with some recent evidence from leisure studies. We also introduce various extensions and discuss their relative importance and associated challenges.

Relevance:

50.00%

Publisher:

Abstract:

Ranking is an important task for handling a large amount of content. Ideally, training data for supervised ranking would include a complete rank of documents (or other objects such as images or videos) for a particular query. However, this is only possible for small sets of documents. In practice, one often resorts to document rating, in that a subset of documents is assigned a small number indicating the degree of relevance. This poses a general problem of modelling and learning rank data with ties. In this paper, we propose a probabilistic generative model that models the process as permutations over partitions. This results in a super-exponential combinatorial state space with unknown numbers of partitions and unknown ordering among them. We approach the problem from discrete choice theory, where subsets are chosen in a stagewise manner, reducing the state space at each stage significantly. Further, we show that with suitable parameterisation, we can still learn the models in linear time. We evaluate the proposed models on two application areas: (i) document ranking with the data from the recently held Yahoo! challenge, and (ii) collaborative filtering with movie data. The results demonstrate that the models are competitive against well-known rivals.

Relevance:

40.00%

Publisher:

Abstract:

The performance of different information criteria - namely Akaike, corrected Akaike (AICC), Schwarz-Bayesian (SBC), and Hannan-Quinn - is investigated so as to choose the optimal lag length in stable and unstable vector autoregressive (VAR) models both when autoregressive conditional heteroscedasticity (ARCH) is present and when it is not. The investigation covers both large and small sample sizes. The Monte Carlo simulation results show that SBC has relatively better performance in lag-choice accuracy in many situations. It is also generally the least sensitive to ARCH regardless of stability or instability of the VAR model, especially in large sample sizes. These appealing properties of SBC make it the optimal criterion for choosing lag length in many situations, especially in the case of financial data, which are usually characterized by occasional periods of high volatility. SBC also has the best forecasting abilities in the majority of situations in which we vary sample size, stability, variance structure (ARCH or not), and forecast horizon (one period or five). Frequently, AICC also has good lag-choosing and forecasting properties. However, when ARCH is present, the five-period forecast performance of all criteria in all situations worsens.
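Lag selection by information criteria, as discussed in this abstract, can be sketched as follows. The sketch assumes the commonly used textbook forms of the Akaike, Schwarz-Bayesian and Hannan-Quinn criteria for a VAR, which may differ from the exact variants in the study, and it omits AICC. The function names and the simulated VAR(1) data are hypothetical.

```python
import numpy as np

def residual_cov(y, p, max_lag):
    """OLS fit of a VAR(p) with intercept; returns the ML residual covariance.

    The first max_lag observations are held out so that every lag order
    is compared on the same effective sample.
    """
    t_full = y.shape[0]
    target = y[max_lag:]
    t_eff = target.shape[0]
    blocks = [np.ones((t_eff, 1))]
    for lag in range(1, p + 1):
        blocks.append(y[max_lag - lag:t_full - lag])
    x = np.hstack(blocks)
    coef, *_ = np.linalg.lstsq(x, target, rcond=None)
    resid = target - x @ coef
    return resid.T @ resid / t_eff, t_eff

def select_lags(y, max_lag):
    """Return the lag order minimising each criterion (assumed textbook forms)."""
    k = y.shape[1]
    values = {"AIC": [], "SBC": [], "HQ": []}
    for p in range(1, max_lag + 1):
        sigma, t_eff = residual_cov(y, p, max_lag)
        log_det = np.linalg.slogdet(sigma)[1]
        n_par = p * k * k  # number of autoregressive coefficients
        values["AIC"].append(log_det + 2.0 * n_par / t_eff)
        values["SBC"].append(log_det + n_par * np.log(t_eff) / t_eff)
        values["HQ"].append(log_det + 2.0 * n_par * np.log(np.log(t_eff)) / t_eff)
    return {name: int(np.argmin(v)) + 1 for name, v in values.items()}

def simulate_var1(n_obs, seed=0):
    """Simulate a stable bivariate VAR(1) with homoscedastic Gaussian errors."""
    rng = np.random.default_rng(seed)
    a = np.array([[0.5, 0.1], [0.0, 0.4]])
    y = np.zeros((n_obs, 2))
    for t in range(1, n_obs):
        y[t] = a @ y[t - 1] + rng.standard_normal(2)
    return y

chosen = select_lags(simulate_var1(2000), max_lag=4)
```

Because the SBC penalty per parameter exceeds the AIC penalty whenever the effective sample size is above about 7, SBC never selects a longer lag than AIC in this setup.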

Relevance:

40.00%

Publisher:

Abstract:

Recommender systems are important to help users select relevant and personalised information over massive amounts of data available. We propose a unified framework called Preference Network (PN) that jointly models various types of domain knowledge for the task of recommendation. The PN is a probabilistic model that systematically combines both content-based filtering and collaborative filtering into a single conditional Markov random field. Once estimated, it serves as a probabilistic database that supports various useful queries such as rating prediction and top-N recommendation. To handle the challenging problem of learning large networks of users and items, we employ a simple but effective pseudo-likelihood with regularisation. Experiments on the movie rating data demonstrate the merits of the PN.

Relevance:

40.00%

Publisher:

Abstract:

Recommender systems are important to help users select relevant and personalised information over massive amounts of data available. We propose a unified framework called Preference Network (PN) that jointly models various types of domain knowledge for the task of recommendation. The PN is a probabilistic model that systematically combines both content-based filtering and collaborative filtering into a single conditional Markov random field. Once estimated, it serves as a probabilistic database that supports various useful queries such as rating prediction and top-N recommendation. To handle the challenging problem of learning large networks of users and items, we employ a simple but effective pseudo-likelihood with regularisation. Experiments on the movie rating data demonstrate the merits of the PN.

Relevance:

40.00%

Publisher:

Abstract:

Learning preference models from human generated data is an important task in modern information processing systems. Its popular setting consists of simple input ratings, assigned numerical values to indicate their relevancy with respect to a specific query. Since ratings are often specified within a small range, several objects may have the same ratings, thus creating ties among objects for a given query. Dealing with this phenomenon presents a general problem of modelling preferences in the presence of ties and being query-specific. To this end, we present in this paper a novel approach by constructing probabilistic models directly on the collection of objects, exploiting the combinatorial structure induced by the ties among them. The proposed probabilistic setting allows exploration of a super-exponential combinatorial state-space with unknown numbers of partitions and unknown order among them. Learning and inference in such a large state-space are challenging, and yet we present in this paper efficient algorithms to perform these tasks. Our approach exploits discrete choice theory, imposing a generative process such that the finite set of objects is partitioned into subsets in a stagewise procedure, thus reducing the state-space at each stage significantly. Efficient Markov chain Monte Carlo algorithms are then presented for the proposed models. We demonstrate that the model can potentially be trained in a large-scale setting of hundreds of thousands of objects using an ordinary computer. In fact, in some special cases with appropriate model specification, our models can be learned in linear time. We evaluate the models on two application areas: (i) document ranking with the data from the Yahoo! challenge and (ii) collaborative filtering with movie data. We demonstrate that the models are competitive against state-of-the-art methods.

Relevance:

30.00%

Publisher:

Abstract:

‘Flexible learning’ represents a need associated with ‘lifelong learning’ and the equipping of graduates to actively engage in a ‘knowledge society’. While the precise meaning of each of these terms is not easy to discern, notions of flexible learning have progressed along an evolutionary path that concentrates on students as though they are the only stakeholder group in the higher education environment that would benefit from choice. Academic discourse also presumes that all cultural groups making up the increasingly diverse student population aspire to engage in student-centred learning as a precursor to involvement in a knowledge economy. In this environment academics have been encouraged to embrace online teaching and promote a more student-centred learning approach when the natural inclination and talent of many academics may make this style of pedagogy so challenging that learning outcomes are compromised. We question this ‘one size fits all’ mentality and suggest a model that empowers both students and academics by allowing them to choose the approach that suits their educational philosophy and preferred learning/teaching approach. The model represents an innovation in flexibility that recognises initial embedded learning foundation abilities and reaches both teachers and learners by utilising their own frames of reference.