904 results for Random coefficient multinomial logit
Abstract:
Paper presented at CIDUI 2010, Congreso Internacional Docencia Universitaria e Innovación (International Congress on University Teaching and Innovation), Barcelona, 30 June to 2 July 2010.
Abstract:
In a context of intense competition, cooperative advertising between firms is critical. Accordingly, the objective of this article is to analyze the potentially differentiated effect of advertising on two basic consumption patterns: individual products (i.e. hotel, restaurant) vs. a bundle (i.e. hotel + restaurant). This research adds to the extant literature in that, for the first time, this potentially differentiated effect is examined through a hierarchical modelling framework that reflects the way people make their decisions: first, they decide whether or not to visit a region; second, whether to purchase an advertised product in that region; and third, whether to buy products together or separately in that region. The empirical analysis, applied to a sample of 11,288 individuals, shows that the influence of advertising is positive for the decisions to visit and to purchase; however, when it comes to joint or separate consumption, advertising has a differentiated effect: its impact is much greater on the joint alternative (“hotel + restaurant”) than on the separate options (“hotel” and “restaurant”). In addition, the distance variable moderates the advertising effect.
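As a purely illustrative sketch of the hierarchical decision structure described in this abstract (not the paper's estimated model), the following Python fragment composes the three conditional probabilities, visit, purchase given visit, and bundle given purchase; the coefficients and the advertising and distance covariates are invented.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def stage_probabilities(advertising, distance):
    # Stage 1: decide whether or not to visit the region.
    p_visit = logistic(0.3 + 0.8 * advertising - 0.5 * distance)
    # Stage 2: conditional on visiting, decide whether to buy an advertised product.
    p_purchase = logistic(-0.2 + 0.6 * advertising)
    # Stage 3: conditional on purchasing, choose the bundle over separate products;
    # the advertising coefficient is larger here, mimicking the reported asymmetry.
    p_bundle = logistic(-0.4 + 1.2 * advertising)
    return p_visit, p_visit * p_purchase, p_visit * p_purchase * p_bundle

print(stage_probabilities(advertising=1.0, distance=0.5))
```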
Abstract:
Many variables of interest in social science research are nominal variables with two or more categories, such as employment status, occupation, political preference, or self-reported health status. With longitudinal survey data it is possible to analyse the transitions of individuals between different employment states or occupations (for example). In the statistical literature, models for analysing categorical dependent variables with repeated observations belong to the family of models known as generalized linear mixed models (GLMMs). The specific GLMM for a dependent variable with three or more categories is the multinomial logit random effects model. For these models, the marginal distribution of the response does not have a closed form and hence numerical integration must be used to obtain maximum likelihood estimates of the model parameters. Techniques for implementing the numerical integration are available, but they are computationally intensive, requiring an amount of computer processing time that increases with the number of clusters (or individuals) in the data, and they are not always readily accessible to the practitioner in standard software. For the purposes of analysing categorical response data from a longitudinal social survey, there is clearly a need to evaluate the existing procedures for estimating multinomial logit random effects models in terms of accuracy, efficiency and computing time. Computing time has significant implications for which approach researchers will prefer. In this paper we evaluate statistical software procedures that utilise adaptive Gaussian quadrature and MCMC methods, with specific application to modelling the employment status of women, using a GLMM, over three waves of the HILDA survey.
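A minimal, non-adaptive Gauss-Hermite quadrature sketch of the marginal likelihood computation discussed in this abstract is given below. It uses a binary random-intercept logit for brevity; the multinomial case would replace the Bernoulli term with category-specific linear predictors, and the adaptive variant would recentre and rescale the nodes cluster by cluster. All data and parameter values in the example call are made up.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def marginal_loglik(beta, sigma, y, X, cluster, n_nodes=15):
    """Gauss-Hermite approximation to the marginal log-likelihood of a
    random-intercept (binary) logit with random-effect sd sigma."""
    nodes, weights = hermgauss(n_nodes)          # nodes/weights for weight exp(-x^2)
    total = 0.0
    for c in np.unique(cluster):
        idx = cluster == c
        eta = X[idx] @ beta                      # fixed-effects linear predictor
        b = np.sqrt(2.0) * sigma * nodes         # change of variables for N(0, sigma^2)
        lin = eta[:, None] + b[None, :]
        p = 1.0 / (1.0 + np.exp(-lin))
        # conditional likelihood of the cluster at each quadrature node
        cond = np.prod(np.where(y[idx, None] == 1, p, 1.0 - p), axis=0)
        total += np.log((weights * cond).sum() / np.sqrt(np.pi))
    return total

# Toy usage with synthetic data (10 clusters of 4 observations each).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
cluster = np.repeat(np.arange(10), 4)
y = (rng.uniform(size=40) < 0.5).astype(int)
print(marginal_loglik(np.array([0.1, -0.2]), 0.8, y, X, cluster))
```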
Abstract:
Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size even after the huge increases in n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n=all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.
Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.
One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
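The following short Python sketch illustrates the kind of latent class (PARAFAC-type) factorization of a probability tensor referred to above; the dimensions, number of classes, and Dirichlet draws are arbitrary and only show how such a reduced rank representation of a joint pmf is formed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Latent class (PARAFAC-type) representation of a 3-way probability tensor:
# P(y1, y2, y3) = sum_h nu[h] * lam1[h, y1] * lam2[h, y2] * lam3[h, y3].
H, d = 4, (3, 2, 5)                                       # latent classes, category counts
nu = rng.dirichlet(np.ones(H))                            # class weights
lams = [rng.dirichlet(np.ones(dj), size=H) for dj in d]   # class-specific marginals

tensor = np.einsum('h,ha,hb,hc->abc', nu, *lams)
assert np.isclose(tensor.sum(), 1.0)                      # a valid joint pmf

# The nonnegative rank of this tensor is at most H; the chapter relates such
# ranks to the support (sparsity pattern) of a log-linear model for the same table.
print(tensor.shape, tensor.sum())
```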
Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and on other common population structure inference problems is assessed in simulations and a real data application.
In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis-Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis-Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
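As a loose illustration of Gaussian approximation to a log-linear posterior, the sketch below computes a generic Laplace-type approximation (posterior mode plus inverse Hessian) for a tiny Poisson log-linear model with a simple Gaussian prior. This is not the chapter's optimal approximation under Diaconis-Ylvisaker priors; the table, design matrix, and prior variance are invented.

```python
import numpy as np
from scipy.optimize import minimize

counts = np.array([12., 7., 3., 0.])         # a sparse 2x2 table, flattened
X = np.array([[1, 0, 0],                     # design matrix of a log-linear model
              [1, 1, 0],                     # (intercept plus two main effects)
              [1, 0, 1],
              [1, 1, 1.]])
tau2 = 10.0                                  # prior variance (illustrative)

def neg_log_post(theta):
    eta = X @ theta
    # Poisson log-linear likelihood plus Gaussian (ridge) prior
    return -(counts @ eta - np.exp(eta).sum()) + 0.5 * theta @ theta / tau2

res = minimize(neg_log_post, np.zeros(3), method="BFGS")
mode = res.x
# Hessian of the negative log posterior at the mode gives the Gaussian precision.
W = np.diag(np.exp(X @ mode))
precision = X.T @ W @ X + np.eye(3) / tau2
cov = np.linalg.inv(precision)
print("approximate posterior mean:", mode)
print("approximate posterior covariance:\n", cov)
```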
Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo. The Markov chain Monte Carlo method is the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
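One example of an approximating transition kernel of the kind studied in this framework is a Metropolis-Hastings sampler whose log-likelihood is evaluated on a random subset of the data. The sketch below, with entirely synthetic data and a flat prior on a single mean parameter, shows the idea; it is illustrative only and not the thesis's algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=1.5, scale=1.0, size=50_000)   # toy data with unknown mean

def subsampled_loglik(mu, batch_size=1_000):
    # Approximate the full-data log-likelihood with a scaled random subsample.
    batch = rng.choice(data, size=batch_size, replace=False)
    return (len(data) / batch_size) * (-0.5 * np.sum((batch - mu) ** 2))

def approximate_mh(n_iter=1_000, step=0.02):
    mu, ll = 0.0, subsampled_loglik(0.0)
    draws = np.empty(n_iter)
    for t in range(n_iter):
        prop = mu + step * rng.standard_normal()
        ll_prop = subsampled_loglik(prop)
        # The acceptance ratio uses a noisy likelihood, so this transition kernel
        # only approximates the exact posterior kernel.
        if np.log(rng.uniform()) < ll_prop - ll:
            mu, ll = prop, ll_prop
        draws[t] = mu
    return draws

print(approximate_mh()[-5:])
```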
Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
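For concreteness, a bare-bones version of the truncated normal (Albert and Chib) data augmentation sampler in the rare-event regime described above is sketched below for an intercept-only probit model with synthetic data; the sample size, number of successes, and prior variance are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(3)

# Rare-event regime: large n, very few successes.
n, n_success = 5_000, 10
y = np.zeros(n)
y[:n_success] = 1.0
prior_var = 100.0                      # vague Gaussian prior on the intercept

beta = 0.0
draws = []
for it in range(300):
    # 1. Sample latent utilities z_i | beta, y_i from truncated normals.
    a = np.where(y == 1, -beta, -np.inf)     # standardized lower bounds
    b = np.where(y == 1, np.inf, -beta)      # standardized upper bounds
    z = beta + truncnorm.rvs(a, b, size=n, random_state=rng)
    # 2. Sample beta | z from its Gaussian full conditional.
    post_var = 1.0 / (n + 1.0 / prior_var)
    beta = rng.normal(post_var * z.sum(), np.sqrt(post_var))
    draws.append(beta)

# With rare events the chain moves slowly; these draws are highly autocorrelated.
print(np.mean(draws[150:]), np.std(draws[150:]))
```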
Abstract:
Universidade Estadual de Campinas. Faculdade de Educação Física
Abstract:
The aims of this study were (a) to assess the ability of the rating of perceived exertion (RPE) to predict performance (i.e. number of vertical jumps performed to a fixed jump height) of an intermittent vertical jump exercise, and (b) to determine the ability of RPE to describe the physiological demand of such exercise. Eight healthy men performed intermittent vertical jumps with rest periods of 4, 5, and 6 s until fatigue. Heart rate and RPE were recorded every five jumps throughout the sessions. The number of vertical jumps performed was also recorded. Random coefficient growth curve analysis identified relationships between the number of vertical jumps and both RPE and heart rate for which there were similar slopes. In addition, there were no differences between individual slopes and the mean slope for either RPE or heart rate. Moreover, RPE and number of jumps were highly correlated throughout all sessions (r=0.97-0.99; P<0.001), as were RPE and heart rate (r=0.93-0.97; P<0.001). The findings suggest that RPE can both predict the performance of intermittent vertical jump exercise and describe the physiological demands of such exercise.
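A hypothetical sketch of a random coefficient growth curve analysis of this kind, fitted with statsmodels' linear mixed model routine, might look as follows; the synthetic data, column names, and coefficient values are placeholders rather than the study's sessions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data: RPE recorded as jumps accumulate for 8 subjects.
rng = np.random.default_rng(7)
rows = []
for subject in range(8):
    slope = 0.12 + 0.02 * rng.standard_normal()
    for jumps in range(5, 105, 5):
        rows.append({"subject": subject, "jumps": jumps,
                     "rpe": 6 + slope * jumps + rng.normal(scale=0.5)})
df = pd.DataFrame(rows)

# Random coefficient growth curve: random intercept and slope per subject.
model = smf.mixedlm("rpe ~ jumps", data=df, groups=df["subject"],
                    re_formula="~jumps")
result = model.fit()
print(result.summary())
print(result.random_effects)   # subject-specific deviations from the mean slope
```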
Abstract:
Purpose: To evaluate rates of visual field progression in eyes with optic disc hemorrhages and the effect of intraocular pressure (IOP) reduction on these rates. Design: Observational cohort study. Participants: The study included 510 eyes of 348 patients with glaucoma who were recruited from the Diagnostic Innovations in Glaucoma Study (DIGS) and followed for an average of 8.2 years. Methods: Eyes were followed annually with clinical examination, standard automated perimetry visual fields, and optic disc stereophotographs. The presence of optic disc hemorrhages was determined on the basis of masked evaluation of optic disc stereophotographs. Evaluation of rates of visual field change during follow-up was performed using the visual field index (VFI). Main Outcome Measures: The evaluation of the effect of optic disc hemorrhages on rates of visual field progression was performed using random coefficient models. Estimates of rates of change for individual eyes were obtained by best linear unbiased prediction (BLUP). Results: During follow-up, 97 (19%) of the eyes had at least 1 episode of disc hemorrhage. The overall rate of VFI change in eyes with hemorrhages was significantly faster than in eyes without hemorrhages (-0.88%/year vs. -0.38%/year, respectively, P < 0.001). The difference in rates of visual field loss pre- and post-hemorrhage was significantly related to the reduction of IOP in the post-hemorrhage period compared with the pre-hemorrhage period (r = -0.61; P < 0.001). Each 1 mmHg of IOP reduction was associated with a difference of 0.31%/year in the rate of VFI change. Conclusions: There was a beneficial effect of treatment in slowing rates of progressive visual field loss in eyes with optic disc hemorrhage. Further research should elucidate the reasons why some patients with hemorrhages respond well to IOP reduction and others seem to continue to progress despite a significant reduction in IOP levels.
Abstract:
Questionnaire surveys, while more economical, typically achieve poorer response rates than interview surveys. We used data from a national volunteer cohort of young adult twins, who were scheduled for assessment by questionnaire in 1989 and by interview in 1996-2000, to identify predictors of questionnaire non-response. Out of a total of 8536 twins, 5058 completed the questionnaire survey (59% response rate), and 6255 completed a telephone interview survey conducted a decade later (73% response rate). Multinomial logit models were fitted to the interview data to identify socioeconomic, psychiatric and health behavior correlates of non-response in the earlier questionnaire survey. Male gender, education below University level, and being a dizygotic rather than monozygotic twin, all predicted reduced likelihood of participating in the questionnaire survey. Associations between questionnaire response status and psychiatric history and health behavior variables were modest, with history of alcohol dependence and childhood conduct disorder predicting decreased probability of returning a questionnaire, and history of smoking and heavy drinking more weakly associated with non-response. Body-mass index showed no association with questionnaire non-response. Despite a poor response rate to the self-report questionnaire survey, we found only limited sampling biases for most variables. While not appropriate for studies where socioeconomic variables are critical, it appears that survey by questionnaire, with questionnaire administration by telephone to non-responders, will represent a viable strategy for gene-mapping studies requiring that large numbers of relatives be screened.
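A multinomial logit of the sort used here could be fitted in Python roughly as sketched below; the synthetic data, column names, and coding of the three response categories are placeholders, not the actual twin cohort variables.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 2000
df = pd.DataFrame({
    "male": rng.integers(0, 2, n),
    "university": rng.integers(0, 2, n),
    "dizygotic": rng.integers(0, 2, n),
    "alcohol_dependence": rng.integers(0, 2, n),
})
# Toy response status: 0 = responder, 1 = partial responder, 2 = non-responder.
latent = 0.5 * df["male"] - 0.4 * df["university"] + rng.normal(size=n)
df["response_status"] = pd.cut(latent, [-np.inf, 0.0, 1.0, np.inf], labels=False)

X = sm.add_constant(df[["male", "university", "dizygotic", "alcohol_dependence"]])
result = sm.MNLogit(df["response_status"], X).fit()
print(result.summary())
```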
Abstract:
Doctoral thesis, Economics and Business Sciences (Economic and Social Development and Public Economics), 16 January 2014, Universidade dos Açores.
Abstract:
experimental design, mixed model, random coefficient regression model, population pharmacokinetics, approximate design
Abstract:
This paper assesses empirically the importance of size discrimination and disaggregate data for deciding where to locate a start-up concern. We compare three econometric specifications using Catalan data: a multinomial logit with 4 and 41 alternatives (provinces and comarques, respectively) in which firm size is the main covariate; a conditional logit with 4 and 41 alternatives including attributes of the sites as well as size-site interactions; and a Poisson model on the comarques and the full spatial choice set (942 municipalities) with site-specific variables. Our results suggest that if these two issues are ignored, conclusions may be misleading. We provide evidence that large and small firms behave differently and conclude that Catalan firms tend to choose between comarques rather than between municipalities. Moreover, labour-intensive firms seem more likely to be located in the city of Barcelona. Keywords: Catalonia, industrial location, multinomial response model. JEL: C250, E30, R00, R12
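The conditional logit specification with site attributes and size-site interactions can be written down compactly; the sketch below implements its log-likelihood by hand with fabricated arrays, not the Catalan choice sets or estimates. Maximizing this function over beta, for instance with scipy.optimize, would yield the conditional logit estimates.

```python
import numpy as np

def conditional_logit_loglik(beta, site_attrs, size, choice):
    """
    site_attrs : (n_firms, n_sites, n_attrs) attributes of each alternative
    size       : (n_firms,) firm size, interacted with every attribute
    choice     : (n_firms,) index of the chosen site
    beta       : (2 * n_attrs,) coefficients on attributes and interactions
    """
    n_attrs = site_attrs.shape[2]
    util = (site_attrs @ beta[:n_attrs]
            + (site_attrs * size[:, None, None]) @ beta[n_attrs:])
    util -= util.max(axis=1, keepdims=True)            # numerical stability
    logp = util - np.log(np.exp(util).sum(axis=1, keepdims=True))
    return logp[np.arange(len(choice)), choice].sum()

# Example with fabricated data: 5 firms, 4 sites (provinces), 2 attributes.
rng = np.random.default_rng(4)
ll = conditional_logit_loglik(rng.normal(size=4),
                              rng.normal(size=(5, 4, 2)),
                              rng.uniform(1, 10, size=5),
                              rng.integers(0, 4, size=5))
print(ll)
```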
Abstract:
The contributions of this paper are twofold: on the one hand, the paper analyses the factors determining the growth in car ownership in Spain over the last two decades and, on the other, it provides empirical evidence on a controversial methodological issue. From a methodological point of view, the paper compares the two alternative decision mechanisms used for modelling car ownership: ordered-response versus unordered-response mechanisms. A discrete choice model is estimated at three points in time: 1980, 1990 and 2000. The study concludes that, on the basis of forecasting performance, the multinomial logit model and the ordered probit model are almost indistinguishable. As for the empirical results, it can be emphasised that income elasticity is not constant and declines as car ownership increases. In addition, households living in rural areas are less sensitive than those living in urban areas. Car ownership is also sensitive to the quality of public transport for those living in the largest cities. The results also confirmed the existence of a generation effect, which will vanish around the year 2020, a weak life-cycle effect, and a positive effect of employment on the number of cars per household. Finally, the change in the estimated coefficients over time reflects an increase in mobility needs and, consequently, an increase in car ownership.
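The ordered-versus-unordered comparison can be set up along the following lines; the synthetic data and variable names are placeholders, and the snippet assumes a recent statsmodels release that provides OrderedModel.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Synthetic car-ownership outcome (0, 1, or 2+ cars) driven by toy covariates.
rng = np.random.default_rng(9)
n = 3000
income = rng.normal(size=n)
rural = rng.integers(0, 2, n)
latent = 0.8 * income - 0.3 * rural + rng.normal(size=n)
n_cars = np.digitize(latent, [-0.5, 1.0])            # 0, 1 or 2 cars

X = pd.DataFrame({"income": income, "rural": rural})
# Ordered probit (no constant: thresholds play that role) vs. multinomial logit.
ordered = OrderedModel(pd.Categorical(n_cars, ordered=True), X,
                       distr="probit").fit(method="bfgs")
unordered = sm.MNLogit(n_cars, sm.add_constant(X)).fit()

# The two fits can then be compared on forecasting performance, the criterion
# on which the paper finds them almost indistinguishable.
print(ordered.summary())
print(unordered.summary())
```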
Abstract:
Following major reforms of the British National Health Service (NHS) in 1990, the roles of purchasing and providing health services were separated, with the relationship between purchasers and providers governed by contracts. Using a mixed multinomial logit analysis, we show how this policy shift led to a selection of contracts that is consistent with the predictions of a simple model, based on contract theory, in which the characteristics of the health services being purchased and of the contracting parties influence the choice of contract form. The paper thus provides evidence in support of the practical relevance of theory in understanding health care market reform.
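A mixed (random coefficient) multinomial logit of this type is usually estimated by simulated maximum likelihood; the sketch below writes out such a simulated log-likelihood with illustrative array shapes and coefficients, not the paper's contract data.

```python
import numpy as np

def mixed_logit_simulated_loglik(mu, log_sd, attrs, choice, n_draws=200, seed=0):
    """
    attrs  : (n_obs, n_alts, n_vars) alternative (contract) characteristics
    choice : (n_obs,) chosen alternative index
    mu, log_sd : (n_vars,) mean and log-std of normally distributed coefficients
    """
    rng = np.random.default_rng(seed)
    n_obs, n_alts, n_vars = attrs.shape
    draws = mu + np.exp(log_sd) * rng.standard_normal((n_draws, n_vars))
    util = np.einsum('oav,rv->roa', attrs, draws)       # (n_draws, n_obs, n_alts)
    util -= util.max(axis=2, keepdims=True)
    prob = np.exp(util) / np.exp(util).sum(axis=2, keepdims=True)
    chosen = prob[:, np.arange(n_obs), choice]           # (n_draws, n_obs)
    # Average the choice probability over draws, then take logs and sum.
    return np.log(chosen.mean(axis=0)).sum()

# Toy usage with fabricated data: 50 choices among 3 alternatives, 3 attributes.
rng = np.random.default_rng(5)
ll = mixed_logit_simulated_loglik(np.zeros(3), np.zeros(3),
                                  rng.normal(size=(50, 3, 3)),
                                  rng.integers(0, 3, size=50))
print(ll)
```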
Abstract:
This paper develops and estimates a model of demand for environmental public goods that allows consumers to learn about their preferences through consumption experiences. We develop a theoretical model of Bayesian updating, perform comparative statics on the model, and show how the theoretical model can be consistently incorporated into a reduced-form econometric model. We then estimate the model using data collected for two environmental goods. We find that the prediction of the theoretical exercise that additional experience makes consumers more certain about their preferences, in both mean and variance, is supported in each case.
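The Bayesian updating mechanism described here can be illustrated with a standard normal-normal conjugate update, in which each consumption experience acts as a noisy signal of quality and the posterior variance shrinks with every additional experience; all numbers below are illustrative.

```python
import numpy as np

prior_mean, prior_var = 0.0, 4.0   # diffuse beliefs before any experience
signal_var = 1.0                   # noise in each consumption experience
true_quality = 1.0

rng = np.random.default_rng(6)
mean, var = prior_mean, prior_var
for n_experiences in range(1, 6):
    signal = true_quality + np.sqrt(signal_var) * rng.standard_normal()
    # Standard conjugate update for a normal signal with known variance.
    precision = 1.0 / var + 1.0 / signal_var
    mean = (mean / var + signal / signal_var) / precision
    var = 1.0 / precision
    print(f"after {n_experiences} experiences: mean={mean:.2f}, variance={var:.2f}")
```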
Abstract:
The application of compositional data analysis through log ratio transformations corresponds to a multinomial logit model for the shares themselves. This model is characterized by the property of Independence of Irrelevant Alternatives (IIA). IIA states that the odds ratio (in this case, the ratio of shares) is invariant to the addition or deletion of outcomes to the problem. It is exactly this invariance of the ratio that underlies the commonly used zero replacement procedure in compositional data analysis. In this paper we investigate using the nested logit model, which does not embody IIA, and an associated zero replacement procedure, and compare its performance with that of the more usual approach of using the multinomial logit model. Our comparisons exploit a data set that combines voting data by electoral division with corresponding census data for each division for the 2001 Federal election in Australia.
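The IIA property at issue can be demonstrated numerically in a few lines: under a multinomial logit (softmax) share model the ratio of any two shares is unchanged when another outcome is deleted, which is exactly the invariance exploited by the usual zero replacement procedure. The utilities below are arbitrary, and a nested logit would relax this invariance.

```python
import numpy as np

util = np.array([1.0, 0.2, -0.5, 0.8])   # arbitrary utilities for four outcomes

def shares(u):
    e = np.exp(u - u.max())               # softmax shares with a stability shift
    return e / e.sum()

full = shares(util)
reduced = shares(util[:-1])               # delete the last outcome

# The ratio of the first two shares is identical in both cases (IIA).
print(full[0] / full[1], reduced[0] / reduced[1])
```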