965 results for Probabilistic choice models


Relevance: 80.00%

Abstract:

To minimize car-based trips, transport planners have been particularly interested in understanding the factors that explain modal choices. In the transport modelling literature there has been an increasing awareness that socioeconomic attributes and quantitative variables are not sufficient to characterize travelers and forecast their travel behavior. Recent studies have also recognized that users' social interactions and land use patterns influence travel behavior, especially when changes to transport systems are introduced, but links between international and Spanish perspectives are rarely dealt with. In this paper, factorial and path analyses through a Multiple-Indicator Multiple-Cause (MIMIC) model are used to understand and describe the relationship between the different psychological and environmental constructs and social influence and socioeconomic variables. The MIMIC model generates Latent Variables (LVs) that are incorporated sequentially into Discrete Choice Models (DCMs), where the level-of-service and cost attributes of travel modes are also included directly to measure the effect of the transport policies that have been introduced in Madrid during the last three years in the context of the economic crisis. The data used for this paper were collected from a two-panel smartphone-based survey (n=255 and 190 respondents, respectively) in Madrid.

Relevance: 80.00%

Abstract:

Probabilistic graphical models are a major research field in artificial intelligence. The scope of this work is the study of directed graphical models for the representation of discrete distributions. Two of the main research topics in this area are performing inference over graphical models and learning graphical models from data. Traditionally, the inference process and the learning process have been treated separately, but because the structure of the learned model determines the inference complexity, such strategies sometimes produce very inefficient models. With the purpose of learning thinner models, in this master's thesis we propose a new model for the representation of network polynomials, which we call polynomial trees. Polynomial trees are a complementary representation for Bayesian networks that allows an efficient evaluation of the inference complexity and provides a framework for exact inference. We also propose a set of methods for the incremental compilation of polynomial trees and an algorithm for learning polynomial trees from data using a greedy score+search method that includes the inference complexity as a penalization in the scoring function.

Relevance: 80.00%

Abstract:

Bayesian network classifiers are widely used in machine learning because they intuitively represent causal relations. Multi-label classification problems require each instance to be assigned a subset of a defined set of h labels. This problem is equivalent to finding a multi-valued decision function that predicts a vector of h binary classes. In this paper we obtain the decision boundaries of two widely used Bayesian network approaches for building multi-label classifiers: Multi-label Bayesian network classifiers built using the binary relevance method and Bayesian network chain classifiers. We extend our previous single-label results to multi-label chain classifiers, and we prove that, as expected, chain classifiers provide a more expressive model than the binary relevance method.

Relevance: 80.00%

Abstract:

One of the most promising areas in which probabilistic graphical models have shown incipient activity is heuristic optimization and, in particular, Estimation of Distribution Algorithms. Because of their inherent parallelism, several research lines have tried to improve Estimation of Distribution Algorithms in terms of execution time and/or accuracy. Among these proposals, we focus on the so-called distributed or island-based models. This approach defines several islands (algorithm instances) that run independently and exchange information with a given frequency. The information sent by the islands can be either a set of individuals or a probabilistic model. This paper presents a comparative study of a distributed univariate Estimation of Distribution Algorithm and a multivariate version, paying special attention to the comparison of two alternative methods for exchanging information, over a wide set of parameters and problems: the standard benchmark developed for the IEEE Workshop on Evolutionary Algorithms and other Metaheuristics for Continuous Optimization Problems of the ISDA 2009 Conference. Several analyses from different points of view have been conducted to assess both the influence of the parameters and the relationships between them, including a characterization of the configurations according to their behavior on the proposed benchmark.
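The island scheme described in this abstract can be sketched roughly as follows. This is only an illustrative toy, not the authors' implementation: it assumes a univariate Gaussian model per dimension, a ring migration topology, exchange of individuals (not of probabilistic models), and the sphere function as a stand-in objective. All function names and parameter values are hypothetical.

```python
import random
import statistics

def sphere(x):
    """Toy objective to minimize (assumed here; not the ISDA 2009 benchmark)."""
    return sum(v * v for v in x)

def umda_step(population, n_select, pop_size, rng):
    """One generation of a univariate Gaussian EDA: select, fit, resample."""
    selected = sorted(population, key=sphere)[:n_select]
    dims = len(selected[0])
    # Fit an independent normal distribution to each dimension of the selected set
    means = [statistics.fmean([ind[d] for ind in selected]) for d in range(dims)]
    stds = [max(statistics.pstdev([ind[d] for ind in selected]), 1e-12)
            for d in range(dims)]
    offspring = [[rng.gauss(means[d], stds[d]) for d in range(dims)]
                 for _ in range(pop_size - n_select)]
    return selected + offspring  # elitist: the selected individuals survive

def island_eda(n_islands=4, dims=5, pop_size=40, n_select=20, generations=30,
               migration_every=5, n_migrants=2, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.uniform(-5, 5) for _ in range(dims)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    initial_best = min(sphere(ind) for isl in islands for ind in isl)
    for gen in range(1, generations + 1):
        islands = [umda_step(pop, n_select, pop_size, rng) for pop in islands]
        if gen % migration_every == 0:
            # Ring topology: each island receives copies of its neighbour's best
            migrants = [sorted(pop, key=sphere)[:n_migrants] for pop in islands]
            for i, pop in enumerate(islands):
                incoming = migrants[(i - 1) % n_islands]
                pop.sort(key=sphere)
                pop[-n_migrants:] = [list(ind) for ind in incoming]  # replace worst
    final_best = min(sphere(ind) for isl in islands for ind in isl)
    return initial_best, final_best

initial_best, final_best = island_eda()
```

Because selection is elitist and migrants only replace the worst individuals, the best objective value in this sketch can never get worse from one generation to the next. Exchanging fitted models instead of individuals, the other option mentioned in the abstract, would need a rule for combining the incoming distribution with the local one, which is not shown here.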

Relevance: 80.00%

Abstract:

Machine learning and scientometrics are the scientific disciplines covered in this dissertation. Machine learning deals with the construction and study of algorithms that can learn from data, whereas scientometrics is mainly concerned with the analysis of science from a quantitative perspective. Advances in machine learning now provide the mathematical and statistical tools for working properly with the vast amount of scientometric data stored in bibliographic databases. In this context, the use of novel machine learning methods in scientometrics applications is the focus of this dissertation. The dissertation proposes new machine learning contributions that could shed light on the area of scientometrics. These contributions are divided into three parts. First, several supervised cost-(in)sensitive models are learned to predict the scientific success of articles and researchers. Cost-sensitive models are not interested in maximizing classification accuracy, but in minimizing the expected total cost of the errors made in the classification process. In this context, publishers of scientific journals could have a tool capable of predicting the future citation count of an article before it is published, whereas promotion committees could predict the annual increase of the h-index of researchers within the first few years. These predictive models could pave the way for new assessment systems. Second, several probabilistic graphical models are learned to exploit and discover new relationships among the vast number of existing bibliometric indices. In this context, the scientific community could measure how some indices influence others in probabilistic terms and perform evidence propagation and abductive inference to answer bibliometric questions. The scientific community could also uncover which bibliometric indices have a higher predictive power. This is a multi-output regression problem where the role of each variable, predictive or response, is unknown beforehand. The resulting indices could be very useful for prediction purposes, that is, when their index values are known, knowledge of any index value provides no information on the prediction of other bibliometric indices. Third, a scientometric study of Spanish computer science research is performed under the publish-or-perish culture. This study is based on a cluster analysis methodology that characterizes research activity in terms of productivity, visibility, quality, prestige and international collaboration. The study also analyzes the effects of collaboration on productivity and visibility under different circumstances.
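The difference between maximizing accuracy and minimizing expected cost can be illustrated with a small sketch. The class labels, posterior probabilities and cost matrix below are made-up values for illustration only and are not taken from the dissertation.

```python
def min_expected_cost_class(posteriors, cost):
    """Pick the prediction with the lowest expected cost.

    posteriors[j] = P(true class = j | x)
    cost[i][j]    = cost of predicting class i when the true class is j
    """
    expected = [sum(cost[i][j] * posteriors[j] for j in range(len(posteriors)))
                for i in range(len(cost))]
    best = min(range(len(expected)), key=expected.__getitem__)
    return best, expected

# Hypothetical example: class 0 = "rarely cited", class 1 = "highly cited"
posteriors = [0.7, 0.3]
cost = [[0, 5],   # predicting 0: free if correct, cost 5 if the article is highly cited
        [1, 0]]   # predicting 1: cost 1 if the article is rarely cited, free if correct
decision, expected = min_expected_cost_class(posteriors, cost)
most_probable = max(range(len(posteriors)), key=posteriors.__getitem__)
```

With these assumed numbers the most probable class is 0, but the expected costs are 1.5 for predicting class 0 and 0.7 for predicting class 1, so the cost-sensitive decision is class 1.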

Relevance: 80.00%

Abstract:

The role of price in the tourism sector is especially complex because of the heterogeneity among tourists and, therefore, the different price sensitivities they show. In this regard, this study proposes using discrete choice models to identify individual sensitivities, tourist by tourist, and then using those estimates as a starting point for detecting groups of tourists with a similar response to prices. The empirical application, carried out in the context of the Comunidad Valenciana, detects three segments: low-price tourists, price-indifferent tourists, and high-price tourists.

Relevance: 80.00%

Abstract:

A theory of value sits at the core of every school of economic thought and directs the allocation of resources to competing uses. Ecological resources complicate the modern neoclassical approach to determining value because of their complex nature, considerable non-market values and the difficulty of assigning property rights. Application of the market model through economic valuation only provides analytical solutions based on virtual markets, and neither the demand-side nor the supply-side techniques of valuation can adequately consider the complex set of biophysical and ecological relations that lead to the provision of ecosystem goods and services. This paper sets out a conceptual framework for a complex systems approach to the value of ecological resources. This approach is based on there being both an intrinsic quality of ecological resources and a subjective evaluation by the consumer. Both elements are necessary for economic value. This conceptual framework points the way towards a theory of value that incorporates both elements, and so has implications for the principles by which ecological resources can be allocated. (c) 2005 Elsevier B.V. All rights reserved.

Relevance: 80.00%

Abstract:

The goal of this study was to develop Multinomial Logit models for the mode choice behavior of immigrants, with key focuses on neighborhood effects and behavioral assimilation. The first aspect shows the relationship between social network ties and immigrants’ chosen mode of transportation, while the second aspect explores the gradual changes toward alternative mode usage with regard to immigrants’ migrating period in the United States (US). Mode choice models were developed for work, shopping, social, recreational, and other trip purposes to evaluate the impacts of various land use patterns, neighborhood typology, socioeconomic-demographic and immigrant related attributes on individuals’ travel behavior. Estimated coefficients of mode choice determinants were compared between each alternative mode (i.e., high-occupancy vehicle, public transit, and non-motorized transport) with single-occupant vehicles. The model results revealed the significant influence of neighborhood and land use variables on the usage of alternative modes among immigrants. Incorporating these indicators into the demand forecasting process will provide a better understanding of the diverse travel patterns for the unique composition of population groups in Florida.
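For readers unfamiliar with the Multinomial Logit form, the choice probability of each mode is the exponential of its systematic utility divided by the sum of exponentials over all modes. The sketch below illustrates this with made-up utilities, taking the single-occupant vehicle as the reference alternative with utility 0; the numbers are assumptions, not estimates from the study.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities from systematic utilities."""
    max_u = max(utilities.values())  # subtract the maximum for numerical stability
    exp_u = {mode: math.exp(u - max_u) for mode, u in utilities.items()}
    total = sum(exp_u.values())
    return {mode: e / total for mode, e in exp_u.items()}

# Hypothetical utilities relative to the single-occupant vehicle (SOV)
utilities = {"sov": 0.0, "hov": -0.8, "transit": -1.2, "non_motorized": -1.5}
probs = mnl_probabilities(utilities)
```

In this form only utility differences matter: the ratio of any alternative's probability to the SOV probability equals the exponential of that alternative's utility, which is why coefficients are interpreted relative to the reference mode.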

Relevance: 80.00%

Abstract:

My dissertation has three chapters which develop and apply microeconometric techniques to empirically relevant problems. All the chapters examine robustness issues (e.g., measurement error and model misspecification) in econometric analysis. The first chapter studies the identifying power of an instrumental variable in the nonparametric heterogeneous treatment effect framework when a binary treatment variable is mismeasured and endogenous. I characterize the sharp identified set for the local average treatment effect under the following two assumptions: (1) the exclusion restriction of an instrument and (2) deterministic monotonicity of the true treatment variable in the instrument. The identification strategy allows for general measurement error. Notably, (i) the measurement error is nonclassical, (ii) it can be endogenous, and (iii) no assumptions are imposed on the marginal distribution of the measurement error, so that I do not need to assume the accuracy of the measurement. Based on the partial identification result, I provide a consistent confidence interval for the local average treatment effect with uniformly valid size control. I also show that the identification strategy can incorporate repeated measurements to narrow the identified set, even if the repeated measurements themselves are endogenous. Using the National Longitudinal Study of the High School Class of 1972, I demonstrate that my new methodology can produce nontrivial bounds for the return to college attendance when attendance is mismeasured and endogenous.

The second chapter, which is part of a coauthored project with Federico Bugni, considers the problem of inference in dynamic discrete choice problems when the structural model is locally misspecified. We consider two popular classes of estimators for dynamic discrete choice models: K-step maximum likelihood estimators (K-ML) and K-step minimum distance estimators (K-MD), where K denotes the number of policy iterations employed in the estimation problem. These estimator classes include popular estimators such as Rust (1987)’s nested fixed point estimator, Hotz and Miller (1993)’s conditional choice probability estimator, Aguirregabiria and Mira (2002)’s nested algorithm estimator, and Pesendorfer and Schmidt-Dengler (2008)’s least squares estimator. We derive and compare the asymptotic distributions of K-ML and K-MD estimators when the model is arbitrarily locally misspecified and we obtain three main results. In the absence of misspecification, Aguirregabiria and Mira (2002) show that all K-ML estimators are asymptotically equivalent regardless of the choice of K. Our first result shows that this finding extends to a locally misspecified model, regardless of the degree of local misspecification. As a second result, we show that an analogous result holds for all K-MD estimators, i.e., all K-MD estimators are asymptotically equivalent regardless of the choice of K. Our third and final result compares K-MD and K-ML estimators in terms of asymptotic mean squared error. Under local misspecification, the optimally weighted K-MD estimator depends on the unknown asymptotic bias and is no longer feasible. In turn, feasible K-MD estimators could have an asymptotic mean squared error that is higher or lower than that of the K-ML estimators. To demonstrate the relevance of our asymptotic analysis, we illustrate our findings in a simulation exercise based on a misspecified version of the Rust (1987) bus engine problem.

The last chapter investigates the causal effect of the Omnibus Budget Reconciliation Act of 1993, which caused the biggest change to the EITC in its history, on unemployment and labor force participation among single mothers. Unemployment and labor force participation are difficult to define for a few reasons, for example, because of marginally attached workers. Instead of searching for the unique definition for each of these two concepts, this chapter bounds unemployment and labor force participation by observable variables and, as a result, considers various competing definitions of these two concepts simultaneously. This bounding strategy leads to partial identification of the treatment effect. The inference results depend on the construction of the bounds, but they imply a positive effect on labor force participation and a negligible effect on unemployment. The results imply that the difference-in-difference result based on the BLS definition of unemployment can be misleading due to misclassification of unemployment.

Relevance: 80.00%

Abstract:

Bayesian methods offer a flexible and convenient probabilistic learning framework to extract interpretable knowledge from complex and structured data. Such methods can characterize dependencies among multiple levels of hidden variables and share statistical strength across heterogeneous sources. In the first part of this dissertation, we develop two dependent variational inference methods for full posterior approximation in non-conjugate Bayesian models through hierarchical mixture- and copula-based variational proposals, respectively. The proposed methods move beyond the widely used factorized approximation to the posterior and provide generic applicability to a broad class of probabilistic models with minimal model-specific derivations. In the second part of this dissertation, we design probabilistic graphical models to accommodate multimodal data, describe dynamical behaviors and account for task heterogeneity. In particular, the sparse latent factor model is able to reveal common low-dimensional structures from high-dimensional data. We demonstrate the effectiveness of the proposed statistical learning methods on both synthetic and real-world data.

Relevance: 80.00%

Abstract:

With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the author(s) of a biomedical publication, or implicit, such as the positive or negative sentiment that an author had when she wrote a product review; there may also be complex context such as the social network of the authors. Many applications require analysis of topic patterns over different contexts. For instance, analysis of search logs in the context of the user can reveal how we can improve the quality of a search engine by optimizing the search results according to particular users; analysis of customer reviews in the context of positive and negative sentiments can help the user summarize public opinions about a product; analysis of blogs or scientific publications in the context of a social network can facilitate discovery of more meaningful topical communities. Since context information significantly affects the choices of topics and language made by authors, it is very important to incorporate it into analyzing and mining text data. Modeling the context in text and discovering contextual patterns of language units and topics from text, a general task which we refer to as Contextual Text Mining, has widespread applications in text mining. In this thesis, we provide a novel and systematic study of contextual text mining, which is a new paradigm of text mining treating context information as the "first-class citizen." We formally define the problem of contextual text mining and its basic tasks, and propose a general framework for contextual text mining based on generative modeling of text.
This conceptual framework provides general guidance on text mining problems with context information and can be instantiated into many real tasks, including the general problem of contextual topic analysis. We formally present a functional framework for contextual topic analysis, with a general contextual topic model and its various versions, which can effectively solve the text mining problems in many real-world applications. We further introduce general components of contextual topic analysis, by adding priors to contextual topic models to incorporate prior knowledge, regularizing contextual topic models with the dependency structure of context, and postprocessing contextual patterns to extract refined patterns. The refinements on the general contextual topic model naturally lead to a variety of probabilistic models which incorporate different types of context and various assumptions and constraints. These special versions of the contextual topic model prove effective in a variety of real applications involving topics and explicit contexts, implicit contexts, and complex contexts. We then introduce a postprocessing procedure for contextual patterns, by generating meaningful labels for multinomial context models. This method provides a general way to interpret text mining results for real users. By applying contextual text mining in the "context" of other text information management tasks, including ad hoc text retrieval and web search, we further demonstrate the effectiveness of contextual text mining techniques quantitatively with large-scale datasets. The framework of contextual text mining not only unifies many explorations of text analysis with context information, but also opens up many new possibilities for future research directions in text mining.

Relevance: 80.00%

Abstract:

This scientific and technological research article studies the perception of safety in the use of pedestrian bridges, using an approach grounded in two main fields: microeconomics and psychology. The work simultaneously estimates a hybrid choice and latent variable model with data from a stated preference survey, finding a better fit than a reference mixed model, which indicates that the perception of safety determines the behavior of pedestrians when they face the decision of whether or not to use a pedestrian bridge. Sex, age, and level of education were found to be attributes that affect the perception of safety. The calibrated model suggests several strategies for increasing the use of pedestrian bridges, which are discussed; the use of barriers was found to cause a loss of utility for pedestrians, which should be studied as an extension of the present work.

Relevance: 80.00%

Abstract:

Rational choice models argue that income inequality leads to a higher expected utility of crime and thus generates incentives to engage in illegal activities. Yet the results of empirical studies do not provide strong support for this theory; in fact, Neumayer provides apparently strong evidence that income inequality is not a significant determinant of violent property crime rates when a representative sample is used and country-specific fixed effects are controlled for. An important limitation of this and other empirical studies on the subject is that they only consider proportional income differences, even though in rational choice models absolute differences between legal and illegal incomes determine the expected utility of crime. Using the same methodology and data as Neumayer, but using absolute inequality measures rather than proportional ones, this paper finds that absolute income inequality is a statistically significant determinant of robbery and violent theft rates. This result is robust to changes in sample size and to different absolute inequality measures, which not only implies that inequality is an important correlate of violent property crime rates but also suggests that absolute measures are preferable when the impact of inequality on property crime is studied.
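The distinction between proportional and absolute inequality can be made concrete with a short sketch. It assumes the Gini coefficient as the proportional measure and the absolute Gini (the Gini coefficient multiplied by mean income) as the absolute one; the abstract does not say which measures the paper uses, and the incomes below are invented.

```python
def mean_abs_difference(incomes):
    """Average absolute income gap over all ordered pairs of individuals."""
    n = len(incomes)
    return sum(abs(a - b) for a in incomes for b in incomes) / (n * n)

def gini(incomes):
    """Proportional (relative) Gini: mean absolute difference over twice the mean."""
    mean = sum(incomes) / len(incomes)
    return mean_abs_difference(incomes) / (2 * mean)

def absolute_gini(incomes):
    """Absolute Gini: half the mean absolute difference (Gini times mean income)."""
    return mean_abs_difference(incomes) / 2

base = [10.0, 20.0, 30.0, 40.0]       # hypothetical incomes
doubled = [2 * x for x in base]       # every income doubles
```

Doubling every income leaves the proportional Gini unchanged at 0.25, while the absolute Gini doubles from 6.25 to 12.5. A proportional measure therefore reports no change in a case where the income gaps themselves, which are what enter the expected utility of crime in rational choice models, have widened.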

Relevance: 80.00%

Abstract:

This dissertation investigates customer behavior modeling in service outsourcing and revenue management in the service sector (i.e., airline and hotel industries). In particular, it focuses on a common theme of improving firms’ strategic decisions through the understanding of customer preferences. Decisions concerning degrees of outsourcing, such as firms’ capacity choices, are important to performance outcomes. These choices are especially important in high-customer-contact services (e.g., airline industry) because of the characteristics of services: simultaneity of consumption and production, and intangibility and perishability of the offering. Essay 1 estimates how outsourcing affects customer choices and market share in the airline industry, and consequently the revenue implications from outsourcing. However, outsourcing decisions are typically endogenous. A firm may choose whether to outsource or not based on what a firm expects to be the best outcome. Essay 2 contributes to the literature by proposing a structural model which could capture a firm’s profit-maximizing decision-making behavior in a market. This makes possible the prediction of consequences (i.e., performance outcomes) of future strategic moves. Another emerging area in service operations management is revenue management. Choice-based revenue systems incorporate discrete choice models into traditional revenue management algorithms. To successfully implement a choice-based revenue system, it is necessary to estimate customer preferences as a valid input to optimization algorithms. The third essay investigates how to estimate customer preferences when part of the market is consistently unobserved. This issue is especially prominent in choice-based revenue management systems. Normally a firm only has its own observed purchases, while those customers who purchase from competitors or do not make purchases are unobserved. 
Most current estimation procedures depend on unrealistic assumptions about customer arrivals. This study proposes a new estimation methodology, which does not require any prior knowledge about the customer arrival process and allows for arbitrary demand distributions. Compared with previous methods, this model performs better when the true demand is highly variable.

Relevance: 80.00%

Abstract:

Studies have demonstrated that public policies to support private firms’ investment have the ability to promote entrepreneurship, but the sustainability of subsidized firms has not often been analysed. This paper aims to examine this dimension specifically by evaluating the mortality of subsidized firms in the long term. The analysis focuses on a case study of the LEADER+ Programme in the Alentejo region of Portugal. For this purpose, the paper examines the activity status (active or not active) of 154 private, rural, for-profit firms in Alentejo that received a subsidy to support investment between 2002 and 2008 under the LEADER+ Programme. The methodology is based on binary choice models in order to study the probability of these firms still being active. The explanatory variables used are the following: (1) the characteristics of entrepreneurs and managers’ strategic decisions, (2) firm profile and characteristics, (3) regional economic environment. Data assessment showed that the cumulative mortality rate of firms on 31st December 2013 was over 20%. Interpretation of the regression model revealed that the probability of firms’ survival increases with higher investment, firm age and regional business concentration, whereas the number of applications made by firms has a negative impact on their survival. So it seems that for subsidized firms the amount of investment is as important as its frequency.