956 results for Multinomial logit models with random coefficients (RCL)
Abstract:
Bayesian methods offer a flexible and convenient probabilistic learning framework to extract interpretable knowledge from complex and structured data. Such methods can characterize dependencies among multiple levels of hidden variables and share statistical strength across heterogeneous sources. In the first part of this dissertation, we develop two dependent variational inference methods for full posterior approximation in non-conjugate Bayesian models through hierarchical mixture- and copula-based variational proposals, respectively. The proposed methods move beyond the widely used factorized approximation to the posterior and provide generic applicability to a broad class of probabilistic models with minimal model-specific derivations. In the second part of this dissertation, we design probabilistic graphical models to accommodate multimodal data, describe dynamical behaviors and account for task heterogeneity. In particular, the sparse latent factor model is able to reveal common low-dimensional structures from high-dimensional data. We demonstrate the effectiveness of the proposed statistical learning methods on both synthetic and real-world data.
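As a loose illustration of the mixture-based idea (a sketch under stated assumptions, not the dissertation's algorithm): the snippet below fits a two-component Gaussian mixture proposal to a toy non-conjugate 1D posterior by maximizing a quadrature estimate of the ELBO. The target density and all parameter names are invented for illustration.

```python
# Toy mixture-based variational approximation: fit a two-component Gaussian
# mixture q to a skewed, non-conjugate 1D target by maximizing the ELBO,
# here approximated by quadrature on a fixed grid.
import numpy as np
from scipy import optimize
from scipy.stats import norm

def log_p(x):
    # Unnormalized, skewed target: a Gaussian tilted by a tanh term.
    return -0.5 * x**2 + 3.0 * np.tanh(x)

def elbo(params, grid):
    w = 1.0 / (1.0 + np.exp(-params[0]))   # mixture weight in (0, 1)
    m1, m2, ls1, ls2 = params[1:]
    q = w * norm.pdf(grid, m1, np.exp(ls1)) + (1 - w) * norm.pdf(grid, m2, np.exp(ls2))
    q = np.maximum(q, 1e-300)
    # ELBO = E_q[log p(x) - log q(x)], approximated on the grid.
    return np.trapz(q * (log_p(grid) - np.log(q)), grid)

grid = np.linspace(-10, 10, 4001)
res = optimize.minimize(lambda p: -elbo(p, grid), x0=[0.0, -1.0, 1.0, 0.0, 0.0])
print("ELBO:", -res.fun, "fitted proposal parameters:", res.x)
```

A fully factorized (single-Gaussian) proposal would miss the skew of this target; the mixture proposal is the simplest case of the richer variational families the abstract describes.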
Abstract:
Bayesian nonparametric models, such as the Gaussian process and the Dirichlet process, have been extensively applied to target kinematics modeling in various applications including environmental monitoring, traffic planning, endangered species tracking, dynamic scene analysis, autonomous robot navigation, and human motion modeling. As shown by these successful applications, Bayesian nonparametric models are able to adapt their complexity to the data as necessary, and are resistant to overfitting and underfitting. However, most existing works assume that the sensor measurements used to learn the Bayesian nonparametric target kinematics models are obtained a priori, or that the target kinematics can be measured by the sensor at any given time throughout the task. Little work has been done on controlling a sensor with a bounded field of view to obtain the measurements of mobile targets that are most informative for reducing the uncertainty of the Bayesian nonparametric models. To present a systematic sensor planning approach to learning Bayesian nonparametric models, the Gaussian process target kinematics model is introduced first; it is capable of describing time-invariant spatial phenomena, such as ocean currents, temperature distributions and wind velocity fields. The Dirichlet process-Gaussian process target kinematics model is subsequently discussed for modeling mixtures of mobile targets, such as pedestrian motion patterns.
Novel information theoretic functions are developed for these Bayesian nonparametric target kinematics models to represent the expected utility of measurements as a function of sensor control inputs and random environmental variables. A Gaussian process expected Kullback-Leibler divergence is developed as the expectation, with respect to the future measurements, of the KL divergence between the current (prior) and posterior Gaussian process target kinematics models. This approach is then extended to a new information value function for estimating target kinematics described by a Dirichlet process-Gaussian process mixture model. A theorem is proved showing that the novel information theoretic functions are bounded. Based on this theorem, efficient estimators of the new information theoretic functions are designed and proved to be unbiased, with the variance of the resulting approximation error decreasing linearly as the number of samples increases. The computational complexity of optimizing the novel information theoretic functions under sensor dynamics constraints is studied, and the optimization problem is proved to be NP-hard. A cumulative lower bound is then proposed to reduce the computational complexity to polynomial time.
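For a finite set of prediction points a Gaussian process reduces to a multivariate Gaussian, so the divergence entering such an expected-KL criterion has the standard closed form (a textbook identity, not quoted from the dissertation):

$$ D_{\mathrm{KL}}\big(\mathcal{N}(\mu_0,\Sigma_0)\,\|\,\mathcal{N}(\mu_1,\Sigma_1)\big) = \frac{1}{2}\Big[\operatorname{tr}(\Sigma_1^{-1}\Sigma_0) + (\mu_1-\mu_0)^{\top}\Sigma_1^{-1}(\mu_1-\mu_0) - k + \ln\frac{\det\Sigma_1}{\det\Sigma_0}\Big], $$

where k is the number of prediction points; the expected-KL function averages this quantity over the distribution of the future measurements.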
Three sensor planning algorithms are developed according to the assumptions on the target kinematics and the sensor dynamics. For problems where the control space of the sensor is discrete, a greedy algorithm is proposed. The efficiency of the greedy algorithm is demonstrated by a numerical experiment with data of ocean currents obtained by moored buoys. A sweep line algorithm is developed for applications where the sensor control space is continuous and unconstrained. Synthetic simulations as well as physical experiments with ground robots and a surveillance camera are conducted to evaluate the performance of the sweep line algorithm. Moreover, a lexicographic algorithm is designed, based on the cumulative lower bound of the novel information theoretic functions, for the scenario where the sensor dynamics are constrained. Numerical experiments with real data collected from indoor pedestrians by a commercial pan-tilt camera are performed to examine the lexicographic algorithm. Results from both the numerical simulations and the physical experiments show that the three sensor planning algorithms proposed in this dissertation based on the novel information theoretic functions are superior at learning the target kinematics with little or no prior knowledge.
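A schematic version of the greedy planner for a discrete control space (assumptions mine: a 1D candidate set and GP posterior-variance reduction as the information proxy, rather than the dissertation's expected-KL functions):

```python
# Schematic greedy sensor planning: at each step pick the discrete sensing
# location with the largest GP posterior variance, a common proxy for the
# information value of the next measurement.
import numpy as np

def rbf(a, b, ell=1.0):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def posterior_var(train, cand, noise=1e-2):
    # Predictive variance of a unit-variance GP after observing `train`.
    K = rbf(train, train) + noise * np.eye(len(train))
    Ks = rbf(cand, train)
    return 1.0 - np.einsum('ij,ij->i', Ks @ np.linalg.inv(K), Ks)

candidates = np.linspace(0.0, 10.0, 101)   # discrete control space
chosen = [candidates[50]]                  # arbitrary initial location
for _ in range(4):
    var = posterior_var(np.array(chosen), candidates)
    chosen.append(candidates[np.argmax(var)])   # greedy: largest residual uncertainty
print("greedy measurement locations:", np.round(chosen, 2))
```

Greedy selection of this kind is attractive because each step only requires evaluating the information proxy over the finite candidate set.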
Abstract:
The increasing nationwide interest in intelligent transportation systems (ITS) and the need for more efficient transportation have led to the expanding use of variable message sign (VMS) technology. VMS panels are substantially heavier than flat-panel aluminum signs and have a larger depth (the dimension parallel to the direction of traffic). The additional weight and depth can have a significant effect on the aerodynamic forces and inertial loads transmitted to the support structure. The wind-induced drag forces and the response of VMS structures are not well understood. Minimum design requirements for VMS structures are contained in the American Association of State Highway and Transportation Officials Standard Specifications for Structural Supports for Highway Signs, Luminaires, and Traffic Signals (AASHTO Specification). However, the Specification does not take into account the prismatic geometry of VMS or the complex interaction of the applied aerodynamic forces with the support structure. In view of the lack of code guidance and the limited research performed so far, targeted experimentation and large-scale testing were conducted at the Florida International University (FIU) Wall of Wind (WOW) to provide reliable drag coefficients and to investigate the aerodynamic instability of VMS. A comprehensive range of VMS geometries was tested in turbulence representative of the high-frequency end of the spectrum in a simulated suburban atmospheric boundary layer. The mean normal, lateral and vertical lift force coefficients, in addition to the twisting moment coefficient and eccentricity ratio, were determined from the measured data for each model. Wind tunnel testing confirmed that the drag on a prismatic VMS is smaller than the value of 1.7 suggested in the current AASHTO Specification (2013). An alternative to the AASHTO code value is presented in the form of a design matrix. Testing and analysis also indicated that vortex shedding oscillations and galloping instability could be significant for VMS with a large depth ratio attached to a structure with a low natural frequency. The effect of corner modification was investigated by testing models with chamfered and rounded corners. Results demonstrated an additional decrease in the drag coefficient, but a possible Reynolds number dependency for the rounded-corner configuration.
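For reference, the drag coefficient discussed above relates the mean wind speed to the design drag force through the standard expression

$$ F_D = \tfrac{1}{2}\,\rho\,U^2\,C_D\,A, $$

where ρ is the air density, U the mean wind speed and A the projected panel area; the testing indicates that the AASHTO value C_D = 1.7 is conservative for prismatic VMS.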
Abstract:
The early Pliocene warm phase was characterized by high sea surface temperatures and a deep thermocline in the eastern equatorial Pacific. A new hypothesis suggests that the progressive closure of the Panamanian seaway contributed substantially to the termination of this zonally symmetric state in the equatorial Pacific. According to this hypothesis, intensification of the Atlantic meridional overturning circulation (AMOC) - induced by the closure of the gateway - was the principal cause of equatorial Pacific thermocline shoaling during the Pliocene. In this study, twelve Panama seaway sensitivity experiments from eight ocean/climate models of different complexity are analyzed to examine the effect of an open gateway on AMOC strength and thermocline depth. All models show an eastward Panamanian net throughflow, leading to a reduction in AMOC strength compared to the corresponding closed-Panama case. In those models that do not include a dynamic atmosphere, deepening of the equatorial Pacific thermocline appears to scale almost linearly with the throughflow-induced reduction in AMOC strength. Models with dynamic atmosphere do not follow this simple relation. There are indications that in four out of five models equatorial wind-stress anomalies amplify the tropical Pacific thermocline deepening. In summary, the models provide strong support for the hypothesized relationship between Panama closure and equatorial Pacific thermocline shoaling.
Abstract:
To learn complex skills, like collaboration, learners need to acquire a concrete and consistent mental model of what it means to master this skill. If learners know their current mastery level and their targeted mastery level, they can better determine their subsequent learning activities. Rubrics support learners in judging their skill performance, as they provide textual descriptions of a skill's mastery levels with performance indicators for all constituent subskills. However, text-based rubrics have a limited capacity to support the formation of mental models that include the contextualized, time-related and observable behavioral aspects of a complex skill. This paper outlines the design of a study that intends to investigate the effect of rubrics with video modelling examples, compared to text-based rubrics, on skill acquisition and feedback provisioning. The hypothesis is that video-enhanced rubrics, compared to text-based rubrics, will improve mental model formation of a complex skill and improve the quality of the feedback a learner receives (from e.g. teachers or peers) while practicing a skill, hence positively affecting final mastery of the skill.
Abstract:
Recent studies have shown that cancer risk related to overweight and obesity is mediated by time and might be better approximated by using life years lived with excess weight. In this study we aimed to assess the impact of overweight duration and intensity in older adults on the risk of developing different forms of cancer. Study participants from seven European and one US cohort study with two or more weight assessments during follow-up were included (n = 329,576). Trajectories of body mass index (BMI) across ages were estimated using a quadratic growth model; overweight duration (BMI ≥ 25) and cumulative weighted overweight years were calculated. In multivariate Cox models and random effects analyses, a longer duration of overweight was significantly associated with the incidence of obesity-related cancer [overall hazard ratio (HR) per 10-year increment: 1.36; 95 % CI 1.12-1.60], but also increased the risk of postmenopausal breast and colorectal cancer. Additionally accounting for the degree of overweight further increased the risk of obesity-related cancer. Risks associated with a longer overweight duration were higher in men than in women and were attenuated by smoking. For postmenopausal breast cancer, increased risks were confined to women who never used hormone therapy. Overall, 8.4 % of all obesity-related cancers could be attributed to overweight at any age. These findings provide further insights into the role of overweight duration in the etiology of cancer and indicate that weight control is relevant at all ages. This knowledge is vital for the development of effective and targeted cancer prevention strategies.
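A back-of-the-envelope sketch of the two exposure measures described above, overweight duration and cumulative weighted overweight years (the quadratic trajectory coefficients and follow-up window below are invented; the study fits the growth model per cohort):

```python
# Sketch: overweight duration and cumulative weighted overweight years
# from a fitted quadratic BMI trajectory. Coefficients are hypothetical.
import numpy as np

# Quadratic growth model: BMI(age) = b0 + b1*age + b2*age^2 (per subject).
b0, b1, b2 = 38.0, -0.55, 0.006          # hypothetical subject-level fit
ages = np.linspace(40.0, 75.0, 3501)     # follow-up window, fine grid
bmi = b0 + b1 * ages + b2 * ages**2

dt = ages[1] - ages[0]
duration_years = (bmi >= 25.0).sum() * dt                    # years with BMI >= 25
weighted_years = np.sum(np.maximum(bmi - 25.0, 0.0)) * dt    # BMI-years above 25

print(f"overweight duration: {duration_years:.1f} y, "
      f"cumulative weighted overweight years: {weighted_years:.1f} BMI-years")
```

The weighted measure is what the abstract calls "additionally accounting for the degree of overweight": it integrates the excess BMI above 25 over time rather than merely counting years.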
Abstract:
Peer effects in adolescent cannabis use are difficult to estimate, due in part to the lack of appropriate data on behaviour and social ties. This paper exploits survey data that have many desirable properties and have not previously been used for this purpose. The data set, collected from teenagers in three annual waves from 2002 to 2004, contains longitudinal information about friendship networks within schools (N = 5,020). We exploit these data on network structure to estimate peer effects on adolescents from their nominated friends within school, using two alternative approaches to identification. First, we present a cross-sectional instrumental variable (IV) estimate of peer effects that exploits network structure at the second degree, i.e. using information on friends of friends who are not themselves ego's friends to instrument for the cannabis use of friends. Second, we present an individual fixed effects estimate of peer effects using the full longitudinal structure of the data. Both innovations allow a greater degree of control for correlated effects than is common in the substance-use peer effects literature, improving our chances of obtaining estimates of peer effects that can be plausibly interpreted as causal. Both estimates suggest positive peer effects of non-trivial magnitude, although the IV estimate is imprecise. Furthermore, when we specify identical models with the behaviour and characteristics of randomly selected school peers in place of friends', we find effectively zero effect from these 'placebo' peers, lending credence to our main estimates. We conclude that cross-sectional data can be used to estimate plausible positive peer effects on cannabis use where network structure information is available and appropriately exploited.
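A minimal sketch of the friends-of-friends instrument logic on simulated data (the survey data and true coefficients are not reproduced here; all numbers are illustrative):

```python
# Schematic two-stage least squares: instrument friends' cannabis use with
# the use of friends-of-friends who are not the ego's own friends.
import numpy as np

rng = np.random.default_rng(0)
n = 5020
fof_use = rng.binomial(1, 0.3, n).astype(float)       # instrument: friends-of-friends' use
friend_use = 0.4 * fof_use + rng.normal(0, 0.5, n)    # endogenous regressor
ego_use = 0.5 * friend_use + rng.normal(0, 0.5, n)    # outcome

def ols(y, x):
    X = np.column_stack([np.ones(len(y)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Stage 1: project friends' use on the instrument; Stage 2: use fitted values.
a = ols(friend_use, fof_use)
fitted = a[0] + a[1] * fof_use
print("2SLS peer-effect estimate:", ols(ego_use, fitted)[1])
```

The exclusion restriction here is that friends-of-friends affect the ego only through the ego's own friends, which is what the second-degree network structure is meant to buy.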
Abstract:
Thesis (Master's)--University of Washington, 2016-08
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-08
Abstract:
This work aims to investigate the relationship between entrepreneurship and the incidence of bureaucratic corruption in the Brazilian states and the Federal District. The main hypothesis of this study is that the opening of a business in the Brazilian states is negatively affected by the incidence of corruption. The theoretical framework is divided into entrepreneurship and bureaucratic corruption, with an emphasis on the materialistic (objectivist) perspective of entrepreneurship and on the effects of bureaucratic corruption on entrepreneurial activity. Using panel data regression, models were estimated with pooled data and with fixed and random effects. To measure corruption, the General Index of Corruption for the Brazilian states (BOLL, 2010) was used; to represent entrepreneurship, firm entry per capita by state. Specification tests (Chow, Hausman and Breusch-Pagan) indicate that the random effects model is the most appropriate, and the preliminary results indicate a positive impact of bureaucratic corruption on entrepreneurial activity, contradicting the expected hypothesis and the findings of previous articles on Brazil, and corroborating the proposition of Dreher and Gassebner (2011) that, in countries with heavy regulation, bureaucratic corruption can act as grease on the wheels of entrepreneurship.
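A minimal random-intercept sketch in the spirit of the random effects specification, on simulated state-year data (variable names and magnitudes are hypothetical; statsmodels' MixedLM with a random intercept per state stands in for the random effects estimator):

```python
# Random-intercept panel sketch: firm entry on corruption with a random
# effect per state, on simulated data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
states, years = 27, 10                    # 26 states + the Federal District
corruption = rng.normal(0, 1, states * years)
state_id = np.repeat(np.arange(states), years)
state_effect = rng.normal(0, 0.5, states)[state_id]
entry = 0.2 * corruption + state_effect + rng.normal(0, 1, states * years)

exog = sm.add_constant(pd.DataFrame({"corruption": corruption}))
# The random intercept per state plays the role of the random effects model.
fit = sm.MixedLM(entry, exog, groups=state_id).fit()
print(fit.summary())
```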
Abstract:
This paper presents a study of the effects of alcohol consumption on household income in Ireland using the Slán National Health and Lifestyle Survey 2007 dataset, accounting for endogeneity and selection bias. Drinkers are categorised into one of four categories based on the weekly drinking levels recommended by the Irish Health Promotion Unit: those who never drank, non-drinkers, moderate drinkers and heavy drinkers. A multinomial logit OLS two-step estimator is used to explain an individual's choice of drinking status and to correct for the selection bias that would otherwise make selection into a particular drinking category endogenous. Endogeneity, which may arise through the simultaneity of drinking status and income (reverse causation in either direction: income affecting alcohol consumption or alcohol consumption affecting income) or through unobserved heterogeneity, is also addressed. This paper finds that the household income of drinkers is higher than that of non-drinkers and of those who never drank. There is very little difference between the household income of moderate and heavy drinkers, with heavy drinkers earning slightly more. Weekly household income is €454.20 for those who never drank and €506.26 for non-drinkers, compared with €683.36 for moderate drinkers and €694.18 for heavy drinkers.
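A simplified sketch of the two-step logic (not the paper's exact estimator): stage one fits a multinomial logit for drinking status; stage two regresses income on category dummies plus correction terms built from the stage-one probabilities, in the spirit of Dubin-McFadden selection correction. Data and coefficients are simulated.

```python
# Simplified multinomial-logit two-step sketch on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 2000
z = rng.normal(size=(n, 2))                            # covariates (e.g. age, education)
utilities = z @ rng.normal(size=(2, 4))                # 4 drinking-status categories
cat = (utilities + rng.gumbel(size=(n, 4))).argmax(1)  # multinomial-logit choice
income = 450 + 60 * (cat >= 2) + rng.normal(0, 50, n)  # moderate/heavy earn more

stage1 = sm.MNLogit(cat, sm.add_constant(z)).fit(disp=0)
probs = stage1.predict(sm.add_constant(z))             # n x 4 predicted probabilities

# Stage 2: income on category dummies plus log-probability correction terms.
dummies = np.eye(4)[cat][:, 1:]                        # base category dropped
correction = np.log(np.clip(probs, 1e-9, 1.0))[:, 1:]
X = sm.add_constant(np.column_stack([dummies, correction]))
print(sm.OLS(income, X).fit().params[:4])              # intercept + category effects
```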
Abstract:
This thesis studies the cross-validation criterion for choosing among small area models. The study is limited to unit-level small area models. The basic unit-level small area model was introduced by Battese, Harter and Fuller in 1988. It is a linear mixed regression model with a random intercept, and it involves several parameters: the fixed-effect parameter β, the random component, and the variances of the residual error. The Battese et al. model is used, in a survey setting, to predict the mean of a variable of interest y in each small area using an administrative auxiliary variable x known for the whole population. The estimation method uses a normal distribution to model the residual component of the model. Allowing a general residual dependence, that is, one other than the normal law, yields a more flexible methodology. This generalization leads to a new class of exchangeable models: the generalization lies in the modeling of the residual dependence, which can be either normal (as in the Battese et al. model) or non-normal. The objective is to determine the small area parameters as precisely as possible, which hinges on choosing the right residual dependence to use in the model. The cross-validation criterion is studied for this purpose.
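In the notation above, the Battese-Harter-Fuller unit-level model takes the standard form

$$ y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_i + e_{ij}, \qquad u_i \sim \mathcal{N}(0,\sigma_u^2), \quad e_{ij} \sim \mathcal{N}(0,\sigma_e^2), $$

for unit j in small area i; the generalization studied here replaces the normal law on the residual components with a more general exchangeable dependence, and cross-validation is used to choose among the candidate dependences.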
Abstract:
With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the author(s) of a biomedical publication, or implicit, such as the positive or negative sentiment that an author had when she wrote a product review; there may also be complex context, such as the social network of the authors. Many applications require analysis of topic patterns over different contexts. For instance, analysis of search logs in the context of the user can reveal how we can improve the quality of a search engine by optimizing the search results according to particular users; analysis of customer reviews in the context of positive and negative sentiments can help the user summarize public opinions about a product; analysis of blogs or scientific publications in the context of a social network can facilitate discovery of more meaningful topical communities. Since context information significantly affects the choices of topics and language made by authors, it is in general very important to incorporate it into analyzing and mining text data. Modeling the context in text and discovering contextual patterns of language units and topics from text, a general task which we refer to as Contextual Text Mining, has widespread applications in text mining. In this thesis, we provide a novel and systematic study of contextual text mining, which is a new paradigm of text mining treating context information as the "first-class citizen." We formally define the problem of contextual text mining and its basic tasks, and propose a general framework for contextual text mining based on generative modeling of text. This conceptual framework provides general guidance on text mining problems with context information and can be instantiated into many real tasks, including the general problem of contextual topic analysis. We formally present a functional framework for contextual topic analysis, with a general contextual topic model and its various versions, which can effectively solve text mining problems in many real-world applications. We further introduce general components of contextual topic analysis, by adding priors to contextual topic models to incorporate prior knowledge, regularizing contextual topic models with the dependency structure of context, and postprocessing contextual patterns to extract refined patterns. The refinements on the general contextual topic model naturally lead to a variety of probabilistic models which incorporate different types of context and various assumptions and constraints. These special versions of the contextual topic model prove effective in a variety of real applications involving topics and explicit contexts, implicit contexts, and complex contexts. We then introduce a postprocessing procedure for contextual patterns, by generating meaningful labels for multinomial context models. This method provides a general way to interpret text mining results for real users. By applying contextual text mining in the "context" of other text information management tasks, including ad hoc text retrieval and web search, we further prove the effectiveness of contextual text mining techniques in a quantitative way with large-scale datasets.
The framework of contextual text mining not only unifies many explorations of text analysis with context information, but also opens up many new possibilities for future research directions in text mining.
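A toy instance of the context-conditioned topic idea (two shared topics whose mixing weights vary by context, fitted by EM; this is an illustrative reduction, not the thesis's general model):

```python
# Toy contextual topic mixture: shared topic-word distributions, but each
# context (e.g. a time period) has its own topic mixing weights, fit by EM.
import numpy as np

rng = np.random.default_rng(3)
V, K, C = 30, 2, 2                        # vocabulary, topics, contexts
counts = rng.integers(0, 5, size=(C, 200, V)).astype(float)   # docs per context

phi = rng.dirichlet(np.ones(V), size=K)   # topic-word distributions (shared)
pi = np.full((C, K), 1.0 / K)             # context-specific topic weights

for _ in range(50):                       # EM iterations
    phi_new = np.zeros_like(phi)
    for c in range(C):
        # E-step: topic responsibilities for each document in context c.
        log_r = np.log(pi[c]) + counts[c] @ np.log(phi).T
        r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step pieces: context weights and shared topic-word counts.
        pi[c] = r.mean(axis=0)
        phi_new += r.T @ counts[c]
    phi = phi_new / phi_new.sum(axis=1, keepdims=True)

print("context-specific topic weights:\n", np.round(pi, 3))
```

The key structural point, topics shared globally while their prevalence is conditioned on context, is what distinguishes contextual topic models from a plain mixture fit separately per context.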
Abstract:
In this work, an analytical solution is obtained for the advection-diffusion equation applied to pollutant dispersion problems in rivers and channels. To this end, the one-dimensional and two-dimensional transient cases with constant diffusion coefficients and velocities are considered. The approach used to solve this problem is the method of separation of variables. The resulting models were simulated using MatLab, and the results of the numerical simulations are presented in graphical form. The results of some of the numerical simulations exist in the literature and could be compared; the proposed model proved consistent with the data considered. For the other simulations, no comparisons were found in the literature; however, these problems, governed by partial differential equations, even linear ones, do not admit easy analytical solutions, and many of them represent important problems of mathematics and physics with diverse applications in engineering. Thus, the availability of a larger number of test problems is of great importance for evaluating the performance of ever more effective numerical formulations, since analytical solutions offer a more reliable basis for comparing results.
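The one-dimensional transient problem referred to above is the constant-coefficient advection-diffusion equation

$$ \frac{\partial c}{\partial t} + u\,\frac{\partial c}{\partial x} = D\,\frac{\partial^{2} c}{\partial x^{2}}, $$

where c is the pollutant concentration, u the constant velocity and D the constant diffusion coefficient. Separation of variables applies after the standard substitution $c(x,t)=\phi(x,t)\,\exp\!\big(\tfrac{u x}{2D}-\tfrac{u^{2} t}{4D}\big)$, which removes the advective term and reduces the problem to the pure diffusion equation $\phi_t = D\,\phi_{xx}$.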
Abstract:
The predictive capabilities of computational fire models have improved in recent years such that models have become an integral part of many research efforts. Models improve the understanding of the fire risk of materials and may decrease the number of expensive experiments required to assess the fire hazard of a specific material or designed space. A critical component of a predictive fire model is the pyrolysis sub-model that provides a mathematical representation of the rate of gaseous fuel production from condensed phase fuels given a heat flux incident to the material surface. The modern, comprehensive pyrolysis sub-models that are common today require the definition of many model parameters to accurately represent the physical description of materials that are ubiquitous in the built environment. Coupled with the increase in the number of parameters required to accurately represent the pyrolysis of materials is the increasing prevalence in the built environment of engineered composite materials that have never been measured or modeled. The motivation behind this project is to develop a systematic, generalized methodology to determine the requisite parameters to generate pyrolysis models with predictive capabilities for layered composite materials that are common in industrial and commercial applications. This methodology has been applied to four common composites in this work that exhibit a range of material structures and component materials. The methodology utilizes a multi-scale experimental approach in which each test is designed to isolate and determine a specific subset of the parameters required to define a material in the model. Data collected in simultaneous thermogravimetry and differential scanning calorimetry experiments were analyzed to determine the reaction kinetics, thermodynamic properties, and energetics of decomposition for each component of the composite. Data collected in microscale combustion calorimetry experiments were analyzed to determine the heats of complete combustion of the volatiles produced in each reaction. Inverse analyses were conducted on sample temperature data collected in bench-scale tests to determine the thermal transport parameters of each component through degradation. Simulations of quasi-one-dimensional bench-scale gasification tests generated from the resultant models using the ThermaKin modeling environment were compared to experimental data to independently validate the models.
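The decomposition kinetics extracted from the thermogravimetric data are typically cast in Arrhenius form (a standard parameterization, assumed here rather than quoted from this work):

$$ \frac{d\alpha}{dt} = A\,e^{-E_a/RT}\,(1-\alpha)^{n}, $$

where α is the extent of conversion, A the pre-exponential factor, E_a the activation energy and n the reaction order; each component reaction of a layered composite contributes one such term to the pyrolysis sub-model.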