961 resultados para Probabilistic choice models


Relevância:

40.00% 40.00%

Publicador:

Resumo:

In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, a significant obstacle to knowledge graph construction is the unreliability of the extracted information, due to noise and ambiguity in the underlying data or errors made by the extraction system and the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs using 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied on a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

For climate risk management, cumulative distribution functions (CDFs) are an important source of information. They are ideally suited to compare probabilistic forecasts of primary (e.g. rainfall) or secondary data (e.g. crop yields). Summarised as CDFs, such forecasts allow an easy quantitative assessment of possible, alternative actions. Although the degree of uncertainty associated with CDF estimation could influence decisions, such information is rarely provided. Hence, we propose Cox-type regression models (CRMs) as a statistical framework for making inferences on CDFs in climate science. CRMs were designed for modelling probability distributions rather than just mean or median values. This makes the approach appealing for risk assessments where probabilities of extremes are often more informative than central tendency measures. CRMs are semi-parametric approaches originally designed for modelling risks arising from time-to-event data. Here we extend this original concept beyond time-dependent measures to other variables of interest. We also provide tools for estimating CDFs and surrounding uncertainty envelopes from empirical data. These statistical techniques intrinsically account for non-stationarities in time series that might be the result of climate change. This feature makes CRMs attractive candidates to investigate the feasibility of developing rigorous global circulation model (GCM)-CRM interfaces for provision of user-relevant forecasts. To demonstrate the applicability of CRMs, we present two examples for El Ni ? no/Southern Oscillation (ENSO)-based forecasts: the onset date of the wet season (Cairns, Australia) and total wet season rainfall (Quixeramobim, Brazil). This study emphasises the methodological aspects of CRMs rather than discussing merits or limitations of the ENSO-based predictors.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In restructured power systems, generation and commercialization activities became market activities, while transmission and distribution activities continue as regulated monopolies. As a result, the adequacy of transmission network should be evaluated independent of generation system. After introducing the constrained fuzzy power flow (CFPF) as a suitable tool to quantify the adequacy of transmission network to satisfy 'reasonable demands for the transmission of electricity' (as stated, for instance, at European Directive 2009/72/EC), the aim is now showing how this approach can be used in conjunction with probabilistic criteria in security analysis. In classical security analysis models of power systems are considered the composite system (generation plus transmission). The state of system components is usually modeled with probabilities and loads (and generation) are modeled by crisp numbers, probability distributions or fuzzy numbers. In the case of CFPF the component’s failure of the transmission network have been investigated. In this framework, probabilistic methods are used for failures modeling of the transmission system components and possibility models are used to deal with 'reasonable demands'. The enhanced version of the CFPF model is applied to an illustrative case.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Smart State initiative requires both improved education and training, panicularly in technical fields, plus entrepreneurship to commercialise new ideas. In this study, we propose an entrepreneurial intentions model as a guide to examine the educational choices and entrepreneurial intentions of first-year University students, focusing on the effect of role models. A survey of over 1000 first-year University students revealed that the most enterprising students were choosing to study in the disciplines of information technology and business, economics and law, or selecting dualdegree programs that include business. The role models most often identified for their choice of field of study were parents,followed by teachers and peers, with females identifying more role models than males. For entrepreneurship, students' role models were parents andpeers,followed by famous persons and teachers. Males and females identified similar numbers of role models, but males found starting a business more desirable and more feasible, and reponed higher entrepreneurial intention. The implications of these findings for Smart State policy are discussed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Smart State initiative requires both improved education and training, particularly in technical fields, plus entrepreneurship to commercialise new ideas. In this study, we propose an entrepreneurial intentions model as a guide to examine the educational choices and entrepreneurial intentions of first-year University students, focusing on the effect of role models. A survey of over 1000 first -year University students revealed that the most enterprising students were choosing to study in the disciplines of information technology and business, economics and law, or selecting dual degree programs that include business. The role models most often identified for their choice of field of study were parents, followed by teachers and peers, wish females identifying more role models than males. For entrepreneurship, students' role models were parents and peers, followed by famous persons and teachers. Males and females identified similar numbers of role models, but males found starting a business more desirable and more feasible, and reported higher entrepreneurial intention. The implications of these findings for Smart State policy are discussed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Studies have examined the associations between cancers and circulating 25-hydroxyvitamin D [25(OH)D], but little is known about the impact of different laboratory practices on 25(OH)D concentrations. We examined the potential impact of delayed blood centrifuging, choice of collection tube, and type of assay on 25(OH)D concentrations. Blood samples from 20 healthy volunteers underwent alternative laboratory procedures: four centrifuging times (2, 24, 72, and 96 h after blood draw); three types of collection tubes (red top serum tube, two different plasma anticoagulant tubes containing heparin or EDTA); and two types of assays (DiaSorin radioimmunoassay [RIA] and chemiluminescence immunoassay [CLIA/LIAISON®]). Log-transformed 25(OH)D concentrations were analyzed using the generalized estimating equations (GEE) linear regression models. We found no difference in 25(OH)D concentrations by centrifuging times or type of assay. There was some indication of a difference in 25(OH)D concentrations by tube type in CLIA/LIAISON®-assayed samples, with concentrations in heparinized plasma (geometric mean, 16.1 ng ml−1) higher than those in serum (geometric mean, 15.3 ng ml−1) (p = 0.01), but the difference was significant only after substantial centrifuging delays (96 h). Our study suggests no necessity for requiring immediate processing of blood samples after collection or for the choice of a tube type or assay.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

To navigate successfully in a novel environment a robot needs to be able to Simultaneously Localize And Map (SLAM) its surroundings. The most successful solutions to this problem so far have involved probabilistic algorithms, but there has been much promising work involving systems based on the workings of part of the rodent brain known as the hippocampus. In this paper we present a biologically plausible system called RatSLAM that uses competitive attractor networks to carry out SLAM in a probabilistic manner. The system can effectively perform parameter self-calibration and SLAM in one dimension. Tests in two dimensional environments revealed the inability of the RatSLAM system to maintain multiple pose hypotheses in the face of ambiguous visual input. These results support recent rat experimentation that suggest current competitive attractor models are not a complete solution to the hippocampal modelling problem.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We have developed a new experimental method for interrogating statistical theories of music perception by implementing these theories as generative music algorithms. We call this method Generation in Context. This method differs from most experimental techniques in music perception in that it incorporates aesthetic judgments. Generation In Context is designed to measure percepts for which the musical context is suspected to play an important role. In particular the method is suitable for the study of perceptual parameters which are temporally dynamic. We outline a use of this approach to investigate David Temperley’s (2007) probabilistic melody model, and provide some provisional insights as to what is revealed about the model. We suggest that Temperley’s model could be improved by dynamically modulating the probability distributions according to the changing musical context.

Relevância:

30.00% 30.00%

Publicador:

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This dissertation is primarily an applied statistical modelling investigation, motivated by a case study comprising real data and real questions. Theoretical questions on modelling and computation of normalization constants arose from pursuit of these data analytic questions. The essence of the thesis can be described as follows. Consider binary data observed on a two-dimensional lattice. A common problem with such data is the ambiguity of zeroes recorded. These may represent zero response given some threshold (presence) or that the threshold has not been triggered (absence). Suppose that the researcher wishes to estimate the effects of covariates on the binary responses, whilst taking into account underlying spatial variation, which is itself of some interest. This situation arises in many contexts and the dingo, cypress and toad case studies described in the motivation chapter are examples of this. Two main approaches to modelling and inference are investigated in this thesis. The first is frequentist and based on generalized linear models, with spatial variation modelled by using a block structure or by smoothing the residuals spatially. The EM algorithm can be used to obtain point estimates, coupled with bootstrapping or asymptotic MLE estimates for standard errors. The second approach is Bayesian and based on a three- or four-tier hierarchical model, comprising a logistic regression with covariates for the data layer, a binary Markov Random field (MRF) for the underlying spatial process, and suitable priors for parameters in these main models. The three-parameter autologistic model is a particular MRF of interest. Markov chain Monte Carlo (MCMC) methods comprising hybrid Metropolis/Gibbs samplers is suitable for computation in this situation. Model performance can be gauged by MCMC diagnostics. Model choice can be assessed by incorporating another tier in the modelling hierarchy. This requires evaluation of a normalization constant, a notoriously difficult problem. Difficulty with estimating the normalization constant for the MRF can be overcome by using a path integral approach, although this is a highly computationally intensive method. Different methods of estimating ratios of normalization constants (N Cs) are investigated, including importance sampling Monte Carlo (ISMC), dependent Monte Carlo based on MCMC simulations (MCMC), and reverse logistic regression (RLR). I develop an idea present though not fully developed in the literature, and propose the Integrated mean canonical statistic (IMCS) method for estimating log NC ratios for binary MRFs. The IMCS method falls within the framework of the newly identified path sampling methods of Gelman & Meng (1998) and outperforms ISMC, MCMC and RLR. It also does not rely on simplifying assumptions, such as ignoring spatio-temporal dependence in the process. A thorough investigation is made of the application of IMCS to the three-parameter Autologistic model. This work introduces background computations required for the full implementation of the four-tier model in Chapter 7. Two different extensions of the three-tier model to a four-tier version are investigated. The first extension incorporates temporal dependence in the underlying spatio-temporal process. The second extensions allows the successes and failures in the data layer to depend on time. The MCMC computational method is extended to incorporate the extra layer. A major contribution of the thesis is the development of a fully Bayesian approach to inference for these hierarchical models for the first time. Note: The author of this thesis has agreed to make it open access but invites people downloading the thesis to send her an email via the 'Contact Author' function.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Typical daily decision-making process of individuals regarding use of transport system involves mainly three types of decisions: mode choice, departure time choice and route choice. This paper focuses on the mode and departure time choice processes and studies different model specifications for a combined mode and departure time choice model. The paper compares different sets of explanatory variables as well as different model structures to capture the correlation among alternatives and taste variations among the commuters. The main hypothesis tested in this paper is that departure time alternatives are also correlated by the amount of delay. Correlation among different alternatives is confirmed by analyzing different nesting structures as well as error component formulations. Random coefficient logit models confirm the presence of the random taste heterogeneity across commuters. Mixed nested logit models are estimated to jointly account for the random taste heterogeneity and the correlation among different alternatives. Results indicate that accounting for the random taste heterogeneity as well as inter-alternative correlation improves the model performance.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

System analysis within the traction power system is vital to the design and operation of an electrified railway. Loads in traction power systems are often characterised by their mobility, wide range of power variations, regeneration and service dependence. In addition, the feeding systems may take different forms in AC electrified railways. Comprehensive system studies are usually carried out by computer simulation. A number of traction power simulators have been available and they allow calculation of electrical interaction among trains and deterministic solutions of the power network. In the paper, a different approach is presented to enable load-flow analysis on various feeding systems and service demands in AC railways by adopting probabilistic techniques. It is intended to provide a different viewpoint to the load condition. Simulation results are given to verify the probabilistic-load-flow models.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Governments around the world are facing the challenge of responding to increased expectations by their customers with regard to public service delivery. Citizens, for example, expect governments to provide better and more efficient electronic services on the Web in an integrated way. Online portals have become the approach of choice in online service delivery to meet these requirements and become more customer-focussed. This study describes and analyses existing variants of online service delivery models based upon an empirical study and provides valuable insights for researchers and practitioners in government. For this study, we have conducted interviews with senior management representatives from five international governments. Based on our findings, we distinguish three different classes of service delivery models. We describe and characterise each of these models in detail and provide an in-depth discussion of the strengths and weaknesses of these approaches.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The aim of this paper is to demonstrate the validity of using Gaussian mixture models (GMM) for representing probabilistic distributions in a decentralised data fusion (DDF) framework. GMMs are a powerful and compact stochastic representation allowing efficient communication of feature properties in large scale decentralised sensor networks. It will be shown that GMMs provide a basis for analytical solutions to the update and prediction operations for general Bayesian filtering. Furthermore, a variant on the Covariance Intersect algorithm for Gaussian mixtures will be presented ensuring a conservative update for the fusion of correlated information between two nodes in the network. In addition, purely visual sensory data will be used to show that decentralised data fusion and tracking of non-Gaussian states observed by multiple autonomous vehicles is feasible.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Comparison are required to understand transport benefits of Transit Oriented Developments (TODs). Mode shares of TOD users need to be understood. Accurate travel demand models for TODs are needed.