991 results for online inference
Abstract:
Social media data are produced continuously by a large and uncontrolled number of users. The dynamic nature of such data requires the sentiment and topic analysis model to also be dynamically updated, capturing the most recent language use of sentiments and topics in text. We propose a dynamic Joint Sentiment-Topic model (dJST) which allows the detection and tracking of views of current and recurrent interests and shifts in topic and sentiment. Both topic and sentiment dynamics are captured by assuming that the current sentiment-topic-specific word distributions are generated according to the word distributions at previous epochs. We study three different ways of accounting for such dependency information: (1) a sliding window, where the current sentiment-topic word distributions depend on the previous sentiment-topic-specific word distributions in the last S epochs; (2) a skip model, where historical sentiment-topic word distributions are considered by skipping some epochs in between; and (3) a multiscale model, where previous long- and short-timescale distributions are taken into consideration. We derive efficient online inference procedures to sequentially update the model with newly arrived data and show the effectiveness of our proposed model on the Mozilla add-on reviews crawled between 2007 and 2011. © 2013 ACM 2157-6904/2013/12-ART5 $15.00.
Abstract:
In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, a significant obstacle to knowledge graph construction is the unreliability of the extracted information, due to noise and ambiguity in the underlying data or errors made by the extraction system and the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs using 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. 
I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied to a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.
Abstract:
Bayesian inference of time-frequency representations has thus far been limited to offline analysis of signals, using a smoothing-spline-based model of the time-frequency plane. In this paper we introduce a new framework that allows the routine use of Bayesian inference for online estimation of the time-varying spectral density of a locally stationary Gaussian process. The core of our approach is the use of a likelihood inspired by a local Whittle approximation. This choice, along with the use of a recursive algorithm for non-parametric estimation of the local spectral density, permits the use of a particle filter for estimating the time-varying spectral density online. We provide demonstrations of the algorithm through tracking chirps and the analysis of musical data.
Abstract:
Detecting anomalies in online social networks is an important task because it helps reveal useful and interesting information about user behavior on the network. This paper proposes a rule-based hybrid method that uses graph theory, fuzzy clustering, and fuzzy rules to model the user relationships inherent in online social networks and to identify anomalies. Fuzzy C-Means clustering is used to cluster the data, and a fuzzy inference engine is used to generate rules based on the cluster behavior. The proposed method achieves improved accuracy in identifying anomalies compared with existing methods.
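As an illustration of the Fuzzy C-Means step named in this abstract, the following is a minimal sketch of the standard algorithm, not the paper's implementation. The function names, the fuzzifier m = 2, the fixed iteration count, and the toy points are all assumptions chosen for the example; the paper's features, graph model, and fuzzy rules are not reproduced here.

```python
import math


def memberships_for(points, centers, m):
    """Standard FCM membership update: each row sums to 1 across clusters."""
    power = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        # Clamp distances so a point sitting exactly on a center does not divide by zero.
        dists = [max(math.dist(p, c), 1e-12) for c in centers]
        rows.append([1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists])
    return rows


def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of iterations.

    `centers` are caller-supplied starting guesses (an assumption of this sketch).
    Returns the final centers and the memberships computed against them.
    """
    dim = len(points[0])
    for _ in range(iterations):
        u = memberships_for(points, centers, m)
        # Each center is the mean of all points weighted by membership ** m.
        centers = [
            tuple(
                sum((row[j] ** m) * p[d] for row, p in zip(u, points))
                / sum(row[j] ** m for row in u)
                for d in range(dim)
            )
            for j in range(len(centers))
        ]
    return centers, memberships_for(points, centers, m)


if __name__ == "__main__":
    toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    final_centers, u = fuzzy_c_means(toy, [(1.0, 1.0), (9.0, 9.0)])
    print(final_centers)
    print(u[0])
```

Unlike hard clustering, each point keeps a graded membership in every cluster, which is the kind of quantity a fuzzy rule base can then reason over.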
Abstract:
We propose an architecture for a rule-based online management system (RuleOMS). In many domains, stakeholders maintain databases of their core business information and must take decisions or create reports according to guidelines, policies, or regulations. To address this issue we propose the integration of databases, in particular relational databases, with a logic reasoner and rule engine. We argue that defeasible logic is an appropriate formalism to model rules, in particular when the rules are meant to model regulations. The resulting RuleOMS provides an efficient and flexible solution to the problem at hand using defeasible inference. A case study of an online child care management system is used to illustrate the proposed architecture.
Abstract:
Changepoints are abrupt variations in the generative parameters of a data sequence. Online detection of changepoints is useful in modelling and prediction of time series in application areas such as finance, biometrics, and robotics. While frequentist methods have yielded online filtering and prediction techniques, most Bayesian papers have focused on the retrospective segmentation problem. Here we examine the case where the model parameters before and after the changepoint are independent and we derive an online algorithm for exact inference of the most recent changepoint. We compute the probability distribution of the length of the current "run," or time since the last changepoint, using a simple message-passing algorithm. Our implementation is highly modular so that the algorithm may be applied to a variety of types of data. We illustrate this modularity by demonstrating the algorithm on three different real-world data sets.
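The run-length recursion described in this abstract can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes Gaussian observations with known noise variance, a Normal prior on each segment's mean, and a constant changepoint probability per step. The names `run_length_posteriors`, `hazard`, `prior_mean`, `prior_var`, and `noise_var` are hypothetical, as is the toy data.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def run_length_posteriors(data, hazard=0.01, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Return, after each observation, the posterior over the current run length.

    Entry r of each returned list is the probability that the last r
    observations belong to the current segment.
    """
    run_probs = [1.0]        # before any data, the run length is 0
    means = [prior_mean]     # posterior mean of the segment mean, per run length
    variances = [prior_var]  # posterior variance of the segment mean, per run length
    history = []
    for x in data:
        # Predictive density of x under each run-length hypothesis.
        pred = [normal_pdf(x, mu, v + noise_var) for mu, v in zip(means, variances)]
        # Either the run grows by one, or a changepoint resets it to 0.
        growth = [p * q * (1.0 - hazard) for p, q in zip(run_probs, pred)]
        change = sum(p * q * hazard for p, q in zip(run_probs, pred))
        joint = [change] + growth
        total = sum(joint)
        run_probs = [j / total for j in joint]
        # Conjugate Normal update per hypothesis; a fresh run starts from the prior,
        # which encodes independence of parameters across changepoints.
        new_means, new_vars = [prior_mean], [prior_var]
        for mu, v in zip(means, variances):
            post_var = 1.0 / (1.0 / v + 1.0 / noise_var)
            new_means.append(post_var * (mu / v + x / noise_var))
            new_vars.append(post_var)
        means, variances = new_means, new_vars
        history.append(run_probs)
    return history


if __name__ == "__main__":
    series = [0.0] * 30 + [8.0] * 30   # toy data with one jump in the mean
    posteriors = run_length_posteriors(series)
    final = posteriors[-1]
    print(max(range(len(final)), key=final.__getitem__))
```

Swapping in a different conjugate model only changes the predictive density and the parameter update, which is one way to read the modularity the abstract mentions.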
Abstract:
The limit order book of an exchange represents an information store of market participants' future aims, and for many traders the information held in this store is of interest. However, information loss occurs between orders being entered into the exchange and limit order book data being sent out. We present an online algorithm which carries out Bayesian inference to replace information lost at the level of the exchange server, and apply our proof-of-concept algorithm to real historical data from some of the world's most liquid futures contracts as traded on the CME GLOBEX, EUREX and NYSE Liffe exchanges. © 2013 Taylor & Francis.
Abstract:
Choosing the right or the best option is often a demanding and challenging task for the user (e.g., a customer in an online retailer) when there are many available alternatives. In fact, the user rarely knows which offering will provide the highest value. To reduce the complexity of the choice process, automated recommender systems generate personalized recommendations. These recommendations take into account the preferences collected from the user in an explicit (e.g., letting users express their opinion about items) or implicit (e.g., studying some behavioral features) way. Such systems are widespread; research indicates that they increase the customers' satisfaction and lead to higher sales. Preference handling is one of the core issues in the design of every recommender system. This kind of system often aims at guiding users in a personalized way to interesting or useful options in a large space of possible options. Therefore, it is important for them to catch and model the user's preferences as accurately as possible. In this thesis, we develop a comparative preference-based user model to represent the user's preferences in conversational recommender systems. This type of user model allows the recommender system to capture several preference nuances from the user's feedback. We show that, when applied to conversational recommender systems, the comparative preference-based model is able to guide the user towards the best option while the system is interacting with her. We empirically test and validate the suitability and the practical computational aspects of the comparative preference-based user model and the related preference relations by comparing them to a sum of weights-based user model and the related preference relations. 
Product configuration, scheduling a meeting and the construction of autonomous agents are among several artificial intelligence tasks that involve a process of constrained optimization, that is, optimization of behavior or options subject to given constraints with regard to a set of preferences. When solving a constrained optimization problem, pruning techniques, such as the branch and bound technique, aim to direct the search towards the best assignments, thus allowing the bounding functions to prune more branches in the search tree. Several constrained optimization problems may exhibit dominance relations. These dominance relations can be particularly useful in constrained optimization problems as they can suggest new ways (rules) of pruning non-optimal solutions. Such pruning methods can achieve dramatic reductions in the search space while looking for optimal solutions. A number of constrained optimization problems can model the user's preferences using comparative preferences. In this thesis, we develop a set of pruning rules used in the branch and bound technique to efficiently solve this kind of optimization problem. More specifically, we show how to generate newly defined pruning rules from a dominance algorithm that refers to a set of comparative preferences. These rules include pruning approaches (and combinations of them) which can drastically prune the search space. They mainly reduce the number of (expensive) pairwise comparisons performed during the search while guiding constrained optimization algorithms to find optimal solutions. Our experimental results show that the pruning rules that we have developed and their different combinations have varying impact on the performance of the branch and bound technique.
Abstract:
Book review of: Chance Encounters: A First Course in Data Analysis and Inference, by Christopher J. Wild and George A. F. Seber, 2000, John Wiley & Sons Inc. Hardbound, xviii + 612 pp., ISBN 0-471-32936-3.
Abstract:
For many learning tasks the duration of the data collection can be greater than the time scale for changes of the underlying data distribution. The question we ask is how to include the information that data are aging. Ad hoc methods to achieve this include the use of validity windows that prevent the learning machine from making inferences based on old data. This introduces the problem of how to define the size of validity windows. In this brief, a new adaptive Bayesian inspired algorithm is presented for learning drifting concepts. It uses the analogy of validity windows in an adaptive Bayesian way to incorporate changes in the data distribution over time. We apply a theoretical approach based on information geometry to the classification problem and measure its performance in simulations. The uncertainty about the appropriate size of the memory windows is dealt with in a Bayesian manner by integrating over the distribution of the adaptive window size. Thus, the posterior distribution of the weights may develop algebraic tails. The learning algorithm results from tracking the mean and variance of the posterior distribution of the weights. It was found that the algebraic tails of this posterior distribution give the learning algorithm the ability to cope with an evolving environment by permitting the escape from local traps.
Abstract:
Operating industrial processes is becoming more complex each day, and one of the factors that contribute to this growth in complexity is the integration of new technologies and smart solutions employed in industry, such as decision support systems. In this regard, this dissertation aims to develop a decision support system based on a computational tool called an expert system. The main goal is to make operation more reliable and secure while maximizing the amount of information relevant to each situation, by using an expert system based on rules designed for a particular area of expertise. A high-level environment is proposed for modeling such rules; it makes creating and manipulating rules easier through visual programming. Despite its wide range of possible applications, this dissertation focuses only on real-time filtering of alarms during operation, validated in a case study based on a real scenario that occurred in an industrial plant of an oil and gas refinery.
Abstract:
In this work, we propose the Seasonal Dynamic Factor Analysis (SeaDFA), an extension of Nonstationary Dynamic Factor Analysis, through which one can deal with dimensionality reduction in vectors of time series in such a way that both common and specific components are extracted. Furthermore, common factors are able to capture not only regular dynamics (stationary or not) but also seasonal ones, by means of the common factors following a multiplicative seasonal VARIMA(p, d, q) × (P, D, Q)s model. Additionally, a bootstrap procedure that does not need a backward representation of the model is proposed to be able to make inference for all the parameters in the model. A bootstrap scheme developed for forecasting includes uncertainty due to parameter estimation, allowing enhanced coverage of forecasting intervals. A challenging application is provided. The new proposed model and a bootstrap scheme are applied to an innovative subject in electricity markets: the computation of long-term point forecasts and prediction intervals of electricity prices. Several appendices with technical details, an illustrative example, and an additional table are available online as Supplementary Materials.
Abstract:
Master's thesis, Bioinformatics and Computational Biology (Bioinformatics), Universidade de Lisboa, Faculdade de Ciências, 2016