976 results for Bayesian models


Relevance: 40.00%

Abstract:

We propose and analyze two different Bayesian online algorithms for learning in discrete Hidden Markov Models and compare their performance with that of the already known Baldi-Chauvin algorithm. Using the Kullback-Leibler divergence as a measure of generalization, we draw learning curves for these algorithms in simplified situations.
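As a minimal sketch of the generalization measure named above, the following computes the Kullback-Leibler divergence between two discrete distributions. The abstract does not say which distributions are compared; the "teacher" and "student" emission probabilities below are hypothetical values chosen only for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q), in nats, for discrete distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # the term 0 * log(0 / q) is taken as 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


# Hypothetical emission probabilities of one hidden state (assumed values).
true_emission = [0.7, 0.2, 0.1]     # "teacher" model
learned_emission = [0.6, 0.3, 0.1]  # "student" estimate after some examples
gap = kl_divergence(true_emission, learned_emission)
```

Plotting a quantity like `gap` against the number of observed examples would give one point of a learning curve in this sense.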

Relevance: 40.00%

Abstract:

This doctoral thesis sets out to understand, analyze and, above all, model the statistical behavior of financial time series. In this regard, the models that best capture the special characteristics of these series are conditional heteroskedasticity models, in discrete time if the time intervals at which the data are collected allow it, and in continuous time if daily or intraday data are available. To this end, the thesis proposes several Bayesian estimators for the parameters of GARCH models in discrete time (Bollerslev (1986)) and COGARCH models in continuous time (Kluppelberg et al. (2004)). Chapter 1 introduces the characteristics of financial series and presents the ARCH, GARCH and COGARCH models together with their main properties. Mandelbrot (1963) pointed out that financial series are not stationary and that their increments show no autocorrelation, although their squares are correlated. He also noted that their volatility is not constant and that volatility clusters appear. He observed the lack of normality of financial series, due mainly to their leptokurtic behavior, and also highlighted the seasonal effects the series exhibit, analyzing how they are affected by the time of year or the day of the week. Later, Black (1976) completed the list of special characteristics by adding the so-called leverage effects, which relate to how positive and negative fluctuations in asset prices affect the volatility of the series differently.
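To make the discrete-time model concrete, here is a minimal sketch that simulates a GARCH(1,1) series with Gaussian innovations. It is not one of the thesis's Bayesian estimators; the parameter values and the helper `lag1_autocorr` are assumptions chosen for illustration.

```python
import math
import random


def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate n returns from a GARCH(1,1) model with N(0, 1) innovations.

    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]
    r[t]      = sqrt(sigma2[t]) * z[t]
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    rng = random.Random(seed)
    sigma2 = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    returns, variances = [], []
    for _ in range(n):
        r = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        returns.append(r)
        variances.append(sigma2)
        sigma2 = omega + alpha * r * r + beta * sigma2
    return returns, variances


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    mean = sum(xs) / len(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((a - mean) ** 2 for a in xs)
    return num / den


# Assumed parameter values, for illustration only.
returns, variances = simulate_garch11(5000, omega=0.05, alpha=0.10, beta=0.85)
acf_returns = lag1_autocorr(returns)
acf_squares = lag1_autocorr([r * r for r in returns])
```

Comparing `acf_returns` with `acf_squares` on a long simulated path is one way to look at the stylized fact described above: the returns themselves are uncorrelated under the model, while their squares are not.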

Relevance: 40.00%

Abstract:

One of the important problems in machine learning is determining the complexity of the model to be learned. Too much complexity leads to overfitting, which amounts to finding structures that do not really exist in the data, while too little complexity leads to underfitting, meaning that the model is not expressive enough to capture all the structures present in the data. For some probabilistic models, model complexity takes the form of one or more hidden variables whose role is to explain the generative process of the data. Various approaches exist for identifying the appropriate number of hidden variables in a model. This thesis focuses on Bayesian nonparametric methods for determining the number of hidden variables to use as well as their dimensionality. The popularization of Bayesian nonparametric statistics within the machine learning community is fairly recent. Their main appeal is that they offer highly flexible models whose complexity adjusts in proportion to the amount of available data. In recent years, research on Bayesian nonparametric learning methods has focused on three main aspects: the construction of new models, the development of inference algorithms, and applications. This thesis presents our contributions to these three research topics in the context of learning hidden-variable models. First, we introduce the Pitman-Yor process mixture of Gaussians, a model for learning infinite mixtures of Gaussians. We also present an inference algorithm for discovering the hidden components of the model, which we evaluate on two concrete robotics applications. Our results show that the proposed approach outperforms classical learning approaches in both performance and flexibility. Second, we propose the extended cascading Indian buffet process, a model that serves as a prior probability distribution over the space of directed acyclic graphs. In the context of Bayesian networks, this prior makes it possible to identify both the presence of hidden variables and the network structure among them. A Markov chain Monte Carlo inference algorithm is used for evaluation on structure identification and density estimation problems. Finally, we propose the Indian chefs process, a model more general than the extended cascading Indian buffet process, for learning graphs and orders. The advantage of the new model is that it allows connections between observable variables and takes the order of the variables into account. We present a reversible-jump Markov chain Monte Carlo inference algorithm for jointly learning graphs and orders. Evaluation is carried out on density estimation and independence testing problems. This model is the first Bayesian nonparametric model capable of learning Bayesian networks with a completely arbitrary structure.
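The way a Pitman-Yor prior lets the number of mixture components grow with the data can be sketched through its two-parameter Chinese restaurant process representation. This is only the prior over partitions, not the thesis's mixture model or inference algorithm; the function name and parameter values are assumptions.

```python
import random


def pitman_yor_crp(n, discount, concentration, seed=0):
    """Seat n customers by the two-parameter Chinese restaurant process.

    With i customers already seated at K tables, the next customer joins
    table k with probability (size_k - discount) / (i + concentration) and
    opens a new table with probability
    (concentration + discount * K) / (i + concentration).
    Returns the list of table sizes (cluster sizes).
    """
    if not (0.0 <= discount < 1.0 and concentration > -discount):
        raise ValueError("need 0 <= discount < 1 and concentration > -discount")
    rng = random.Random(seed)
    tables = []
    for i in range(n):
        u = rng.random() * (i + concentration)
        acc = 0.0
        chosen = None
        for k, size in enumerate(tables):
            acc += size - discount
            if u < acc:
                chosen = k
                break
        if chosen is None:
            tables.append(1)  # remaining mass: open a new table
        else:
            tables[chosen] += 1
    return tables


# Assumed hyperparameters, for illustration only.
sizes = pitman_yor_crp(1000, discount=0.5, concentration=1.0)
num_clusters = len(sizes)
```

Each table corresponds to one mixture component, so `num_clusters` is not fixed in advance and tends to increase as more customers (data points) arrive.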

Relevance: 30.00%

Abstract:

Often in biomedical research, we deal with continuous (clustered) proportion responses ranging between zero and one that quantify the disease status of the cluster units. The study population might also include relatively disease-free as well as highly diseased subjects, so that proportion values fall in the closed interval [0, 1]. Regression on a variety of parametric densities with support in (0, 1), such as beta regression, can assess important covariate effects. However, such models are inappropriate when zeros and/or ones are present. To address this, we introduce a class of general proportion density and further augment this density with probabilities of zero and one, controlling for the clustering. Our approach is Bayesian and presents a computationally convenient framework amenable to available freeware. Bayesian case-deletion influence diagnostics based on q-divergence measures are automatic from the Markov chain Monte Carlo output. The methodology is illustrated using both simulation studies and an application to a real dataset from a clinical periodontology study.
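One common way to write a zero-and-one augmented density is sketched below. The symbols $p_0$, $p_1$, $g$ and $\theta$ are assumptions introduced here for illustration; the abstract does not give the authors' notation, and the cluster-level random effects are omitted.

```latex
f(y \mid p_0, p_1, \theta) =
\begin{cases}
  p_0, & y = 0, \\
  p_1, & y = 1, \\
  (1 - p_0 - p_1)\, g(y \mid \theta), & 0 < y < 1,
\end{cases}
\qquad p_0, p_1 \ge 0, \quad p_0 + p_1 < 1 .
```

Here $g(\cdot \mid \theta)$ stands for a density on $(0, 1)$, for example a beta density, so that the point masses at zero and one absorb the boundary observations that $g$ alone cannot accommodate.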

Relevance: 30.00%

Abstract:

Survival or longevity is an economically important trait in beef cattle. The main obstacles to its inclusion in selection criteria are the delayed recording of phenotypic data and the high computational demand of including survival in proportional hazard models. Thus, identification of a longevity-correlated trait that could be recorded early in life would be very useful for selection purposes. We estimated the genetic relationship of survival with productive and reproductive traits in Nellore cattle, including weaning weight (WW), post-weaning growth (PWG), muscularity (MUSC), scrotal circumference at 18 months (SC18), and heifer pregnancy (HP). Survival was measured in discrete time intervals and modeled through a sequential threshold model. Five independent bivariate Bayesian analyses were performed, each accounting for cow survival and one of the five productive and reproductive traits. Posterior mean estimates for heritability (standard deviation in parentheses) were 0.55 (0.01) for WW, 0.25 (0.01) for PWG, 0.23 (0.01) for MUSC, and 0.48 (0.01) for SC18. The posterior mean estimates (95% confidence interval in parentheses) for the genetic correlation with survival were 0.16 (0.13-0.19), 0.30 (0.25-0.34), 0.31 (0.25-0.36), 0.07 (0.02-0.12), and 0.82 (0.78-0.86) for WW, PWG, MUSC, SC18, and HP, respectively. Based on the high posterior mean estimates of genetic correlation and heritability (0.54) for HP, the expected progeny difference for HP can be used to select bulls for longevity, as well as for post-weaning gain and muscle score.

Relevance: 30.00%

Abstract:

Creation of cold dark matter (CCDM) can be described macroscopically by a negative pressure, so the mechanism is capable of accelerating the Universe without the need for an additional dark energy component. In this framework, we discuss the evolution of perturbations using a Neo-Newtonian approach in which, unlike in standard Newtonian cosmology, the fluid pressure is taken into account even in the homogeneous and isotropic background equations (Lima, Zanchin, and Brandenberger, MNRAS 291, L1, 1997). The evolution of the density contrast is calculated in the linear approximation and compared with that predicted by the ΛCDM model. The difference between the CCDM and ΛCDM predictions at the perturbative level is quantified using three statistical methods: a simple χ² analysis in the relevant parameter space, Bayesian statistical inference, and a Kolmogorov-Smirnov test. We find that, under certain circumstances, the CCDM scenario analyzed here predicts overall dynamics (including the Hubble flow and the matter fluctuation field) that fully recover those of the traditional cosmic concordance model. Our basic conclusion is that such a reduction of the dark sector provides a viable alternative description to the accelerating ΛCDM cosmology.

Relevance: 30.00%

Abstract:

Background: Bayesian mixing models have allowed for the inclusion of uncertainty and prior information in the analysis of trophic interactions using stable isotopes. Formulating prior distributions is relatively straightforward when incorporating dietary data. However, the use of data that are related, but not directly proportional, to diet (such as prey availability data) is often problematic because such information is not necessarily predictive of diet, and the information required to build a reliable prior distribution for all prey species is often unavailable. Omitting prey availability data impacts the estimation of a predator's diet and introduces the strong assumption of consumer ultrageneralism (where all prey are consumed in equal proportions), particularly when multiple prey have similar isotope values. Methodology: We develop a procedure to incorporate prey availability data into Bayesian mixing models conditional on the similarity of isotope values between two prey. If a pair of prey have similar isotope values (resulting in highly uncertain mixing model results), our model increases the weight of availability data in estimating the contribution of prey to a predator's diet. We test the utility of this method in an intertidal community against independently measured feeding rates. Conclusions: Our results indicate that our weighting procedure increases the accuracy by which consumer diets can be inferred in situations where multiple prey have similar isotope values. This suggests that the exchange of formalism for predictive power is merited, particularly when the relationship between prey availability and a predator's diet cannot be assumed for all species in a system.

Relevance: 30.00%

Abstract:

Motivation: Understanding the patterns of association between polymorphisms at different loci in a population (linkage disequilibrium, LD) is of fundamental importance in various genetic studies. Many coefficients have been proposed for measuring the degree of LD, but they provide only a static view of the current LD structure. Generative models (GMs) were proposed to go beyond these measures, giving not only a description of the actual LD structure but also a tool to help understand the process that generated it. GMs based on coalescent theory have been the most appealing because they link LD to evolutionary factors. Nevertheless, inference and parameter estimation for such models remain computationally challenging. Results: We present a more practical method for building GMs that describe LD. The method is based on learning weighted Bayesian network structures from haplotype data, extracting equivalence structure classes and using them to model LD. Results obtained on public data from the HapMap database show that the method is a promising tool for modeling LD. The associations represented by the learned models are correlated with the traditional LD measure D′. The method was able to represent LD blocks found by standard tools. The granularity of the association blocks and the readability of the models can be controlled in the method. The results suggest that the causality information gained by our method can be useful for assessing the conservability of the genetic markers and for guiding the selection of a subset of representative markers.

Relevance: 30.00%

Abstract:

This paper describes the modeling of a weed infestation risk inference system that implements a collaborative inference scheme based on rules extracted from two Bayesian network classifiers. The first Bayesian classifier infers a categorical variable value for the weed-crop competitiveness, using as input categorical variables for the total density of weeds and the corresponding proportions of narrow-leaved and broad-leaved weeds. The inferred categorical variable values for the weed-crop competitiveness, along with three other categorical variables extracted from estimated maps of weed seed production and weed coverage, are then used as input to a second Bayesian network classifier that infers categorical variable values for the risk of infestation. Weed biomass and yield loss data samples are used to learn the probability relationships among the nodes of the first and second Bayesian classifiers, respectively, in a supervised fashion. For comparison purposes, two types of Bayesian network structures are considered, namely an expert-based Bayesian classifier and a naive Bayes classifier. The inference system focuses on knowledge interpretation by translating a Bayesian classifier into a set of classification rules. The results obtained for the risk inference in a corn-crop field are presented and discussed. (C) 2009 Elsevier Ltd. All rights reserved.

Relevance: 30.00%

Abstract:

In this paper, we present various diagnostic methods for polyhazard models. Polyhazard models are a flexible family for fitting lifetime data. Their main advantage over single-hazard models, such as the Weibull and log-logistic models, is that they accommodate a wide range of nonmonotone hazard shapes, such as bathtub and multimodal curves. Some influence methods, such as the local influence and total local influence of an individual, are derived, analyzed and discussed. The computation of the likelihood displacement and of the normal curvature in the local influence method is also discussed. Finally, an example with real data is given for illustration.
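For readers unfamiliar with the term, a polyhazard model is usually written as a sum of component hazards. The sketch below uses a Weibull form for each component as an assumed example; the symbols $k$, $\alpha_j$ and $\lambda_j$ are introduced here and are not taken from the abstract.

```latex
h(t) = \sum_{j=1}^{k} h_j(t),
\qquad
S(t) = \exp\!\Big(-\sum_{j=1}^{k} H_j(t)\Big) = \prod_{j=1}^{k} S_j(t),
\qquad
h_j(t) = \frac{\alpha_j}{\lambda_j}\Big(\frac{t}{\lambda_j}\Big)^{\alpha_j - 1}.
```

With $k = 2$, one component with $\alpha_1 < 1$ (decreasing hazard) and one with $\alpha_2 > 1$ (increasing hazard) sum to a bathtub-shaped hazard, a shape that a single Weibull hazard cannot produce.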

Relevance: 30.00%

Abstract:

Joint generalized linear models and double generalized linear models (DGLMs) were designed to model outcomes for which the variability can be explained using factors and/or covariates. When such factors operate, the usual normal regression models, which inherently assume constant variance, will under-represent variation in the data and hence may lead to erroneous inferences. For count and proportion data, such noise factors can generate a so-called overdispersion effect, and the use of binomial and Poisson models underestimates the variability and, consequently, incorrectly indicates significant effects. In this manuscript, we propose a DGLM from a Bayesian perspective, focusing on the case of proportion data, where the overdispersion can be modeled using a random effect that depends on some noise factors. The joint posterior density was sampled using Markov chain Monte Carlo algorithms, allowing inferences on the model parameters. An application to a data set on apple tissue culture is presented, for which it is shown that the Bayesian approach is quite feasible, even when limited prior information is available, thereby generating valuable insight for the researcher about the experimental results.

Relevance: 30.00%

Abstract:

A total of 152,145 weekly test-day milk yield records from 7317 first lactations of Holstein cows distributed in 93 herds in southeastern Brazil were analyzed. Test-day milk yields were classified into 44 weekly classes of days in milk (DIM). The contemporary groups were defined as herd-year-week of test-day. The model included direct additive genetic, permanent environmental and residual effects as random effects, and contemporary group and age of cow at calving (as a covariable, with linear and quadratic effects) as fixed effects. Mean trends were modeled by a cubic regression on orthogonal polynomials of DIM. Additive genetic and permanent environmental random effects were estimated by random regression on orthogonal Legendre polynomials. Residual variances were modeled using third- to seventh-order variance functions or a step function with 1, 6, 13, 17 and 44 variance classes. Results from Akaike's information criterion and Schwarz's Bayesian information criterion suggested that a model with a 7th-order Legendre polynomial for the additive effect, a 12th-order polynomial for the permanent environmental effect and a step function with 6 classes for residual variances fitted best. However, a parsimonious model, with a 6th-order Legendre polynomial for additive effects and a 7th-order polynomial for permanent environmental effects, yielded very similar genetic parameter estimates. (C) 2008 Elsevier B.V. All rights reserved.
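As a rough sketch of the covariates used in random regression on Legendre polynomials, the code below standardizes a weekly DIM class to [-1, 1] and evaluates the polynomials with Bonnet's recurrence. The polynomials here are unnormalized, the range of 1 to 44 weeks is an assumed reading of the 44 weekly classes, and whether "order" in the abstract counts the highest degree or the number of coefficients is not stated; the function names are hypothetical.

```python
def standardize_dim(dim, dim_min, dim_max):
    """Map a DIM value from [dim_min, dim_max] onto [-1, 1]."""
    return 2.0 * (dim - dim_min) / (dim_max - dim_min) - 1.0


def legendre_basis(x, degree):
    """Return [P_0(x), ..., P_degree(x)] using Bonnet's recurrence:

    (n + 1) * P_{n+1}(x) = (2n + 1) * x * P_n(x) - n * P_{n-1}(x)
    """
    if degree < 0:
        raise ValueError("degree must be non-negative")
    values = [1.0]
    if degree >= 1:
        values.append(x)
    for n in range(1, degree):
        nxt = ((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1)
        values.append(nxt)
    return values


# Covariate row for week 10 of an assumed 1..44 week range, up to degree 6.
x = standardize_dim(10, 1, 44)
row = legendre_basis(x, 6)
```

Each animal's additive genetic and permanent environmental deviations are then modeled as linear combinations of such rows, one coefficient per polynomial.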

Relevance: 30.00%

Abstract:

This paper examines the investment decisions of 373 large Brazilian firms from 1997 to 2004 in the presence of financial constraints, using panel data. A Bayesian econometric model was used, with ridge regression to handle multicollinearity among the variables in the model. Prior distributions are assumed for the parameters, classifying the model into random or fixed effects. We used a Bayesian approach to estimate the parameters, considering normal and Student t distributions for the error, and assumed that the initial values for the lagged dependent variable are not fixed but generated by a random process. The recursive predictive density criterion was used for model comparisons. Twenty models were tested and the results indicated that multicollinearity does influence the value of the estimated parameters. Controlling for capital intensity, financial constraints are found to be more important for capital-intensive firms, probably due to their lower profitability indexes, higher fixed costs and higher degree of property diversification.

Relevance: 30.00%

Abstract:

We discuss the expectation propagation (EP) algorithm for approximate Bayesian inference using a factorizing posterior approximation. For neural network models, we use a central limit theorem argument to make EP tractable when the number of parameters is large. For two types of models, we show that EP can achieve optimal generalization performance when data are drawn from a simple distribution.

Relevance: 30.00%

Abstract:

This study focuses on the probabilistic modelling of the mechanical properties of prestressing strands, based on data collected from tensile tests carried out at Laboratório Nacional de Engenharia Civil (LNEC), Portugal, for certification purposes and covering a period of about 9 years of production. The strands studied were produced by six manufacturers from four countries, namely Portugal, Spain, Italy and Thailand. The variability of the most important mechanical properties is examined, and the results are compared with the recommendations of the Probabilistic Model Code, as well as with the Eurocodes and earlier studies. The results show a very low variability, which benefits structural safety. Based on those results, probabilistic models for the most important mechanical properties of prestressing strands are proposed.