984 results for Bayesian method
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Practical Bayesian inference depends upon detailed examination of the posterior distribution. When the prior and likelihood are conjugate, this is easily carried out; in general, however, one must resort to numerical approximation. In this paper, our aim is to carry out Bayesian inference, using MAPLE, for a very special data-collecting procedure known as the randomized-response technique. This technique allows researchers to obtain sensitive information while guaranteeing privacy to respondents, and is intended to reduce false responses on sensitive questions. Exact methods and approximations will be compared in terms of accuracy and computational effort.
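As an illustration of the numerical approximation the abstract refers to, the following is a minimal Python sketch, not the paper's MAPLE code. It assumes Warner's randomized-response design, in which a randomizing device selects the sensitive statement with known probability `p_direct` and its complement otherwise, so that the probability of a "yes" is `p_direct * pi + (1 - p_direct) * (1 - pi)`. The design, the Beta prior, the function names and the example counts are all assumptions made for illustration.

```python
import math


def rr_posterior_grid(yes, n, p_direct, a=1.0, b=1.0, grid_size=2000):
    """Grid approximation to the posterior of pi under Warner's design.

    yes      : number of "yes" answers observed
    n        : number of respondents
    p_direct : probability that the device selects the sensitive statement
    a, b     : Beta(a, b) prior parameters for pi (hypothetical defaults)
    """
    # Midpoints of equal-width cells, so the grid never touches 0 or 1.
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    log_post = []
    for pi in grid:
        # Probability of a "yes" answer under Warner's design.
        lam = p_direct * pi + (1.0 - p_direct) * (1.0 - pi)
        log_prior = (a - 1.0) * math.log(pi) + (b - 1.0) * math.log(1.0 - pi)
        log_lik = yes * math.log(lam) + (n - yes) * math.log(1.0 - lam)
        log_post.append(log_prior + log_lik)
    # Normalise on the log scale for numerical stability.
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


def posterior_mean(grid, weights):
    """Posterior mean of pi from the normalised grid weights."""
    return sum(g * w for g, w in zip(grid, weights))


# Hypothetical data: 40 "yes" answers from 100 respondents, p_direct = 0.7.
grid, weights = rr_posterior_grid(yes=40, n=100, p_direct=0.7)
print(round(posterior_mean(grid, weights), 3))
```

A grid is used here because the Beta prior is not conjugate to the randomized-response likelihood; an exact treatment would expand the likelihood into a mixture of Beta terms.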
Abstract:
In the context of Bayesian statistical analysis, elicitation is the process of formulating a prior density f(.) about one or more uncertain quantities to represent a person's knowledge and beliefs. Several different methods of eliciting prior distributions for one unknown parameter have been proposed. However, there are relatively few methods for specifying a multivariate prior distribution, and most are applicable only to specific classes of problems and/or are based on restrictive conditions, such as independence of variables. Besides, many of these procedures require the elicitation of variances and correlations, and sometimes of hyperparameters, which are difficult for experts to specify in practice. Garthwaite et al. (2005) discuss the different methods proposed in the literature and the difficulties of eliciting multivariate prior distributions. We describe a flexible method of eliciting multivariate prior distributions applicable to a wide class of practical problems. Our approach does not assume a parametric form for the unknown prior density f(.); instead, we use nonparametric Bayesian inference, modelling f(.) by a Gaussian process prior distribution. The expert is then asked to specify certain summaries of his/her distribution, such as the mean, mode, marginal quantiles and a small number of joint probabilities. The analyst receives that information and treats it as a data set D with which to update his/her prior beliefs to obtain the posterior distribution for f(.). Theoretical properties of joint and marginal priors are derived and numerical illustrations to demonstrate our approach are given. © 2010 Elsevier B.V. All rights reserved.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
A methodology to define favorable areas in petroleum and mineral exploration is applied, which consists in weighting the exploratory variables, in order to characterize their importance as exploration guides. The exploration data are spatially integrated in the selected area to establish the association between variables and deposits, and the relationships among distribution, topology, and indicator pattern of all variables. Two methods of statistical analysis were compared. The first one is the Weights of Evidence Modeling, a conditional probability approach (Agterberg, 1989a), and the second one is the Principal Components Analysis (Pan, 1993). In the conditional method, the favorability estimation is based on the probability of deposit and variable joint occurrence, with the weights being defined as natural logarithms of likelihood ratios. In the multivariate analysis, the cells which contain deposits are selected as control cells and the weights are determined by eigendecomposition, being represented by the coefficients of the eigenvector related to the system's largest eigenvalue. The two techniques of weighting and complementary procedures were tested on two case studies: 1. Recôncavo Basin, Northeast Brazil (for Petroleum) and 2. Itaiacoca Formation of Ribeira Belt, Southeast Brazil (for Pb-Zn Mississippi Valley Type deposits). The applied methodology proved to be easy to use and of great assistance to predict the favorability in large areas, particularly in the initial phase of exploration programs. © 1998 International Association for Mathematical Geology.
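The weights described above, natural logarithms of likelihood ratios, can be sketched from cell counts as follows. This is a minimal illustration of the usual weights-of-evidence formulas for a single binary evidence pattern; the function name and the counts in the example are hypothetical and are not taken from the two case studies.

```python
import math


def weights_of_evidence(n_cells, n_deposits, n_pattern, n_both):
    """Return (W+, W-, contrast) for one binary evidence pattern.

    n_cells    : total number of unit cells in the study area
    n_deposits : cells that contain a known deposit
    n_pattern  : cells where the evidence pattern is present
    n_both     : cells with both a deposit and the pattern
    """
    # Conditional probabilities of the pattern given deposit / no deposit.
    p_b_given_d = n_both / n_deposits
    p_b_given_not_d = (n_pattern - n_both) / (n_cells - n_deposits)
    # Positive weight: log likelihood ratio when the pattern is present.
    w_plus = math.log(p_b_given_d / p_b_given_not_d)
    # Negative weight: log likelihood ratio when the pattern is absent.
    w_minus = math.log((1.0 - p_b_given_d) / (1.0 - p_b_given_not_d))
    return w_plus, w_minus, w_plus - w_minus


# Hypothetical counts: 1000 cells, 20 deposits, pattern in 200 cells,
# 12 of which also contain a deposit.
w_plus, w_minus, contrast = weights_of_evidence(1000, 20, 200, 12)
print(round(w_plus, 3), round(w_minus, 3), round(contrast, 3))
```

A positive W+ together with a negative W- indicates that the pattern is spatially associated with the deposits; the contrast summarizes the strength of that association. The sketch assumes all four conditional probabilities lie strictly between 0 and 1.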
Abstract:
In this article we describe a feature extraction algorithm for pattern classification based on Bayesian decision boundaries and pruning techniques. The proposed method is capable of optimizing MLP neural classifiers by retaining those neurons in the hidden layer that really contribute to correct classification. We also propose a method that defines a plausible number of neurons in the hidden layer based on stem-and-leaf plots of the training samples. Experimental investigation reveals the efficiency of the proposed method. © 2002 IEEE.
Abstract:
The aim of this study was to estimate genetic, environmental and phenotypic correlations between birth weight (BW) and weight at 205 days of age (W205), BW and weight at 365 days of age (W365), and W205 and W365, using Bayesian inference. The Brazilian Program for Genetic Improvement of Buffaloes provided the data, which included 3,883 observations from Mediterranean breed buffaloes. To estimate variances and covariances, bivariate analyses were performed using the Gibbs sampler included in the MTGSAM software. The model for BW, W205 and W365 included additive direct and maternal genetic random effects, a maternal environmental random effect, and contemporary group as a fixed effect. Convergence was diagnosed with the Geweke method, using an algorithm implemented in R through the Bayesian Output Analysis package. The calculated direct genetic correlations were 0.34 (BW-W205), 0.25 (BW-W365) and 0.74 (W205-W365). The environmental correlations were 0.12, 0.11 and 0.72 between BW-W205, BW-W365 and W205-W365, respectively. The phenotypic correlations were low for BW-W205 (0.01) and BW-W365 (0.04), unlike that obtained for W205-W365, which was 0.67. The results indicate that the BW trait has low genetic, environmental and phenotypic association with the two other traits. The genetic correlation between W205 and W365 was high and suggests that selection for weight at around 205 days could be beneficial to accelerate genetic gain.
Abstract:
Quantitative analysis of growth genetic parameters is not available for many breeds of buffaloes, making selection and breeding decisions an empirical process that lacks robustness. The objective of this study was to estimate heritability for birth weight (BW), weight at 205 days (W205) and weight at 365 days (W365) of age using Bayesian inference. The Brazilian Program for Genetic Improvement of Buffaloes provided the data. For the traits BW, W205 and W365 of Brazilian Mediterranean buffaloes, 5169, 3792 and 3883 observations, respectively, were used in the analysis. To obtain the variance estimates, univariate analyses were conducted using the Gibbs sampler included in the MTGSAM software. The model for BW, W205 and W365 included additive direct and maternal genetic random effects, a random maternal permanent environmental effect, and contemporary group treated as a fixed effect. Convergence was diagnosed with the Geweke method, using an algorithm from the Bayesian Output Analysis package implemented in the R software environment. The average values for the weight traits were 37.6 +/- 4.7 kg for BW, 192.7 +/- 40.3 kg for W205 and 298.6 +/- 67.4 kg for W365. The heritability posterior distributions for direct and maternal effects were symmetric and close to those expected under a normal distribution. Direct heritability estimates obtained using the modes were 0.30 (BW), 0.52 (W205) and 0.54 (W365). The maternal heritability estimates were 0.31, 0.19 and 0.21 for BW, W205 and W365, respectively. Our data suggest that all growth traits, mainly W205 and W365, have clear potential for yield improvement through direct genetic selection.
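The step from sampled variance components to heritability posteriors can be sketched as below. This is an illustration under stated assumptions, not the MTGSAM computation: it assumes the phenotypic variance is the plain sum of the additive direct, maternal genetic, maternal permanent environmental and residual variances (any direct-maternal covariance is ignored), and the draws in the example are invented.

```python
import statistics


def heritability_draws(var_a, var_m, var_pe, var_e):
    """Turn Gibbs draws of variance components into heritability draws.

    var_a  : draws of the additive direct genetic variance
    var_m  : draws of the maternal genetic variance
    var_pe : draws of the maternal permanent environmental variance
    var_e  : draws of the residual variance
    Assumption: phenotypic variance is the sum of the four components.
    """
    direct, maternal = [], []
    for a, m, pe, e in zip(var_a, var_m, var_pe, var_e):
        phenotypic = a + m + pe + e
        direct.append(a / phenotypic)
        maternal.append(m / phenotypic)
    return direct, maternal


# Invented draws, for illustration only.
h2_direct, h2_maternal = heritability_draws(
    var_a=[3.0, 3.2, 2.8],
    var_m=[2.0, 1.9, 2.1],
    var_pe=[1.0, 1.1, 0.9],
    var_e=[4.0, 3.8, 4.2],
)
print(round(statistics.mean(h2_direct), 3), round(statistics.median(h2_maternal), 3))
```

Computing the ratio draw by draw, rather than dividing posterior means, gives a posterior sample of the heritability itself, from which a mean, median or mode can then be reported.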
Abstract:
The objective of the study was to estimate heritability and repeatability for milk yield (MY) and lactation length (LL) in buffaloes using Bayesian inference. The Brazilian genetic improvement program for buffaloes provided the data, which included 628 females from four herds, born between 1980 and 2003. To obtain the variance estimates, univariate analyses were performed with the Gibbs sampler, using the MTGSAM software. The model for MY and LL included direct additive genetic and permanent environmental effects as random effects, and contemporary group, milking frequency and calving number as fixed effects. Convergence was diagnosed with the Geweke method, using an algorithm implemented in R through the Bayesian Output Analysis package. The averages for milk yield and lactation length were 1,546.1 +/- 483.8 kg and 252.3 +/- 42.5 days, respectively. The heritability coefficients were 0.31 (mode), 0.35 (mean) and 0.34 (median) for MY and 0.11 (mode), 0.10 (mean) and 0.10 (median) for LL. The repeatability coefficients (mode) were 0.50 and 0.15 for MY and LL, respectively. Milk yield is the only trait with clear potential for genetic improvement by direct genetic selection. The repeatability for MY indicates that selection based on the first lactation could contribute to an improvement in this trait.
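The idea behind the Geweke convergence check used in these buffalo studies can be sketched in Python as follows. This is a simplified stand-in, not the Bayesian Output Analysis implementation: it compares the mean of an early segment of the chain with the mean of a late segment, and it estimates the Monte Carlo variance of each mean by batch means rather than by the spectral density estimate of Geweke's original diagnostic. The segment fractions, batch count and simulated chains are assumptions.

```python
import math
import random
import statistics


def batch_means_variance(segment, n_batches=20):
    """Estimate the variance of the segment mean using batch means."""
    size = len(segment) // n_batches
    if size < 1:
        raise ValueError("segment too short for the requested batches")
    means = [
        statistics.fmean(segment[i * size:(i + 1) * size])
        for i in range(n_batches)
    ]
    return statistics.variance(means) / n_batches


def geweke_z(chain, first=0.1, last=0.5, n_batches=20):
    """Z-score comparing the first and last parts of one MCMC chain."""
    n = len(chain)
    early = chain[: int(first * n)]
    late = chain[n - int(last * n):]
    se2 = batch_means_variance(early, n_batches) + batch_means_variance(late, n_batches)
    return (statistics.fmean(early) - statistics.fmean(late)) / math.sqrt(se2)


rng = random.Random(0)
# A stationary chain and a chain with an obvious upward drift (both simulated).
stationary = [rng.gauss(0.0, 1.0) for _ in range(5000)]
drifting = [0.01 * i + rng.gauss(0.0, 1.0) for i in range(5000)]
print(round(geweke_z(stationary), 2), round(geweke_z(drifting), 2))
```

A z-score of small magnitude is consistent with the two segments sharing a mean; a large magnitude, as for the drifting chain, signals that the chain has not reached its stationary distribution.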
Abstract:
Background: An important challenge for transcript counting methods such as Serial Analysis of Gene Expression (SAGE), "Digital Northern" or Massively Parallel Signature Sequencing (MPSS) is to carry out statistical analyses that account for the within-class variability, i.e., variability due to the intrinsic biological differences among sampled individuals of the same class, and not only variability due to technical sampling error. Results: We introduce a Bayesian model that accounts for the within-class variability by means of a mixture distribution. We show that the previously available approaches of aggregation in pools ("pseudo-libraries") and the Beta-Binomial model are particular cases of the mixture model. We illustrate our method with a brain tumor vs. normal comparison using SAGE data from public databases. We show examples of tags regarded as differentially expressed with high significance if the within-class variability is ignored, but clearly not so significant if one accounts for it. Conclusion: Using available information about biological replicates, one can transform a list of candidate transcripts showing differential expression into a more reliable one. Our method is freely available, under GPL/GNU copyleft, through a user-friendly web-based on-line tool or as R language scripts at the supplemental website.
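The Beta-Binomial model mentioned above as a particular case of the mixture model can be written down in a few lines. The sketch below is only the standard Beta-Binomial probability mass function, not the authors' mixture model or their R scripts; the parameter values and the reading of `x` as a tag count in a library of `n` tags are illustrative assumptions.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_logpmf(x, n, a, b):
    """Log-probability of x counts out of n under a Beta-Binomial(a, b).

    The tag abundance varies between individuals as Beta(a, b), and the
    count x in a library of n tags is Binomial given that abundance.
    """
    log_choose = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
    return log_choose + log_beta(x + a, n - x + b) - log_beta(a, b)


# Illustrative check: the probabilities over x = 0..n sum to one.
n, a, b = 30, 2.0, 5.0
total = sum(exp(beta_binomial_logpmf(x, n, a, b)) for x in range(n + 1))
print(round(total, 6))
```

Compared with a plain Binomial of the same mean, this distribution has extra variance, which is how within-class biological variability enters the likelihood.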
Abstract:
This thesis presents Bayesian solutions to inference problems for three types of social network data structures: a single observation of a social network, repeated observations on the same social network, and repeated observations on a social network developing through time. A social network is conceived as being a structure consisting of actors and their social interaction with each other. A common conceptualisation of social networks is to let the actors be represented by nodes in a graph with edges between pairs of nodes that are relationally tied to each other according to some definition. Statistical analysis of social networks is to a large extent concerned with modelling of these relational ties, which lends itself to empirical evaluation. The first paper deals with a family of statistical models for social networks called exponential random graphs that takes various structural features of the network into account. In general, the likelihood functions of exponential random graphs are only known up to a constant of proportionality. A procedure for performing Bayesian inference using Markov chain Monte Carlo (MCMC) methods is presented. The algorithm consists of two basic steps, one in which an ordinary Metropolis-Hastings updating step is used, and another in which an importance sampling scheme is used to calculate the acceptance probability of the Metropolis-Hastings step. In the second paper, a method for modelling reports given by actors (or other informants) on their social interaction with others is investigated in a Bayesian framework. The model contains two basic ingredients: the unknown network structure and functions that link this unknown network structure to the reports given by the actors. These functions take the form of probit link functions.
An intrinsic problem is that the model is not identified, meaning that there are combinations of values of the unknown structure and the parameters in the probit link functions that are observationally equivalent. Instead of using restrictions to achieve identification, it is proposed that the different observationally equivalent combinations of parameters and unknown structure be investigated a posteriori. Estimation of parameters is carried out using Gibbs sampling with a switching device that enables transitions between posterior modal regions. The main goal of the procedures is to provide tools for comparisons of different model specifications. Papers 3 and 4 propose Bayesian methods for longitudinal social networks. The premise of the models investigated is that overall change in social networks occurs as a consequence of sequences of incremental changes. Models for the evolution of social networks using continuous-time Markov chains are meant to capture these dynamics. Paper 3 presents an MCMC algorithm for exploring the posteriors of parameters for such Markov chains. More specifically, the unobserved evolution of the network between observations is explicitly modelled, thereby avoiding the need to deal with explicit formulas for the transition probabilities. This enables likelihood-based parameter inference in a wider class of network evolution models than has been available before. Paper 4 builds on the proposed inference procedure of Paper 3 and demonstrates how to perform model selection for a class of network evolution models.
Abstract:
In my PhD thesis I propose a Bayesian nonparametric estimation method for structural econometric models where the functional parameter of interest describes the economic agent's behavior. The structural parameter is characterized as the solution of a functional equation or, in more technical terms, as the solution of an inverse problem that can be either ill-posed or well-posed. From a Bayesian point of view, the parameter of interest is a random function and the solution to the inference problem is the posterior distribution of this parameter. A regular version of the posterior distribution in functional spaces is characterized. However, the infinite dimension of the spaces considered causes a problem of non-continuity of the solution and hence a problem of inconsistency, from a frequentist point of view, of the posterior distribution (i.e. a problem of ill-posedness). The contribution of this essay is to propose new methods to deal with this problem of ill-posedness. The first consists in adopting a Tikhonov regularization scheme in the construction of the posterior distribution, so that I end up with a new object that I call the regularized posterior distribution and that I conjecture is a solution of the inverse problem. The second approach consists in specifying a prior distribution of the g-prior type on the parameter of interest. I then identify a class of models for which the prior distribution is able to correct for the ill-posedness even in infinite-dimensional problems. I study asymptotic properties of these proposed solutions and I prove that, under some regularity condition satisfied by the true value of the parameter of interest, they are consistent in a "frequentist" sense. Once I have set out the general theory, I apply my Bayesian nonparametric methodology to different estimation problems. First, I apply this estimator to deconvolution and to hazard rate, density and regression estimation.
Then, I consider the estimation of an Instrumental Regression, which is useful in micro-econometrics when we have to deal with problems of endogeneity. Finally, I develop an application in finance: I obtain the Bayesian estimator for the equilibrium asset pricing functional by using the Euler equation defined in the Lucas (1978) tree-type models.
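To make the role of Tikhonov regularization in an ill-posed inverse problem concrete, here is a minimal sketch of the classical (non-Bayesian) scheme in the spectral domain. It is not the regularized posterior distribution constructed in the thesis. It assumes the operator has already been diagonalized, so that each data coefficient is a singular value times the unknown coefficient plus noise; the function name and the numbers are hypothetical.

```python
def tikhonov_spectral(singular_values, data_coeffs, alpha):
    """Tikhonov-regularized solution, coefficient by coefficient.

    For an operator with singular value s, the naive inverse is y / s,
    which explodes as s -> 0. Tikhonov replaces it with s * y / (s^2 + alpha).
    """
    return [
        s * y / (s * s + alpha)
        for s, y in zip(singular_values, data_coeffs)
    ]


# Hypothetical example: the true coefficients are all 1.0, and the last
# data coefficient carries a small error of 0.009.
singular_values = [1.0, 0.1, 0.001]
data_coeffs = [1.0, 0.1, 0.001 + 0.009]

naive = [y / s for s, y in zip(singular_values, data_coeffs)]
regularized = tikhonov_spectral(singular_values, data_coeffs, alpha=0.01)
print([round(v, 4) for v in naive])
print([round(v, 4) for v in regularized])
```

The naive inverse amplifies the small error in the last coefficient by a factor of 1000, while the regularized solution damps that component at the price of some bias in the others. This trade-off is governed by `alpha`.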
Abstract:
In this work we propose a new approach for preliminary epidemiological studies on Standardized Mortality Ratios (SMR) collected in many spatial regions. A preliminary study on SMRs aims to formulate hypotheses to be investigated via individual epidemiological studies that avoid the bias introduced by aggregated analyses. Starting from collecting disease counts and calculating expected disease counts by means of reference population disease rates, in each area an SMR is derived as the MLE under the Poisson assumption on each observation. Such estimators have high standard errors in small areas, i.e. where the expected count is low either because of the small population underlying the area or because of the rarity of the disease under study. Disease mapping models and other techniques for screening disease rates across the map, aiming to detect anomalies and possible high-risk areas, have been proposed in the literature according to both the classical and the Bayesian paradigm. Our proposal approaches this issue with a decision-oriented method, which focuses on multiple testing control without, however, abandoning the preliminary-study perspective required of an analysis of SMR indicators. We implement the control of the FDR, a quantity widely used to address multiple comparison problems in the field of microarray data analysis but which is not usually employed in disease mapping. Controlling the FDR means providing an estimate of the FDR for a set of rejected null hypotheses. The small areas issue raises difficulties in applying traditional methods for FDR estimation, which are usually based only on knowledge of the p-values (Benjamini and Hochberg, 1995; Storey, 2003). Tests evaluated by a traditional p-value provide weak power in small areas, where the expected number of disease cases is small. Moreover, tests cannot be assumed to be independent when spatial correlation between SMRs is expected, nor are they identically distributed when the population underlying the map is heterogeneous.
The Bayesian paradigm offers a way to overcome the inappropriateness of p-value-based methods. Another peculiarity of the present work is to propose a hierarchical fully Bayesian model for FDR estimation in testing many null hypotheses of absence of risk. We will use concepts of Bayesian models for disease mapping, referring in particular to the Besag, York and Mollié model (1991), often used in practice for its flexible prior assumption on the distribution of risks across regions. The borrowing of strength between prior and likelihood typical of a hierarchical Bayesian model has the advantage of evaluating a single test (i.e. a test in a single area) by means of all observations in the map under study, rather than just by means of the single observation. This improves the power of the test in small areas and addresses more appropriately the spatial correlation issue, which suggests that relative risks are closer in spatially contiguous regions. The proposed model aims to estimate the FDR by means of the MCMC-estimated posterior probabilities $b_i$ of the null hypothesis (absence of risk) for each area. An estimate of the expected FDR conditional on the data ($\widehat{\mathrm{FDR}}$) can be calculated for any set of $b_i$'s relative to areas declared at high risk (where the null hypothesis is rejected) by averaging the $b_i$'s themselves. The $\widehat{\mathrm{FDR}}$ can be used to provide an easy decision rule for selecting high-risk areas, i.e. selecting as many areas as possible such that the $\widehat{\mathrm{FDR}}$ does not exceed a prefixed value; we call these $\widehat{\mathrm{FDR}}$-based decision (or selection) rules. The sensitivity and specificity of such rules depend on the accuracy of the FDR estimate, with over-estimation of the FDR causing a loss of power and under-estimation of the FDR producing a loss of specificity. Moreover, our model has the interesting feature of still being able to provide an estimate of relative risk values, as in the Besag, York and Mollié model (1991).
A simulation study was set up to evaluate the model performance in terms of FDR estimation accuracy, sensitivity and specificity of the decision rule, and goodness of estimation of relative risks. We chose a real map from which we generated several spatial scenarios whose disease counts vary according to the degree of spatial correlation, the size of the areas, the number of areas where the null hypothesis is true and the risk level in the latter areas. In summarizing simulation results we will always consider the FDR estimation in sets constituted by all $b_i$'s lower than a threshold t. We will show graphs of the $\widehat{\mathrm{FDR}}$ and the true FDR (known by simulation) plotted against the threshold t to assess the FDR estimation. By varying the threshold we can learn which FDR values can be accurately estimated by the practitioner willing to apply the model (by the closeness between $\widehat{\mathrm{FDR}}$ and true FDR). By plotting the calculated sensitivity and specificity (both known by simulation) against the $\widehat{\mathrm{FDR}}$ we can check the sensitivity and specificity of the corresponding $\widehat{\mathrm{FDR}}$-based decision rules. To investigate the level of over-smoothing of the relative risk estimates, we will compare box-plots of such estimates in high-risk areas (known by simulation), obtained by both our model and the classic Besag, York and Mollié model. All the summary tools are worked out for all simulated scenarios (54 scenarios in total). Results show that the FDR is well estimated (in the worst case we get an over-estimation, hence a conservative FDR control) in scenarios with small areas, low risk levels and spatially correlated risks, which are our primary aims. In such scenarios we have good estimates of the FDR for all values less than or equal to 0.10. The sensitivity of $\widehat{\mathrm{FDR}}$-based decision rules is generally low but specificity is high. In such scenarios the use of a selection rule based on $\widehat{\mathrm{FDR}}$ = 0.05 or $\widehat{\mathrm{FDR}}$ = 0.10 can be suggested.
In cases where the number of true alternative hypotheses (number of true high-risk areas) is small, values of FDR = 0.15 are also well estimated, and decision rules based on $\widehat{\mathrm{FDR}}$ = 0.15 gain power while maintaining a high specificity. On the other hand, in scenarios with non-small areas and non-small risk levels the FDR is under-estimated except for very small values of it (much lower than 0.05), resulting in a loss of specificity of a decision rule based on $\widehat{\mathrm{FDR}}$ = 0.05. In such scenarios, decision rules based on $\widehat{\mathrm{FDR}}$ = 0.05 or, even worse, $\widehat{\mathrm{FDR}}$ = 0.10 cannot be suggested because the true FDR is actually much higher. As regards relative risk estimation, our model achieves almost the same results as the classic Besag, York and Mollié model. For this reason, our model is interesting for its ability to perform both the estimation of relative risk values and FDR control, except in scenarios with non-small areas and large risk levels. A case study is finally presented to show how the method can be used in epidemiology.
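The selection rule described in this abstract, averaging the posterior null probabilities of the areas declared at high risk and selecting as many areas as possible while that average stays within a prefixed value, can be sketched as follows. This is a minimal illustration that takes the posterior probabilities as given; it does not fit the hierarchical model, and the function name and the example probabilities are hypothetical.

```python
def fdr_selection(null_probs, target):
    """Select as many areas as possible with estimated FDR <= target.

    null_probs : posterior probabilities of the null hypothesis
                 (absence of risk), one per area
    target     : prefixed bound on the estimated FDR
    Returns the indices of the selected areas and the estimated FDR,
    i.e. the average null probability over the selected areas.
    """
    # Consider areas from the least to the most compatible with the null.
    order = sorted(range(len(null_probs)), key=lambda i: null_probs[i])
    selected, running_sum, estimate = [], 0.0, 0.0
    for k, i in enumerate(order, start=1):
        running_sum += null_probs[i]
        # The running average never decreases along the sorted order,
        # so the first violation ends the search.
        if running_sum / k > target:
            break
        selected.append(i)
        estimate = running_sum / k
    return selected, estimate


# Hypothetical posterior null probabilities for five areas.
areas, fdr_hat = fdr_selection([0.01, 0.40, 0.03, 0.90, 0.08], target=0.05)
print(sorted(areas), round(fdr_hat, 3))
```

Because the areas are taken in increasing order of null probability, the set returned is the largest one whose average stays within the bound, which is the "as many areas as possible" part of the rule.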