66 resultados para Asymptotic behaviour, Bayesian methods, Mixture models, Overfitting, Posterior concentration
em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Resumo:
Gene clustering is a useful exploratory technique to group together genes with similar expression levels under distinct cell cycle phases or distinct conditions. It helps the biologist to identify potentially meaningful relationships between genes. In this study, we propose a clustering method based on multivariate normal mixture models, where the number of clusters is predicted via sequential hypothesis tests: at each step, the method considers a mixture model of m components (m = 2 in the first step) and tests if in fact it should be m - 1. If the hypothesis is rejected, m is increased and a new test is carried out. The method continues (increasing m) until the hypothesis is accepted. The theoretical core of the method is the full Bayesian significance test, an intuitive Bayesian approach, which needs no model complexity penalization nor positive probabilities for sharp hypotheses. Numerical experiments were based on a cDNA microarray dataset consisting of expression levels of 205 genes belonging to four functional categories, for 10 distinct strains of Saccharomyces cerevisiae. To analyze the method's sensitivity to data dimension, we performed principal components analysis on the original dataset and predicted the number of classes using 2 to 10 principal components. Compared to Mclust (model-based clustering), our method shows more consistent results.
Resumo:
The purpose of this paper is to develop a Bayesian analysis for nonlinear regression models under scale mixtures of skew-normal distributions. This novel class of models provides a useful generalization of the symmetrical nonlinear regression models since the error distributions cover both skewness and heavy-tailed distributions such as the skew-t, skew-slash and the skew-contaminated normal distributions. The main advantage of these class of distributions is that they have a nice hierarchical representation that allows the implementation of Markov chain Monte Carlo (MCMC) methods to simulate samples from the joint posterior distribution. In order to examine the robust aspects of this flexible class, against outlying and influential observations, we present a Bayesian case deletion influence diagnostics based on the Kullback-Leibler divergence. Further, some discussions on the model selection criteria are given. The newly developed procedures are illustrated considering two simulations study, and a real data previously analyzed under normal and skew-normal nonlinear regression models. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
Today several different unsupervised classification algorithms are commonly used to cluster similar patterns in a data set based only on its statistical properties. Specially in image data applications, self-organizing methods for unsupervised classification have been successfully applied for clustering pixels or group of pixels in order to perform segmentation tasks. The first important contribution of this paper refers to the development of a self-organizing method for data classification, named Enhanced Independent Component Analysis Mixture Model (EICAMM), which was built by proposing some modifications in the Independent Component Analysis Mixture Model (ICAMM). Such improvements were proposed by considering some of the model limitations as well as by analyzing how it should be improved in order to become more efficient. Moreover, a pre-processing methodology was also proposed, which is based on combining the Sparse Code Shrinkage (SCS) for image denoising and the Sobel edge detector. In the experiments of this work, the EICAMM and other self-organizing models were applied for segmenting images in their original and pre-processed versions. A comparative analysis showed satisfactory and competitive image segmentation results obtained by the proposals presented herein. (C) 2008 Published by Elsevier B.V.
Resumo:
In this paper, we introduce a Bayesian analysis for bioequivalence data assuming multivariate pharmacokinetic measures. With the introduction of correlation parameters between the pharmacokinetic measures or between the random effects in the bioequivalence models, we observe a good improvement in the bioequivalence results. These results are of great practical interest since they can yield higher accuracy and reliability for the bioequivalence tests, usually assumed by regulatory offices. An example is introduced to illustrate the proposed methodology by comparing the usual univariate bioequivalence methods with multivariate bioequivalence. We also consider some usual existing discrimination Bayesian methods to choose the best model to be used in bioequivalence studies.
Resumo:
We consider the two-dimensional Navier-Stokes equations with a time-delayed convective term and a forcing term which contains some hereditary features. Some results on existence and uniqueness of solutions are established. We discuss the asymptotic behaviour of solutions and we also show the exponential stability of stationary solutions.
Resumo:
In this paper, we compare the performance of two statistical approaches for the analysis of data obtained from the social research area. In the first approach, we use normal models with joint regression modelling for the mean and for the variance heterogeneity. In the second approach, we use hierarchical models. In the first case, individual and social variables are included in the regression modelling for the mean and for the variance, as explanatory variables, while in the second case, the variance at level 1 of the hierarchical model depends on the individuals (age of the individuals), and in the level 2 of the hierarchical model, the variance is assumed to change according to socioeconomic stratum. Applying these methodologies, we analyze a Colombian tallness data set to find differences that can be explained by socioeconomic conditions. We also present some theoretical and empirical results concerning the two models. From this comparative study, we conclude that it is better to jointly modelling the mean and variance heterogeneity in all cases. We also observe that the convergence of the Gibbs sampling chain used in the Markov Chain Monte Carlo method for the jointly modeling the mean and variance heterogeneity is quickly achieved.
Resumo:
In this paper we make use of some stochastic volatility models to analyse the behaviour of a weekly ozone average measurements series. The models considered here have been used previously in problems related to financial time series. Two models are considered and their parameters are estimated using a Bayesian approach based on Markov chain Monte Carlo (MCMC) methods. Both models are applied to the data provided by the monitoring network of the Metropolitan Area of Mexico City. The selection of the best model for that specific data set is performed using the Deviance Information Criterion and the Conditional Predictive Ordinate method.
Resumo:
P>In the context of either Bayesian or classical sensitivity analyses of over-parametrized models for incomplete categorical data, it is well known that prior-dependence on posterior inferences of nonidentifiable parameters or that too parsimonious over-parametrized models may lead to erroneous conclusions. Nevertheless, some authors either pay no attention to which parameters are nonidentifiable or do not appropriately account for possible prior-dependence. We review the literature on this topic and consider simple examples to emphasize that in both inferential frameworks, the subjective components can influence results in nontrivial ways, irrespectively of the sample size. Specifically, we show that prior distributions commonly regarded as slightly informative or noninformative may actually be too informative for nonidentifiable parameters, and that the choice of over-parametrized models may drastically impact the results, suggesting that a careful examination of their effects should be considered before drawing conclusions.Resume Que ce soit dans un cadre Bayesien ou classique, il est bien connu que la surparametrisation, dans les modeles pour donnees categorielles incompletes, peut conduire a des conclusions erronees. Cependant, certains auteurs persistent a negliger les problemes lies a la presence de parametres non identifies. Nous passons en revue la litterature dans ce domaine, et considerons quelques exemples surparametres simples dans lesquels les elements subjectifs influencent de facon non negligeable les resultats, independamment de la taille des echantillons. Plus precisement, nous montrons comment des a priori consideres comme peu ou non-informatifs peuvent se reveler extremement informatifs en ce qui concerne les parametres non identifies, et que le recours a des modeles surparametres peut avoir sur les conclusions finales un impact considerable. Ceci suggere un examen tres attentif de l`impact potentiel des a priori.
Resumo:
We present a Bayesian approach for modeling heterogeneous data and estimate multimodal densities using mixtures of Skew Student-t-Normal distributions [Gomez, H.W., Venegas, O., Bolfarine, H., 2007. Skew-symmetric distributions generated by the distribution function of the normal distribution. Environmetrics 18, 395-407]. A stochastic representation that is useful for implementing a MCMC-type algorithm and results about existence of posterior moments are obtained. Marginal likelihood approximations are obtained, in order to compare mixture models with different number of component densities. Data sets concerning the Gross Domestic Product per capita (Human Development Report) and body mass index (National Health and Nutrition Examination Survey), previously studied in the related literature, are analyzed. (c) 2008 Elsevier B.V. All rights reserved.
Resumo:
Hardy-Weinberg Equilibrium (HWE) is an important genetic property that populations should have whenever they are not observing adverse situations as complete lack of panmixia, excess of mutations, excess of selection pressure, etc. HWE for decades has been evaluated; both frequentist and Bayesian methods are in use today. While historically the HWE formula was developed to examine the transmission of alleles in a population from one generation to the next, use of HWE concepts has expanded in human diseases studies to detect genotyping error and disease susceptibility (association); Ryckman and Williams (2008). Most analyses focus on trying to answer the question of whether a population is in HWE. They do not try to quantify how far from the equilibrium the population is. In this paper, we propose the use of a simple disequilibrium coefficient to a locus with two alleles. Based on the posterior density of this disequilibrium coefficient, we show how one can conduct a Bayesian analysis to verify how far from HWE a population is. There are other coefficients introduced in the literature and the advantage of the one introduced in this paper is the fact that, just like the standard correlation coefficients, its range is bounded and it is symmetric around zero (equilibrium) when comparing the positive and the negative values. To test the hypothesis of equilibrium, we use a simple Bayesian significance test, the Full Bayesian Significance Test (FBST); see Pereira, Stern andWechsler (2008) for a complete review. The disequilibrium coefficient proposed provides an easy and efficient way to make the analyses, especially if one uses Bayesian statistics. A routine in R programs (R Development Core Team, 2009) that implements the calculations is provided for the readers.
Resumo:
In a sample of censored survival times, the presence of an immune proportion of individuals who are not subject to death, failure or relapse, may be indicated by a relatively high number of individuals with large censored survival times. In this paper the generalized log-gamma model is modified for the possibility that long-term survivors may be present in the data. The model attempts to separately estimate the effects of covariates on the surviving fraction, that is, the proportion of the population for which the event never occurs. The logistic function is used for the regression model of the surviving fraction. Inference for the model parameters is considered via maximum likelihood. Some influence methods, such as the local influence and total local influence of an individual are derived, analyzed and discussed. Finally, a data set from the medical area is analyzed under the log-gamma generalized mixture model. A residual analysis is performed in order to select an appropriate model.
Resumo:
We discuss the expectation propagation (EP) algorithm for approximate Bayesian inference using a factorizing posterior approximation. For neural network models, we use a central limit theorem argument to make EP tractable when the number of parameters is large. For two types of models, we show that EP can achieve optimal generalization performance when data are drawn from a simple distribution.
Resumo:
In this paper, we introduce a Bayesian analysis for survival multivariate data in the presence of a covariate vector and censored observations. Different ""frailties"" or latent variables are considered to capture the correlation among the survival times for the same individual. We assume Weibull or generalized Gamma distributions considering right censored lifetime data. We develop the Bayesian analysis using Markov Chain Monte Carlo (MCMC) methods.
Resumo:
Background Airway bypass is a bronchoscopic lung-volume reduction procedure for emphysema whereby transbronchial passages into the lung are created to release trapped air, supported with paclitaxel-coated stents to ease the mechanics of breathing. The aim of the EASE (Exhale airway stents for emphysema) trial was to evaluate safety and efficacy of airway bypass in people with severe homogeneous emphysema. Methods We undertook a randomised, double-blind, sham-controlled study in 38 specialist respiratory centres worldwide. We recruited 315 patients who had severe hyperinflation (ratio of residual volume [RV] to total lung capacity of >= 0.65). By computer using a random number generator, we randomly allocated participants (in a 2:1 ratio) to either airway bypass (n=208) or sham control (107). We divided investigators into team A (masked), who completed pre-procedure and post-procedure assessments, and team B (unmasked), who only did bronchoscopies without further interaction with patients. Participants were followed up for 12 months. The 6-month co-primary efficacy endpoint required 12% or greater improvement in forced vital capacity (FVC) and 1 point or greater decrease in the modified Medical Research Council dyspnoea score from baseline. The composite primary safety endpoint incorporated five severe adverse events. We did Bayesian analysis to show the posterior probability that airway bypass was superior to sham control (success threshold, 0.965). Analysis was by intention to treat. This study is registered with ClinicalTrials.gov, number NCT00391612. Findings All recruited patients were included in the analysis. At 6 months, no difference between treatment arms was noted with respect to the co-primary efficacy endpoint (30 of 208 for airway bypass vs 12 of 107 for sham control; posterior probability 0.749, below the Bayesian success threshold of 0.965). The 6-month composite primary safety endpoint was 14.4% (30 of 208) for airway bypass versus 11.2% (12 of 107) for sham control (judged non-inferior, with a posterior probability of 1.00 [Bayesian success threshold >0.95]). Interpretation Although our findings showed safety and transient improvements, no sustainable benefit was recorded with airway bypass in patients with severe homogeneous emphysema.
Resumo:
Historically, the cure rate model has been used for modeling time-to-event data within which a significant proportion of patients are assumed to be cured of illnesses, including breast cancer, non-Hodgkin lymphoma, leukemia, prostate cancer, melanoma, and head and neck cancer. Perhaps the most popular type of cure rate model is the mixture model introduced by Berkson and Gage [1]. In this model, it is assumed that a certain proportion of the patients are cured, in the sense that they do not present the event of interest during a long period of time and can found to be immune to the cause of failure under study. In this paper, we propose a general hazard model which accommodates comprehensive families of cure rate models as particular cases, including the model proposed by Berkson and Gage. The maximum-likelihood-estimation procedure is discussed. A simulation study analyzes the coverage probabilities of the asymptotic confidence intervals for the parameters. A real data set on children exposed to HIV by vertical transmission illustrates the methodology.