933 results for Bayesian hierarchical linear model
Abstract:
It has been argued that a single two-dimensional visualization plot may not be sufficient to capture all of the interesting aspects of complex data sets, and therefore a hierarchical visualization system is desirable. In this paper we extend an existing locally linear hierarchical visualization system, PhiVis (Bishop98a), in several directions: 1. We allow for non-linear projection manifolds; the basic building block is the Generative Topographic Mapping. 2. We introduce a general formulation of hierarchical probabilistic models consisting of local probabilistic models organized in a hierarchical tree, and derive general training equations that hold regardless of the position of a model in the tree. 3. Using tools from differential geometry, we derive expressions for local directional curvatures of the projection manifold. Like PhiVis, our system is statistically principled and is built interactively in a top-down fashion using the EM algorithm. It enables the user to interactively highlight those data in the parent visualization plot which are captured by a child model. We also incorporate into our system a hierarchical, locally selective representation of magnification factors and directional curvatures of the projection manifolds. Such information is important for further refinement of the hierarchical visualization plot, as well as for controlling the amount of regularization imposed on the local models. We demonstrate the principle of the approach on a toy data set and apply our system to two more complex 12- and 19-dimensional data sets.
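The shared machinery behind such hierarchical systems is an EM step in which each child model sees the data weighted by its parent's responsibilities. Below is a minimal sketch of that weighted E/M step, using Gaussian local models as simple stand-ins for the paper's non-linear GTM manifolds; the function names and two-level structure are illustrative, not the paper's implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, weights, means, covs):
    """Responsibilities of each local model for each data point."""
    R = np.column_stack([
        weights[k] * multivariate_normal.pdf(X, means[k], covs[k])
        for k in range(len(weights))
    ])
    R /= R.sum(axis=1, keepdims=True)
    return R

def m_step(X, R, parent_resp):
    """Weighted M-step: each point counts with the responsibility its
    parent model assigned to it one level up the tree."""
    W = R * parent_resp[:, None]          # joint responsibility in the tree
    Nk = W.sum(axis=0)
    weights = Nk / parent_resp.sum()      # child mixing coefficients
    means = (W.T @ X) / Nk[:, None]
    covs = []
    for k in range(len(Nk)):
        D = X - means[k]
        covs.append((W[:, k, None] * D).T @ D / Nk[k]
                    + 1e-6 * np.eye(X.shape[1]))  # small ridge for stability
    return weights, means, covs
```

Iterating e_step and m_step on the points highlighted in a parent plot is exactly the "train a child model on a selected region" operation the abstract describes, with the GTM's constrained mixture replacing the free Gaussians here.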
Abstract:
This thesis applies a hierarchical latent trait model system to a large quantity of data. The motivation was the lack of viable approaches to analysing High Throughput Screening datasets, which may include thousands of high-dimensional data points. High Throughput Screening (HTS) is an important tool in the pharmaceutical industry for discovering leads which can be optimised and further developed into candidate drugs. Since the development of new robotic technologies, the ability to test the activities of compounds has increased considerably in recent years. Traditional methods of analysing relationships between measured activities and the structure of compounds, based on tables and graphical plots, are not feasible for a large HTS dataset. Instead, data visualisation provides a method for analysing such large, high-dimensional datasets. So far, a few visualisation techniques for drug design have been developed, but most of them cope with only a few properties of compounds at a time. We believe that a latent variable model with a non-linear mapping from the latent space to the data space is a preferred choice for visualising a complex high-dimensional data set. As a type of latent variable model, the latent trait model (LTM) can deal with either continuous or discrete data, which makes it particularly useful in this domain. In addition, with the aid of differential geometry, we can inspect the distribution of the data through magnification factor and curvature plots. Rather than obtaining all the useful information from a single plot, a hierarchical LTM arranges a set of LTMs and their corresponding plots in a tree structure. We model the whole data set with an LTM at the top level, which is broken down into clusters at deeper levels of the hierarchy. In this manner, refined visualisation plots can be displayed at deeper levels and sub-clusters may be found. The hierarchy of LTMs is trained using the expectation-maximisation (EM) algorithm to maximise its likelihood with respect to the data sample. Training proceeds interactively in a recursive, top-down fashion: the user identifies interesting regions on the visualisation plot that they would like to model in greater detail. At each stage of hierarchical LTM construction, the EM algorithm alternates between the E- and M-steps. Another problem that can occur when visualising a large data set is that there may be significant overlaps of data clusters, making it very difficult for the user to judge where the centres of regions of interest should be placed. We address this problem by employing the minimum message length technique, which helps the user to decide the optimal structure of the model. In this thesis we also demonstrate the applicability of the hierarchy of latent trait models in the field of document data mining.
Abstract:
A flexible and multipurpose bio-inspired hierarchical model for analyzing musical timbre is presented in this paper. Inspired by findings in the fields of neuroscience, computational neuroscience, and psychoacoustics, not only does the model extract spectral and temporal characteristics of a signal, but it also analyzes amplitude modulations on different timescales. It uses a cochlear filter bank to resolve the spectral components of a sound, lateral inhibition to enhance spectral resolution, and a modulation filter bank to extract the global temporal envelope and roughness of the sound from amplitude modulations. The model was evaluated in three applications. First, it was used to simulate subjective data from two roughness experiments. Second, it was used for musical instrument classification using the k-NN algorithm and a Bayesian network. Third, it was applied to find the features that characterize sounds whose timbres were labeled in an audiovisual experiment. The successful application of the proposed model in these diverse tasks revealed its potential in capturing timbral information.
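As an illustration of the classification stage only, here is a minimal k-NN sketch using scikit-learn; the feature matrix stands in for the spectral, temporal, and modulation features the auditory model would extract, and all data below are placeholders rather than the paper's.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: one row of model-derived timbre features per sound; y: instrument labels.
# Both are placeholders standing in for the bio-inspired model's output.
rng = np.random.default_rng(0)
X = rng.random((200, 12))
y = rng.integers(0, 4, size=200)

# scale features, then classify with k nearest neighbours
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scores = cross_val_score(knn, X, y, cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.3f}")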
Abstract:
The cerebral cortex presents self-similarity over a proper interval of spatial scales, a property typical of natural objects exhibiting fractal geometry. Its complexity can therefore be characterized by the value of its fractal dimension (FD). In computing this metric, a frequentist approach to probability has usually been employed, with point-estimator methods yielding only the optimal values of the FD. In our study, we aimed to retrieve a more complete evaluation of the FD by utilizing a Bayesian model for the linear regression analysis of the box-counting algorithm. We used T1-weighted MRI data of 86 healthy subjects (age 44.2 ± 17.1 years, mean ± standard deviation, 48% males) in order to gain insight into the confidence of our measure and to investigate the relationship between mean Bayesian FD and age. Our approach yielded a stronger and significant (P < .001) correlation between mean Bayesian FD and age compared to the previous implementation. These results suggest that the Bayesian FD is a more faithful estimate of the fractal dimension of the cerebral cortex than the frequentist FD.
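The estimate rests on the box-counting relation N(ε) ∝ ε^(−FD), so the FD is the slope of log N(ε) versus log(1/ε). A minimal sketch of the Bayesian version under a flat prior, where the slope's marginal posterior is a Student-t distribution; the box counts below are illustrative, not values from the study.

```python
import numpy as np

def bayesian_box_count_fd(eps, counts):
    """Posterior of the fractal dimension as the slope of log N(eps)
    vs log(1/eps), under a flat prior on (intercept, slope, log sigma)."""
    x = np.log(1.0 / np.asarray(eps, float))
    y = np.log(np.asarray(counts, float))
    X = np.column_stack([np.ones_like(x), x])
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y          # posterior mean = OLS estimate
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - p)
    slope_sd = np.sqrt(s2 * XtX_inv[1, 1])
    return beta_hat[1], slope_sd, n - p   # mean, scale, t dof of the slope

eps = np.array([1, 2, 4, 8, 16, 32])                   # box sizes (voxels)
counts = np.array([48000, 13000, 3400, 950, 260, 75])  # occupied boxes (toy)
fd, sd, dof = bayesian_box_count_fd(eps, counts)
print(f"FD posterior: {fd:.2f} +/- {sd:.2f} (Student-t, {dof} dof)")
```

Unlike a point estimate, the full posterior lets one report credible intervals for each subject's FD before correlating it with age.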
Abstract:
Online music databases have grown significantly as a consequence of the rapid growth of the Internet and digital audio, requiring the development of faster and more efficient tools for music content analysis. Musical genres are widely used to organize music collections. In this paper, the problem of automatic single- and multi-label music genre classification is addressed by exploring rhythm-based features obtained from a corresponding complex network representation. A Markov model is built in order to analyse the temporal sequence of rhythmic notation events. Feature analysis is performed using two multivariate statistical approaches: principal components analysis (unsupervised) and linear discriminant analysis (supervised). Similarly, two classifiers are applied in order to identify the category of rhythms: a parametric Bayesian classifier under the Gaussian hypothesis (supervised) and agglomerative hierarchical clustering (unsupervised). Results obtained using the kappa coefficient, together with the resulting clusters, corroborate the effectiveness of the proposed method.
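A first-order Markov model of rhythmic events reduces each piece to a transition-probability matrix, whose entries can then serve as the feature vector fed to PCA or LDA. A minimal sketch; the integer event coding is an assumption for illustration.

```python
import numpy as np

def transition_matrix(events, n_states):
    """First-order Markov transition probabilities from a symbol sequence."""
    T = np.zeros((n_states, n_states))
    for a, b in zip(events[:-1], events[1:]):
        T[a, b] += 1                       # count observed transitions
    T /= np.maximum(T.sum(axis=1, keepdims=True), 1)  # row-normalize
    return T

# rhythmic notation events coded as integers (e.g. 0=quarter, 1=eighth, 2=rest)
seq = [0, 1, 1, 2, 0, 1, 2, 2, 0, 1]
T = transition_matrix(seq, n_states=3)
features = T.flatten()   # one feature vector per piece, input to PCA/LDA
print(T)
```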
Abstract:
This paper applies hierarchical Bayesian models to price farm-level yield insurance contracts. The methodology accounts for temporal effects and spatial dependence, and considers spatio-temporal models. One of the major advantages of this framework is that an estimate of the premium rate is obtained directly from the posterior distribution. These methods were applied to a farm-level data set of soybean yields in the State of Paraná (Brazil) for the period between 1994 and 2003. Model selection was based on a posterior predictive criterion. This study considerably improves the estimation of fair premium rates given the small number of observations.
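The premium rate then falls out of the posterior predictive distribution: simulate yields, average the indemnity, and divide by the insured value. A hedged sketch under an assumed shortfall-type contract; the coverage rule and the lognormal draws are illustrative stand-ins, not the paper's spatio-temporal model.

```python
import numpy as np

def fair_premium_rate(yield_draws, coverage):
    """Premium rate = E[indemnity] / guaranteed value, with the expectation
    taken over posterior-predictive yield draws; the contract is assumed to
    pay the shortfall below a guarantee set at `coverage` x expected yield."""
    guarantee = coverage * yield_draws.mean()
    indemnity = np.maximum(guarantee - yield_draws, 0.0)
    return indemnity.mean() / guarantee

# posterior-predictive draws of farm yield (here: illustrative lognormals,
# standing in for draws from the fitted hierarchical Bayesian model)
draws = np.random.default_rng(0).lognormal(mean=7.8, sigma=0.3, size=10_000)
print(f"premium rate at 70% coverage: {fair_premium_rate(draws, 0.7):.3%}")
```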
Abstract:
We introduce the log-beta Weibull regression model based on the beta Weibull distribution (Famoye et al., 2005; Lee et al., 2007). We derive expansions for the moment generating function which do not depend on complicated functions. The new regression model represents a parametric family of models that includes as sub-models several widely known regression models that can be applied to censored survival data. We employ a frequentist analysis, a jackknife estimator, and a parametric bootstrap for the parameters of the proposed model. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and present some ways to assess global influence. Further, several simulations are performed for different parameter settings, sample sizes, and censoring percentages. In addition, the empirical distribution of some modified residuals is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be extended to a modified deviance residual in the proposed regression model applied to censored data. We define martingale and deviance residuals to evaluate the model assumptions. The extended regression model is very useful for the analysis of real data and can give more realistic fits than other special regression models.
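To illustrate the resampling side, here is a hedged sketch of a delete-one jackknife and a parametric bootstrap on censoring-free data, using the ordinary two-parameter Weibull as a simple stand-in for the beta Weibull (which scipy does not provide built-in).

```python
import numpy as np
from scipy.stats import weibull_min

data = weibull_min.rvs(1.5, scale=2.0, size=100, random_state=1)

def fit(x):
    c, _, scale = weibull_min.fit(x, floc=0)   # shape, loc (fixed at 0), scale
    return np.array([c, scale])

theta = fit(data)                              # maximum-likelihood estimate
n = len(data)

# delete-one jackknife: refit with each observation left out in turn
jack = np.array([fit(np.delete(data, i)) for i in range(n)])
jack_se = np.sqrt((n - 1) * jack.var(axis=0))

# parametric bootstrap: resample from the fitted model and refit
boot = np.array([
    fit(weibull_min.rvs(theta[0], scale=theta[1], size=n, random_state=b))
    for b in range(200)
])
boot_se = boot.std(axis=0)
print("MLE:", theta, " jackknife SE:", jack_se, " bootstrap SE:", boot_se)
```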
Abstract:
A significant problem in the collection of responses to potentially sensitive questions, such as those relating to illegal, immoral or embarrassing activities, is non-sampling error due to refusal to respond or false responses. Eichhorn & Hayre (1983) suggested the use of scrambled responses to reduce this form of bias. This paper considers a linear regression model in which the dependent variable is unobserved, but its sum or product with a scrambling random variable of known distribution is observed. The performance of two likelihood-based estimators is investigated: a Bayesian estimator obtained through a Markov chain Monte Carlo (MCMC) sampling scheme, and a classical maximum-likelihood estimator. These two estimators are compared with an estimator suggested by Singh, Joarder & King (1996). Monte Carlo results show that the Bayesian estimator outperforms the classical estimators in almost all cases, and the relative performance of the Bayesian estimator improves as the responses become more scrambled.
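Under multiplicative scrambling the observed response is z = s·y with s drawn from a known distribution, so the likelihood mixes over the possible scrambling values. A minimal Metropolis sketch, simplified to a discrete scrambling distribution so the mixture is finite; the data, scrambling values, and sampler settings are all illustrative, not the paper's scheme.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# known discrete scrambling distribution (values/probabilities are assumptions)
s_vals, s_probs = np.array([0.5, 1.0, 2.0]), np.array([0.25, 0.5, 0.25])

n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, sigma_true = np.array([2.0, 1.0]), 0.5
y = X @ beta_true + rng.normal(0, sigma_true, n)
z = rng.choice(s_vals, n, p=s_probs) * y       # only the scrambled z is seen

def loglik(beta, log_sigma):
    sigma = np.exp(log_sigma)
    mu = X @ beta
    # z | s = s_k  ~  N(s_k * mu, s_k^2 * sigma^2); mix over known s_k
    comp = norm.pdf(z[:, None], s_vals * mu[:, None], s_vals * sigma)
    return np.log(comp @ s_probs).sum()

# random-walk Metropolis over (beta0, beta1, log sigma) with flat priors
theta = np.zeros(3)
cur, draws = loglik(theta[:2], theta[2]), []
for _ in range(5000):
    prop = theta + rng.normal(0, 0.05, 3)
    new = loglik(prop[:2], prop[2])
    if np.log(rng.uniform()) < new - cur:
        theta, cur = prop, new
    draws.append(theta.copy())
print("posterior means:", np.mean(draws[1000:], axis=0))
```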
Abstract:
This paper presents a method for estimating the posterior probability density of the cointegrating rank of a multivariate error correction model. A second contribution is the careful elicitation of the prior for the cointegrating vectors derived from a prior on the cointegrating space. This prior obtains naturally from treating the cointegrating space as the parameter of interest in inference and overcomes problems previously encountered in Bayesian cointegration analysis. Using this new prior and Laplace approximation, an estimator for the posterior probability of the rank is given. The approach performs well compared with information criteria in Monte Carlo experiments.
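For intuition, the Laplace approximation replaces the marginal-likelihood integral with a Gaussian around the posterior mode: log p(y) ≈ log p(y, θ̂) + (d/2) log 2π − ½ log|H|, where H is the negative Hessian of the log joint at θ̂. A generic numerical sketch; the finite-difference Hessian is a crude stand-in for the paper's analytic treatment.

```python
import numpy as np

def laplace_log_evidence(log_joint, theta_hat, h=1e-4):
    """Laplace approximation to the log marginal likelihood, given the
    log joint density log p(y, theta) and its mode theta_hat."""
    d = len(theta_hat)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei, ej = np.eye(d)[i] * h, np.eye(d)[j] * h
            # central finite difference for the mixed second derivative
            H[i, j] = -(log_joint(theta_hat + ei + ej)
                        - log_joint(theta_hat + ei - ej)
                        - log_joint(theta_hat - ei + ej)
                        + log_joint(theta_hat - ei - ej)) / (4 * h * h)
    _, logdet = np.linalg.slogdet(H)
    return log_joint(theta_hat) + 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet
```

Evaluating this once per candidate rank and normalizing the exponentiated values gives posterior probabilities over the ranks.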
Abstract:
Modeling volatile organic compound (VOC) adsorption onto cup-stacked carbon nanotubes (CSCNT) using the linear driving force model. Volatile organic compounds (VOCs) are an important category of air pollutants, and adsorption has been employed in the treatment (or simply the concentration) of these compounds. The current study used an ordinary analytical methodology to evaluate the properties of a cup-stacked carbon nanotube (CSCNT), a stacking morphology of truncated conical graphene with large amounts of open edges on the outer surface and empty central channels. This work used a Carbotrap bearing a cup-stacked structure (composite); for comparison, Carbotrap without the nanotube was used as a reference. The retention and saturation capacities of both adsorbents were evaluated at each concentration used (1, 5, 20 and 35 ppm of toluene and phenol). The composite outperformed Carbotrap; the saturation capacity of the composite was, on average, 67% higher than that of Carbotrap. The Langmuir isotherm model was used to fit equilibrium data for both adsorbents, and a linear driving force (LDF) model was used to quantify intraparticle adsorption kinetics. The LDF model was suitable for describing the kinetic curves.
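The two fitted ingredients are the Langmuir isotherm, q* = q_max·K·C/(1 + K·C), and LDF kinetics, dq/dt = k_LDF·(q* − q). A minimal fitting sketch with scipy; the uptake values and rate constant below are illustrative, not the measured data.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.integrate import solve_ivp

# Langmuir isotherm: equilibrium uptake as a function of gas concentration
def langmuir(C, qmax, K):
    return qmax * K * C / (1 + K * C)

C_eq = np.array([1, 5, 20, 35], float)     # ppm (concentrations used)
q_eq = np.array([12, 38, 71, 82], float)   # illustrative uptakes, mg/g
(qmax, K), _ = curve_fit(langmuir, C_eq, q_eq, p0=[100, 0.1])

# LDF kinetics: uptake relaxes toward the Langmuir equilibrium value
k_ldf, C_feed = 0.05, 20.0                 # 1/s (assumed), ppm
sol = solve_ivp(lambda t, q: k_ldf * (langmuir(C_feed, qmax, K) - q),
                (0, 300), [0.0])
print(f"qmax={qmax:.1f} mg/g, K={K:.3f} 1/ppm, q(300 s)={sol.y[0, -1]:.1f} mg/g")
```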
Abstract:
The probit model is a popular device for explaining binary choice decisions in econometrics. It has been used to describe choices such as labor force participation, travel mode, home ownership, and type of education. These and many more examples can be found in papers by Amemiya (1981) and Maddala (1983). Given the contribution of economics towards explaining such choices, and given the nature of the data that are collected, prior information on the relationship between a choice probability and several explanatory variables frequently exists. Bayesian inference is a convenient vehicle for including such prior information. Given the increasing popularity of Bayesian inference, it is useful to ask whether inferences from a probit model are sensitive to the choice between Bayesian and sampling-theory techniques. Of interest is the sensitivity of inference on coefficients, probabilities, and elasticities. We consider these issues in a model designed to explain the choice between fixed and variable interest rate mortgages. Two Bayesian priors are employed: a uniform prior on the coefficients, designed to be noninformative for the coefficients, and an inequality-restricted prior on the signs of the coefficients. We often know, a priori, whether increasing the value of a particular explanatory variable will have a positive or negative effect on a choice probability. This knowledge can be captured by using a prior probability density function (pdf) that is truncated to be positive or negative. Thus, three sets of results are compared: those from maximum likelihood (ML) estimation, those from Bayesian estimation with an unrestricted uniform prior on the coefficients, and those from Bayesian estimation with a uniform prior truncated to accommodate inequality restrictions on the coefficients.
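A hedged sketch of that comparison on simulated data: ML probit maximizes Σᵢ [yᵢ log Φ(xᵢβ) + (1−yᵢ) log(1−Φ(xᵢβ))], and a random-walk Metropolis sampler enforces the truncated uniform prior by rejecting proposals with the wrong sign. The data and sign pattern are illustrative, not the mortgage application.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([-0.3, 1.0, -0.8])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

def negloglik(b):
    p = norm.cdf(X @ b).clip(1e-10, 1 - 1e-10)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).sum()

ml = minimize(negloglik, np.zeros(3))          # maximum likelihood estimate

# Metropolis under a uniform prior truncated to known coefficient signs
signs = np.array([0, 1, -1])                   # 0 = unrestricted
beta, cur, draws = ml.x.copy(), -negloglik(ml.x), []
for _ in range(10_000):
    prop = beta + rng.normal(0, 0.05, 3)
    if np.all(signs * prop >= 0):              # reject sign violations
        new = -negloglik(prop)
        if np.log(rng.uniform()) < new - cur:
            beta, cur = prop, new
    draws.append(beta.copy())
print("ML:", ml.x.round(2), " posterior mean:", np.mean(draws, axis=0).round(2))
```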
Abstract:
The conventional convection-dispersion model is widely used to interrelate hepatic availability (F) and clearance (Cl) with the morphology and physiology of the liver, and to predict effects such as changes in liver blood flow on F and Cl. The extension of this model to include nonlinear kinetics and zonal heterogeneity of the liver is not straightforward and requires the numerical solution of a partial differential equation, which is not available in standard nonlinear regression analysis software. In this paper, we describe an alternative compartmental model representation of hepatic disposition (including elimination). The model allows the use of standard software for data analysis and accurately describes the outflow concentration-time profile for a vascular marker after bolus injection into the liver. In an evaluation of a number of different compartmental models, the most accurate model required eight vascular compartments, two of them with back-mixing. In addition, the model includes two adjacent secondary vascular compartments to describe the tail section of the concentration-time profile for a reference marker. The model has the added flexibility of being easy to modify to represent various enzyme distributions and nonlinear elimination. Model predictions of F, the mean transit time (MTT), the normalized variance (CV²), and the concentration-time profile, as well as parameter estimates for experimental data of an eliminated solute (palmitate), are comparable to those for the extended convection-dispersion model.
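A minimal sketch of the idea: a chain of well-stirred compartments traversed by flow turns the transport PDE into ODEs that any standard solver handles, with the last compartment's output giving the outflow profile. This simple chain omits the paper's back-mixing and secondary compartments, and the parameter values are illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp, trapezoid

# chain of n well-stirred vascular compartments traversed by flow Q;
# a unit bolus enters compartment 1 and the outflow is read at the end
n, Q, V = 8, 1.0, 1.0                 # compartments, flow (mL/s), volume (mL)
k = n * Q / V                         # transfer rate out of each compartment

def rhs(t, a):
    da = np.empty_like(a)
    da[0] = -k * a[0]
    da[1:] = k * (a[:-1] - a[1:])     # inflow from previous, outflow to next
    return da

a0 = np.zeros(n); a0[0] = 1.0         # unit bolus dose
t = np.linspace(0, 10, 200)
sol = solve_ivp(rhs, (0, 10), a0, t_eval=t)
outflow = k * sol.y[-1] / Q           # outlet concentration-time profile
mtt = trapezoid(t * outflow, t) / trapezoid(outflow, t)
print(f"MTT ~ {mtt:.2f} s (analytically V/Q = {V / Q:.2f} s)")
```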
Abstract:
Composition is a practice of key importance in software engineering. When real-time applications are composed, it is necessary that their timing properties (such as meeting deadlines) are guaranteed. The composition is performed by establishing an interface between the application and the physical platform. Such an interface typically contains information about the amount of computing capacity needed by the application. On multiprocessor platforms, the interface should also present information about the degree of parallelism. Recently there have been quite a few interface proposals; however, they are either too complex to handle or too pessimistic. In this paper we propose the Generalized Multiprocessor Periodic Resource model (GMPR), which is strictly superior to the MPR model without requiring an overly detailed description. We describe a method to generate the interface from the application specification. All these methods have been implemented as Matlab routines that are publicly available.
Abstract:
The problem of selecting suppliers/partners is a crucial and important part of the decision-making process for companies that intend to perform competitively in their area of activity. The selection of a supplier/partner is a time- and resource-consuming task that involves data collection and a careful analysis of the factors that can positively or negatively influence the choice. Nevertheless, it is a critical process that significantly affects the operational performance of each company. In this work, five broad selection criteria were identified: Quality, Financial, Synergies, Cost, and Production System. Within these criteria, five sub-criteria were also included. After identifying the criteria, a survey was prepared and companies were contacted in order to understand which factors carry more weight in their decisions when choosing partners. After interpreting the results and processing the data, a linear weighting model was adopted to reflect the importance of each factor. The model has a hierarchical structure and can be applied with the Analytic Hierarchy Process (AHP) method or Value Analysis. The goal of the paper is to supply a reference selection model that can serve as a guideline for decision making in the supplier/partner selection process.
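As an illustration of the AHP route, the criterion weights are the normalized principal eigenvector of a pairwise comparison matrix, checked with Saaty's consistency ratio. A sketch with invented judgments; the matrix below is not the survey's data.

```python
import numpy as np

# pairwise comparison matrix for the five criteria (Saaty 1-9 scale);
# the judgments are illustrative placeholders
criteria = ["Quality", "Financial", "Synergies", "Cost", "Production System"]
A = np.array([
    [1,   3,   5,   3,   4],
    [1/3, 1,   3,   1,   2],
    [1/5, 1/3, 1,   1/3, 1/2],
    [1/3, 1,   3,   1,   2],
    [1/4, 1/2, 2,   1/2, 1],
], float)

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)                 # principal (Perron) eigenvalue
w = np.abs(eigvecs[:, k].real)
w /= w.sum()                                # priority (weight) vector

# consistency ratio CI/RI, with Saaty's random index RI = 1.12 for n = 5
ci = (eigvals[k].real - len(A)) / (len(A) - 1)
cr = ci / 1.12
for name, weight in zip(criteria, w):
    print(f"{name:18s} {weight:.3f}")
print(f"consistency ratio: {cr:.3f} (should be below 0.10)")
```

The same weights then multiply the sub-criteria scores gathered in the survey to rank candidate suppliers.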