926 results for Bayesian Mixture Model, Cavalieri Method, Trapezoidal Rule
Abstract:
The recent advent of new technologies has led to huge amounts of genomic data. With these data come new opportunities to understand biological cellular processes underlying hidden regulation mechanisms and to identify disease-related biomarkers for informative diagnostics. However, extracting biological insights from the immense amounts of genomic data is a challenging task. Therefore, effective and efficient computational techniques are needed to analyze and interpret genomic data. In this thesis, novel computational methods are proposed to address these challenges: a Bayesian mixture model, an extended Bayesian mixture model, and an Eigen-brain approach. The Bayesian mixture framework involves integration of the Bayesian network and the Gaussian mixture model. Based on the proposed framework and its conjunction with K-means clustering and principal component analysis (PCA), biological insights are derived, such as context-specific/dependent relationships and nested structures within microarray data where biological replicates are encapsulated. The Bayesian mixture framework is then extended to explore posterior distributions of network space by incorporating a Markov chain Monte Carlo (MCMC) model. The extended Bayesian mixture model summarizes the sampled network structures by extracting biologically meaningful features. Finally, an Eigen-brain approach is proposed to analyze in situ hybridization data for the identification of cell-type-specific genes, which can be useful for informative blood diagnostics. Computational results with region-based clustering reveal critical evidence of consistency with brain anatomical structure.
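As a rough illustration of the K-means and PCA conjunction mentioned above, the sketch below reduces a synthetic expression matrix with PCA and clusters the component scores with K-means; the matrix dimensions, 5 components and 4 clusters are illustrative assumptions, not the thesis's actual settings, and the Bayesian mixture framework itself is not reproduced here.

```python
# Minimal sketch: PCA on a synthetic gene-expression matrix, then K-means on the scores.
# Matrix size, number of components and number of clusters are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
expression = rng.normal(size=(200, 30))    # 200 genes x 30 arrays (synthetic)

scores = PCA(n_components=5).fit_transform(expression)          # reduce to 5 components
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scores)
print(np.bincount(labels))                 # genes per cluster
```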
Abstract:
Gene clustering is a useful exploratory technique for grouping together genes with similar expression levels under distinct cell cycle phases or distinct conditions. It helps the biologist to identify potentially meaningful relationships between genes. In this study, we propose a clustering method based on multivariate normal mixture models, where the number of clusters is predicted via sequential hypothesis tests: at each step, the method considers a mixture model of m components (m = 2 in the first step) and tests whether in fact it should be m - 1. If the hypothesis is rejected, m is increased and a new test is carried out. The method continues (increasing m) until the hypothesis is accepted. The theoretical core of the method is the full Bayesian significance test, an intuitive Bayesian approach which requires neither penalization of model complexity nor positive probabilities for sharp hypotheses. Numerical experiments were based on a cDNA microarray dataset consisting of expression levels of 205 genes belonging to four functional categories, for 10 distinct strains of Saccharomyces cerevisiae. To analyze the method's sensitivity to data dimension, we performed principal components analysis on the original dataset and predicted the number of classes using 2 to 10 principal components. Compared to Mclust (model-based clustering), our method shows more consistent results.
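The sequential procedure can be sketched as a loop that fits m components, tests against m - 1, and stops when the smaller model is accepted. The abstract's actual test is the full Bayesian significance test; in the sketch below a BIC comparison merely stands in for it to show the control flow, and the data are synthetic.

```python
# Sketch of the sequential model-selection loop: fit m components, compare with m - 1,
# stop when the smaller model is accepted. BIC stands in for the FBST used in the paper.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(60, 3)) for c in (0, 3, 6)])  # 3 synthetic clusters

m = 2
while True:
    bic_small = GaussianMixture(n_components=m - 1, random_state=0).fit(X).bic(X)
    bic_large = GaussianMixture(n_components=m, random_state=0).fit(X).bic(X)
    if bic_small <= bic_large:      # "m - 1 components suffice" accepted -> stop
        break
    m += 1                          # rejected -> add a component and test again

print("predicted number of clusters:", m - 1)
```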
Abstract:
Background: An important challenge for transcript counting methods such as Serial Analysis of Gene Expression (SAGE), "Digital Northern" or Massively Parallel Signature Sequencing (MPSS) is to carry out statistical analyses that account for the within-class variability, i.e., variability due to the intrinsic biological differences among sampled individuals of the same class, and not only variability due to technical sampling error. Results: We introduce a Bayesian model that accounts for the within-class variability by means of a mixture distribution. We show that the previously available approaches of aggregation in pools ("pseudo-libraries") and the Beta-Binomial model are particular cases of the mixture model. We illustrate our method with a brain tumor vs. normal comparison using SAGE data from public databases. We show examples of tags regarded as differentially expressed with high significance if the within-class variability is ignored, but clearly not so significant if one accounts for it. Conclusion: Using available information about biological replicates, one can transform a list of candidate transcripts showing differential expression into a more reliable one. Our method is freely available, under GPL/GNU copyleft, through a user-friendly web-based online tool or as R language scripts at a supplemental website.
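A minimal sketch of the Beta-Binomial special case mentioned above: the count of a tag in a library, given the library size, follows a Beta-Binomial distribution whose dispersion captures within-class (between-library) variability. The counts and hyperparameters below are made up for illustration.

```python
# Beta-Binomial log-likelihood for tag counts across libraries of one class.
# alpha/beta control the overdispersion that models between-library variability.
from scipy.special import betaln, gammaln
import numpy as np

def log_betabinom(x, n, alpha, beta):
    """log P(x | n, alpha, beta) under the Beta-Binomial distribution."""
    log_choose = gammaln(n + 1) - gammaln(x + 1) - gammaln(n - x + 1)
    return log_choose + betaln(x + alpha, n - x + beta) - betaln(alpha, beta)

counts = np.array([12, 30, 7])              # tag counts for one transcript in three libraries
totals = np.array([50000, 80000, 40000])    # total tags sequenced per library
print(log_betabinom(counts, totals, alpha=2.0, beta=8000.0).sum())
```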
Abstract:
2000 Mathematics Subject Classification: 62F15.
Abstract:
We present a statistical image-based shape + structure model for Bayesian visual hull reconstruction and 3D structure inference. The 3D shape of a class of objects is represented by sets of contours from silhouette views simultaneously observed from multiple calibrated cameras. Bayesian reconstructions of new shapes are then estimated using a prior density constructed with a mixture model and probabilistic principal components analysis. We show how the use of a class-specific prior in a visual hull reconstruction can reduce the effect of segmentation errors from the silhouette extraction process. The proposed method is applied to a data set of pedestrian images, and improvements in the approximate 3D models under various noise conditions are shown. We further augment the shape model to incorporate structural features of interest; unknown structural parameters for a novel set of contours are then inferred via the Bayesian reconstruction process. Model matching and parameter inference are done entirely in the image domain and require no explicit 3D construction. Our shape model enables accurate estimation of structure despite segmentation errors or missing views in the input silhouettes, and works even with only a single input view. Using a data set of thousands of pedestrian images generated from a synthetic model, we can accurately infer the 3D locations of 19 joints on the body based on observed silhouette contours from real images.
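The role of a class-specific linear subspace prior can be sketched, under simplifying assumptions, with plain PCA: fit the subspace to vectorized training contours and project a corrupted contour onto it. This is not the paper's probabilistic PCA mixture prior or visual hull pipeline, only the reconstruction idea; the contour dimensionality and noise level are invented.

```python
# Sketch: a low-dimensional subspace learned from training contours is used to
# regularize a contour corrupted by segmentation noise (projection + reconstruction).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
train_contours = rng.normal(size=(500, 128))        # 500 training shapes, 128-D contour vectors
pca = PCA(n_components=10).fit(train_contours)

noisy = train_contours[0] + rng.normal(scale=0.5, size=128)    # contour with segmentation noise
denoised = pca.inverse_transform(pca.transform(noisy[None, :]))[0]
print(np.linalg.norm(noisy - denoised))             # size of the correction applied
```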
Abstract:
Item response theory (IRT) comprises a set of statistical models which are useful in many fields, especially when there is interest in studying latent variables. These latent variables are directly considered in the Item Response Models (IRM) and they are usually called latent traits. A usual assumption for parameter estimation of the IRM, considering one group of examinees, is to assume that the latent traits are random variables which follow a standard normal distribution. However, many works suggest that this assumption does not hold in many cases. Furthermore, when this assumption does not hold, the parameter estimates tend to be biased and misleading inferences can be obtained. Therefore, it is important to model the distribution of the latent traits properly. In this paper we present an alternative latent trait modeling based on the so-called skew-normal distribution; see Genton (2004). We used the centred parameterization, which was proposed by Azzalini (1985). This approach ensures the model identifiability, as pointed out by Azevedo et al. (2009b). Also, a Metropolis-Hastings within Gibbs sampling (MHWGS) algorithm was built for parameter estimation using an augmented data approach. A simulation study was performed in order to assess parameter recovery under the proposed model and estimation method, and the effect of the asymmetry level of the latent trait distribution on parameter estimation. Also, a comparison of our approach with other estimation methods (which assume symmetric normality for the latent trait distribution) was considered. The results indicated that our proposed algorithm properly recovers all parameters. Specifically, the greater the asymmetry level, the better the performance of our approach compared with other approaches, mainly in the presence of small sample sizes (number of examinees). Furthermore, we analyzed a real data set which presents indications of asymmetry in the latent trait distribution. The results obtained using our approach confirmed the presence of strong negative asymmetry in the latent trait distribution.
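A small sketch of the asymmetric latent trait distribution discussed above, using scipy's skew-normal. Note that scipy parameterizes the skew-normal directly (shape, location, scale) rather than with the centred parameterization used in the paper, so the values below only illustrate the shape of a negatively skewed latent trait density.

```python
# Negatively skewed latent trait density (direct parameterization; values are illustrative).
import numpy as np
from scipy.stats import skewnorm

a, loc, scale = -4.0, 0.0, 1.0             # negative shape parameter -> left-skewed traits
theta = np.linspace(-4, 4, 9)
print(skewnorm.pdf(theta, a, loc=loc, scale=scale).round(3))
print("mean, skewness:", skewnorm.stats(a, loc=loc, scale=scale, moments="ms"))
```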
Abstract:
A Bayesian approach to estimation of the regression coefficients of a multinomial logit model with ordinal scale response categories is presented. A Monte Carlo method is used to construct the posterior distribution of the link function. The link function is treated as an arbitrary scalar function. Then the Gauss-Markov theorem is used to determine a function of the link which produces a random vector of coefficients. The posterior distribution of the random vector of coefficients is used to estimate the regression coefficients. The method described is referred to as a Bayesian generalized least squares (BGLS) analysis. Two cases involving multinomial logit models are described. Case I involves a cumulative logit model and Case II involves a proportional-odds model. All inferences about the coefficients for both cases are described in terms of the posterior distribution of the regression coefficients. The results from the BGLS method are compared to maximum likelihood estimates of the regression coefficients. The BGLS method avoids the nonlinear problems encountered when estimating the regression coefficients of a generalized linear model. The method is not complex or computationally intensive. The BGLS method offers several advantages over other Bayesian approaches.
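For reference, the cumulative logit structure underlying Cases I and II can be sketched as follows: the probability of a response at or below category j is a logistic function of a cutpoint minus a common linear predictor. The cutpoints and coefficients below are invented, and the BGLS estimation itself is not shown.

```python
# Category probabilities for an ordinal response under a proportional-odds
# (cumulative logit) model. Cutpoints and coefficients are illustrative.
import numpy as np

def cumulative_logit_probs(x, cutpoints, beta):
    """P(Y = j | x) for each ordinal category j."""
    eta = np.asarray(cutpoints) - x @ beta            # one linear predictor per cutpoint
    cdf = 1.0 / (1.0 + np.exp(-eta))                  # P(Y <= j | x), increasing in j
    cdf = np.concatenate([cdf, [1.0]])
    return np.diff(np.concatenate([[0.0], cdf]))      # successive differences give P(Y = j)

probs = cumulative_logit_probs(np.array([0.5, -1.0]), cutpoints=[-1.0, 0.0, 1.5],
                               beta=np.array([0.8, 0.3]))
print(probs, probs.sum())
```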
Abstract:
All muscle contractions are dependent on the functioning of motor units. In diseases such as amyotrophic lateral sclerosis (ALS), progressive loss of motor units leads to gradual paralysis. A major difficulty in the search for a treatment for these diseases has been the lack of a reliable measure of disease progression. One possible measure would be an estimate of the number of surviving motor units. Despite over 30 years of motor unit number estimation (MUNE), all proposed methods have been met with practical and theoretical objections. Our aim is to develop a method of MUNE that overcomes these objections. We record the compound muscle action potential (CMAP) from a selected muscle in response to a graded electrical stimulation applied to the nerve. As the stimulus increases, the threshold of each motor unit is exceeded, and the size of the CMAP increases until a maximum response is obtained. However, the threshold potential required to excite an axon is not a precise value but fluctuates over a small range, leading to probabilistic activation of motor units in response to a given stimulus. When the threshold ranges of motor units overlap, there may be alternation, where the number of motor units that fire in response to the stimulus is variable. This means that increments in the value of the CMAP correspond to the firing of different combinations of motor units. At a fixed stimulus, variability in the CMAP, measured as variance, can be used to conduct MUNE using the "statistical" or the "Poisson" method. However, this method relies on the assumptions that the numbers of motor units firing probabilistically follow the Poisson distribution and that all single motor unit action potentials (MUAPs) have a fixed and identical size. These assumptions are not necessarily correct. We propose to develop a Bayesian statistical methodology to analyze electrophysiological data to provide an estimate of motor unit numbers. Our method of MUNE incorporates the variability of the threshold, the variability between and within single MUAPs, and baseline variability. Our model not only gives the most probable number of motor units but also provides information about both the population of units and individual units. We use Markov chain Monte Carlo to obtain information about the characteristics of individual motor units and about the population of motor units, and the Bayesian information criterion for MUNE. We test our method of MUNE on three subjects. Our method provides a reproducible estimate for a patient with stable but severe ALS. In a serial study, we demonstrate a decline in the number of motor units in a patient with rapidly advancing disease. Finally, with our last patient, we show that our method has the capacity to estimate a larger number of motor units.
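The generative picture described above can be sketched as follows: each motor unit has a noisy threshold, fires probabilistically when the stimulus exceeds it, and the CMAP is the sum of the firing units' MUAP sizes plus baseline noise. All quantities are illustrative, and the Bayesian estimation of motor unit numbers from such data is not shown.

```python
# Simulate repeated CMAP sweeps at a fixed stimulus: probabilistic unit activation
# (noisy thresholds), per-unit MUAP sizes, and baseline noise. Values are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n_units = 20
mean_thresholds = rng.normal(30.0, 5.0, size=n_units)   # per-unit mean threshold (mA)
muap_sizes = rng.gamma(4.0, 0.05, size=n_units)         # per-unit MUAP size (mV·ms)

def simulate_cmap(stimulus, threshold_sd=0.8, baseline_sd=0.02, n_sweeps=5):
    """CMAP values for repeated sweeps at one stimulus intensity."""
    sweeps = []
    for _ in range(n_sweeps):
        noisy_thresholds = rng.normal(mean_thresholds, threshold_sd)
        fired = stimulus > noisy_thresholds              # probabilistic activation
        sweeps.append(muap_sizes[fired].sum() + rng.normal(0.0, baseline_sd))
    return np.array(sweeps)

print(simulate_cmap(stimulus=32.0))
```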
Abstract:
Ecological regions are increasingly used as a spatial unit for planning and environmental management. It is important to define these regions in a scientifically defensible way to justify any decisions made on the basis that they are representative of broad environmental assets. This paper describes a methodology and tool to identify cohesive bioregions. The methodology applies an elicitation process to obtain geographical descriptions of bioregions, each of which is transformed into a Normal density estimate on the environmental variables within that region. This prior information is balanced with data classification of environmental datasets using a Bayesian statistical modelling approach to objectively map ecological regions. The method is called model-based clustering because it fits a Normal mixture model to the clusters associated with regions, and it addresses issues of uncertainty in environmental datasets due to overlapping clusters.
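One way to picture the balance between elicited descriptions and data is sketched below: elicited per-region means of the environmental variables initialize a Normal mixture fitted to the data. This is only an initialization stand-in for the full Bayesian treatment described in the abstract; the environmental variables and values are invented.

```python
# Elicited per-region means initialize a Normal mixture fitted to environmental data.
# Variables (e.g. temperature, rainfall) and values are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
env = np.vstack([rng.normal([10, 200], [1, 20], size=(300, 2)),
                 rng.normal([16, 900], [1, 20], size=(300, 2))])

elicited_means = np.array([[10.0, 250.0], [15.0, 850.0]])    # expert descriptions of two regions
gm = GaussianMixture(n_components=2, means_init=elicited_means, random_state=0).fit(env)
regions = gm.predict(env)
print(np.bincount(regions))                                   # grid cells assigned per region
```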
Abstract:
Today, several different unsupervised classification algorithms are commonly used to cluster similar patterns in a data set based only on its statistical properties. Especially in image data applications, self-organizing methods for unsupervised classification have been successfully applied for clustering pixels or groups of pixels in order to perform segmentation tasks. The first important contribution of this paper is the development of a self-organizing method for data classification, named the Enhanced Independent Component Analysis Mixture Model (EICAMM), built by proposing some modifications to the Independent Component Analysis Mixture Model (ICAMM). These modifications address some of the original model's limitations and are intended to make it more efficient. Moreover, a pre-processing methodology is also proposed, based on combining Sparse Code Shrinkage (SCS) for image denoising with the Sobel edge detector. In the experiments of this work, EICAMM and other self-organizing models were applied to segment images in their original and pre-processed versions. A comparative analysis showed satisfactory and competitive image segmentation results obtained by the proposals presented herein.
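The pre-processing direction can be sketched as below. Sparse Code Shrinkage itself is not reproduced; a Gaussian blur stands in for the denoising step, followed by the Sobel operator on a placeholder image.

```python
# Denoise (stand-in for SCS) then apply the Sobel operator to get an edge map.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(4)
image = rng.random((64, 64))                             # placeholder grey-level image

denoised = ndimage.gaussian_filter(image, sigma=1.0)     # stand-in for SCS denoising
gx = ndimage.sobel(denoised, axis=0)
gy = ndimage.sobel(denoised, axis=1)
edges = np.hypot(gx, gy)                                 # Sobel gradient magnitude
print(edges.shape, edges.max())
```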
Abstract:
The popular Newmark algorithm, used for implicit direct integration of structural dynamics, is extended by means of a nodal partition to permit use of different timesteps in different regions of a structural model. The algorithm developed has as a special case an explicit-explicit subcycling algorithm previously reported by Belytschko, Yen and Mullen. That algorithm has been shown, in the absence of damping or other energy dissipation, to exhibit instability over narrow timestep ranges that become narrower as the number of degrees of freedom increases, making them unlikely to be encountered in practice. The present algorithm avoids such instabilities in the case of a one-to-two timestep ratio (two subcycles), achieving unconditional stability in an exponential sense for a linear problem. However, with three or more subcycles, the trapezoidal rule exhibits stability that becomes conditional, falling towards that of the central difference method as the number of subcycles increases. Instabilities over narrow timestep ranges, which become narrower as the model size increases, also appear with three or more subcycles. However, by moving the partition between timesteps one row of elements into the region suitable for integration with the larger timestep, these unstable timestep ranges become extremely narrow, even in simple systems with a few degrees of freedom. As well, accuracy is improved. Use of a version of the Newmark algorithm that dissipates high frequencies minimises or eliminates these narrow bands of instability. Viscous damping is also shown to remove these instabilities, at the expense of having more effect on the low-frequency response.
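For reference, the trapezoidal-rule member of the Newmark family (gamma = 1/2, beta = 1/4, the average-acceleration rule) is sketched below for an undamped linear system M a + K u = 0 with a single timestep throughout; the nodal partitioning and subcycling discussed above are not reproduced.

```python
# Newmark integration with gamma = 1/2, beta = 1/4 (trapezoidal rule) for M a + K u = 0.
# System matrices, initial conditions and timestep are illustrative.
import numpy as np

def newmark_trapezoidal(M, K, u0, v0, dt, n_steps, gamma=0.5, beta=0.25):
    u, v = u0.copy(), v0.copy()
    a = np.linalg.solve(M, -K @ u)                    # initial acceleration
    lhs = M + beta * dt**2 * K                        # effective matrix (no damping)
    for _ in range(n_steps):
        u_pred = u + dt * v + dt**2 * (0.5 - beta) * a
        v_pred = v + dt * (1.0 - gamma) * a
        a = np.linalg.solve(lhs, -K @ u_pred)         # solve for new acceleration
        u = u_pred + beta * dt**2 * a
        v = v_pred + gamma * dt * a
    return u, v

M = np.eye(2)
K = np.array([[2.0, -1.0], [-1.0, 2.0]])
print(newmark_trapezoidal(M, K, np.array([1.0, 0.0]), np.zeros(2), dt=0.1, n_steps=100))
```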
Abstract:
This study was undertaken to test whether the structural remodelling of pulmonary parenchyma can be sequentially altered in a model and method that demonstrate the progression of the disease and result in remodelling within the lungs that is typical of idiopathic pulmonary fibrosis. Three groups of mice were studied: (i) animals that received 3,5-di-tert-butyl-4-hydroxytoluene (BHT) and were killed after 2 weeks (early BHT, n = 9); (ii) animals that received BHT and were killed after 4 weeks (late BHT, n = 11); (iii) animals that received corn oil solution (control, n = 10). The mice were placed in a ventilated Plexiglas chamber with a mixture of pure humidified oxygen and compressed air. Lung histological sections underwent haematoxylin-eosin staining, immunohistochemistry (epithelial, endothelial and immune cells) and specific staining (collagen/elastic fibres) for morphometric analysis. When compared with the control group, the early BHT and late BHT groups showed a significant decrease in type II pneumocytes, lower vascular density and higher endothelial activity. CD4 was increased in late BHT compared with the early and control groups, while CD8, macrophage and neutrophil cells were more prominent only in early BHT. Collagenous fibre density was significantly higher only in late BHT, whereas elastic fibre content in late BHT was lower than that in the control group. We conclude that the BHT experimental model is pathologically very similar to human usual interstitial pneumonia. This feature is important in the identification of animal models of idiopathic pulmonary fibrosis that can accurately reflect the pathogenesis and progression of the human disease.
Abstract:
The parallel hyperspectral unmixing problem is considered in this paper. A semisupervised approach is developed under the linear mixture model, where the physical constraints on the abundances are taken into account. The proposed approach relies on the increasing availability of spectral libraries of materials measured on the ground, instead of resorting to endmember extraction methods. Since libraries are potentially very large and hyperspectral datasets are of high dimensionality, a parallel implementation in a pixel-by-pixel fashion is derived to properly exploit the graphics processing unit (GPU) architecture at a low level, thus taking full advantage of the computational power of GPUs. Experimental results obtained for real hyperspectral datasets reveal significant speedup factors, up to 164 times, with respect to an optimized serial implementation.
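A serial, per-pixel sketch of the constrained unmixing that the GPU implementation parallelizes: each pixel's spectrum is decomposed over a spectral library with non-negative abundances, and a weighted row of ones softly enforces the sum-to-one constraint. The library, data and delta weight are synthetic stand-ins, not the paper's datasets.

```python
# Per-pixel non-negative unmixing over a spectral library; a row of ones weighted by
# delta softly enforces the sum-to-one constraint. Library and pixels are synthetic.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(5)
bands, n_materials, n_pixels = 50, 6, 100
library = rng.random((bands, n_materials))               # spectral library (columns = materials)
true_abund = rng.dirichlet(np.ones(n_materials), size=n_pixels)
pixels = true_abund @ library.T + rng.normal(0, 0.01, size=(n_pixels, bands))

delta = 10.0                                              # weight of the sum-to-one constraint
A = np.vstack([library, delta * np.ones((1, n_materials))])
estimates = np.array([nnls(A, np.append(y, delta))[0] for y in pixels])
print(estimates[0].round(3), estimates[0].sum().round(3))
```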
Abstract:
INTRODUCTION: The purpose of this ecological study was to evaluate the urban spatial and temporal distribution of tuberculosis (TB) in Ribeirão Preto, State of São Paulo, southeast Brazil, between 2006 and 2009, and to evaluate its relationship with factors of social vulnerability such as income and education level. METHODS: We evaluated data from TBWeb, an electronic notification system for TB cases. Measures of social vulnerability were obtained from the SEADE Foundation, and information about the number of inhabitants, education and income of the households was obtained from the Brazilian Institute of Geography and Statistics. Statistical analyses were conducted with a Bayesian regression model assuming a Poisson distribution for the observed new cases of TB in each area. A conditional autoregressive structure was used for the spatial covariance structure. RESULTS: The Bayesian model confirmed the spatial heterogeneity of TB distribution in Ribeirão Preto, identifying areas with elevated risk and the effects of social vulnerability on the disease. We demonstrated that the rate of TB was correlated with the measures of income, education and social vulnerability. However, we observed areas with low vulnerability and high education and income, but with high estimated TB rates. CONCLUSIONS: The study identified areas with different risks for TB, so that the public health system can deal with the characteristics of each region individually and prioritize those that present a higher risk of TB. Complex relationships may exist between TB incidence and a wide range of environmental and intrinsic factors, which need to be studied in future research.
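A non-spatial sketch of the count model described in METHODS: new TB cases per area follow a Poisson distribution whose log-rate depends linearly on a covariate, with population as an offset. The conditional autoregressive spatial effect used in the study is omitted here, and the data are simulated.

```python
# Poisson regression for area-level case counts with a population offset (no spatial term).
# Covariate effects and populations are simulated for illustration only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n_areas = 80
income = rng.normal(size=n_areas)
population = rng.integers(2_000, 20_000, size=n_areas)
rate = np.exp(-8.5 - 0.4 * income)                    # higher income -> lower TB rate (illustrative)
cases = rng.poisson(rate * population)

X = sm.add_constant(income)
fit = sm.GLM(cases, X, family=sm.families.Poisson(), offset=np.log(population)).fit()
print(fit.params)                                     # intercept and income coefficient
```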
Abstract:
This paper extends the Nelson-Siegel linear factor model by developing a flexible macro-finance framework for modeling and forecasting the term structure of US interest rates. Our approach is robust to parameter uncertainty and structural change, as we consider instabilities in parameters and volatilities, and our model averaging method allows for investors' model uncertainty over time. Our time-varying parameter Nelson-Siegel Dynamic Model Averaging (NS-DMA) predicts yields better than standard benchmarks and successfully captures plausible time-varying term premia in real time. The proposed model has significant in-sample and out-of-sample predictability for excess bond returns, and the predictability is of economic value.
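For reference, the three-factor Nelson-Siegel curve that the NS-DMA framework builds on maps level, slope and curvature factors into yields by maturity. The factor values and decay parameter below are illustrative, not estimates from the paper.

```python
# Static Nelson-Siegel yield curve: level, slope and curvature loadings by maturity.
# Factor values and the decay parameter lambda are illustrative.
import numpy as np

def nelson_siegel(tau, beta0, beta1, beta2, lam):
    """Yield at maturity tau (in years) under the three-factor Nelson-Siegel curve."""
    x = lam * tau
    slope_load = (1.0 - np.exp(-x)) / x
    curve_load = slope_load - np.exp(-x)
    return beta0 + beta1 * slope_load + beta2 * curve_load

maturities = np.array([0.25, 1.0, 5.0, 10.0, 30.0])
print(nelson_siegel(maturities, beta0=0.04, beta1=-0.02, beta2=0.01, lam=0.6).round(4))
```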