961 results for Markov chain modelling
The recent advent of new technologies has led to huge amounts of genomic data. With these data come new opportunities to understand biological cellular processes underlying hidden regulation mechanisms and to identify disease-related biomarkers for informative diagnostics. However, extracting biological insights from the immense amounts of genomic data is a challenging task. Therefore, effective and efficient computational techniques are needed to analyze and interpret genomic data. In this thesis, novel computational methods are proposed to address such challenges: a Bayesian mixture model, an extended Bayesian mixture model, and an Eigen-brain approach. The Bayesian mixture framework integrates the Bayesian network and the Gaussian mixture model. Based on the proposed framework and its conjunction with K-means clustering and principal component analysis (PCA), biological insights are derived, such as context-specific/dependent relationships and nested structures within microarray data in which biological replicates are encapsulated. The Bayesian mixture framework is then extended to explore posterior distributions of network space by incorporating a Markov chain Monte Carlo (MCMC) model. The extended Bayesian mixture model summarizes the sampled network structures by extracting biologically meaningful features. Finally, an Eigen-brain approach is proposed to analyze in situ hybridization data for the identification of cell-type-specific genes, which can be useful for informative blood diagnostics. Computational results with region-based clustering reveal critical evidence of consistency with brain anatomical structure.
Statistical methodology is proposed for comparing molecular shapes. In order to account for the continuous nature of molecules, classical shape analysis methods are combined with techniques used for predicting random fields in spatial statistics. Applying a modification of Procrustes analysis, Bayesian inference is carried out using Markov chain Monte Carlo methods for the pairwise alignment of the resulting molecular fields. Superimposing entire fields rather than the configuration matrices of nuclear positions thereby solves the problem that there is usually no clear one-to-one correspondence between the atoms of the two molecules under consideration. Using a similar concept, we also propose an adaptation of the generalised Procrustes analysis algorithm for the simultaneous alignment of multiple molecular fields. The methodology is applied to a dataset of 31 steroid molecules.
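As background for the abstract above, the following is a minimal sketch of classical (landmark-based) Procrustes rotation in two dimensions, which is the starting point the abstract says it modifies. It is not the field-based Bayesian method of the abstract: it assumes a known one-to-one correspondence between points, which is exactly the assumption the abstract avoids. The function names and the triangle data are hypothetical.

```python
import math

def center(points):
    """Translate a 2D configuration so that its centroid is the origin."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return [(x - cx, y - cy) for x, y in points]

def procrustes_rotation(reference, target):
    """Angle (radians) that best rotates centred `target` onto centred
    `reference` in the least-squares sense, assuming matched points."""
    a, b = center(reference), center(target)
    num = sum(bx * ay - by * ax for (ax, ay), (bx, by) in zip(a, b))
    den = sum(bx * ax + by * ay for (ax, ay), (bx, by) in zip(a, b))
    return math.atan2(num, den)

def align(reference, target):
    """Return the centred target rotated onto the centred reference."""
    theta = procrustes_rotation(reference, target)
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c - y * s, x * s + y * c) for x, y in center(target)]

# Hypothetical data: a triangle, and a copy rotated by 30 degrees and shifted.
reference = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
phi = math.radians(30)
target = [(x * math.cos(phi) - y * math.sin(phi) + 5.0,
           x * math.sin(phi) + y * math.cos(phi) - 3.0) for x, y in reference]

aligned = align(reference, target)
residual = sum((ax - rx) ** 2 + (ay - ry) ** 2
               for (ax, ay), (rx, ry) in zip(aligned, center(reference)))
```

Here the recovered angle undoes the 30 degree rotation, and `residual` is numerically zero because the two configurations differ only by a rigid motion.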
Background: Partially clonal organisms are very common in nature, yet the influence of partial asexuality on the temporal dynamics of genetic diversity remains poorly understood. Mathematical models accounting for clonality predict deviations only for extremely rare sex and only towards a negative mean inbreeding coefficient (mean F-IS < 0). Yet in partially clonal species, both F-IS < 0 and F-IS > 0 are frequently observed, including in populations where there is evidence for a significant amount of sexual reproduction. Here, we studied the joint effects of partial clonality, mutation and genetic drift with a state-and-time discrete Markov chain model to describe the dynamics of F-IS over time under increasing rates of clonality. Results: Results of the mathematical model and simulations show that partial clonality slows down the asymptotic convergence to F-IS = 0. Thus, although clonality alone does not lead to departures from Hardy-Weinberg expectations once the final equilibrium state is reached, both negative and positive F-IS values can arise transiently even at intermediate rates of clonality. More importantly, such "transient" departures from Hardy-Weinberg proportions may persist for a long time, as clonality tunes up the temporal variation of F-IS and reduces its rate of change over time, leading to a hyperbolic increase of the maximal time needed to reach the final mean F-IS value expected at equilibrium. Conclusion: Our results argue for a dynamical interpretation of F-IS in clonal populations. Negative values cannot be interpreted as unequivocal evidence for extremely scarce sex; they may also reflect intermediate rates of clonality in finite populations. Complementary observations (e.g. frequency distribution of multilocus genotypes, population history) or time series data may help to discriminate between different possible conclusions on the extent of clonality when mean F-IS values deviating from zero and/or a large variation of F-IS over loci are observed.
Scientific curiosity, exploration of georesources and environmental concerns are pushing the geoscientific research community toward subsurface investigations of ever-increasing complexity. This review explores various approaches to formulate and solve inverse problems in ways that effectively integrate geological concepts with geophysical and hydrogeological data. Modern geostatistical simulation algorithms can produce multiple subsurface realizations that are in agreement with conceptual geological models and statistical rock physics can be used to map these realizations into physical properties that are sensed by the geophysical or hydrogeological data. The inverse problem consists of finding one or an ensemble of such subsurface realizations that are in agreement with the data. The most general inversion frameworks are presently often computationally intractable when applied to large-scale problems and it is necessary to better understand the implications of simplifying (1) the conceptual geological model (e.g., using model compression); (2) the physical forward problem (e.g., using proxy models); and (3) the algorithm used to solve the inverse problem (e.g., Markov chain Monte Carlo or local optimization methods) to reach practical and robust solutions given today's computer resources and knowledge. We also highlight the need to not only use geophysical and hydrogeological data for parameter estimation purposes, but also to use them to falsify or corroborate alternative geological scenarios.
The fundamental objective for health research is to determine whether changes should be made to clinical decisions. Decisions made by veterinary surgeons in the light of new research evidence are known to be influenced by their prior beliefs, especially their initial opinions about the plausibility of possible results. In this paper, clinical trial results for a bovine mastitis control plan were evaluated within a Bayesian context, to incorporate a community of prior distributions that represented a spectrum of clinical prior beliefs. The aim was to quantify the effect of veterinary surgeons’ initial viewpoints on the interpretation of the trial results. A Bayesian analysis was conducted using Markov chain Monte Carlo procedures. Stochastic models included a financial cost attributed to a change in clinical mastitis following implementation of the control plan. Prior distributions were incorporated that covered a realistic range of possible clinical viewpoints, including scepticism, enthusiasm and uncertainty. Posterior distributions revealed important differences in the financial gain that clinicians with different starting viewpoints would anticipate from the mastitis control plan, given the actual research results. For example, a severe sceptic would ascribe a probability of 0.50 for a return of <£5 per cow in an average herd that implemented the plan, whereas an enthusiast would ascribe this probability for a return of >£20 per cow. Simulations using increased trial sizes indicated that if the original study was four times as large, an initial sceptic would be more convinced about the efficacy of the control plan but would still anticipate less financial return than an initial enthusiast would anticipate after the original study. In conclusion, it is possible to estimate how clinicians’ prior beliefs influence their interpretation of research evidence. 
Further research on the extent to which different interpretations of evidence result in changes to clinical practice would be worthwhile.
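To illustrate how a community of priors leads to different posterior conclusions from the same trial, here is a minimal sketch using a closed-form conjugate normal-normal update. This is a simplification: the abstract reports Markov chain Monte Carlo procedures and stochastic cost models, whereas this sketch swaps in a normal approximation with a known standard error. All numbers (returns in pounds per cow, prior means and standard deviations) are hypothetical and are not taken from the study.

```python
import math

def normal_posterior(prior_mean, prior_sd, estimate, std_error):
    """Conjugate normal-normal update: returns (posterior mean, posterior sd)."""
    prior_prec = 1.0 / prior_sd ** 2      # precision of the prior belief
    data_prec = 1.0 / std_error ** 2      # precision of the trial estimate
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * estimate)
    return post_mean, math.sqrt(post_var)

# Hypothetical trial result: estimated return per cow and its standard error.
TRIAL_ESTIMATE, TRIAL_SE = 10.0, 5.0

# Hypothetical priors: a sceptic centred on no return, an enthusiast on 20.
sceptic = normal_posterior(0.0, 5.0, TRIAL_ESTIMATE, TRIAL_SE)
enthusiast = normal_posterior(20.0, 5.0, TRIAL_ESTIMATE, TRIAL_SE)

# A trial four times as large halves the standard error of the estimate.
sceptic_4x = normal_posterior(0.0, 5.0, TRIAL_ESTIMATE, TRIAL_SE / 2)
```

Because a normal posterior is symmetric, each posterior mean is also the return that the clinician would place probability 0.50 on not exceeding. With these illustrative inputs the sceptic's posterior mean moves closer to the trial estimate when the trial is four times larger, yet stays below the enthusiast's posterior mean from the original trial, which mirrors the qualitative pattern the abstract describes.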
Synthetic biology, by co-opting molecular machinery from existing organisms, can be used as a tool for building new genetic systems from scratch, for understanding natural networks through perturbation, or for hybrid circuits that piggy-back on existing cellular infrastructure. Although the toolbox for genetic circuits has greatly expanded in recent years, it is still difficult to separate the circuit function from its specific molecular implementation. In this thesis, we discuss the function-driven design of two synthetic circuit modules, and use mathematical models to understand the fundamental limits of circuit topology versus operating regimes as determined by the specific molecular implementation. First, we describe a protein concentration tracker circuit that sets the concentration of an output protein relative to the concentration of a reference protein. The functionality of this circuit relies on a single negative feedback loop that is implemented via small programmable protein scaffold domains. We build a mass-action model to understand the relevant timescales of the tracking behavior and how the input/output ratios and circuit gain might be tuned with circuit components. Second, we design an event detector circuit with permanent genetic memory that can record order and timing between two chemical events. This circuit was implemented using bacteriophage integrases that recombine specific segments of DNA in response to chemical inputs. We simulate expected population-level outcomes using a stochastic Markov-chain model, and investigate how inferences on past events can be made from differences between single-cell and population-level responses. Additionally, we present some preliminary investigations on spatial patterning using the event detector circuit as well as the design of stationary phase promoters for growth-phase dependent activation. 
These results advance our understanding of synthetic gene circuits, and contribute towards the use of circuit modules as building blocks for larger and more complex synthetic networks.
The study of random probability measures is a lively research topic that has attracted interest from different fields in recent years. In this thesis, we consider random probability measures in the context of Bayesian nonparametrics, where the law of a random probability measure is used as prior distribution, and in the context of distributional data analysis, where the goal is to perform inference given a sample from the law of a random probability measure. The contributions contained in this thesis can be subdivided according to three different topics: (i) the use of almost surely discrete repulsive random measures (i.e., whose support points are well separated) for Bayesian model-based clustering, (ii) the proposal of new laws for collections of random probability measures for Bayesian density estimation of partially exchangeable data subdivided into different groups, and (iii) the study of principal component analysis and regression models for probability distributions seen as elements of the 2-Wasserstein space. Specifically, for point (i) above we propose an efficient Markov chain Monte Carlo algorithm for posterior inference, which sidesteps the need for split-merge reversible jump moves typically associated with poor performance; we propose a model for clustering high-dimensional data by introducing a novel class of anisotropic determinantal point processes; and we study the distributional properties of the repulsive measures, shedding light on important theoretical results which enable more principled prior elicitation and more efficient posterior simulation algorithms. For point (ii) above, we consider several models suitable for clustering homogeneous populations, inducing spatial dependence across groups of data, and extracting the characteristic traits common to all the data groups, and we propose a novel vector autoregressive model to study the growth curves of Singaporean children.
Finally, for point (iii), we propose a novel class of projected statistical methods for distributional data analysis for measures on the real line and on the unit-circle.
The cerebral cortex presents self-similarity in a proper interval of spatial scales, a property typical of natural objects exhibiting fractal geometry. Its complexity can therefore be characterized by the value of its fractal dimension (FD). In the computation of this metric, a frequentist approach to probability has usually been employed, with point estimator methods yielding only the optimal values of the FD. In our study, we aimed at retrieving a more complete evaluation of the FD by utilizing a Bayesian model for the linear regression analysis of the box-counting algorithm. We used T1-weighted MRI data of 86 healthy subjects (age 44.2 ± 17.1 years, mean ± standard deviation, 48% males) in order to gain insights into the confidence of our measure and investigate the relationship between mean Bayesian FD and age. Our approach yielded a stronger and significant (P < .001) correlation between mean Bayesian FD and age as compared to the previous implementation. Thus, our results suggest that the Bayesian FD is a more truthful estimate of the fractal dimension of the cerebral cortex than the frequentist FD.
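For readers unfamiliar with the box-counting regression mentioned above, the following is a minimal sketch of the frequentist point estimate that the abstract takes as its baseline: the FD is the least-squares slope of log(box count) against log(1/box size). It does not implement the Bayesian regression of the abstract, and it works on 2D integer point sets rather than MRI volumes. The function names and test shapes are hypothetical.

```python
import math

def box_count(points, size):
    """Number of boxes of side `size` that contain at least one point."""
    return len({(x // size, y // size) for x, y in points})

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def box_counting_dimension(points, sizes):
    """Point estimate of the fractal dimension from box counts."""
    log_inverse_size = [math.log(1.0 / s) for s in sizes]
    log_count = [math.log(box_count(points, s)) for s in sizes]
    return ols_slope(log_inverse_size, log_count)

# Hypothetical sanity checks: a filled square and a straight line segment.
sizes = [1, 2, 4, 8, 16]
square = [(x, y) for x in range(64) for y in range(64)]
line = [(x, 0) for x in range(64)]

fd_square = box_counting_dimension(square, sizes)
fd_line = box_counting_dimension(line, sizes)
```

The filled square and the line segment recover dimensions of 2 and 1 respectively. A Bayesian treatment would replace `ols_slope` with a posterior distribution over the slope, giving an uncertainty estimate rather than a single value.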
Despite the success of the ΛCDM model in describing the Universe, a possible tension between early- and late-Universe cosmological measurements is calling for new independent cosmological probes. Amongst the most promising ones, gravitational waves (GWs) can provide a self-calibrated measurement of the luminosity distance. However, to obtain cosmological constraints, additional information is needed to break the degeneracy between parameters in the gravitational waveform. In this thesis, we exploit the latest LIGO-Virgo-KAGRA Gravitational Wave Transient Catalog (GWTC-3) of GW sources to constrain the background cosmological parameters together with the astrophysical properties of Binary Black Holes (BBHs), using information from their mass distribution. We expand the public code MGCosmoPop, previously used for the application of this technique, by implementing a state-of-the-art model for the mass distribution, needed to account for the presence of non-trivial features, i.e. a truncated power law with two additional Gaussian peaks, referred to as Multipeak. We then analyse GWTC-3 comparing this model with simpler and more commonly adopted ones, both in the case of fixed and varying cosmology, and assess their goodness-of-fit with different model selection criteria, and their constraining power on the cosmological and population parameters. We also start to explore different sampling methods, namely Markov Chain Monte Carlo and Nested Sampling, comparing their performances and evaluating the advantages of both. We find concurring evidence that the Multipeak model is favoured by the data, in line with previous results, and show that this conclusion is robust to the variation of the cosmological parameters. We find a constraint on the Hubble constant of $H_0 = 61.10^{+38.65}_{-22.43}$ km/s/Mpc (68% C.L.), which shows the potential of this method in providing independent constraints on cosmological parameters. The results obtained in this work have been included in [1].
In this thesis, we explore constraints which can be put on the primordial power spectrum of curvature perturbations beyond the scales probed by anisotropies of the cosmic microwave background and galaxy surveys. We exploit present and future measurements of CMB spectral distortions, and their synergy with CMB anisotropies, as well as existing and future upper limits on the stochastic background of gravitational waves. We derive for the first time phenomenological templates that fit small-scale bumps in the primordial power spectrum generated in multi-field models of inflation. By using such templates, we study for the first time imprints of primordial peaks on anisotropies and spectral distortions of the cosmic microwave background and we investigate their contribution to the stochastic background of gravitational waves. Through a Markov chain Monte Carlo analysis we infer for the first time the constraints on the amplitude, the width and the location of such bumps using Planck and FIRAS data. We also forecast how a future spectrometer like PIXIE could improve on the FIRAS bounds. The results derived in this thesis have implications for the possibility of primordial black holes from inflation.
We shall study continuous-time Markov chains on the nonnegative integers which are both irreducible and transient, and which exhibit discernible stationarity before drift to infinity sets in. We will show how this 'quasi' stationary behaviour can be modelled using a limiting conditional distribution: specifically, the limiting state probabilities conditional on not having left 0 for the last time. By way of a dual chain, obtained by killing the original process on last exit from 0, we invoke the theory of quasistationarity for absorbing Markov chains. We prove that the conditioned state probabilities of the original chain are equal to the state probabilities of its dual conditioned on non-absorption, thus allowing us to establish the simultaneous existence and then equivalence, of their limiting conditional distributions. Although a limiting conditional distribution for the dual chain is always a quasistationary distribution in the usual sense, a similar statement is not possible for the original chain.
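As a small illustration of the quasistationarity theory for absorbing chains that the abstract invokes, the sketch below computes the limiting distribution conditional on non-absorption for a toy chain. It makes two simplifying assumptions not in the abstract: the chain is in discrete time and has only two transient states. The method is repeated multiplication by the sub-stochastic transition matrix followed by renormalisation, which converges to the normalised left eigenvector for an irreducible, aperiodic transient class. The matrix values are hypothetical.

```python
def limiting_conditional_distribution(sub_matrix, iterations=500):
    """Distribution over transient states conditional on non-absorption,
    obtained by iterating v -> vQ / sum(vQ) from a uniform start."""
    n = len(sub_matrix)
    dist = [1.0 / n] * n
    for _ in range(iterations):
        stepped = [sum(dist[i] * sub_matrix[i][j] for i in range(n))
                   for j in range(n)]
        survival = sum(stepped)           # probability of not being absorbed
        dist = [v / survival for v in stepped]
    return dist

# Hypothetical sub-stochastic matrix Q on two transient states; each row
# sums to 0.8, so absorption happens with probability 0.2 at every step.
Q = [[0.5, 0.3],
     [0.2, 0.6]]

qsd = limiting_conditional_distribution(Q)
```

For this matrix the limit can be checked by hand: solving vQ = 0.8 v with the entries of v summing to one gives v = (0.4, 0.6).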
Dissertation submitted for the degree of Doctor in Industrial Engineering
This work aims to identify and rank a set of Lean and Green practices and supply chain performance measures on which managers should focus to achieve competitiveness and improve the performance of automotive supply chains. The contextual relationships among the suggested practices and measures were identified through a literature review. Their ranking was done through interviews with professionals from the automotive industry and academics with wide knowledge of the subject. Interpretive structural modelling (ISM) is a useful methodology to identify interrelationships among Lean and Green practices and supply chain performance measures and to support the evaluation of automotive supply chain performance. Using the ISM methodology, the variables under study were clustered according to their driving power and dependence power. The model is intended to provide a better understanding of the variables that have the most influence on the others (driving variables) and those which are most influenced by others (dependent variables). The information provided by this model is strategic for managers, who can use it to identify which variables they should focus on in order to have competitive supply chains.
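The driving power and dependence power used for clustering in ISM can be sketched as follows. In the standard ISM procedure, an initial reachability matrix (with ones on the diagonal) is closed under transitivity; the driving power of a variable is its row sum and its dependence power is its column sum. The four variables and their relationships below are hypothetical and are not taken from the dissertation.

```python
def transitive_closure(matrix):
    """Final reachability matrix via Warshall's algorithm (0/1 entries)."""
    n = len(matrix)
    reach = [row[:] for row in matrix]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = 1
    return reach

def driving_and_dependence(matrix):
    """Row sums (driving power) and column sums (dependence power)."""
    reach = transitive_closure(matrix)
    driving = [sum(row) for row in reach]
    dependence = [sum(col) for col in zip(*reach)]
    return driving, dependence

# Hypothetical initial reachability matrix for variables A, B, C, D:
# A influences B, B influences C, D influences C; each reaches itself.
initial = [
    [1, 1, 0, 0],  # A
    [0, 1, 1, 0],  # B
    [0, 0, 1, 0],  # C
    [0, 0, 1, 1],  # D
]

driving, dependence = driving_and_dependence(initial)
```

In this toy example A has the highest driving power (it reaches B and, transitively, C), while C has the highest dependence power (every variable reaches it), so A would be read as a driving variable and C as a dependent variable.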