973 resultados para homogeneous Markov chain
Resumo:
Performance-based maintenance contracts differ significantly from material and method-based contracts that have been traditionally used to maintain roads. Road agencies around the world have moved towards a performance-based contract approach because it offers several advantages like cost saving, better budgeting certainty, better customer satisfaction with better road services and conditions. Payments for the maintenance of road are explicitly linked to the contractor successfully meeting certain clearly defined minimum performance indicators in these contracts. Quantitative evaluation of the cost of performance-based contracts has several difficulties due to the complexity of the pavement deterioration process. Based on a probabilistic analysis of failures of achieving multiple performance criteria over the length of the contract period, an effort has been made to develop a model that is capable of estimating the cost of these performance-based contracts. One of the essential functions of such model is to predict performance of the pavement as accurately as possible. Prediction of future degradation of pavement is done using Markov Chain Process, which requires estimating transition probabilities from previous deterioration rate for similar pavements. Transition probabilities were derived using historical pavement condition rating data, both for predicting pavement deterioration when there is no maintenance, and for predicting pavement improvement when maintenance activities are performed. A methodological framework has been developed to estimate the cost of maintaining road based on multiple performance criteria such as crack, rut and, roughness. The application of the developed model has been demonstrated via a real case study of Miami Dade Expressways (MDX) using pavement condition rating data from Florida Department of Transportation (FDOT) for a typical performance-based asphalt pavement maintenance contract. Results indicated that the pavement performance model developed could predict the pavement deterioration quite accurately. Sensitivity analysis performed shows that the model is very responsive to even slight changes in pavement deterioration rate and performance constraints. It is expected that the use of this model will assist the highway agencies and contractors in arriving at a fair contract value for executing long term performance-based pavement maintenance works.
Resumo:
The aim of this thesis is to study the angular momentum of a sample of S0 galaxies. In the quest to understand whether the formation of S0 galaxies is more closely linked to that of ellipticals or that of spirals, our goal is to compare the amount of their specific angular momentum as a function of stellar mass with respect to spirals. Through kinematic comparison between these different classes of galaxies we aim to understand if a scenario of passive evolution, in which the galaxy’s gas is consumed and the star formation is quenched, can be considered as plausible mechanism to explain the transformation from spirals to S0s. In order to derive the structural and photometric parameters of galaxy sub-components we performed a bulge-disc decomposition of optical images using GALFIT. The stellar kinematic of the galaxies was measured using integral field spectroscopic data from CALIFA survey. The development of new original software, based on a Monte Carlo Markov Chain algorithm, allowed us to obtain the values of the line of sight velocity and velocity dispersion of disc and bulge components. The result that we obtained is that S0 discs have a distribution of stellar specific angular momentum that is in full agreement with that of spiral discs, so the mechanism of simple fading can be considered as one of the most important for transformation from spirals to S0s.
Resumo:
Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size, despite huge increases in the value of n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n=all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.
Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.
One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
Latent class models for the joint distribution of multivariate categorical, such as the PARAFAC decomposition, data play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.
In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo. The Markov Chain Monte Carlo method is the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov Chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
Resumo:
A class of multi-process models is developed for collections of time indexed count data. Autocorrelation in counts is achieved with dynamic models for the natural parameter of the binomial distribution. In addition to modeling binomial time series, the framework includes dynamic models for multinomial and Poisson time series. Markov chain Monte Carlo (MCMC) and Po ́lya-Gamma data augmentation (Polson et al., 2013) are critical for fitting multi-process models of counts. To facilitate computation when the counts are high, a Gaussian approximation to the P ́olya- Gamma random variable is developed.
Three applied analyses are presented to explore the utility and versatility of the framework. The first analysis develops a model for complex dynamic behavior of themes in collections of text documents. Documents are modeled as a “bag of words”, and the multinomial distribution is used to characterize uncertainty in the vocabulary terms appearing in each document. State-space models for the natural parameters of the multinomial distribution induce autocorrelation in themes and their proportional representation in the corpus over time.
The second analysis develops a dynamic mixed membership model for Poisson counts. The model is applied to a collection of time series which record neuron level firing patterns in rhesus monkeys. The monkey is exposed to two sounds simultaneously, and Gaussian processes are used to smoothly model the time-varying rate at which the neuron’s firing pattern fluctuates between features associated with each sound in isolation.
The third analysis presents a switching dynamic generalized linear model for the time-varying home run totals of professional baseball players. The model endows each player with an age specific latent natural ability class and a performance enhancing drug (PED) use indicator. As players age, they randomly transition through a sequence of ability classes in a manner consistent with traditional aging patterns. When the performance of the player significantly deviates from the expected aging pattern, he is identified as a player whose performance is consistent with PED use.
All three models provide a mechanism for sharing information across related series locally in time. The models are fit with variations on the P ́olya-Gamma Gibbs sampler, MCMC convergence diagnostics are developed, and reproducible inference is emphasized throughout the dissertation.
Resumo:
While molecular and cellular processes are often modeled as stochastic processes, such as Brownian motion, chemical reaction networks and gene regulatory networks, there are few attempts to program a molecular-scale process to physically implement stochastic processes. DNA has been used as a substrate for programming molecular interactions, but its applications are restricted to deterministic functions and unfavorable properties such as slow processing, thermal annealing, aqueous solvents and difficult readout limit them to proof-of-concept purposes. To date, whether there exists a molecular process that can be programmed to implement stochastic processes for practical applications remains unknown.
In this dissertation, a fully specified Resonance Energy Transfer (RET) network between chromophores is accurately fabricated via DNA self-assembly, and the exciton dynamics in the RET network physically implement a stochastic process, specifically a continuous-time Markov chain (CTMC), which has a direct mapping to the physical geometry of the chromophore network. Excited by a light source, a RET network generates random samples in the temporal domain in the form of fluorescence photons which can be detected by a photon detector. The intrinsic sampling distribution of a RET network is derived as a phase-type distribution configured by its CTMC model. The conclusion is that the exciton dynamics in a RET network implement a general and important class of stochastic processes that can be directly and accurately programmed and used for practical applications of photonics and optoelectronics. Different approaches to using RET networks exist with vast potential applications. As an entropy source that can directly generate samples from virtually arbitrary distributions, RET networks can benefit applications that rely on generating random samples such as 1) fluorescent taggants and 2) stochastic computing.
By using RET networks between chromophores to implement fluorescent taggants with temporally coded signatures, the taggant design is not constrained by resolvable dyes and has a significantly larger coding capacity than spectrally or lifetime coded fluorescent taggants. Meanwhile, the taggant detection process becomes highly efficient, and the Maximum Likelihood Estimation (MLE) based taggant identification guarantees high accuracy even with only a few hundred detected photons.
Meanwhile, RET-based sampling units (RSU) can be constructed to accelerate probabilistic algorithms for wide applications in machine learning and data analytics. Because probabilistic algorithms often rely on iteratively sampling from parameterized distributions, they can be inefficient in practice on the deterministic hardware traditional computers use, especially for high-dimensional and complex problems. As an efficient universal sampling unit, the proposed RSU can be integrated into a processor / GPU as specialized functional units or organized as a discrete accelerator to bring substantial speedups and power savings.
Resumo:
Dynamic positron emission tomography (PET) imaging can be used to track the distribution of injected radio-labelled molecules over time in vivo. This is a powerful technique, which provides researchers and clinicians the opportunity to study the status of healthy and pathological tissue by examining how it processes substances of interest. Widely used tracers include 18F-uorodeoxyglucose, an analog of glucose, which is used as the radiotracer in over ninety percent of PET scans. This radiotracer provides a way of quantifying the distribution of glucose utilisation in vivo. The interpretation of PET time-course data is complicated because the measured signal is a combination of vascular delivery and tissue retention effects. If the arterial time-course is known, the tissue time-course can typically be expressed in terms of a linear convolution between the arterial time-course and the tissue residue function. As the residue represents the amount of tracer remaining in the tissue, this can be thought of as a survival function; these functions been examined in great detail by the statistics community. Kinetic analysis of PET data is concerned with estimation of the residue and associated functionals such as ow, ux and volume of distribution. This thesis presents a Markov chain formulation of blood tissue exchange and explores how this relates to established compartmental forms. A nonparametric approach to the estimation of the residue is examined and the improvement in this model relative to compartmental model is evaluated using simulations and cross-validation techniques. The reference distribution of the test statistics, generated in comparing the models, is also studied. We explore these models further with simulated studies and an FDG-PET dataset from subjects with gliomas, which has previously been analysed with compartmental modelling. We also consider the performance of a recently proposed mixture modelling technique in this study.
Resumo:
L’un des problèmes importants en apprentissage automatique est de déterminer la complexité du modèle à apprendre. Une trop grande complexité mène au surapprentissage, ce qui correspond à trouver des structures qui n’existent pas réellement dans les données, tandis qu’une trop faible complexité mène au sous-apprentissage, c’est-à-dire que l’expressivité du modèle est insuffisante pour capturer l’ensemble des structures présentes dans les données. Pour certains modèles probabilistes, la complexité du modèle se traduit par l’introduction d’une ou plusieurs variables cachées dont le rôle est d’expliquer le processus génératif des données. Il existe diverses approches permettant d’identifier le nombre approprié de variables cachées d’un modèle. Cette thèse s’intéresse aux méthodes Bayésiennes nonparamétriques permettant de déterminer le nombre de variables cachées à utiliser ainsi que leur dimensionnalité. La popularisation des statistiques Bayésiennes nonparamétriques au sein de la communauté de l’apprentissage automatique est assez récente. Leur principal attrait vient du fait qu’elles offrent des modèles hautement flexibles et dont la complexité s’ajuste proportionnellement à la quantité de données disponibles. Au cours des dernières années, la recherche sur les méthodes d’apprentissage Bayésiennes nonparamétriques a porté sur trois aspects principaux : la construction de nouveaux modèles, le développement d’algorithmes d’inférence et les applications. Cette thèse présente nos contributions à ces trois sujets de recherches dans le contexte d’apprentissage de modèles à variables cachées. Dans un premier temps, nous introduisons le Pitman-Yor process mixture of Gaussians, un modèle permettant l’apprentissage de mélanges infinis de Gaussiennes. Nous présentons aussi un algorithme d’inférence permettant de découvrir les composantes cachées du modèle que nous évaluons sur deux applications concrètes de robotique. Nos résultats démontrent que l’approche proposée surpasse en performance et en flexibilité les approches classiques d’apprentissage. Dans un deuxième temps, nous proposons l’extended cascading Indian buffet process, un modèle servant de distribution de probabilité a priori sur l’espace des graphes dirigés acycliques. Dans le contexte de réseaux Bayésien, ce prior permet d’identifier à la fois la présence de variables cachées et la structure du réseau parmi celles-ci. Un algorithme d’inférence Monte Carlo par chaîne de Markov est utilisé pour l’évaluation sur des problèmes d’identification de structures et d’estimation de densités. Dans un dernier temps, nous proposons le Indian chefs process, un modèle plus général que l’extended cascading Indian buffet process servant à l’apprentissage de graphes et d’ordres. L’avantage du nouveau modèle est qu’il admet les connections entres les variables observables et qu’il prend en compte l’ordre des variables. Nous présentons un algorithme d’inférence Monte Carlo par chaîne de Markov avec saut réversible permettant l’apprentissage conjoint de graphes et d’ordres. L’évaluation est faite sur des problèmes d’estimations de densité et de test d’indépendance. Ce modèle est le premier modèle Bayésien nonparamétrique permettant d’apprendre des réseaux Bayésiens disposant d’une structure complètement arbitraire.
Resumo:
Human standing posture is inherently unstable. The postural control system (PCS), which maintains standing posture, is composed of the sensory, musculoskeletal, and central nervous systems. Together these systems integrate sensory afferents and generate appropriate motor efferents to adjust posture. The PCS maintains the body center of mass (COM) with respect to the base of support while constantly resisting destabilizing forces from internal and external perturbations. To assess the human PCS, postural sway during quiet standing or in response to external perturbation have frequently been examined descriptively. Minimal work has been done to understand and quantify the robustness of the PCS to perturbations. Further, there have been some previous attempts to assess the dynamical systems aspects of the PCS or time evolutionary properties of postural sway. However those techniques can only provide summary information about the PCS characteristics; they cannot provide specific information about or recreate the actual sway behavior. This dissertation consists of two parts: part I, the development of two novel methods to assess the human PCS and, part II, the application of these methods. In study 1, a systematic method for analyzing the human PCS during perturbed stance was developed. A mild impulsive perturbation that subjects can easily experience in their daily lives was used. A measure of robustness of the PCS, 1/MaxSens that was based on the inverse of the sensitivity of the system, was introduced. 1/MaxSens successfully quantified the reduced robustness to external perturbations due to age-related degradation of the PCS. In study 2, a stochastic model was used to better understand the human PCS in terms of dynamical systems aspect. This methodology also has the advantage over previous methods in that the sway behavior is captured in a model that can be used to recreate the random oscillatory properties of the PCS. The invariant density which describes the long-term stationary behavior of the center of pressure (COP) was computed from a Markov chain model that was applied to postural sway data during quiet stance. In order to validate the Invariant Density Analysis (IDA), we applied the technique to COP data from different age groups. We found that older adults swayed farther from the centroid and in more stochastic and random manner than young adults. In part II, the tools developed in part I were applied to both occupational and clinical situations. In study 3, 1/MaxSens and IDA were applied to a population of firefighters to investigate the effects of air bottle configuration (weight and size) and vision on the postural stability of firefighters. We found that both air bottle weight and loss of vision, but not size of air bottle, significantly decreased balance performance and increased fall risk. In study 4, IDA was applied to data collected on 444 community-dwelling elderly adults from the MOBILIZE Boston Study. Four out of five IDA parameters were able to successfully differentiate recurrent fallers from non-fallers, while only five out of 30 more common descriptive and stochastic COP measures could distinguish the two groups. Fall history and the IDA parameter of entropy were found to be significant risk factors for falls. This research proposed a new measure for the PCS robustness (1/MaxSens) and a new technique for quantifying the dynamical systems aspect of the PCS (IDA). These new PCS analysis techniques provide easy and effective ways to assess the PCS in occupational and clinical environments.
Resumo:
The recent advent of new technologies has led to huge amounts of genomic data. With these data come new opportunities to understand biological cellular processes underlying hidden regulation mechanisms and to identify disease related biomarkers for informative diagnostics. However, extracting biological insights from the immense amounts of genomic data is a challenging task. Therefore, effective and efficient computational techniques are needed to analyze and interpret genomic data. In this thesis, novel computational methods are proposed to address such challenges: a Bayesian mixture model, an extended Bayesian mixture model, and an Eigen-brain approach. The Bayesian mixture framework involves integration of the Bayesian network and the Gaussian mixture model. Based on the proposed framework and its conjunction with K-means clustering and principal component analysis (PCA), biological insights are derived such as context specific/dependent relationships and nested structures within microarray where biological replicates are encapsulated. The Bayesian mixture framework is then extended to explore posterior distributions of network space by incorporating a Markov chain Monte Carlo (MCMC) model. The extended Bayesian mixture model summarizes the sampled network structures by extracting biologically meaningful features. Finally, an Eigen-brain approach is proposed to analyze in situ hybridization data for the identification of the cell-type specific genes, which can be useful for informative blood diagnostics. Computational results with region-based clustering reveals the critical evidence for the consistency with brain anatomical structure.
Resumo:
Statistical methodology is proposed for comparing molecular shapes. In order to account for the continuous nature of molecules, classical shape analysis methods are combined with techniques used for predicting random fields in spatial statistics. Applying a modification of Procrustes analysis, Bayesian inference is carried out using Markov chain Monte Carlo methods for the pairwise alignment of the resulting molecular fields. Superimposing entire fields rather than the configuration matrices of nuclear positions thereby solves the problem that there is usually no clear one--to--one correspondence between the atoms of the two molecules under consideration. Using a similar concept, we also propose an adaptation of the generalised Procrustes analysis algorithm for the simultaneous alignment of multiple molecular fields. The methodology is applied to a dataset of 31 steroid molecules.
Resumo:
Background: Partially clonal organisms are very common in nature, yet the influence of partial asexuality on the temporal dynamics of genetic diversity remains poorly understood. Mathematical models accounting for clonality predict deviations only for extremely rare sex and only towards mean inbreeding coefficient (F-IS) over bar < 0. Yet in partially clonal species, both F-IS < 0 and F-IS > 0 are frequently observed also in populations where there is evidence for a significant amount of sexual reproduction. Here, we studied the joint effects of partial clonality, mutation and genetic drift with a state-and-time discrete Markov chain model to describe the dynamics of F-IS over time under increasing rates of clonality. Results: Results of the mathematical model and simulations show that partial clonality slows down the asymptotic convergence to F-IS = 0. Thus, although clonality alone does not lead to departures from Hardy-Weinberg expectations once reached the final equilibrium state, both negative and positive F-IS values can arise transiently even at intermediate rates of clonality. More importantly, such "transient" departures from Hardy Weinberg proportions may last long as clonality tunes up the temporal variation of F-IS and reduces its rate of change over time, leading to a hyperbolic increase of the maximal time needed to reach the final mean (F-IS,F-infinity) over bar value expected at equilibrium. Conclusion: Our results argue for a dynamical interpretation of F-IS in clonal populations. Negative values cannot be interpreted as unequivocal evidence for extremely scarce sex but also as intermediate rates of clonality in finite populations. Complementary observations (e.g. frequency distribution of multiloci genotypes, population history) or time series data may help to discriminate between different possible conclusions on the extent of clonality when mean (F-IS) over bar values deviating from zero and/or a large variation of F-IS over loci are observed.
Resumo:
Scientific curiosity, exploration of georesources and environmental concerns are pushing the geoscientific research community toward subsurface investigations of ever-increasing complexity. This review explores various approaches to formulate and solve inverse problems in ways that effectively integrate geological concepts with geophysical and hydrogeological data. Modern geostatistical simulation algorithms can produce multiple subsurface realizations that are in agreement with conceptual geological models and statistical rock physics can be used to map these realizations into physical properties that are sensed by the geophysical or hydrogeological data. The inverse problem consists of finding one or an ensemble of such subsurface realizations that are in agreement with the data. The most general inversion frameworks are presently often computationally intractable when applied to large-scale problems and it is necessary to better understand the implications of simplifying (1) the conceptual geological model (e.g., using model compression); (2) the physical forward problem (e.g., using proxy models); and (3) the algorithm used to solve the inverse problem (e.g., Markov chain Monte Carlo or local optimization methods) to reach practical and robust solutions given today's computer resources and knowledge. We also highlight the need to not only use geophysical and hydrogeological data for parameter estimation purposes, but also to use them to falsify or corroborate alternative geological scenarios.
Resumo:
The fundamental objective for health research is to determine whether changes should be made to clinical decisions. Decisions made by veterinary surgeons in the light of new research evidence are known to be influenced by their prior beliefs, especially their initial opinions about the plausibility of possible results. In this paper, clinical trial results for a bovine mastitis control plan were evaluated within a Bayesian context, to incorporate a community of prior distributions that represented a spectrum of clinical prior beliefs. The aim was to quantify the effect of veterinary surgeons’ initial viewpoints on the interpretation of the trial results. A Bayesian analysis was conducted using Markov chain Monte Carlo procedures. Stochastic models included a financial cost attributed to a change in clinical mastitis following implementation of the control plan. Prior distributions were incorporated that covered a realistic range of possible clinical viewpoints, including scepticism, enthusiasm and uncertainty. Posterior distributions revealed important differences in the financial gain that clinicians with different starting viewpoints would anticipate from the mastitis control plan, given the actual research results. For example, a severe sceptic would ascribe a probability of 0.50 for a return of <£5 per cow in an average herd that implemented the plan, whereas an enthusiast would ascribe this probability for a return of >£20 per cow. Simulations using increased trial sizes indicated that if the original study was four times as large, an initial sceptic would be more convinced about the efficacy of the control plan but would still anticipate less financial return than an initial enthusiast would anticipate after the original study. In conclusion, it is possible to estimate how clinicians’ prior beliefs influence their interpretation of research evidence. Further research on the extent to which different interpretations of evidence result in changes to clinical practice would be worthwhile.
Resumo:
Synthetic biology, by co-opting molecular machinery from existing organisms, can be used as a tool for building new genetic systems from scratch, for understanding natural networks through perturbation, or for hybrid circuits that piggy-back on existing cellular infrastructure. Although the toolbox for genetic circuits has greatly expanded in recent years, it is still difficult to separate the circuit function from its specific molecular implementation. In this thesis, we discuss the function-driven design of two synthetic circuit modules, and use mathematical models to understand the fundamental limits of circuit topology versus operating regimes as determined by the specific molecular implementation. First, we describe a protein concentration tracker circuit that sets the concentration of an output protein relative to the concentration of a reference protein. The functionality of this circuit relies on a single negative feedback loop that is implemented via small programmable protein scaffold domains. We build a mass-action model to understand the relevant timescales of the tracking behavior and how the input/output ratios and circuit gain might be tuned with circuit components. Second, we design an event detector circuit with permanent genetic memory that can record order and timing between two chemical events. This circuit was implemented using bacteriophage integrases that recombine specific segments of DNA in response to chemical inputs. We simulate expected population-level outcomes using a stochastic Markov-chain model, and investigate how inferences on past events can be made from differences between single-cell and population-level responses. Additionally, we present some preliminary investigations on spatial patterning using the event detector circuit as well as the design of stationary phase promoters for growth-phase dependent activation. These results advance our understanding of synthetic gene circuits, and contribute towards the use of circuit modules as building blocks for larger and more complex synthetic networks.
Resumo:
We measured the distribution in absolute magnitude - circular velocity space for a well-defined sample of 199 rotating galaxies of the Calar Alto Legacy Integral Field Area Survey (CALIFA) using their stellar kinematics. Our aim in this analysis is to avoid subjective selection criteria and to take volume and large-scale structure factors into account. Using stellar velocity fields instead of gas emission line kinematics allows including rapidly rotating early-type galaxies. Our initial sample contains 277 galaxies with available stellar velocity fields and growth curve r-band photometry. After rejecting 51 velocity fields that could not be modelled because of the low number of bins, foreground contamination, or significant interaction, we performed Markov chain Monte Carlo modelling of the velocity fields, from which we obtained the rotation curve and kinematic parameters and their realistic uncertainties. We performed an extinction correction and calculated the circular velocity v_circ accounting for the pressure support of a given galaxy. The resulting galaxy distribution on the M-r - v(circ) plane was then modelled as a mixture of two distinct populations, allowing robust and reproducible rejection of outliers, a significant fraction of which are slow rotators. The selection effects are understood well enough that we were able to correct for the incompleteness of the sample. The 199 galaxies were weighted by volume and large-scale structure factors, which enabled us to fit a volume-corrected Tully-Fisher relation (TFR). More importantly, we also provide the volume-corrected distribution of galaxies in the M_r - v_circ plane, which can be compared with cosmological simulations. The joint distribution of the luminosity and circular velocity space densities, representative over the range of -20 > M_r > -22 mag, can place more stringent constraints on the galaxy formation and evolution scenarios than linear TFR fit parameters or the luminosity function alone.