7 resultados para Exponential random graph models
em Helda - Digital Repository of University of Helsinki
Resumo:
Gene mapping is a systematic search for genes that affect observable characteristics of an organism. In this thesis we offer computational tools to improve the efficiency of (disease) gene-mapping efforts. In the first part of the thesis we propose an efficient simulation procedure for generating realistic genetical data from isolated populations. Simulated data is useful for evaluating hypothesised gene-mapping study designs and computational analysis tools. As an example of such evaluation, we demonstrate how a population-based study design can be a powerful alternative to traditional family-based designs in association-based gene-mapping projects. In the second part of the thesis we consider a prioritisation of a (typically large) set of putative disease-associated genes acquired from an initial gene-mapping analysis. Prioritisation is necessary to be able to focus on the most promising candidates. We show how to harness the current biomedical knowledge for the prioritisation task by integrating various publicly available biological databases into a weighted biological graph. We then demonstrate how to find and evaluate connections between entities, such as genes and diseases, from this unified schema by graph mining techniques. Finally, in the last part of the thesis, we define the concept of reliable subgraph and the corresponding subgraph extraction problem. Reliable subgraphs concisely describe strong and independent connections between two given vertices in a random graph, and hence they are especially useful for visualising such connections. We propose novel algorithms for extracting reliable subgraphs from large random graphs. The efficiency and scalability of the proposed graph mining methods are backed by extensive experiments on real data. While our application focus is in genetics, the concepts and algorithms can be applied to other domains as well. We demonstrate this generality by considering coauthor graphs in addition to biological graphs in the experiments.
Resumo:
Genetics, the science of heredity and variation in living organisms, has a central role in medicine, in breeding crops and livestock, and in studying fundamental topics of biological sciences such as evolution and cell functioning. Currently the field of genetics is under a rapid development because of the recent advances in technologies by which molecular data can be obtained from living organisms. In order that most information from such data can be extracted, the analyses need to be carried out using statistical models that are tailored to take account of the particular genetic processes. In this thesis we formulate and analyze Bayesian models for genetic marker data of contemporary individuals. The major focus is on the modeling of the unobserved recent ancestry of the sampled individuals (say, for tens of generations or so), which is carried out by using explicit probabilistic reconstructions of the pedigree structures accompanied by the gene flows at the marker loci. For such a recent history, the recombination process is the major genetic force that shapes the genomes of the individuals, and it is included in the model by assuming that the recombination fractions between the adjacent markers are known. The posterior distribution of the unobserved history of the individuals is studied conditionally on the observed marker data by using a Markov chain Monte Carlo algorithm (MCMC). The example analyses consider estimation of the population structure, relatedness structure (both at the level of whole genomes as well as at each marker separately), and haplotype configurations. For situations where the pedigree structure is partially known, an algorithm to create an initial state for the MCMC algorithm is given. Furthermore, the thesis includes an extension of the model for the recent genetic history to situations where also a quantitative phenotype has been measured from the contemporary individuals. In that case the goal is to identify positions on the genome that affect the observed phenotypic values. This task is carried out within the Bayesian framework, where the number and the relative effects of the quantitative trait loci are treated as random variables whose posterior distribution is studied conditionally on the observed genetic and phenotypic data. In addition, the thesis contains an extension of a widely-used haplotyping method, the PHASE algorithm, to settings where genetic material from several individuals has been pooled together, and the allele frequencies of each pool are determined in a single genotyping.
Resumo:
Planar curves arise naturally as interfaces between two regions of the plane. An important part of statistical physics is the study of lattice models. This thesis is about the interfaces of 2D lattice models. The scaling limit is an infinite system limit which is taken by letting the lattice mesh decrease to zero. At criticality, the scaling limit of an interface is one of the SLE curves (Schramm-Loewner evolution), introduced by Oded Schramm. This family of random curves is parametrized by a real variable, which determines the universality class of the model. The first and the second paper of this thesis study properties of SLEs. They contain two different methods to study the whole SLE curve, which is, in fact, the most interesting object from the statistical physics point of view. These methods are applied to study two symmetries of SLE: reversibility and duality. The first paper uses an algebraic method and a representation of the Virasoro algebra to find common martingales to different processes, and that way, to confirm the symmetries for polynomial expected values of natural SLE data. In the second paper, a recursion is obtained for the same kind of expected values. The recursion is based on stationarity of the law of the whole SLE curve under a SLE induced flow. The third paper deals with one of the most central questions of the field and provides a framework of estimates for describing 2D scaling limits by SLE curves. In particular, it is shown that a weak estimate on the probability of an annulus crossing implies that a random curve arising from a statistical physics model will have scaling limits and those will be well-described by Loewner evolutions with random driving forces.
Resumo:
This thesis addresses modeling of financial time series, especially stock market returns and daily price ranges. Modeling data of this kind can be approached with so-called multiplicative error models (MEM). These models nest several well known time series models such as GARCH, ACD and CARR models. They are able to capture many well established features of financial time series including volatility clustering and leptokurtosis. In contrast to these phenomena, different kinds of asymmetries have received relatively little attention in the existing literature. In this thesis asymmetries arise from various sources. They are observed in both conditional and unconditional distributions, for variables with non-negative values and for variables that have values on the real line. In the multivariate context asymmetries can be observed in the marginal distributions as well as in the relationships of the variables modeled. New methods for all these cases are proposed. Chapter 2 considers GARCH models and modeling of returns of two stock market indices. The chapter introduces the so-called generalized hyperbolic (GH) GARCH model to account for asymmetries in both conditional and unconditional distribution. In particular, two special cases of the GARCH-GH model which describe the data most accurately are proposed. They are found to improve the fit of the model when compared to symmetric GARCH models. The advantages of accounting for asymmetries are also observed through Value-at-Risk applications. Both theoretical and empirical contributions are provided in Chapter 3 of the thesis. In this chapter the so-called mixture conditional autoregressive range (MCARR) model is introduced, examined and applied to daily price ranges of the Hang Seng Index. The conditions for the strict and weak stationarity of the model as well as an expression for the autocorrelation function are obtained by writing the MCARR model as a first order autoregressive process with random coefficients. The chapter also introduces inverse gamma (IG) distribution to CARR models. The advantages of CARR-IG and MCARR-IG specifications over conventional CARR models are found in the empirical application both in- and out-of-sample. Chapter 4 discusses the simultaneous modeling of absolute returns and daily price ranges. In this part of the thesis a vector multiplicative error model (VMEM) with asymmetric Gumbel copula is found to provide substantial benefits over the existing VMEM models based on elliptical copulas. The proposed specification is able to capture the highly asymmetric dependence of the modeled variables thereby improving the performance of the model considerably. The economic significance of the results obtained is established when the information content of the volatility forecasts derived is examined.
Resumo:
Metabolism is the cellular subsystem responsible for generation of energy from nutrients and production of building blocks for larger macromolecules. Computational and statistical modeling of metabolism is vital to many disciplines including bioengineering, the study of diseases, drug target identification, and understanding the evolution of metabolism. In this thesis, we propose efficient computational methods for metabolic modeling. The techniques presented are targeted particularly at the analysis of large metabolic models encompassing the whole metabolism of one or several organisms. We concentrate on three major themes of metabolic modeling: metabolic pathway analysis, metabolic reconstruction and the study of evolution of metabolism. In the first part of this thesis, we study metabolic pathway analysis. We propose a novel modeling framework called gapless modeling to study biochemically viable metabolic networks and pathways. In addition, we investigate the utilization of atom-level information on metabolism to improve the quality of pathway analyses. We describe efficient algorithms for discovering both gapless and atom-level metabolic pathways, and conduct experiments with large-scale metabolic networks. The presented gapless approach offers a compromise in terms of complexity and feasibility between the previous graph-theoretic and stoichiometric approaches to metabolic modeling. Gapless pathway analysis shows that microbial metabolic networks are not as robust to random damage as suggested by previous studies. Furthermore the amino acid biosynthesis pathways of the fungal species Trichoderma reesei discovered from atom-level data are shown to closely correspond to those of Saccharomyces cerevisiae. In the second part, we propose computational methods for metabolic reconstruction in the gapless modeling framework. We study the task of reconstructing a metabolic network that does not suffer from connectivity problems. Such problems often limit the usability of reconstructed models, and typically require a significant amount of manual postprocessing. We formulate gapless metabolic reconstruction as an optimization problem and propose an efficient divide-and-conquer strategy to solve it with real-world instances. We also describe computational techniques for solving problems stemming from ambiguities in metabolite naming. These techniques have been implemented in a web-based sofware ReMatch intended for reconstruction of models for 13C metabolic flux analysis. In the third part, we extend our scope from single to multiple metabolic networks and propose an algorithm for inferring gapless metabolic networks of ancestral species from phylogenetic data. Experimenting with 16 fungal species, we show that the method is able to generate results that are easily interpretable and that provide hypotheses about the evolution of metabolism.
Resumo:
Cosmological inflation is the dominant paradigm in explaining the origin of structure in the universe. According to the inflationary scenario, there has been a period of nearly exponential expansion in the very early universe, long before the nucleosynthesis. Inflation is commonly considered as a consequence of some scalar field or fields whose energy density starts to dominate the universe. The inflationary expansion converts the quantum fluctuations of the fields into classical perturbations on superhorizon scales and these primordial perturbations are the seeds of the structure in the universe. Moreover, inflation also naturally explains the high degree of homogeneity and spatial flatness of the early universe. The real challenge of the inflationary cosmology lies in trying to establish a connection between the fields driving inflation and theories of particle physics. In this thesis we concentrate on inflationary models at scales well below the Planck scale. The low scale allows us to seek for candidates for the inflationary matter within extensions of the Standard Model but typically also implies fine-tuning problems. We discuss a low scale model where inflation is driven by a flat direction of the Minimally Supersymmetric Standard Model. The relation between the potential along the flat direction and the underlying supergravity model is studied. The low inflationary scale requires an extremely flat potential but we find that in this particular model the associated fine-tuning problems can be solved in a rather natural fashion in a class of supergravity models. For this class of models, the flatness is a consequence of the structure of the supergravity model and is insensitive to the vacuum expectation values of the fields that break supersymmetry. Another low scale model considered in the thesis is the curvaton scenario where the primordial perturbations originate from quantum fluctuations of a curvaton field, which is different from the fields driving inflation. The curvaton gives a negligible contribution to the total energy density during inflation but its perturbations become significant in the post-inflationary epoch. The separation between the fields driving inflation and the fields giving rise to primordial perturbations opens up new possibilities to lower the inflationary scale without introducing fine-tuning problems. The curvaton model typically gives rise to relatively large level of non-gaussian features in the statistics of primordial perturbations. We find that the level of non-gaussian effects is heavily dependent on the form of the curvaton potential. Future observations that provide more accurate information of the non-gaussian statistics can therefore place constraining bounds on the curvaton interactions.
Resumo:
Markov random fields (MRF) are popular in image processing applications to describe spatial dependencies between image units. Here, we take a look at the theory and the models of MRFs with an application to improve forest inventory estimates. Typically, autocorrelation between study units is a nuisance in statistical inference, but we take an advantage of the dependencies to smooth noisy measurements by borrowing information from the neighbouring units. We build a stochastic spatial model, which we estimate with a Markov chain Monte Carlo simulation method. The smooth values are validated against another data set increasing our confidence that the estimates are more accurate than the originals.