8 resultados para Statistical modeling technique

em Helda - Digital Repository of University of Helsinki


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Metabolism is the cellular subsystem responsible for generation of energy from nutrients and production of building blocks for larger macromolecules. Computational and statistical modeling of metabolism is vital to many disciplines including bioengineering, the study of diseases, drug target identification, and understanding the evolution of metabolism. In this thesis, we propose efficient computational methods for metabolic modeling. The techniques presented are targeted particularly at the analysis of large metabolic models encompassing the whole metabolism of one or several organisms. We concentrate on three major themes of metabolic modeling: metabolic pathway analysis, metabolic reconstruction and the study of evolution of metabolism. In the first part of this thesis, we study metabolic pathway analysis. We propose a novel modeling framework called gapless modeling to study biochemically viable metabolic networks and pathways. In addition, we investigate the utilization of atom-level information on metabolism to improve the quality of pathway analyses. We describe efficient algorithms for discovering both gapless and atom-level metabolic pathways, and conduct experiments with large-scale metabolic networks. The presented gapless approach offers a compromise in terms of complexity and feasibility between the previous graph-theoretic and stoichiometric approaches to metabolic modeling. Gapless pathway analysis shows that microbial metabolic networks are not as robust to random damage as suggested by previous studies. Furthermore the amino acid biosynthesis pathways of the fungal species Trichoderma reesei discovered from atom-level data are shown to closely correspond to those of Saccharomyces cerevisiae. In the second part, we propose computational methods for metabolic reconstruction in the gapless modeling framework. We study the task of reconstructing a metabolic network that does not suffer from connectivity problems. Such problems often limit the usability of reconstructed models, and typically require a significant amount of manual postprocessing. We formulate gapless metabolic reconstruction as an optimization problem and propose an efficient divide-and-conquer strategy to solve it with real-world instances. We also describe computational techniques for solving problems stemming from ambiguities in metabolite naming. These techniques have been implemented in a web-based sofware ReMatch intended for reconstruction of models for 13C metabolic flux analysis. In the third part, we extend our scope from single to multiple metabolic networks and propose an algorithm for inferring gapless metabolic networks of ancestral species from phylogenetic data. Experimenting with 16 fungal species, we show that the method is able to generate results that are easily interpretable and that provide hypotheses about the evolution of metabolism.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The Minimum Description Length (MDL) principle is a general, well-founded theoretical formalization of statistical modeling. The most important notion of MDL is the stochastic complexity, which can be interpreted as the shortest description length of a given sample of data relative to a model class. The exact definition of the stochastic complexity has gone through several evolutionary steps. The latest instantation is based on the so-called Normalized Maximum Likelihood (NML) distribution which has been shown to possess several important theoretical properties. However, the applications of this modern version of the MDL have been quite rare because of computational complexity problems, i.e., for discrete data, the definition of NML involves an exponential sum, and in the case of continuous data, a multi-dimensional integral usually infeasible to evaluate or even approximate accurately. In this doctoral dissertation, we present mathematical techniques for computing NML efficiently for some model families involving discrete data. We also show how these techniques can be used to apply MDL in two practical applications: histogram density estimation and clustering of multi-dimensional data.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Periglacial processes act on cold, non-glacial regions where the landscape deveploment is mainly controlled by frost activity. Circa 25 percent of Earth's surface can be considered as periglacial. Geographical Information System combined with advanced statistical modeling methods, provides an efficient tool and new theoretical perspective for study of cold environments. The aim of this study was to: 1) model and predict the abundance of periglacial phenomena in subarctic environment with statistical modeling, 2) investigate the most import factors affecting the occurence of these phenomena with hierarchical partitioning, 3) compare two widely used statistical modeling methods: Generalized Linear Models and Generalized Additive Models, 4) study modeling resolution's effect on prediction and 5) study how spatially continous prediction can be obtained from point data. The observational data of this study consist of 369 points that were collected during the summers of 2009 and 2010 at the study area in Kilpisjärvi northern Lapland. The periglacial phenomena of interest were cryoturbations, slope processes, weathering, deflation, nivation and fluvial processes. The features were modeled using Generalized Linear Models (GLM) and Generalized Additive Models (GAM) based on Poisson-errors. The abundance of periglacial features were predicted based on these models to a spatial grid with a resolution of one hectare. The most important environmental factors were examined with hierarchical partitioning. The effect of modeling resolution was investigated with in a small independent study area with a spatial resolution of 0,01 hectare. The models explained 45-70 % of the occurence of periglacial phenomena. When spatial variables were added to the models the amount of explained deviance was considerably higher, which signalled a geographical trend structure. The ability of the models to predict periglacial phenomena were assessed with independent evaluation data. Spearman's correlation varied 0,258 - 0,754 between the observed and predicted values. Based on explained deviance, and the results of hierarchical partitioning, the most important environmental variables were mean altitude, vegetation and mean slope angle. The effect of modeling resolution was clear, too coarse resolution caused a loss of information, while finer resolution brought out more localized variation. The models ability to explain and predict periglacial phenomena in the study area were mostly good and moderate respectively. Differences between modeling methods were small, although the explained deviance was higher with GLM-models than GAMs. In turn, GAMs produced more realistic spatial predictions. The single most important environmental variable controlling the occurence of periglacial phenomena was mean altitude, which had strong correlations with many other explanatory variables. The ongoing global warming will have great impact especially in cold environments on high latitudes, and for this reason, an important research topic in the near future will be the response of periglacial environments to a warming climate.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Bacteria play an important role in many ecological systems. The molecular characterization of bacteria using either cultivation-dependent or cultivation-independent methods reveals the large scale of bacterial diversity in natural communities, and the vastness of subpopulations within a species or genus. Understanding how bacterial diversity varies across different environments and also within populations should provide insights into many important questions of bacterial evolution and population dynamics. This thesis presents novel statistical methods for analyzing bacterial diversity using widely employed molecular fingerprinting techniques. The first objective of this thesis was to develop Bayesian clustering models to identify bacterial population structures. Bacterial isolates were identified using multilous sequence typing (MLST), and Bayesian clustering models were used to explore the evolutionary relationships among isolates. Our method involves the inference of genetic population structures via an unsupervised clustering framework where the dependence between loci is represented using graphical models. The population dynamics that generate such a population stratification were investigated using a stochastic model, in which homologous recombination between subpopulations can be quantified within a gene flow network. The second part of the thesis focuses on cluster analysis of community compositional data produced by two different cultivation-independent analyses: terminal restriction fragment length polymorphism (T-RFLP) analysis, and fatty acid methyl ester (FAME) analysis. The cluster analysis aims to group bacterial communities that are similar in composition, which is an important step for understanding the overall influences of environmental and ecological perturbations on bacterial diversity. A common feature of T-RFLP and FAME data is zero-inflation, which indicates that the observation of a zero value is much more frequent than would be expected, for example, from a Poisson distribution in the discrete case, or a Gaussian distribution in the continuous case. We provided two strategies for modeling zero-inflation in the clustering framework, which were validated by both synthetic and empirical complex data sets. We show in the thesis that our model that takes into account dependencies between loci in MLST data can produce better clustering results than those methods which assume independent loci. Furthermore, computer algorithms that are efficient in analyzing large scale data were adopted for meeting the increasing computational need. Our method that detects homologous recombination in subpopulations may provide a theoretical criterion for defining bacterial species. The clustering of bacterial community data include T-RFLP and FAME provides an initial effort for discovering the evolutionary dynamics that structure and maintain bacterial diversity in the natural environment.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Many species inhabit fragmented landscapes, resulting either from anthropogenic or from natural processes. The ecological and evolutionary dynamics of spatially structured populations are affected by a complex interplay between endogenous and exogenous factors. The metapopulation approach, simplifying the landscape to a discrete set of patches of breeding habitat surrounded by unsuitable matrix, has become a widely applied paradigm for the study of species inhabiting highly fragmented landscapes. In this thesis, I focus on the construction of biologically realistic models and their parameterization with empirical data, with the general objective of understanding how the interactions between individuals and their spatially structured environment affect ecological and evolutionary processes in fragmented landscapes. I study two hierarchically structured model systems, which are the Glanville fritillary butterfly in the Åland Islands, and a system of two interacting aphid species in the Tvärminne archipelago, both being located in South-Western Finland. The interesting and challenging feature of both study systems is that the population dynamics occur over multiple spatial scales that are linked by various processes. My main emphasis is in the development of mathematical and statistical methodologies. For the Glanville fritillary case study, I first build a Bayesian framework for the estimation of death rates and capture probabilities from mark-recapture data, with the novelty of accounting for variation among individuals in capture probabilities and survival. I then characterize the dispersal phase of the butterflies by deriving a mathematical approximation of a diffusion-based movement model applied to a network of patches. I use the movement model as a building block to construct an individual-based evolutionary model for the Glanville fritillary butterfly metapopulation. I parameterize the evolutionary model using a pattern-oriented approach, and use it to study how the landscape structure affects the evolution of dispersal. For the aphid case study, I develop a Bayesian model of hierarchical multi-scale metapopulation dynamics, where the observed extinction and colonization rates are decomposed into intrinsic rates operating specifically at each spatial scale. In summary, I show how analytical approaches, hierarchical Bayesian methods and individual-based simulations can be used individually or in combination to tackle complex problems from many different viewpoints. In particular, hierarchical Bayesian methods provide a useful tool for decomposing ecological complexity into more tractable components.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis deals with theoretical modeling of the electrodynamics of auroral ionospheres. In the five research articles forming the main part of the thesis we have concentrated on two main themes: Development of new data-analysis techniques and study of inductive phenomena in the ionospheric electrodynamics. The introductory part of the thesis provides a background for these new results and places them in the wider context of ionospheric research. In this thesis we have developed a new tool (called 1D SECS) for analysing ground based magnetic measurements from a 1-dimensional magnetometer chain (usually aligned in the North-South direction) and a new method for obtaining ionospheric electric field from combined ground based magnetic measurements and estimated ionospheric electric conductance. Both these methods are based on earlier work, but contain important new features: 1D SECS respects the spherical geometry of large scale ionospheric electrojet systems and due to an innovative way of implementing boundary conditions the new method for obtaining electric fields can be applied also at local scale studies. These new calculation methods have been tested using both simulated and real data. The tests indicate that the new methods are more reliable than the previous techniques. Inductive phenomena are intimately related to temporal changes in electric currents. As the large scale ionospheric current systems change relatively slowly, in time scales of several minutes or hours, inductive effects are usually assumed to be negligible. However, during the past ten years, it has been realised that induction can play an important part in some ionospheric phenomena. In this thesis we have studied the role of inductive electric fields and currents in ionospheric electrodynamics. We have formulated the induction problem so that only ionospheric electric parameters are used in the calculations. This is in contrast to previous studies, which require knowledge of the magnetospheric-ionosphere coupling. We have applied our technique to several realistic models of typical auroral phenomena. The results indicate that inductive electric fields and currents are locally important during the most dynamical phenomena (like the westward travelling surge, WTS). In these situations induction may locally contribute up to 20-30% of the total ionospheric electric field and currents. Inductive phenomena do also change the field-aligned currents flowing between the ionosphere and magnetosphere, thus modifying the coupling between the two regions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this thesis we deal with the concept of risk. The objective is to bring together and conclude on some normative information regarding quantitative portfolio management and risk assessment. The first essay concentrates on return dependency. We propose an algorithm for classifying markets into rising and falling. Given the algorithm, we derive a statistic: the Trend Switch Probability, for detection of long-term return dependency in the first moment. The empirical results suggest that the Trend Switch Probability is robust over various volatility specifications. The serial dependency in bear and bull markets behaves however differently. It is strongly positive in rising market whereas in bear markets it is closer to a random walk. Realized volatility, a technique for estimating volatility from high frequency data, is investigated in essays two and three. In the second essay we find, when measuring realized variance on a set of German stocks, that the second moment dependency structure is highly unstable and changes randomly. Results also suggest that volatility is non-stationary from time to time. In the third essay we examine the impact from market microstructure on the error between estimated realized volatility and the volatility of the underlying process. With simulation-based techniques we show that autocorrelation in returns leads to biased variance estimates and that lower sampling frequency and non-constant volatility increases the error variation between the estimated variance and the variance of the underlying process. From these essays we can conclude that volatility is not easily estimated, even from high frequency data. It is neither very well behaved in terms of stability nor dependency over time. Based on these observations, we would recommend the use of simple, transparent methods that are likely to be more robust over differing volatility regimes than models with a complex parameter universe. In analyzing long-term return dependency in the first moment we find that the Trend Switch Probability is a robust estimator. This is an interesting area for further research, with important implications for active asset allocation.