11 resultados para Modeling methods
em Helda - Digital Repository of University of Helsinki
Resumo:
Periglacial processes act on cold, non-glacial regions where the landscape deveploment is mainly controlled by frost activity. Circa 25 percent of Earth's surface can be considered as periglacial. Geographical Information System combined with advanced statistical modeling methods, provides an efficient tool and new theoretical perspective for study of cold environments. The aim of this study was to: 1) model and predict the abundance of periglacial phenomena in subarctic environment with statistical modeling, 2) investigate the most import factors affecting the occurence of these phenomena with hierarchical partitioning, 3) compare two widely used statistical modeling methods: Generalized Linear Models and Generalized Additive Models, 4) study modeling resolution's effect on prediction and 5) study how spatially continous prediction can be obtained from point data. The observational data of this study consist of 369 points that were collected during the summers of 2009 and 2010 at the study area in Kilpisjärvi northern Lapland. The periglacial phenomena of interest were cryoturbations, slope processes, weathering, deflation, nivation and fluvial processes. The features were modeled using Generalized Linear Models (GLM) and Generalized Additive Models (GAM) based on Poisson-errors. The abundance of periglacial features were predicted based on these models to a spatial grid with a resolution of one hectare. The most important environmental factors were examined with hierarchical partitioning. The effect of modeling resolution was investigated with in a small independent study area with a spatial resolution of 0,01 hectare. The models explained 45-70 % of the occurence of periglacial phenomena. When spatial variables were added to the models the amount of explained deviance was considerably higher, which signalled a geographical trend structure. The ability of the models to predict periglacial phenomena were assessed with independent evaluation data. Spearman's correlation varied 0,258 - 0,754 between the observed and predicted values. Based on explained deviance, and the results of hierarchical partitioning, the most important environmental variables were mean altitude, vegetation and mean slope angle. The effect of modeling resolution was clear, too coarse resolution caused a loss of information, while finer resolution brought out more localized variation. The models ability to explain and predict periglacial phenomena in the study area were mostly good and moderate respectively. Differences between modeling methods were small, although the explained deviance was higher with GLM-models than GAMs. In turn, GAMs produced more realistic spatial predictions. The single most important environmental variable controlling the occurence of periglacial phenomena was mean altitude, which had strong correlations with many other explanatory variables. The ongoing global warming will have great impact especially in cold environments on high latitudes, and for this reason, an important research topic in the near future will be the response of periglacial environments to a warming climate.
Resumo:
In this dissertation, I present an overall methodological framework for studying linguistic alternations, focusing specifically on lexical variation in denoting a single meaning, that is, synonymy. As the practical example, I employ the synonymous set of the four most common Finnish verbs denoting THINK, namely ajatella, miettiä, pohtia and harkita ‘think, reflect, ponder, consider’. As a continuation to previous work, I describe in considerable detail the extension of statistical methods from dichotomous linguistic settings (e.g., Gries 2003; Bresnan et al. 2007) to polytomous ones, that is, concerning more than two possible alternative outcomes. The applied statistical methods are arranged into a succession of stages with increasing complexity, proceeding from univariate via bivariate to multivariate techniques in the end. As the central multivariate method, I argue for the use of polytomous logistic regression and demonstrate its practical implementation to the studied phenomenon, thus extending the work by Bresnan et al. (2007), who applied simple (binary) logistic regression to a dichotomous structural alternation in English. The results of the various statistical analyses confirm that a wide range of contextual features across different categories are indeed associated with the use and selection of the selected think lexemes; however, a substantial part of these features are not exemplified in current Finnish lexicographical descriptions. The multivariate analysis results indicate that the semantic classifications of syntactic argument types are on the average the most distinctive feature category, followed by overall semantic characterizations of the verb chains, and then syntactic argument types alone, with morphological features pertaining to the verb chain and extra-linguistic features relegated to the last position. In terms of overall performance of the multivariate analysis and modeling, the prediction accuracy seems to reach a ceiling at a Recall rate of roughly two-thirds of the sentences in the research corpus. The analysis of these results suggests a limit to what can be explained and determined within the immediate sentential context and applying the conventional descriptive and analytical apparatus based on currently available linguistic theories and models. The results also support Bresnan’s (2007) and others’ (e.g., Bod et al. 2003) probabilistic view of the relationship between linguistic usage and the underlying linguistic system, in which only a minority of linguistic choices are categorical, given the known context – represented as a feature cluster – that can be analytically grasped and identified. Instead, most contexts exhibit degrees of variation as to their outcomes, resulting in proportionate choices over longer stretches of usage in texts or speech.
Resumo:
Genetics, the science of heredity and variation in living organisms, has a central role in medicine, in breeding crops and livestock, and in studying fundamental topics of biological sciences such as evolution and cell functioning. Currently the field of genetics is under a rapid development because of the recent advances in technologies by which molecular data can be obtained from living organisms. In order that most information from such data can be extracted, the analyses need to be carried out using statistical models that are tailored to take account of the particular genetic processes. In this thesis we formulate and analyze Bayesian models for genetic marker data of contemporary individuals. The major focus is on the modeling of the unobserved recent ancestry of the sampled individuals (say, for tens of generations or so), which is carried out by using explicit probabilistic reconstructions of the pedigree structures accompanied by the gene flows at the marker loci. For such a recent history, the recombination process is the major genetic force that shapes the genomes of the individuals, and it is included in the model by assuming that the recombination fractions between the adjacent markers are known. The posterior distribution of the unobserved history of the individuals is studied conditionally on the observed marker data by using a Markov chain Monte Carlo algorithm (MCMC). The example analyses consider estimation of the population structure, relatedness structure (both at the level of whole genomes as well as at each marker separately), and haplotype configurations. For situations where the pedigree structure is partially known, an algorithm to create an initial state for the MCMC algorithm is given. Furthermore, the thesis includes an extension of the model for the recent genetic history to situations where also a quantitative phenotype has been measured from the contemporary individuals. In that case the goal is to identify positions on the genome that affect the observed phenotypic values. This task is carried out within the Bayesian framework, where the number and the relative effects of the quantitative trait loci are treated as random variables whose posterior distribution is studied conditionally on the observed genetic and phenotypic data. In addition, the thesis contains an extension of a widely-used haplotyping method, the PHASE algorithm, to settings where genetic material from several individuals has been pooled together, and the allele frequencies of each pool are determined in a single genotyping.
Resumo:
This thesis addresses modeling of financial time series, especially stock market returns and daily price ranges. Modeling data of this kind can be approached with so-called multiplicative error models (MEM). These models nest several well known time series models such as GARCH, ACD and CARR models. They are able to capture many well established features of financial time series including volatility clustering and leptokurtosis. In contrast to these phenomena, different kinds of asymmetries have received relatively little attention in the existing literature. In this thesis asymmetries arise from various sources. They are observed in both conditional and unconditional distributions, for variables with non-negative values and for variables that have values on the real line. In the multivariate context asymmetries can be observed in the marginal distributions as well as in the relationships of the variables modeled. New methods for all these cases are proposed. Chapter 2 considers GARCH models and modeling of returns of two stock market indices. The chapter introduces the so-called generalized hyperbolic (GH) GARCH model to account for asymmetries in both conditional and unconditional distribution. In particular, two special cases of the GARCH-GH model which describe the data most accurately are proposed. They are found to improve the fit of the model when compared to symmetric GARCH models. The advantages of accounting for asymmetries are also observed through Value-at-Risk applications. Both theoretical and empirical contributions are provided in Chapter 3 of the thesis. In this chapter the so-called mixture conditional autoregressive range (MCARR) model is introduced, examined and applied to daily price ranges of the Hang Seng Index. The conditions for the strict and weak stationarity of the model as well as an expression for the autocorrelation function are obtained by writing the MCARR model as a first order autoregressive process with random coefficients. The chapter also introduces inverse gamma (IG) distribution to CARR models. The advantages of CARR-IG and MCARR-IG specifications over conventional CARR models are found in the empirical application both in- and out-of-sample. Chapter 4 discusses the simultaneous modeling of absolute returns and daily price ranges. In this part of the thesis a vector multiplicative error model (VMEM) with asymmetric Gumbel copula is found to provide substantial benefits over the existing VMEM models based on elliptical copulas. The proposed specification is able to capture the highly asymmetric dependence of the modeled variables thereby improving the performance of the model considerably. The economic significance of the results obtained is established when the information content of the volatility forecasts derived is examined.
Resumo:
Metabolism is the cellular subsystem responsible for generation of energy from nutrients and production of building blocks for larger macromolecules. Computational and statistical modeling of metabolism is vital to many disciplines including bioengineering, the study of diseases, drug target identification, and understanding the evolution of metabolism. In this thesis, we propose efficient computational methods for metabolic modeling. The techniques presented are targeted particularly at the analysis of large metabolic models encompassing the whole metabolism of one or several organisms. We concentrate on three major themes of metabolic modeling: metabolic pathway analysis, metabolic reconstruction and the study of evolution of metabolism. In the first part of this thesis, we study metabolic pathway analysis. We propose a novel modeling framework called gapless modeling to study biochemically viable metabolic networks and pathways. In addition, we investigate the utilization of atom-level information on metabolism to improve the quality of pathway analyses. We describe efficient algorithms for discovering both gapless and atom-level metabolic pathways, and conduct experiments with large-scale metabolic networks. The presented gapless approach offers a compromise in terms of complexity and feasibility between the previous graph-theoretic and stoichiometric approaches to metabolic modeling. Gapless pathway analysis shows that microbial metabolic networks are not as robust to random damage as suggested by previous studies. Furthermore the amino acid biosynthesis pathways of the fungal species Trichoderma reesei discovered from atom-level data are shown to closely correspond to those of Saccharomyces cerevisiae. In the second part, we propose computational methods for metabolic reconstruction in the gapless modeling framework. We study the task of reconstructing a metabolic network that does not suffer from connectivity problems. Such problems often limit the usability of reconstructed models, and typically require a significant amount of manual postprocessing. We formulate gapless metabolic reconstruction as an optimization problem and propose an efficient divide-and-conquer strategy to solve it with real-world instances. We also describe computational techniques for solving problems stemming from ambiguities in metabolite naming. These techniques have been implemented in a web-based sofware ReMatch intended for reconstruction of models for 13C metabolic flux analysis. In the third part, we extend our scope from single to multiple metabolic networks and propose an algorithm for inferring gapless metabolic networks of ancestral species from phylogenetic data. Experimenting with 16 fungal species, we show that the method is able to generate results that are easily interpretable and that provide hypotheses about the evolution of metabolism.
Resumo:
The Minimum Description Length (MDL) principle is a general, well-founded theoretical formalization of statistical modeling. The most important notion of MDL is the stochastic complexity, which can be interpreted as the shortest description length of a given sample of data relative to a model class. The exact definition of the stochastic complexity has gone through several evolutionary steps. The latest instantation is based on the so-called Normalized Maximum Likelihood (NML) distribution which has been shown to possess several important theoretical properties. However, the applications of this modern version of the MDL have been quite rare because of computational complexity problems, i.e., for discrete data, the definition of NML involves an exponential sum, and in the case of continuous data, a multi-dimensional integral usually infeasible to evaluate or even approximate accurately. In this doctoral dissertation, we present mathematical techniques for computing NML efficiently for some model families involving discrete data. We also show how these techniques can be used to apply MDL in two practical applications: histogram density estimation and clustering of multi-dimensional data.
Resumo:
Many species inhabit fragmented landscapes, resulting either from anthropogenic or from natural processes. The ecological and evolutionary dynamics of spatially structured populations are affected by a complex interplay between endogenous and exogenous factors. The metapopulation approach, simplifying the landscape to a discrete set of patches of breeding habitat surrounded by unsuitable matrix, has become a widely applied paradigm for the study of species inhabiting highly fragmented landscapes. In this thesis, I focus on the construction of biologically realistic models and their parameterization with empirical data, with the general objective of understanding how the interactions between individuals and their spatially structured environment affect ecological and evolutionary processes in fragmented landscapes. I study two hierarchically structured model systems, which are the Glanville fritillary butterfly in the Åland Islands, and a system of two interacting aphid species in the Tvärminne archipelago, both being located in South-Western Finland. The interesting and challenging feature of both study systems is that the population dynamics occur over multiple spatial scales that are linked by various processes. My main emphasis is in the development of mathematical and statistical methodologies. For the Glanville fritillary case study, I first build a Bayesian framework for the estimation of death rates and capture probabilities from mark-recapture data, with the novelty of accounting for variation among individuals in capture probabilities and survival. I then characterize the dispersal phase of the butterflies by deriving a mathematical approximation of a diffusion-based movement model applied to a network of patches. I use the movement model as a building block to construct an individual-based evolutionary model for the Glanville fritillary butterfly metapopulation. I parameterize the evolutionary model using a pattern-oriented approach, and use it to study how the landscape structure affects the evolution of dispersal. For the aphid case study, I develop a Bayesian model of hierarchical multi-scale metapopulation dynamics, where the observed extinction and colonization rates are decomposed into intrinsic rates operating specifically at each spatial scale. In summary, I show how analytical approaches, hierarchical Bayesian methods and individual-based simulations can be used individually or in combination to tackle complex problems from many different viewpoints. In particular, hierarchical Bayesian methods provide a useful tool for decomposing ecological complexity into more tractable components.
Resumo:
This thesis deals with theoretical modeling of the electrodynamics of auroral ionospheres. In the five research articles forming the main part of the thesis we have concentrated on two main themes: Development of new data-analysis techniques and study of inductive phenomena in the ionospheric electrodynamics. The introductory part of the thesis provides a background for these new results and places them in the wider context of ionospheric research. In this thesis we have developed a new tool (called 1D SECS) for analysing ground based magnetic measurements from a 1-dimensional magnetometer chain (usually aligned in the North-South direction) and a new method for obtaining ionospheric electric field from combined ground based magnetic measurements and estimated ionospheric electric conductance. Both these methods are based on earlier work, but contain important new features: 1D SECS respects the spherical geometry of large scale ionospheric electrojet systems and due to an innovative way of implementing boundary conditions the new method for obtaining electric fields can be applied also at local scale studies. These new calculation methods have been tested using both simulated and real data. The tests indicate that the new methods are more reliable than the previous techniques. Inductive phenomena are intimately related to temporal changes in electric currents. As the large scale ionospheric current systems change relatively slowly, in time scales of several minutes or hours, inductive effects are usually assumed to be negligible. However, during the past ten years, it has been realised that induction can play an important part in some ionospheric phenomena. In this thesis we have studied the role of inductive electric fields and currents in ionospheric electrodynamics. We have formulated the induction problem so that only ionospheric electric parameters are used in the calculations. This is in contrast to previous studies, which require knowledge of the magnetospheric-ionosphere coupling. We have applied our technique to several realistic models of typical auroral phenomena. The results indicate that inductive electric fields and currents are locally important during the most dynamical phenomena (like the westward travelling surge, WTS). In these situations induction may locally contribute up to 20-30% of the total ionospheric electric field and currents. Inductive phenomena do also change the field-aligned currents flowing between the ionosphere and magnetosphere, thus modifying the coupling between the two regions.
Resumo:
Solar UV radiation is harmful for life on planet Earth, but fortunately the atmospheric oxygen and ozone absorb almost entirely the most energetic UVC radiation photons. However, part of the UVB radiation and much of the UVA radiation reaches the surface of the Earth, and affect human health, environment, materials and drive atmospheric and aquatic photochemical processes. In order to quantify these effects and processes there is a need for ground-based UV measurements and radiative transfer modeling to estimate the amounts of UV radiation reaching the biosphere. Satellite measurements with their near-global spatial coverage and long-term data conti-nuity offer an attractive option for estimation of the surface UV radiation. This work focuses on radiative transfer theory based methods used for estimation of the UV radiation reaching the surface of the Earth. The objectives of the thesis were to implement the surface UV algorithm originally developed at NASA Goddard Space Flight Center for estimation of the surface UV irradiance from the meas-urements of the Dutch-Finnish built Ozone Monitoring Instrument (OMI), to improve the original surface UV algorithm especially in relation with snow cover, to validate the OMI-derived daily surface UV doses against ground-based measurements, and to demonstrate how the satellite-derived surface UV data can be used to study the effects of the UV radiation. The thesis consists of seven original papers and a summary. The summary includes an introduction of the OMI instrument, a review of the methods used for modeling of the surface UV using satellite data as well as the con-clusions of the main results of the original papers. The first two papers describe the algorithm used for estimation of the surface UV amounts from the OMI measurements as well as the unique Very Fast Delivery processing system developed for processing of the OMI data received at the Sodankylä satellite data centre. The third and the fourth papers present algorithm improvements related to the surface UV albedo of the snow-covered land. Fifth paper presents the results of the comparison of the OMI-derived daily erythemal doses with those calculated from the ground-based measurement data. It gives an estimate of the expected accuracy of the OMI-derived sur-face UV doses for various atmospheric and other conditions, and discusses the causes of the differences between the satellite-derived and ground-based data. The last two papers demonstrate the use of the satellite-derived sur-face UV data. Sixth paper presents an assessment of the photochemical decomposition rates in aquatic environment. Seventh paper presents use of satellite-derived daily surface UV doses for planning of the outdoor material weathering tests.
Resumo:
11β-hydroksisteroididehydrogenaasientsyymit (11β-HSD) 1 ja 2 säätelevät kortisonin ja kortisolin määrää kudoksissa. 11β-HSD1 -entsyymin ylimäärä erityisesti viskeraalisessa rasvakudoksessa aiheuttaa metaboliseen oireyhtymän klassisia oireita, mikä tarjoaa mahdollisuuden metabolisen oireyhtymän hoitoon 11β-HSD1 -entsyymin selektiivisellä estämisellä. 11β-HSD2 -entsyymin inhibitio aiheuttaa kortisonivälitteisen mineralokortikoidireseptorien aktivoitumisen, mikä puolestaan johtaa hypertensiivisiin haittavaikutuksiin. Haittavaikutuksista huolimatta 11β-HSD2 -entsyymin estäminen saattaa olla hyödyllistä tilanteissa, joissa halutaan nostaa kortisolin määrä elimistössä. Lukuisia selektiivisiä 11β-HSD1 inhibiittoreita on kehitetty, mutta 11β-HSD2-inhibiittoreita on raportoitu vähemmän. Ero näiden kahden isotsyymin aktiivisen kohdan välillä on myös tuntematon, mikä vaikeuttaa selektiivisten inhibiittoreiden kehittämistä kummallekin entsyymille. Tällä työllä oli kaksi tarkoitusta: (1) löytää ero 11β-HSD entsyymien välillä ja (2) kehittää farmakoforimalli, jota voitaisiin käyttää selektiivisten 11β-HSD2 -inhibiittoreiden virtuaaliseulontaan. Ongelmaa lähestyttiin tietokoneavusteisesti: homologimallinnuksella, pienmolekyylien telakoinnilla proteiiniin, ligandipohjaisella farmakoforimallinnuksella ja virtuaaliseulonnalla. Homologimallinnukseen käytettiin SwissModeler -ohjelmaa, ja luotu malli oli hyvin päällekäinaseteltavissa niin templaattinsa (17β-HSD1) kuin 11β-HSD1 -entsyymin kanssa. Eroa entsyymien välillä ei löytynyt tarkastelemalla päällekäinaseteltuja entsyymejä. Seitsemän yhdistettä, joista kuusi on 11β-HSD2 -selektiivisiä, telakoitiin molempiin entsyymeihin käyttäen ohjelmaa GOLD. 11β-HSD1 -entsyymiin yhdisteet kiinnittyivät kuten suurin osa 11β-HSD1 -selektiivisistä tai epäselektiivisistä inhibiittoreista, kun taas 11β-HSD2 -entsyymiin kaikki yhdisteet olivat telakoituneet käänteisesti. Tällainen sitoutumistapa mahdollistaa vetysidokset Ser310:een ja Asn171:een, aminohappoihin, jotka olivat nähtävissä vain 11β-HSD2 -entsyymissä. Farmakoforimallinnukseen käytettiin ohjelmaa LigandScout3.0, jolla ajettiin myös virtuaaliseulonnat. Luodut kaksi farmakoforimallia, jotka perustuivat aiemmin telakointiinkin käytettyihin kuuteen 11β-HSD2 -selektiiviseen yhdisteeseen, koostuivat kuudesta ominaisuudesta (vetysidosakseptori, vetysidosdonori ja hydrofobinen), ja kieltoalueista. 11β-HSD2 -selektiivisyyden kannalta tärkeimmät ominaisuudet ovat vetysidosakseptori, joka voi muodostaa sidoksen Ser310 kanssa ja vetysidosdonori sen vieressä. Tälle vetysidosdonorille ei löytynyt vuorovaikutusparia 11β-HSD2-mallista. Sopivasti proteiiniin orientoitunut vesimolekyyli voisi kuitenkin olla sopiva ratkaisu puuttuvalle vuorovaikutusparille. Koska molemmat farmakoforimallit löysivät 11β-HSD2 -selektiivisiä yhdisteitä ja jättivät epäselektiivisiä pois testiseulonnassa, käytettiin molempia malleja Innsbruckin yliopistossa säilytettävistä yhdisteistä (2700 kappaletta) koostetun tietokannan seulontaan. Molemmista seulonnoista löytyneistä hiteistä valittiin yhteensä kymmenen kappaletta, jotka lähetettiin biologisiin testeihin. Biologisien testien tulokset vahvistavat lopullisesti sen kuinka hyvin luodut mallit edustavat todellisuudessa 11β-HSD2 -selektiivisyyttä.
Resumo:
In this thesis we deal with the concept of risk. The objective is to bring together and conclude on some normative information regarding quantitative portfolio management and risk assessment. The first essay concentrates on return dependency. We propose an algorithm for classifying markets into rising and falling. Given the algorithm, we derive a statistic: the Trend Switch Probability, for detection of long-term return dependency in the first moment. The empirical results suggest that the Trend Switch Probability is robust over various volatility specifications. The serial dependency in bear and bull markets behaves however differently. It is strongly positive in rising market whereas in bear markets it is closer to a random walk. Realized volatility, a technique for estimating volatility from high frequency data, is investigated in essays two and three. In the second essay we find, when measuring realized variance on a set of German stocks, that the second moment dependency structure is highly unstable and changes randomly. Results also suggest that volatility is non-stationary from time to time. In the third essay we examine the impact from market microstructure on the error between estimated realized volatility and the volatility of the underlying process. With simulation-based techniques we show that autocorrelation in returns leads to biased variance estimates and that lower sampling frequency and non-constant volatility increases the error variation between the estimated variance and the variance of the underlying process. From these essays we can conclude that volatility is not easily estimated, even from high frequency data. It is neither very well behaved in terms of stability nor dependency over time. Based on these observations, we would recommend the use of simple, transparent methods that are likely to be more robust over differing volatility regimes than models with a complex parameter universe. In analyzing long-term return dependency in the first moment we find that the Trend Switch Probability is a robust estimator. This is an interesting area for further research, with important implications for active asset allocation.