33 results for Gaussian Distribution
in Helda - Digital Repository of University of Helsinki
Abstract:
Bacteria play an important role in many ecological systems. The molecular characterization of bacteria using either cultivation-dependent or cultivation-independent methods reveals the large scale of bacterial diversity in natural communities and the vast number of subpopulations within a species or genus. Understanding how bacterial diversity varies across different environments and also within populations should provide insights into many important questions of bacterial evolution and population dynamics. This thesis presents novel statistical methods for analyzing bacterial diversity using widely employed molecular fingerprinting techniques. The first objective of this thesis was to develop Bayesian clustering models to identify bacterial population structures. Bacterial isolates were identified using multilocus sequence typing (MLST), and Bayesian clustering models were used to explore the evolutionary relationships among isolates. Our method infers genetic population structure within an unsupervised clustering framework in which the dependence between loci is represented using graphical models. The population dynamics that generate such a population stratification were investigated using a stochastic model in which homologous recombination between subpopulations can be quantified within a gene flow network. The second part of the thesis focuses on cluster analysis of community compositional data produced by two different cultivation-independent analyses: terminal restriction fragment length polymorphism (T-RFLP) analysis and fatty acid methyl ester (FAME) analysis. The cluster analysis aims to group bacterial communities that are similar in composition, which is an important step for understanding the overall influences of environmental and ecological perturbations on bacterial diversity.
A common feature of T-RFLP and FAME data is zero-inflation: zero values are observed much more frequently than would be expected, for example, from a Poisson distribution in the discrete case or a Gaussian distribution in the continuous case. We provide two strategies for modeling zero-inflation in the clustering framework, which were validated on both synthetic and complex empirical data sets. We show in the thesis that our model, which takes into account dependencies between loci in MLST data, can produce better clustering results than methods that assume independent loci. Furthermore, computer algorithms that are efficient in analyzing large-scale data were adopted to meet the increasing computational demand. Our method for detecting homologous recombination in subpopulations may provide a theoretical criterion for defining bacterial species. The clustering of bacterial community data, including T-RFLP and FAME data, is an initial effort toward discovering the evolutionary dynamics that structure and maintain bacterial diversity in the natural environment.
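The abstract does not spell out the two zero-inflation strategies. As a minimal illustration of what zero-inflation means for count data, the sketch below evaluates a standard zero-inflated Poisson (ZIP) likelihood, in which a zero arises either from a separate "structural zero" component with probability `pi` or from a Poisson component with rate `lam`. The function names, parameters and counts are our own assumptions; this is not the clustering model of the thesis.

```python
import math

def zip_pmf(k: int, pi: float, lam: float) -> float:
    """Zero-inflated Poisson probability of observing count k.

    pi  : probability of a structural zero (zeros beyond those the Poisson produces)
    lam : rate of the Poisson component
    """
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson

def zip_log_likelihood(counts, pi: float, lam: float) -> float:
    """Log-likelihood of independent counts under the ZIP model."""
    return sum(math.log(zip_pmf(k, pi, lam)) for k in counts)

# Hypothetical counts with many zeros relative to the nonzero values.
counts = [0, 0, 0, 0, 0, 0, 2, 3, 1, 4]
ll_zip = zip_log_likelihood(counts, pi=0.5, lam=2.5)
ll_poisson = zip_log_likelihood(counts, pi=0.0, lam=1.0)  # plain Poisson at the sample mean
```

In this example the ZIP log-likelihood at `pi = 0.5`, `lam = 2.5` is higher than that of a plain Poisson at its maximum-likelihood rate (the sample mean, 1.0), which is the pattern that motivates modelling the excess zeros explicitly.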
Abstract:
Inflation is a period of accelerated expansion in the very early universe, which has the appealing aspect that it can create primordial perturbations via quantum fluctuations. These primordial perturbations have been observed in the cosmic microwave background, and they also function as the seeds of all large-scale structure in the universe. Curvaton models are simple modifications of the standard inflationary paradigm in which inflation is driven by the energy density of the inflaton, but another field, the curvaton, is responsible for producing the primordial perturbations. The curvaton decays after inflation has ended, at which point the isocurvature perturbations of the curvaton are converted into adiabatic perturbations. Since the curvaton must decay, it must have some interactions. Additionally, realistic curvaton models typically have some self-interactions. In this work we consider self-interacting curvaton models in which the self-interaction is a monomial in the potential, suppressed by the Planck scale, so that the self-interaction is very weak. Nevertheless, since the self-interaction makes the equations of motion non-linear, it can modify the behaviour of the model drastically. The most intriguing aspect of this behaviour is that the final properties of the perturbations become highly dependent on the initial values. Departures from a Gaussian distribution are important observables of the primordial perturbations. Due to the non-linearity of the self-interacting curvaton model and its sensitivity to initial conditions, it can produce significant non-Gaussianity of the primordial perturbations. In this work we investigate the non-Gaussianity produced by the self-interacting curvaton and demonstrate that the non-Gaussianity parameters do not obey the analytically derived approximate relations often cited in the literature. Furthermore, we also consider a self-interacting curvaton with a mass on the TeV scale.
Motivated by realistic particle physics models such as the Minimal Supersymmetric Standard Model, we demonstrate that a curvaton within this mass range can be responsible for the observed perturbations if it can decay late enough.
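Non-Gaussianity of the primordial perturbations is commonly quantified by parameters such as f_NL. A minimal sketch, assuming the widely used local-type parametrization zeta = zeta_g + (3/5) f_NL (zeta_g^2 - <zeta_g^2>) with Gaussian zeta_g (sign and normalization conventions vary across the literature), draws samples of zeta and measures their skewness. It does not model the self-interacting curvaton dynamics studied in the thesis, and the function names are illustrative.

```python
import random
import statistics

def local_curvature_field(n, sigma, f_nl, seed=0):
    """Draw n samples of a local-type non-Gaussian curvature perturbation.

    Convention assumed here: zeta = zeta_g + (3/5) * f_NL * (zeta_g**2 - <zeta_g**2>),
    with zeta_g Gaussian, mean 0 and standard deviation sigma.
    """
    rng = random.Random(seed)
    a = 0.6 * f_nl  # (3/5) * f_NL
    samples = []
    for _ in range(n):
        g = rng.gauss(0.0, sigma)
        samples.append(g + a * (g * g - sigma * sigma))
    return samples

def skewness(xs):
    """Sample skewness: third central moment divided by variance**1.5."""
    m = statistics.fmean(xs)
    var = statistics.fmean([(x - m) ** 2 for x in xs])
    third = statistics.fmean([(x - m) ** 3 for x in xs])
    return third / var ** 1.5

zeta = local_curvature_field(100_000, sigma=0.01, f_nl=10.0)
s = skewness(zeta)
```

To leading order in f_NL the skewness is about 6 * (3/5) * f_NL * sigma, so with sigma = 0.01 and f_NL = 10 the sample skewness should come out close to 0.36; with f_NL = 0 it fluctuates around zero.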
Abstract:
One of the most fundamental and widely accepted ideas in finance is that investors are compensated through higher returns for taking on non-diversifiable risk. Hence the quantification, modeling and prediction of risk have been, and still are, among the most prolific research areas in financial economics. It was recognized early on that there are predictable patterns in the variance of speculative prices. Later research has shown that there may also be systematic variation in the skewness and kurtosis of financial returns. What the literature has lacked so far is an out-of-sample forecast evaluation of the potential benefits of these newer, more complicated models with time-varying higher moments. Such an evaluation is the topic of this dissertation. Essay 1 investigates the forecast performance of the GARCH(1,1) model when estimated with 9 different error distributions on Standard and Poor’s 500 Index futures returns. By using the theory of realized variance to construct an appropriate ex post measure of variance from intraday data, it is shown that allowing for a leptokurtic error distribution leads to significant improvements in variance forecasts compared to using the normal distribution. This result holds for daily, weekly and monthly forecast horizons. It is also found that allowing for skewness and time variation in the higher moments of the distribution does not further improve forecasts. In Essay 2, using 20 years of daily Standard and Poor’s 500 index returns, it is found that density forecasts are much improved by allowing for constant excess kurtosis but not by allowing for skewness. Allowing the kurtosis and skewness to be time-varying does not further improve the density forecasts; on the contrary, it makes them slightly worse. In Essay 3 a new model incorporating conditional variance, skewness and kurtosis based on the Normal Inverse Gaussian (NIG) distribution is proposed.
The new model and two previously used NIG models are evaluated by their Value at Risk (VaR) forecasts on a long series of daily Standard and Poor’s 500 returns. The results show that only the new model produces satisfactory VaR forecasts for both 1% and 5% VaR. Taken together, the results of the thesis show that kurtosis appears not to exhibit predictable time variation, whereas some predictability is found in the skewness. However, the dynamic properties of the skewness are not completely captured by any of the models.
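Essay 1 centres on GARCH(1,1) variance forecasts and Essay 3 on VaR evaluation. The sketch below shows, under simplifying assumptions, the GARCH(1,1) recursion sigma2[t+1] = omega + alpha * eps[t]**2 + beta * sigma2[t] with its multi-step extension, plus a Kupiec proportion-of-failures test on VaR exceedances. It assumes zero-mean returns and already-estimated parameters; the error distribution (normal versus leptokurtic) enters through the likelihood used to estimate omega, alpha and beta and through the quantile used for VaR, neither of which is shown. The Kupiec test is one common backtest, not necessarily the one used in the thesis, and all function names are hypothetical.

```python
import math

def garch11_variances(returns, omega, alpha, beta):
    """One-step-ahead GARCH(1,1) conditional variances.

    sigma2[t] is the variance forecast for return t, made with data up to t-1;
    the last element is the forecast for the next, not-yet-observed period.
    The recursion starts from the unconditional variance omega / (1 - alpha - beta).
    """
    sigma2 = [omega / (1.0 - alpha - beta)]
    for eps in returns:  # returns are assumed to have zero mean
        sigma2.append(omega + alpha * eps**2 + beta * sigma2[-1])
    return sigma2

def garch11_horizon_variance(sigma2_next, omega, alpha, beta, horizon):
    """Forecast of the summed variance over `horizon` periods (e.g. 5 trading days).

    Uses E[sigma2_{t+h}] = vbar + (alpha + beta)**(h - 1) * (sigma2_{t+1} - vbar).
    """
    vbar = omega / (1.0 - alpha - beta)
    persistence = alpha + beta
    return sum(vbar + persistence ** (h - 1) * (sigma2_next - vbar)
               for h in range(1, horizon + 1))

def var_exceedances(returns, var_quantiles):
    """Number of periods in which the realized return falls below the VaR quantile."""
    return sum(1 for r, q in zip(returns, var_quantiles) if r < q)

def kupiec_pof(n, x, p):
    """Kupiec proportion-of-failures LR statistic (asymptotically chi-square, 1 df).

    n: number of forecasts, x: number of exceedances, p: nominal VaR level (e.g. 0.01).
    """
    def loglik(prob):
        ll = 0.0  # terms with zero count are skipped, i.e. 0 * log(0) = 0
        if n - x > 0:
            ll += (n - x) * math.log(1.0 - prob)
        if x > 0:
            ll += x * math.log(prob)
        return ll
    return -2.0 * (loglik(p) - loglik(x / n))

# Hypothetical daily returns and parameter values.
sigma2 = garch11_variances([0.01, -0.02, 0.005], omega=1e-6, alpha=0.05, beta=0.9)
weekly = garch11_horizon_variance(sigma2[-1], 1e-6, 0.05, 0.9, horizon=5)
```

At the 5% significance level the statistic is compared with the chi-square(1) critical value of about 3.84; for example, 30 exceedances in 1,000 days at the 1% VaR level gives a statistic of about 26, so that model would be rejected.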
Abstract:
Anthesis was studied at the canopy level in 10 Norway spruce stands from 9 localities in Finland from 1963 to 1974. Distributions of pollen catches were compared to the normal (Gaussian) distribution. The basis for the timing studies was the 50 per cent point of the anthesis-fitted normal distribution. Development up to this point was expressed in calendar days, in degree days (>5 °C) and in period units. The count of each parameter began on (and included) March 19. Male flowering in Norway spruce stands was found to vary more in quantity from year to year than in the Scots pine stands studied earlier. Anthesis in spruce occurred later in northern Finland than in the south. The heat sums needed for anthesis varied less with latitude in spruce than in pine. As in Scots pine, the variation of pollen catches in spruce increased towards the north-west. In the unprocessed data, calendar days were found to be the most accurate forecast of anthesis in Norway spruce, both for a single year and for the majority of stand averages over several years. Locally, the period unit could be a more accurate parameter for the stand average. However, on a calendar-day basis, when annual deviations between expected and measured heat sums were converted to days, period units were narrowly superior to days. The geographical correlations with respect to timing of flowering, calculated against distances measured along simulated post-glacial migration routes, were stronger than purely latitudinal correlations. Effects of the reinvasion of Norway spruce into Finland are thus still visible in spruce populations, just as they were in Scots pine populations. The proportion of the average annual heat sum needed for spruce anthesis grew rapidly north of a latitude of ca. 63°, and the heat sum needed for anthesis decreased only slightly towards the timberline.
In light of flowering phenology, it seems probable that the northwesterly third of Finnish Norway spruce populations are incompletely adapted to the prevailing cold climate. A moderate warming of the climate would therefore be beneficial for Norway spruce. This accords roughly with the adaptive situation in Scots pine.
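The spruce study expresses development up to anthesis in calendar days and degree days counted from March 19. A minimal sketch, assuming degree days are accumulated as the excess of the daily mean temperature over the 5 °C threshold (a common definition of the effective temperature sum; the exact computation used in the study is not given here), computes both quantities. Period units are a different measure and are not reproduced; the function names are hypothetical.

```python
from datetime import date, timedelta

def days_from_march19(d: date) -> int:
    """Calendar-day count in which March 19 itself is day 1."""
    return (d - date(d.year, 3, 19)).days + 1

def degree_days_to(target: date, daily_mean_temp, threshold: float = 5.0) -> float:
    """Sum of (T - threshold) over days with T > threshold, March 19 to `target` inclusive.

    daily_mean_temp: mapping from datetime.date to daily mean temperature in °C.
    """
    total = 0.0
    d = date(target.year, 3, 19)
    while d <= target:
        t = daily_mean_temp[d]
        if t > threshold:
            total += t - threshold
        d += timedelta(days=1)
    return total

# Hypothetical three-day temperature record starting on March 19.
temps = {date(1970, 3, 19) + timedelta(days=i): v for i, v in enumerate([3.0, 6.0, 10.0])}
heat_sum = degree_days_to(date(1970, 3, 21), temps)
```

Only days warmer than the threshold contribute, so a cold spell simply pauses the accumulation rather than reducing it.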
Abstract:
Male flowering was studied at the canopy level in 10 silver birch (Betula pendula Roth) stands from 8 localities and in 14 downy birch (B. pubescens Ehrh.) stands from 10 localities in Finland from 1963 to 1973. Distributions of cumulative pollen catches were compared to the normal (Gaussian) distribution. The basis for the timing of flowering was the 50 per cent point of the anthesis-fitted normal distribution. To eliminate effects of background pollen, only the central, normally distributed part of the cumulative distribution was used. Development up to the median point of the distribution was measured and tested in calendar days, in degree days (>5 °C) and in period units. The count of each parameter began on and included March 19. Male flowering in silver birch occurred from late April to late June depending on latitude, and flowering in downy birch took place from early May to early July. The heat sums needed for male flowering varied with latitude in downy birch stands, but there was practically no latitudinal variation in the heat sums needed for silver birch flowering. The amount of male flowering in stands of both birch species was found to have a large annual variation but without any clear periodicity. In contrast to Norway spruce stands, the between-year variation in pollen catch in stands of either birch species did not show any significant latitudinal correlation. The period unit heat sum gave the most accurate forecast of the timing of flowering for 60 per cent of the silver birch stands and for 78.6 per cent of the downy birch stands. Calendar days, however, gave the best forecast for silver birch in 25 per cent of the cases, while degree days gave the best forecast for downy birch in 21.4 per cent of the cases. Silver birch seems to have a local inclination towards a more fixed flowering date than downy birch, which could indicate a considerable photoperiodic influence on the flowering time of silver birch.
Silver birch and downy birch had different geographical correlations. Hybridization of the birch species occurs more frequently in northern Finland than at more southern latitudes. The different timing of flowering caused increasing scatter in flowering times in the north, especially in the case of downy birch. The chance of simultaneous flowering of silver birch and downy birch thus increased northwards due to a more variable climate and also greater altitudinal variation. Compared with conifers, the reproduction cycles of both birch species were found to be well protected from frost damage.
Abstract:
Basement membranes are specialized sheets of extracellular matrix found in contact with epithelia, endothelia, and certain isolated cells. They support tissue architecture and regulate cell behaviour. Laminins are among the main constituents of basement membranes. Due to differences between laminin isoforms, laminins confer structural and functional diversity to basement membranes. The first aim of this study was to gain insights into the potential functions of the then least characterized laminins, alpha4 chain laminins, by evaluating their distribution in human tissues. We thus created a monoclonal antibody specific for laminin alpha4 chain. By immunohistochemistry, alpha4 chain laminins were primarily localized to basement membranes of blood vessel endothelia, skeletal, heart, and smooth muscle cells, nerves, and adipocytes. In addition, alpha4 chain laminins were found in the region of certain epithelial basement membranes in the epidermis, salivary gland, pancreas, esophagus, stomach, intestine, and kidney. Because of the consistent presence of alpha4 chain laminins in endothelial basement membranes of blood vessels, we evaluated the potential roles of endothelial laminins in blood vessels, lymphatic vessels, and carcinomas. Human endothelial cells produced alpha4 and alpha5 chain laminins. In quantitative and morphological adhesion assays, human endothelial cells barely adhered to alpha4 chain-containing laminin-411. The weak interaction of endothelial cells with laminin-411 appeared to be mediated by alpha6beta1 integrin. The alpha5 chain-containing laminin-511 promoted endothelial cell adhesion better than laminin-411, but it did not promote the formation of cell-extracellular matrix adhesion complexes. The adhesion of endothelial cells to laminin-511 appeared to be mediated by Lutheran glycoprotein together with beta1 and alphavbeta3 integrins. The results suggest that these laminins may induce a migratory phenotype in endothelial cells. 
In lymphatic capillaries, endothelial basement membranes showed immunoreactivity for laminin alpha4, beta1, beta2, and gamma1 chains, type IV and XVIII collagens, and nidogen-1. Considering the assumed inability of alpha4 chain laminins to polymerize and to promote basement membrane assembly, the findings may in part explain the incomplete basement membrane formation in these vessels. Lymphatic capillaries of ovarian carcinomas showed immunoreactivity also for laminin alpha5 chain and its receptor Lutheran glycoprotein, emphasizing a difference between normal and ovarian carcinoma lymphatic capillaries. In renal cell carcinomas, immunoreactivity for laminin alpha4 chain was found in stroma and basement membranes of blood vessels. In most tumours, immunoreactivity for laminin alpha4 chain was also observed in the basement membrane region of tumour cell islets. Renal carcinoma cells produced alpha4 chain laminins. Laminin-411 did not promote adhesion of renal carcinoma cells, but inhibited their adhesion to fibronectin. Renal carcinoma cells migrated more on laminin-411 than on fibronectin. The results suggest that alpha4 chain laminins have a counteradhesive function, and may thus have a role in detachment and invasion of renal carcinoma cells.
Abstract:
This thesis presents novel modelling applications for environmental geospatial data using remote sensing, GIS and statistical modelling techniques. The studies fall into four main themes: (i) developing advanced geospatial databases: Paper (I) demonstrates the creation of a geospatial database for the Glanville fritillary butterfly (Melitaea cinxia) in the Åland Islands, south-western Finland; (ii) analysing species diversity and distribution using GIS techniques: Paper (II) presents a diversity and geographical distribution analysis for Scopulini moths at a world-wide scale; (iii) studying spatiotemporal forest cover change: Paper (III) presents a study of exotic and indigenous tree cover change detection in the Taita Hills, Kenya, using airborne imagery and GIS analysis techniques; and (iv) exploring predictive modelling techniques using geospatial data: in Paper (IV), human population occurrence and abundance in the Taita Hills highlands were predicted using the generalized additive modelling (GAM) technique; Paper (V) presents techniques to enhance fire prediction and burned area estimation at a regional scale in East Caprivi, Namibia; and Paper (VI) compares eight state-of-the-art predictive modelling methods to improve fire prediction, burned area estimation and fire risk mapping in East Caprivi, Namibia. The results in Paper (I) showed that geospatial data can be managed effectively using advanced relational database management systems. Metapopulation data for the Melitaea cinxia butterfly were successfully combined with GPS-delimited habitat patch information and climatic data. Using the geospatial database, spatial analyses were successfully conducted at the habitat patch level or at coarser analysis scales. Moreover, this study showed that, at a large scale, spatially correlated weather conditions appear to be one of the primary causes of spatially correlated changes in Melitaea cinxia population sizes.
In Paper (II), spatiotemporal characteristics of Scopulini moth description, diversity and distribution were analysed at a world-wide scale, and GIS techniques were used for the first time to analyse the geographical distribution of Scopulini moths. This study revealed that Scopulini moths have a cosmopolitan distribution. The majority of the species have been described from the low latitudes, with sub-Saharan Africa being the hot spot of species diversity. However, the taxonomic effort has been uneven among biogeographical regions. Paper (III) showed that forest cover change can be analysed in great detail using modern airborne imagery techniques and historical aerial photographs. However, when spatiotemporal forest cover change is studied, care has to be taken in co-registration and image interpretation when historical black-and-white aerial photography is used. In Paper (IV), human population distribution and abundance could be modelled with fairly good results using geospatial predictors and non-Gaussian predictive modelling techniques. Moreover, a land cover layer is not necessarily needed as a predictor, because first- and second-order image texture measurements derived from satellite imagery had more power to explain the variation in dwelling unit occurrence and abundance. Paper (V) showed that the generalized linear model (GLM) is a suitable technique for fire occurrence prediction and for burned area estimation. GLM-based burned area estimates were found to be superior to the existing MODIS burned area product (MCD45A1). However, spatial autocorrelation of fires has to be taken into account when using the GLM technique for fire occurrence prediction. Paper (VI) showed that novel statistical predictive modelling techniques can be used to improve fire prediction, burned area estimation and fire risk mapping at a regional scale. However, there was noticeable variation between the different predictive modelling techniques in fire occurrence prediction and burned area estimation.
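Paper (V) uses a GLM for fire occurrence. A minimal sketch, assuming a binomial GLM with a logit link and a single hypothetical predictor (the actual predictors are not listed in the abstract), fits the model by Newton-Raphson, which for this model coincides with iteratively reweighted least squares. It treats observations as independent, which is exactly the assumption that spatial autocorrelation of fires violates, so it illustrates the model form rather than the analysis in the paper.

```python
import math

def fit_logistic_1d(x, y, iters=25):
    """Fit P(fire) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    x: predictor values; y: 0/1 fire occurrence. Returns (b0, b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # Fisher information (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g for the 2x2 information matrix H
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

# Hypothetical, non-separable data (a perfectly separable set has no finite MLE).
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic_1d(x, y)
```

At convergence the score equations hold, so the fitted probabilities sum to the observed number of fires.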
Abstract:
The Taita Hills in southeastern Kenya form the northernmost part of Africa’s Eastern Arc Mountains, which have been identified by Conservation International as one of the top ten biodiversity hotspots on Earth. As with many areas of the developing world, over recent decades the Taita Hills have experienced significant population growth, leading to major associated changes in land use and land cover (LULC), as well as escalating land degradation, particularly soil erosion. Multi-temporal, medium-resolution multispectral optical satellite data, such as imagery from the SPOT HRV, HRVIR and HRG sensors, provide a valuable source of information for environmental monitoring and modelling at a landscape level at local and regional scales. However, utilization of multi-temporal SPOT data in quantitative remote sensing studies requires the removal of atmospheric effects and the derivation of the surface reflectance factor. Furthermore, for areas of rugged terrain such as the Taita Hills, topographic correction is necessary to derive comparable reflectance throughout a SPOT scene. Reliable monitoring of LULC change over time and modelling of land degradation and human population distribution and abundance are of crucial importance to sustainable development, natural resource management, biodiversity conservation, and understanding and mitigating climate change and its impacts. The main purpose of this thesis was to develop and validate enhanced processing of SPOT satellite imagery for use in environmental monitoring and modelling at a landscape level, in regions of the developing world with limited ancillary data availability.
The Taita Hills formed the application study site, whilst the Helsinki metropolitan region was used as a control site for validation and assessment of the applied atmospheric correction techniques; there, multiangular reflectance field measurements were taken, and horizontal visibility meteorological data concurrent with image acquisition were available. The proposed historical empirical line method (HELM) for absolute atmospheric correction was found to be the only applied technique that could derive surface reflectance factor within an RMSE of < 0.02 ps in the SPOT visible and near-infrared bands, an accuracy level identified as a benchmark for successful atmospheric correction. A multi-scale segmentation/object relationship modelling (MSS/ORM) approach was applied to map LULC in the Taita Hills from the multi-temporal SPOT imagery. This object-based procedure was shown to give significant improvements over a uni-scale maximum-likelihood technique. The derived LULC data were used in combination with low-cost GIS geospatial layers describing elevation, rainfall and soil type to model degradation in the Taita Hills in the form of potential soil loss, utilizing the simple universal soil loss equation (USLE). Furthermore, human population distribution and abundance were modelled with satisfactory results using only SPOT- and GIS-derived data and non-Gaussian predictive modelling techniques. The SPOT-derived LULC data were found to be unnecessary as a predictor because the first- and second-order image texture measurements had greater power to explain variation in dwelling unit occurrence and abundance. The ability of the procedures to be implemented locally in the developing world using low-cost or freely available data and software was considered. The techniques discussed in this thesis are considered equally applicable to other medium- and high-resolution optical satellite imagery, as well as to the SPOT data used here.
Abstract:
Whether a statistician wants to complement a probability model for observed data with a prior distribution and carry out fully probabilistic inference, or base the inference only on the likelihood function, may be a fundamental question in theory, but in practice it may well be of less importance if the likelihood contains much more information than the prior. Maximum likelihood inference can be justified as a Gaussian approximation at the posterior mode, using flat priors. However, in situations where parametric assumptions in standard statistical models would be too rigid, more flexible model formulation, combined with fully probabilistic inference, can be achieved using hierarchical Bayesian parametrization. This work includes five articles, all of which apply probability modeling to various problems involving incomplete observation. Three of the papers apply maximum likelihood estimation and two of them hierarchical Bayesian modeling. Because maximum likelihood may be presented as a special case of Bayesian inference, but not the other way round, in the introductory part of this work we present a framework for probability-based inference using only Bayesian concepts. We also re-derive some results presented in the original articles using the toolbox developed here, to show that they are also justifiable under this more general framework. Here the assumption of exchangeability and de Finetti's representation theorem are applied repeatedly to justify the use of standard parametric probability models with conditionally independent likelihood contributions. It is argued that the same reasoning can also be applied under sampling from a finite population. The main emphasis here is on probability-based inference under incomplete observation due to study design. This is illustrated using a generic two-phase cohort sampling design as an example.
The alternative approaches presented for analysis of such a design are full likelihood, which utilizes all observed information, and conditional likelihood, which is restricted to a completely observed set, conditioning on the rule that generated that set. Conditional likelihood inference is also applied for a joint analysis of prevalence and incidence data, a situation subject to both left censoring and left truncation. Other topics covered are model uncertainty and causal inference using posterior predictive distributions. We formulate a non-parametric monotonic regression model for one or more covariates and a Bayesian estimation procedure, and apply the model in the context of optimal sequential treatment regimes, demonstrating that inference based on posterior predictive distributions is feasible also in this case.
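The remark that maximum likelihood can be justified as a Gaussian approximation at the posterior mode under flat priors can be checked on a small example. A minimal sketch, assuming a binomial likelihood with a flat prior on the success probability p: the posterior mode equals the MLE x/n, and the approximating normal has variance equal to minus the inverse of the second derivative of the log posterior at the mode, which here reduces to p̂(1 - p̂)/n, the familiar MLE standard error. The numerical second derivative stands in for models where no closed form is available.

```python
import math

def log_posterior_flat(p, x, n):
    """Binomial log-likelihood in p; with a flat prior this is the log posterior up to a constant."""
    return x * math.log(p) + (n - x) * math.log(1.0 - p)

def laplace_approximation(x, n, h=1e-4):
    """Gaussian approximation N(mode, sd**2) to the flat-prior posterior of p (needs 0 < x < n)."""
    mode = x / n  # posterior mode = maximum likelihood estimate under a flat prior

    def f(p):
        return log_posterior_flat(p, x, n)

    # Central finite difference for the curvature of the log posterior at the mode.
    d2 = (f(mode + h) - 2.0 * f(mode) + f(mode - h)) / h**2
    sd = math.sqrt(-1.0 / d2)
    return mode, sd

mode, sd = laplace_approximation(30, 100)
closed_form = math.sqrt(mode * (1.0 - mode) / 100)  # MLE standard error for comparison
```

The two standard deviations agree to several decimal places, which is the sense in which the maximum likelihood answer is a Gaussian approximation to the flat-prior posterior.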
Composition operators, Aleksandrov measures and value distribution of analytic maps in the unit disc
Abstract:
A composition operator is a linear operator that precomposes any given function with another function, which is held fixed and called the symbol of the composition operator. This dissertation studies such operators and questions related to their theory in the case when the functions to be composed are analytic in the unit disc of the complex plane. Thus the subject of the dissertation lies at the intersection of analytic function theory and operator theory. The work contains three research articles. The first article is concerned with the value distribution of analytic functions. In the literature there are two different conditions which characterize when a composition operator is compact on the Hardy spaces of the unit disc. One condition is in terms of the classical Nevanlinna counting function, defined inside the disc, and the other condition involves a family of certain measures called the Aleksandrov (or Clark) measures and supported on the boundary of the disc. The article explains the connection between these two approaches from a function-theoretic point of view. It is shown that the Aleksandrov measures can be interpreted as kinds of boundary limits of the Nevanlinna counting function as one approaches the boundary from within the disc. The other two articles investigate the compactness properties of the difference of two composition operators, which is beneficial for understanding the structure of the set of all composition operators. The second article considers this question on the Hardy and related spaces of the disc, and employs Aleksandrov measures as its main tool. The results obtained generalize those existing for the case of a single composition operator. However, there are some peculiarities which do not occur in the theory of a single operator. The third article studies the compactness of the difference operator on the Bloch and Lipschitz spaces, improving and extending results given in the previous literature. 
Moreover, in this connection one obtains a general result which characterizes the compactness and weak compactness of the difference of two weighted composition operators on certain weighted Hardy-type spaces.
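For reference, the objects named in the first article can be written out as follows, in the standard notation of the literature (φ an analytic self-map of the unit disc 𝔻, 𝕋 the unit circle, μ_α the Aleksandrov measures). The two compactness criteria for the Hardy space H² are the classical ones the abstract alludes to, stated here as background rather than as results of the thesis.

```latex
% Composition operator with symbol \varphi
\[ C_\varphi f = f \circ \varphi, \qquad f \text{ analytic in } \mathbb{D}. \]

% Nevanlinna counting function (preimages counted with multiplicity)
\[ N_\varphi(w) = \sum_{z \in \varphi^{-1}(w)} \log\frac{1}{|z|},
   \qquad w \in \mathbb{D} \setminus \{\varphi(0)\}. \]

% Aleksandrov (Clark) measures \mu_\alpha on \mathbb{T}, one for each \alpha \in \mathbb{T}
\[ \operatorname{Re}\frac{\alpha + \varphi(z)}{\alpha - \varphi(z)}
   = \frac{1 - |\varphi(z)|^2}{|\alpha - \varphi(z)|^2}
   = \int_{\mathbb{T}} \frac{1 - |z|^2}{|\zeta - z|^2}\, d\mu_\alpha(\zeta). \]

% The two characterizations of compactness on H^2
\[ C_\varphi \text{ compact on } H^2
   \iff \lim_{|w| \to 1} \frac{N_\varphi(w)}{\log(1/|w|)} = 0
   \iff \mu_\alpha \text{ is absolutely continuous for every } \alpha \in \mathbb{T}. \]
```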
Abstract:
We study integral representations of Gaussian processes with a pre-specified law in terms of other Gaussian processes. The dissertation consists of an introduction and four research articles. In the introduction, we provide an overview of Volterra Gaussian processes in general, and fractional Brownian motion in particular. In the first article, we derive a finite interval integral transformation, which changes fractional Brownian motion with a given Hurst index into fractional Brownian motion with another Hurst index. Based on this transformation, we construct a prelimit which formally converges to an analogous, infinite interval integral transformation. In the second article, we prove this convergence rigorously and show that the infinite interval transformation is a direct consequence of the finite interval transformation. In the third article, we consider general Volterra Gaussian processes. We derive measure-preserving transformations of these processes and their inherently related bridges. Also, as a related result, we obtain a Fourier-Laguerre series expansion for the first Wiener chaos of a Gaussian martingale. In the fourth article, we derive a class of ergodic transformations of self-similar Volterra Gaussian processes.
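As background to the transformations above, fractional Brownian motion with Hurst index H is the centred Gaussian process with covariance R_H(s, t) = (s^{2H} + t^{2H} - |t - s|^{2H}) / 2; H = 1/2 gives standard Brownian motion. The sketch below samples fBm on a finite grid by Cholesky factorization of this covariance. It is a generic simulation method, not the integral transformations derived in the thesis; the grid times must be strictly positive and distinct so that the covariance matrix is positive definite, and the function names are hypothetical.

```python
import math
import random

def fbm_cov(s, t, hurst):
    """Covariance of fractional Brownian motion with Hurst index `hurst` in (0, 1)."""
    return 0.5 * (s ** (2 * hurst) + t ** (2 * hurst) - abs(t - s) ** (2 * hurst))

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def sample_fbm(times, hurst, seed=0):
    """One fBm sample path at the given strictly positive, distinct times."""
    cov = [[fbm_cov(s, t, hurst) for t in times] for s in times]
    low = cholesky(cov)
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in times]
    # L z is Gaussian with covariance L L^T = cov
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(times))]

path = sample_fbm([0.01 * i for i in range(1, 101)], hurst=0.7)
```

The Cholesky approach costs O(n³) for n grid points, which is adequate for illustration but slow for long paths.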
Abstract:
Stochastic filtering is, in general, the estimation of indirectly observed states given observed data. This means that one works with conditional expected values, which are the most accurate estimates (in the mean-square sense) given the observations in the context of a probability space. In my thesis, I have presented the theory of filtering using two different kinds of observation process: a diffusion process, discussed in the first chapter, and a counting process, introduced in the third chapter. The majority of the fundamental results of stochastic filtering are stated in the form of equations, such as the unnormalized Zakai equation, which leads to the Kushner-Stratonovich equation. The latter, also known as the normalized Zakai equation or the Fujisaki-Kallianpur-Kunita (FKK) equation, shows the difference between the estimates obtained using a diffusion process and a counting process. I have also introduced an example for the linear Gaussian case, which is the basis for the so-called Kalman-Bucy filter. As the unnormalized and the normalized Zakai equations are in terms of the conditional distribution, a density of these distributions is developed through these equations and stated in the Kushner Theorem. However, the Kushner Theorem takes the form of a stochastic partial differential equation whose solution must be verified to exist and be unique, which is covered in the second chapter.
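For the linear Gaussian case mentioned above, the filter reduces to the Kalman-Bucy equations. A minimal sketch, assuming a scalar signal dX = aX dt + b dW observed through dY = cX dt + r dV (our notation, not the thesis's), integrates the filter mean and the Riccati equation for the error variance with an Euler scheme alongside a simulated signal. The function name and parameter values are hypothetical.

```python
import math
import random

def kalman_bucy(a, b, c, r, m0, p0, dt, n_steps, seed=0):
    """Euler discretization of the scalar Kalman-Bucy filter.

    Signal:      dX = a X dt + b dW
    Observation: dY = c X dt + r dV
    Filter:      dm = a m dt + (P c / r**2) (dY - c m dt)
                 dP/dt = 2 a P + b**2 - (P c / r)**2
    Returns lists of true states, filter means and error variances.
    """
    rng = random.Random(seed)
    sqdt = math.sqrt(dt)
    x = rng.gauss(m0, math.sqrt(p0))  # true initial state drawn from the prior
    m, p = m0, p0
    xs, ms, ps = [], [], []
    for _ in range(n_steps):
        dy = c * x * dt + r * sqdt * rng.gauss(0.0, 1.0)  # observation increment
        gain = p * c / r**2
        m += a * m * dt + gain * (dy - c * m * dt)
        p += (2.0 * a * p + b**2 - (p * c / r) ** 2) * dt
        x += a * x * dt + b * sqdt * rng.gauss(0.0, 1.0)  # advance the hidden signal
        xs.append(x)
        ms.append(m)
        ps.append(p)
    return xs, ms, ps

xs, ms, ps = kalman_bucy(a=-1.0, b=1.0, c=1.0, r=1.0, m0=0.0, p0=1.0,
                         dt=1e-3, n_steps=10_000)
```

For a = -1 and b = c = r = 1, the error variance settles at the positive root of the steady-state Riccati equation, √2 - 1 ≈ 0.414, independently of the simulated observations.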