6 results for generalized additive models
in Helda - Digital Repository of University of Helsinki
Abstract:
Periglacial processes act in cold, non-glacial regions where landscape development is mainly controlled by frost activity. About 25 percent of Earth's surface can be considered periglacial. Geographical Information Systems combined with advanced statistical modeling methods provide an efficient tool and a new theoretical perspective for the study of cold environments. The aims of this study were to: 1) model and predict the abundance of periglacial phenomena in a subarctic environment with statistical modeling, 2) investigate the most important factors affecting the occurrence of these phenomena with hierarchical partitioning, 3) compare two widely used statistical modeling methods, Generalized Linear Models and Generalized Additive Models, 4) study the effect of modeling resolution on prediction and 5) study how spatially continuous predictions can be obtained from point data. The observational data consist of 369 points that were collected during the summers of 2009 and 2010 in the study area in Kilpisjärvi, northern Lapland. The periglacial phenomena of interest were cryoturbations, slope processes, weathering, deflation, nivation and fluvial processes. The features were modeled using Generalized Linear Models (GLM) and Generalized Additive Models (GAM) with Poisson errors. The abundance of periglacial features was predicted from these models onto a spatial grid with a resolution of one hectare. The most important environmental factors were examined with hierarchical partitioning. The effect of modeling resolution was investigated in a small independent study area with a spatial resolution of 0.01 hectare. The models explained 45-70 % of the occurrence of periglacial phenomena. When spatial variables were added to the models, the amount of explained deviance was considerably higher, which signalled a geographical trend structure. The ability of the models to predict periglacial phenomena was assessed with independent evaluation data. 
Spearman's correlation between the observed and predicted values varied from 0.258 to 0.754. Based on explained deviance and the results of hierarchical partitioning, the most important environmental variables were mean altitude, vegetation and mean slope angle. The effect of modeling resolution was clear: too coarse a resolution caused a loss of information, while a finer resolution brought out more localized variation. The models' ability to explain and predict periglacial phenomena in the study area was mostly good and moderate, respectively. Differences between the modeling methods were small, although the explained deviance was higher with GLMs than with GAMs. In turn, GAMs produced more realistic spatial predictions. The single most important environmental variable controlling the occurrence of periglacial phenomena was mean altitude, which was strongly correlated with many other explanatory variables. Ongoing global warming will have a great impact especially on cold environments at high latitudes, and for this reason an important research topic in the near future will be the response of periglacial environments to a warming climate.
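The two evaluation measures named in this abstract, explained deviance for a Poisson model and Spearman's correlation between observed and predicted values, can be illustrated with a minimal sketch. The abstract does not give formulas, so the sketch assumes the usual definitions: explained deviance as one minus the ratio of residual to null deviance, and Spearman's correlation as Pearson's correlation of ranks. The counts and predictions are made up for illustration and are not data from the thesis.

```python
import math

def poisson_deviance(observed, fitted):
    """Poisson deviance 2 * sum(y*log(y/mu) - (y - mu)); the log term is 0 when y == 0."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += log_term - (y - mu)
    return 2.0 * total

def explained_deviance(observed, fitted):
    """Share of the null (intercept-only) deviance removed by the fitted values."""
    mean_y = sum(observed) / len(observed)
    null_dev = poisson_deviance(observed, [mean_y] * len(observed))
    return 1.0 - poisson_deviance(observed, fitted) / null_dev

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return out

def spearman(x, y):
    """Spearman's rho computed as Pearson's correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical feature counts and model predictions (illustration only).
observed = [0, 1, 3, 2, 6, 4]
fitted = [0.5, 1.2, 2.4, 2.6, 5.1, 4.2]

d2 = explained_deviance(observed, fitted)
rho = spearman(observed, fitted)
print(f"explained deviance: {d2:.3f}, Spearman's rho: {rho:.3f}")
```

In practice the fitted values would come from a GLM or GAM fitted in a statistics package; only the evaluation step is sketched here.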
Abstract:
This thesis presents novel modelling applications for environmental geospatial data using remote sensing, GIS and statistical modelling techniques. The studied topics can be classified into four main themes: (i) developing advanced geospatial databases. Paper (I) demonstrates the creation of a geospatial database for the Glanville fritillary butterfly (Melitaea cinxia) in the Åland Islands, south-western Finland; (ii) analysing species diversity and distribution using GIS techniques. Paper (II) presents a diversity and geographical distribution analysis for Scopulini moths at a world-wide scale; (iii) studying spatiotemporal forest cover change. Paper (III) presents a study of exotic and indigenous tree cover change detection in the Taita Hills, Kenya, using airborne imagery and GIS analysis techniques; (iv) exploring predictive modelling techniques using geospatial data. In Paper (IV), human population occurrence and abundance in the Taita Hills highlands were predicted using the generalized additive modelling (GAM) technique. Paper (V) presents techniques to enhance fire prediction and burned area estimation at a regional scale in East Caprivi, Namibia. Paper (VI) compares eight state-of-the-art predictive modelling methods to improve fire prediction, burned area estimation and fire risk mapping in East Caprivi, Namibia. The results in Paper (I) showed that geospatial data can be managed effectively using advanced relational database management systems. Metapopulation data for the Melitaea cinxia butterfly were successfully combined with GPS-delimited habitat patch information and climatic data. Using the geospatial database, spatial analyses were successfully conducted at the habitat patch level and at coarser analysis scales. Moreover, this study showed that, at a large scale, spatially correlated weather conditions appear to be one of the primary causes of spatially correlated changes in Melitaea cinxia population sizes. 
In Paper (II), spatiotemporal characteristics of Scopulini moth description, diversity and distribution were analysed at a world-wide scale, and GIS techniques were used for the first time for Scopulini moth geographical distribution analysis. This study revealed that Scopulini moths have a cosmopolitan distribution. The majority of the species have been described from the low latitudes, sub-Saharan Africa being the hot spot of species diversity. However, the taxonomical effort has been uneven among biogeographical regions. Paper (III) showed that forest cover change can be analysed in great detail using modern airborne imagery techniques and historical aerial photographs. However, when spatiotemporal forest cover change is studied, care has to be taken in co-registration and image interpretation when historical black and white aerial photography is used. In Paper (IV), human population distribution and abundance could be modelled with fairly good results using geospatial predictors and non-Gaussian predictive modelling techniques. Moreover, a land cover layer is not necessarily needed as a predictor, because first- and second-order image texture measurements derived from satellite imagery had more power to explain the variation in dwelling unit occurrence and abundance. Paper (V) showed that the generalized linear model (GLM) is a suitable technique for fire occurrence prediction and for burned area estimation. GLM-based burned area estimates were found to be superior to the existing MODIS burned area product (MCD45A1). However, spatial autocorrelation of fires has to be taken into account when using the GLM technique for fire occurrence prediction. Paper (VI) showed that novel statistical predictive modelling techniques can be used to improve fire prediction, burned area estimation and fire risk mapping at a regional scale. However, there was noticeable variation between the different predictive modelling techniques in fire occurrence prediction and burned area estimation.
Abstract:
The aim of this thesis was to examine the effects of stand structure and cuttings on the species composition and biomass of ground vegetation on herb-rich, mesic and sub-xeric heath forest sites in southern Finland. The data consist of a sample drawn from about 3,000 permanent sample plots that were established in 1985–86, in connection with the 8th National Forest Inventory, for monitoring forest nature and the state of the environment. The species composition of the ground vegetation changes with the developmental stage of the stand. Cutting is a considerable disturbance that causes rapid and large changes in the ground vegetation. Ground vegetation was examined mainly by species group (grasses, herbs, dwarf shrubs, mosses and lichens). Differences in the cover of each species group were tested with analysis of variance, using classified stand age and classified time since the previous cutting as explanatory variables. Species-level analyses, by contrast, were carried out with ordination analyses of the vegetation. The ordination method used here is non-metric multidimensional scaling (NMDS), which allows inferences to be made about the ecological variation of vegetation structure in relation to environmental variables. The effects of thinnings and clear-cuttings on ground vegetation were also modelled by species group using generalized linear models. The development of the cover of the species groups was modelled as a function of stand basal area. As stand age increases, the proportion of grasses and herbs decreases, whereas the proportion of dwarf shrubs and mosses increases. The effects of thinnings are milder than those of clear-cuttings, and in most cases they did not cause statistically significant changes in the cover of ground vegetation. Clear-cutting, by contrast, is a stronger disturbance and causes significant changes. Grasses and herbs are more abundant after cutting, and mosses and dwarf shrubs correspondingly decline. The total cover and biomass of the vegetation are at their highest in uncut stands. 
After thinning, however, cover and biomass may temporarily be at their highest a few years after the thinning. The generalized linear models described the development of ground vegetation as a function of stand basal area reliably. The models can also be used to predict how ground vegetation develops after clear-cutting. The models can be applied, for example, when calculating the amount of carbon bound by ground vegetation in forests of different ages. They can also be used to assess, for example, the effects of energy-wood harvesting that is more intensive than clear-cutting on the abundance of ground vegetation.
Abstract:
This thesis addresses the modeling of financial time series, especially stock market returns and daily price ranges. Modeling data of this kind can be approached with so-called multiplicative error models (MEM). These models nest several well-known time series models such as GARCH, ACD and CARR models. They are able to capture many well-established features of financial time series, including volatility clustering and leptokurtosis. In contrast to these phenomena, different kinds of asymmetries have received relatively little attention in the existing literature. In this thesis asymmetries arise from various sources. They are observed in both conditional and unconditional distributions, for variables with non-negative values and for variables that have values on the real line. In the multivariate context asymmetries can be observed in the marginal distributions as well as in the relationships of the variables modeled. New methods for all these cases are proposed. Chapter 2 considers GARCH models and the modeling of returns of two stock market indices. The chapter introduces the so-called generalized hyperbolic (GH) GARCH model to account for asymmetries in both the conditional and unconditional distributions. In particular, two special cases of the GARCH-GH model which describe the data most accurately are proposed. They are found to improve the fit of the model when compared to symmetric GARCH models. The advantages of accounting for asymmetries are also observed through Value-at-Risk applications. Both theoretical and empirical contributions are provided in Chapter 3 of the thesis. In this chapter the so-called mixture conditional autoregressive range (MCARR) model is introduced, examined and applied to daily price ranges of the Hang Seng Index. The conditions for the strict and weak stationarity of the model as well as an expression for the autocorrelation function are obtained by writing the MCARR model as a first-order autoregressive process with random coefficients. 
The chapter also introduces the inverse gamma (IG) distribution to CARR models. The advantages of the CARR-IG and MCARR-IG specifications over conventional CARR models are found in the empirical application both in-sample and out-of-sample. Chapter 4 discusses the simultaneous modeling of absolute returns and daily price ranges. In this part of the thesis a vector multiplicative error model (VMEM) with an asymmetric Gumbel copula is found to provide substantial benefits over the existing VMEMs based on elliptical copulas. The proposed specification is able to capture the highly asymmetric dependence of the modeled variables, thereby improving the performance of the model considerably. The economic significance of the results obtained is established when the information content of the derived volatility forecasts is examined.
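As a rough illustration of the multiplicative error structure that this abstract refers to, the sketch below simulates a first-order MEM in its commonly used form, x_t = mu_t * eps_t with mu_t = omega + alpha * x_{t-1} + beta * mu_{t-1}. The abstract does not state the model equations, so this form, the unit-mean exponential innovations, the parameter values and the function names are all assumptions made for illustration. They are not the GH, MCARR or copula specifications proposed in the thesis.

```python
import random

def next_conditional_mean(omega, alpha, beta, x_prev, mu_prev):
    """First-order recursion: mu_t = omega + alpha * x_{t-1} + beta * mu_{t-1}."""
    return omega + alpha * x_prev + beta * mu_prev

def simulate_mem(n, omega, alpha, beta, seed=0):
    """Simulate x_t = mu_t * eps_t with unit-mean exponential eps_t (an assumption)."""
    rng = random.Random(seed)
    mu = omega / (1.0 - alpha - beta)  # start at the unconditional mean
    xs = []
    for _ in range(n):
        x = mu * rng.expovariate(1.0)  # eps_t >= 0 and E[eps_t] = 1
        xs.append(x)
        mu = next_conditional_mean(omega, alpha, beta, x, mu)
    return xs

# Illustrative parameters with alpha + beta < 1, giving unconditional mean 1.0.
series = simulate_mem(20000, omega=0.1, alpha=0.2, beta=0.7)
sample_mean = sum(series) / len(series)
print(f"sample mean: {sample_mean:.3f}")
```

Because the innovation is non-negative and multiplies the conditional mean, the simulated series stays non-negative, which is what makes this structure suitable for quantities such as daily price ranges.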
Abstract:
This thesis studies quantile residuals and uses different methodologies to develop test statistics that are applicable in evaluating linear and nonlinear time series models based on continuous distributions. Models based on mixtures of distributions are of special interest because it turns out that for those models traditional residuals, often referred to as Pearson's residuals, are not appropriate. As such models have become more and more popular in practice, especially with financial time series data, there is a need for reliable diagnostic tools that can be used to evaluate them. The aim of the thesis is to show how such diagnostic tools can be obtained and used in model evaluation. The quantile residuals considered here are defined in such a way that, when the model is correctly specified and its parameters are consistently estimated, they are approximately independent with a standard normal distribution. All the tests derived in the thesis are pure significance type tests and are theoretically sound in that they properly take the uncertainty caused by parameter estimation into account. In Chapter 2 a general framework based on the likelihood function and smooth functions of univariate quantile residuals is derived that can be used to obtain misspecification tests for various purposes. Three easy-to-use tests aimed at detecting non-normality, autocorrelation, and conditional heteroscedasticity in quantile residuals are formulated. It also turns out that these tests can be interpreted as Lagrange Multiplier or score tests, so that they are asymptotically optimal against local alternatives. Chapter 3 extends the concept of quantile residuals to multivariate models. The framework of Chapter 2 is generalized, and tests aimed at detecting non-normality, serial correlation, and conditional heteroscedasticity in multivariate quantile residuals are derived based on it. 
Score test interpretations are obtained for the serial correlation and conditional heteroscedasticity tests and, in a rather restricted special case, for the normality test. In Chapter 4 the tests are constructed using the empirical distribution function of quantile residuals. The so-called Khmaladze's martingale transformation is applied in order to eliminate the uncertainty caused by parameter estimation. Various test statistics are considered so that critical bounds for histogram type plots as well as Quantile-Quantile and Probability-Probability type plots of quantile residuals are obtained. Chapters 2, 3, and 4 contain simulations and empirical examples which illustrate the finite-sample size and power properties of the derived tests and also how the tests and related graphical tools based on residuals are applied in practice.
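The idea of a quantile residual can be sketched in a few lines. The abstract does not give a formula, so the sketch assumes the usual definition: each observation is passed through the model's cumulative distribution function and then through the standard normal quantile function. The two-component Gaussian mixture, its parameters and the function names are hypothetical. In a time series model the distribution function would be conditional on past observations, which this unconditional example leaves out.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def mixture_cdf(y, components):
    """CDF of a Gaussian mixture; components are (weight, mean, std) triples."""
    return sum(w * NormalDist(m, s).cdf(y) for w, m, s in components)

def quantile_residual(y, model_cdf):
    """Apply the model CDF, then the standard normal quantile function."""
    return STANDARD_NORMAL.inv_cdf(model_cdf(y))

# Hypothetical fitted mixture: equal weights, means -1 and 1, unit variances.
components = [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]
residuals = [
    quantile_residual(y, lambda v: mixture_cdf(v, components))
    for y in (-2.0, 0.0, 1.5)
]
print(residuals)
```

For a model with a single normal distribution this reduces to the familiar standardized residual (y - mean) / std, which is one way to see why quantile residuals extend ordinary residuals to mixture models.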
Abstract:
The most prominent objective of the thesis is the development of generalized descriptive set theory, as we call it. There, we study the space of all functions from a fixed uncountable cardinal to itself, or to a finite set of size two. These correspond to generalized notions of the universal Baire space (functions from natural numbers to themselves with the product topology) and the Cantor space (functions from natural numbers to the {0,1}-set), respectively. We generalize the notion of Borel sets in three different ways and study the corresponding Borel structures with the aims of generalizing classical theorems of descriptive set theory or providing counterexamples. In particular, we are interested in equivalence relations on these spaces and their Borel reducibility to each other. The last chapter shows, using game-theoretic techniques, that the order of Borel equivalence relations under Borel reducibility has very high complexity. The techniques in the above-described set-theoretical side of the thesis include forcing, general topological notions such as meager sets, and combinatorial games of infinite length. By coding uncountable models to functions, we are able to apply the understanding of generalized descriptive set theory to the model theory of uncountable models. The links between the theorems of model theory (including Shelah's classification theory) and the theorems in pure set theory are provided using game-theoretic techniques, from Ehrenfeucht-Fraïssé games in model theory to cub-games in set theory. The bottom line of the research is that the descriptive (set-theoretic) complexity of an isomorphism relation of a first-order definable model class goes in sync with the stability-theoretical complexity of the corresponding first-order theory. The first chapter of the thesis has a slightly different focus and is purely concerned with a certain modification of the well-known Ehrenfeucht-Fraïssé games. 
There we (my supervisor Tapani Hyttinen and I) answer some natural questions about that game, mainly concerning determinacy and its relation to the standard EF-game.