27 resultados para maximum pseudolikelihood (MPL) estimation
em Helda - Digital Repository of University of Helsinki
Resumo:
There is an increasing need to compare the results obtained with different methods of estimation of tree biomass in order to reduce the uncertainty in the assessment of forest biomass carbon. In this study, tree biomass was investigated in a 30-year-old Scots pine (Pinus sylvestris) (Young-Stand) and a 130-year-old mixed Norway spruce (Picea abies)-Scots pine stand (Mature-Stand) located in southern Finland (61º50' N, 24º22' E). In particular, a comparison of the results of different estimation methods was conducted to assess the reliability and suitability of their applications. For the trees in Mature-Stand, annual stem biomass increment fluctuated following a sigmoid equation, and the fitting curves reached a maximum level (from about 1 kg/yr for understorey spruce to 7 kg/yr for dominant pine) when the trees were 100 years old. Tree biomass was estimated to be about 70 Mg/ha in Young-Stand and about 220 Mg/ha in Mature-Stand. In the region (58.00-62.13 ºN, 14-34 ºE, ≤ 300 m a.s.l.) surrounding the study stands, the tree biomass accumulation in Norway spruce and Scots pine stands followed a sigmoid equation with stand age, with a maximum of 230 Mg/ha at the age of 140 years. In Mature-Stand, lichen biomass on the trees was 1.63 Mg/ha with more than half of the biomass occurring on dead branches, and the standing crop of litter lichen on the ground was about 0.09 Mg/ha. There were substantial differences among the results estimated by different methods in the stands. These results imply that a possible estimation error should be taken into account when calculating tree biomass in a stand with an indirect approach.
Resumo:
The Minimum Description Length (MDL) principle is a general, well-founded theoretical formalization of statistical modeling. The most important notion of MDL is the stochastic complexity, which can be interpreted as the shortest description length of a given sample of data relative to a model class. The exact definition of the stochastic complexity has gone through several evolutionary steps. The latest instantation is based on the so-called Normalized Maximum Likelihood (NML) distribution which has been shown to possess several important theoretical properties. However, the applications of this modern version of the MDL have been quite rare because of computational complexity problems, i.e., for discrete data, the definition of NML involves an exponential sum, and in the case of continuous data, a multi-dimensional integral usually infeasible to evaluate or even approximate accurately. In this doctoral dissertation, we present mathematical techniques for computing NML efficiently for some model families involving discrete data. We also show how these techniques can be used to apply MDL in two practical applications: histogram density estimation and clustering of multi-dimensional data.
Resumo:
This thesis examines the feasibility of a forest inventory method based on two-phase sampling in estimating forest attributes at the stand or substand levels for forest management purposes. The method is based on multi-source forest inventory combining auxiliary data consisting of remote sensing imagery or other geographic information and field measurements. Auxiliary data are utilized as first-phase data for covering all inventory units. Various methods were examined for improving the accuracy of the forest estimates. Pre-processing of auxiliary data in the form of correcting the spectral properties of aerial imagery was examined (I), as was the selection of aerial image features for estimating forest attributes (II). Various spatial units were compared for extracting image features in a remote sensing aided forest inventory utilizing very high resolution imagery (III). A number of data sources were combined and different weighting procedures were tested in estimating forest attributes (IV, V). Correction of the spectral properties of aerial images proved to be a straightforward and advantageous method for improving the correlation between the image features and the measured forest attributes. Testing different image features that can be extracted from aerial photographs (and other very high resolution images) showed that the images contain a wealth of relevant information that can be extracted only by utilizing the spatial organization of the image pixel values. Furthermore, careful selection of image features for the inventory task generally gives better results than inputting all extractable features to the estimation procedure. When the spatial units for extracting very high resolution image features were examined, an approach based on image segmentation generally showed advantages compared with a traditional sample plot-based approach. Combining several data sources resulted in more accurate estimates than any of the individual data sources alone. The best combined estimate can be derived by weighting the estimates produced by the individual data sources by the inverse values of their mean square errors. Despite the fact that the plot-level estimation accuracy in two-phase sampling inventory can be improved in many ways, the accuracy of forest estimates based mainly on single-view satellite and aerial imagery is a relatively poor basis for making stand-level management decisions.
Resumo:
Remote sensing provides methods to infer land cover information over large geographical areas at a variety of spatial and temporal resolutions. Land cover is input data for a range of environmental models and information on land cover dynamics is required for monitoring the implications of global change. Such data are also essential in support of environmental management and policymaking. Boreal forests are a key component of the global climate and a major sink of carbon. The northern latitudes are expected to experience a disproportionate and rapid warming, which can have a major impact on vegetation at forest limits. This thesis examines the use of optical remote sensing for estimating aboveground biomass, leaf area index (LAI), tree cover and tree height in the boreal forests and tundra taiga transition zone in Finland. The continuous fields of forest attributes are required, for example, to improve the mapping of forest extent. The thesis focus on studying the feasibility of satellite data at multiple spatial resolutions, assessing the potential of multispectral, -angular and -temporal information, and provides regional evaluation for global land cover data. Preprocessed ASTER, MISR and MODIS products are the principal satellite data. The reference data consist of field measurements, forest inventory data and fine resolution land cover maps. Fine resolution studies demonstrate how statistical relationships between biomass and satellite data are relatively strong in single species and low biomass mountain birch forests in comparison to higher biomass coniferous stands. The combination of forest stand data and fine resolution ASTER images provides a method for biomass estimation using medium resolution MODIS data. The multiangular data improve the accuracy of land cover mapping in the sparsely forested tundra taiga transition zone, particularly in mires. Similarly, multitemporal data improve the accuracy of coarse resolution tree cover estimates in comparison to single date data. Furthermore, the peak of the growing season is not necessarily the optimal time for land cover mapping in the northern boreal regions. The evaluated coarse resolution land cover data sets have considerable shortcomings in northernmost Finland and should be used with caution in similar regions. The quantitative reference data and upscaling methods for integrating multiresolution data are required for calibration of statistical models and evaluation of land cover data sets. The preprocessed image products have potential for wider use as they can considerably reduce the time and effort used for data processing.
Resumo:
Olkiluoto Island is situated in the northern Baltic Sea, near the southwestern coast of Finland, and is the proposed location of a spent nuclear fuel repository. This study examined Holocene palaeoseismicity in the Olkiluoto area and in the surrounding sea areas by computer simulations together with acoustic-seismic, sedimentological and dating methods. The most abundant rock type on the island is migmatic mica gneiss, intruded by tonalites, granodiorites and granites. The surrounding Baltic Sea seabed consists of Palaeoproterozoic crystalline bedrock, which is to a great extent covered by younger Mesoproterozoic sedimentary rocks. The area contains several ancient deep-seated fracture zones that divide it into bedrock blocks. The response of bedrock at the Olkiluoto site was modelled considering four future ice-age scenarios. Each scenario produced shear displacements of fractures with different times of occurrence and varying recovery rates. Generally, the larger the maximum ice load, the larger were the permanent shear displacements. For a basic case, the maximum shear displacements were a few centimetres at the proposed nuclear waste repository level, at proximately 500 m b.s.l. High-resolution, low-frequency echo-sounding was used to examine the Holocene submarine sedimentary structures and possible direct and indirect indicators of palaeoseismic activity in the northern Baltic Sea. Echo-sounding profiles of Holocene submarine sediments revealed slides and slumps, normal faults, debris flows and turbidite-type structures. The profiles also showed pockmarks and other structures related to gas or groundwater seepages, which might be related to fracture zone activation. Evidence of postglacial reactivation in the study area was derived from the spatial occurrence of some of the structures, especial the faults and the seepages, in the vicinity of some old bedrock fracture zones. Palaeoseismic event(s) (a single or several events) in the Olkiluoto area were dated and the palaeoenvironment was characterized using palaeomagnetic, biostratigraphical and lithostratigraphical methods, enhancing the reliability of the chronology. Combined lithostratigraphy, biostratigraphy and palaeomagnetic stratigraphy revealed an age estimation of 10 650 to 10 200 cal. years BP for the palaeoseismic event(s). All Holocene sediment faults in the northern Baltic Sea occur at the same stratigraphical level, the age of which is estimated at 10 700 cal. years BP (9500 radiocarbon years BP). Their movement is suggested to have been triggered by palaeoseismic event(s) when the Late Weichselian ice sheet was retreating from the site and bedrock stresses were released along the bedrock fracture zones. Since no younger or repeated traces of seismic events were found, it corroborates the suggestion that the major seismic activity occurred within a short time during and after the last deglaciation. The origin of the gas/groundwater seepages remains unclear. Their reflections in the echo-sounding profiles imply that part of the gas is derived from the organic-bearing Litorina and modern gyttja clays. However, at least some of the gas is derived from the bedrock. Additional information could be gained by pore water analysis from the pockmarks. Information on postglacial fault activation and possible gas and/or fluid discharges under high hydraulic heads has relevance in evaluating the safety assessment of a planned spent nuclear fuel repository in the region.
Resumo:
Whether a statistician wants to complement a probability model for observed data with a prior distribution and carry out fully probabilistic inference, or base the inference only on the likelihood function, may be a fundamental question in theory, but in practice it may well be of less importance if the likelihood contains much more information than the prior. Maximum likelihood inference can be justified as a Gaussian approximation at the posterior mode, using flat priors. However, in situations where parametric assumptions in standard statistical models would be too rigid, more flexible model formulation, combined with fully probabilistic inference, can be achieved using hierarchical Bayesian parametrization. This work includes five articles, all of which apply probability modeling under various problems involving incomplete observation. Three of the papers apply maximum likelihood estimation and two of them hierarchical Bayesian modeling. Because maximum likelihood may be presented as a special case of Bayesian inference, but not the other way round, in the introductory part of this work we present a framework for probability-based inference using only Bayesian concepts. We also re-derive some results presented in the original articles using the toolbox equipped herein, to show that they are also justifiable under this more general framework. Here the assumption of exchangeability and de Finetti's representation theorem are applied repeatedly for justifying the use of standard parametric probability models with conditionally independent likelihood contributions. It is argued that this same reasoning can be applied also under sampling from a finite population. The main emphasis here is in probability-based inference under incomplete observation due to study design. This is illustrated using a generic two-phase cohort sampling design as an example. The alternative approaches presented for analysis of such a design are full likelihood, which utilizes all observed information, and conditional likelihood, which is restricted to a completely observed set, conditioning on the rule that generated that set. Conditional likelihood inference is also applied for a joint analysis of prevalence and incidence data, a situation subject to both left censoring and left truncation. Other topics covered are model uncertainty and causal inference using posterior predictive distributions. We formulate a non-parametric monotonic regression model for one or more covariates and a Bayesian estimation procedure, and apply the model in the context of optimal sequential treatment regimes, demonstrating that inference based on posterior predictive distributions is feasible also in this case.
Resumo:
The focus of this study is on statistical analysis of categorical responses, where the response values are dependent of each other. The most typical example of this kind of dependence is when repeated responses have been obtained from the same study unit. For example, in Paper I, the response of interest is the pneumococcal nasopharengyal carriage (yes/no) on 329 children. For each child, the carriage is measured nine times during the first 18 months of life, and thus repeated respones on each child cannot be assumed independent of each other. In the case of the above example, the interest typically lies in the carriage prevalence, and whether different risk factors affect the prevalence. Regression analysis is the established method for studying the effects of risk factors. In order to make correct inferences from the regression model, the associations between repeated responses need to be taken into account. The analysis of repeated categorical responses typically focus on regression modelling. However, further insights can also be gained by investigating the structure of the association. The central theme in this study is on the development of joint regression and association models. The analysis of repeated, or otherwise clustered, categorical responses is computationally difficult. Likelihood-based inference is often feasible only when the number of repeated responses for each study unit is small. In Paper IV, an algorithm is presented, which substantially facilitates maximum likelihood fitting, especially when the number of repeated responses increase. In addition, a notable result arising from this work is the freely available software for likelihood-based estimation of clustered categorical responses.
Resumo:
This thesis focuses on the issue of testing sleepiness quantitatively. The issue is relevant to policymakers concerned with traffic- and occupational safety; such testing provides a tool for safety legislation and -surveillance. The findings of this thesis provide guidelines for a posturographic sleepiness tester. Sleepiness ensuing from staying awake merely 17 h impairs our performance as much as the legally proscribed blood alcohol concentration 0.5 does. Hence, sleepiness is a major risk factor in transportation and occupational accidents. The lack of convenient, commercial sleepiness tests precludes testing impending sleepiness levels contrary to simply breath testing for alcohol intoxication. Posturography is a potential sleepiness test, since clinical diurnal balance testing suggests the hypothesis that time awake could be posturographically estimable. Relying on this hypothesis this thesis examines posturographic sleepiness testing for instrumentation purposes. Empirical results from 63 subjects for whom we tested balance with a force platform during wakefulness for maximum 36 h show that sustained wakefulness impairs balance. The results show that time awake is posturographically estimable with 88% accuracy and 97% precision which validates our hypothesis. Results also show that balance scores tested at 13:30 hours serve as a threshold to detect excessive sleepiness. Analytical results show that the test length has a marked effect on estimation accuracy: 18 s tests suffice to identify sleepiness related balance changes, but trades off some of the accuracy achieved with 30 s tests. The procedure to estimate time awake relies on equating the subject s test score to a reference table (comprising balance scores tested during sustained wakefulness, regressed against time awake). Empirical results showed that sustained wakefulness explains 60% of the diurnal balance variations, whereas the time of day explains 40% of the balance variations. The latter fact implies that time awake estimations also must rely on knowing the local times of both test and reference scores.
Resumo:
This study examines the properties of Generalised Regression (GREG) estimators for domain class frequencies and proportions. The family of GREG estimators forms the class of design-based model-assisted estimators. All GREG estimators utilise auxiliary information via modelling. The classic GREG estimator with a linear fixed effects assisting model (GREG-lin) is one example. But when estimating class frequencies, the study variable is binary or polytomous. Therefore logistic-type assisting models (e.g. logistic or probit model) should be preferred over the linear one. However, other GREG estimators than GREG-lin are rarely used, and knowledge about their properties is limited. This study examines the properties of L-GREG estimators, which are GREG estimators with fixed-effects logistic-type models. Three research questions are addressed. First, I study whether and when L-GREG estimators are more accurate than GREG-lin. Theoretical results and Monte Carlo experiments which cover both equal and unequal probability sampling designs and a wide variety of model formulations show that in standard situations, the difference between L-GREG and GREG-lin is small. But in the case of a strong assisting model, two interesting situations arise: if the domain sample size is reasonably large, L-GREG is more accurate than GREG-lin, and if the domain sample size is very small, estimation of assisting model parameters may be inaccurate, resulting in bias for L-GREG. Second, I study variance estimation for the L-GREG estimators. The standard variance estimator (S) for all GREG estimators resembles the Sen-Yates-Grundy variance estimator, but it is a double sum of prediction errors, not of the observed values of the study variable. Monte Carlo experiments show that S underestimates the variance of L-GREG especially if the domain sample size is minor, or if the assisting model is strong. Third, since the standard variance estimator S often fails for the L-GREG estimators, I propose a new augmented variance estimator (A). The difference between S and the new estimator A is that the latter takes into account the difference between the sample fit model and the census fit model. In Monte Carlo experiments, the new estimator A outperformed the standard estimator S in terms of bias, root mean square error and coverage rate. Thus the new estimator provides a good alternative to the standard estimator.
Resumo:
Leaf and needle biomasses are key factors in forest health. Insects that feed on needles cause growth losses and tree mortality. Insect outbreaks in Finnish forests have increased rapidly during the last decade and due to climate change the damages are expected to become more serious. There is a need for cost-efficient methods for inventorying these outbreaks. Remote sensing is a promising means for estimating forests and damages. The purpose of this study is to investigate the usability of airborne laser scanning in estimating Scots pine defoliation caused by the common pine sawfly (Diprion pini L.). The study area is situated in Ilomantsi district, eastern Finland. Study materials included high-pulse airborne laser scannings from July and October 2008. Reference data consisted of 90 circular field plots measured in May-June 2009. Defoliation percentage on these field plots was estimated visually. The study was made on plot-level and methods used were linear regression, unsupervised classification, Maximum likelihood method, and stepwise linear regression. Field plots were divided in defoliation classes in two different ways: When divided in two classes the defoliation percentages used were 0–20 % and 20–100 % and when divided in four classes 0–10 %, 10–20 %, 20–30 % and 30–100 %. The results varied depending on method and laser scanning. In the first laser scanning the best results were obtained with stepwise linear regression. The kappa value was 0,47 when using two classes and 0,37 when divided in four classes. In the second laser scanning the best results were obtained with Maximum likelihood. The kappa values were 0,42 and 0,37, correspondingly. The feature that explained defoliation best was vegetation index (pulses reflected from height > 2m / all pulses). There was no significant difference in the results between the two laser scannings so the seasonal change in defoliation could not be detected in this study.