4 resultados para Sampling design

em Helda - Digital Repository of University of Helsinki


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Whether a statistician wants to complement a probability model for observed data with a prior distribution and carry out fully probabilistic inference, or base the inference only on the likelihood function, may be a fundamental question in theory, but in practice it may well be of less importance if the likelihood contains much more information than the prior. Maximum likelihood inference can be justified as a Gaussian approximation at the posterior mode, using flat priors. However, in situations where parametric assumptions in standard statistical models would be too rigid, more flexible model formulation, combined with fully probabilistic inference, can be achieved using hierarchical Bayesian parametrization. This work includes five articles, all of which apply probability modeling under various problems involving incomplete observation. Three of the papers apply maximum likelihood estimation and two of them hierarchical Bayesian modeling. Because maximum likelihood may be presented as a special case of Bayesian inference, but not the other way round, in the introductory part of this work we present a framework for probability-based inference using only Bayesian concepts. We also re-derive some results presented in the original articles using the toolbox equipped herein, to show that they are also justifiable under this more general framework. Here the assumption of exchangeability and de Finetti's representation theorem are applied repeatedly for justifying the use of standard parametric probability models with conditionally independent likelihood contributions. It is argued that this same reasoning can be applied also under sampling from a finite population. The main emphasis here is in probability-based inference under incomplete observation due to study design. This is illustrated using a generic two-phase cohort sampling design as an example. The alternative approaches presented for analysis of such a design are full likelihood, which utilizes all observed information, and conditional likelihood, which is restricted to a completely observed set, conditioning on the rule that generated that set. Conditional likelihood inference is also applied for a joint analysis of prevalence and incidence data, a situation subject to both left censoring and left truncation. Other topics covered are model uncertainty and causal inference using posterior predictive distributions. We formulate a non-parametric monotonic regression model for one or more covariates and a Bayesian estimation procedure, and apply the model in the context of optimal sequential treatment regimes, demonstrating that inference based on posterior predictive distributions is feasible also in this case.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The increase in global temperature has been attributed to increased atmospheric concentrations of greenhouse gases (GHG), mainly that of CO2. The threat of severe and complex socio-economic and ecological implications of climate change have initiated an international process that aims to reduce emissions, to increase C sinks, and to protect existing C reservoirs. The famous Kyoto protocol is an offspring of this process. The Kyoto protocol and its accords state that signatory countries need to monitor their forest C pools, and to follow the guidelines set by the IPCC in the preparation, reporting and quality assessment of the C pool change estimates. The aims of this thesis were i) to estimate the changes in carbon stocks vegetation and soil in the forests in Finnish forests from 1922 to 2004, ii) to evaluate the applied methodology by using empirical data, iii) to assess the reliability of the estimates by means of uncertainty analysis, iv) to assess the effect of forest C sinks on the reliability of the entire national GHG inventory, and finally, v) to present an application of model-based stratification to a large-scale sampling design of soil C stock changes. The applied methodology builds on the forest inventory measured data (or modelled stand data), and uses statistical modelling to predict biomasses and litter productions, as well as a dynamic soil C model to predict the decomposition of litter. The mean vegetation C sink of Finnish forests from 1922 to 2004 was 3.3 Tg C a-1, and in soil was 0.7 Tg C a-1. Soil is slowly accumulating C as a consequence of increased growing stock and unsaturated soil C stocks in relation to current detritus input to soil that is higher than in the beginning of the period. Annual estimates of vegetation and soil C stock changes fluctuated considerably during the period, were frequently opposite (e.g. vegetation was a sink but soil was a source). The inclusion of vegetation sinks into the national GHG inventory of 2003 increased its uncertainty from between -4% and 9% to ± 19% (95% CI), and further inclusion of upland mineral soils increased it to ± 24%. The uncertainties of annual sinks can be reduced most efficiently by concentrating on the quality of the model input data. Despite the decreased precision of the national GHG inventory, the inclusion of uncertain sinks improves its accuracy due to the larger sectoral coverage of the inventory. If the national soil sink estimates were prepared by repeated soil sampling of model-stratified sample plots, the uncertainties would be accounted for in the stratum formation and sample allocation. Otherwise, the increases of sampling efficiency by stratification remain smaller. The highly variable and frequently opposite annual changes in ecosystem C pools imply the importance of full ecosystem C accounting. If forest C sink estimates will be used in practice average sink estimates seem a more reasonable basis than the annual estimates. This is due to the fact that annual forest sinks vary considerably and annual estimates are uncertain, and they have severe consequences for the reliability of the total national GHG balance. The estimation of average sinks should still be based on annual or even more frequent data due to the non-linear decomposition process that is influenced by the annual climate. The methodology used in this study to predict forest C sinks can be transferred to other countries with some modifications. The ultimate verification of sink estimates should be based on comparison to empirical data, in which case the model-based stratification presented in this study can serve to improve the efficiency of the sampling design.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Modern sample surveys started to spread after statistician at the U.S. Bureau of the Census in the 1940s had developed a sampling design for the Current Population Survey (CPS). A significant factor was also that digital computers became available for statisticians. In the beginning of 1950s, the theory was documented in textbooks on survey sampling. This thesis is about the development of the statistical inference for sample surveys. For the first time the idea of statistical inference was enunciated by a French scientist, P. S. Laplace. In 1781, he published a plan for a partial investigation in which he determined the sample size needed to reach the desired accuracy in estimation. The plan was based on Laplace s Principle of Inverse Probability and on his derivation of the Central Limit Theorem. They were published in a memoir in 1774 which is one of the origins of statistical inference. Laplace s inference model was based on Bernoulli trials and binominal probabilities. He assumed that populations were changing constantly. It was depicted by assuming a priori distributions for parameters. Laplace s inference model dominated statistical thinking for a century. Sample selection in Laplace s investigations was purposive. In 1894 in the International Statistical Institute meeting, Norwegian Anders Kiaer presented the idea of the Representative Method to draw samples. Its idea was that the sample would be a miniature of the population. It is still prevailing. The virtues of random sampling were known but practical problems of sample selection and data collection hindered its use. Arhtur Bowley realized the potentials of Kiaer s method and in the beginning of the 20th century carried out several surveys in the UK. He also developed the theory of statistical inference for finite populations. It was based on Laplace s inference model. R. A. Fisher contributions in the 1920 s constitute a watershed in the statistical science He revolutionized the theory of statistics. In addition, he introduced a new statistical inference model which is still the prevailing paradigm. The essential idea is to draw repeatedly samples from the same population and the assumption that population parameters are constants. Fisher s theory did not include a priori probabilities. Jerzy Neyman adopted Fisher s inference model and applied it to finite populations with the difference that Neyman s inference model does not include any assumptions of the distributions of the study variables. Applying Fisher s fiducial argument he developed the theory for confidence intervals. Neyman s last contribution to survey sampling presented a theory for double sampling. This gave the central idea for statisticians at the U.S. Census Bureau to develop the complex survey design for the CPS. Important criterion was to have a method in which the costs of data collection were acceptable, and which provided approximately equal interviewer workloads, besides sufficient accuracy in estimation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The study of soil microbiota and their activities is central to the understanding of many ecosystem processes such as decomposition and nutrient cycling. The collection of microbiological data from soils generally involves several sequential steps of sampling, pretreatment and laboratory measurements. The reliability of results is dependent on reliable methods in every step. The aim of this thesis was to critically evaluate some central methods and procedures used in soil microbiological studies in order to increase our understanding of the factors that affect the measurement results and to provide guidance and new approaches for the design of experiments. The thesis focuses on four major themes: 1) soil microbiological heterogeneity and sampling, 2) storage of soil samples, 3) DNA extraction from soil, and 4) quantification of specific microbial groups by the most-probable-number (MPN) procedure. Soil heterogeneity and sampling are discussed as a single theme because knowledge on spatial (horizontal and vertical) and temporal variation is crucial when designing sampling procedures. Comparison of adjacent forest, meadow and cropped field plots showed that land use has a strong impact on the degree of horizontal variation of soil enzyme activities and bacterial community structure. However, regardless of the land use, the variation of microbiological characteristics appeared not to have predictable spatial structure at 0.5-10 m. Temporal and soil depth-related patterns were studied in relation to plant growth in cropped soil. The results showed that most enzyme activities and microbial biomass have a clear decreasing trend in the top 40 cm soil profile and a temporal pattern during the growing season. A new procedure for sampling of soil microbiological characteristics based on stratified sampling and pre-characterisation of samples was developed. A practical example demonstrated the potential of the new procedure to reduce the analysis efforts involved in laborious microbiological measurements without loss of precision. The investigation of storage of soil samples revealed that freezing (-20 °C) of small sample aliquots retains the activity of hydrolytic enzymes and the structure of the bacterial community in different soil matrices relatively well whereas air-drying cannot be recommended as a storage method for soil microbiological properties due to large reductions in activity. Freezing below -70 °C was the preferred method of storage for samples with high organic matter content. Comparison of different direct DNA extraction methods showed that the cell lysis treatment has a strong impact on the molecular size of DNA obtained and on the bacterial community structure detected. An improved MPN method for the enumeration of soil naphthalene degraders was introduced as an alternative to more complex MPN protocols or the DNA-based quantification approach. The main advantage of the new method is the simple protocol and the possibility to analyse a large number of samples and replicates simultaneously.