6 results for Mean squared error

in Helda - Digital Repository of University of Helsinki


Relevance:

100.00%

Publisher:

Abstract:

The relationship between site characteristics and understorey vegetation composition was analysed with quantitative methods, especially from the viewpoint of site quality estimation. Theoretical models were applied to an empirical data set collected from the upland forests of southern Finland, comprising 104 sites dominated by Scots pine (Pinus sylvestris L.) and 165 sites dominated by Norway spruce (Picea abies (L.) Karsten). Site index H100 was used as an independent measure of site quality. A new model for the estimation of site quality at sites with a known understorey vegetation composition was introduced. It is based on the application of Bayes' theorem to the density function of site quality within the study area combined with the species-specific presence-absence response curves. The resulting posterior probability density function may be used for calculating an estimate for the site variable. Using this method, a jackknife estimate of site index H100 was calculated separately for pine- and spruce-dominated sites. In pine-dominated forests, the cross-validation root mean squared error (RMSEcv) of the estimates fell from 2.98 m for the "null" model (the standard deviation of the sample distribution) to 2.34 m. In spruce-dominated forests, RMSEcv decreased from 3.94 m to 3.16 m. In order to assess these results, four other estimation methods based on understorey vegetation composition were applied to the same data set. The results showed that none of the methods was clearly superior to the others. In pine-dominated forests, RMSEcv varied between 2.34 and 2.47 m, and the corresponding range for spruce-dominated forests was from 3.13 to 3.57 m.
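
The Bayesian step described in this abstract can be illustrated with a small grid-based sketch. It is not the thesis's implementation: the logistic shape of the response curves, the normal prior, the species names, all parameter values and the assumption that species occur independently given the site index are hypothetical choices made only for illustration.

```python
import math

def logistic_presence(h, midpoint, slope):
    """Hypothetical response curve: P(species present | site index h)."""
    return 1.0 / (1.0 + math.exp(-slope * (h - midpoint)))

def normal_pdf(h, mean, sd):
    """Assumed prior density of site index within the study area."""
    return math.exp(-0.5 * ((h - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def posterior_site_index(observed, curves, prior_mean, prior_sd, grid):
    """Apply Bayes' theorem on a grid of candidate site index values.

    observed: dict species -> True (present) / False (absent)
    curves:   dict species -> (midpoint, slope) of its response curve
    Returns (normalised posterior weights, posterior mean).
    """
    weights = []
    for h in grid:
        w = normal_pdf(h, prior_mean, prior_sd)
        # Assumption: species are conditionally independent given h.
        for species, present in observed.items():
            p = logistic_presence(h, *curves[species])
            w *= p if present else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    weights = [w / total for w in weights]
    mean = sum(h * w for h, w in zip(grid, weights))
    return weights, mean

# Illustrative inputs only (not data from the study).
grid = [10.0 + 0.1 * i for i in range(251)]  # candidate H100 values, 10 to 35 m
curves = {"rich_site_herb": (26.0, 0.8), "poor_site_lichen": (18.0, -0.8)}
observed = {"rich_site_herb": True, "poor_site_lichen": False}
weights, estimate = posterior_site_index(observed, curves, 22.0, 3.0, grid)
```

Here the posterior mean serves as the point estimate; a jackknife version would refit the prior and the response curves with each site left out in turn before estimating that site.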

Relevance:

90.00%

Publisher:

Abstract:

In recent years, thanks to developments in information technology, large-dimensional datasets have become increasingly available. Researchers now have access to thousands of economic series, and the information contained in them can be used to create accurate forecasts and to test economic theories. To exploit this large amount of information, researchers and policymakers need an appropriate econometric model. Usual time series models, vector autoregression for example, cannot incorporate more than a few variables. There are two ways to solve this problem: use variable selection procedures or gather the information contained in the series to create an index model. This thesis focuses on one of the most widespread index models, the dynamic factor model (the theory behind this model, based on previous literature, is the core of the first part of this study), and its use in forecasting Finnish macroeconomic indicators (which is the focus of the second part of the thesis). In particular, I forecast economic activity indicators (e.g. GDP) and price indicators (e.g. consumer price index) from three large Finnish datasets. The first dataset contains a large series of aggregated data obtained from the Statistics Finland database. The second dataset is composed of economic indicators from the Bank of Finland. The last dataset is formed of disaggregated data from Statistics Finland, which I call the micro dataset. The forecasts are computed following a two-step procedure: in the first step, I estimate a set of common factors from the original dataset; the second step consists of formulating forecasting equations that include the previously extracted factors. The predictions are evaluated using the relative mean squared forecast error, where the benchmark model is a univariate autoregressive model. The results are dataset-dependent. 
The forecasts based on factor models are very accurate for the first dataset (the Statistics Finland one), while they are considerably worse for the Bank of Finland dataset. The forecasts derived from the micro dataset are still good, but less accurate than the ones obtained in the first case. This work opens several directions for further research. The results obtained here can be replicated for longer datasets. The non-aggregated data can be represented in an even more disaggregated form (firm level). Finally, the use of the micro data, one of the major contributions of this thesis, can be useful in the imputation of missing values and the creation of flash estimates of macroeconomic indicators (nowcasting).
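
The evaluation criterion named in this abstract, the relative mean squared forecast error against a univariate autoregressive benchmark, can be sketched as follows. This is a minimal illustration: the AR(1) order, the toy series and the "factor-model" forecasts are made-up assumptions, and the factor extraction step itself is not shown.

```python
def mean_squared_error(actual, forecast):
    """Average squared difference between realised values and forecasts."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def fit_ar1(series):
    """OLS fit of y_t = c + phi * y_{t-1}; returns (c, phi)."""
    x, y = series[:-1], series[1:]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi

def relative_msfe(actual, model_forecast, benchmark_forecast):
    """Values below 1 mean the model beats the benchmark."""
    return (mean_squared_error(actual, model_forecast)
            / mean_squared_error(actual, benchmark_forecast))

# Toy numbers for illustration only.
history = [0.0, 0.5, 0.75, 0.875, 0.9375]
c, phi = fit_ar1(history)
actual = [1.0, 0.9]
# One-step-ahead benchmark forecasts, each based on the previous observed value.
benchmark = [c + phi * history[-1], c + phi * actual[0]]
model = [0.98, 0.92]  # hypothetical factor-model forecasts
ratio = relative_msfe(actual, model, benchmark)
```

Reporting the ratio rather than the raw error makes results comparable across indicators measured on different scales, which matters when GDP and price indices are forecast side by side.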

Relevance:

80.00%

Publisher:

Abstract:

In this study, a quality assessment method based on sampling of primary laser inventory units (microsegments) was analysed. The accuracy of a laser inventory carried out in Kuhmo was analysed as a case study. Field sample plots were measured on the sampled microsegments in the Kuhmo inventory area. Two main questions were considered: did the ALS-based inventory meet the accuracy requirements set for the provider, and how should a reliable, cost-efficient and independent quality assessment be undertaken? The agreement between the control measurements and the ALS-based inventory was analysed in four ways: 1) The root mean squared errors (RMSEs) and bias were calculated. 2) Scatter plots with 95% confidence intervals were plotted and the placing of identity lines was checked. 3) Bland-Altman plots were drawn so that the mean difference of attributes between the control method and the ALS method was calculated and plotted against the average value of the attributes. 4) The tolerance limits were defined and combined with Bland-Altman plots. The RMSE values were compared to a reference study from which the accuracy requirements for the service provider had been derived. The accuracy requirements in Kuhmo were achieved; however, comparison of RMSE values proved to be difficult. Field control measurements are costly and time-consuming, but they are considered to be robust. However, control measurements might include errors, which are difficult to take into account. In Bland-Altman plots, neither of the compared methods is considered to be completely exact, so they offer a fair way to interpret the results of the assessment. It was suggested that tolerance limits, to be set when ordering the inventory and combined with Bland-Altman plots, be adopted in practice. In addition, bias should be calculated for the total area. Some other approaches for quality control were briefly examined. 
No method was found to fulfil all the required demands of statistical reliability, cost-efficiency, time efficiency, simplicity and speed of implementation. Some benefits and shortcomings of the studied methods were discussed.
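
The agreement measures listed in this abstract (RMSE, bias and the Bland-Altman quantities) can be computed as in the sketch below. The plot values are invented for illustration, and the 1.96 standard deviation limits of agreement are the conventional Bland-Altman choice, not necessarily the tolerance limits used in the study.

```python
import math
import statistics

def rmse_and_bias(reference, estimate):
    """RMSE and bias of estimates against reference (control) measurements."""
    diffs = [e - r for r, e in zip(reference, estimate)]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return rmse, bias

def bland_altman(reference, estimate):
    """Return plot coordinates, mean difference and limits of agreement."""
    diffs = [e - r for r, e in zip(reference, estimate)]
    means = [(e + r) / 2.0 for r, e in zip(reference, estimate)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    limits = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
    return means, diffs, mean_diff, limits

# Hypothetical stem volumes (m3/ha) for five plots.
field_control = [120.0, 150.0, 200.0, 180.0, 90.0]
als_inventory = [125.0, 145.0, 210.0, 175.0, 95.0]

rmse, bias = rmse_and_bias(field_control, als_inventory)
means, diffs, mean_diff, limits = bland_altman(field_control, als_inventory)
```

Plotting `diffs` against `means` with horizontal lines at `mean_diff` and the two `limits` gives the Bland-Altman plot; externally agreed tolerance limits could be drawn on the same axes.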

Relevance:

80.00%

Publisher:

Abstract:

This study examines the properties of Generalised Regression (GREG) estimators for domain class frequencies and proportions. The family of GREG estimators forms the class of design-based model-assisted estimators. All GREG estimators utilise auxiliary information via modelling. The classic GREG estimator with a linear fixed-effects assisting model (GREG-lin) is one example. However, when estimating class frequencies, the study variable is binary or polytomous, so logistic-type assisting models (e.g. logistic or probit models) should be preferred over the linear one. Nevertheless, GREG estimators other than GREG-lin are rarely used, and knowledge about their properties is limited. This study examines the properties of L-GREG estimators, which are GREG estimators with fixed-effects logistic-type models. Three research questions are addressed. First, I study whether and when L-GREG estimators are more accurate than GREG-lin. Theoretical results and Monte Carlo experiments, which cover both equal and unequal probability sampling designs and a wide variety of model formulations, show that in standard situations the difference between L-GREG and GREG-lin is small. But in the case of a strong assisting model, two interesting situations arise: if the domain sample size is reasonably large, L-GREG is more accurate than GREG-lin, and if the domain sample size is very small, estimation of assisting model parameters may be inaccurate, resulting in bias for L-GREG. Second, I study variance estimation for the L-GREG estimators. The standard variance estimator (S) for all GREG estimators resembles the Sen-Yates-Grundy variance estimator, but it is a double sum of prediction errors, not of the observed values of the study variable. Monte Carlo experiments show that S underestimates the variance of L-GREG, especially if the domain sample size is small or if the assisting model is strong. 
Third, since the standard variance estimator S often fails for the L-GREG estimators, I propose a new augmented variance estimator (A). The difference between S and the new estimator A is that the latter takes into account the difference between the sample fit model and the census fit model. In Monte Carlo experiments, the new estimator A outperformed the standard estimator S in terms of bias, root mean square error and coverage rate. Thus the new estimator provides a good alternative to the standard estimator.
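
For orientation, the usual textbook form of a GREG estimator of a domain total adds the design-weighted sample residuals in the domain to the sum of model predictions over the domain population. The sketch below assumes that form; the predicted probabilities, design weights and domain memberships are invented, and in an L-GREG setting the predictions would come from a fitted logistic-type model, which is not shown here.

```python
def greg_domain_total(pop_predictions, pop_in_domain,
                      sample_y, sample_predictions,
                      sample_weights, sample_in_domain):
    """GREG estimate of a domain total.

    Synthetic part: sum of model predictions over all population units in the domain.
    Correction part: design-weighted prediction errors of sampled units in the domain.
    """
    synthetic = sum(p for p, d in zip(pop_predictions, pop_in_domain) if d)
    correction = sum(
        w * (y - p)
        for y, p, w, d in zip(sample_y, sample_predictions,
                              sample_weights, sample_in_domain)
        if d
    )
    return synthetic + correction

# Hypothetical population of six units; predictions are fitted class probabilities.
pop_predictions = [0.2, 0.8, 0.6, 0.4, 0.9, 0.1]
pop_in_domain = [True, True, True, True, False, False]

# Hypothetical sample: units 0, 2 and 4, each with design weight 2.
sample_y = [0, 1, 1]
sample_predictions = [0.2, 0.6, 0.9]
sample_weights = [2.0, 2.0, 2.0]
sample_in_domain = [True, True, False]

frequency = greg_domain_total(pop_predictions, pop_in_domain,
                              sample_y, sample_predictions,
                              sample_weights, sample_in_domain)
proportion = frequency / sum(pop_in_domain)  # domain size assumed known
```

The same function covers GREG-lin and L-GREG, because the two differ only in the assisting model that produces the predictions.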

Relevance:

80.00%

Publisher:

Abstract:

The factors affecting the strategic decisions of non-industrial private forest landowners (hereafter NIPF landowners) in management planning are studied. A genetic algorithm is used to induce a set of rules predicting the potential cut of the landowners' preferred timber management strategies. The rules are based on variables describing the characteristics of the landowners and their forest holdings. The predictive ability of the genetic algorithm is compared to that of linear regression analysis using identical data sets. The data are cross-validated seven times, applying both genetic algorithm and regression analyses, in order to examine the data-sensitivity and robustness of the generated models. The optimal rule set derived from the genetic algorithm analyses included the following variables: mean initial volume, landowner's positive price expectations for the next eight years, landowner being classified as a farmer, and preference for the recreational use of forest property. When tested with previously unseen test data, the optimal rule set resulted in a relative root mean square error of 0.40. In the regression analyses, the optimal regression equation consisted of the following variables: mean initial volume, proportion of forestry income, intention to cut extensively in future, and positive price expectations for the next two years. The R² of the optimal regression equation was 0.34, and the relative root mean square error obtained from the test data was 0.38. In both models, mean initial volume and positive stumpage price expectations were entered as significant predictors of the potential cut of the preferred timber management strategy. When tested with the complete data set of 201 observations, both the optimal rule set and the optimal regression model achieved the same level of accuracy.
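
The regression side of this comparison can be sketched as a cross-validated relative RMSE. Two assumptions are made here that the abstract does not confirm: "cross-validated seven times" is read as seven-fold cross-validation, and relative RMSE is taken as RMSE divided by the mean of the observed values. The single predictor and the data are invented.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def relative_rmse(observed, predicted):
    """RMSE divided by the mean of the observed values (assumed definition)."""
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse) / (sum(observed) / len(observed))

def k_fold_relative_rmse(x, y, k=7):
    """Fit on k-1 folds, score on the held-out fold, for each of k folds."""
    scores = []
    for fold in range(k):
        test_idx = [i for i in range(len(x)) if i % k == fold]
        train_idx = [i for i in range(len(x)) if i % k != fold]
        intercept, slope = fit_line([x[i] for i in train_idx],
                                    [y[i] for i in train_idx])
        predicted = [intercept + slope * x[i] for i in test_idx]
        observed = [y[i] for i in test_idx]
        scores.append(relative_rmse(observed, predicted))
    return scores

# Invented data: hypothetical mean initial volume vs. potential cut.
volume = [float(v) for v in range(1, 15)]
noise = [0.5 if i % 2 == 0 else -0.5 for i in range(14)]
potential_cut = [2.0 * v + 1.0 + n for v, n in zip(volume, noise)]
scores = k_fold_relative_rmse(volume, potential_cut, k=7)
```

The spread of the seven scores gives a rough view of how sensitive the model is to the particular observations it was fitted on.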

Relevance:

80.00%

Publisher:

Abstract:

Modelling of energy balance is part of the development work related to the KarjaKompassi project. The aim of the thesis was to develop mathematical models that predict the energy balance of a dairy cow in advance and that make use of data obtained during the lactation period. The explanatory variables were diet, feed, milk yield, test-day milking, live weight and body condition score data. The data were collected from 12 feeding experiments conducted in Finland, lasting 8 to 28 lactation weeks, which began immediately after calving. Of the 344 dairy cows included, one quarter were Friesian and the rest Ayrshire. The main data file for older cows contained 2647 observations (experiment * cow * lactation week) and that for primiparous cows 1070. The data were analysed using the MIXED procedure of the SAS software, and outliers were removed using Tukey's method. Correlation analysis was used to examine the relationships between energy balance and the explanatory variables. Energy balance was modelled by regression analysis. The effect of day of lactation on energy balance was described using five different functions. Cow within experiment was included in the model as a random effect. Model fit was assessed using the residual error, the coefficient of determination and the Bayesian information criterion. The best models were tested on an independent data set. The Ali-Schaeffer function described the effect of day of lactation on energy balance well and was used as the base model. In all energy balance models, variation increased from lactation week 12 onwards, as the number of observations decreased and energy balance became positive. Of the variables available before calving, the proportion of concentrate in the diet and the concentrate intake index improved the coefficient of determination and reduced the residual error. The success of feeding can be monitored with models that include milk yield, milk fat content and the fat-to-protein ratio, or energy-corrected milk (ECM). Standardising ECM reduced the residual error of the model. Live weight and body condition score were weak predictors. 
The models can be used in planning and monitoring feeding at herd level, but they are not suitable for predicting the energy balance of an individual cow.
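
For readers unfamiliar with the Ali-Schaeffer function, the sketch below evaluates it in its commonly cited five-parameter form, with a 305-day reference lactation length. The exact parameterisation used in the thesis is not given in the abstract, so both the form and the coefficient values here are assumptions for illustration only.

```python
import math

def ali_schaeffer(day, a, b, c, d, e, length=305.0):
    """Ali-Schaeffer lactation curve in its commonly cited form (assumed here):

    y = a + b*g + c*g**2 + d*w + e*w**2, with g = day/length, w = ln(length/day).
    """
    g = day / length
    w = math.log(length / day)
    return a + b * g + c * g * g + d * w + e * w * w

# Hypothetical coefficients; not estimates from the study.
params = dict(a=10.0, b=5.0, c=-2.0, d=-20.0, e=2.0)
curve = [ali_schaeffer(day, **params) for day in (7, 28, 84, 196, 305)]
```

Because the function is linear in its five coefficients, it can be fitted by ordinary or mixed-model linear regression once the two transformed covariates have been computed for each observation.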