8 results for mean-square error (MSE)
in Helda - Digital Repository of University of Helsinki
Abstract:
This study examines the properties of Generalised Regression (GREG) estimators for domain class frequencies and proportions. The family of GREG estimators forms the class of design-based model-assisted estimators. All GREG estimators utilise auxiliary information via modelling. The classic GREG estimator with a linear fixed effects assisting model (GREG-lin) is one example. But when estimating class frequencies, the study variable is binary or polytomous. Therefore, logistic-type assisting models (e.g. logistic or probit model) should be preferred over the linear one. However, GREG estimators other than GREG-lin are rarely used, and knowledge about their properties is limited. This study examines the properties of L-GREG estimators, which are GREG estimators with fixed-effects logistic-type models. Three research questions are addressed. First, I study whether and when L-GREG estimators are more accurate than GREG-lin. Theoretical results and Monte Carlo experiments, which cover both equal and unequal probability sampling designs and a wide variety of model formulations, show that in standard situations the difference between L-GREG and GREG-lin is small. But in the case of a strong assisting model, two interesting situations arise: if the domain sample size is reasonably large, L-GREG is more accurate than GREG-lin, and if the domain sample size is very small, estimation of assisting model parameters may be inaccurate, resulting in bias for L-GREG. Second, I study variance estimation for the L-GREG estimators. The standard variance estimator (S) for all GREG estimators resembles the Sen-Yates-Grundy variance estimator, but it is a double sum of prediction errors, not of the observed values of the study variable. Monte Carlo experiments show that S underestimates the variance of L-GREG, especially if the domain sample size is small or if the assisting model is strong.
Third, since the standard variance estimator S often fails for the L-GREG estimators, I propose a new augmented variance estimator (A). The difference between S and the new estimator A is that the latter takes into account the difference between the sample fit model and the census fit model. In Monte Carlo experiments, the new estimator A outperformed the standard estimator S in terms of bias, root mean square error and coverage rate. Thus the new estimator provides a good alternative to the standard estimator.
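As an illustration of the three criteria named above (bias, root mean square error and coverage rate), the following is a minimal sketch of how a variance estimator might be summarised over Monte Carlo replicates. It is not the author's code: the function name, the use of the empirical variance of the point estimates as the reference value, and the normal-approximation interval with z = 1.96 are all assumptions made for this example.

```python
import math


def summarise_variance_estimator(point_estimates, variance_estimates,
                                 true_value, z=1.96):
    """Summarise a variance estimator over Monte Carlo replicates.

    Hypothetical helper: each replicate supplies one point estimate and
    one variance estimate; true_value is the known population quantity.
    """
    k = len(point_estimates)
    if k < 2 or k != len(variance_estimates):
        raise ValueError("need at least two replicates of equal length")

    # Empirical (Monte Carlo) variance of the point estimator, used here
    # as the reference value that the variance estimator should hit.
    mean_pt = sum(point_estimates) / k
    mc_var = sum((p - mean_pt) ** 2 for p in point_estimates) / (k - 1)

    # Bias and RMSE of the variance estimates around the reference value.
    bias = sum(variance_estimates) / k - mc_var
    rmse = math.sqrt(sum((v - mc_var) ** 2 for v in variance_estimates) / k)

    # Share of replicates whose interval estimate +/- z * sqrt(variance)
    # contains the true value.
    covered = sum(
        1 for p, v in zip(point_estimates, variance_estimates)
        if abs(p - true_value) <= z * math.sqrt(v)
    )
    return {"mc_variance": mc_var, "bias": bias,
            "rmse": rmse, "coverage": covered / k}


# Toy replicates, invented for illustration only.
summary = summarise_variance_estimator(
    [9.0, 11.0, 10.0, 12.0], [1.0, 0.04, 1.0, 1.0], true_value=10.0
)
```

A negative bias in this summary corresponds to the underestimation described for S; a coverage rate well below the nominal level points the same way.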
Abstract:
The factors affecting the non-industrial, private forest landowners' (hereafter referred to using the acronym NIPF) strategic decisions in management planning are studied. A genetic algorithm is used to induce a set of rules predicting potential cut of the landowners' choices of preferred timber management strategies. The rules are based on variables describing the characteristics of the landowners and their forest holdings. The predictive ability of a genetic algorithm is compared to linear regression analysis using identical data sets. The data are cross-validated seven times applying both genetic algorithm and regression analyses in order to examine the data-sensitivity and robustness of the generated models. The optimal rule set derived from genetic algorithm analyses included the following variables: mean initial volume, landowner's positive price expectations for the next eight years, landowner being classified as farmer, and preference for the recreational use of forest property. When tested with previously unseen test data, the optimal rule set resulted in a relative root mean square error of 0.40. In the regression analyses, the optimal regression equation consisted of the following variables: mean initial volume, proportion of forestry income, intention to cut extensively in future, and positive price expectations for the next two years. The R2 of the optimal regression equation was 0.34 and the relative root mean square error obtained from the test data was 0.38. In both models, mean initial volume and positive stumpage price expectations were entered as significant predictors of potential cut of preferred timber management strategy. When tested with the complete data set of 201 observations, both the optimal rule set and the optimal regression model achieved the same level of accuracy.
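The relative root mean square error reported above can be sketched as follows. The abstract does not define the normalisation, so dividing the RMSE by the mean of the observed values is an assumption here, and the function name and data are hypothetical.

```python
import math


def relative_rmse(observed, predicted):
    """RMSE of the predictions divided by the mean observed value.

    The choice of the observed mean as the denominator is an assumption;
    other normalisations exist.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    return math.sqrt(mse) / (sum(observed) / n)


# Invented values, for illustration only.
example = relative_rmse([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])
```

Under this definition, a value of 0.40 would mean that the typical prediction error is about 40% of the average observed value.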
Abstract:
Modelling of energy balance is part of the development work related to the KarjaKompassi project. The aim of this thesis was to develop mathematical models that predict the energy balance of a dairy cow in advance and that utilise data obtained during lactation. The explanatory variables were diet, feed, milk yield, test-day milking, live weight and body condition score data. The data were collected from 12 feeding experiments conducted in Finland, lasting 8 to 28 lactation weeks and starting immediately after calving. Of the 344 dairy cows included, one quarter were Friesian and the rest Ayrshire. The main data file for older cows contained 2647 observations (experiment * cow * lactation week) and that for primiparous cows 1070. The data were analysed using the Mixed procedure of the SAS software, and outliers were removed using Tukey's method. Correlation analysis was used to examine the relationships between energy balance and the explanatory variables. Energy balance was modelled by regression analysis. The effect of day of lactation on energy balance was described using five different functions. Cow within experiment was included in the model as a random factor. Model fit was assessed using the residual error, the coefficient of determination and the Bayesian information criterion. The best models were tested on an independent data set. The Ali-Schaeffer function described the effect of day of lactation on energy balance well and was used as the base model. In all energy balance models, variation increased from lactation week 12 onwards, as the number of observations decreased and energy balance became positive. Of the variables available before calving, the proportion of concentrate in the diet and the concentrate intake index improved the coefficient of determination and reduced the residual error. The success of feeding can be monitored with models that include milk yield, milk fat content and the fat-to-protein ratio, or energy-corrected milk (ECM). Standardising ECM reduced the residual error of the model. Live weight and body condition score were weak predictors.
The models can be used in planning and monitoring feeding at the herd level, but they are not suitable for predicting the energy balance of an individual cow.
Abstract:
This thesis examines the feasibility of a forest inventory method based on two-phase sampling in estimating forest attributes at the stand or substand levels for forest management purposes. The method is based on multi-source forest inventory combining auxiliary data consisting of remote sensing imagery or other geographic information and field measurements. Auxiliary data are utilized as first-phase data for covering all inventory units. Various methods were examined for improving the accuracy of the forest estimates. Pre-processing of auxiliary data in the form of correcting the spectral properties of aerial imagery was examined (I), as was the selection of aerial image features for estimating forest attributes (II). Various spatial units were compared for extracting image features in a remote sensing aided forest inventory utilizing very high resolution imagery (III). A number of data sources were combined and different weighting procedures were tested in estimating forest attributes (IV, V). Correction of the spectral properties of aerial images proved to be a straightforward and advantageous method for improving the correlation between the image features and the measured forest attributes. Testing different image features that can be extracted from aerial photographs (and other very high resolution images) showed that the images contain a wealth of relevant information that can be extracted only by utilizing the spatial organization of the image pixel values. Furthermore, careful selection of image features for the inventory task generally gives better results than inputting all extractable features to the estimation procedure. When the spatial units for extracting very high resolution image features were examined, an approach based on image segmentation generally showed advantages compared with a traditional sample plot-based approach. Combining several data sources resulted in more accurate estimates than any of the individual data sources alone. 
The best combined estimate can be derived by weighting the estimates produced by the individual data sources by the inverse values of their mean square errors. Despite the fact that the plot-level estimation accuracy in two-phase sampling inventory can be improved in many ways, the accuracy of forest estimates based mainly on single-view satellite and aerial imagery is a relatively poor basis for making stand-level management decisions.
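The inverse-MSE weighting described above can be sketched in a few lines. This is an illustrative example rather than the procedure used in the thesis: the function name and the numbers are hypothetical, and the sketch simply normalises the reciprocals of the mean square errors into weights.

```python
def combine_inverse_mse(estimates, mses):
    """Combine estimates from several data sources.

    Each source's estimate is weighted by the inverse of its mean square
    error, and the weights are normalised to sum to one.
    """
    if not estimates or len(estimates) != len(mses):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(m <= 0 for m in mses):
        raise ValueError("mean square errors must be positive")
    raw = [1.0 / m for m in mses]
    total = sum(raw)
    weights = [w / total for w in raw]
    combined = sum(w * e for w, e in zip(weights, estimates))
    return combined, weights


# Invented example: two sources estimating the same stand attribute,
# the second with four times the mean square error of the first.
combined, weights = combine_inverse_mse([200.0, 260.0], [400.0, 1600.0])
```

In this toy case the more accurate source receives a weight of 0.8 and the less accurate one 0.2, so the combined value lies much closer to the first estimate.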
Abstract:
Data on the influence of unilateral vocal fold paralysis on breathing, especially other than information obtained by spirometry, are relatively scarce. Even less is known about the effect of its treatment by vocal fold medialization. Consequently, there was a need to study the issue by combining multiple instruments capable of assessing airflow dynamics and voice. This need was emphasized by a recently developed medialization technique, autologous fascia injection; its effects on breathing have not previously been investigated. A cohort of ten patients with unilateral vocal fold paralysis was studied before and after autologous fascia injection by using flow-volume spirometry, body plethysmography and acoustic analysis of breathing and voice. Preoperative results were compared with those of ten healthy controls. A second cohort of 11 subjects with unilateral vocal fold paralysis was studied pre- and postoperatively by using flow-volume spirometry, impulse oscillometry, acoustic analysis of voice, voice handicap index and subjective assessment of dyspnoea. Preoperative peak inspiratory flow and specific airway conductance were significantly lower and airway resistance was significantly higher in the patients than in the healthy controls (78% vs. 107%, 73% vs. 116% and 182% vs. 125% of predicted; p = 0.004, p = 0.004 and p = 0.026, respectively). Patients had a higher root mean square of spectral power of tracheal sounds than controls, and three of them had wheezes as opposed to no wheezing in healthy subjects. Autologous fascia injection significantly improved acoustic parameters of the voice in both cohorts and voice handicap index in the latter cohort, indicating that this procedure successfully improved voice in unilateral vocal fold paralysis. 
Peak inspiratory flow decreased significantly as a consequence of this procedure (from 4.54 ± 1.68 l to 4.21 ± 1.26 l, p = 0.03, in pooled data of both cohorts), but no change occurred in the other variables of flow-volume spirometry, body-plethysmography and impulse oscillometry. Eight of the ten patients studied by acoustic analysis of breathing had wheezes after vocal fold medialization compared with only three patients before the procedure, and the numbers of wheezes per recorded inspirium and expirium increased significantly (from 0.02 to 0.42 and from 0.03 to 0.36; p = 0.028 and p = 0.043, respectively). In conclusion, unilateral vocal fold paralysis was observed to disturb forced breathing and also to cause some signs of disturbed tidal breathing. Findings of flow volume spirometry were consistent with variable extra-thoracic obstruction. Vocal fold medialization by autologous fascia injection improved the quality of the voice in patients with unilateral vocal fold paralysis, but also decreased peak inspiratory flow and induced wheezing during tidal breathing. However, these airflow changes did not appear to cause significant symptoms in patients.
Abstract:
Atrial fibrillation (AF) is the most common tachyarrhythmia and is associated with substantial morbidity, increased mortality and cost. The treatment modalities of AF have increased, but results are still far from optimal. More individualized therapy may be beneficial. This calls for improved diagnostics. The aim of this study was to find non-invasive parameters obtained during sinus rhythm reflecting electrophysiological patterns related to propensity to AF, and particularly to AF occurring without any associated heart disease (lone AF). Overall 240 subjects were enrolled, 136 patients with paroxysmal lone AF and 104 controls (mean age 45 years, 75% males). Signal measurements were performed by non-invasive magnetocardiography (MCG) and by invasive electroanatomic mapping (EAM). High-pass filtering techniques and a new method based on a surface gradient technique were adapted to analyze the atrial MCG signal. The EAM was used to elucidate atrial activation in patients and as a reference for MCG. The results showed that MCG mapping is an accurate method to detect atrial electrophysiologic properties. In lone paroxysmal AF, duration of the atrial depolarization complex was marginally prolonged. The difference was more obvious in women and was also related to interatrial conduction patterns. In the focal type of AF (75%), the root mean square (RMS) amplitudes of the atrial signal were normal, but in AF without demonstrable triggers the late atrial RMS amplitudes were reduced. In addition, the atrial characteristics tended to remain similar even when examined several years after the first AF episodes. The intra-atrial recordings confirmed the occurrence of three distinct sites of electrical connection from right to left atrium (LA): the Bachmann bundle (BB), the margin of the fossa ovalis (FO), and the coronary sinus ostial area (CS). The propagation of the atrial signal could also be evaluated non-invasively.
Three MCG atrial wave types were identified, each of which represented a distinct interatrial activation pattern. In conclusion, in paroxysmal lone AF, active focal triggers are common, atrial depolarization is slightly prolonged but of normal amplitude, and the arrhythmia does not necessarily lead to electrical or mechanical dysfunction of the atria. In women the prolongation of atrial depolarization is more obvious. This may be related to gender differences in the presentation of AF. A significant minority of patients with lone AF lack frequent focal triggers, and in them, the late atrial signal amplitude is reduced, possibly signifying a wider degenerative process in the LA. In lone AF, natural impulse propagation to the LA during sinus rhythm goes through one or more of the principal pathways described. The BB is the most common route, but in one-third, the earliest LA activation occurs outside the BB. Susceptibility to paroxysmal lone AF is associated with propagation of the atrial signal via the margin of the FO or via multiple pathways. When conduction occurs via the BB, it is associated with prolonged atrial activation. Thus, altered and alternative conduction pathways may contribute to the pathogenesis of lone AF. There is growing evidence of variability in the genesis of AF also within lone paroxysmal AF. The present study suggests that this variation may be reflected in the cardiac signal pattern. Recognizing the distinct signal profiles may assist in understanding the pathogenesis of AF and identifying subgroups for patient-tailored therapy.
Abstract:
The aim of this study was to evaluate and test methods which could improve local estimates of a general model fitted to a large area. In the first three studies, the intention was to divide the study area into sub-areas that were as homogeneous as possible according to the residuals of the general model, and in the fourth study, the localization was based on the local neighbourhood. According to spatial autocorrelation (SA), points closer together in space are more likely to be similar than those that are farther apart. Local indicators of SA (LISAs) test the similarity of data clusters. A LISA was calculated for every observation in the dataset, and together with the spatial position and residual of the global model, the data were segmented using two different methods: classification and regression trees (CART) and the multiresolution segmentation algorithm (MS) of the eCognition software. The general model was then re-fitted (localized) to the formed sub-areas. In kriging, the SA is modelled with a variogram, and the spatial correlation is a function of the distance (and direction) between the observation and the point of calculation. A general trend is corrected with the residual information of the neighbourhood, whose size is controlled by the number of nearest neighbours. Nearness is measured as Euclidean distance. With all methods, the root mean square errors (RMSEs) were lower, but with the methods that segmented the study area, the individual localized RMSEs varied widely. Therefore, an element capable of controlling the division or localization should be included in the segmentation-localization process. Kriging, on the other hand, provided stable estimates when the number of neighbours was sufficient (over 30), thus offering the best potential for further studies. Even CART could be combined with kriging or non-parametric methods, such as most similar neighbours (MSN).
Abstract:
The relationship between site characteristics and understorey vegetation composition was analysed with quantitative methods, especially from the viewpoint of site quality estimation. Theoretical models were applied to an empirical data set collected from the upland forests of southern Finland comprising 104 sites dominated by Scots pine (Pinus sylvestris L.), and 165 sites dominated by Norway spruce (Picea abies (L.) Karsten). Site index H100 was used as an independent measure of site quality. A new model for the estimation of site quality at sites with a known understorey vegetation composition was introduced. It is based on the application of Bayes' theorem to the density function of site quality within the study area combined with the species-specific presence-absence response curves. The resulting posterior probability density function may be used for calculating an estimate for the site variable. Using this method, a jackknife estimate of site index H100 was calculated separately for pine- and spruce-dominated sites. The results indicated that the cross-validation root mean squared error (RMSEcv) of the estimates improved from 2.98 m down to 2.34 m relative to the "null" model (standard deviation of the sample distribution) in pine-dominated forests. In spruce-dominated forests RMSEcv decreased from 3.94 m down to 3.16 m. In order to assess these results, four other estimation methods based on understorey vegetation composition were applied to the same data set. The results showed that none of the methods was clearly superior to the others. In pine-dominated forests, RMSEcv varied between 2.34 and 2.47 m, and the corresponding range for spruce-dominated forests was from 3.13 to 3.57 m.
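The Bayesian step described above, combining a prior density of site quality with species-specific presence-absence response curves, can be sketched on a discrete grid. Everything concrete here is an assumption made for illustration: the normal-shaped prior, the logistic response curves, the two hypothetical species, the treatment of species as independent given site index, and the use of the posterior mean as the point estimate. The abstract does not specify any of these.

```python
import math


def posterior_site_index(grid, prior_density, response_curves, presence):
    """Grid-based posterior of site index given presence-absence data.

    grid            : candidate site index values (m)
    prior_density   : function h -> prior density of site index
    response_curves : dict species -> function h -> P(species present | h)
    presence        : dict species -> True if observed, False if absent
    Species are treated as independent given h (an assumption).
    """
    unnormalised = []
    for h in grid:
        likelihood = 1.0
        for species, curve in response_curves.items():
            p = curve(h)
            likelihood *= p if presence[species] else (1.0 - p)
        unnormalised.append(prior_density(h) * likelihood)
    total = sum(unnormalised)
    posterior = [u / total for u in unnormalised]
    # Posterior mean as one possible point estimate of site index.
    estimate = sum(h * w for h, w in zip(grid, posterior))
    return posterior, estimate


# Hypothetical inputs: prior centred on 25 m, one species favouring
# fertile sites and one favouring poor sites.
grid = [15.0 + 0.5 * i for i in range(41)]  # 15 m to 35 m
prior = lambda h: math.exp(-0.5 * ((h - 25.0) / 3.0) ** 2)
curves = {
    "fertile_site_species": lambda h: 1.0 / (1.0 + math.exp(-(h - 27.0))),
    "poor_site_species": lambda h: 1.0 / (1.0 + math.exp(h - 23.0)),
}
observed = {"fertile_site_species": True, "poor_site_species": False}
posterior, estimate = posterior_site_index(grid, prior, curves, observed)
```

With the fertile-site species present and the poor-site species absent, the posterior shifts above the prior centre of 25 m. A jackknife evaluation like the one reported would repeat this calculation for each site while leaving that site out of the fitting of the prior and the response curves.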