38 results for Asymptotic Mean Squared Errors
in CentAUR: Central Archive University of Reading - UK
Abstract:
The precision farmer wants to manage the variation in soil nutrient status continuously, which requires reliable predictions at places between sampling sites. Ordinary kriging can be used for prediction if the data are spatially dependent and there is a suitable variogram model. However, even if data are spatially correlated, there are often few soil sampling sites in relation to the area to be managed. If intensive ancillary data are available and these are coregionalized with the sparse soil data, they could be used to increase the accuracy of predictions of the soil properties by methods such as cokriging, kriging with external drift and regression kriging. This paper compares the accuracy of predictions of the plant available N properties (mineral N and potentially available N) for two arable fields in Bedfordshire, United Kingdom, from ordinary kriging, cokriging, kriging with external drift and regression kriging. For the last three, intensive elevation data were used with the soil data. The mean squared errors of prediction from these methods of kriging were determined at validation sites where the values were known. Kriging with external drift resulted in the smallest mean squared error for two of the three properties examined, and cokriging for the other. The results suggest that the use of intensive ancillary data can increase the accuracy of predictions of soil properties in arable fields provided that the variables are related spatially. (c) 2005 Elsevier B.V. All rights reserved.
Abstract:
Data such as digitized aerial photographs, electrical conductivity and yield are intensive and relatively inexpensive to obtain compared with collecting soil data by sampling. If such ancillary data are co-regionalized with the soil data they should be suitable for co-kriging. The latter requires that information for both variables is co-located at several locations; this is rarely so for soil and ancillary data. To solve this problem, we have derived values for the ancillary variable at the soil sampling locations by averaging the values within a radius of 15 m, taking the nearest-neighbour value, kriging over 5 m blocks, and punctual kriging. The cross-variograms from these data with clay content and also the pseudo cross-variogram were used to co-krige to validation points and the root mean squared errors (RMSEs) were calculated. In general, the data averaged within 15 m and the punctually kriged values resulted in more accurate predictions.
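Two of the co-location options named above (averaging within a 15 m radius and taking the nearest-neighbour value) and the RMSE used for comparison can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the coordinates and the ancillary values are all hypothetical, and the kriging-based options are not shown.

```python
import math

def within_radius_mean(site, ancillary, radius=15.0):
    """Mean of the ancillary values located within `radius` metres of `site`."""
    sx, sy = site
    vals = [v for (x, y, v) in ancillary if math.hypot(x - sx, y - sy) <= radius]
    return sum(vals) / len(vals) if vals else None

def nearest_neighbour(site, ancillary):
    """Ancillary value at the location closest to `site`."""
    sx, sy = site
    return min(ancillary, key=lambda p: math.hypot(p[0] - sx, p[1] - sy))[2]

def rmse(predicted, observed):
    """Root mean squared error between predictions and known validation values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

# Hypothetical ancillary data as (x, y, value), coordinates in metres.
ancillary = [(0.0, 0.0, 10.0), (10.0, 0.0, 12.0), (0.0, 10.0, 14.0), (30.0, 30.0, 40.0)]
site = (2.0, 1.0)  # hypothetical soil sampling location

radius_value = within_radius_mean(site, ancillary)   # averages the three nearby points
nearest_value = nearest_neighbour(site, ancillary)   # value at (0, 0)
print(radius_value, nearest_value)
```

The two derived values differ even in this toy case, which is why the choice of co-location method can affect the cross-variogram and the resulting predictions.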
Abstract:
This paper investigates whether using natural logarithms (logs) of price indices for forecasting inflation rates is preferable to employing the original series. Univariate forecasts for annual inflation rates for a number of European countries and the USA based on monthly seasonal consumer price indices are considered. Stochastic seasonality and deterministic seasonality models are used. In many cases, the forecasts based on the original variables result in substantially smaller root mean squared errors than models based on logs. In turn, if forecasts based on logs are superior, the gains are typically small. This outcome casts doubt on the common practice in the academic literature of forecasting inflation rates based on differences of logs.
Abstract:
During the development of new therapies, it is not uncommon to test whether a new treatment works better than the existing treatment for all patients who suffer from a condition (full population) or for a subset of the full population (subpopulation). One approach that may be used for this objective is to have two separate trials, where in the first trial, data are collected to determine if the new treatment benefits the full population or the subpopulation. The second trial is a confirmatory trial to test the new treatment in the population selected in the first trial. In this paper, we consider the more efficient two-stage adaptive seamless designs (ASDs), where in stage 1, data are collected to select the population to test in stage 2. In stage 2, additional data are collected to perform confirmatory analysis for the selected population. Unlike the approach that uses two separate trials, for ASDs, stage 1 data are also used in the confirmatory analysis. Although ASDs are efficient, using stage 1 data both for selection and confirmatory analysis introduces selection bias and consequently statistical challenges in making inference. We focus on point estimation for such trials. We describe the extent of bias for estimators that ignore the multiple hypotheses and the selection of the population that is most likely to give positive trial results based on observed stage 1 data. We then derive conditionally unbiased estimators and examine their mean squared errors for different scenarios.
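The selection bias described above can be illustrated with a small Monte Carlo sketch. It rests on strong simplifying assumptions that are not taken from the paper: the two candidate populations are treated as independent, both have the same true effect, and the stage means are drawn directly from normal distributions. The "stage 2 only" estimator is just one simple conditionally unbiased choice, not the estimator derived by the authors, and all names and parameter values are hypothetical.

```python
import random

def simulate_selection_bias(n1=50, n2=50, sigma=1.0, true_effect=0.0,
                            runs=20000, seed=1):
    """Return (bias of naive pooled estimator, bias of stage-2-only estimator)."""
    rng = random.Random(seed)
    sd1 = sigma / n1 ** 0.5   # standard error of a stage 1 mean
    sd2 = sigma / n2 ** 0.5   # standard error of the stage 2 mean
    naive_sum = 0.0
    stage2_sum = 0.0
    for _ in range(runs):
        # Stage 1: estimated effect in each candidate population.
        full = rng.gauss(true_effect, sd1)
        sub = rng.gauss(true_effect, sd1)
        selected_stage1 = max(full, sub)   # pick the more promising population
        # Stage 2: new data for the selected population only.
        stage2 = rng.gauss(true_effect, sd2)
        # Naive estimator pools both stages and ignores the selection step.
        naive_sum += (n1 * selected_stage1 + n2 * stage2) / (n1 + n2)
        stage2_sum += stage2
    return naive_sum / runs - true_effect, stage2_sum / runs - true_effect

naive_bias, stage2_bias = simulate_selection_bias()
print(f"naive bias: {naive_bias:.4f}, stage-2-only bias: {stage2_bias:.4f}")
```

Under these assumptions the naive estimator is biased upwards because the stage 1 mean it reuses was chosen for being the larger of two, while the stage-2-only estimator avoids the bias at the cost of discarding stage 1 data and hence a larger variance.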
Abstract:
Recent observations from the Argo dataset of temperature and salinity profiles are used to evaluate a series of 3-year data assimilation experiments in a global ice–ocean general circulation model. The experiments are designed to evaluate a new data assimilation system whereby salinity is assimilated along isotherms, S(T). In addition, the role of a balancing salinity increment to maintain water mass properties is investigated. This balancing increment is found to effectively prevent spurious mixing in tropical regions induced by univariate temperature assimilation, allowing the correction of isotherm geometries without adversely influencing temperature–salinity relationships. In addition, the balancing increment is able to correct a fresh bias associated with a weak subtropical gyre in the North Atlantic using only temperature observations. The S(T) assimilation method is found to provide an important improvement over conventional depth level assimilation, with lower root-mean-squared forecast errors over the upper 500 m in the tropical Atlantic and Pacific Oceans. An additional set of experiments is performed whereby Argo data are withheld and used for independent evaluation. The most significant improvements from Argo assimilation are found in less well-observed regions (Indian, South Atlantic and South Pacific Oceans). When Argo salinity data are assimilated in addition to temperature, improvements to modelled temperature fields are obtained due to corrections to model density gradients and the resulting circulation. It is found that observations from the Argo array provide an invaluable tool for both correcting modelled water mass properties through data assimilation and for evaluating the assimilation methods themselves.
Abstract:
The correlated k-distribution (CKD) method is widely used in the radiative transfer schemes of atmospheric models and involves dividing the spectrum into a number of bands and then reordering the gaseous absorption coefficients within each one. The fluxes and heating rates for each band may then be computed by discretizing the reordered spectrum into of order 10 quadrature points per major gas and performing a monochromatic radiation calculation for each point. In this presentation it is shown that for clear-sky longwave calculations, sufficient accuracy for most applications can be achieved without the need for bands: reordering may be performed on the entire longwave spectrum. The resulting full-spectrum correlated k (FSCK) method requires significantly fewer monochromatic calculations than standard CKD to achieve a given accuracy. The concept is first demonstrated by comparing with line-by-line calculations for an atmosphere containing only water vapor, in which it is shown that the accuracy of heating-rate calculations improves approximately in proportion to the square of the number of quadrature points. For more than around 20 points, the root-mean-squared error flattens out at around 0.015 K/day due to the imperfect rank correlation of absorption spectra at different pressures in the profile. The spectral overlap of m different gases is treated by considering an m-dimensional hypercube where each axis corresponds to the reordered spectrum of one of the gases. This hypercube is then divided up into a number of volumes, each approximated by a single quadrature point, such that the total number of quadrature points is slightly fewer than the sum of the number that would be required to treat each of the gases separately. 
The gaseous absorptions for each quadrature point are optimized such that they minimize a cost function expressing the deviation of the heating rates and fluxes calculated by the FSCK method from line-by-line calculations for a number of training profiles. This approach is validated for atmospheres containing water vapor, carbon dioxide, and ozone, in which it is found that in the troposphere and most of the stratosphere, heating-rate errors of less than 0.2 K/day can be achieved using a total of 23 quadrature points, decreasing to less than 0.1 K/day for 32 quadrature points. It would be relatively straightforward to extend the method to include other gases.
Abstract:
The correlated k-distribution (CKD) method is widely used in the radiative transfer schemes of atmospheric models, and involves dividing the spectrum into a number of bands and then reordering the gaseous absorption coefficients within each one. The fluxes and heating rates for each band may then be computed by discretizing the reordered spectrum into of order 10 quadrature points per major gas, and performing a pseudo-monochromatic radiation calculation for each point. In this paper it is first argued that for clear-sky longwave calculations, sufficient accuracy for most applications can be achieved without the need for bands: reordering may be performed on the entire longwave spectrum. The resulting full-spectrum correlated k (FSCK) method requires significantly fewer pseudo-monochromatic calculations than standard CKD to achieve a given accuracy. The concept is first demonstrated by comparing with line-by-line calculations for an atmosphere containing only water vapor, in which it is shown that the accuracy of heating-rate calculations improves approximately in proportion to the square of the number of quadrature points. For more than around 20 points, the root-mean-squared error flattens out at around 0.015 K d−1 due to the imperfect rank correlation of absorption spectra at different pressures in the profile. The spectral overlap of m different gases is treated by considering an m-dimensional hypercube where each axis corresponds to the reordered spectrum of one of the gases. This hypercube is then divided up into a number of volumes, each approximated by a single quadrature point, such that the total number of quadrature points is slightly fewer than the sum of the number that would be required to treat each of the gases separately. 
The gaseous absorptions for each quadrature point are optimized such that they minimize a cost function expressing the deviation of the heating rates and fluxes calculated by the FSCK method from line-by-line calculations for a number of training profiles. This approach is validated for atmospheres containing water vapor, carbon dioxide and ozone, in which it is found that in the troposphere and most of the stratosphere, heating-rate errors of less than 0.2 K d−1 can be achieved using a total of 23 quadrature points, decreasing to less than 0.1 K d−1 for 32 quadrature points. It would be relatively straightforward to extend the method to include other gases.
Abstract:
The Gram-Schmidt (GS) orthogonalisation procedure has been used to improve the convergence speed of least mean square (LMS) adaptive code-division multiple-access (CDMA) detectors. However, this algorithm updates two sets of parameters, namely the GS transform coefficients and the tap weights, simultaneously. Because of the additional adaptation noise introduced by the former, it is impossible to achieve the same performance as the ideal orthogonalised LMS filter, unlike the result implied in an earlier paper. The authors provide a lower bound on the minimum achievable mean squared error (MSE) as a function of the forgetting factor λ used in finding the GS transform coefficients, and propose a variable-λ algorithm to balance the conflicting requirements of good tracking and low misadjustment.
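For readers unfamiliar with the baseline being discussed, the following is a minimal sketch of a plain LMS adaptive filter identifying an unknown FIR system. It deliberately omits the Gram-Schmidt transform and the forgetting factor λ analysed in the paper; the step size `mu`, the tap values and the signal length are hypothetical choices for illustration.

```python
import random

def lms(inputs, desired, num_taps, mu):
    """Plain LMS: adapt tap weights to minimise the squared error d - w.x."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps          # most recent input samples, newest first
    errors = []
    for x, d in zip(inputs, desired):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))        # filter output
        e = d - y                                         # instantaneous error
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]  # stochastic gradient step
        errors.append(e)
    return w, errors

rng = random.Random(0)
true_taps = [0.5, -0.3, 0.1]        # hypothetical system to identify
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
d = [sum(t * (x[n - k] if n - k >= 0 else 0.0) for k, t in enumerate(true_taps))
     for n in range(len(x))]

weights, errors = lms(x, d, num_taps=3, mu=0.01)
mse_tail = sum(e * e for e in errors[-500:]) / 500   # MSE over the last 500 samples
print(weights, mse_tail)
```

In this noise-free setting the weights converge to the true taps. With observation noise, a larger step size speeds up tracking but raises the steady-state excess MSE (misadjustment), which is the same trade-off the variable-λ algorithm addresses for the GS coefficients.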
Abstract:
We consider the forecasting performance of two SETAR exchange rate models proposed by Kräger and Kugler [J. Int. Money Fin. 12 (1993) 195]. Assuming that the models are good approximations to the data generating process, we show that whether the non-linearities inherent in the data can be exploited to forecast better than a random walk depends on both how forecast accuracy is assessed and on the ‘state of nature’. Evaluation based on traditional measures, such as (root) mean squared forecast errors, may mask the superiority of the non-linear models. Generalized impulse response functions are also calculated as a means of portraying the asymmetric response to shocks implied by such models.
Abstract:
Models of the dynamics of nitrogen in soil (soil-N) can be used to aid the fertilizer management of a crop. The predictions of soil-N models can be validated by comparison with observed data. Validation generally involves calculating non-spatial statistics of the observations and predictions, such as their means, their mean squared-difference, and their correlation. However, when the model predictions are spatially distributed across a landscape the model requires validation with spatial statistics. There are three reasons for this: (i) the model may be more or less successful at reproducing the variance of the observations at different spatial scales; (ii) the correlation of the predictions with the observations may be different at different spatial scales; (iii) the spatial pattern of model error may be informative. In this study we used a model, parameterized with spatially variable input information about the soil, to predict the mineral-N content of soil in an arable field, and compared the results with observed data. We validated the performance of the N model spatially with a linear mixed model of the observations and model predictions, estimated by residual maximum likelihood. This novel approach allowed us to describe the joint variation of the observations and predictions as: (i) independent random variation that occurred at a fine spatial scale; (ii) correlated random variation that occurred at a coarse spatial scale; (iii) systematic variation associated with a spatial trend. The linear mixed model revealed that, in general, the performance of the N model changed depending on the spatial scale of interest. At the scales associated with random variation, the N model underestimated the variance of the observations, and the predictions were correlated poorly with the observations. At the scale of the trend, the predictions and observations shared a common surface. 
The spatial pattern of the error of the N model suggested that the observations were affected by the local soil condition, but this was not accounted for by the N model. In summary, the N model would be well-suited to field-scale management of soil nitrogen, but suited poorly to management at finer spatial scales. This information was not apparent with a non-spatial validation. (c) 2007 Elsevier B.V. All rights reserved.
Abstract:
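The non-spatial validation statistics mentioned above (means, mean squared difference and correlation of observations and predictions) can be computed as in the sketch below. The data values are hypothetical, and the spatial linear mixed model fitted by residual maximum likelihood is not reproduced here.

```python
import math

def validation_stats(observed, predicted):
    """Non-spatial summary statistics comparing predictions with observations."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Mean squared difference between predictions and observations.
    msd = sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n
    # Pearson correlation from centred sums of squares and cross-products.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "mean_observed": mean_o,
        "mean_predicted": mean_p,
        "mean_squared_difference": msd,
        "correlation": cov / math.sqrt(ss_o * ss_p),
        "variance_ratio": ss_p / ss_o,   # below 1 means predictions are too smooth
    }

# Hypothetical mineral-N observations and model predictions.
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 14.0]
stats = validation_stats(observed, predicted)
print(stats)
```

In this toy case the predictions are perfectly correlated with the observations yet have only a quarter of their variance, a pattern that a single global statistic can hide and that the scale-by-scale analysis in the abstract is designed to expose.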
High resolution vibration-rotation spectra of ¹³C₂H₂ were recorded in a number of regions from 2000 to 5200 cm−1 at Doppler or pressure limited resolution. In these spectral ranges cold and hot bands involving the bending-stretching combination levels have been analyzed up to high J values. Anharmonic quartic resonances for the combination levels ν₁ + mν₄ + nν₅, ν₂ + mν₄ + (n + 2)ν₅ and ν₃ + (m − 1)ν₄ + (n + 1)ν₅ have been studied, and the l-type resonances within each polyad have been explicitly taken into account in the analysis of the data. The least-squares refinement provides deperturbed values for band origins and rotational constants, obtained by fitting rotation lines only up to J ≈ 20 with root mean square errors of ≈ 0.0003 cm−1. The band origins allowed us to determine a number of the anharmonicity constants x⁰ᵢⱼ.
Abstract:
This paper investigates the applications of capture–recapture methods to human populations. Capture–recapture methods are commonly used in estimating the size of wildlife populations but can also be used in epidemiology and social sciences, for estimating prevalence of a particular disease or the size of the homeless population in a certain area. Here we focus on estimating the prevalence of infectious diseases. Several estimators of population size are considered: the Lincoln–Petersen estimator and its modified version, the Chapman estimator, Chao’s lower bound estimator, Zelterman’s estimator, McKendrick’s moment estimator and the maximum likelihood estimator. In order to evaluate these estimators, they are applied to real, three-source, capture–recapture data. By conditioning on each of the sources of the three-source data, we have been able to compare the estimators with the true value that they are estimating. The Chapman and Chao estimators were compared in terms of their relative bias. A variance formula derived through conditioning is suggested for Chao’s estimator, and normal 95% confidence intervals are calculated for this and the Chapman estimator. We then compare the coverage of the respective confidence intervals. Furthermore, a simulation study is included to compare Chao’s and Chapman’s estimators. Results indicate that Chao’s estimator is less biased than Chapman’s estimator unless both sources are independent. Chao’s estimator also has the smaller mean squared error. Finally, the implications and limitations of the above methods are discussed, with suggestions for further development.
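Three of the estimators named above can be written down for two-source data as in the following sketch. The Lincoln–Petersen and Chapman formulas are the standard ones. For Chao's lower bound, the version with the (T − 1)/T factor for T sources is assumed here, which may differ from the exact form used in the paper. The counts are hypothetical.

```python
def lincoln_petersen(n1, n2, m):
    """n1, n2: cases found by each source; m: cases found by both."""
    return n1 * n2 / m

def chapman(n1, n2, m):
    """Chapman's modification, defined even when m = 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

def chao_lower_bound(n1, n2, m, sources=2):
    """Chao-type lower bound with the (T - 1) / T factor (assumed form)."""
    f1 = n1 + n2 - 2 * m     # cases seen by exactly one source
    f2 = m                   # cases seen by both sources
    n = n1 + n2 - m          # distinct cases observed
    return n + (sources - 1) / sources * f1 ** 2 / (2 * f2)

# Hypothetical counts: 50 and 40 cases in the two lists, 10 in both.
n1, n2, m = 50, 40, 10
estimates = {
    "lincoln_petersen": lincoln_petersen(n1, n2, m),
    "chapman": chapman(n1, n2, m),
    "chao": chao_lower_bound(n1, n2, m),
}
print(estimates)
```

With two sources the assumed Chao form adds f1²/(4·f2) unseen cases, which is never smaller than the (n1 − m)(n2 − m)/m implied by Lincoln–Petersen, so the two agree only when the single-source counts are equal.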
Abstract:
Proportion estimators are quite frequently used in many application areas. The conventional proportion estimator (number of events divided by sample size) encounters a number of problems when the data are sparse as will be demonstrated in various settings. The problem of estimating its variance when sample sizes become small is rarely addressed in a satisfying framework. Specifically, we have in mind applications like the weighted risk difference in multicenter trials or stratifying risk ratio estimators (to adjust for potential confounders) in epidemiological studies. It is suggested to estimate p using the parametric family (see PDF for character) and p(1 - p) using (see PDF for character), where (see PDF for character). We investigate the estimation problem of choosing c 0 from various perspectives including minimizing the average mean squared error of (see PDF for character), average bias and average mean squared error of (see PDF for character). The optimal value of c for minimizing the average mean squared error of (see PDF for character) is found to be independent of n and equals c = 1. The optimal value of c for minimizing the average mean squared error of (see PDF for character) is found to be dependent on n with limiting value c = 0.833. This might justify using a near-optimal value of c = 1 in practice, which also turns out to be beneficial when constructing confidence intervals of the form (see PDF for character).
Abstract:
Finding an estimate of the channel impulse response (CIR) by correlating a received known (training) sequence with the sent training sequence is commonplace. Where required, it is also common to truncate the longer correlation to a sub-set of correlation coefficients by finding the set of N sequential correlation coefficients with the maximum power. This paper presents a new approach to selecting the optimal set of N CIR coefficients from the correlation rather than relying on power. The algorithm reconstructs a set of predicted symbols using the training sequence and various sub-sets of the correlation to find the sub-set that results in the minimum mean squared error between the actual received symbols and the reconstructed symbols. The application of the algorithm is presented in the context of the TDMA-based GSM/GPRS system to demonstrate an improvement in the system performance with the new algorithm, and the results are presented in the paper. However, the application lends itself to any training-sequence-based communication system of the kind often found within wireless consumer electronic devices.
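The two selection rules contrasted above can be sketched as follows: pick the window of N sequential correlation coefficients with maximum power, or pick the window whose reconstructed symbols give the minimum mean squared error against the received symbols. This is an illustrative reading of the abstract, not the authors' implementation. The random ±1 training sequence, the channel taps and all function names are hypothetical, and no GSM-specific details are modelled.

```python
import random

def convolve(x, h):
    """Full linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def correlate(received, training, num_lags):
    """Correlation of the received signal with the training sequence per lag."""
    length = len(training)
    return [sum(received[n + k] * training[n]
                for n in range(length) if n + k < len(received)) / length
            for k in range(num_lags)]

def reconstruction_mse(received, training, taps, delay):
    """MSE between received symbols and symbols rebuilt from candidate taps."""
    predicted = convolve(training, taps)
    pairs = [(received[i + delay], p) for i, p in enumerate(predicted)
             if i + delay < len(received)]
    return sum((r - p) ** 2 for r, p in pairs) / len(pairs)

def select_by_power(corr, n):
    """Start index of the n sequential coefficients with maximum power."""
    return max(range(len(corr) - n + 1),
               key=lambda s: sum(c * c for c in corr[s:s + n]))

def select_by_mse(corr, n, received, training):
    """Start index of the n sequential coefficients with minimum reconstruction MSE."""
    return min(range(len(corr) - n + 1),
               key=lambda s: reconstruction_mse(received, training, corr[s:s + n], s))

rng = random.Random(3)
training = [rng.choice((-1.0, 1.0)) for _ in range(26)]   # hypothetical sequence
channel = [0.0, 0.0, 1.0, 0.6, -0.3]                      # hypothetical CIR with delay
received = convolve(training, channel)
corr = correlate(received, training, num_lags=8)

start_power = select_by_power(corr, 3)
start_mse = select_by_mse(corr, 3, received, training)
mse_power = reconstruction_mse(received, training,
                               corr[start_power:start_power + 3], start_power)
mse_best = reconstruction_mse(received, training,
                              corr[start_mse:start_mse + 3], start_mse)
print(start_power, mse_power, start_mse, mse_best)
```

Because the MSE rule searches over every window, including the one chosen by power, its reconstruction error can never be worse. Whether the two rules actually pick different windows depends on the autocorrelation properties of the training sequence and on the noise.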