59 resultados para Data Accuracy
Conditioning of incremental variational data assimilation, with application to the Met Office system
Resumo:
Implementations of incremental variational data assimilation require the iterative minimization of a series of linear least-squares cost functions. The accuracy and speed with which these linear minimization problems can be solved is determined by the condition number of the Hessian of the problem. In this study, we examine how different components of the assimilation system influence this condition number. Theoretical bounds on the condition number for a single parameter system are presented and used to predict how the condition number is affected by the observation distribution and accuracy and by the specified lengthscales in the background error covariance matrix. The theoretical results are verified in the Met Office variational data assimilation system, using both pseudo-observations and real data.
Resumo:
The ASTER Global Digital Elevation Model (GDEM) has made elevation data at 30 m spatial resolution freely available, enabling reinvestigation of morphometric relationships derived from limited field data using much larger sample sizes. These data are used to analyse a range of morphometric relationships derived for dunes (between dune height, spacing, and equivalent sand thickness) in the Namib Sand Sea, which was chosen because there are a number of extant studies that could be used for comparison with the results. The relative accuracy of GDEM for capturing dune height and shape was tested against multiple individual ASTER DEM scenes and against field surveys, highlighting the smoothing of the dune crest and resultant underestimation of dune height, and the omission of the smallest dunes, because of the 30 m sampling of ASTER DEM products. It is demonstrated that morphometric relationships derived from GDEM data are broadly comparable with relationships derived by previous methods, across a range of different dune types. The data confirm patterns of dune height, spacing and equivalent sand thickness mapped previously in the Namib Sand Sea, but add new detail to these patterns.
Resumo:
The relationship between valuations and the subsequent sale price continues to be a matter of both theoretical and practical interest. This paper reports the analysis of over 700 property sales made during the 1974/90 period. Initial results imply an average under-valuation of 7% and a standard error of 18% across the sample. A number of techniques are applied to the data set using other variables such as the region, the type of property and the return from the market to explain the difference between the valuation and the subsequent sale price. The analysis reduces the unexplained error; the bias is fully accounted for and the standard error is reduced to 15.3%. This model finds that about 6% of valuations over-estimated the sale price by more than 20% and about 9% of the valuations under-estimated the sale prices by more than 20%. The results suggest that valuations are marginally more accurate than might be expected, both from consideration of theoretical considerations and from comparison with the equivalent valuation in equity markets.
Resumo:
In numerical weather prediction (NWP) data assimilation (DA) methods are used to combine available observations with numerical model estimates. This is done by minimising measures of error on both observations and model estimates with more weight given to data that can be more trusted. For any DA method an estimate of the initial forecast error covariance matrix is required. For convective scale data assimilation, however, the properties of the error covariances are not well understood. An effective way to investigate covariance properties in the presence of convection is to use an ensemble-based method for which an estimate of the error covariance is readily available at each time step. In this work, we investigate the performance of the ensemble square root filter (EnSRF) in the presence of cloud growth applied to an idealised 1D convective column model of the atmosphere. We show that the EnSRF performs well in capturing cloud growth, but the ensemble does not cope well with discontinuities introduced into the system by parameterised rain. The state estimates lose accuracy, and more importantly the ensemble is unable to capture the spread (variance) of the estimates correctly. We also find, counter-intuitively, that by reducing the spatial frequency of observations and/or the accuracy of the observations, the ensemble is able to capture the states and their variability successfully across all regimes.
Resumo:
This paper describes the implementation of a 3D variational (3D-Var) data assimilation scheme for a morphodynamic model applied to Morecambe Bay, UK. A simple decoupled hydrodynamic and sediment transport model is combined with a data assimilation scheme to investigate the ability of such methods to improve the accuracy of the predicted bathymetry. The inverse forecast error covariance matrix is modelled using a Laplacian approximation which is calibrated for the length scale parameter required. Calibration is also performed for the Soulsby-van Rijn sediment transport equations. The data used for assimilation purposes comprises waterlines derived from SAR imagery covering the entire period of the model run, and swath bathymetry data collected by a ship-borne survey for one date towards the end of the model run. A LiDAR survey of the entire bay carried out in November 2005 is used for validation purposes. The comparison of the predictive ability of the model alone with the model-forecast-assimilation system demonstrates that using data assimilation significantly improves the forecast skill. An investigation of the assimilation of the swath bathymetry as well as the waterlines demonstrates that the overall improvement is initially large, but decreases over time as the bathymetry evolves away from that observed by the survey. The result of combining the calibration runs into a pseudo-ensemble provides a higher skill score than for a single optimized model run. A brief comparison of the Optimal Interpolation assimilation method with the 3D-Var method shows that the two schemes give similar results.
Resumo:
The collection of wind speed time series by means of digital data loggers occurs in many domains, including civil engineering, environmental sciences and wind turbine technology. Since averaging intervals are often significantly larger than typical system time scales, the information lost has to be recovered in order to reconstruct the true dynamics of the system. In the present work we present a simple algorithm capable of generating a real-time wind speed time series from data logger records containing the average, maximum, and minimum values of the wind speed in a fixed interval, as well as the standard deviation. The signal is generated from a generalized random Fourier series. The spectrum can be matched to any desired theoretical or measured frequency distribution. Extreme values are specified through a postprocessing step based on the concept of constrained simulation. Applications of the algorithm to 10-min wind speed records logged at a test site at 60 m height above the ground show that the recorded 10-min values can be reproduced by the simulated time series to a high degree of accuracy.
Resumo:
This dissertation deals with aspects of sequential data assimilation (in particular ensemble Kalman filtering) and numerical weather forecasting. In the first part, the recently formulated Ensemble Kalman-Bucy (EnKBF) filter is revisited. It is shown that the previously used numerical integration scheme fails when the magnitude of the background error covariance grows beyond that of the observational error covariance in the forecast window. Therefore, we present a suitable integration scheme that handles the stiffening of the differential equations involved and doesn’t represent further computational expense. Moreover, a transform-based alternative to the EnKBF is developed: under this scheme, the operations are performed in the ensemble space instead of in the state space. Advantages of this formulation are explained. For the first time, the EnKBF is implemented in an atmospheric model. The second part of this work deals with ensemble clustering, a phenomenon that arises when performing data assimilation using of deterministic ensemble square root filters in highly nonlinear forecast models. Namely, an M-member ensemble detaches into an outlier and a cluster of M-1 members. Previous works may suggest that this issue represents a failure of EnSRFs; this work dispels that notion. It is shown that ensemble clustering can be reverted also due to nonlinear processes, in particular the alternation between nonlinear expansion and compression of the ensemble for different regions of the attractor. Some EnSRFs that use random rotations have been developed to overcome this issue; these formulations are analyzed and their advantages and disadvantages with respect to common EnSRFs are discussed. The third and last part contains the implementation of the Robert-Asselin-Williams (RAW) filter in an atmospheric model. The RAW filter is an improvement to the widely popular Robert-Asselin filter that successfully suppresses spurious computational waves while avoiding any distortion in the mean value of the function. Using statistical significance tests both at the local and field level, it is shown that the climatology of the SPEEDY model is not modified by the changed time stepping scheme; hence, no retuning of the parameterizations is required. It is found the accuracy of the medium-term forecasts is increased by using the RAW filter.
Resumo:
Collaborative mining of distributed data streams in a mobile computing environment is referred to as Pocket Data Mining PDM. Hoeffding trees techniques have been experimentally and analytically validated for data stream classification. In this paper, we have proposed, developed and evaluated the adoption of distributed Hoeffding trees for classifying streaming data in PDM applications. We have identified a realistic scenario in which different users equipped with smart mobile devices run a local Hoeffding tree classifier on a subset of the attributes. Thus, we have investigated the mining of vertically partitioned datasets with possible overlap of attributes, which is the more likely case. Our experimental results have validated the efficiency of our proposed model achieving promising accuracy for real deployment.
Resumo:
Advances in hardware and software in the past decade allow to capture, record and process fast data streams at a large scale. The research area of data stream mining has emerged as a consequence from these advances in order to cope with the real time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data of continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, in some applications it is required to analyse the data in real time as soon as it is being captured. Such cases are for example if the data stream is infinite, fast changing, or simply too large in size to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drifts. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm. An algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule based adaptive classifier for data streams, based on an evolving set of Rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new and removing old rules. It is different from the more popular decision tree based classifiers as it tends to leave data instances rather unclassified than forcing a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting which will also address the problem of changing feature domain values.
Resumo:
Based on surveys undertaken with local authorities and valuers who provide the valuations on which purchase prices for local authority houses under the Right to Buy are based, this paper reports on research which aims to establish the reasons for the differences between the initial valuations provided by the local authority valuers and those provided by the District Valuer on appeal. The paper reports on the reasons why tenants appeal the initial valuation and discusses issues of valuation accuracy, uncertainty and the different and imperfect data available to valuers employed by the organisations involved, as well as the factors within the valuation process, including the absence of any requirement to agree a value, which contribute to the different outcomes.
Resumo:
Airborne lidar provides accurate height information of objects on the earth and has been recognized as a reliable and accurate surveying tool in many applications. In particular, lidar data offer vital and significant features for urban land-cover classification, which is an important task in urban land-use studies. In this article, we present an effective approach in which lidar data fused with its co-registered images (i.e. aerial colour images containing red, green and blue (RGB) bands and near-infrared (NIR) images) and other derived features are used effectively for accurate urban land-cover classification. The proposed approach begins with an initial classification performed by the Dempster–Shafer theory of evidence with a specifically designed basic probability assignment function. It outputs two results, i.e. the initial classification and pseudo-training samples, which are selected automatically according to the combined probability masses. Second, a support vector machine (SVM)-based probability estimator is adopted to compute the class conditional probability (CCP) for each pixel from the pseudo-training samples. Finally, a Markov random field (MRF) model is established to combine spatial contextual information into the classification. In this stage, the initial classification result and the CCP are exploited. An efficient belief propagation (EBP) algorithm is developed to search for the global minimum-energy solution for the maximum a posteriori (MAP)-MRF framework in which three techniques are developed to speed up the standard belief propagation (BP) algorithm. Lidar and its co-registered data acquired by Toposys Falcon II are used in performance tests. The experimental results prove that fusing the height data and optical images is particularly suited for urban land-cover classification. There is no training sample needed in the proposed approach, and the computational cost is relatively low. An average classification accuracy of 93.63% is achieved.
Resumo:
Remote sensing observations often have correlated errors, but the correlations are typically ignored in data assimilation for numerical weather prediction. The assumption of zero correlations is often used with data thinning methods, resulting in a loss of information. As operational centres move towards higher-resolution forecasting, there is a requirement to retain data providing detail on appropriate scales. Thus an alternative approach to dealing with observation error correlations is needed. In this article, we consider several approaches to approximating observation error correlation matrices: diagonal approximations, eigendecomposition approximations and Markov matrices. These approximations are applied in incremental variational assimilation experiments with a 1-D shallow water model using synthetic observations. Our experiments quantify analysis accuracy in comparison with a reference or ‘truth’ trajectory, as well as with analyses using the ‘true’ observation error covariance matrix. We show that it is often better to include an approximate correlation structure in the observation error covariance matrix than to incorrectly assume error independence. Furthermore, by choosing a suitable matrix approximation, it is feasible and computationally cheap to include error correlation structure in a variational data assimilation algorithm.
Resumo:
Full-waveform laser scanning data acquired with a Riegl LMS-Q560 instrument were used to classify an orange orchard into orange trees, grass and ground using waveform parameters alone. Gaussian decomposition was performed on this data capture from the National Airborne Field Experiment in November 2006 using a custom peak-detection procedure and a trust-region-reflective algorithm for fitting Gauss functions. Calibration was carried out using waveforms returned from a road surface, and the backscattering coefficient c was derived for every waveform peak. The processed data were then analysed according to the number of returns detected within each waveform and classified into three classes based on pulse width and c. For single-peak waveforms the scatterplot of c versus pulse width was used to distinguish between ground, grass and orange trees. In the case of multiple returns, the relationship between first (or first plus middle) and last return c values was used to separate ground from other targets. Refinement of this classification, and further sub-classification into grass and orange trees was performed using the c versus pulse width scatterplots of last returns. In all cases the separation was carried out using a decision tree with empirical relationships between the waveform parameters. Ground points were successfully separated from orange tree points. The most difficult class to separate and verify was grass, but those points in general corresponded well with the grass areas identified in the aerial photography. The overall accuracy reached 91%, using photography and relative elevation as ground truth. The overall accuracy for two classes, orange tree and combined class of grass and ground, yielded 95%. Finally, the backscattering coefficient c of single-peak waveforms was also used to derive reflectance values of the three classes. The reflectance of the orange tree class (0.31) and ground class (0.60) are consistent with published values at the wavelength of the Riegl scanner (1550 nm). The grass class reflectance (0.46) falls in between the other two classes as might be expected, as this class has a mixture of the contributions of both vegetation and ground reflectance properties.
First order k-th moment finite element analysis of nonlinear operator equations with stochastic data
Resumo:
We develop and analyze a class of efficient Galerkin approximation methods for uncertainty quantification of nonlinear operator equations. The algorithms are based on sparse Galerkin discretizations of tensorized linearizations at nominal parameters. Specifically, we consider abstract, nonlinear, parametric operator equations J(\alpha ,u)=0 for random input \alpha (\omega ) with almost sure realizations in a neighborhood of a nominal input parameter \alpha _0. Under some structural assumptions on the parameter dependence, we prove existence and uniqueness of a random solution, u(\omega ) = S(\alpha (\omega )). We derive a multilinear, tensorized operator equation for the deterministic computation of k-th order statistical moments of the random solution's fluctuations u(\omega ) - S(\alpha _0). We introduce and analyse sparse tensor Galerkin discretization schemes for the efficient, deterministic computation of the k-th statistical moment equation. We prove a shift theorem for the k-point correlation equation in anisotropic smoothness scales and deduce that sparse tensor Galerkin discretizations of this equation converge in accuracy vs. complexity which equals, up to logarithmic terms, that of the Galerkin discretization of a single instance of the mean field problem. We illustrate the abstract theory for nonstationary diffusion problems in random domains.
Resumo:
We examine how the accuracy of real-time forecasts from models that include autoregressive terms can be improved by estimating the models on ‘lightly revised’ data instead of using data from the latest-available vintage. The benefits of estimating autoregressive models on lightly revised data are related to the nature of the data revision process and the underlying process for the true values. Empirically, we find improvements in root mean square forecasting error of 2–4% when forecasting output growth and inflation with univariate models, and of 8% with multivariate models. We show that multiple-vintage models, which explicitly model data revisions, require large estimation samples to deliver competitive forecasts. Copyright © 2012 John Wiley & Sons, Ltd.