916 results for Sampling (Statistics)
Conditioning model output statistics of regional climate model precipitation on circulation patterns
Abstract:
Dynamical downscaling of Global Climate Models (GCMs) through regional climate models (RCMs) potentially improves the usability of the output for hydrological impact studies. However, further downscaling or interpolation of RCM precipitation is often needed to match the precipitation characteristics at the local scale. This study analysed three Model Output Statistics (MOS) techniques to adjust RCM precipitation: (1) a simple direct method (DM), (2) quantile-quantile mapping (QM) and (3) a distribution-based scaling (DBS) approach. The modelled precipitation consisted of daily means from 16 RCMs driven by ERA40 reanalysis data over the period 1961–2000, provided by the ENSEMBLES (ENSEMBLE-based Predictions of Climate Changes and their Impacts) project, for a small catchment located in the Midlands, UK. All methods were conditioned on the entire time series, on separate months, and on an objective classification of Lamb's weather types. The performance of the MOS techniques was assessed with respect to the temporal and spatial characteristics of the precipitation fields, as well as runoff modelled with the HBV rainfall-runoff model. The results indicate that DBS conditioned on classification patterns performed better than the other methods; however, an ensemble approach in terms of both climate models and downscaling methods is recommended to account for uncertainties in the MOS methods.
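Of the three MOS techniques, quantile mapping is the easiest to show compactly. The sketch below is a generic empirical QM and not the study's implementation. It rests on these assumptions:

- Paired calibration series of daily model and observed precipitation are available.
- Values are interpolated linearly between order statistics.
- Values outside the calibration range are clamped.
- No special wet-day treatment is applied.

Conditioning, as in the study, then amounts to fitting one mapping per class (for example a month or a Lamb weather type). All function names, including `conditioned_quantile_map`, are illustrative.

```python
import bisect


def empirical_cdf(sorted_vals, x):
    """Non-exceedance probability of x, interpolated linearly between order statistics.

    Needs at least two calibration values; x outside their range is clamped to 0 or 1.
    """
    n = len(sorted_vals)
    if x <= sorted_vals[0]:
        return 0.0
    if x >= sorted_vals[-1]:
        return 1.0
    i = bisect.bisect_right(sorted_vals, x) - 1  # sorted_vals[i] <= x < sorted_vals[i + 1]
    lo, hi = sorted_vals[i], sorted_vals[i + 1]
    return (i + (x - lo) / (hi - lo)) / (n - 1)


def empirical_quantile(sorted_vals, p):
    """Inverse of empirical_cdf: the interpolated value at probability p in [0, 1]."""
    pos = p * (len(sorted_vals) - 1)
    i = min(int(pos), len(sorted_vals) - 2)
    return sorted_vals[i] + (pos - i) * (sorted_vals[i + 1] - sorted_vals[i])


def quantile_map(model_cal, obs_cal, model_new):
    """Replace each new model value by the observed value at the same empirical quantile."""
    m, o = sorted(model_cal), sorted(obs_cal)
    return [empirical_quantile(o, empirical_cdf(m, x)) for x in model_new]


def conditioned_quantile_map(model_cal, obs_cal, cal_classes, model_new, new_classes):
    """Fit and apply a separate mapping per class label (e.g. month or weather type)."""
    out = []
    for x, c in zip(model_new, new_classes):
        m = [v for v, k in zip(model_cal, cal_classes) if k == c]
        o = [v for v, k in zip(obs_cal, cal_classes) if k == c]
        out.append(quantile_map(m, o, [x])[0])
    return out
```

Here is an example. If the model calibration sample is [1, 2, 3, 4] and the observations are [2, 4, 6, 8], a new model value of 2.5 sits at the median and maps to 5.0. Values at or beyond the model maximum map to the observed maximum, 8.0.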
Abstract:
Many modern statistical applications involve inference for complex stochastic models, where it is easy to simulate from the models, but impossible to calculate likelihoods. Approximate Bayesian computation (ABC) is a method of inference for such models. It replaces calculation of the likelihood by a step which involves simulating artificial data for different parameter values, and comparing summary statistics of the simulated data with summary statistics of the observed data. Here we show how to construct appropriate summary statistics for ABC in a semi-automatic manner. We aim for summary statistics which will enable inference about certain parameters of interest to be as accurate as possible. Theoretical results show that optimal summary statistics are the posterior means of the parameters. Although these cannot be calculated analytically, we use an extra stage of simulation to estimate how the posterior means vary as a function of the data; and we then use these estimates of our summary statistics within ABC. Empirical results show that our approach is a robust method for choosing summary statistics that can result in substantially more accurate ABC analyses than the ad hoc choices of summary statistics that have been proposed in the literature. We also demonstrate advantages over two alternative methods of simulation-based inference.
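The semi-automatic idea can be sketched with a toy model. Regress the parameter on candidate features of simulated data, then use the fitted value, an estimate of the posterior mean, as the summary statistic. Everything specific below is an assumption for illustration, not the authors' code:

- a Normal(theta, 1) simulator with 20 observations;
- a Uniform(-5, 5) prior;
- the sample mean and variance as candidate features;
- ordinary least squares for the pilot regression;
- rejection ABC that keeps the closest 100 of 5000 draws.

```python
import random


def simulate(theta, n, rng):
    """Toy simulator (assumption): n i.i.d. draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def features(y):
    """Candidate features (assumption): intercept, sample mean, sample variance."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    return [1.0, mean, var]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_summary(n_pilot, n, rng, prior=(-5.0, 5.0)):
    """Pilot stage: least-squares regression of theta on features of simulated data.

    The fitted value approximates E[theta | y] and is returned as the summary statistic.
    """
    rows, thetas = [], []
    for _ in range(n_pilot):
        theta = rng.uniform(*prior)
        rows.append(features(simulate(theta, n, rng)))
        thetas.append(theta)
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, thetas)) for i in range(p)]
    beta = solve(xtx, xty)
    return lambda y: sum(b * f for b, f in zip(beta, features(y)))


def abc_rejection(y_obs, summary, n_sims, n_keep, n, rng, prior=(-5.0, 5.0)):
    """Rejection ABC: keep the n_keep prior draws whose summaries are closest to the data's."""
    s_obs = summary(y_obs)
    scored = []
    for _ in range(n_sims):
        theta = rng.uniform(*prior)
        scored.append((abs(summary(simulate(theta, n, rng)) - s_obs), theta))
    scored.sort()
    return [theta for _, theta in scored[:n_keep]]


if __name__ == "__main__":
    rng = random.Random(0)
    y_obs = simulate(1.5, 20, rng)        # "observed" data with true theta = 1.5
    summary = fit_summary(2000, 20, rng)  # pilot simulations
    accepted = abc_rejection(y_obs, summary, 5000, 100, 20, rng)
    print("ABC posterior mean:", sum(accepted) / len(accepted))
```

In this toy setting the sample mean is sufficient for theta. The fitted coefficient on the sample variance would therefore be expected to be near zero, which is how the regression stage down-weights uninformative features.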
Abstract:
Serial sampling and stable isotope analysis along the growth axis of vertebrate tooth enamel record differences attributed to seasonal variation in diet, climate or animal movement. Because several months are required for enamel to mature in large mammals, changes in the isotopic composition of environmental parameters are not recorded instantaneously, and stable isotope analysis of tooth enamel returns a time-averaged signal whose amplitude is attenuated relative to the input signal. For convenience, stable isotope profiles are usually determined on the side of the tooth where enamel is thickest. Here we investigate whether time resolution can be improved by targeting the side of the tooth where enamel is thinnest. Observation of developing third molars (M3) in sheep shows that the tooth growth rate is not constant but decreases exponentially, while the angle between the first layer of enamel deposited and the enamel–dentine junction increases as the tooth approaches its maximal length. We also noted differences in the thickness and geometry of enamel growth between the mesial side (i.e., the side facing the M2) and the buccal side (i.e., the side facing the cheek) of the M3. Carbon and oxygen isotope variations were measured along the M3 teeth of eight sheep raised under controlled conditions. Intra-tooth variability was systematically larger along the mesial side, and the difference in amplitude between the two sides was proportional to the time of exposure to the input signal. Although attenuated, the mesial side records variations in the environmental signal more faithfully than the buccal side. This approach can be adapted to other mammals whose teeth show lateral variation in enamel thickness and could potentially be used as an internal check for diagenesis.
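The attenuation described above can be illustrated numerically. This is a hypothetical sketch, not the study's model. It assumes a sinusoidal seasonal input with a 12-month period and treats each enamel increment as a simple boxcar average of the input over a fixed mineralization window. It ignores the exponentially decreasing growth rate and the enamel geometry reported for sheep M3.

Under those assumptions the recorded amplitude shrinks by the factor sin(πT/P)/(πT/P), for a window T and period P. Longer exposure to the input signal therefore gives a smaller recorded amplitude.

```python
import math


def seasonal_input(t, amplitude=3.0, period=12.0):
    """Hypothetical seasonal isotope input (per mil anomaly) at time t in months."""
    return amplitude * math.sin(2.0 * math.pi * t / period)


def enamel_signal(t, window, steps=200):
    """Enamel formed at time t averages the input over the preceding `window` months.

    Boxcar average evaluated with the midpoint rule (a simplifying assumption).
    """
    return sum(seasonal_input(t - window * (k + 0.5) / steps) for k in range(steps)) / steps


def recorded_amplitude(window, n_points=240, dt=0.1):
    """Half the peak-to-trough range of the recorded profile over n_points * dt months."""
    values = [enamel_signal(i * dt, window) for i in range(n_points)]
    return (max(values) - min(values)) / 2.0


if __name__ == "__main__":
    for window in (1.0, 3.0, 6.0, 9.0):
        print(window, round(recorded_amplitude(window), 3))
```

For a 6-month window and a 12-month cycle, the sketch should give roughly 3 × 2/π ≈ 1.91 for an input amplitude of 3.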
Abstract:
Many macroeconomic series, such as U.S. real output growth, are sampled quarterly, although potentially useful predictors are often observed at a higher frequency. We examine whether a mixed data-frequency sampling (MIDAS) approach can improve forecasts of output growth. The MIDAS specification used in the comparison includes an autoregressive term in a novel way. We find that using monthly data on the current quarter leads to a significant improvement in forecasting current- and next-quarter output growth, and that MIDAS is an effective way to exploit monthly data compared with alternative methods.
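A common way to let monthly observations enter a quarterly regression in MIDAS is through a parsimonious lag polynomial such as the normalized exponential Almon weights. The sketch below assumes that weighting scheme. It uses fixed illustrative parameter values rather than estimated ones. It also omits the autoregressive term, whose novel treatment in the paper is not reproduced here. All function names are hypothetical.

```python
import math


def exp_almon_weights(theta1, theta2, n_lags):
    """Normalized exponential Almon weights w_k, k = 1..n_lags; they sum to one."""
    raw = [math.exp(theta1 * k + theta2 * k * k) for k in range(1, n_lags + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def midas_regressor(monthly, end, theta1, theta2, n_lags):
    """Weighted sum of the n_lags monthly values ending at index `end` (most recent gets w_1)."""
    if end + 1 < n_lags:
        raise ValueError("not enough monthly history for the requested lags")
    w = exp_almon_weights(theta1, theta2, n_lags)
    return sum(w[k] * monthly[end - k] for k in range(n_lags))


def midas_forecast(beta0, beta1, monthly, end, theta1, theta2, n_lags):
    """Quarterly prediction y_hat = beta0 + beta1 * (weighted monthly regressor); no AR term."""
    return beta0 + beta1 * midas_regressor(monthly, end, theta1, theta2, n_lags)
```

With theta1 = theta2 = 0 the weights are equal, so the regressor reduces to a flat average. A negative theta2 makes the weights decay with lag. In practice the thetas and betas would be estimated jointly, typically by nonlinear least squares.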
Abstract:
The objective of this paper is to apply the mis-specification (M-S) encompassing perspective to the problem of choosing between linear and log-linear unit-root models. A simple M-S encompassing test, based on an auxiliary regression stemming from the conditional second moment, is proposed, and its empirical size and power are investigated using Monte Carlo simulations. It is shown that, by focusing on the conditional process, the sampling distributions of the relevant statistics are well behaved under both the null and alternative hypotheses. The proposed M-S encompassing test is illustrated using quarterly US total disposable income data.
Abstract:
This note adds caveats to the standard statistics that accompany chess endgame tables (EGTs). It refers to Nalimov's double-counting of pawnless positions with both kings on a long diagonal, and to the inclusion of positions that are not reachable from the initial position.
Abstract:
This contribution proposes a novel over-sampling approach based on probability density function (PDF) estimation (PDFOS) for two-class imbalanced classification problems. The classical Parzen-window kernel function is adopted to estimate the PDF of the positive class. Synthetic instances are then generated from the estimated PDF as additional training data. The essential idea is to re-balance the class distribution of the original imbalanced data set on the principle that the synthetic samples follow the same statistical properties as the original positive-class data. Based on the over-sampled training data, a radial basis function (RBF) classifier is constructed by applying the orthogonal forward selection procedure, in which the classifier's structure and the parameters of the RBF kernels are determined by a particle swarm optimisation algorithm using the criterion of minimising the leave-one-out misclassification rate. The effectiveness of the proposed PDFOS approach is demonstrated by an empirical study on several imbalanced data sets.
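Sampling from a Parzen-window estimate is simple because the estimate is an equal-weight mixture of kernels centred on the training points. You pick a positive instance at random, then perturb it with kernel noise. The sketch assumes a Gaussian kernel with a single isotropic bandwidth chosen by the user. The PDFOS paper's own kernel and bandwidth choice are not reproduced. Neither is the RBF classifier built with orthogonal forward selection and particle swarm optimisation. Function names are illustrative.

```python
import random


def parzen_oversample(positives, n_new, bandwidth, rng):
    """Draw n_new points from a Gaussian Parzen-window density fitted to `positives`.

    The Parzen estimate is an equal-weight mixture of Gaussians centred on the samples,
    so sampling from it means: choose a centre uniformly, then add kernel noise.
    """
    synthetic = []
    for _ in range(n_new):
        centre = rng.choice(positives)
        synthetic.append([c + rng.gauss(0.0, bandwidth) for c in centre])
    return synthetic


def rebalance(X, y, bandwidth, rng, positive_label=1):
    """Append synthetic positives until both classes have the same number of instances."""
    positives = [x for x, label in zip(X, y) if label == positive_label]
    n_needed = max(0, (len(y) - len(positives)) - len(positives))
    new = parzen_oversample(positives, n_needed, bandwidth, rng)
    return X + new, y + [positive_label] * len(new)
```

For example, with 2 positives and 6 negatives, `rebalance` appends 4 synthetic positives scattered around the original two.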
Abstract:
The Lincoln–Petersen estimator is one of the most popular estimators used in capture–recapture studies. It was developed for a sampling situation in which two sources independently identify members of a target population. For each of the two sources, it is determined whether or not a unit of the target population is identified. This leads to a 2 × 2 table with frequencies f11, f10, f01 and f00, giving the number of units identified by both sources, by the first but not the second source, by the second but not the first source, and by neither source, respectively. However, f00 is unobserved, so the 2 × 2 table is incomplete; the Lincoln–Petersen estimator provides an estimate of f00. In this paper, we consider a generalization of this situation in which one source provides not only a binary identification outcome but also a count of how many times a unit has been identified. Using a truncated Poisson count model, truncating multiple identifications larger than two, we propose a maximum likelihood estimator of the Poisson parameter and, ultimately, of the population size. This estimator shows benefits in bias and efficiency compared with the Lincoln–Petersen estimator. With this approach, the homogeneity assumption, which is not testable in the Lincoln–Petersen framework, can be tested. The approach is applied to surveillance data on syphilis from Izmir, Turkey.
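For reference, the classical Lincoln–Petersen estimate of the unobserved cell follows from the independence of the two sources. Independence implies f11·f00 ≈ f10·f01, so f00 is estimated by f10·f01/f11. The population size estimate is then N = (f11 + f10)(f11 + f01)/f11. The sketch below computes only this baseline; the paper's truncated Poisson maximum likelihood estimator is not shown. The function name is illustrative.

```python
def lincoln_petersen(f11, f10, f01):
    """Classical Lincoln–Petersen estimates from the observed cells of the 2 x 2 table.

    Returns (f00_hat, N_hat), where f00_hat = f10 * f01 / f11 and
    N_hat = f11 + f10 + f01 + f00_hat = (f11 + f10) * (f11 + f01) / f11.
    """
    if f11 <= 0:
        raise ValueError("f11 must be positive: the sources need at least one overlap")
    f00_hat = f10 * f01 / f11
    return f00_hat, f11 + f10 + f01 + f00_hat
```

For example, f11 = 20, f10 = 30 and f01 = 10 give f00 ≈ 15 and N ≈ 75.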
Abstract:
Sensory thresholds are often collected through ascending forced-choice methods. Group thresholds are important for comparing stimuli or populations, yet the method has two problems: an individual may guess the correct answer at any concentration step, and may detect correctly at low concentrations but become adapted or fatigued at higher concentrations. The survival analysis method deals with both issues. Individual sequences of incorrect and correct answers are adjusted, taking into account the group performance at each concentration. The technique reduces the chance probability where there are consecutive correct answers. The adjusted sequences are then submitted to survival analysis to determine group thresholds. The technique was applied to an aroma threshold study and a taste threshold study. It produced group thresholds similar to those from the ASTM or logarithmic regression procedures, and significant differences in taste thresholds between younger and older adults were found. The approach is more robust than previous estimation methods.
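Once each panelist's sequence has been reduced to the concentration step at which detection is judged to begin, the group threshold can be read off a survival curve. The sketch below is a generic Kaplan–Meier estimator, with the group threshold taken as the median step. It assumes that panelists who never detect are right-censored at the last step tested. The paper's chance correction of individual sequences using group performance is not reproduced, and the step values are made up.

```python
def kaplan_meier(steps, detected):
    """Kaplan–Meier curve of P(not yet detecting) versus concentration step.

    steps[i]: step at which panelist i is judged to detect, or the last step tested.
    detected[i]: False means panelist i never detected (right-censored at steps[i]).
    """
    at_risk = len(steps)
    surv = 1.0
    curve = []
    for s in sorted(set(steps)):
        events = sum(1 for t, d in zip(steps, detected) if t == s and d)
        leaving = sum(1 for t in steps if t == s)
        if events:
            surv *= 1.0 - events / at_risk
        curve.append((s, surv))
        at_risk -= leaving
    return curve


def group_threshold(curve):
    """First step at which at most half the panel is still non-detecting (the median)."""
    for s, surv in curve:
        if surv <= 0.5:
            return s
    return None  # median not reached within the tested range
```

Censored panelists stay in the risk set up to their last tested step without lowering the curve. This is how the method avoids discarding people who never detect.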
Abstract:
Monthly zonal mean climatologies of atmospheric measurements from satellite instruments can have biases due to nonuniform sampling of the atmosphere by the instruments. We characterize potential sampling biases in stratospheric trace gas climatologies of the Stratospheric Processes and Their Role in Climate (SPARC) Data Initiative using chemical fields from a chemistry climate model simulation and sampling patterns from 16 satellite-borne instruments. The exercise is performed for the long-lived stratospheric trace gases O3 and H2O. Monthly sampling biases for O3 exceed 10% for many instruments in the high-latitude stratosphere and in the upper troposphere/lower stratosphere, while annual mean sampling biases reach up to 20% in the same regions for some instruments. Sampling biases for H2O are generally smaller than for O3, although still notable in the upper troposphere/lower stratosphere and at Southern Hemisphere high latitudes. The most important mechanism leading to monthly sampling bias is nonuniform temporal sampling, i.e., the fact that for many instruments, monthly means are produced from measurements that span less than the full month in question. Similarly, annual mean sampling biases are well explained by nonuniformity in the month-to-month sampling of the different instruments. Nonuniform sampling in latitude and longitude is also shown to lead to nonnegligible sampling biases, which are most relevant for climatologies that are otherwise free of biases due to nonuniform temporal sampling.
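The mechanism identified as most important, monthly means built from measurements spanning only part of the month, can be reproduced with a toy field. The sketch assumes a hypothetical mixing ratio that rises linearly through a 30-day month, and an instrument that samples only the first 10 days. The bias is the difference between the sampled mean and the full-month mean of the same field. This mirrors the approach of sampling a model field with each instrument's sampling pattern.

```python
def true_field(day):
    """Hypothetical model mixing ratio (ppmv) rising linearly through a 30-day month."""
    return 5.0 + 0.1 * day


def monthly_sampling_bias(sample_days, n_days=30):
    """Absolute and percentage bias of the mean over sampled days versus the full-month mean."""
    full_mean = sum(true_field(d) for d in range(n_days)) / n_days
    sampled_mean = sum(true_field(d) for d in sample_days) / len(sample_days)
    bias = sampled_mean - full_mean
    return bias, 100.0 * bias / full_mean
```

Sampling all 30 days gives zero bias. Sampling days 0 to 9 gives a bias of about -1.0 (roughly -15.5%), because the sampled days miss the later, higher values.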
Abstract:
For certain observation types, such as those that are remotely sensed, the observation errors are correlated, and these correlations are state- and time-dependent. In this work, we develop a method for diagnosing and incorporating spatially correlated and time-dependent observation error in an ensemble data assimilation system. The method combines an ensemble transform Kalman filter with a method that uses statistical averages of background and analysis innovations to estimate the observation error covariance matrix. To evaluate the performance of the method, we perform identical twin experiments using the Lorenz ’96 and Kuramoto-Sivashinsky models. Using our approach, a good approximation to the true observation error covariance can be recovered in cases where the initial estimate of the error covariance is incorrect. Spatial observation error covariances whose true length scale changes slowly in time can also be captured. We find that using the estimated correlated observation error in the assimilation improves the analysis.
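A widely used diagnostic of this kind, usually associated with Desroziers and co-authors, estimates the observation error covariance as R ≈ E[d_a d_b^T]. Here d_b = y - H(x_b) is the background innovation and d_a = y - H(x_a) is the analysis residual. The sketch computes that average and checks it on a scalar toy problem with an optimal Kalman gain, where the identity holds in expectation. It is not necessarily the exact estimator used in the paper. The ensemble transform Kalman filter in which the paper embeds the estimate is not shown, and the demo values are assumptions.

```python
import random


def desroziers_R(d_b_list, d_a_list):
    """Estimate R ~ E[d_a d_b^T] from paired residual vectors, then symmetrise.

    d_b = y - H(x_b) (background innovation), d_a = y - H(x_a) (analysis residual).
    """
    n, p = len(d_b_list), len(d_b_list[0])
    R = [[0.0] * p for _ in range(p)]
    for d_b, d_a in zip(d_b_list, d_a_list):
        for i in range(p):
            for j in range(p):
                R[i][j] += d_a[i] * d_b[j] / n
    return [[0.5 * (R[i][j] + R[j][i]) for j in range(p)] for i in range(p)]


def scalar_demo(B, R_true, n, rng):
    """Toy scalar check with an optimal gain K = B / (B + R); the true state is zero."""
    K = B / (B + R_true)
    d_b_list, d_a_list = [], []
    for _ in range(n):
        x_b = rng.gauss(0.0, B ** 0.5)       # background error with variance B
        y = rng.gauss(0.0, R_true ** 0.5)    # observation error with variance R_true
        x_a = x_b + K * (y - x_b)            # Kalman analysis update (H = 1)
        d_b_list.append([y - x_b])
        d_a_list.append([y - x_a])
    return desroziers_R(d_b_list, d_a_list)[0][0]
```

In the scalar case d_a = R/(B + R) · d_b and E[d_b^2] = B + R, so E[d_a d_b] = R. The estimate is only as good as the gain: with a misspecified R it is biased, which is why such diagnostics are often iterated.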
Abstract:
With a rapidly increasing fraction of electricity generation being sourced from wind, extreme wind power generation events, such as prolonged periods of low (or high) generation and ramps in generation, are a growing concern for the efficient and secure operation of national power systems. As extreme events occur infrequently, long and reliable meteorological records are required to estimate their characteristics accurately. Recent publications have begun to investigate the use of global meteorological “reanalysis” data sets for power system applications, many of which focus on long-term average statistics such as monthly-mean generation. Here we demonstrate that reanalysis data can also be used to estimate the frequency of relatively short-lived extreme events (including ramping on sub-daily time scales). Verification against 328 surface observation stations across the United Kingdom suggests that near-surface wind variability over spatiotemporal scales greater than around 300 km and 6 h can be faithfully reproduced using reanalysis, with no need for costly dynamical downscaling. A case study is presented in which a state-of-the-art, 33-year reanalysis data set (MERRA, from NASA-GMAO) is used to construct an hourly time series of nationally aggregated wind power generation in Great Britain (GB), assuming a fixed, modern distribution of wind farms. The resultant generation estimates are highly correlated with recorded data from National Grid in the recent period, both for instantaneous hourly values and for variability over time intervals greater than around 6 h. This 33-year time series is then used to quantify how often different extreme GB-wide wind power generation events occur, as well as their seasonal and inter-annual variability.
Several novel insights into the nature of extreme wind power generation events are described, including (i) that the number of prolonged low or high generation events is well approximated by a Poisson-like random process, and (ii) that, whilst in general there is large seasonal variability, the magnitude of the most extreme ramps is similar in summer and winter. An up-to-date version of the GB case study data, as well as the underlying model, is freely available for download from our website: http://www.met.reading.ac.uk/~energymet/data/Cannon2014/.
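Counting prolonged low-generation events and ramps in an hourly capacity-factor series takes only a few lines. The threshold, minimum duration and ramp window below are hypothetical; the paper's exact event definitions are not given here.

The dispersion index (sample variance over mean, for example of annual event counts) is a standard quick check of Poisson-like behaviour, since it is close to 1 for Poisson counts. It is not necessarily the test the authors used.

```python
def low_generation_events(cf, threshold, min_hours):
    """Count runs of consecutive hours with capacity factor below threshold lasting >= min_hours."""
    events, run = 0, 0
    for value in cf:
        if value < threshold:
            run += 1
        else:
            if run >= min_hours:
                events += 1
            run = 0
    if run >= min_hours:  # a qualifying run may end at the last hour
        events += 1
    return events


def max_ramp(cf, window):
    """Largest absolute change in capacity factor over `window` hours."""
    return max(abs(cf[t + window] - cf[t]) for t in range(len(cf) - window))


def dispersion_index(counts):
    """Sample variance / mean of event counts; values near 1 are consistent with Poisson."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean
```

Take a threshold of 0.1, a minimum duration of 3 hours, and the short series used in the check below. Its two long low spells count as events; the isolated single low hour does not.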
Abstract:
The high computational cost of calculating radiative heating rates in numerical weather prediction (NWP) and climate models requires that these calculations be made infrequently, leading to poor sampling of the fast-changing cloud field and a poor representation of the feedback that would otherwise occur. This paper presents two related schemes for improving the temporal sampling of the cloud field. First, the ‘split time-stepping’ scheme takes advantage of the independent nature of the monochromatic calculations of the ‘correlated-k’ method to split the calculation into gaseous absorption terms that are highly dependent on changes in cloud (the optically thin terms) and those that are not (the optically thick terms). The small number of optically thin terms can then be calculated more often to capture changes in the grey absorption and scattering associated with cloud droplets and ice crystals. Second, the ‘incremental time-stepping’ scheme uses a simple radiative transfer calculation with only one or two monochromatic calculations representing the optically thin part of the atmospheric spectrum. These are found to be sufficient to represent the heating rate increments caused by changes in the cloud field, which can then be added to the last full calculation of the radiation code. We test these schemes in an operational forecast model configuration and find that a significant improvement is achieved, at a small computational cost, over the current scheme employed at the Met Office. The ‘incremental time-stepping’ scheme is recommended for operational use, along with a new scheme to correct the surface fluxes for the change in solar zenith angle between radiation calculations.
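The incremental scheme can be summarized as a heating rate equal to the full calculation at the last full step, plus (cheap calculation now minus cheap calculation at that step). The sketch below expresses this schedule with two hypothetical callables. One stands in for the full radiation code, the other for the one- or two-band optically thin calculation. It is a schematic of the bookkeeping only, not the Met Office implementation.

```python
def incremental_heating(states, full_every, full_rt, cheap_rt):
    """Heating-rate profiles per time step using incremental time-stepping (schematic).

    full_rt(state)  -> expensive full-spectrum heating-rate profile (hypothetical callable)
    cheap_rt(state) -> heating from one or two optically thin monochromatic calculations
    """
    heating = []
    full_ref = cheap_ref = None
    for step, state in enumerate(states):
        if step % full_every == 0:
            full_ref = full_rt(state)      # full calculation, done infrequently
            cheap_ref = cheap_rt(state)    # cheap calculation at the same time step
            heating.append(list(full_ref))
        else:
            cheap_now = cheap_rt(state)    # cheap calculation every time step
            heating.append([f + (cn - cr) for f, cn, cr in zip(full_ref, cheap_now, cheap_ref)])
    return heating
```

In the toy check below, the cheap calculation captures all of the cloud dependence, so the incremental result reproduces the full calculation exactly. In a real model it is only an approximation to the cloud-induced changes.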
Abstract:
The turbulent structure of a stratocumulus-topped marine boundary layer over a 2-day period is observed with a Doppler lidar at Mace Head in Ireland. Using profiles of vertical velocity statistics, the bulk of the mixing is identified as cloud driven. This is supported by negative vertical velocity skewness in the sub-cloud layer, which on occasion extends almost to the surface. Both coupled and decoupled turbulence characteristics are observed. The length scales and timescales related to the cloud-driven mixing are investigated and shown to provide additional information about the structure and source of the mixing inside the boundary layer. They are also shown to constrain the length of the sampling periods used to derive products, such as the turbulent dissipation rate, from lidar measurements. For this purpose, the maximum wavelengths belonging to the inertial subrange are studied through spectral analysis of the vertical velocity. The maximum wavelength of the inertial subrange in the cloud-driven layer scales relatively well with the corresponding layer depth during periods of pronounced decoupling identified from the vertical velocity skewness. However, on many occasions, combining the analysis of the inertial subrange with the vertical velocity statistics suggests a higher decoupling height than expected from the skewness profiles. Our results show that investigating the length scales related to the inertial subrange significantly complements the analysis of vertical velocity statistics and enables a more confident interpretation of complex boundary layer structures using measurements from a Doppler lidar.
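Vertical velocity skewness profiles of the kind used above are straightforward to compute from lidar time series at each range gate. The sketch uses the moment estimator m3/m2^1.5 on hypothetical data. Negative values below cloud are the signature of cloud-driven (top-down) mixing described in the abstract. Some practical choices are not addressed: the averaging period (which, as noted above, is constrained by the turbulent length and timescales) and noise filtering.

```python
def skewness(w):
    """Moment-based skewness m3 / m2**1.5 of a vertical-velocity series (needs nonzero variance)."""
    n = len(w)
    mean = sum(w) / n
    m2 = sum((v - mean) ** 2 for v in w) / n
    m3 = sum((v - mean) ** 3 for v in w) / n
    return m3 / m2 ** 1.5


def skewness_profile(w_by_height):
    """Map each range-gate height (m) to the skewness of its vertical-velocity series."""
    return {h: skewness(series) for h, series in sorted(w_by_height.items())}
```

A gate dominated by a few strong, narrow downdraughts (large negative w) yields negative skewness. The mirror-image series yields positive skewness.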