909 results for BIAS CORRECTION
Abstract:
We developed an analytical method and constrained procedural boundary conditions that enable accurate and precise Zn isotope ratio measurements in urban aerosols. We also demonstrate the potential of this new isotope system for air pollutant source tracing. The procedural blank is around 5 ng, significantly lower than in published methods, owing to a tailored ion chromatographic separation. Accurate mass bias correction using external correction with Cu is limited to Zn sample contents of approximately 50 ng, due to the combined effect of the blank contribution of Cu and Zn from the ion exchange procedure and the need to maintain a Cu/Zn ratio of approximately 1. Mass bias is therefore corrected by applying the common analyte internal standardization approach. Comparison with other mass bias correction methods demonstrates the accuracy of the method. The average precision of δ66Zn determinations in aerosols is around 0.05‰ per atomic mass unit. The method was tested on aerosols collected in São Paulo City, Brazil. The measurements reveal significant variations in δ66Zn_Imperial, ranging between -0.96 and -0.37‰ in coarse and between -1.04 and 0.02‰ in fine particulate matter. This variability suggests that Zn isotopic compositions can distinguish atmospheric sources. The isotopically light signature points to traffic as the main source. We also present δ66Zn_Imperial data for the standard reference material NIST SRM 2783 (δ66Zn_Imperial = 0.26 ± 0.10‰).
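A minimal sketch of the external Cu correction step described above, using the exponential mass-fractionation law and delta notation. The isotope masses and the SRM 976 Cu ratio are standard reference values, all measured ratios are hypothetical, and this is not the paper's CAIS procedure:

```python
import numpy as np

# atomic masses (IUPAC values) and the certified 65Cu/63Cu of NIST SRM 976
M63, M65 = 62.9296, 64.9278   # Cu isotopes
M64, M66 = 63.9291, 65.9260   # Zn isotopes
R_CU_TRUE = 0.4456

def mass_bias_factor(r_cu_measured):
    """Exponential-law fractionation factor f derived from the Cu dopant."""
    return np.log(R_CU_TRUE / r_cu_measured) / np.log(M65 / M63)

def correct_zn_ratio(r_zn_measured, f):
    """Apply the Cu-derived factor f to the measured 66Zn/64Zn ratio."""
    return r_zn_measured * (M66 / M64) ** f

def delta66(r_sample, r_standard):
    """delta-66Zn in per mil relative to the bracketing standard."""
    return (r_sample / r_standard - 1.0) * 1000.0

# hypothetical measured ratios for one sample and one bracketing standard
f = mass_bias_factor(r_cu_measured=0.4512)
r_sample = correct_zn_ratio(r_zn_measured=0.5682, f=f)
r_standard = correct_zn_ratio(r_zn_measured=0.5691, f=f)
print(round(delta66(r_sample, r_standard), 2), "per mil")
```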
Abstract:
This paper develops a bias correction scheme for a multivariate heteroskedastic errors-in-variables model. The applicability of this model is justified in areas such as astrophysics, epidemiology and analytical chemistry, where the variables are subject to measurement errors and the variances vary with the observations. We conduct Monte Carlo simulations to investigate the performance of the corrected estimators. The numerical results show that the bias correction scheme yields nearly unbiased estimates. We also give an application to a real data set.
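The abstract does not reproduce the correction formulae, so the sketch below only illustrates the Monte Carlo logic used to judge such schemes: simulate many samples, measure the bias of an estimator, and check that the corrected version is nearly unbiased. The variance MLE stands in for the paper's errors-in-variables estimators:

```python
import numpy as np

rng = np.random.default_rng(42)
n, sigma2, n_rep = 10, 4.0, 100_000

mle = np.empty(n_rep)
for r in range(n_rep):
    x = rng.normal(0.0, np.sqrt(sigma2), size=n)
    mle[r] = np.mean((x - x.mean()) ** 2)   # biased MLE of the variance

corrected = mle * n / (n - 1)               # analytic bias correction

print("true value    :", sigma2)
print("mean MLE      :", round(mle.mean(), 3))        # approx sigma2*(n-1)/n = 3.6
print("mean corrected:", round(corrected.mean(), 3))  # approx 4.0
```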
Abstract:
We analyse the finite-sample behaviour of two second-order bias-corrected alternatives to the maximum-likelihood estimator of the parameters in a multivariate normal regression model with general parametrization proposed by Patriota and Lemonte [A. G. Patriota and A. J. Lemonte, Bias correction in a multivariate normal regression model with general parameterization, Stat. Prob. Lett. 79 (2009), pp. 1655-1662]. The two finite-sample corrections we consider are the conventional second-order bias-corrected estimator and the bootstrap bias correction. We present numerical results comparing the performance of these estimators. Our results reveal that the analytical bias correction outperforms the numerical bias corrections obtained from bootstrapping schemes.
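For reference, a minimal sketch of the bootstrap variant being compared, assuming the standard construction theta_bc = 2*theta_hat - mean(theta*); a univariate variance estimate stands in for the paper's multivariate normal model:

```python
import numpy as np

rng = np.random.default_rng(1)

def theta_hat(x):
    return np.mean((x - x.mean()) ** 2)   # plug-in variance estimator

x = rng.normal(loc=5.0, scale=2.0, size=25)   # one observed sample
t = theta_hat(x)

# bootstrap bias estimate: mean over resamples minus the original estimate
B = 2000
boot = np.array([theta_hat(rng.choice(x, size=x.size, replace=True))
                 for _ in range(B)])
t_bc = 2.0 * t - boot.mean()   # bootstrap bias-corrected estimate

print(round(t, 3), round(t_bc, 3))   # t_bc should sit closer to 4.0 on average
```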
Abstract:
We introduce a new class of Birnbaum-Saunders nonlinear regression models, potentially useful in lifetime data analysis. The class generalizes the regression model described by Rieck and Nedelman [Rieck, J.R., Nedelman, J.R., 1991. A log-linear model for the Birnbaum-Saunders distribution. Technometrics 33, 51-60]. We discuss maximum-likelihood estimation of the model parameters and derive closed-form expressions for the second-order biases of these estimates. Our formulae are easily computed as ordinary linear regressions and are then used to define bias-corrected maximum-likelihood estimates. Simulation results show that the bias correction scheme yields nearly unbiased estimates without increasing the mean squared errors. Two empirical applications are analysed and discussed.
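The closed-form biases above are specific to the Birnbaum-Saunders class and are not reproduced in the abstract; the sketch below illustrates the general recipe (evaluate the O(1/n) bias at the MLE and subtract it) on the exponential rate, whose second-order bias is the textbook lambda/n:

```python
import numpy as np

rng = np.random.default_rng(7)
lam, n, n_rep = 2.0, 15, 100_000

mle = np.empty(n_rep)
for r in range(n_rep):
    x = rng.exponential(scale=1.0 / lam, size=n)
    mle[r] = 1.0 / x.mean()          # MLE of the exponential rate

# Cox-Snell O(1/n) bias of the rate MLE is lam/n, estimated by mle/n
corrected = mle - mle / n

print("true rate     :", lam)
print("mean MLE      :", round(mle.mean(), 3))        # approx lam*n/(n-1)
print("mean corrected:", round(corrected.mean(), 3))  # approx lam
```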
Abstract:
The heteroskedasticity-consistent covariance matrix estimator proposed by White (1980), also known as HC0, is commonly used in practical applications and is implemented in a number of statistical software packages. Cribari-Neto, Ferrari & Cordeiro (2000) developed a bias-adjustment scheme that delivers bias-corrected White estimators. Several variants of the original White estimator are also commonly used by practitioners, including the HC1, HC2 and HC3 estimators, which have proven to have superior small-sample behavior relative to White's estimator. This paper defines a general bias-correction mechanism that can be applied not only to White's estimator but also to its variants, such as HC1, HC2 and HC3. Numerical evidence on the usefulness of the proposed corrections is also presented. Overall, the results favor the sequence of improved HC2 estimators.
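For orientation, a numpy sketch of the uncorrected HC0-HC3 family the paper starts from; the bias-correction mechanism itself is not reproduced here, and the data are synthetic:

```python
import numpy as np

def hc_covariances(X, y):
    """White-type covariance estimates for OLS: HC0 through HC3."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta                                   # OLS residuals
    h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)        # leverages (hat values)

    def sandwich(omega):
        return XtX_inv @ X.T @ np.diag(omega) @ X @ XtX_inv

    return {
        'HC0': sandwich(e**2),
        'HC1': n / (n - k) * sandwich(e**2),
        'HC2': sandwich(e**2 / (1 - h)),
        'HC3': sandwich(e**2 / (1 - h)**2),
    }

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5 + X[:, 1], size=n)
for name, V in hc_covariances(X, y).items():
    print(name, np.sqrt(np.diag(V)))   # heteroskedasticity-robust std. errors
```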
Abstract:
This work assesses the frequency of extreme values (EVs) of daily rainfall in the city of São Paulo, Brazil, over the period 1933-2005, based on the peaks-over-threshold (POT) and Generalized Pareto Distribution (GPD) approach. Usually, a GPD model is fitted to a sample of POT values selected with a constant threshold. However, in this work we use time-dependent thresholds, composed of relatively large p quantiles (for example, p = 0.97) of daily rainfall amounts computed from all available data. Samples of POT values were extracted for several values of p. Four different GPD models (GPD-1, GPD-2, GPD-3, and GPD-4) were fitted to each of these samples by the maximum likelihood (ML) method. The shape parameter was assumed constant for the four models, but time-varying covariates were incorporated into the scale parameter of GPD-2, GPD-3, and GPD-4, describing an annual cycle in GPD-2, a linear trend in GPD-3, and both annual cycle and linear trend in GPD-4. GPD-1, with constant scale and shape parameters, is the simplest model. To identify the best of the four models we used the rescaled Akaike Information Criterion (AIC) with second-order bias correction. This criterion isolates GPD-3 as the best model, i.e. the one with a positive linear trend in the scale parameter. The slope of this trend is significant against the null hypothesis of no trend at about the 98% confidence level. The non-parametric Mann-Kendall test also showed the presence of a positive trend in the annual frequency of excesses over high thresholds, with a p-value of virtually zero. There is therefore strong evidence that high quantiles of daily rainfall in the city of São Paulo have been increasing in magnitude and frequency over time. For example, the 0.99 quantile of daily rainfall amount increased by about 40 mm between 1933 and 2005.
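The AIC with second-order bias correction is commonly written AICc = AIC + 2k(k+1)/(n-k-1). Below is a sketch fitting the stationary GPD-1 case to synthetic excesses and scoring it with AICc; the time-varying scale of GPD-2 to GPD-4 would require a custom likelihood, which scipy's built-in fit does not provide:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# synthetic stand-in for POT excesses over a high rainfall threshold (mm)
excesses = stats.genpareto.rvs(c=0.1, scale=12.0, size=300, random_state=rng)

# fit a stationary GPD (the GPD-1 case); location fixed at 0 for excesses
c_hat, loc_hat, scale_hat = stats.genpareto.fit(excesses, floc=0.0)

k = 2                                   # free parameters: shape and scale
n = excesses.size
loglik = stats.genpareto.logpdf(excesses, c_hat, loc_hat, scale_hat).sum()
aic = -2.0 * loglik + 2 * k
aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)   # second-order bias correction
print(round(c_hat, 3), round(scale_hat, 3), round(aicc, 2))
```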
Abstract:
A rigorous asymptotic theory for Wald residuals in generalized linear models is not yet available. The authors provide matrix formulae of order O(n⁻¹), where n is the sample size, for the first two moments of these residuals. The formulae can be applied to many regression models widely used in practice. The authors suggest adjusted Wald residuals for these models, with approximately zero mean and unit variance. The expressions were used to analyze a real dataset. Simulation results indicate that the adjusted Wald residuals are better approximated by the standard normal distribution than the unadjusted ones.
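The matrix formulae themselves are not given in the abstract; the sketch below shows only the final adjustment step, standardizing each residual by its approximate mean and variance, with placeholder scalars standing in for the paper's O(n⁻¹) expressions:

```python
import numpy as np

def adjust_residuals(r, approx_mean, approx_var):
    """Standardize residuals to approximately zero mean and unit variance,
    given O(1/n) moment approximations (placeholders for the paper's
    matrix formulae)."""
    return (r - approx_mean) / np.sqrt(approx_var)

rng = np.random.default_rng(5)
r = rng.normal(loc=-0.05, scale=np.sqrt(0.9), size=8)  # hypothetical residuals
print(adjust_residuals(r, approx_mean=-0.05, approx_var=0.9))
```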
Abstract:
The quality of temperature and humidity retrievals from the infrared SEVIRI sensors on the geostationary Meteosat Second Generation (MSG) satellites is assessed by means of a one-dimensional variational (1D-Var) algorithm. The study aims at improving the spatial and temporal resolution of the observations available to feed analysis systems designed for high-resolution, regional-scale numerical weather prediction (NWP) models. The non-hydrostatic forecast model COSMO (COnsortium for Small scale MOdelling), in the ARPA-SIM operational configuration, is used to provide background fields. Only clear-sky observations over sea are processed. An optimised 1D-Var set-up comprising the two water vapour and the three window channels is selected. It maximises the reduction of errors in the model backgrounds while ensuring ease of operational implementation through accurate bias correction procedures and correct radiative transfer simulations. The 1D-Var retrieval quality is first quantified in relative terms, using statistics to estimate the reduction in the background model errors. Additionally, the absolute retrieval accuracy is assessed by comparing the analysis with independent radiosonde and satellite observations. The inclusion of satellite data brings a substantial reduction in the warm and dry biases present in the forecast model. Moreover, the retrieval profiles generated by the 1D-Var are shown to be well correlated with the radiosonde measurements. Subsequently, the 1D-Var technique is applied to two three-dimensional case studies: a false-alarm case that occurred in Friuli-Venezia Giulia on 8 July 2004 and a heavy-precipitation case that occurred in the Emilia-Romagna region between 9 and 12 April 2005. The impact of satellite data for these two events is evaluated in terms of increments in the column-integrated water vapour and saturation water vapour, in the 2-metre temperature and specific humidity, and in the surface temperature. To improve the 1D-Var technique, a method to calculate flow-dependent model error covariance matrices is also assessed. The approach employs members of an ensemble forecast system generated by perturbing physical parameterisation schemes inside the model. The improved set-up applied to the case of 8 July 2004 shows an essentially neutral impact.
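A toy sketch of the last step: building a flow-dependent B matrix from ensemble perturbations and using it in a linear 1D-Var analysis. The dimensions, the linear observation operator H, and the light regularization are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(11)
n_state, n_obs, n_mem = 20, 5, 10

# flow-dependent B from ensemble perturbations (sample covariance)
members = rng.normal(size=(n_mem, n_state))        # perturbed-physics members
dev = members - members.mean(axis=0)
B = dev.T @ dev / (n_mem - 1) + 1e-3 * np.eye(n_state)  # slight regularization

H = rng.normal(size=(n_obs, n_state))              # linearized obs operator
R = 0.1 * np.eye(n_obs)                            # observation-error covariance
xb = members.mean(axis=0)                          # background state
y = H @ xb + rng.normal(scale=0.3, size=n_obs)     # (bias-corrected) observations

# minimizer of J(x) = 1/2 (x-xb)' B^-1 (x-xb) + 1/2 (y-Hx)' R^-1 (y-Hx)
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)       # gain matrix
xa = xb + K @ (y - H @ xb)                         # 1D-Var analysis
print(np.linalg.norm(xa - xb))                     # size of the increment
```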
Abstract:
Advances in computational biology have made simultaneous monitoring of thousands of features possible. High-throughput technologies not only bring about a much richer information context in which to study various aspects of gene function, but they also present the challenge of analyzing data with a large number of covariates and few samples. As an integral part of machine learning, classification of samples into two or more categories is almost always of interest to scientists. In this paper, we address the question of classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, to the context of generalized linear regression, building on a previous approach, Iteratively ReWeighted Partial Least Squares (IRWPLS; Marx, 1996). We compare our results with two-stage PLS (Nguyen and Rocke, 2002a; Nguyen and Rocke, 2002b) and with other classifiers. We show that by phrasing the problem in a generalized linear model setting and by applying a bias correction to the likelihood to avoid (quasi)separation, we often obtain lower classification error rates.
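The "bias correction to the likelihood" that prevents (quasi)separation is usually implemented as Firth-type penalization; assuming that reading, here is a minimal sketch for plain logistic regression, with the PLS dimension-reduction step omitted:

```python
import numpy as np

def firth_logistic(X, y, n_iter=50, tol=1e-8):
    """Logistic regression with Firth's bias-corrected score.

    The score is modified to X'(y - p + h*(0.5 - p)), where h are the
    hat values of the weighted design; this keeps the estimates finite
    even under (quasi)separation.
    """
    n, k = X.shape
    beta = np.zeros(k)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        XtWX_inv = np.linalg.inv(X.T @ (W[:, None] * X))
        h = np.einsum('ij,jk,ik->i', X, XtWX_inv, X) * W   # hat values
        score = X.T @ (y - p + h * (0.5 - p))              # modified score
        step = XtWX_inv @ score
        beta += step
        if np.abs(step).max() < tol:
            break
    return beta

# a perfectly separated toy dataset: ordinary ML diverges, Firth stays finite
X = np.column_stack([np.ones(6), np.array([-2., -1., -0.5, 0.5, 1., 2.])])
y = np.array([0, 0, 0, 1, 1, 1])
print(firth_logistic(X, y))
```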
Abstract:
We introduce an algorithm (called REDFITmc2) for spectrum estimation in the presence of timescale errors. It is based on the Lomb-Scargle periodogram for unevenly spaced time series, in combination with Welch's Overlapped Segment Averaging procedure, bootstrap bias correction and persistence estimation. The timescale errors are modelled parametrically and included in the simulations for determining (1) the upper levels of the spectrum of the red-noise AR(1) alternative and (2) the uncertainty of the frequency of a spectral peak. Application of REDFITmc2 to ice core and stalagmite records of palaeoclimate allowed a more realistic evaluation of spectral peaks than when this source of uncertainty is ignored. The results support qualitatively the intuition that stronger effects on the spectrum estimate (decreased detectability and increased frequency uncertainty) occur at higher frequencies. The added value of REDFITmc2 is that those effects are quantified. Regarding timescale construction, not only the fixpoints, dating errors and the functional form of the age-depth model play a role; the joint distribution of all time points (serial correlation, stratigraphic order) also determines spectrum estimation.
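REDFITmc2 itself bundles the WOSA averaging, bias correction and AR(1) simulations; the underlying periodogram for unevenly spaced series is available in scipy, as sketched below on a synthetic record (the timescale-error modelling is not shown):

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(2)

# unevenly spaced synthetic "proxy record": an 80-kyr cycle plus noise
t = np.sort(rng.uniform(0.0, 500.0, size=300))           # age, kyr
y = np.sin(2 * np.pi * t / 80.0) + 0.5 * rng.normal(size=t.size)
y -= y.mean()

freqs = np.linspace(0.002, 0.05, 500)                    # cycles per kyr
power = lombscargle(t, y, 2 * np.pi * freqs)             # expects angular freq
print(freqs[power.argmax()])   # should be close to 1/80 = 0.0125
```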
Abstract:
In numerous intervention studies and education field trials, random assignment to treatment occurs in clusters rather than at the level of the individual observation. This departure from unit-level random assignment may be due to logistics, political feasibility, or ecological validity. Data within the same cluster or grouping are often correlated. Applying traditional regression techniques, which assume independence between observations, to clustered data produces consistent parameter estimates, but such estimators are often inefficient compared to methods that incorporate the clustered nature of the data into the estimation procedure (Neuhaus 1993). Multilevel models, also known as random-effects or random-components models, can account for the clustering of data by estimating higher-level (group) as well as lower-level (individual) variation. Designing a study in which the unit of observation is nested within higher-level groupings requires the determination of sample sizes at each level. This study investigates the design and analysis of various sampling strategies for a 3-level repeated measures design on the parameter estimates when the outcome variable of interest follows a Poisson distribution. Results of the study suggest that second-order PQL estimation produces the least biased estimates in the 3-level multilevel Poisson model, followed by first-order PQL and then second- and first-order MQL. The MQL estimates of both fixed and random parameters are generally satisfactory when the level-2 and level-3 variation is less than 0.10. However, as the higher-level error variance increases, the MQL estimates become increasingly biased. If convergence of the estimation algorithm is not obtained by the PQL procedure and the higher-level error variance is large, the estimates may be significantly biased; in this case bias correction techniques such as bootstrapping should be considered as an alternative. For larger sample sizes, structures with 20 or more units sampled at the levels with normally distributed random errors produced more stable estimates with less sampling variance than structures with an increased number of level-1 units. For small sample sizes, sampling fewer units at the level with Poisson variation produces less sampling variation; this criterion is no longer important when sample sizes are large.
Reference: Neuhaus, J. (1993). "Estimation Efficiency and Tests of Covariate Effects with Clustered Binary Data". Biometrics, 49, 989-996.
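A minimal data-generating sketch for the 3-level Poisson design investigated here. The sample sizes, coefficients and 0.10 variance components are illustrative (matching the regime where MQL was reported satisfactory); the PQL/MQL fitting itself needs specialized software such as R's MASS::glmmPQL:

```python
import numpy as np

rng = np.random.default_rng(9)

# hypothetical 3-level design: 20 level-3 groups, 10 level-2 clusters each,
# 5 level-1 observations per cluster; variance 0.10 at both upper levels
n3, n2, n1 = 20, 10, 5
beta0, beta1 = 0.5, 0.3
u3 = rng.normal(0.0, np.sqrt(0.10), size=n3)           # level-3 random effects
u2 = rng.normal(0.0, np.sqrt(0.10), size=(n3, n2))     # level-2 random effects

rows = []
for i in range(n3):
    for j in range(n2):
        x = rng.normal(size=n1)                        # level-1 covariate
        mu = np.exp(beta0 + beta1 * x + u2[i, j] + u3[i])  # log link
        y = rng.poisson(mu)                            # Poisson outcomes
        rows.append((i, j, x, y))

print(len(rows), "clusters simulated")
```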
Abstract:
The atmospheric westerly flow in the North Atlantic (NA) sector is dominated by the eddy-driven jet, generated by atmospheric waves or eddies via momentum flux convergence. The position of this jet is variable and, in the present-day winter climate, shows three preferred latitudinal states: a northern, central, and southern position in the NA. Here, the authors analyze the behavior of the eddy-driven jet under different glacial and interglacial boundary conditions using atmosphere/land-only simulations with the CCSM4 climate model. As state-of-the-art climate models tend to underestimate the trimodality of the jet latitude, the authors apply a bias correction and successfully extract the trimodal behavior of the jet within CCSM4. The analysis shows that during interglacial times (i.e., the early Holocene and the Eemian) the preferred jet positions are rather stable, and the observed multimodality is the typical interglacial character of the jet. During glacial times, the jet is strongly enhanced, its position is shifted southward, and the trimodal behavior vanishes. This is mainly due to the presence of the Laurentide Ice Sheet (LIS). The LIS enhances stationary waves downstream, thereby accelerating and displacing the NA eddy-driven jet by anomalous stationary momentum flux convergence. Additionally, changes in transient eddy activity caused by topography changes as well as other glacial boundary conditions lead to an acceleration of the westerly winds over the southern NA at the expense of more northern areas. Consequently, both stationary and transient eddies foster the southward shift of the NA eddy-driven jet during glacial winter times.
Abstract:
(Preliminary) Exchanges of carbon, water and energy between the land surface and the atmosphere are monitored at the ecosystem level by the eddy covariance technique. Currently, the FLUXNET database contains more than 500 registered sites, up to 250 of which share data (free Fair-Use dataset). Many modelling groups use the FLUXNET dataset for evaluating ecosystem model performance, but this requires uninterrupted time series for the meteorological variables used as input. Because the original in-situ data often contain gaps, from very short (a few hours) up to relatively long (some months), we develop a new and robust method for filling the gaps in meteorological data measured at site level. Our approach has the benefit of using data that are available globally and continuously (ERA-Interim) at high temporal resolution, spanning from 1989 to today. These data are, however, not measured at site level, so a method to downscale and correct the ERA-Interim data is needed. We apply this method to the level 4 (L4) data from the La Thuile collection, freely available after registration under a Fair-Use policy. The performance of the developed method varies across sites and with the meteorological variable. On average over all sites, the bias correction removes from 10% to 36% of the initial mismatch between in-situ and ERA-Interim data, depending on the meteorological variable considered. Compared to the internal variability of the in-situ data, the root mean square error (RMSE) between the in-situ data and the debiased ERA-Interim data remains relatively large (on average over all sites, from 27% to 76% of the standard deviation of the in-situ data, depending on the meteorological variable considered). The performance of the method remains low for wind speed, in particular regarding its capacity to preserve a standard deviation similar to the one measured at the FLUXNET stations.
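One minimal variant of such a site-level correction is a per-site, per-variable linear mapping fitted on the overlap period and applied to the full reanalysis record; the sketch below is an assumption-laden stand-in, not the paper's exact method:

```python
import numpy as np

def debias_to_site(era_overlap, insitu, era_full):
    """Correct a reanalysis series against overlapping in-situ data.

    Fits a linear mapping insitu ~ a * era + b on the overlap period and
    applies it to the full reanalysis record to fill the gaps.  This is
    one minimal variant; the actual method may differ per variable.
    """
    a, b = np.polyfit(era_overlap, insitu, deg=1)
    return a * era_full + b

rng = np.random.default_rng(4)
era_overlap = rng.normal(15.0, 8.0, size=1000)              # e.g. air T, degC
site = 0.9 * era_overlap - 1.5 + rng.normal(0, 1.5, 1000)   # hypothetical site
era_full = rng.normal(15.0, 8.0, size=5000)                 # full 1989-today span
filled = debias_to_site(era_overlap, site, era_full)
print(round(filled.mean(), 2), round(filled.std(), 2))
```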
Abstract:
An important step in assessing water availability is to have monthly time series representative of the current situation. In this context, a simple methodology is presented for application in large-scale studies in regions where a properly calibrated hydrologic model is not available, using the output variables simulated by the regional climate models (RCMs) of the European project PRUDENCE under current climate conditions (period 1961-1990). The methodology compares different interpolation methods and alternatives to generate annual time series that minimise the bias with respect to observed values. The objective is to identify the best alternative for obtaining bias-corrected monthly runoff time series from the output of RCM simulations. This study uses information from 338 basins in Spain that cover the entire mainland territory and whose values of natural runoff have been estimated by the distributed hydrological model SIMPA. Four interpolation methods for downscaling runoff to the basin scale from 10 RCMs are compared, with emphasis on the ability of each method to reproduce the observed behaviour of this variable. The alternatives considered use either the direct runoff of the RCMs or the mean annual runoff calculated using five functional forms of the aridity index, defined as the ratio between potential evapotranspiration and precipitation. In addition, a comparison against the global runoff reference of the UNH/GRDC dataset is carried out as a contrast for the "best estimator" of current runoff at the large scale. Results show that the bias is minimised using the direct original interpolation method, and that the best alternative for bias correction of the monthly direct runoff time series of the RCMs is the UNH/GRDC dataset, although the formula proposed by Schreiber (1904) also gives good results.
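Among the functional forms of the aridity index, Schreiber's (1904) curve gives E/P = 1 - exp(-PET/P), so mean annual runoff follows directly as R = P - E, as in this one-function sketch with hypothetical basin values:

```python
import numpy as np

def schreiber_runoff(P, PET):
    """Mean annual runoff (mm/yr) from the Schreiber (1904) formula.

    Schreiber's curve gives E/P = 1 - exp(-phi), with aridity index
    phi = PET/P, so runoff is R = P - E = P * exp(-PET/P).
    """
    return P * np.exp(-PET / P)

# hypothetical basin: 600 mm/yr precipitation, 900 mm/yr potential ET
print(round(schreiber_runoff(P=600.0, PET=900.0), 1))   # approx 133.9 mm/yr
```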
Abstract:
Extreme events of maximum and minimum temperature are a major hazard for agricultural production in the Iberian Peninsula. In this study we analyze projections of their evolution that could be valid for the next decade, represented here by the 30-year period 2004-2034 (target period). Two kinds of data were used: 1) observations from the station network of AEMET (Spanish National Meteorological Agency) for five Spanish locations, and 2) simulated data on a 50 × 50 km horizontal grid derived from the outputs of twelve Regional Climate Models (RCMs) from the ENSEMBLES project (van der Linden and Mitchell, 2009), with a bias correction (Dosio and Paruolo, 2011; Dosio et al., 2012) relative to the observational dataset Spain02 (Herrera et al., 2012). To validate the simulated climate, the available period of observations was compared to a baseline period (1964-1994) of simulated climate for all locations. Then, to analyze the changes expected in the near future, probabilities of extreme temperature events for 2004-2034 were compared to those of the baseline period. Although only minor changes are expected, small variations in variability may have a significant impact on crop performance.
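Bias corrections of RCM temperatures against a gridded observational dataset, such as the cited one, are commonly quantile-based; a generic empirical quantile-mapping sketch, with synthetic temperature data and an arbitrarily chosen set of 101 quantile nodes:

```python
import numpy as np

def quantile_map(model_hist, obs, model_fut):
    """Empirical quantile mapping: learn the transfer from the model's
    historical distribution to the observed one, then apply it to
    values from the target period."""
    q = np.linspace(0.0, 1.0, 101)
    mod_q = np.quantile(model_hist, q)
    obs_q = np.quantile(obs, q)
    # map each target-period value through model quantiles -> observed ones
    return np.interp(model_fut, mod_q, obs_q)

rng = np.random.default_rng(8)
obs = rng.normal(30.0, 4.0, size=3000)           # observed Tmax, degC
model_hist = rng.normal(27.5, 5.0, size=3000)    # biased RCM baseline period
model_fut = rng.normal(28.5, 5.0, size=3000)     # RCM target period 2004-2034
corrected = quantile_map(model_hist, obs, model_fut)
print(round(corrected.mean(), 2), round(corrected.std(), 2))
```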