969 results for covariance estimator
Abstract:
Statistical graphics are a fundamental, yet often overlooked, set of components in the repertoire of data analytic tools. Graphs are quick, efficient, yet simple instruments for the preliminary exploration of a dataset, helping us to understand its structure and providing insight into influential aspects of inference such as departures from assumptions and latent patterns. In this paper, we present and assess a graphical device for choosing a method for estimating population size in capture-recapture studies of closed populations. The basic concept is derived from a homogeneous Poisson distribution, for which the ratio of neighboring Poisson probabilities, multiplied by the value of the larger neighboring count, is constant. This property extends to the zero-truncated Poisson distribution, which is of fundamental importance in capture-recapture studies. In practice, however, this distributional property is often violated. The graphical device developed here, the ratio plot, can be used for assessing specific departures from a Poisson distribution. For example, simple contaminations of an otherwise homogeneous Poisson model can be easily detected and a robust estimator for the population size can be suggested. Several robust estimators are developed and a simulation study is provided to give some guidance on which should be used in practice. More systematic departures can also easily be detected using the ratio plot. In this paper, the focus is on Gamma mixtures of the Poisson distribution, which lead to a linear pattern (called structured heterogeneity) in the ratio plot. More generally, the paper shows that the ratio plot is monotone for arbitrary mixtures of power series densities.
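As a minimal illustration of the ratio-plot idea sketched above (not the authors' code), the following Python snippet computes the empirical ratios r_x = (x+1) f_{x+1}/f_x from hypothetical capture frequencies; under a homogeneous Poisson model these ratios are roughly constant, while a linear increase points to a Gamma mixture.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical frequencies f_x of individuals captured exactly x times (x = 1, 2, ...);
# the zero count f_0 is unobserved in a capture-recapture study of a closed population.
x = np.arange(1, 7)
f = np.array([120, 65, 30, 12, 5, 2])

# Empirical ratio plot: r_x = (x + 1) * f_{x+1} / f_x.
# Roughly constant under a homogeneous Poisson model (its level estimates lambda);
# a linear trend in x suggests structured heterogeneity (a Gamma mixture).
r = (x[:-1] + 1) * f[1:] / f[:-1]

plt.plot(x[:-1], r, "o-")
plt.xlabel("x")
plt.ylabel("r_x = (x+1) f_{x+1} / f_x")
plt.title("Ratio plot")
plt.show()
```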
Obesity and diabetes, the built environment, and the ‘local’ food economy in the United States, 2007
Abstract:
Obesity and diabetes are increasingly attributed to environmental factors; however, little attention has been paid to the influence of the ‘local’ food economy. This paper examines the association of measures relating to the built environment and ‘local’ agriculture with U.S. county-level prevalence of obesity and diabetes. Key indicators of the ‘local’ food economy include the density of farmers’ markets and the presence of farms with direct sales. This paper employs a robust regression estimator to account for non-normality of the data and to accommodate outliers. Overall, the built environment is associated with the prevalence of obesity and diabetes, and a strong ‘local’ food economy may play an important role in prevention. Results imply considerable scope for community-level interventions.
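The abstract does not specify which robust regression estimator is used; the sketch below, with simulated county-level variables, shows one common choice (M-estimation with a Huber loss via statsmodels) purely as an illustration of how such a model can be set up.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical county-level data: obesity prevalence regressed on
# farmers' market density and an indicator for farms with direct sales.
n = 200
market_density = rng.gamma(2.0, 1.0, n)
direct_sales = rng.integers(0, 2, n)
obesity = 30 - 1.5 * market_density - 2.0 * direct_sales + rng.standard_t(3, n)

X = sm.add_constant(np.column_stack([market_density, direct_sales]))

# Robust regression (M-estimation with a Huber loss) down-weights outliers
# and is less sensitive to non-normal errors than ordinary least squares.
model = sm.RLM(obesity, X, M=sm.robust.norms.HuberT())
print(model.fit().summary())
```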
Abstract:
Modelling spatial covariance is an essential part of all geostatistical methods. Traditionally, parametric semivariogram models are fitted to available data. More recently, it has been suggested to use nonparametric correlograms obtained from spatially complete data fields. Here, both estimation techniques are compared. Nonparametric correlograms are shown to have a substantial negative bias. Nonetheless, when combined with the sample variance of the spatial field under consideration, they yield an estimate of the semivariogram that is unbiased for small lag distances. This justifies the use of this estimation technique in geostatistical applications. Various formulations of geostatistical combination (Kriging) methods are used here for the construction of hourly precipitation grids for Switzerland based on data from a sparse real-time network of raingauges and from a spatially complete radar composite. Two variants of Ordinary Kriging (OK) are used to interpolate the sparse gauge observations. In both OK variants, the radar data are only used to determine the semivariogram model. One variant relies on a traditional parametric semivariogram estimate, whereas the other variant uses the nonparametric correlogram. The variants are tested for three cases and the impact of the semivariogram model on the Kriging prediction is illustrated. For the three test cases, the method using nonparametric correlograms performs equally well or better than the traditional method, and at the same time offers great practical advantages. Furthermore, two variants of Kriging with external drift (KED) are tested, both of which use the radar data both to estimate nonparametric correlograms and as the external drift variable. The first KED variant has been used previously for geostatistical radar-raingauge merging in Catalonia (Spain). The second variant is newly proposed here and is an extension of the first. Both variants are evaluated for the three test cases as well as for an extended evaluation period. It is found that both methods yield merged fields of better quality than the original radar field or fields obtained by OK of gauge data. The newly suggested KED formulation is shown to be beneficial, in particular in mountainous regions where the quality of the Swiss radar composite is comparatively low. An analysis of the Kriging variances shows that none of the methods tested here provides a satisfactory uncertainty estimate. A suitable variable transformation is expected to improve this.
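A minimal sketch of the combination described above: a nonparametric correlogram estimated from a spatially complete field is turned into a small-lag semivariogram estimate by multiplying with the sample variance, gamma(h) ~ s^2 (1 - rho(h)). The field is synthetic and the lag search is restricted to one axis for brevity.

```python
import numpy as np

# Synthetic stand-in for a spatially complete field (e.g. a radar composite).
rng = np.random.default_rng(1)
field = rng.normal(size=(100, 100))

def correlogram_1d(field, max_lag):
    """Correlation between the field and copies shifted along one axis."""
    rho = []
    for h in range(1, max_lag + 1):
        a = field[:, :-h].ravel()
        b = field[:, h:].ravel()
        rho.append(np.corrcoef(a, b)[0, 1])
    return np.array(rho)

rho = correlogram_1d(field, max_lag=20)
gamma = np.var(field) * (1.0 - rho)   # semivariogram estimate for small lags
print(gamma[:5])
```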
Abstract:
The coarse spacing of automatic rain gauges complicates near-real-time spatial analyses of precipitation. We test the possibility of improving such analyses by considering, in addition to the in situ measurements, the spatial covariance structure inferred from past observations with a denser network. To this end, a statistical reconstruction technique, reduced space optimal interpolation (RSOI), is applied over Switzerland, a region of complex topography. RSOI consists of two main parts. First, principal component analysis (PCA) is applied to obtain a reduced-space representation of gridded high-resolution precipitation fields available for a multiyear calibration period in the past. Second, sparse real-time rain gauge observations are used to estimate the principal component scores and to reconstruct the precipitation field. In this way, climatological information at higher resolution than the near-real-time measurements is incorporated into the spatial analysis. PCA is found to efficiently reduce the dimensionality of the calibration fields, and RSOI is successful despite the difficulties associated with the statistical distribution of daily precipitation (skewness, dry days). Examples and a systematic evaluation show substantial added value over a simple interpolation technique that uses near-real-time observations only. The benefit is particularly strong for larger-scale precipitation and prominent topographic effects. Small-scale precipitation features are reconstructed with a skill comparable to that of the simple technique. Stratifying the reconstruction by weather type yields little added skill. Apart from application in near real time, RSOI may also be valuable for enhancing instrumental precipitation analyses for the historic past when direct observations were sparse.
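A minimal sketch of the two RSOI steps described above, on synthetic data: PCA of gridded calibration fields, followed by estimation of the leading principal-component scores from a few point observations and reconstruction of the field. Plain least squares is used here in place of the full optimal-interpolation weighting.

```python
import numpy as np

rng = np.random.default_rng(2)

n_days, n_grid, k, n_obs = 500, 400, 10, 25
calib = rng.normal(size=(n_days, n_grid))          # calibration fields (days x grid points)
mean = calib.mean(axis=0)
U, s, Vt = np.linalg.svd(calib - mean, full_matrices=False)
eofs = Vt[:k]                                       # leading spatial patterns (EOFs)

obs_idx = rng.choice(n_grid, n_obs, replace=False)  # sparse real-time gauge locations
truth = rng.normal(size=n_grid)                     # stand-in for today's field
y = truth[obs_idx]                                  # gauge observations

# Estimate the scores from the observed grid points, then reconstruct the full field.
H = eofs[:, obs_idx].T                              # (n_obs x k) reduced observation operator
scores, *_ = np.linalg.lstsq(H, y - mean[obs_idx], rcond=None)
reconstruction = mean + scores @ eofs
print(reconstruction.shape)
```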
Abstract:
The need for consistent assimilation of satellite measurements for numerical weather prediction led operational meteorological centers to assimilate satellite radiances directly using variational data assimilation systems. More recently there has been a renewed interest in assimilating satellite retrievals (e.g., to avoid the use of relatively complicated radiative transfer models as observation operators for data assimilation). The aim of this paper is to provide a rigorous and comprehensive discussion of the conditions for the equivalence between radiance and retrieval assimilation. It is shown that two requirements need to be satisfied for the equivalence: (i) the radiance observation operator needs to be approximately linear in a region of the state space centered at the retrieval and with a radius of the order of the retrieval error; and (ii) any prior information used to constrain the retrieval should not underrepresent the variability of the state, so as to retain the information content of the measurements. Both these requirements can be tested in practice. When these requirements are met, retrievals can be transformed so as to represent only the portion of the state that is well constrained by the original radiance measurements and can be assimilated in a consistent and optimal way, by means of an appropriate observation operator and a unit matrix as error covariance. Finally, specific cases when retrieval assimilation can be more advantageous (e.g., when the estimate sought by the operational assimilation system depends on the first guess) are discussed.
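Requirement (i) above can be checked numerically. The sketch below uses a hypothetical observation operator H, retrieval x_r and retrieval error covariance S_r, and compares the nonlinear response of H with its linearisation for perturbations of the size of the retrieval error.

```python
import numpy as np

rng = np.random.default_rng(3)

def H(x):                      # hypothetical (weakly nonlinear) observation operator
    return np.tanh(0.1 * x)

x_r = rng.normal(size=5)       # retrieval
S_r = 0.01 * np.eye(5)         # hypothetical retrieval error covariance

# Finite-difference Jacobian of H at the retrieval.
eps = 1e-6
J = np.array([(H(x_r + eps * e) - H(x_r)) / eps for e in np.eye(5)]).T

# Sample perturbations of the size of the retrieval error and compare
# the nonlinear response with its linearisation (requirement (i)).
d = rng.multivariate_normal(np.zeros(5), S_r, size=1000)
nonlin = np.array([H(x_r + di) for di in d])
lin = H(x_r) + d @ J.T
print("max relative departure from linearity:",
      np.max(np.abs(nonlin - lin)) / np.max(np.abs(nonlin - H(x_r))))
```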
Abstract:
Scintillometry is an established technique for determining large areal average sensible heat fluxes. The scintillometer measurement is related to sensible heat flux via Monin–Obukhov similarity theory, which was developed for ideal homogeneous land surfaces. In this study it is shown that judicious application of scintillometry over heterogeneous mixed agriculture on undulating topography yields valid results when compared to eddy covariance (EC). A large aperture scintillometer (LAS) over a 2.4 km path was compared with four EC stations measuring sensible (H) and latent (LvE) heat fluxes over different vegetation types (cereals and grass) which, when aggregated, were representative of the LAS source area. The partitioning of available energy into H and LvE varied strongly for the different vegetation types, with H varying by a factor of three between senesced winter wheat and grass pasture. The LAS-derived H agrees (one-to-one within the experimental uncertainty) with H aggregated from EC, with a high coefficient of determination of 0.94. Chronological analysis shows that individual fields may have a varying contribution to the areal average sensible heat flux on short (weekly) time scales due to phenological development and changing soil moisture conditions. Using spatially aggregated measurements of net radiation and soil heat flux with H from the LAS, the areal-averaged latent heat flux (LvE_LAS) was calculated as the residual of the surface energy balance. The regression of LvE_LAS against aggregated LvE from the EC stations has a slope of 0.94, close to ideal, and demonstrates that this is an accurate method for the landscape-scale estimation of evaporation over heterogeneous complex topography.
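A minimal numerical illustration of the residual closure mentioned above, with purely illustrative flux values (W m-2): the areal latent heat flux follows from aggregated net radiation and soil heat flux together with the scintillometer-derived sensible heat flux.

```python
# Illustrative values only (W m-2).
Rn = 450.0      # spatially aggregated net radiation
G = 60.0        # spatially aggregated soil heat flux
H_LAS = 180.0   # sensible heat flux from the large aperture scintillometer

# Latent heat flux as the residual of the surface energy balance.
LvE_LAS = Rn - G - H_LAS
print(f"LvE_LAS = {LvE_LAS:.0f} W m-2")
```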
Abstract:
This paper studies the effects of increasing formality via tax reduction and simplification schemes on micro-firm performance, using the 1997 Brazilian SIMPLES program. We develop a simple theoretical model to show that SIMPLES has an impact only on a segment of the micro-firm population, for which the effect of formality on firm performance can be identified and analyzed along the quantiles of the conditional firm-revenue distribution. To estimate the effect of formality, we use an econometric approach that compares eligible and non-eligible firms born in a local interval around the introduction of SIMPLES. We use an estimator that combines quantile regression with a regression discontinuity identification strategy. The empirical results corroborate the positive effect of formality on micro-firms' performance and produce a clear characterization of who benefits from these programs.
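The sketch below illustrates the flavour of the estimator (quantile regression combined with a discontinuity-style comparison of eligible and non-eligible firms born around the reform) on simulated data; it is not the authors' specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 2000
born_after = rng.integers(0, 2, n)          # firm born after the programme's introduction
eligible = rng.integers(0, 2, n)            # firm eligible for the programme
treated = born_after * eligible             # only eligible firms born after are exposed
log_revenue = 1.0 + 0.3 * treated + 0.1 * eligible + 0.05 * born_after + rng.gumbel(0, 1, n)

df = pd.DataFrame(dict(log_revenue=log_revenue, treated=treated,
                       eligible=eligible, born_after=born_after))

# Effect of exposure at different quantiles of conditional firm revenue.
for q in (0.25, 0.5, 0.75):
    fit = smf.quantreg("log_revenue ~ treated + eligible + born_after", df).fit(q=q)
    print(f"q = {q}: treated effect = {fit.params['treated']:.3f}")
```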
Abstract:
In numerical weather prediction (NWP), data assimilation (DA) methods are used to combine available observations with numerical model estimates. This is done by minimising measures of error on both the observations and the model estimates, with more weight given to data that can be trusted more. For any DA method, an estimate of the initial forecast error covariance matrix is required. For convective-scale data assimilation, however, the properties of the error covariances are not well understood. An effective way to investigate covariance properties in the presence of convection is to use an ensemble-based method, for which an estimate of the error covariance is readily available at each time step. In this work, we investigate the performance of the ensemble square root filter (EnSRF) in the presence of cloud growth, applied to an idealised 1D convective-column model of the atmosphere. We show that the EnSRF performs well in capturing cloud growth, but the ensemble does not cope well with discontinuities introduced into the system by parameterised rain. The state estimates lose accuracy and, more importantly, the ensemble is unable to capture the spread (variance) of the estimates correctly. We also find, counter-intuitively, that by reducing the spatial frequency of observations and/or the accuracy of the observations, the ensemble is able to capture the states and their variability successfully across all regimes.
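For reference, a minimal serial ensemble square root filter (EnSRF) update for a single, directly observed state variable, in the Whitaker-Hamill form; the state and observation below are synthetic stand-ins rather than the 1D convective column model.

```python
import numpy as np

rng = np.random.default_rng(5)

n, N = 50, 20                       # state dimension, ensemble size
X = rng.normal(size=(n, N))         # forecast ensemble (columns are members)
obs_idx, y_obs, R = 10, 1.2, 0.1    # observe one grid point with error variance R

x_mean = X.mean(axis=1)
Xp = X - x_mean[:, None]            # ensemble perturbations

y_ens = X[obs_idx]                  # observation operator applied to each member
y_mean = y_ens.mean()
yp = y_ens - y_mean

s = yp @ yp / (N - 1)               # H P^f H^T (scalar)
PxH = Xp @ yp / (N - 1)             # P^f H^T (vector)
K = PxH / (s + R)                   # Kalman gain

x_mean = x_mean + K * (y_obs - y_mean)          # mean update
alpha = 1.0 / (1.0 + np.sqrt(R / (s + R)))      # square root factor
Xp = Xp - alpha * np.outer(K, yp)               # perturbation update (no perturbed obs)

X_analysis = x_mean[:, None] + Xp
print(X_analysis.shape)
```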
Abstract:
A direct method is presented for determining the uncertainty in reservoir pressure, flow, and net present value (NPV) using the time-dependent, single-phase, two- or three-dimensional equations of flow through a porous medium. The uncertainty in the solution is modelled as a probability distribution function and is computed from given statistical data for input parameters such as permeability. The method generates an expansion for the mean of the pressure about a deterministic solution to the system equations, using a perturbation to the mean of the input parameters. Hierarchical equations that define approximations to the mean solution at each point and to the field covariance of the pressure are developed and solved numerically. The procedure is then used to find the statistics of the flow and the risked value of the field, defined by the NPV, for a given development scenario. This method involves only one (albeit complicated) solution of the equations and contrasts with the more usual Monte Carlo approach, in which many such solutions are required. The procedure is easily applied to other physical systems modelled by linear or nonlinear partial differential equations with uncertain data.
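A toy illustration of the contrast drawn above between a single perturbation-expansion solve and a Monte Carlo ensemble of solves, using a scalar stand-in p(k) = q/k for the flow equations (this model and its parameters are purely illustrative).

```python
import numpy as np

rng = np.random.default_rng(6)

q = 1.0
k_mean, k_std = 2.0, 0.3

p = lambda k: q / k   # toy "pressure drop" for permeability k at fixed rate q

# Perturbation estimate about the deterministic solution at the mean parameter:
# E[p] ~ p(k_mean) + 0.5 * p''(k_mean) * Var[k]  (one forward solve plus derivatives).
d2p = 2.0 * q / k_mean**3
p_perturbation = p(k_mean) + 0.5 * d2p * k_std**2

# Monte Carlo estimate (many forward solves instead of one).
k_samples = rng.normal(k_mean, k_std, 100_000)
p_mc = p(k_samples).mean()

print(f"perturbation: {p_perturbation:.4f}, Monte Carlo: {p_mc:.4f}")
```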
Abstract:
For data assimilation in numerical weather prediction, the initial forecast-error covariance matrix Pf is required. For variational assimilation it is particularly important to prescribe an accurate initial matrix Pf, since Pf is either static (in the 3D-Var case) or constant at the beginning of each assimilation window (in the 4D-Var case). At large scales the atmospheric flow is well approximated by hydrostatic balance, and this balance is strongly enforced in the initial matrix Pf used in operational variational assimilation systems such as that of the Met Office. At convective scales, however, this balance does not necessarily hold. Here we examine the extent to which hydrostatic balance is valid in the vertical forecast-error covariances of high-resolution models, in order to determine whether there is a need to relax this balance constraint in convective-scale data assimilation. We use the Met Office Global and Regional Ensemble Prediction System (MOGREPS) and a 1.5 km resolution version of the Unified Model for a case study characterized by the presence of convective activity. An ensemble of high-resolution forecasts valid up to three hours after the onset of convection is produced. We show that at 1.5 km resolution hydrostatic balance does not hold for forecast errors in regions of convection. This indicates that in the presence of convection hydrostatic balance should not be enforced in the covariance matrix used for variational data assimilation at this scale. The results show the need to investigate covariance models that may be better suited to convective-scale data assimilation. Finally, we give a measure of the balance present in the forecast perturbations as a function of the horizontal scale (from 3–90 km) using a set of diagnostics.
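One simple way to quantify how well forecast perturbations satisfy hydrostatic balance (not the paper's diagnostics) is to compare the vertical pressure-gradient and buoyancy terms of each perturbation on a column, as in the synthetic sketch below.

```python
import numpy as np

g = 9.81
z = np.linspace(0.0, 10_000.0, 101)                 # vertical levels (m)
rng = np.random.default_rng(7)

n_members = 24
rho_pert = 1e-3 * rng.normal(size=(n_members, z.size))          # density perturbations
p_pert = np.cumsum(-g * rho_pert * np.gradient(z), axis=1)      # hydrostatically balanced part ...
p_pert += 0.5 * rng.normal(size=p_pert.shape)                   # ... plus unbalanced noise

# Linearised hydrostatic balance requires dp'/dz + g * rho' = 0,
# so the size of this residual measures the unbalanced part of each perturbation.
dpdz = np.gradient(p_pert, z, axis=1)
residual = dpdz + g * rho_pert
ratio = np.linalg.norm(residual) / np.linalg.norm(dpdz)
print(f"unbalanced fraction: {ratio:.2f}")
```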
Abstract:
This paper describes the implementation of a 3D variational (3D-Var) data assimilation scheme for a morphodynamic model applied to Morecambe Bay, UK. A simple decoupled hydrodynamic and sediment transport model is combined with a data assimilation scheme to investigate the ability of such methods to improve the accuracy of the predicted bathymetry. The inverse of the forecast error covariance matrix is modelled using a Laplacian approximation, which is calibrated for the required length-scale parameter. Calibration is also performed for the Soulsby-van Rijn sediment transport equations. The data used for assimilation comprise waterlines derived from SAR imagery covering the entire period of the model run, and swath bathymetry data collected by a ship-borne survey for one date towards the end of the model run. A LiDAR survey of the entire bay carried out in November 2005 is used for validation. The comparison of the predictive ability of the model alone with the model-forecast-assimilation system demonstrates that using data assimilation significantly improves the forecast skill. An investigation of the assimilation of the swath bathymetry as well as the waterlines shows that the overall improvement is initially large, but decreases over time as the bathymetry evolves away from that observed by the survey. Combining the calibration runs into a pseudo-ensemble provides a higher skill score than a single optimized model run. A brief comparison of the Optimal Interpolation assimilation method with the 3D-Var method shows that the two schemes give similar results.
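A minimal 1D sketch of a 3D-Var analysis in which the inverse background-error covariance is modelled with a Laplacian smoothness term, loosely mirroring the scheme described above; the bathymetry anomalies, observations and length-scale parameter are all synthetic.

```python
import numpy as np
from scipy.optimize import minimize

n = 100
xb = np.zeros(n)                          # background bathymetry (anomalies)
obs_idx = np.arange(5, n, 10)             # sparse waterline / swath observation points
rng = np.random.default_rng(8)
y = np.sin(np.linspace(0, 2 * np.pi, n))[obs_idx] + 0.05 * rng.normal(size=obs_idx.size)

sigma_b, sigma_o, L = 1.0, 0.05, 5.0

# Inverse background covariance modelled as (I - L^2 * Laplacian) / sigma_b^2.
lap = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
Binv = (np.eye(n) - L**2 * lap) / sigma_b**2

def cost(x):
    db = x - xb
    do = y - x[obs_idx]
    return 0.5 * db @ Binv @ db + 0.5 * (do @ do) / sigma_o**2

xa = minimize(cost, xb, method="L-BFGS-B").x   # analysis
print(xa[obs_idx][:3], y[:3])
```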
Abstract:
Background: Microarray-based comparative genomic hybridisation (CGH) experiments have been used to study numerous biological problems, including understanding genome plasticity in pathogenic bacteria. Typically such experiments produce large data sets that are difficult for biologists to handle. Although there are some programmes available for interpreting bacterial transcriptomics data and CGH microarray data for examining genetic stability in oncogenes, there are none specifically designed to understand the mosaic nature of bacterial genomes. Consequently, a bottleneck persists in the accurate processing and mathematical analysis of these data. To address this shortfall we have produced a simple and robust CGH microarray data analysis process, which may be automated in the future, to understand bacterial genomic diversity. Results: The process involves five steps: cleaning, normalisation, estimating gene presence and absence or divergence, validation, and analysis of data from test strains against three reference strains simultaneously. Each stage of the process is described, and we have compared a number of methods available for characterising bacterial genomic diversity and for calculating the cut-off between gene presence and absence or divergence, showing that a simple dynamic approach using a kernel density estimator performed better than the established methods as well as a more sophisticated mixture modelling technique. We have also shown that current methods commonly used for CGH microarray analysis in tumour and cancer cell lines are not appropriate for analysing our data. Conclusion: After carrying out the analysis and validation for three sequenced Escherichia coli strains, CGH microarray data from 19 E. coli O157 pathogenic test strains were used to demonstrate the benefits of applying this simple and robust process to CGH microarray studies using bacterial genomes.
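A minimal sketch of the kernel-density step described above: a Gaussian KDE is fitted to (simulated) log-ratio data and the presence/absence cut-off is placed at the density minimum between the two modes. The two-component simulation is an assumption for illustration only.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(9)
log_ratios = np.concatenate([rng.normal(0.0, 0.3, 3800),     # "present" genes
                             rng.normal(-2.0, 0.5, 200)])    # "absent / divergent" genes

kde = gaussian_kde(log_ratios)
grid = np.linspace(log_ratios.min(), log_ratios.max(), 1000)
density = kde(grid)

# Choose the cut-off at the density minimum between the two modes.
lo, hi = np.searchsorted(grid, (-2.0, 0.0))
cutoff = grid[lo + np.argmin(density[lo:hi])]
print(f"presence/absence cut-off at log-ratio {cutoff:.2f}")
```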
Abstract:
This dissertation deals with aspects of sequential data assimilation (in particular, ensemble Kalman filtering) and numerical weather forecasting. In the first part, the recently formulated Ensemble Kalman-Bucy filter (EnKBF) is revisited. It is shown that the previously used numerical integration scheme fails when the magnitude of the background error covariance grows beyond that of the observational error covariance in the forecast window. We therefore present a suitable integration scheme that handles the stiffening of the differential equations involved and does not entail further computational expense. Moreover, a transform-based alternative to the EnKBF is developed: under this scheme, the operations are performed in the ensemble space instead of in the state space. Advantages of this formulation are explained. For the first time, the EnKBF is implemented in an atmospheric model. The second part of this work deals with ensemble clustering, a phenomenon that arises when performing data assimilation using deterministic ensemble square root filters (EnSRFs) in highly nonlinear forecast models: an M-member ensemble detaches into an outlier and a cluster of M-1 members. Previous works may suggest that this issue represents a failure of EnSRFs; this work dispels that notion. It is shown that ensemble clustering can also be reverted by nonlinear processes, in particular the alternation between nonlinear expansion and compression of the ensemble in different regions of the attractor. Some EnSRFs that use random rotations have been developed to overcome this issue; these formulations are analyzed and their advantages and disadvantages with respect to common EnSRFs are discussed. The third and last part contains the implementation of the Robert-Asselin-Williams (RAW) filter in an atmospheric model. The RAW filter is an improvement to the popular Robert-Asselin filter that successfully suppresses spurious computational waves while avoiding any distortion in the mean value of the function. Using statistical significance tests at both the local and the field level, it is shown that the climatology of the SPEEDY model is not modified by the changed time-stepping scheme; hence, no retuning of the parameterizations is required. It is found that the accuracy of the medium-term forecasts is increased by using the RAW filter.
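A minimal sketch of the Robert-Asselin-Williams (RAW) filter applied to leapfrog time stepping of a simple oscillator (not the SPEEDY model); with alpha = 1 the scheme reduces to the classical Robert-Asselin filter, while alpha near 0.5 largely removes the spurious damping of the physical mode.

```python
import numpy as np

omega, dt, nsteps = 1.0, 0.2, 500
nu, alpha = 0.2, 0.53

f = lambda x: 1j * omega * x            # dx/dt = i * omega * x (neutral oscillation)

x_prev = 1.0 + 0j
x_curr = x_prev + dt * f(x_prev)        # first step: forward Euler
for _ in range(nsteps):
    x_next = x_prev + 2.0 * dt * f(x_curr)          # leapfrog step
    d = 0.5 * nu * (x_prev - 2.0 * x_curr + x_next) # filter displacement
    x_curr_filtered = x_curr + alpha * d            # RAW: filter the middle time level ...
    x_next = x_next + (alpha - 1.0) * d             # ... and slightly adjust the new level
    x_prev, x_curr = x_curr_filtered, x_next

# Amplitude decays far less than with the classical Robert-Asselin filter (alpha = 1).
print(abs(x_curr))
```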
Abstract:
We show that the four-dimensional variational data assimilation method (4DVar) can be interpreted as a form of Tikhonov regularization, a very familiar method for solving ill-posed inverse problems. It is known from image restoration problems that L1-norm penalty regularization recovers sharp edges in the image more accurately than Tikhonov, or L2-norm, penalty regularization. We apply this idea from stationary inverse problems to 4DVar, a dynamical inverse problem, and give examples for an L1-norm penalty approach and a mixed total variation (TV) L1–L2-norm penalty approach. For problems with model error where sharp fronts are present and the background and observation error covariances are known, the mixed TV L1–L2-norm penalty performs better than either the L1-norm method or the strong constraint 4DVar (L2-norm) method. A strength of the mixed TV L1–L2-norm regularization is that, in the case where a simplified form of the background error covariance matrix is used, it produces a much more accurate analysis than 4DVar. The method thus has the potential in numerical weather prediction to overcome operational problems with poorly tuned background error covariance matrices.
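A toy stationary inverse problem illustrating the L2-versus-L1 contrast mentioned above: recovering a signal with a sharp front from blurred, noisy data using a Tikhonov (L2) penalty and a smoothed total-variation penalty. This is not the 4DVar system of the paper.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(10)
n = 100
x_true = np.where(np.arange(n) < n // 2, 0.0, 1.0)          # signal with a sharp front

# Simple Gaussian blurring observation operator plus noise.
H = np.array([[np.exp(-0.5 * ((i - j) / 2.0) ** 2) for j in range(n)] for i in range(n)])
H /= H.sum(axis=1, keepdims=True)
y = H @ x_true + 0.01 * rng.normal(size=n)

D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]                     # first-difference operator

# Tikhonov (L2) solution in closed form.
lam2 = 1.0
x_l2 = np.linalg.solve(H.T @ H + lam2 * D.T @ D, H.T @ y)

# Total-variation (L1-type) penalty with a smoothed absolute value, minimised numerically.
lam1, eps = 0.05, 1e-6
cost = lambda x: 0.5 * np.sum((H @ x - y) ** 2) + lam1 * np.sum(np.sqrt((D @ x) ** 2 + eps))
x_tv = minimize(cost, x_l2, method="L-BFGS-B").x

print("L2 max error:", np.abs(x_l2 - x_true).max(), "TV max error:", np.abs(x_tv - x_true).max())
```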
Abstract:
A neurofuzzy classifier identification algorithm is introduced for two-class problems. The initial fuzzy base construction is based on fuzzy clustering utilizing a Gaussian mixture model (GMM) and an analysis of variance (ANOVA) decomposition. The expectation-maximization (EM) algorithm is applied to determine the parameters of the fuzzy membership functions. The neurofuzzy model is then identified via the supervised subspace orthogonal least squares (OLS) algorithm. Finally, a logistic regression model is applied to produce the class probability. The effectiveness of the proposed neurofuzzy classifier has been demonstrated using a real data set.
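A minimal sketch of the overall pipeline shape (EM-fitted Gaussian mixture components used as basis-function activations, followed by a logistic output model for the class probability); it omits the ANOVA decomposition and the orthogonal least squares model selection described above, and uses a synthetic data set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture

# Synthetic two-class data set standing in for the real data of the paper.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)

# Fit a Gaussian mixture model via EM and use the component responsibilities
# as basis-function activations, in the spirit of fuzzy membership functions.
gmm = GaussianMixture(n_components=6, random_state=0).fit(X)
Phi = gmm.predict_proba(X)

# Logistic regression on the activations produces the class probability.
clf = LogisticRegression(max_iter=1000).fit(Phi, y)
print("training accuracy:", clf.score(Phi, y))
```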