916 results for Sampling (Statistics)
Abstract:
Geophysical tomography captures the spatial distribution of the underlying geophysical property at a relatively high resolution, but the tomographic images tend to be blurred representations of reality and generally fail to reproduce sharp interfaces. Such models may cause significant bias when taken as a basis for predictive flow and transport modeling and are unsuitable for uncertainty assessment. We present a methodology in which tomograms are used to condition multiple-point statistics (MPS) simulations. A large set of geologically reasonable facies realizations and their corresponding synthetically calculated cross-hole radar tomograms are used as a training image. The training image is scanned with a direct sampling algorithm for patterns matching the conditioning tomogram, while accounting for the spatially varying resolution of the tomograms. In a post-processing step, only those conditional simulations that predict the radar traveltimes within the expected data error levels are accepted. The methodology is demonstrated on a two-facies example featuring channels and on an aquifer analog of alluvial sedimentary structures with five facies. For both cases, MPS simulations exhibit the sharp interfaces and the geological patterns found in the training image. Compared to unconditioned MPS simulations, the uncertainty in transport predictions is markedly decreased for simulations conditioned to tomograms. As an improvement over other approaches relying on classical smoothness-constrained geophysical tomography, the proposed method allows for: (1) reproduction of sharp interfaces, (2) incorporation of realistic geological constraints and (3) generation of multiple realizations, enabling uncertainty assessment.
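As a rough illustration of the conditioning idea, the sketch below applies a direct-sampling-style scan to a synthetic 1-D training pair. The moving-average blur (standing in for tomographic smoothing), the window size, and the distance threshold are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def blur(x, width=5):
    """Moving-average stand-in for the smoothing of tomographic inversion."""
    return np.convolve(x, np.ones(width) / width, mode="same")

# Training pair: a binary facies realization and its synthetic "tomogram".
n = 200
ti_facies = (np.cumsum(rng.normal(size=n)) > 0).astype(float)
ti_tomo = blur(ti_facies)

# Conditioning tomogram computed from an unknown "true" facies field.
true_facies = (np.cumsum(rng.normal(size=n)) > 0).astype(float)
cond_tomo = blur(true_facies)

# Direct-sampling-style scan: at each node, search random training locations
# and copy the facies value whose local tomogram pattern best matches the
# conditioning tomogram (accepting the first match under a threshold).
half, thresh, max_scan = 4, 0.005, 500
sim = np.empty(n)
for i in range(half, n - half):
    target = cond_tomo[i - half:i + half + 1]
    best_j, best_d = half, np.inf
    for j in rng.integers(half, n - half, size=max_scan):
        d = np.mean((ti_tomo[j - half:j + half + 1] - target) ** 2)
        if d < best_d:
            best_j, best_d = j, d
        if best_d < thresh:
            break
    sim[i] = ti_facies[best_j]
sim[:half], sim[-half:] = sim[half], sim[n - half - 1]

print("agreement with the true facies:", np.mean(sim == true_facies))
```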
Abstract:
Turbulence statistics obtained by direct numerical simulations are analysed to investigate spatial heterogeneity within regular arrays of building-like cubical obstacles. Two different array layouts are studied, staggered and square, both at a packing density of $\lambda_p = 0.25$. The flow statistics analysed are mean streamwise velocity ($\overline{u}$), shear stress ($\overline{u'w'}$), turbulent kinetic energy ($k$) and dispersive stress fraction ($\tilde{u}\tilde{w}$). The spatial flow patterns and spatial distribution of these statistics in the two arrays are found to be very different. Local regions of high spatial variability are identified. The overall spatial variances of the statistics are shown to be generally very significant in comparison with their spatial averages within the arrays. Above the arrays the spatial variances as well as dispersive stresses decay rapidly to zero. The heterogeneity is explored further by separately considering six different flow regimes identified within the arrays, described here as: channelling region, constricted region, intersection region, building wake region, canyon region and front-recirculation region. It is found that the flow in the first three regions is relatively homogeneous, but that spatial variances in the latter three regions are large, especially in the building wake and canyon regions. The implication is that, in general, the flow immediately behind (and, to a lesser extent, in front of) a building is much more heterogeneous than elsewhere, even in the relatively dense arrays considered here. Most of the dispersive stress is concentrated in these regions. Considering the experimental difficulties of obtaining enough point measurements to form a representative spatial average, the error incurred by degrading the sampling resolution is investigated. It is found that a good estimate for both area and line averages can be obtained using a relatively small number of strategically located sampling points.
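The decomposition behind these statistics is easy to reproduce numerically. The sketch below computes the spatial average, the spatial variance, and the dispersive stress $\tilde{u}\tilde{w}$ on a synthetic time-averaged field, and mimics the degraded-resolution experiment with coarse point grids; the fields are random placeholders, not DNS data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative time-averaged velocity fields on a horizontal plane
# within the array (synthetic values, not DNS output).
nx, ny = 64, 64
u_bar = 1.0 + 0.3 * rng.standard_normal((nx, ny))   # time-mean streamwise velocity
w_bar = 0.05 * rng.standard_normal((nx, ny))        # time-mean vertical velocity

# Dispersive components are deviations from the spatial average:
# u_tilde = u_bar - <u_bar>; the dispersive stress is <u_tilde * w_tilde>.
u_tilde = u_bar - u_bar.mean()
w_tilde = w_bar - w_bar.mean()
dispersive_stress = (u_tilde * w_tilde).mean()
spatial_variance = u_tilde.var()

print(f"<u> = {u_bar.mean():.3f}, spatial var(u) = {spatial_variance:.3f}, "
      f"dispersive stress = {dispersive_stress:.4f}")

# Degrading the sampling resolution: estimate <u> from an m x m grid of
# evenly spaced points and compare with the full-field average.
for m in (2, 4, 8):
    xi = np.linspace(0, nx - 1, m).astype(int)
    yi = np.linspace(0, ny - 1, m).astype(int)
    print(f"{m * m:3d} points: <u> estimate = {u_bar[np.ix_(xi, yi)].mean():.3f}")
```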
Abstract:
Long-term monitoring of forest soils as part of a pan-European network to detect environmental change depends on an accurate determination of the mean of the soil properties at each monitoring event. However, forest soil is known to be highly variable spatially. A study was undertaken to explore and quantify this variability at three forest monitoring plots in Britain. Detailed soil sampling was carried out, and the data from the chemical analyses were analysed by classical statistics and geostatistics. An analysis of variance showed that there were no consistent effects from the sample sites in relation to the position of the trees. The variogram analysis showed that there was spatial dependence at each site for several variables, and some varied in an apparently periodic way. An optimal sampling analysis based on the multivariate variogram for each site suggested that a bulked sample from 36 cores would reduce the error to an acceptable level. Future sampling should be designed so that it neither targets nor avoids trees and disturbed ground. This is best achieved with a stratified random sampling design.
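For reference, the classical (Matheron) empirical semivariogram underlying such a variogram analysis can be computed as below; the transect and its quasi-periodic structure are synthetic stand-ins for the plot data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic 1-D transect of a soil property: a quasi-periodic signal
# plus noise, loosely echoing the periodic variation reported above.
x = np.arange(0.0, 100.0, 1.0)
z = np.sin(x / 8.0) + 0.5 * rng.standard_normal(x.size)

def semivariogram(x, z, lags):
    """Matheron estimator: gamma(h) = mean of squared pair differences / 2."""
    d = np.abs(x[:, None] - x[None, :])        # pairwise separation distances
    zdiff2 = (z[:, None] - z[None, :]) ** 2    # squared pairwise differences
    return np.array([0.5 * zdiff2[np.isclose(d, h)].mean() for h in lags])

lags = np.arange(1, 21)
for h, g in zip(lags, semivariogram(x, z, lags)):
    print(f"h = {h:2d}: gamma = {g:.3f}")
```

A rise and fall of gamma with lag (a "hole effect") is the variogram signature of the apparently periodic variation the abstract mentions.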
Abstract:
It is common practice to design a survey with a large number of strata. However, in this case the usual techniques for variance estimation can be inaccurate. This paper proposes a variance estimator for estimators of totals. The method proposed can be implemented with standard statistical packages without any specific programming, as it involves simple techniques of estimation, such as regression fitting.
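To make the setting concrete, the sketch below implements the textbook stratified estimator of a total and its usual variance estimator, whose instability with many strata and tiny per-stratum samples is what motivates the proposal. This is the baseline, not the paper's regression-based estimator, and all values are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stratified simple random sampling with many strata and only two
# sampled units per stratum (the regime where variance estimation is hard).
H = 200
N_h = rng.integers(20, 60, H)            # stratum population sizes
n_h = np.full(H, 2)                      # sampled units per stratum
mu_h = rng.normal(10, 3, H)              # hypothetical stratum means
y = mu_h[:, None] + rng.normal(0, 1, (H, 2))   # sampled values

# Stratified expansion estimator of the population total.
t_hat = (N_h * y.mean(axis=1)).sum()

# Textbook variance estimator: needs a within-stratum variance s2_h,
# which is very noisy when n_h is this small.
s2_h = y.var(axis=1, ddof=1)
v_hat = (N_h ** 2 * (1 - n_h / N_h) * s2_h / n_h).sum()

print(f"estimated total: {t_hat:.1f}")
print(f"textbook variance estimate: {v_hat:.1f}")
```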
Abstract:
The systematic sampling (SYS) design (Madow and Madow, 1944) is widely used by statistical offices due to its simplicity and efficiency (e.g., Iachan, 1982). But it suffers from a serious defect: it is impossible to estimate the sampling variance unbiasedly (Iachan, 1982), and the usual variance estimators (Yates and Grundy, 1953) are inadequate and can significantly overestimate the variance (Särndal et al., 1992). We propose a novel variance estimator that is less biased and can be implemented with any given population order. We justify this estimator theoretically and with a Monte Carlo simulation study.
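A widely used less-biased alternative, shown here purely as a baseline and not as the paper's proposed estimator, is the successive-difference estimator. The sketch compares it and the naive SRS-formula estimate against the exact design variance on a trended population, where the overestimation problem is most visible.

```python
import numpy as np

rng = np.random.default_rng(4)

# Population with a smooth trend: SYS is efficient here, but the naive
# SRS variance formula badly overstates its variance.
N, n = 1000, 50
k = N // n                                   # sampling interval
y = np.linspace(0, 10, N) + rng.normal(0, 1, N)

start = rng.integers(k)
sample = y[start::k][:n]                     # one systematic sample
f = n / N

# Naive estimate: pretend the SYS sample is a simple random sample.
v_srs = (1 - f) * sample.var(ddof=1) / n

# Successive-difference estimator: differencing removes the trend.
d = np.diff(sample)
v_sd = (1 - f) * (d ** 2).sum() / (2 * n * (n - 1))

# Exact design variance of the SYS mean: enumerate all k possible starts.
means = np.array([y[s::k][:n].mean() for s in range(k)])
print(f"true SYS variance: {means.var():.5f}")
print(f"naive SRS formula: {v_srs:.5f}   successive-difference: {v_sd:.5f}")
```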
Abstract:
This paper presents our experience with combining statistical principles and participatory methods to generate national statistics. The methodology was developed in Malawi during 1999–2002. We demonstrate that if participatory rural appraisal (PRA) is combined with statistical principles (including probability-based sampling and standardization), it can produce total population statistics and estimates of the proportion of households with certain characteristics (e.g., poverty). It can also provide quantitative data on complex issues of national importance such as poverty targeting. This approach is distinct from previous PRA-based approaches, which generate numbers at community level but provide only qualitative information at national level.
Abstract:
Mixed models may be defined with or without reference to sampling, and can be used to predict realized random effects, as when estimating the latent values of study subjects measured with response error. When the model is specified without reference to sampling, a simple mixed model includes two random variables, one stemming from an exchangeable distribution of latent values of study subjects and the other, from the study subjects' response error distributions. Positive probabilities are assigned to both potentially realizable responses and artificial responses that are not potentially realizable, resulting in artificial latent values. In contrast, finite population mixed models represent the two-stage process of sampling subjects and measuring their responses, where positive probabilities are only assigned to potentially realizable responses. A comparison of the estimators over the same potentially realizable responses indicates that the optimal linear mixed model estimator (the usual best linear unbiased predictor, BLUP) is often (but not always) more accurate than the comparable finite population mixed model estimator (the FPMM BLUP). We examine a simple example and provide the basis for a broader discussion of the role of conditioning, sampling, and model assumptions in developing inference.
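A minimal simulation of the simple mixed model described above, with the mean and variance components assumed known, shows the shrinkage form of the BLUP and its typical accuracy gain over raw subject means. It does not implement the finite population mixed model comparison; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

# Latent values of study subjects, each measured n times with response error.
m, n = 200, 5
mu, sig_b, sig_e = 50.0, 4.0, 8.0
b = rng.normal(mu, sig_b, m)                   # realized latent values
y = b[:, None] + rng.normal(0, sig_e, (m, n))  # noisy responses

ybar = y.mean(axis=1)

# BLUP shrinks each subject mean toward mu by the reliability ratio.
shrink = sig_b ** 2 / (sig_b ** 2 + sig_e ** 2 / n)
blup = mu + shrink * (ybar - mu)

print(f"shrinkage factor: {shrink:.3f}")
print(f"MSE of raw subject means: {np.mean((ybar - b) ** 2):.2f}")
print(f"MSE of BLUP:              {np.mean((blup - b) ** 2):.2f}")
```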
Abstract:
The power-law size distributions obtained experimentally for neuronal avalanches are important evidence of criticality in the brain. This evidence is supported by the fact that a critical branching process exhibits the same exponent $\tau \sim 3/2$. Models at criticality have been employed to mimic avalanche propagation and explain the statistics observed experimentally. However, a crucial aspect of neuronal recordings has been almost completely neglected in the models: undersampling. While in a typical multielectrode array hundreds of neurons are recorded, in the same area of neuronal tissue tens of thousands of neurons can be found. Here we investigate the consequences of undersampling in models with three different topologies (two-dimensional, small-world and random network) and three different dynamical regimes (subcritical, critical and supercritical). We found that undersampling modifies avalanche size distributions, extinguishing the power laws observed in critical systems. Distributions from subcritical systems are also modified, but the shape of the undersampled distributions is more similar to that of a fully sampled system. Undersampled supercritical systems can recover the general characteristics of the fully sampled version, provided that enough neurons are measured. Undersampling in two-dimensional and small-world networks leads to similar effects, while the random network is insensitive to sampling density due to the lack of a well-defined neighborhood. We conjecture that neuronal avalanches recorded from local field potentials avoid undersampling effects due to the nature of this signal, but the same does not hold for spike avalanches. We conclude that undersampled branching-process-like models in these topologies fail to reproduce the statistics of spike avalanches.
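A crude version of this experiment can be run with a branching-process-like model on a random network: simulate avalanches, then count only the spikes landing on a recorded subset of neurons. The network size, electrode count, and the spike-counting shortcut below are illustrative simplifications (real analyses re-bin the sampled spikes into avalanches, which this sketch skips).

```python
import numpy as np

rng = np.random.default_rng(6)

# Branching-process-like dynamics on a random network, observed through
# a small set of "electrode" neurons.
N, n_rec = 5000, 50          # neurons in the tissue vs. neurons recorded
k, p_out = 10, 0.1           # k random targets per spike; k * p_out = 1 (critical)

rec_mask = np.zeros(N, bool)
rec_mask[rng.choice(N, n_rec, replace=False)] = True

def avalanche():
    """One avalanche from a single seed; returns (true size, recorded spikes)."""
    active = {int(rng.integers(N))}
    size = recorded = 0
    while active:
        size += len(active)
        recorded += int(rec_mask[list(active)].sum())
        nxt = set()
        for _ in active:
            targets = rng.integers(0, N, k)
            nxt.update(targets[rng.random(k) < p_out].tolist())
        active = nxt
        if size > 5 * N:     # guard against runaway cascades
            break
    return size, recorded

sizes, rec = zip(*(avalanche() for _ in range(2000)))
visible = [s for s in rec if s > 0]   # avalanches the electrodes see at all
print(f"mean full size: {np.mean(sizes):.1f}")
print(f"fraction observed: {len(visible) / len(rec):.2f}, "
      f"mean recorded size: {np.mean(visible):.1f}")
```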
Abstract:
In this article, we consider the $T^2$ chart with double sampling to control bivariate processes (BDS chart). During the first stage of the sampling, $n_1$ items of the sample are inspected and two quality characteristics $(x, y)$ are measured. If the Hotelling statistic $T_1^2$ for the mean vector of $(x, y)$ is less than $w$, the sampling is interrupted. If $T_1^2$ is greater than $CL_1$, where $CL_1 > w$, the control chart signals an out-of-control condition. If $w < T_1^2 \le CL_1$, the sampling goes on to the second stage, where the remaining $n_2$ items of the sample are inspected and $T_2^2$ for the mean vector of the whole sample is computed. During the second stage of the sampling, the control chart signals an out-of-control condition when the statistic $T_2^2$ is larger than $CL_2$. A comparative study shows that the BDS chart detects process disturbances faster than the standard bivariate $T^2$ chart and the adaptive bivariate $T^2$ charts with variable sample size and/or variable sampling interval.
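The two-stage decision rule reads directly as code. In the sketch below, the limits $w$, $CL_1$, $CL_2$ and the sample sizes are arbitrary placeholders rather than optimized chart parameters, and the in-control mean and covariance are assumed known.

```python
import numpy as np

rng = np.random.default_rng(7)

def t2(xbar, mu0, Sigma_inv, n):
    """Hotelling statistic n * (xbar - mu0)' Sigma^{-1} (xbar - mu0)."""
    d = xbar - mu0
    return n * d @ Sigma_inv @ d

def bds_signal(x1, x2, mu0, Sigma_inv, w=2.0, CL1=12.0, CL2=11.0):
    """Two-stage rule: x1 is the first-stage sample, x2 the second-stage one."""
    T1 = t2(x1.mean(axis=0), mu0, Sigma_inv, len(x1))
    if T1 < w:
        return False                  # interrupt sampling: no evidence of a shift
    if T1 > CL1:
        return True                   # out-of-control signal at the first stage
    both = np.vstack([x1, x2])        # w < T1 <= CL1: inspect the remaining items
    return t2(both.mean(axis=0), mu0, Sigma_inv, len(both)) > CL2

mu0 = np.zeros(2)
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)
x1 = rng.multivariate_normal(mu0 + 0.8, Sigma, size=3)   # shifted process
x2 = rng.multivariate_normal(mu0 + 0.8, Sigma, size=5)
print("signal:", bds_signal(x1, x2, mu0, Sigma_inv))
```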
Abstract:
A standard X̄ chart for controlling the process mean takes samples of size n0 at specified, equally spaced, fixed-time points. This article proposes a modification of the standard X̄ chart that allows one to take additional samples, larger than n0, between these fixed times. The additional samples are taken from the process when there is evidence that the process mean has moved off target. Following the notation proposed by Reynolds (1996a) and Costa (1997), we call the proposed chart the VSSIFT X̄ chart, where VSSIFT means variable sample size and sampling intervals with fixed times. The X̄ chart with the VSSIFT feature is easier to administer than a standard VSSI X̄ chart, which is not constrained to sample at the specified fixed times. The performances of the charts in detecting process mean shifts are comparable.
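The sampling logic can be sketched as follows for a univariate X̄ chart. The warning and action limits, the sample sizes, and the shift size are illustrative choices, not the designs evaluated in the article.

```python
import numpy as np

rng = np.random.default_rng(8)

mu0, sigma = 0.0, 1.0
n0, n1 = 5, 15                 # routine sample size; larger additional sample
warn, action = 1.5, 3.0        # warning / action limits in sigma_xbar units
shift = 0.75                   # the process mean has moved off target

def z(n):
    """Standardized mean of a fresh sample of size n from the shifted process."""
    xbar = rng.normal(mu0 + shift, sigma, n).mean()
    return (xbar - mu0) / (sigma / np.sqrt(n))

for t in range(1, 11):         # specified, equally spaced fixed times
    zt = z(n0)
    if abs(zt) > action:
        print(f"t={t}: signal at a fixed time (z={zt:.2f})")
        break
    elif abs(zt) > warn:       # evidence of a shift: take the extra, larger
        ze = z(n1)             # sample between this fixed time and the next
        print(f"t={t}: extra sample of {n1} between fixed times (z={ze:.2f})")
        if abs(ze) > action:
            print("signal from the additional sample")
            break
```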
Abstract:
The purpose of the present study was to establish reference values for hemoglobins (Hb) using HPLC, in samples containing normal Hb (AA), sickle cell trait without alpha-thalassemia (AS), sickle cell trait with alpha-thalassemia (ASH), sickle cell anemia (SS), and Hb SC disease (SC). The blood samples were analyzed by electrophoresis, HPLC and molecular procedures. The Hb A2 mean was 4.30 ± 0.44% in AS, 4.18 ± 0.42% in ASH, 3.90 ± 1.14% in SS, and 4.39 ± 0.35% in SC. They were similar, but above the normal range. Between the AS and ASH groups, only the amount of Hb S was higher in the AS group. The Hb S mean in the AS group was 38.54 ± 3.01% and in the ASH it was 36.54 ± 3.76%. In the qualitative analysis, using FastMap, distinct groups were seen: AA and SS located at opposite extremes, AS and ASH with overlapping values and intermediate distribution, SC between heterozygotes and the SS group. Hb S was confirmed by allele-specific polymerase chain reaction. The Hb values established will be available for use as a reference for the Brazilian population, drawing attention to the increased levels of Hb A2, which should be considered with caution to prevent incorrect diagnoses.
Abstract:
Geostatistics involves the fitting of spatially continuous models to spatially discrete data (Chilès and Delfiner, 1999). Preferential sampling arises when the process that determines the data-locations and the process being modelled are stochastically dependent. Conventional geostatistical methods assume, if only implicitly, that sampling is non-preferential. However, these methods are often used in situations where sampling is likely to be preferential. For example, in mineral exploration samples may be concentrated in areas thought likely to yield high-grade ore. We give a general expression for the likelihood function of preferentially sampled geostatistical data and describe how this can be evaluated approximately using Monte Carlo methods. We present a model for preferential sampling, and demonstrate through simulated examples that ignoring preferential sampling can lead to seriously misleading inferences. We describe an application of the model to a set of bio-monitoring data from Galicia, northern Spain, in which making allowance for preferential sampling materially changes the inferences.
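A toy 1-D example makes the danger concrete: when sampling intensity increases with the latent field, the naive plug-in mean is biased. The field construction and the preference strength below are arbitrary assumptions; this is a demonstration of the problem, not the paper's Monte Carlo likelihood method.

```python
import numpy as np

rng = np.random.default_rng(9)

# A smooth latent field S on a grid, built as a Gaussian-kernel moving average.
n_grid = 500
kern = np.exp(-0.5 * np.linspace(-3, 3, 61) ** 2)
S = np.convolve(rng.normal(0, 1, n_grid), kern / kern.sum(), mode="same") * 6

# Preferential design: sampling probability increases with S (e.g., drilling
# where high-grade ore is expected); compare with a uniform random design.
p = np.exp(1.5 * S)
p /= p.sum()
idx_pref = rng.choice(n_grid, 50, replace=False, p=p)
idx_rand = rng.choice(n_grid, 50, replace=False)

noise = rng.normal(0, 0.2, 50)   # measurement error at the sampled sites
print(f"true field mean:      {S.mean():+.3f}")
print(f"uniform-design mean:  {(S[idx_rand] + noise).mean():+.3f}")
print(f"preferential mean:    {(S[idx_pref] + noise).mean():+.3f}  (tends to overshoot)")
```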