922 resultados para Data Streams Distribution


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Asymmetry in a distribution can arise from a long tail of values in the underlying process or from outliers that belong to another population that contaminate the primary process. The first paper of this series examined the effects of the former on the variogram and this paper examines the effects of asymmetry arising from outliers. Simulated annealing was used to create normally distributed random fields of different size that are realizations of known processes described by variograms with different nugget:sill ratios. These primary data sets were then contaminated with randomly located and spatially aggregated outliers from a secondary process to produce different degrees of asymmetry. Experimental variograms were computed from these data by Matheron's estimator and by three robust estimators. The effects of standard data transformations on the coefficient of skewness and on the variogram were also investigated. Cross-validation was used to assess the performance of models fitted to experimental variograms computed from a range of data contaminated by outliers for kriging. The results showed that where skewness was caused by outliers the variograms retained their general shape, but showed an increase in the nugget and sill variances and nugget:sill ratios. This effect was only slightly more for the smallest data set than for the two larger data sets and there was little difference between the results for the latter. Overall, the effect of size of data set was small for all analyses. The nugget:sill ratio showed a consistent decrease after transformation to both square roots and logarithms; the decrease was generally larger for the latter, however. Aggregated outliers had different effects on the variogram shape from those that were randomly located, and this also depended on whether they were aggregated near to the edge or the centre of the field. The results of cross-validation showed that the robust estimators and the removal of outliers were the most effective ways of dealing with outliers for variogram estimation and kriging. (C) 2007 Elsevier Ltd. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Particle size distribution (psd) is one of the most important features of the soil because it affects many of its other properties, and it determines how soil should be managed. To understand the properties of chalk soil, psd analyses should be based on the original material (including carbonates), and not just the acid-resistant fraction. Laser-based methods rather than traditional sedimentation methods are being used increasingly to determine particle size to reduce the cost of analysis. We give an overview of both approaches and the problems associated with them for analyzing the psd of chalk soil. In particular, we show that it is not appropriate to use the widely adopted 8 pm boundary between the clay and silt size fractions for samples determined by laser to estimate proportions of these size fractions that are equivalent to those based on sedimentation. We present data from field and national-scale surveys of soil derived from chalk in England. Results from both types of survey showed that laser methods tend to over-estimate the clay-size fraction compared to sedimentation for the 8 mu m clay/silt boundary, and we suggest reasons for this. For soil derived from chalk, either the sedimentation methods need to be modified or it would be more appropriate to use a 4 pm threshold as an interim solution for laser methods. Correlations between the proportions of sand- and clay-sized fractions, and other properties such as organic matter and volumetric water content, were the opposite of what one would expect for soil dominated by silicate minerals. For water content, this appeared to be due to the predominance of porous, chalk fragments in the sand-sized fraction rather than quartz grains, and the abundance of fine (<2 mu m) calcite crystals rather than phyllosilicates in the clay-sized fraction. This was confirmed by scanning electron microscope (SEM) analyses. "Of all the rocks with which 1 am acquainted, there is none whose formation seems to tax the ingenuity of theorists so severely, as the chalk, in whatever respect we may think fit to consider it". Thomas Allan, FRS Edinburgh 1823, Transactions of the Royal Society of Edinburgh. (C) 2009 Natural Environment Research Council (NERC) Published by Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Annual total phosphorus (TP) export data from 108 European micro-catchments were analyzed against descriptive catchment data on climate (runoff), soil types, catchment size, and land use. The best possible empirical model developed included runoff, proportion of agricultural land and catchment size as explanatory variables but with a low explanation of the variance in the dataset (R-2 = 0.37). Improved country specific empirical models could be developed in some cases. The best example was from Norway where an analysis of TP-export data from 12 predominantly agricultural micro-catchments revealed a relationship explaining 96% of the variance in TP-export. The explanatory variables were in this case soil-P status (P-AL), proportion of organic soil, and the export of suspended sediment. Another example is from Denmark where an empirical model was established for the basic annual average TP-export from 24 catchments with percentage sandy soils, percentage organic soils, runoff, and application of phosphorus in fertilizer and animal manure as explanatory variables (R-2 = 0.97).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The elucidation of spatial variation in the landscape can indicate potential wildlife habitats or breeding sites for vectors, such as ticks or mosquitoes, which cause a range of diseases. Information from remotely sensed data could aid the delineation of vegetation distribution on the ground in areas where local knowledge is limited. The data from digital images are often difficult to interpret because of pixel-to-pixel variation, that is, noise, and complex variation at more than one spatial scale. Landsat Thematic Mapper Plus (ETM+) and Satellite Pour l'Observation de La Terre (SPOT) image data were analyzed for an area close to Douna in Mali, West Africa. The variograms of the normalized difference vegetation index (NDVI) from both types of image data were nested. The parameters of the nested variogram function from the Landsat ETM+ data were used to design the sampling for a ground survey of soil and vegetation data. Variograms of the soil and vegetation data showed that their variation was anisotropic and their scales of variation were similar to those of NDVI from the SPOT data. The short- and long-range components of variation in the SPOT data were filtered out separately by factorial kriging. The map of the short-range component appears to represent the patterns of vegetation and associated shallow slopes and drainage channels of the tiger bush system. The map of the long-range component also appeared to relate to broader patterns in the tiger bush and to gentle undulations in the topography. The results suggest that the types of image data analyzed in this study could be used to identify areas with more moisture in semiarid regions that could support wildlife and also be potential vector breeding sites.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Airborne dust is of concern due to hazards in the localities affected by erosion, transport and deposition, but it is also of global concern due to uncertainties over its role in radiative forcing of climate. In order to model the environmental impact of dust, we need a better knowledge of sources and transport processes. Satellite remote sensing has been instrumental in providing this knowledge, through long time series of observations of atmospheric dust transport. Three remote sensing methodologies have been used, and are reviewed briefly in this paper. Firstly the use of observations from the Total Ozone Mapping Spectrometer (TOMS), secondly the use of the Infrared Difference Dust Index (IDDI) from Meterosat infrared data, thirdly the use of MODIS images from the rapid response system. These data have highlighted the major global sources of dust, mist of which are associated with endoreic drainage basins in deserts, which held lakes during Quaternary humid climate phases, and identified the Bodele Depression in Tchad as the dustiest place on Earth.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper maps the carbonate geochemistry of the Makgadikgadi Pans region of northern Botswana from moderate resolution (500 m pixels) remotely sensed data, to assess the impact of various geomorphological processes on surficial carbonate distribution. Previous palaeo-environmental studies have demonstrated that the pans have experienced several highstands during the Quaternary, forming calcretes around shoreline embayments. The pans are also a significant regional source of dust, and some workers have suggested that surficial carbonate distributions may be controlled, in part, by wind regime. Field studies of carbonate deposits in the region have also highlighted the importance of fluvial and groundwater processes in calcrete formation. However, due to the large area involved and problems of accessibility, the carbonate distribution across the entire Makgadikgadi basin remains poorly understood. The MODIS instrument permits mapping of carbonate distribution over large areas; comparison with estimates from Landsat Thematic Mapper data show reasonable agreement, and there is good agreement with estimates from laboratory analysis of field samples. The results suggest that palaeo-lake highstands, reconstructed here using the SRTM 3 arc-second digital elevation model, have left behind surficial carbonate deposits, which can be mapped by the MODIS instrument. Copyright (c) 2006 John Wiley & Sons, Ltd.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Many time series are measured monthly, either as averages or totals, and such data often exhibit seasonal variability-the values of the series are consistently larger for some months of the year than for others. A typical series of this type is the number of deaths each month attributed to SIDS (Sudden Infant Death Syndrome). Seasonality can be modelled in a number of ways. This paper describes and discusses various methods for modelling seasonality in SIDS data, though much of the discussion is relevant to other seasonally varying data. There are two main approaches, either fitting a circular probability distribution to the data, or using regression-based techniques to model the mean seasonal behaviour. Both are discussed in this paper.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The images taken by the Heliospheric Imagers (HIs), part of the SECCHI imaging package onboard the pair of STEREO spacecraft, provide information on the radial and latitudinal evolution of the plasma compressed inside corotating interaction regions (CIRs). A plasma density wave imaged by the HI instrument onboard STEREO-B was found to propagate towards STEREO-A, enabling a comparison between simultaneous remotesensing and in situ observations of its structure to be performed. In situ measurements made by STEREO-A show that the plasma density wave is associated with the passage of a CIR. The magnetic field compressed after the CIR stream interface (SI) is found to have a planar distribution. Minimum variance analysis of the magnetic field vectors shows that the SI is inclined at 54° to the orbital plane of the STEREO-A spacecraft. This inclination of the CIR SI is comparable to the inclination of the associated plasma density wave observed by HI. A small-scale magnetic cloud with a flux rope topology and radial extent of 0.08 AU is also embedded prior to the SI. The pitch-angle distribution of suprathermal electrons measured by the STEREO-A SWEA instrument shows that an open magnetic field topology in the cloud replaced the heliospheric current sheet locally. These observations confirm that HI observes CIRs in difference images when a small-scale transient is caught up in the compression region.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

While over-dispersion in capture–recapture studies is well known to lead to poor estimation of population size, current diagnostic tools to detect the presence of heterogeneity have not been specifically developed for capture–recapture studies. To address this, a simple and efficient method of testing for over-dispersion in zero-truncated count data is developed and evaluated. The proposed method generalizes an over-dispersion test previously suggested for un-truncated count data and may also be used for testing residual over-dispersion in zero-inflation data. Simulations suggest that the asymptotic distribution of the test statistic is standard normal and that this approximation is also reasonable for small sample sizes. The method is also shown to be more efficient than an existing test for over-dispersion adapted for the capture–recapture setting. Studies with zero-truncated and zero-inflated count data are used to illustrate the test procedures.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The contribution investigates the problem of estimating the size of a population, also known as the missing cases problem. Suppose a registration system is targeting to identify all cases having a certain characteristic such as a specific disease (cancer, heart disease, ...), disease related condition (HIV, heroin use, ...) or a specific behavior (driving a car without license). Every case in such a registration system has a certain notification history in that it might have been identified several times (at least once) which can be understood as a particular capture-recapture situation. Typically, cases are left out which have never been listed at any occasion, and it is this frequency one wants to estimate. In this paper modelling is concentrating on the counting distribution, e.g. the distribution of the variable that counts how often a given case has been identified by the registration system. Besides very simple models like the binomial or Poisson distribution, finite (nonparametric) mixtures of these are considered providing rather flexible modelling tools. Estimation is done using maximum likelihood by means of the EM algorithm. A case study on heroin users in Bangkok in the year 2001 is completing the contribution.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Population size estimation with discrete or nonparametric mixture models is considered, and reliable ways of construction of the nonparametric mixture model estimator are reviewed and set into perspective. Construction of the maximum likelihood estimator of the mixing distribution is done for any number of components up to the global nonparametric maximum likelihood bound using the EM algorithm. In addition, the estimators of Chao and Zelterman are considered with some generalisations of Zelterman’s estimator. All computations are done with CAMCR, a special software developed for population size estimation with mixture models. Several examples and data sets are discussed and the estimators illustrated. Problems using the mixture model-based estimators are highlighted.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: Molecular tools may help to uncover closely related and still diverging species from a wide variety of taxa and provide insight into the mechanisms, pace and geography of marine speciation. There is a certain controversy on the phylogeography and speciation modes of species-groups with an Eastern Atlantic-Western Indian Ocean distribution, with previous studies suggesting that older events (Miocene) and/or more recent (Pleistocene) oceanographic processes could have influenced the phylogeny of marine taxa. The spiny lobster genus Palinurus allows for testing among speciation hypotheses, since it has a particular distribution with two groups of three species each in the Northeastern Atlantic (P. elephas, P. mauritanicus and P. charlestoni) and Southeastern Atlantic and Southwestern Indian Oceans (P. gilchristi, P. delagoae and P. barbarae). In the present study, we obtain a more complete understanding of the phylogenetic relationships among these species through a combined dataset with both nuclear and mitochondrial markers, by testing alternative hypotheses on both the mutation rate and tree topology under the recently developed approximate Bayesian computation (ABC) methods. Results: Our analyses support a North-to-South speciation pattern in Palinurus with all the South-African species forming a monophyletic clade nested within the Northern Hemisphere species. Coalescent-based ABC methods allowed us to reject the previously proposed hypothesis of a Middle Miocene speciation event related with the closure of the Tethyan Seaway. Instead, divergence times obtained for Palinurus species using the combined mtDNA-microsatellite dataset and standard mutation rates for mtDNA agree with known glaciation-related processes occurring during the last 2 my. Conclusion: The Palinurus speciation pattern is a typical example of a series of rapid speciation events occurring within a group, with very short branches separating different species. Our results support the hypothesis that recent climate change-related oceanographic processes have influenced the phylogeny of marine taxa, with most Palinurus species originating during the last two million years. The present study highlights the value of new coalescent-based statistical methods such as ABC for testing different speciation hypotheses using molecular data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Estimation of population size with missing zero-class is an important problem that is encountered in epidemiological assessment studies. Fitting a Poisson model to the observed data by the method of maximum likelihood and estimation of the population size based on this fit is an approach that has been widely used for this purpose. In practice, however, the Poisson assumption is seldom satisfied. Zelterman (1988) has proposed a robust estimator for unclustered data that works well in a wide class of distributions applicable for count data. In the work presented here, we extend this estimator to clustered data. The estimator requires fitting a zero-truncated homogeneous Poisson model by maximum likelihood and thereby using a Horvitz-Thompson estimator of population size. This was found to work well, when the data follow the hypothesized homogeneous Poisson model. However, when the true distribution deviates from the hypothesized model, the population size was found to be underestimated. In the search of a more robust estimator, we focused on three models that use all clusters with exactly one case, those clusters with exactly two cases and those with exactly three cases to estimate the probability of the zero-class and thereby use data collected on all the clusters in the Horvitz-Thompson estimator of population size. Loss in efficiency associated with gain in robustness was examined based on a simulation study. As a trade-off between gain in robustness and loss in efficiency, the model that uses data collected on clusters with at most three cases to estimate the probability of the zero-class was found to be preferred in general. In applications, we recommend obtaining estimates from all three models and making a choice considering the estimates from the three models, robustness and the loss in efficiency. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a simple Bayesian approach to sample size determination in clinical trials. It is required that the trial should be large enough to ensure that the data collected will provide convincing evidence either that an experimental treatment is better than a control or that it fails to improve upon control by some clinically relevant difference. The method resembles standard frequentist formulations of the problem, and indeed in certain circumstances involving 'non-informative' prior information it leads to identical answers. In particular, unlike many Bayesian approaches to sample size determination, use is made of an alternative hypothesis that an experimental treatment is better than a control treatment by some specified magnitude. The approach is introduced in the context of testing whether a single stream of binary observations are consistent with a given success rate p(0). Next the case of comparing two independent streams of normally distributed responses is considered, first under the assumption that their common variance is known and then for unknown variance. Finally, the more general situation in which a large sample is to be collected and analysed according to the asymptotic properties of the score statistic is explored. Copyright (C) 2007 John Wiley & Sons, Ltd.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This investigation deals with the question of when a particular population can be considered to be disease-free. The motivation is the case of BSE where specific birth cohorts may present distinct disease-free subpopulations. The specific objective is to develop a statistical approach suitable for documenting freedom of disease, in particular, freedom from BSE in birth cohorts. The approach is based upon a geometric waiting time distribution for the occurrence of positive surveillance results and formalizes the relationship between design prevalence, cumulative sample size and statistical power. The simple geometric waiting time model is further modified to account for the diagnostic sensitivity and specificity associated with the detection of disease. This is exemplified for BSE using two different models for the diagnostic sensitivity. The model is furthermore modified in such a way that a set of different values for the design prevalence in the surveillance streams can be accommodated (prevalence heterogeneity) and a general expression for the power function is developed. For illustration, numerical results for BSE suggest that currently (data status September 2004) a birth cohort of Danish cattle born after March 1999 is free from BSE with probability (power) of 0.8746 or 0.8509, depending on the choice of a model for the diagnostic sensitivity.