30 results for Test data generation


Relevance:

40.00% 40.00%

Publisher:

Abstract:

While over-dispersion in capture–recapture studies is well known to lead to poor estimation of population size, current diagnostic tools to detect the presence of heterogeneity have not been specifically developed for capture–recapture studies. To address this, a simple and efficient method of testing for over-dispersion in zero-truncated count data is developed and evaluated. The proposed method generalizes an over-dispersion test previously suggested for un-truncated count data and may also be used for testing residual over-dispersion in zero-inflation data. Simulations suggest that the asymptotic distribution of the test statistic is standard normal and that this approximation is also reasonable for small sample sizes. The method is also shown to be more efficient than an existing test for over-dispersion adapted for the capture–recapture setting. Studies with zero-truncated and zero-inflated count data are used to illustrate the test procedures.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Constructing biodiversity richness maps from Environmental Niche Models (ENMs) of thousands of species is time-consuming. A separate species occurrence data pre-processing phase enables the experimenter to control test AUC score variance due to species dataset size. Besides removing duplicate occurrences and points with missing environmental data, we discuss the need for coordinate precision, wide dispersion, temporal and synonymity filters. After species data filtering, the final task of a pre-processing phase should be the automatic generation of species occurrence datasets which can then be directly 'plugged in' to the ENM. A software application capable of carrying out all these tasks will be a valuable time-saver, particularly for large-scale biodiversity studies.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

A score test is developed for binary clinical trial data, which incorporates patient non-compliance while respecting randomization. It is assumed in this paper that compliance is all-or-nothing, in the sense that a patient either accepts all of the treatment assigned as specified in the protocol, or none of it. Direct analytic comparisons of the adjusted test statistic for both the score test and the likelihood ratio test are made with the corresponding test statistics that adhere to the intention-to-treat principle. It is shown that no gain in power is possible over the intention-to-treat analysis, by adjusting for patient non-compliance. Sample size formulae are derived and simulation studies are used to demonstrate that the sample size approximation holds. Copyright © 2003 John Wiley & Sons, Ltd.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

It is generally assumed that the variability of neuronal morphology has an important effect on both the connectivity and the activity of the nervous system, but this effect has not been thoroughly investigated. Neuroanatomical archives represent a crucial tool to explore structure–function relationships in the brain. We are developing computational tools to describe, generate, store and render large sets of three-dimensional neuronal structures in a format that is compact, quantitative, accurate and readily accessible to the neuroscientist. Single-cell neuroanatomy can be characterized quantitatively at several levels. In computer-aided neuronal tracing files, a dendritic tree is described as a series of cylinders, each represented by diameter, spatial coordinates and the connectivity to other cylinders in the tree. This ‘Cartesian’ description constitutes a completely accurate mapping of dendritic morphology but it bears little intuitive information for the neuroscientist. In contrast, a classical neuroanatomical analysis characterizes neuronal dendrites on the basis of the statistical distributions of morphological parameters, e.g. maximum branching order or bifurcation asymmetry. This description is intuitively more accessible, but it only yields information on the collective anatomy of a group of dendrites, i.e. it is not complete enough to provide a precise ‘blueprint’ of the original data. We are adopting a third, intermediate level of description, which consists of the algorithmic generation of neuronal structures within a certain morphological class based on a set of ‘fundamental’, measured parameters. This description is as intuitive as a classical neuroanatomical analysis (parameters have an intuitive interpretation), and as complete as a Cartesian file (the algorithms generate and display complete neurons). The advantages of the algorithmic description of neuronal structure are immense.
If an algorithm can measure the values of a handful of parameters from an experimental database and generate virtual neurons whose anatomy is statistically indistinguishable from that of their real counterparts, a great deal of data compression and amplification can be achieved. Data compression results from the quantitative and complete description of thousands of neurons with a handful of statistical distributions of parameters. Data amplification is possible because, from a set of experimental neurons, many more virtual analogues can be generated. This approach could allow one, in principle, to create and store a neuroanatomical database containing data for an entire human brain in a personal computer. We are using two programs, L-NEURON and ARBORVITAE, to investigate systematically the potential of several different algorithms for the generation of virtual neurons. Using these programs, we have generated anatomically plausible virtual neurons for several morphological classes, including guinea pig cerebellar Purkinje cells and cat spinal cord motor neurons. These virtual neurons are stored in an online electronic archive of dendritic morphology. This process highlights the potential and the limitations of the ‘computational neuroanatomy’ strategy for neuroscience databases.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Various studies investigating the future impacts of integrating high levels of renewable energy make use of historical meteorological (met) station data to produce estimates of future generation. Hourly means of 10 m horizontal wind are extrapolated to a standard turbine hub height using the wind profile power law or log law and used to simulate the hypothetical power output of a turbine at that location; repeating this procedure using many viable locations can produce a picture of future electricity generation. However, the estimate of hub height wind speed is dependent on the choice of the wind shear exponent α or the roughness length z0, and requires a number of simplifying assumptions. This paper investigates the sensitivity of generation output to this estimation using a case study of a met station in West Freugh, Scotland. The results show that the wind shear exponent is a particularly sensitive parameter, whose choice can lead to significant variation of estimated hub height wind speed and hence estimated future generation potential of a region.
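The extrapolation step described in this abstract can be illustrated with a minimal Python sketch. It uses the standard textbook forms of the wind profile power law and log law; the function names, the 80 m hub height, the 6 m/s reference speed and the range of exponents are illustrative assumptions, not values taken from the paper.

```python
import math


def power_law_speed(v_ref, h_ref, h_hub, alpha):
    """Wind profile power law: v_hub = v_ref * (h_hub / h_ref) ** alpha."""
    return v_ref * (h_hub / h_ref) ** alpha


def log_law_speed(v_ref, h_ref, h_hub, z0):
    """Neutral log law: v_hub = v_ref * ln(h_hub / z0) / ln(h_ref / z0)."""
    return v_ref * math.log(h_hub / z0) / math.log(h_ref / z0)


if __name__ == "__main__":
    # Hypothetical inputs: 6 m/s measured at 10 m, extrapolated to an 80 m hub.
    v_ref, h_ref, h_hub = 6.0, 10.0, 80.0
    for alpha in (0.10, 0.14, 0.20, 0.30):
        v_hub = power_law_speed(v_ref, h_ref, h_hub, alpha)
        print(f"alpha={alpha:.2f} -> hub-height speed {v_hub:.2f} m/s")
```

Because turbine power below rated output rises steeply with wind speed, even modest differences in the extrapolated speed translate into larger differences in simulated generation, which is consistent with the sensitivity the abstract reports.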

Relevance:

40.00% 40.00%

Publisher:

Abstract:

This dissertation deals with aspects of sequential data assimilation (in particular ensemble Kalman filtering) and numerical weather forecasting. In the first part, the recently formulated Ensemble Kalman-Bucy filter (EnKBF) is revisited. It is shown that the previously used numerical integration scheme fails when the magnitude of the background error covariance grows beyond that of the observational error covariance in the forecast window. Therefore, we present a suitable integration scheme that handles the stiffening of the differential equations involved and does not incur further computational expense. Moreover, a transform-based alternative to the EnKBF is developed: under this scheme, the operations are performed in the ensemble space instead of in the state space. Advantages of this formulation are explained. For the first time, the EnKBF is implemented in an atmospheric model. The second part of this work deals with ensemble clustering, a phenomenon that arises when performing data assimilation using deterministic ensemble square root filters (EnSRFs) in highly nonlinear forecast models. Namely, an M-member ensemble detaches into an outlier and a cluster of M-1 members. Previous works may suggest that this issue represents a failure of EnSRFs; this work dispels that notion. It is shown that ensemble clustering can also be reversed by nonlinear processes, in particular the alternation between nonlinear expansion and compression of the ensemble for different regions of the attractor. Some EnSRFs that use random rotations have been developed to overcome this issue; these formulations are analyzed and their advantages and disadvantages with respect to common EnSRFs are discussed. The third and last part contains the implementation of the Robert-Asselin-Williams (RAW) filter in an atmospheric model.
The RAW filter is an improvement to the widely popular Robert-Asselin filter that successfully suppresses spurious computational waves while avoiding any distortion in the mean value of the function. Using statistical significance tests both at the local and field level, it is shown that the climatology of the SPEEDY model is not modified by the changed time stepping scheme; hence, no retuning of the parameterizations is required. It is found that the accuracy of the medium-term forecasts is increased by using the RAW filter.
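As a rough illustration of the filter mentioned in this abstract, the following Python sketch applies one RAW filtering step to three consecutive leapfrog time levels, following the commonly published form of the filter. The function name, the default values of nu and alpha, and the sample numbers are assumptions for illustration and are not taken from the dissertation.

```python
def raw_filter_step(x_prev, x_curr, x_next, nu=0.2, alpha=0.53):
    """Apply one Robert-Asselin-Williams filter step.

    x_prev: already-filtered value at time level n-1
    x_curr: unfiltered value at time level n
    x_next: leapfrog result at time level n+1
    Returns the filtered value at level n and the adjusted value at level n+1.
    """
    # Displacement used by the classical Robert-Asselin filter.
    d = 0.5 * nu * (x_prev - 2.0 * x_curr + x_next)
    # alpha = 1 recovers the Robert-Asselin filter (only level n is changed);
    # alpha = 0.5 leaves the sum x_curr + x_next unchanged.
    return x_curr + alpha * d, x_next + (alpha - 1.0) * d


if __name__ == "__main__":
    # Hypothetical values at three consecutive time levels.
    print(raw_filter_step(1.0, 2.0, 4.0, nu=0.2, alpha=0.5))
```

Splitting the displacement between levels n and n+1 is what distinguishes this step from the plain Robert-Asselin filter, which applies the whole displacement to level n.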

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Meteorological (met) station data is used as the basis for a number of influential studies into the impacts of the variability of renewable resources. Real turbine output data is not often easy to acquire, whereas meteorological wind data, supplied at a standardised height of 10 m, is widely available. This data can be extrapolated to a standard turbine height using the wind profile power law and used to simulate the hypothetical power output of a turbine. Utilising a number of met sites in such a manner can develop a model of future wind generation output. However, the accuracy of this extrapolation is strongly dependent on the choice of the wind shear exponent alpha. This paper investigates the accuracy of the simulated generation output compared to reality using a wind farm in North Rhins, Scotland and a nearby met station in West Freugh. The results show that while a single annual average value for alpha may be selected to accurately represent the long term energy generation from a simulated wind farm, there are significant differences between simulation and reality on an hourly power generation basis, with implications for understanding the impact of variability of renewables on short timescales, particularly system balancing and the way that conventional generation may be asked to respond to a high level of variable renewable generation on the grid in the future.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

As wind generation increases, system impact studies rely on predictions of future generation and effective representation of wind variability. A well-established approach to investigate the impact of wind variability is to simulate generation using observations from 10 m meteorological mast-data. However, there are problems with relying purely on historical wind-speed records or generation histories: mast-data is often incomplete, not sited at relevant wind generation sites, and recorded at the wrong altitude above ground (usually 10 m), each of which may distort the generation profile. A possible complementary approach is to use reanalysis data, where data assimilation techniques are combined with state-of-the-art weather forecast models to produce complete gridded wind time-series over an area. Previous investigations of reanalysis datasets have placed an emphasis on comparing reanalysis to meteorological site records, whereas this paper compares wind generation simulated using reanalysis data directly against historic wind generation records. Importantly, this comparison is conducted using raw reanalysis data (typical resolution ∼50 km), without relying on a computationally expensive “dynamical downscaling” for a particular target region. Although the raw reanalysis data cannot, by nature of its construction, represent the site-specific effects of sub-gridscale topography, it is nevertheless shown to be comparable to or better than the mast-based simulation in the region considered and it is therefore argued that raw reanalysis data may offer a number of significant advantages as a data source.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

The MATLAB model is contained within the compressed folders (versions are available as .zip and .tgz). This model uses MERRA reanalysis data (>34 years available) to estimate the hourly aggregated wind power generation for a predefined (fixed) distribution of wind farms. A ready made example is included for the wind farm distribution of Great Britain, April 2014 ("CF.dat"). This consists of an hourly time series of GB-total capacity factor spanning the period 1980-2013 inclusive. Given the global nature of reanalysis data, the model can be applied to any specified distribution of wind farms in any region of the world. Users are, however, strongly advised to bear in mind the limitations of reanalysis data when using this model/data. This is discussed in our paper: Cannon, Brayshaw, Methven, Coker, Lenaghan. "Using reanalysis data to quantify extreme wind power generation statistics: a 33 year case study in Great Britain". Submitted to Renewable Energy in March, 2014. Additional information about the model is contained in the model code itself, in the accompanying ReadMe file, and on our website: http://www.met.reading.ac.uk/~energymet/data/Cannon2014/

Relevance:

40.00% 40.00%

Publisher:

Abstract:

With a rapidly increasing fraction of electricity generation being sourced from wind, extreme wind power generation events such as prolonged periods of low (or high) generation and ramps in generation, are a growing concern for the efficient and secure operation of national power systems. As extreme events occur infrequently, long and reliable meteorological records are required to accurately estimate their characteristics. Recent publications have begun to investigate the use of global meteorological “reanalysis” data sets for power system applications, many of which focus on long-term average statistics such as monthly-mean generation. Here we demonstrate that reanalysis data can also be used to estimate the frequency of relatively short-lived extreme events (including ramping on sub-daily time scales). Verification against 328 surface observation stations across the United Kingdom suggests that near-surface wind variability over spatiotemporal scales greater than around 300 km and 6 h can be faithfully reproduced using reanalysis, with no need for costly dynamical downscaling. A case study is presented in which a state-of-the-art, 33 year reanalysis data set (MERRA, from NASA-GMAO), is used to construct an hourly time series of nationally-aggregated wind power generation in Great Britain (GB), assuming a fixed, modern distribution of wind farms. The resultant generation estimates are highly correlated with recorded data from National Grid in the recent period, both for instantaneous hourly values and for variability over time intervals greater than around 6 h. This 33 year time series is then used to quantify the frequency with which different extreme GB-wide wind power generation events occur, as well as their seasonal and inter-annual variability. 
Several novel insights into the nature of extreme wind power generation events are described, including (i) that the number of prolonged low or high generation events is well approximated by a Poisson-like random process, and (ii) whilst in general there is large seasonal variability, the magnitude of the most extreme ramps is similar in both summer and winter. An up-to-date version of the GB case study data as well as the underlying model are freely available for download from our website: http://www.met.reading.ac.uk/~energymet/data/Cannon2014/.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Using data from the EISCAT (European Incoherent Scatter) VHF and CUTLASS (Co-operative UK Twin-Located Auroral Sounding System) HF radars, we study the formation of ionospheric polar cap patches and their relationship to the magnetopause reconnection pulses identified in the companion paper by Lockwood et al. (2005). It is shown that the poleward-moving, high-concentration plasma patches observed in the ionosphere by EISCAT on 23 November 1999, as reported by Davies et al. (2002), were often associated with corresponding reconnection rate pulses. However, not all such pulses generated a patch and only within a limited MLT range (11:00–12:00 MLT) did a patch result from a reconnection pulse. Three proposed mechanisms for the production of patches, and of the concentration minima that separate them, are analysed and evaluated: (1) concentration enhancement within the patches by cusp/cleft precipitation; (2) plasma depletion in the minima between the patches by fast plasma flows; and (3) intermittent injection of photoionisation-enhanced plasma into the polar cap. We devise a test to distinguish between the effects of these mechanisms. Some of the events repeat too frequently to apply the test. Others have sufficiently long repeat periods and mechanism (3) is shown to be the only explanation of three of the longer-lived patches seen on this day. However, effect (2) also appears to contribute to some events. We conclude that plasma concentration gradients on the edges of the larger patches arise mainly from local time variations in the subauroral plasma, via the mechanism proposed by Lockwood et al. (2000).

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Land cover plays a key role in global to regional monitoring and modeling because it affects and is affected by climate change, and it has thus become one of the essential variables for climate change studies. National and international organizations require timely and accurate land cover information for reporting and management actions. The North American Land Change Monitoring System (NALCMS) is an international cooperation of organizations and entities of Canada, the United States, and Mexico to map land cover change of North America's changing environment. This paper presents the methodology to derive the land cover map of Mexico for the year 2005, which was integrated in the NALCMS continental map. The map is based on a time series of 250 m Moderate Resolution Imaging Spectroradiometer (MODIS) data and an extensive sample database; the complexity of the Mexican landscape required a specific approach to reflect land cover heterogeneity. To estimate the proportion of each land cover class for every pixel, several decision tree classifications were combined to obtain class membership maps, which were finally converted to a discrete map accompanied by a confidence estimate. The map yielded an overall accuracy of 82.5% (Kappa of 0.79) for pixels with at least 50% map confidence (71.3% of the data). An additional assessment with 780 randomly stratified samples and primary and alternative calls in the reference data to account for ambiguity indicated 83.4% overall accuracy (Kappa of 0.80). A high agreement of 83.6% for all pixels and 92.6% for pixels with a map confidence of more than 50% was found for the comparison between the land cover maps of 2005 and 2006. Further wall-to-wall comparisons to related land cover maps resulted in 56.6% agreement with the MODIS land cover product and a congruence of 49.5% with Globcover.
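The overall accuracy and Kappa figures quoted in this abstract are standard summaries of a confusion matrix. The Python sketch below shows how the two quantities are conventionally computed; the function names and the two-class matrix are made-up illustrations and do not reproduce the paper's data.

```python
def overall_accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def cohens_kappa(confusion):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    observed = overall_accuracy(confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical matrix: rows are reference classes, columns are mapped classes.
    matrix = [[45, 5], [10, 40]]
    print(overall_accuracy(matrix), cohens_kappa(matrix))
```

Kappa is lower than overall accuracy because it discounts the agreement that would be expected by chance given the class proportions.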

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Accurate knowledge of species’ habitat associations is important for conservation planning and policy. Assessing habitat associations is a vital precursor to selecting appropriate indicator species for prioritising sites for conservation or assessing trends in habitat quality. However, much existing knowledge is based on qualitative expert opinion or local scale studies, and may not remain accurate across different spatial scales or geographic locations. Data from biological recording schemes have the potential to provide objective measures of habitat association, with the ability to account for spatial variation. We used data on 50 British butterfly species as a test case to investigate the correspondence of data-derived measures of habitat association with expert opinion, from two different butterfly recording schemes. One scheme collected large quantities of occurrence data (c. 3 million records) and the other, lower quantities of standardised monitoring data (c. 1400 sites). We used general linear mixed effects models to derive scores of association with broad-leaf woodland for both datasets and compared them with scores canvassed from experts. Scores derived from occurrence and abundance data both showed strongly positive correlations with expert opinion. However, only for occurrence data did these fall within the range of correlations between experts. Data-derived scores showed regional spatial variation in the strength of butterfly associations with broad-leaf woodland, with a significant latitudinal trend in 26% of species. Sub-sampling of the data suggested a mean sample size of 5000 occurrence records per species to gain an accurate estimation of habitat association, although habitat specialists are likely to be readily detected using several hundred records. Occurrence data from recording schemes can thus provide easily obtained, objective, quantitative measures of habitat association.