925 resultados para data-types
Resumo:
The elucidation of spatial variation in the landscape can indicate potential wildlife habitats or breeding sites for vectors, such as ticks or mosquitoes, which cause a range of diseases. Information from remotely sensed data could aid the delineation of vegetation distribution on the ground in areas where local knowledge is limited. The data from digital images are often difficult to interpret because of pixel-to-pixel variation, that is, noise, and complex variation at more than one spatial scale. Landsat Thematic Mapper Plus (ETM+) and Satellite Pour l'Observation de La Terre (SPOT) image data were analyzed for an area close to Douna in Mali, West Africa. The variograms of the normalized difference vegetation index (NDVI) from both types of image data were nested. The parameters of the nested variogram function from the Landsat ETM+ data were used to design the sampling for a ground survey of soil and vegetation data. Variograms of the soil and vegetation data showed that their variation was anisotropic and their scales of variation were similar to those of NDVI from the SPOT data. The short- and long-range components of variation in the SPOT data were filtered out separately by factorial kriging. The map of the short-range component appears to represent the patterns of vegetation and associated shallow slopes and drainage channels of the tiger bush system. The map of the long-range component also appeared to relate to broader patterns in the tiger bush and to gentle undulations in the topography. The results suggest that the types of image data analyzed in this study could be used to identify areas with more moisture in semiarid regions that could support wildlife and also be potential vector breeding sites.
Progress on “Changing coastlines: data assimilation for morphodynamic prediction and predictability”
Resumo:
The task of assessing the likelihood and extent of coastal flooding is hampered by the lack of detailed information on near-shore bathymetry. This is required as an input for coastal inundation models, and in some cases the variability in the bathymetry can impact the prediction of those areas likely to be affected by flooding in a storm. The constant monitoring and data collection that would be required to characterise the near-shore bathymetry over large coastal areas is impractical, leaving the option of running morphodynamic models to predict the likely bathymetry at any given time. However, if the models are inaccurate the errors may be significant if incorrect bathymetry is used to predict possible flood risks. This project is assessing the use of data assimilation techniques to improve the predictions from a simple model, by rigorously incorporating observations of the bathymetry into the model, to bring the model closer to the actual situation. Currently we are concentrating on Morecambe Bay as a primary study site, as it has a highly dynamic inter-tidal zone, with changes in the course of channels in this zone impacting the likely locations of flooding from storms. We are working with SAR images, LiDAR, and swath bathymetry to give us the observations over a 2.5 year period running from May 2003 – November 2005. We have a LiDAR image of the entire inter-tidal zone for November 2005 to use as validation data. We have implemented a 3D-Var data assimilation scheme, to investigate the improvements in performance of the data assimilation compared to the previous scheme which was based on the optimal interpolation method. We are currently evaluating these different data assimilation techniques, using 22 SAR data observations. We will also include the LiDAR data and swath bathymetry to improve the observational coverage, and investigate the impact of different types of observation on the predictive ability of the model. We are also assessing the ability of the data assimilation scheme to recover the correct bathymetry after storm events, which can dramatically change the bathymetry in a short period of time.
Resumo:
A Bayesian method of classifying observations that are assumed to come from a number of distinct subpopulations is outlined. The method is illustrated with simulated data and applied to the classification of farms according to their level and variability of income. The resultant classification shows a greater diversity of technical charactersitics within farm types than is conventionally the case. The range of mean farm income between groups in the new classification is wider than that of the conventional method and the variability of income within groups is narrower. Results show that the highest income group in 2000 included large specialist dairy farmers and pig and poultry producers, whilst in 2001 it included large and small specialist dairy farms and large mixed dairy and arable farms. In both years the lowest income group is dominated by non-milk producing livestock farms.
Resumo:
Background: Meta-analyses based on individual patient data (IPD) are regarded as the gold standard for systematic reviews. However, the methods used for analysing and presenting results from IPD meta-analyses have received little discussion. Methods We review 44 IPD meta-analyses published during the years 1999–2001. We summarize whether they obtained all the data they sought, what types of approaches were used in the analysis, including assumptions of common or random effects, and how they examined the effects of covariates. Results: Twenty-four out of 44 analyses focused on time-to-event outcomes, and most analyses (28) estimated treatment effects within each trial and then combined the results assuming a common treatment effect across trials. Three analyses failed to stratify by trial, analysing the data is if they came from a single mega-trial. Only nine analyses used random effects methods. Covariate-treatment interactions were generally investigated by subgrouping patients. Seven of the meta-analyses included data from less than 80% of the randomized patients sought, but did not address the resulting potential biases. Conclusions: Although IPD meta-analyses have many advantages in assessing the effects of health care, there are several aspects that could be further developed to make fuller use of the potential of these time-consuming projects. In particular, IPD could be used to more fully investigate the influence of covariates on heterogeneity of treatment effects, both within and between trials. The impact of heterogeneity, or use of random effects, are seldom discussed. There is thus considerable scope for enhancing the methods of analysis and presentation of IPD meta-analysis.
Resumo:
The influence matrix is used in ordinary least-squares applications for monitoring statistical multiple-regression analyses. Concepts related to the influence matrix provide diagnostics on the influence of individual data on the analysis - the analysis change that would occur by leaving one observation out, and the effective information content (degrees of freedom for signal) in any sub-set of the analysed data. In this paper, the corresponding concepts have been derived in the context of linear statistical data assimilation in numerical weather prediction. An approximate method to compute the diagonal elements of the influence matrix (the self-sensitivities) has been developed for a large-dimension variational data assimilation system (the four-dimensional variational system of the European Centre for Medium-Range Weather Forecasts). Results show that, in the boreal spring 2003 operational system, 15% of the global influence is due to the assimilated observations in any one analysis, and the complementary 85% is the influence of the prior (background) information, a short-range forecast containing information from earlier assimilated observations. About 25% of the observational information is currently provided by surface-based observing systems, and 75% by satellite systems. Low-influence data points usually occur in data-rich areas, while high-influence data points are in data-sparse areas or in dynamically active regions. Background-error correlations also play an important role: high correlation diminishes the observation influence and amplifies the importance of the surrounding real and pseudo observations (prior information in observation space). Incorrect specifications of background and observation-error covariance matrices can be identified, interpreted and better understood by the use of influence-matrix diagnostics for the variety of observation types and observed variables used in the data assimilation system. Copyright © 2004 Royal Meteorological Society
Resumo:
Stephens and Donnelly have introduced a simple yet powerful importance sampling scheme for computing the likelihood in population genetic models. Fundamental to the method is an approximation to the conditional probability of the allelic type of an additional gene, given those currently in the sample. As noted by Li and Stephens, the product of these conditional probabilities for a sequence of draws that gives the frequency of allelic types in a sample is an approximation to the likelihood, and can be used directly in inference. The aim of this note is to demonstrate the high level of accuracy of "product of approximate conditionals" (PAC) likelihood when used with microsatellite data. Results obtained on simulated microsatellite data show that this strategy leads to a negligible bias over a wide range of the scaled mutation parameter theta. Furthermore, the sampling variance of likelihood estimates as well as the computation time are lower than that obtained with importance sampling on the whole range of theta. It follows that this approach represents an efficient substitute to IS algorithms in computer intensive (e.g. MCMC) inference methods in population genetics. (c) 2006 Elsevier Inc. All rights reserved.
Resumo:
Termites are an important component of tropical soil communities and have a significant affect on the structure and nutrient content of soil. Digestion in termites is related to gut structure, gut physico-chemical conditions and gut symbiotic microbiota. Here we describe the use of 16S rRNA gene sequencing and Terminal-restriction Fragment Length Polymorphism (T-RFLP) analysis to examine methanogenic Archaea (MA) in the guts and food-soil of the soil-feeder Cubitermes fungifaber Sjostedt across a range of soil types. If they are strictly vertically inherited, then MA in guts should be the same in all individuals even if the soils differ across sites. In contrast, gut MA should reflect what is present in soil if populations are merely a reflection of what is ingested as the insects forage. We show clear differences between the euryarchaeal communities in termite guts and in food-soils from five different sites. Analysis of 16S rRNA gene clones indicated little overlap between the gut and soil communities. Gut clones were related to a termite-derived Methanomicrobiales cluster, to Methanobrevibacter and, surprisingly, to the haloalkaliphile Natronococcus. Soil clones clustered with Methanosarcina, Methanomicrococcus or Rice Cluster I. T-RFLP analysis indicated that the archaeal communities in the soil samples differed from site to site, whereas those in termite guts were similar between sites. There was some overlap between the gut and soil communities but these may represent transient populations in either guts or soil. Our data does not support the hypothesis that termite gut MA are derived from their food soil but also does not support a purely vertical transmission of gut microflora.
Resumo:
Termites are an important component of tropical soil communities and have a significant effect on the structure and nutrient content of soil. Digestion in termites is related to gut structure, gut physicochemical conditions, and gut symbiotic microbiota. Here we describe the use of 16S rRNA gene sequencing and terminal-restriction fragment length polymorphism (T-RFLP) analysis to examine methanogenic archaea (MA) in the guts and food-soil of the soil-feeder Cubitermes fungifaber Sjostedt across a range of soil types. If these MA are strictly vertically inherited, then the MA in guts should be the same in all individuals even if the soils differ across sites. In contrast, gut MA should reflect what is present in soil if populations are merely a reflection of what is ingested as the insects forage. We show clear differences between the euryarchaeal communities in termite guts and in food-soils from five different sites. Analysis of 16S rRNA gene clones indicated little overlap between the gut and soil communities. Gut clones were related to a termite-derived Methanomicrobiales cluster, to Methanobrevibacter and, surprisingly, to the haloalkaliphile Natronococcus. Soil clones clustered with Methanosarcina, Methanomicrococcus, or rice cluster I. T-RFLP analysis indicated that the archaeal communities in the soil samples differed from site to site, whereas those in termite guts were similar between sites. There was some overlap between the gut and soil communities, but these may represent transient populations in either guts or soil. Our data do not support the hypothesis that termite gut MA are derived from their food-soil but also do not support a purely vertical transmission of gut microflora.
Resumo:
The iRODS system, created by the San Diego Supercomputing Centre, is a rule oriented data management system that allows the user to create sets of rules to define how the data is to be managed. Each rule corresponds to a particular action or operation (such as checksumming a file) and the system is flexible enough to allow the user to create new rules for new types of operations. The iRODS system can interface to any storage system (provided an iRODS driver is built for that system) and relies on its’ metadata catalogue to provide a virtual file-system that can handle files of any size and type. However, some storage systems (such as tape systems) do not handle small files efficiently and prefer small files to be packaged up (or “bundled”) into larger units. We have developed a system that can bundle small data files of any type into larger units - mounted collections. The system can create collection families and contains its’ own extensible metadata, including metadata on which family the collection belongs to. The mounted collection system can work standalone and is being incorporated into the iRODS system to enhance the systems flexibility to handle small files. In this paper we describe the motivation for creating a mounted collection system, its’ architecture and how it has been incorporated into the iRODS system. We describe different technologies used to create the mounted collection system and provide some performance numbers.
Resumo:
Agri-environment schemes (AESs) have been implemented across EU member states in an attempt to reconcile agricultural production methods with protection of the environment and maintenance of the countryside. To determine the extent to which such policy objectives are being fulfilled, participating countries are obliged to monitor and evaluate the environmental, agricultural and socio-economic impacts of their AESs. However, few evaluations measure precise environmental outcomes and critically, there are no agreed methodologies to evaluate the benefits of particular agri-environmental measures, or to track the environmental consequences of changing agricultural practices. In response to these issues, the Agri-Environmental Footprint project developed a common methodology for assessing the environmental impact of European AES. The Agri-Environmental Footprint Index (AFI) is a farm-level, adaptable methodology that aggregates measurements of agri-environmental indicators based on Multi-Criteria Analysis (MCA) techniques. The method was developed specifically to allow assessment of differences in the environmental performance of farms according to participation in agri-environment schemes. The AFI methodology is constructed so that high values represent good environmental performance. This paper explores the use of the AFI methodology in combination with Farm Business Survey data collected in England for the Farm Accountancy Data Network (FADN), to test whether its use could be extended for the routine surveillance of environmental performance of farming systems using established data sources. Overall, the aim was to measure the environmental impact of three different types of agriculture (arable, lowland livestock and upland livestock) in England and to identify differences in AFI due to participation in agri-environment schemes. However, because farm size, farmer age, level of education and region are also likely to influence the environmental performance of a holding, these factors were also considered. Application of the methodology revealed that only arable holdings participating in agri-environment schemes had a greater environmental performance, although responses differed between regions. Of the other explanatory variables explored, the key factors determining the environmental performance for lowland livestock holdings were farm size, farmer age and level of education. In contrast, the AFI value of upland livestock holdings differed only between regions. The paper demonstrates that the AFI methodology can be used readily with English FADN data and therefore has the potential to be applied more widely to similar data sources routinely collected across the EU-27 in a standardised manner.
Resumo:
This study focuses on the occurrence and type of clouds observed in West Africa, a subject which has neither been much documented nor quantified. It takes advantage of data collected above Niamey in 2006 with the ARM mobile facility. A survey of cloud characteristics inferred from ground measurements is presented with a focus on their seasonal evolution and diurnal cycle. Four types of clouds are distinguished: high-level clouds, deep convective clouds, shallow convective clouds and mid-level clouds. A frequent occurrence of the latter clouds located at the top of the Saharan Air Layer is highlighted. High-level clouds are ubiquitous throughout the period whereas shallow convective clouds are mainly noticeable during the core of the monsoon. The diurnal cycle of each cloud category and its seasonal evolution is investigated. CloudSat and CALIPSO data are used in order to demonstrate that these four cloud types (in addition to stratocumulus clouds over the ocean) are not a particularity of the Niamey region and that mid-level clouds are present over the Sahara during most of the Monsoon season. Moreover, using complementary data sets, the radiative impact of each type of clouds at the surface level has been quantified in the shortwave and longwave domain. Mid-level clouds and anvil clouds have the largest impact respectively in longwave (about 15 W m−2) and the shortwave (about 150 W m−2). Furthermore, mid-level clouds exert a strong radiative forcing in Spring at a time when the other cloud types are less numerous.
Resumo:
The ASTER Global Digital Elevation Model (GDEM) has made elevation data at 30 m spatial resolution freely available, enabling reinvestigation of morphometric relationships derived from limited field data using much larger sample sizes. These data are used to analyse a range of morphometric relationships derived for dunes (between dune height, spacing, and equivalent sand thickness) in the Namib Sand Sea, which was chosen because there are a number of extant studies that could be used for comparison with the results. The relative accuracy of GDEM for capturing dune height and shape was tested against multiple individual ASTER DEM scenes and against field surveys, highlighting the smoothing of the dune crest and resultant underestimation of dune height, and the omission of the smallest dunes, because of the 30 m sampling of ASTER DEM products. It is demonstrated that morphometric relationships derived from GDEM data are broadly comparable with relationships derived by previous methods, across a range of different dune types. The data confirm patterns of dune height, spacing and equivalent sand thickness mapped previously in the Namib Sand Sea, but add new detail to these patterns.
Resumo:
This paper discusses many of the issues associated with formally publishing data in academia, focusing primarily on the structures that need to be put in place for peer review and formal citation of datasets. Data publication is becoming increasingly important to the scientific community, as it will provide a mechanism for those who create data to receive academic credit for their work and will allow the conclusions arising from an analysis to be more readily verifiable, thus promoting transparency in the scientific process. Peer review of data will also provide a mechanism for ensuring the quality of datasets, and we provide suggestions on the types of activities one expects to see in the peer review of data. A simple taxonomy of data publication methodologies is presented and evaluated, and the paper concludes with a discussion of dataset granularity, transience and semantics, along with a recommended human-readable citation syntax.
Resumo:
Current methods for estimating vegetation parameters are generally sub-optimal in the way they exploit information and do not generally consider uncertainties. We look forward to a future where operational dataassimilation schemes improve estimates by tracking land surface processes and exploiting multiple types of observations. Dataassimilation schemes seek to combine observations and models in a statistically optimal way taking into account uncertainty in both, but have not yet been much exploited in this area. The EO-LDAS scheme and prototype, developed under ESA funding, is designed to exploit the anticipated wealth of data that will be available under GMES missions, such as the Sentinel family of satellites, to provide improved mapping of land surface biophysical parameters. This paper describes the EO-LDAS implementation, and explores some of its core functionality. EO-LDAS is a weak constraint variational dataassimilationsystem. The prototype provides a mechanism for constraint based on a prior estimate of the state vector, a linear dynamic model, and EarthObservationdata (top-of-canopy reflectance here). The observation operator is a non-linear optical radiative transfer model for a vegetation canopy with a soil lower boundary, operating over the range 400 to 2500 nm. Adjoint codes for all model and operator components are provided in the prototype by automatic differentiation of the computer codes. In this paper, EO-LDAS is applied to the problem of daily estimation of six of the parameters controlling the radiative transfer operator over the course of a year (> 2000 state vector elements). Zero and first order process model constraints are implemented and explored as the dynamic model. The assimilation estimates all state vector elements simultaneously. This is performed in the context of a typical Sentinel-2 MSI operating scenario, using synthetic MSI observations simulated with the observation operator, with uncertainties typical of those achieved by optical sensors supposed for the data. The experiments consider a baseline state vector estimation case where dynamic constraints are applied, and assess the impact of dynamic constraints on the a posteriori uncertainties. The results demonstrate that reductions in uncertainty by a factor of up to two might be obtained by applying the sorts of dynamic constraints used here. The hyperparameter (dynamic model uncertainty) required to control the assimilation are estimated by a cross-validation exercise. The result of the assimilation is seen to be robust to missing observations with quite large data gaps.
Resumo:
OBJECTIVES: The prediction of protein structure and the precise understanding of protein folding and unfolding processes remains one of the greatest challenges in structural biology and bioinformatics. Computer simulations based on molecular dynamics (MD) are at the forefront of the effort to gain a deeper understanding of these complex processes. Currently, these MD simulations are usually on the order of tens of nanoseconds, generate a large amount of conformational data and are computationally expensive. More and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository allowing researchers to pool and share protein unfolding data. METHODS: To adequately organize, manage, and analyze the data generated by unfolding simulation studies, we designed a data warehouse system that is embedded in a grid environment to facilitate the seamless sharing of available computer resources and thus enable many groups to share complex molecular dynamics simulations on a more regular basis. RESULTS: To gain insight into the conformational fluctuations and stability of the monomeric forms of the amyloidogenic protein transthyretin (TTR), molecular dynamics unfolding simulations of the monomer of human TTR have been conducted. Trajectory data and meta-data of the wild-type (WT) protein and the highly amyloidogenic variant L55P-TTR represent the test case for the data warehouse. CONCLUSIONS: Web and grid services, especially pre-defined data mining services that can run on or 'near' the data repository of the data warehouse, are likely to play a pivotal role in the analysis of molecular dynamics unfolding data.