931 results for large data sets


Relevance: 90.00%

Abstract:

CO, O3, and H2O data in the upper troposphere/lower stratosphere (UTLS) measured by the Atmospheric Chemistry Experiment Fourier Transform Spectrometer (ACE-FTS) on Canada’s SCISAT-1 satellite are validated using aircraft and ozonesonde measurements. In the UTLS, validation of chemical trace gas measurements is a challenging task due to small-scale variability in the tracer fields, strong gradients of the tracers across the tropopause, and scarcity of measurements suitable for validation purposes. Validation based on coincidences therefore suffers from geophysical noise. Two alternative methods for the validation of satellite data are introduced, which avoid the usual need for coincident measurements: tracer-tracer correlations, and vertical tracer profiles relative to tropopause height. Both are increasingly being used for model validation as they strongly suppress geophysical variability and thereby provide an “instantaneous climatology”. This allows comparison of measurements between non-coincident data sets, which yields information about the precision and a statistically meaningful error assessment of the ACE-FTS satellite data in the UTLS. By defining a trade-off factor, we show that the measurement errors can be reduced by including more measurements obtained over a wider longitude range in the comparison, despite the increased geophysical variability. Applying the methods then yields the following upper bounds to the relative differences in the mean found between the ACE-FTS and SPURT aircraft measurements in the upper troposphere (UT) and lower stratosphere (LS), respectively: for CO ±9% and ±12%, for H2O ±30% and ±18%, and for O3 ±25% and ±19%. The relative differences for O3 can be narrowed down by using a larger dataset obtained from ozonesondes, yielding a high bias in the ACE-FTS measurements of 18% in the UT and relative differences of ±8% for measurements in the LS.
When taking into account the smearing effect of the vertically limited spacing between measurements of the ACE-FTS instrument, the relative differences decrease by 5–15% around the tropopause, suggesting a vertical resolution of the ACE-FTS in the UTLS of around 1 km. The ACE-FTS hence offers unprecedented precision and vertical resolution for a satellite instrument, which will allow a new global perspective on UTLS tracer distributions.

Relevance: 90.00%

Abstract:

Current methods for estimating vegetation parameters are generally sub-optimal in the way they exploit information and do not generally consider uncertainties. We look forward to a future where operational data assimilation schemes improve estimates by tracking land surface processes and exploiting multiple types of observations. Data assimilation schemes seek to combine observations and models in a statistically optimal way, taking into account uncertainty in both, but have not yet been much exploited in this area. The EO-LDAS scheme and prototype, developed under ESA funding, is designed to exploit the anticipated wealth of data that will be available under GMES missions, such as the Sentinel family of satellites, to provide improved mapping of land surface biophysical parameters. This paper describes the EO-LDAS implementation, and explores some of its core functionality. EO-LDAS is a weak-constraint variational data assimilation system. The prototype provides a mechanism for constraint based on a prior estimate of the state vector, a linear dynamic model, and Earth Observation data (top-of-canopy reflectance here). The observation operator is a non-linear optical radiative transfer model for a vegetation canopy with a soil lower boundary, operating over the range 400 to 2500 nm. Adjoint codes for all model and operator components are provided in the prototype by automatic differentiation of the computer codes. In this paper, EO-LDAS is applied to the problem of daily estimation of six of the parameters controlling the radiative transfer operator over the course of a year (> 2000 state vector elements). Zero and first order process model constraints are implemented and explored as the dynamic model. The assimilation estimates all state vector elements simultaneously.
This is performed in the context of a typical Sentinel-2 MSI operating scenario, using synthetic MSI observations simulated with the observation operator, with uncertainties typical of those achieved by optical sensors supposed for the data. The experiments consider a baseline state vector estimation case where dynamic constraints are applied, and assess the impact of dynamic constraints on the a posteriori uncertainties. The results demonstrate that reductions in uncertainty by a factor of up to two might be obtained by applying the sorts of dynamic constraints used here. The hyperparameters (dynamic model uncertainty) required to control the assimilation are estimated by a cross-validation exercise. The result of the assimilation is seen to be robust to missing observations with quite large data gaps.
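
For orientation, the three constraints named in the abstract above (prior, observations, linear dynamic model) can be written as a generic weak-constraint variational cost function. This is a textbook-style sketch, not necessarily the exact EO-LDAS formulation; every symbol is an assumption introduced here: x is the full state vector with daily sub-vectors x_t, x_p the prior estimate, y_t the observations, H the observation operator, M the linear dynamic model, and C_p, C_o, C_m the prior, observation, and dynamic-model uncertainty covariances.

```latex
J(\mathbf{x}) =
  \tfrac{1}{2} (\mathbf{x} - \mathbf{x}_p)^{\top} \mathbf{C}_p^{-1} (\mathbf{x} - \mathbf{x}_p)
+ \tfrac{1}{2} \sum_{t} \bigl(\mathbf{y}_t - H(\mathbf{x}_t)\bigr)^{\top} \mathbf{C}_o^{-1} \bigl(\mathbf{y}_t - H(\mathbf{x}_t)\bigr)
+ \tfrac{1}{2} \sum_{t} (\mathbf{x}_{t+1} - \mathbf{M}\mathbf{x}_t)^{\top} \mathbf{C}_m^{-1} (\mathbf{x}_{t+1} - \mathbf{M}\mathbf{x}_t)
```

Under this reading, a zero-order process model would correspond to M equal to the identity, so the last term penalises day-to-day changes in the state, and C_m plays the role of the dynamic model uncertainty hyperparameter.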

Relevance: 90.00%

Abstract:

In the last decade, a vast number of land surface schemes have been designed for use in global climate models, atmospheric weather prediction, mesoscale numerical models, ecological models, and models of global changes. Since land surface schemes are designed for different purposes, they have various levels of complexity in the treatment of bare soil processes, vegetation, and soil water movement. This paper is a contribution to a small group of papers dealing with intercomparison of differently designed and oriented land surface schemes. For that purpose we have chosen three schemes for classification: i) global climate models, BATS (Dickinson et al., 1986; Dickinson et al., 1992); ii) mesoscale and ecological models, LEAF (Lee, 1992) and iii) mesoscale models, LAPS (Mihailović, 1996; Mihailović and Kallos, 1997; Mihailović et al., 1999) according to the Shao et al. (1995) classification. These schemes were compared using surface fluxes and leaf temperature outputs obtained by time integrations of data sets derived from the micrometeorological measurements above a maize field at an experimental site in De Sinderhoeve (The Netherlands) for 18 August, 8 September, and 4 October 1988. Finally, comparison of the schemes was supported by applying a simple statistical analysis to the surface flux outputs.

Relevance: 90.00%

Abstract:

A model for estimating the turbulent kinetic energy dissipation rate in the oceanic boundary layer, based on insights from rapid-distortion theory, is presented and tested. This model provides a possible explanation for the very high dissipation levels found by numerous authors near the surface. It is conceived that turbulence, injected into the water by breaking waves, is subsequently amplified due to its distortion by the mean shear of the wind-induced current and straining by the Stokes drift of surface waves. The partition of the turbulent shear stress into a shear-induced part and a wave-induced part is taken into account. In this picture, dissipation enhancement results from the same mechanism responsible for Langmuir circulations. Apart from a dimensionless depth and an eddy turn-over time, the dimensionless dissipation rate depends on the wave slope and wave age, which may be encapsulated in the turbulent Langmuir number La_t. For large La_t, or any La_t but large depth, the dissipation rate tends to the usual surface layer scaling, whereas when La_t is small, it is strongly enhanced near the surface, growing asymptotically as ɛ ∝ La_t^{-2} when La_t → 0. Results from this model are compared with observations from the WAVES and SWADE data sets, assuming that this is the dominant dissipation mechanism acting in the ocean surface layer, and statistical measures of the corresponding fit indicate a substantial improvement over previous theoretical models. Comparisons are also carried out against more recent measurements, showing good order-of-magnitude agreement, even when shallow-water effects are important.

Relevance: 90.00%

Abstract:

OBJECTIVES: The prediction of protein structure and the precise understanding of protein folding and unfolding processes remain among the greatest challenges in structural biology and bioinformatics. Computer simulations based on molecular dynamics (MD) are at the forefront of the effort to gain a deeper understanding of these complex processes. Currently, these MD simulations are usually on the order of tens of nanoseconds, generate a large amount of conformational data and are computationally expensive. More and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because of the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository allowing researchers to pool and share protein unfolding data. METHODS: To adequately organize, manage, and analyze the data generated by unfolding simulation studies, we designed a data warehouse system that is embedded in a grid environment to facilitate the seamless sharing of available computer resources and thus enable many groups to share complex molecular dynamics simulations on a more regular basis. RESULTS: To gain insight into the conformational fluctuations and stability of the monomeric forms of the amyloidogenic protein transthyretin (TTR), molecular dynamics unfolding simulations of the monomer of human TTR have been conducted. Trajectory data and meta-data of the wild-type (WT) protein and the highly amyloidogenic variant L55P-TTR represent the test case for the data warehouse. CONCLUSIONS: Web and grid services, especially pre-defined data mining services that can run on or 'near' the data repository of the data warehouse, are likely to play a pivotal role in the analysis of molecular dynamics unfolding data.

Relevance: 90.00%

Abstract:

With the increasing awareness of protein folding disorders, the explosion of genomic information, and the need for efficient ways to predict protein structure, protein folding and unfolding has become a central issue in molecular sciences research. Molecular dynamics computer simulations are increasingly employed to understand the folding and unfolding of proteins. Running protein unfolding simulations is computationally expensive, and finding ways to enhance performance is a grid issue on its own. However, more and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because of the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository allowing researchers to pool and share protein unfolding data. This paper describes efforts to provide a grid-enabled data warehouse for protein unfolding data. We outline the challenge and present first results in the design and implementation of the data warehouse.

Relevance: 90.00%

Abstract:

Polycyclic aromatic hydrocarbons (PAHs) are ubiquitous environmental pollutants that frequently accumulate in soils. There is therefore a requirement to determine their levels in contaminated environments for the purposes of determining impacts on human health. PAHs are a suite of individual chemicals, and there is an ongoing debate as to the most appropriate method for assessing the risk to humans from them. Two methods predominate: the surrogate marker approach and the toxic equivalency factor. The former assumes that all chemicals in a mixture have an equivalent toxicity. The toxic equivalency approach estimates the potency of individual chemicals relative to benzo(a)pyrene (BaP), usually the most toxic. The surrogate marker approach is believed to overestimate risk and the toxic equivalency factor to underestimate risk. When analysing the risks from soils, the surrogate marker approach is preferred due to its simplicity, but there are concerns because of the potential diversity of the PAH profile across the range of impacted soils. Using two independent data sets containing soils from 274 sites across a diverse range of locations, statistical analysis was undertaken to determine the differences in the composition of carcinogenic PAH between site locations, for example, rural versus industrial. Following principal components analysis, distinct population differences were not seen between site locations in spite of large differences in the total PAH burden between individual sites. Using all data, highly significant correlations were seen between BaP and other carcinogenic PAH, with the majority of r² values > 0.8. Correlations with the European Food Safety Authority (EFSA) summed groups, that is, EFSA2, EFSA4 and EFSA8, were even higher (r² > 0.95). We therefore conclude that BaP is a suitable surrogate marker to represent mixtures of PAH in soil during risk assessments.
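
As a rough illustration of the correlation step described above, the sketch below computes the squared Pearson correlation (r²) between BaP and one other PAH across sites. The concentrations are synthetic values made up for this example (they are not data from the study), and the function name `r_squared` is hypothetical. Any transformation the authors may have applied before correlating is not modelled.

```python
def r_squared(x, y):
    """Squared Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Synthetic soil concentrations (mg/kg) at five hypothetical sites.
bap = [0.1, 0.5, 1.2, 3.0, 8.5]          # benzo(a)pyrene
other_pah = [0.2, 0.9, 2.5, 6.1, 16.8]   # another carcinogenic PAH

r2 = r_squared(bap, other_pah)
print(f"r^2 = {r2:.3f}")
```

A high r² across sites with very different total burdens is the kind of evidence the abstract uses to argue that a single marker can stand in for the mixture.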

Relevance: 90.00%

Abstract:

Although grasslands are crucial habitats for European butterflies, large-scale declines in quality and area have devastated many species. Grassland restoration can contribute to the recovery of butterfly populations, although there is a paucity of information on the long-term effects of management. Using eight UK data sets (9-21 years), we investigate changes in restoration success for (1) arable reversion sites, where grassland was established on bare ground using seed mixtures, and (2) grassland enhancement sites, where degraded grasslands are restored by scrub removal followed by the re-instigation of cutting/grazing. We also assessed the importance of individual butterfly traits and ecological characteristics in determining colonisation times. Consistent increases in restoration success over time were seen for arable reversion sites, with the most rapid rates of increase in restoration success seen over the first 10 years. For grassland enhancement there were no consistent increases in restoration success over time. Butterfly colonisation times were fastest for species with widespread host plants or where host plants established well during restoration. Low mobility butterfly species took longer to colonise. We show that arable reversion is an effective tool for the management of butterfly communities. We suggest that as restoration takes time to achieve, its use as a mitigation tool against future environmental change (i.e. by decreasing isolation in fragmented landscapes) needs to take into account such time lags.

Relevance: 90.00%

Abstract:

This paper will introduce the Baltex research programme and summarize associated numerical modelling work which has been undertaken during the last five years. The research has broadly managed to clarify the main mechanisms determining the water and energy cycle in the Baltic region, such as the strong dependence upon the large scale atmospheric circulation. It has further been shown that the Baltic Sea has a positive water balance, albeit with large interannual variations. The focus of the modelling studies has been the use of limited area models at ultra-high resolution driven by boundary conditions from global models or from reanalysis data sets. The programme has further initiated a comprehensive integration of atmospheric, land surface and hydrological modelling incorporating snow, sea ice and special lake models. Other aspects of the programme include process studies such as the role of deep convection, air-sea interaction and the handling of land surface moisture. Studies have also been undertaken to investigate synoptic and sub-synoptic events over the Baltic region, thus exploring the role of transient weather systems for the hydrological cycle. A special aspect has been the strong interest and commitment of the meteorological and hydrological services because of the potentially large societal interest in operational applications of the research. As a result of this interest, special attention has been paid to data-assimilation aspects and the use of new types of data such as SSM/I, GPS measurements and digital radar. A series of high resolution data sets are being produced. One of those, a 1/6 degree daily precipitation climatology for the years 1996–1999, is a unique contribution. The specific research achievements to be presented in this volume of Meteorology and Atmospheric Physics are the result of a cooperative venture between 11 European research groups supported under the EU-Framework programmes.

Relevance: 90.00%

Abstract:

In this paper, we develop a method, termed the Interaction Distribution (ID) method, for analysis of quantitative ecological network data. In many cases, quantitative network data sets are under-sampled, i.e. many interactions are poorly sampled or remain unobserved. Hence, the output of statistical analyses may fail to differentiate between patterns that are statistical artefacts and those which are real characteristics of ecological networks. The ID method can support assessment and inference of under-sampled ecological network data. In the current paper, we illustrate and discuss the ID method based on the properties of plant-animal pollination data sets of flower visitation frequencies. However, the ID method may be applied to other types of ecological networks. The method can supplement existing network analyses based on two definitions of the underlying probabilities for each combination of pollinator and plant species: (1) p_{i,j}: the probability for a visit made by the i’th pollinator species to take place on the j’th plant species; (2) q_{i,j}: the probability for a visit received by the j’th plant species to be made by the i’th pollinator. The method applies the Dirichlet distribution to estimate these two probabilities, based on a given empirical data set. The estimated mean values for p_{i,j} and q_{i,j} reflect the relative differences between recorded numbers of visits for different pollinator and plant species, and the estimated uncertainty of p_{i,j} and q_{i,j} decreases with higher numbers of recorded visits.
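
The two probabilities defined above can be illustrated with a minimal sketch. It assumes a symmetric Dirichlet prior with weight 1 per category and a small made-up visitation matrix; neither the prior nor the counts come from the paper, and the helper name `dirichlet_posterior` is hypothetical. The ID method itself may differ in its details.

```python
# Hypothetical visit counts: rows are pollinator species i, columns are plant species j.
counts = [
    [12, 3, 0],  # a well-sampled pollinator
    [1, 0, 0],   # a poorly sampled pollinator
]

PRIOR = 1.0  # assumed symmetric Dirichlet prior weight per category


def dirichlet_posterior(row, prior=PRIOR):
    """Return (means, variances) of the Dirichlet posterior for one vector of counts."""
    alphas = [c + prior for c in row]
    total = sum(alphas)
    means = [a / total for a in alphas]
    # Standard Dirichlet marginal variance: a * (total - a) / (total^2 * (total + 1)).
    variances = [a * (total - a) / (total ** 2 * (total + 1)) for a in alphas]
    return means, variances


# p: for each pollinator i, the distribution of its visits over plants j.
p_stats = [dirichlet_posterior(row) for row in counts]

# q: for each plant j, the distribution of its received visits over pollinators i.
columns = [list(col) for col in zip(*counts)]
q_stats = [dirichlet_posterior(col) for col in columns]

for i, (means, variances) in enumerate(p_stats):
    print(f"pollinator {i}: mean p = {means}, var = {variances}")
```

Note that `p_stats` is indexed by pollinator first, while `q_stats` is indexed by plant first. With these counts, the posterior variance for the poorly sampled pollinator is larger than for the well-sampled one, which mirrors the statement that uncertainty decreases with more recorded visits.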

Relevance: 90.00%

Abstract:

Based on the availability of hemispheric gridded data sets from observations, analysis and global climate models, objective cyclone identification methods were developed and applied to these data sets. Due to the large number of investigation methods combined with the variety of different datasets, a multitude of results exist, not only for the recent climate period but also for the next century, assuming anthropogenically changed conditions. Different thresholds, different physical quantities, and considerations of different atmospheric vertical levels add to a picture that is difficult to combine into a common view of cyclones, their variability and trends, in the real world and in GCM studies. Thus, this paper will give a comprehensive review of the current knowledge on climatologies of mid-latitude cyclones for the Northern and Southern Hemisphere for the present climate and for its possible changes under anthropogenic climate conditions.

Relevance: 90.00%

Abstract:

In the past decade, the analysis of data has faced the challenge of dealing with very large and complex datasets and the real-time generation of data. Technologies to store and access these complex and large datasets are in place. However, robust and scalable analysis technologies are needed to extract meaningful information from these datasets. The research field of Information Visualization and Visual Data Analytics addresses this need. Information visualization and data mining are often used as complements to each other. Their common goal is the extraction of meaningful information from complex and possibly large data. However, whereas data mining focuses on the usage of silicon hardware, visualization techniques also aim to access the powerful image-processing capabilities of the human brain. This article highlights the research on data visualization and visual analytics techniques. Furthermore, we highlight existing visual analytics techniques, systems, and applications including a perspective on the field from the chemical process industry.

Relevance: 90.00%

Abstract:

Background: Expression microarrays are increasingly used to obtain large scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, to design experiments and analyse the output. Furthermore, many of the more sophisticated mathematical approaches to data analysis in the literature remain inaccessible to much of the biological research community. In this study we examine ways of extracting and analysing a large data set obtained using the Agilent long oligonucleotide transcriptomics platform, applied to a set of human macrophage and dendritic cell samples. Results: We describe and validate a series of data extraction, transformation and normalisation steps which are implemented via a new R function. Analysis of replicate normalised reference data demonstrates that intraarray variability is small (only around 2% of the mean log signal), while interarray variability from replicate array measurements has a standard deviation (SD) of around 0.5 log2 units (6% of mean). The common practice of working with ratios of Cy5/Cy3 signal offers little further improvement in terms of reducing error. Comparison to expression data obtained using Arabidopsis samples demonstrates that the large number of genes in each sample showing a low level of transcription reflects the real complexity of the cellular transcriptome. Multidimensional scaling is used to show that the processed data identify an underlying structure which reflects some of the key biological variables which define the data set. This structure is robust, allowing reliable comparison of samples collected over a number of years and collected by a variety of operators. Conclusions: This study outlines a robust and easily implemented pipeline for extracting, transforming, normalising and visualising transcriptomic array data from the Agilent expression platform.
The analysis is used to obtain quantitative estimates of the SD arising from experimental (non-biological) intra- and interarray variability, and for a lower threshold for determining whether an individual gene is expressed. The study provides a reliable basis for further more extensive studies of the systems biology of eukaryotic cells.

Relevance: 90.00%

Abstract:

The long observational record is critical to our understanding of the Earth’s climate, but most observing systems were not developed with a climate objective in mind. As a result, tremendous efforts have gone into assessing and reprocessing the data records to improve their usefulness in climate studies. The purpose of this paper is both to review recent progress in reprocessing and reanalyzing observations, and to summarize the challenges that must be overcome in order to improve our understanding of climate and variability. Reprocessing improves data quality through more scrutiny and improved retrieval techniques for individual observing systems, while reanalysis merges many disparate observations with models through data assimilation, yet both aim to provide a climatology of Earth processes. Many challenges remain, such as tracking the improvement of processing algorithms and limited spatial coverage. Reanalyses have fostered significant research, yet reliable global trends in many physical fields are not yet attainable, despite significant advances in data assimilation and numerical modeling. Oceanic reanalyses have made significant advances in recent years, but will only be discussed here in terms of progress toward integrated Earth system analyses. Climate data sets are generally adequate for process studies and large-scale climate variability. Communication of the strengths, limitations and uncertainties of reprocessed observations and reanalysis data, not only among the community of developers, but also with the extended research community, including the new generations of researchers and the decision makers, is crucial for further advancement of the observational data records. It must be emphasized that careful investigation of the data and processing methods is required to use the observations appropriately.