960 results for data sets


Relevance: 100.00%

Abstract:

Retrieving large amounts of information over wide area networks, including the Internet, is problematic due to issues arising from latency of response, lack of direct memory access to data serving resources, and fault tolerance. This paper describes a design pattern for solving the issues of handling results from queries that return large amounts of data. Typically these queries would be made by a client process across a wide area network (or Internet), with one or more middle-tiers, to a relational database residing on a remote server. The solution involves implementing a combination of data retrieval strategies, including the use of iterators for traversing data sets and providing an appropriate level of abstraction to the client, double-buffering of data subsets, multi-threaded data retrieval, and query slicing. This design has recently been implemented and incorporated into the framework of a commercial software product developed at Oracle Corporation.
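The combination of iterators, double-buffering, background retrieval and query slicing described in this abstract can be illustrated with a minimal Python sketch. It is not the implementation from the paper: the function name `iterate_sliced`, the callable `fetch_slice(offset, limit)` and the in-memory `fake_fetch` standing in for a remote database are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Sequence


def iterate_sliced(fetch_slice: Callable[[int, int], Sequence],
                   slice_size: int = 100) -> Iterator:
    """Yield rows one by one while the next slice loads in the background.

    fetch_slice(offset, limit) is a hypothetical callable that returns at
    most `limit` rows starting at `offset` (one "slice" of the full query).
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        offset = 0
        # Back buffer: the slice currently being fetched by the worker thread.
        pending = pool.submit(fetch_slice, offset, slice_size)
        while True:
            front = pending.result()      # front buffer: rows ready for the client
            if not front:
                return                    # empty slice: result set exhausted
            offset += len(front)
            more = len(front) == slice_size
            if more:
                # Request the next slice before handing out the current rows,
                # so network latency overlaps with client-side processing.
                pending = pool.submit(fetch_slice, offset, slice_size)
            yield from front
            if not more:
                return                    # short slice: nothing left to fetch


# Stand-in for a remote query (assumption: a real system would issue a
# range-limited query here instead of slicing a local list).
ROWS = list(range(250))


def fake_fetch(offset: int, limit: int) -> Sequence:
    return ROWS[offset:offset + limit]


result = list(iterate_sliced(fake_fetch, slice_size=100))
```

The client sees only an ordinary iterator; the slice size and the prefetching are hidden behind it, which is the level of abstraction the abstract refers to.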

Relevance: 100.00%

Abstract:

The principled statistical application of Gaussian random field models used in geostatistics has historically been limited to data sets of a small size. This limitation is imposed by the requirement to store and invert the covariance matrix of all the samples to obtain a predictive distribution at unsampled locations, or to use likelihood-based covariance estimation. Various ad hoc approaches to solve this problem have been adopted, such as selecting a neighborhood region and/or a small number of observations to use in the kriging process, but these have no sound theoretical basis and it is unclear what information is being lost. In this article, we present a Bayesian method for estimating the posterior mean and covariance structures of a Gaussian random field using a sequential estimation algorithm. By imposing sparsity in a well-defined framework, the algorithm retains a subset of “basis vectors” that best represent the “true” posterior Gaussian random field model in the relative entropy sense. This allows a principled treatment of Gaussian random field models on very large data sets. The method is particularly appropriate when the Gaussian random field model is regarded as a latent variable model, which may be nonlinearly related to the observations. We show the application of the sequential, sparse Bayesian estimation in Gaussian random field models and discuss its merits and drawbacks.
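The storage and inversion requirement mentioned in this abstract can be made concrete with the standard predictive equations for a zero-mean Gaussian random field (simple kriging). The notation below is an assumption for illustration and is not taken from the article: $n$ observations $\mathbf{y}$ at locations $X$, covariance function $k$, observation noise variance $\sigma^2$, and an unsampled location $x_*$.

```latex
\begin{align}
  K &= k(X, X) + \sigma^2 I \in \mathbb{R}^{n \times n}, \qquad
  \mathbf{k}_* = k(X, x_*) \in \mathbb{R}^{n}, \\
  \mu_* &= \mathbf{k}_*^{\top} K^{-1} \mathbf{y}, \\
  \sigma_*^2 &= k(x_*, x_*) - \mathbf{k}_*^{\top} K^{-1} \mathbf{k}_* .
\end{align}
```

Storing $K$ costs $O(n^2)$ memory and factorizing or inverting it costs $O(n^3)$ time, which is why the exact computation becomes impractical for large $n$ and why retaining only a small subset of basis vectors is attractive.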

Relevance: 100.00%

Abstract:

Recently within the machine learning and spatial statistics communities many papers have explored the potential of reduced rank representations of the covariance matrix, often referred to as projected or fixed rank approaches. In such methods the covariance function of the posterior process is represented by a reduced rank approximation which is chosen such that there is minimal information loss. In this paper a sequential framework for inference in such projected processes is presented, where the observations are considered one at a time. We introduce a C++ library for carrying out such projected, sequential estimation which adds several novel features. In particular we have incorporated the ability to use a generic observation operator, or sensor model, to permit data fusion. We can also cope with a range of observation error characteristics, including non-Gaussian observation errors. Inference for the variogram parameters is based on maximum likelihood estimation. We illustrate the projected sequential method in application to synthetic and real data sets. We discuss the software implementation and suggest possible future extensions.

Relevance: 100.00%

Abstract:

Long-term research on freshwater ecosystems provides insights that can be difficult to obtain from other approaches. Widespread monitoring of ecologically relevant water-quality parameters spanning decades can facilitate important tests of ecological principles. Unique long-term data sets and analytical tools are increasingly available, allowing for powerful and synthetic analyses across sites. Long-term measurements or experiments in aquatic systems can catch rare events, changes in highly variable systems, time-lagged responses, cumulative effects of stressors, and biotic responses that encompass multiple generations. Data are available from formal networks, local to international agencies, private organizations, various institutions, and paleontological and historic records; brief literature surveys suggest that many existing data are not synthesized. Ecological sciences will benefit from careful maintenance and analyses of existing long-term programs, and subsequent insights can aid in the design of effective future long-term experimental and observational efforts. Long-term research on freshwaters is particularly important because of their value to humanity.

Relevance: 100.00%

Abstract:

Approaches to quantify the organic carbon accumulation on a global scale generally do not consider the small-scale variability of sedimentary and oceanographic boundary conditions along continental margins. In this study, we present a new approach to regionalize the total organic carbon (TOC) content in surface sediments (<5 cm sediment depth). It is based on a compilation of more than 5500 single measurements from various sources. Global TOC distribution was determined by the application of a combined qualitative and quantitative-geostatistical method. Overall, 33 benthic TOC-based provinces were defined and used to process the global distribution pattern of the TOC content in surface sediments in a 1°x1° grid resolution. Regional dependencies of data points within each single province are expressed by modeled semi-variograms. Measured and estimated TOC values show good correlation, emphasizing the reasonable applicability of the method. The accumulation of organic carbon in marine surface sediments is a key parameter in the control of mineralization processes and the material exchange between the sediment and the ocean water. Our approach will help to improve global budgets of nutrient and carbon cycles.
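Before a semi-variogram can be modeled for a province, an empirical semi-variogram is usually computed from the data points. The sketch below shows that generic step in Python; it is not the authors' procedure. The function name, the toy coordinates and values, and the use of plain Euclidean distance (rather than distances on the sphere) are assumptions for illustration.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, values, bin_width):
    """Return {bin_index: semivariance}.

    Bin k collects all point pairs whose separation (lag) lies in
    [k * bin_width, (k + 1) * bin_width). For each bin the estimator is
    gamma = sum((z_i - z_j)**2) / (2 * number_of_pairs).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            lag = math.dist(points[i], points[j])   # Euclidean distance
            k = int(lag // bin_width)
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    return {k: sums[k] / (2 * counts[k]) for k in sorted(counts)}


# Toy example (hypothetical numbers, not TOC measurements from the study).
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
values = [1.0, 2.0, 4.0]
gamma = empirical_semivariogram(points, values, bin_width=1.0)
```

A parametric model (for example spherical or exponential) would then be fitted to these binned values to obtain the modeled semi-variogram used for interpolation.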

Relevance: 100.00%

Abstract:

The exponential growth of studies on the biological response to ocean acidification over the last few decades has generated a large amount of data. To facilitate data comparison, a data compilation hosted at the data publisher PANGAEA was initiated in 2008 and is updated on a regular basis (doi:10.1594/PANGAEA.149999). By January 2015, a total of 581 data sets (over 4 000 000 data points) from 539 papers had been archived. Here we present the developments of this data compilation five years since its first description by Nisumaa et al. (2010). Most of the study sites from which data have been archived are still in the Northern Hemisphere, and the amount of archived data from studies in the Southern Hemisphere and polar oceans is still relatively low. Data from 60 studies that investigated the response of a mix of organisms or natural communities were all added after 2010, indicating a welcome shift from the study of individual organisms to communities and ecosystems. The initial imbalance of considerably more data archived on calcification and primary production than on other processes has improved. There is also a clear tendency towards more data archived from multifactorial studies after 2010. For easier and more effective access to ocean acidification data, the ocean acidification community is strongly encouraged to contribute to the data archiving effort, to help develop standard vocabularies describing the variables, and to define best practices for archiving ocean acidification data.

Relevance: 100.00%

Abstract:

The general knowledge of the hydrographic structure of the Southern Ocean is still rather incomplete, since observations, particularly in the ice-covered regions, are cumbersome to carry out. But we know from the available information that thermohaline processes have large amplitudes and cover a wide range of scales in this part of the world ocean. The modification of water masses around Antarctica does indeed have a worldwide impact; these processes ultimately determine the cold state of the present climate in the world ocean. We have combined efforts of the German and Russian polar research institutions to collect and validate the presently available temperature, salinity and oxygen data of the ocean south of 30°S latitude. We have carried out this work in spite of the fact that the hydrographic programme of the World Ocean Circulation Experiment (WOCE) will provide more new information in due time, but its contribution to the high latitudes of the Southern Ocean is quite sparse. The modified picture of the hydrographic structure of the Southern Ocean presented in this atlas may serve the oceanographic community in many ways and help to unravel the role of this ocean in the global climate system. This atlas could only be prepared with the altruistic assistance of many colleagues from various institutions worldwide who have provided us with their data and their advice. Their generous help is gratefully acknowledged. Over two years, scientists from the Arctic and Antarctic Research Institute in St. Petersburg and the Alfred Wegener Institute for Polar and Marine Research in Bremerhaven have cooperated fruitfully to establish the atlas and the archive of about 38749 validated hydrographic stations. We hope that both sources of information will be widely applied in future ocean studies and will serve as a reference state for global change considerations.

Relevance: 100.00%

Abstract:

In this study we present a global distribution pattern and budget of the minimum flux of particulate organic carbon to the sea floor (J POC alpha). The estimations are based on regionally specific correlations between the diffusive oxygen flux across the sediment-water interface, the total organic carbon content in surface sediments, and the oxygen concentration in bottom waters. For this, we modified the principal equation of Cai and Reimers [1995] as a basic Monod reaction rate, applied within 11 regions where in situ measurements of diffusive oxygen uptake exist. By application of the resulting transfer functions to other regions with similar sedimentary conditions and areal interpolation, we calculated a minimum global budget of particulate organic carbon that actually reaches the sea floor of ~0.5 GtC yr⁻¹ (>1000 m water depth (wd)), whereas approximately 0.002-0.12 GtC yr⁻¹ is buried in the sediments (0.01-0.4% of surface primary production). Despite the fact that our global budget is in good agreement with previous studies, we found conspicuous differences among the distribution patterns of primary production, calculations based on particle trap collections of the POC flux, and J POC alpha of this study. These deviations, located especially in the southeastern and southwestern Atlantic Ocean, the Greenland and Norwegian Sea and the entire equatorial Pacific Ocean, strongly indicate a considerable influence of lateral particle transport on the vertical link between surface waters and underlying sediments. This observation is supported by sediment trap data. Furthermore, local differences in the availability and quality of the organic matter as well as different transport mechanisms through the water column are discussed.

Relevance: 100.00%

Abstract:

During the SINOPS project, an optimal state-of-the-art simulation of the marine silicon cycle is attempted employing a biogeochemical ocean general circulation model (BOGCM) through three particular time steps relevant for global (paleo-) climate. In order to tune the model optimally, results of the simulations are compared to a comprehensive data set of 'real' observations. SINOPS' scientific data management ensures that data structure becomes homogeneous throughout the project. The practical work routine comprises systematic progress from data acquisition, through preparation, processing, quality check and archiving, up to the presentation of data to the scientific community. Meta-information and analytical data are mapped by an n-dimensional catalogue in order to itemize the analytical value and to serve as an unambiguous identifier. In practice, data management is carried out by means of the online-accessible information system PANGAEA, which offers a tool set comprising a data warehouse, Graphical Information System (GIS), 2-D plot, cross-section plot, etc. and whose multidimensional data model promotes scientific data mining. Besides scientific and technical aspects, this alliance between scientific project team and data management crew serves to integrate the participants and allows them to gain mutual respect and appreciation.

Relevance: 100.00%

Abstract:

Here, we describe gene expression compositional assignment (GECA), a powerful, yet simple method based on compositional statistics that can validate the transfer of prior knowledge, such as gene lists, into independent data sets, platforms and technologies. Transcriptional profiling has been used to derive gene lists that stratify patients into prognostic molecular subgroups and assess biomarker performance in the pre-clinical setting. Archived public data sets are an invaluable resource for subsequent in silico validation, though their use can lead to data integration issues. We show that GECA can be used without the need for normalising expression levels between data sets and can outperform rank-based correlation methods. To validate GECA, we demonstrate its success in the cross-platform transfer of gene lists in different domains including: bladder cancer staging, tumour site of origin and mislabelled cell lines. We also show its effectiveness in transferring an epithelial ovarian cancer prognostic gene signature across technologies, from a microarray to a next-generation sequencing setting. In a final case study, we predict the tumour site of origin and histopathology of epithelial ovarian cancer cell lines. In particular, we identify and validate the commonly-used cell line OVCAR-5 as non-ovarian, being gastrointestinal in origin. GECA is available as an open-source R package.