177 results for Automatic Analysis of Multivariate Categorical Data Sets
in Publishing Network for Geoscientific
Abstract:
During the SINOPS project, an optimal, state-of-the-art simulation of the marine silicon cycle is attempted, employing a biogeochemical ocean general circulation model (BOGCM) for three particular time steps relevant to global (paleo-)climate. To tune the model optimally, the simulation results are compared with a comprehensive data set of 'real' observations. SINOPS' scientific data management ensures that the data structure remains homogeneous throughout the project. The practical work routine proceeds systematically from data acquisition through preparation, processing, quality checking and archiving to the presentation of data to the scientific community. Meta-information and analytical data are mapped by an n-dimensional catalogue in order to itemize each analytical value and to serve as an unambiguous identifier. In practice, data management is carried out with the online-accessible information system PANGAEA, which offers a tool set comprising a data warehouse, a Geographic Information System (GIS), 2-D plots, cross-section plots, etc., and whose multidimensional data model promotes scientific data mining. Beyond the scientific and technical aspects, this alliance between the scientific project team and the data management crew serves to integrate the participants and allows them to gain mutual respect and appreciation.
Abstract:
Over the past six years, organic geochemical, micropaleontological, and sedimentological investigations were carried out within the framework of the multidisciplinary bilateral German-Russian research project "System Laptev Sea", and detailed biological investigations within the project "German-Russian Investigations of the Marginal Seas of the Eurasian Arctic". In order to understand the Laptev Sea ecosystem and to obtain information about the sources and fate of organic carbon, the distribution of phyto- and zooplankton, diatoms, chlorophyll a, benthic macrofauna, palynomorphs, grain size, total organic carbon, δ13Corg, and biomarkers (n-alkanes, fatty acids) was determined. In general, the influence of the major rivers draining into the Laptev Sea is reflected in the water column as well as in the surface sediments. In both habitats, three ecological provinces can be distinguished: the southeastern, the central, and the northern Laptev Sea. Additionally, there are clear differences between the western and the eastern Laptev Sea. The comparison of the different data sets from the water column and the surface sediments provides information about organic carbon sources and pathways in the Laptev Sea shelf and continental slope area.
Abstract:
The analysis of research data plays a key role in data-driven areas of science. Many kinds of mixed research data sets exist, and scientists aim to derive or validate hypotheses in order to find undiscovered knowledge. Many analysis techniques identify relations only across an entire data set, which may level out the characteristic behavior of different subgroups in the data. Similar to automatic subspace clustering, we aim at identifying interesting subgroups and attribute sets. We present a visual-interactive system that supports scientists in exploring interesting relations between aggregated bins of multivariate attributes in mixed data sets. Abstracting the data into bins enables the application of statistical dependency tests as the measure of interestingness. An overview matrix view shows all attributes, ranked with respect to the interestingness of their bins. Complementarily, a node-link view reveals multivariate bin relations by positioning dependent bins close to each other. The system supports information drill-down based on both expert knowledge and algorithmic support. Finally, visual-interactive subset clustering assigns multivariate bin relations to groups. A list-based representation of the cluster results enables scientists to communicate multivariate findings at a glance. We demonstrate the applicability of the system with two case studies, one from the earth observation domain and one from prostate cancer research. In both cases, the system enabled us to identify the most interesting multivariate bin relations, to validate already published results, and, moreover, to discover unexpected relations.
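The abstract does not say which dependency test the system uses. As a minimal sketch, assuming equal-width binning of numeric attributes and Pearson's chi-square test of independence, the following Python shows how binning makes such a test applicable and how individual bin pairs can be scored. All function names are hypothetical, and Cramér's V is used here only as one common way to make scores comparable across attribute pairs with different numbers of bins.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

BinPair = Tuple[Hashable, Hashable]


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Map numeric values to bin indices 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant attribute: avoid division by zero
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def pearson_residuals(a: Sequence[Hashable], b: Sequence[Hashable]) -> Dict[BinPair, float]:
    """(observed - expected) / sqrt(expected) for every pair of bins of two attributes.

    Expected counts are those implied by independence: n_x * n_y / n.
    """
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    residuals = {}
    for x, n_x in count_a.items():
        for y, n_y in count_b.items():
            expected = n_x * n_y / n
            residuals[(x, y)] = (joint[(x, y)] - expected) / math.sqrt(expected)
    return residuals


def chi_square(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Pearson chi-square statistic: the sum of squared residuals."""
    return sum(r * r for r in pearson_residuals(a, b).values())


def cramers_v(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Chi-square rescaled to [0, 1] so that attribute pairs with different
    numbers of bins can be ranked against each other."""
    k = min(len(set(a)), len(set(b)))
    if k < 2:
        return 0.0  # a single bin carries no dependency information
    return math.sqrt(chi_square(a, b) / (len(a) * (k - 1)))


def rank_bin_pairs(a: Sequence[Hashable], b: Sequence[Hashable],
                   top: int = 3) -> List[Tuple[BinPair, float]]:
    """Bin pairs that co-occur more often than independence predicts, strongest first."""
    residuals = pearson_residuals(a, b)
    return sorted(residuals.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

For example, binning a hypothetical temperature column with `equal_width_bins(temperature, 2)` and passing the result together with a categorical season column to `cramers_v` and `rank_bin_pairs` scores the attribute pair as a whole and lists the bin pairs with the largest positive residuals, i.e. the combinations that co-occur more often than independence would predict. Turning the statistic into a formal test would additionally require a p-value from the chi-square distribution with (rows - 1)(columns - 1) degrees of freedom.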
Abstract:
The exponential growth of studies on the biological response to ocean acidification over the last few decades has generated a large amount of data. To facilitate data comparison, a data compilation hosted at the data publisher PANGAEA was initiated in 2008 and is updated on a regular basis (doi:10.1594/PANGAEA.149999). By January 2015, a total of 581 data sets (over 4 000 000 data points) from 539 papers had been archived. Here we present the development of this data compilation five years after its first description by Nisumaa et al. (2010). Most of the study sites from which data have been archived are still in the Northern Hemisphere, and the amount of archived data from studies in the Southern Hemisphere and the polar oceans is still relatively low. Data from 60 studies that investigated the response of a mix of organisms or natural communities were all added after 2010, indicating a welcome shift from the study of individual organisms to communities and ecosystems. The initial imbalance, with considerably more data archived on calcification and primary production than on other processes, has improved. There is also a clear tendency towards more data archived from multifactorial studies after 2010. For easier and more effective access to ocean acidification data, the ocean acidification community is strongly encouraged to contribute to the data archiving effort, to help develop standard vocabularies describing the variables, and to define best practices for archiving ocean acidification data.
Abstract:
Visual cluster analysis provides valuable tools that help analysts understand large data sets in terms of representative clusters and the relationships between them. Often, the clusters found need to be understood in the context of the categorical, numerical, or textual metadata given for the data elements. While often not part of the clustering process, such metadata play an important role and need to be considered during interactive cluster exploration. Traditionally, linked views allow analysts to relate (or, loosely speaking, correlate) clusters with metadata or other properties of the underlying cluster data. Manually inspecting the distribution of metadata for each cluster in a linked-view approach is tedious, especially for large data sets, where a large search problem arises. A fully interactive search for potentially useful or interesting cluster-to-metadata relationships can be a cumbersome and lengthy process. To remedy this problem, we propose a novel approach for guiding users in discovering interesting relationships between clusters and their associated metadata, with the goal of guiding the analyst through the potentially huge search space. In this work we focus on categorical metadata, which can be summarized for a cluster in the form of a histogram. Starting from a given visual cluster representation, we compute measures of interestingness defined on the distribution of metadata categories within the clusters. These measures are used to automatically score and rank the clusters by their potential interestingness with regard to the distribution of categorical metadata. Identified interesting relationships are highlighted in the visual cluster representation for easy inspection by the user. We present a system implementing an encompassing, yet extensible, set of interestingness scores for categorical metadata, which can also be extended to numerical metadata.
Appropriate visual representations are provided for showing the visual correlations as well as the calculated ranking scores. Focusing on clusters of time series data, we test our approach on a large real-world data set of time-oriented scientific research data, demonstrating how specific interesting views are identified automatically, supporting the analyst in discovering interesting and visually understandable relationships.
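The abstract leaves the concrete interestingness scores open. One plausible score, shown here as a Python sketch under that assumption, compares each cluster's category histogram with the histogram of the whole data set using the Kullback-Leibler divergence, so that clusters whose metadata distribution deviates most from the global one rank first. Normalized entropy is included as a simple alternative "purity" score. The function names are hypothetical and are not the described system's API.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def histogram(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Relative frequency of each metadata category."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: c / total for category, c in counts.items()}


def kl_divergence(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """KL(p || q) in nats; q must be non-zero wherever p is non-zero."""
    return sum(pk * math.log(pk / q[k]) for k, pk in p.items() if pk > 0)


def normalized_entropy(p: Dict[Hashable, float], n_categories: int) -> float:
    """Entropy scaled to [0, 1]; 0 means the cluster holds a single category."""
    if n_categories < 2:
        return 0.0
    h = -sum(pk * math.log(pk) for pk in p.values() if pk > 0)
    return h / math.log(n_categories)


def rank_clusters(cluster_ids: Sequence[Hashable],
                  metadata: Sequence[Hashable]) -> List[Tuple[Hashable, float]]:
    """Score each cluster by how far its category histogram deviates from the
    global histogram (KL divergence), highest score first."""
    global_hist = histogram(metadata)
    members: Dict[Hashable, List[Hashable]] = {}
    for cid, category in zip(cluster_ids, metadata):
        members.setdefault(cid, []).append(category)
    scores = {cid: kl_divergence(histogram(cats), global_hist)
              for cid, cats in members.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The divergence is taken from each cluster's histogram to the global one because every category that occurs in a cluster also occurs globally, which keeps the score finite. For the purity alternative, `normalized_entropy(histogram(cats), len(set(metadata)))` gives a value near 0 for clusters dominated by one category.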
Abstract:
For a reliable simulation of the time- and space-dependent CO2 redistribution between ocean and atmosphere, an appropriate time-dependent simulation of particle dynamics processes is essential but has not been carried out so far. The major difficulties were the lack of suitable modules for particle dynamics and early diagenesis (needed to close the carbon and nutrient budgets) in ocean general circulation models, and the lack of understanding of biogeochemical processes such as the partial dissolution of calcareous particles in oversaturated water. The main aim of ORFOIS was to fill this gap in our knowledge and in the infrastructure for prediction. This goal has been achieved step by step. First, comprehensive databases of existing observations relevant to the three major types of biogenic particles, particulate organic carbon (POC), calcium carbonate (CaCO3), and biogenic silica (BSi or opal), as well as to refractory particles of terrestrial origin, were collated and made publicly available.
Abstract:
The circum-Antarctic Southern Ocean is an important region for global marine food webs and carbon cycling because of sea-ice formation and its unique plankton ecosystem. However, the mechanisms underlying the establishment of this distinct ecosystem and the geological timing of its development remain unknown. Here we show, on the basis of fossil marine dinoflagellate cyst records, that a major restructuring of the Southern Ocean plankton ecosystem occurred abruptly and concomitantly with the first major Antarctic glaciation in the earliest Oligocene (~33.6 million years ago). This turnover marks a regime shift in zooplankton-phytoplankton interactions and community structure, indicating the appearance of eutrophic and seasonally productive environments on the Antarctic margin. We conclude that earliest Oligocene cooling, ice-sheet expansion, and subsequent sea-ice formation were important drivers of biotic evolution in the Southern Ocean.
Abstract:
The Greenland Ice Sheet Project 2 (GISP2) core can enhance our understanding of the relationship between parameters measured in the ice in central Greenland and variability in the ocean, atmosphere, and cryosphere of the North Atlantic Ocean and adjacent land masses. Seasonal (summer, winter) to annual responses of the δD and deuterium excess (d) isotopic signals in the GISP2 core to the seesaw in winter temperatures between West Greenland and northern Europe from A.D. 1840 to 1970 are investigated. This seesaw represents extreme modes of the North Atlantic Oscillation (NAO), which also influences sea surface temperatures (SSTs), atmospheric pressures, geostrophic wind strength, and sea-ice extent beyond the winter season. Temperature excursions inferred from the δD record during seesaw (extreme NAO mode) years move in the same direction as the West Greenland side of the seesaw. This symmetry with the West Greenland side suggests a possible mechanism for the damping, in the ice core record, of the lowest decadal temperatures experienced in Europe from A.D. 1500 to 1700. Seasonal and annual deuterium excess excursions during seesaw years are negatively correlated with δD, which suggests an isotopic response to an SST/land temperature seesaw. The isotopic record from GISP2 may therefore provide information on both ice sheet and sea surface temperature variability. Cross-plots of δD and d show a tendency for the data to group according to the prevailing mode of the seesaw, but do not unambiguously identify individual seesaw years. A combination of ice core and tree ring data sets may allow more confident identification of GA and GB (extreme NAO mode) years prior to 1840.
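The abstract uses δD and the deuterium excess d without defining them. For reference, the conventional definitions (standard isotope-geochemistry background, not taken from the GISP2 study itself) are:

```latex
\delta = \left( \frac{R_{\mathrm{sample}}}{R_{\mathrm{VSMOW}}} - 1 \right) \times 1000 \ \text{(per mil)},
\qquad R = {}^{2}\mathrm{H}/{}^{1}\mathrm{H} \ \text{for } \delta\mathrm{D},
\quad R = {}^{18}\mathrm{O}/{}^{16}\mathrm{O} \ \text{for } \delta^{18}\mathrm{O}

d = \delta\mathrm{D} - 8\,\delta^{18}\mathrm{O}
```

For illustration, hypothetical values of δD = -280 per mil and δ18O = -36 per mil give d = -280 - 8 × (-36) = 8 per mil.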
SYNOPS: Synoptical observations from meteorological stations of West Africa, with links to data sets
Abstract:
The JGOFS International Collection Volume 2: Integrated Data Sets CD is a coherent, organised compilation of existing data sets produced by the member countries that participated in JGOFS. In most cases, the data were taken from the JGOFS International Collection, Volume 1: Discrete Datasets DVD. To produce Vol. 1, data were taken from the original sources and copied "as is" onto the DVD. For Vol. 2, data and metadata were harmonized using the conversion software PanTool and the PANGAEA import routine, which checks the metadata for completeness and defines the relations between data and metadata. Prior to import, the data underwent a technical quality control covering file format and readability, the availability and combination of parameters and units, and the range of values.
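The abstract lists the kinds of technical checks but not how they were implemented in PanTool or the PANGAEA import routine. A minimal Python sketch of such checks, assuming a tab-separated file with "Parameter [unit]" column headers and a hypothetical reference table of allowed units and plausible value ranges, might look like this:

```python
import csv
import io
from typing import Dict, List, Optional, Tuple

Spec = Tuple[str, float, float]  # (expected unit, plausible minimum, plausible maximum)

# Hypothetical reference table; names, units, and limits are illustrative only.
PARAMETERS: Dict[str, Spec] = {
    "Temperature": ("deg C", -2.5, 40.0),
    "Salinity": ("psu", 0.0, 42.0),
    "Chlorophyll a": ("mg/m**3", 0.0, 100.0),
}


def check_table(text: str) -> List[str]:
    """Return QC problems for a tab-separated table whose header cells look like
    'Parameter [unit]'. An empty list means every check passed."""
    problems: List[str] = []
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return ["file is empty or unreadable"]

    # Header check: is each parameter known, and is its unit the expected one?
    columns: List[Optional[Spec]] = []
    for i, cell in enumerate(rows[0]):
        name, _, rest = cell.partition(" [")
        unit = rest.rstrip("]")
        if name not in PARAMETERS:
            problems.append(f"column {i}: unknown parameter {name!r}")
            columns.append(None)
        elif unit != PARAMETERS[name][0]:
            problems.append(f"column {i}: unit {unit!r} not allowed for {name}")
            columns.append(None)
        else:
            columns.append(PARAMETERS[name])

    # Body check: field count, numeric format, and plausible value range.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(columns):
            problems.append(f"line {line_no}: expected {len(columns)} fields, got {len(row)}")
            continue
        for i, (value, spec) in enumerate(zip(row, columns)):
            if spec is None or value == "":
                continue  # column already flagged, or value missing
            try:
                x = float(value)
            except ValueError:
                problems.append(f"line {line_no}, column {i}: non-numeric value {value!r}")
                continue
            if not spec[1] <= x <= spec[2]:
                problems.append(f"line {line_no}, column {i}: {x} outside [{spec[1]}, {spec[2]}]")
    return problems
```

`check_table` returns an empty list for a clean file, and each message names the offending line and column so that it can be traced back to the source file. Columns whose parameter or unit fails the header check are skipped in the range check, because their limits would not be meaningful.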