957 results for Imbalanced datasets
Abstract:
The analysis of research data plays a key role in data-driven areas of science. A wide variety of mixed research data sets exists, and scientists aim to derive or validate hypotheses in order to find undiscovered knowledge. Many analysis techniques identify relations across an entire dataset only. This may level out the characteristic behavior of different subgroups in the data. As in automatic subspace clustering, we aim at identifying interesting subgroups and attribute sets. We present a visual-interactive system that supports scientists in exploring interesting relations between aggregated bins of multivariate attributes in mixed data sets. The abstraction of data to bins enables the application of statistical dependency tests as the measure of interestingness. An overview matrix view shows all attributes, ranked with respect to the interestingness of bins. As a complement, a node-link view reveals multivariate bin relations by positioning dependent bins close to each other. The system supports information drill-down based on both expert knowledge and algorithmic support. Finally, visual-interactive subset clustering assigns multivariate bin relations to groups. A list-based cluster result representation enables the scientist to communicate multivariate findings at a glance. We demonstrate the applicability of the system with two case studies from the earth observation domain and the prostate cancer research domain. In both cases, the system enabled us to identify the most interesting multivariate bin relations, to validate already published results, and, moreover, to discover unexpected relations.
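As a rough illustration of how binning makes a dependency test usable as an interestingness score, the sketch below bins a numeric attribute into equal-width bins and scores its relation to a categorical attribute. The abstract does not say which dependency test or binning scheme the system uses; the Pearson chi-square statistic, Cramér's V, equal-width binning, and all function names here are assumptions chosen for illustration only.

```python
from collections import Counter
from math import sqrt


def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1 (equal-width binning)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant columns
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def chi_square(xs, ys):
    """Pearson chi-square statistic for two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed counts per (x, y) cell
    row, col = Counter(xs), Counter(ys)
    stat = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n  # count expected under independence
            stat += (joint[(r, c)] - expected) ** 2 / expected
    return stat


def cramers_v(xs, ys):
    """Normalize chi-square to [0, 1]; used here as a hypothetical interestingness score."""
    k = min(len(set(xs)), len(set(ys))) - 1
    if k == 0:
        return 0.0  # one attribute has a single category, so no dependency is measurable
    return sqrt(chi_square(xs, ys) / (len(xs) * k))


# Made-up example data: one numeric attribute and two categorical attributes.
bins = equal_width_bins([1, 2, 3, 4, 11, 12, 13, 14], n_bins=2)
dependent = ["a", "a", "a", "a", "b", "b", "b", "b"]
independent = ["a", "b", "a", "b", "a", "b", "a", "b"]
print(cramers_v(bins, dependent), cramers_v(bins, independent))
```

Ranking attribute pairs by such a score would give an ordering of the kind the overview matrix is described as showing, although the actual system may compute and rank interestingness differently.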
Abstract:
The international Global Ocean Ecosystem Dynamics (GLOBEC) programme was initiated in 1991 by the Scientific Committee on Oceanic Research (SCOR) and the Intergovernmental Oceanographic Commission (IOC) of UNESCO. It was a core project of the International Geosphere-Biosphere Programme (IGBP), with research topics aimed at understanding how global change impacts the abundance, diversity and productivity of marine populations (Barange & Harris 2003). GLOBEC-Germany was the national German contribution to this core project, focussing on the Baltic Sea and the North Sea, on both of which Germany has coastlines. The two seas exhibit a gradient from marine (North Sea) to almost freshwater conditions (outer ends of the Baltic Sea). The main topic of the project was the investigation of interactions between zooplankton and fish under the influence of physical processes (Alheit 2004). The main sampling areas were located in the southern North Sea and German Bight, as well as in the Bornholm Basin in the Baltic Sea (Tamm et al. 2007).
Abstract:
The BSRN Toolbox is a software package supplied by the WRMC and is freely available to all station scientists and data users. The main features of the package include a download manager for Station-to-Archive files, a tool to convert files into human-readable TAB-separated ASCII tables (similar to those output by the PANGAEA database), and a tool to check data sets for violations of the "BSRN Global Network recommended QC tests, V2.0" quality criteria. The latter tool creates quality codes, one per measured value, indicating whether the data are "physically possible" or "extremely rare," or whether "intercomparison limits are exceeded." In addition, auxiliary data such as the solar zenith angle or global radiation calculated from the diffuse and direct components can be output. All output from the QC tool can be visualized using PanPlot (doi:10.1594/PANGAEA.816201).
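The two calculations mentioned above can be sketched briefly. The first function uses the standard closure relation, global = diffuse + direct-normal × cos(solar zenith angle). The second classifies a value against two nested ranges. This is not the Toolbox's code: the function names, the string labels, and the numeric limits in the usage lines are hypothetical placeholders, and the actual limits and quality codes are defined in the QC document cited in the abstract.

```python
from math import cos, radians


def global_from_components(diffuse, direct_normal, zenith_deg):
    """Global horizontal irradiance (W/m^2) from diffuse and direct-normal irradiance."""
    # Clamp at zero: with the sun below the horizon the direct beam contributes nothing.
    mu0 = max(cos(radians(zenith_deg)), 0.0)
    return diffuse + direct_normal * mu0


def range_flag(value, possible, rare):
    """Classify a value against two (min, max) ranges, checking the wider one first."""
    if not possible[0] <= value <= possible[1]:
        return "not physically possible"
    if not rare[0] <= value <= rare[1]:
        return "extremely rare"
    return "ok"


# Placeholder limits for illustration only; not taken from the BSRN QC document.
POSSIBLE = (-4.0, 1500.0)
RARE = (-2.0, 1200.0)

g = global_from_components(diffuse=100.0, direct_normal=800.0, zenith_deg=60.0)
print(round(g, 6), range_flag(g, POSSIBLE, RARE))
```

In practice the upper limits of such tests typically depend on the solar zenith angle rather than being constants, so a faithful implementation would compute the ranges per measurement.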
Abstract:
The CoastColour Round Robin (CCRR) project (http://www.coastcolour.org), funded by the European Space Agency (ESA), was designed to bring together a variety of reference datasets and to use these to test algorithms and assess their accuracy for retrieving water quality parameters. This information was then developed to help end-users of remote sensing products select the most accurate algorithms for their coastal region. To facilitate this, an inter-comparison of the performance of algorithms for the retrieval of in-water properties over coastal waters was carried out. The comparison used three types of datasets on which ocean colour algorithms were tested. The description and comparison of the three datasets are the focus of this paper; they comprise the Medium Resolution Imaging Spectrometer (MERIS) Level 2 match-ups, in situ reflectance measurements, and data generated by a radiative transfer model (HydroLight). The datasets mainly consisted of 6,484 marine reflectance spectra associated with various geometrical (sensor viewing and solar angles) and sky conditions and water constituents: Total Suspended Matter (TSM) and Chlorophyll-a (CHL) concentrations, and the absorption of Coloured Dissolved Organic Matter (CDOM). Inherent optical properties were also provided in the simulated datasets (5,000 simulations) and from 3,054 match-up locations. The distributions of reflectance at selected MERIS bands and band ratios, and of CHL and TSM as a function of reflectance, from the three datasets are compared. Match-up and in situ sites where deviations occur are identified. The distributions of the three reflectance datasets are also compared to the simulated and in situ reflectances used previously by the International Ocean Colour Coordinating Group (IOCCG, 2006) for algorithm testing, showing a clear extension of the CCRR data, which cover more turbid waters.