4 results for Data clustering

in Publishing Network for Geoscientific


Relevance:

60.00%

Publisher:

Abstract:

The analysis of time-dependent data is an important problem in many application domains, and interactive visualization can help analysts understand patterns in large time series data. Many effective approaches already exist for the visual analysis of univariate time series, supporting tasks such as assessing data quality, detecting outliers, or identifying periodic or frequently occurring patterns. However, far fewer approaches support multivariate time series. Having multiple values per time stamp makes the analysis task inherently harder, and existing visualization techniques often do not scale well. We introduce an approach for the visual analysis of large multivariate time-dependent data, based on projecting multivariate measurements onto a 2D display and visualizing the time dimension as trajectories. To scale to large multivariate time series, we use visual data aggregation metaphors based on grouping similar data elements. Aggregation procedures can be based either on statistical properties of the data or on data clustering routines. Appropriately defined user controls allow users to navigate and explore the data and to interactively steer the parameters of the data aggregation. We present an implementation of our approach and apply it to a comprehensive data set from the field of Earth observation, demonstrating the applicability and usefulness of our approach.
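The projection-plus-aggregation idea can be illustrated with a small sketch. The abstract does not name the projection or the aggregation routine, so this example assumes principal component analysis (computed by power iteration) for the 2D projection and a simple square-grid binning as the statistical aggregation; `project_2d`, `aggregate_grid`, and the other helper names are hypothetical, not the authors' implementation. Each row is one time stamp's measurement vector, and the time-ordered list of projected points forms the trajectory.

```python
import math
import random
from collections import Counter


def center(rows):
    """Subtract the per-variable mean from every measurement vector."""
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(mat, iters=500, seed=0):
    """Power iteration: dominant eigenvector/eigenvalue of a symmetric PSD matrix."""
    d = len(mat)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]  # start from a random unit vector
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # the matrix annihilates v, so the eigenvalue is 0
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, lam


def project_2d(rows):
    """Project each time step onto the first two principal axes (the trajectory)."""
    centered = center(rows)
    cov = covariance(centered)
    v1, lam1 = top_eigenpair(cov)
    # Deflation removes the first component so power iteration finds the second.
    d = len(cov)
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(d)]
                for i in range(d)]
    v2, _ = top_eigenpair(deflated, seed=1)
    return [(sum(a * b for a, b in zip(r, v1)), sum(a * b for a, b in zip(r, v2)))
            for r in centered]


def aggregate_grid(trajectory, cell_size):
    """Hypothetical statistical aggregation: count trajectory points per grid cell."""
    return Counter((math.floor(x / cell_size), math.floor(y / cell_size))
                   for x, y in trajectory)
```

Calling `project_2d` on a time-ordered list of measurement vectors returns one 2D point per time stamp in the same order, so connecting consecutive points draws the trajectory; `aggregate_grid` then replaces dense regions by per-cell counts. A clustering routine such as k-means could be swapped in for the grid to obtain the cluster-based aggregation the abstract also mentions.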

Relevance:

30.00%

Publisher:

Abstract:

The breeding distribution of the Adelie penguin, Pygoscelis adeliae, was surveyed with Landsat-7 Enhanced Thematic Mapper Plus (ETM+) data along the coastline of Antarctica, an area covering approximately 330° of longitude. An algorithm was designed to minimize the radiometric contribution from exogenous sources and to retrieve the location and spatial extent of Adelie penguin colonies from the ETM+ data. Across the entire dataset of 195 ETM+ scenes, each approximately 180 km by 180 km with pixels of 30 m by 30 m, a total of 9143 individual pixels were classified as belonging to the Adelie penguin colony class. Pixel clustering identified a total of 187 individual Adelie penguin colonies, ranging in size from a single pixel (900 m²) to a maximum of 875 pixels (0.788 km²). Colony retrievals have a very low error of commission, on the order of 1 percent or less, and the error of omission was estimated at 2.9 percent by population, based on comparisons with direct observations from surveys across East Antarctica. Thus, the Landsat retrievals can successfully locate Adelie penguin colonies that account for ~97 percent of a regional population. The geographic coordinates and spatial extent of each colony retrieved from the Landsat data are publicly available. Regional analysis found several areas where the Landsat retrievals suggest populations significantly larger than published estimates. Six Adelie penguin colonies were found that are believed to be unreported in the literature.
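The step from 9143 classified pixels to 187 colonies can be sketched as connected-component labeling of a binary classification mask. The abstract does not state which clustering rule was used, so this sketch assumes that pixels touching horizontally, vertically, or diagonally (8-connectivity) belong to the same colony; `label_colonies` and `colony_area_km2` are hypothetical names. The area helper reproduces the abstract's arithmetic: 875 pixels × 900 m² = 787,500 m² ≈ 0.788 km².

```python
from collections import deque

PIXEL_AREA_M2 = 30 * 30  # one ETM+ pixel is 30 m x 30 m = 900 m^2


def label_colonies(mask):
    """Group truthy pixels of a 2D mask into 8-connected clusters (assumed rule)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited colony pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            clusters.append(pixels)
    return clusters


def colony_area_km2(n_pixels):
    """Convert a colony's pixel count into square kilometres."""
    return n_pixels * PIXEL_AREA_M2 / 1e6
```

Choosing 4-connectivity instead would split diagonally touching pixels into separate colonies and could therefore raise the colony count, so the neighbourhood rule matters when comparing counts across methods.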

Relevance:

30.00%

Publisher:

Abstract:

Visual cluster analysis provides valuable tools that help analysts understand large data sets in terms of representative clusters and the relationships between them. Often, the clusters found need to be understood in the context of the categorical, numerical, or textual metadata associated with the data elements. Although such metadata are often not part of the clustering process, they play an important role and need to be considered during interactive cluster exploration. Traditionally, linked views allow analysts to relate (or, loosely speaking, correlate) clusters with metadata or other properties of the underlying data. Manually inspecting the metadata distribution of each cluster in a linked-view approach is tedious, especially for large data sets, where a large search problem arises. A fully manual, interactive search for potentially interesting cluster-to-metadata relationships can be a long and cumbersome process. To remedy this problem, we propose a novel approach that guides users in discovering interesting relationships between clusters and their associated metadata, leading the analyst through a potentially huge search space. In this work we focus on categorical metadata, which can be summarized for each cluster as a histogram. Starting from a given visual cluster representation, we compute measures of interestingness defined on the distribution of metadata categories within each cluster. These measures are used to automatically score and rank the clusters by the potential interestingness of their categorical metadata distributions. Interesting relationships are highlighted in the visual cluster representation for easy inspection by the user. We present a system implementing a comprehensive yet extensible set of interestingness scores for categorical metadata, which can also be extended to numerical metadata.
Appropriate visual representations are provided for showing the visual correlations as well as the calculated ranking scores. Focusing on clusters of time series data, we test our approach on a large real-world data set of time-oriented scientific research data, demonstrating how specific interesting views are identified automatically and supporting the analyst in discovering interesting and visually understandable relationships.
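Scoring clusters by the distribution of their categorical metadata can be sketched with a single example measure. The abstract does not list its interestingness scores, so this sketch assumes one plausible choice, the Kullback-Leibler divergence of a cluster's category histogram from the histogram over all data elements: a cluster whose categories mirror the global mix scores 0, and a cluster dominated by an otherwise uncommon category scores high. `rank_clusters` and its input format (cluster id mapped to a list of category labels) are hypothetical.

```python
import math
from collections import Counter


def normalize(counts):
    """Turn a category histogram (Counter) into a probability distribution."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def kl_divergence(p, q):
    """KL(p || q); q covers every category of p because it is the global histogram."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)


def rank_clusters(labels_by_cluster):
    """Score each cluster's metadata histogram against the global one, highest first."""
    global_counts = Counter()
    for labels in labels_by_cluster.values():
        global_counts.update(labels)
    q = normalize(global_counts)
    scores = {cid: kl_divergence(normalize(Counter(labels)), q)
              for cid, labels in labels_by_cluster.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, with clusters `a` (two `x`, two `y`), `b` (four `x`), and `c` (three `y`, one `x`), the ranking is `b`, `c`, `a`: cluster `b` deviates most from the overall 7:5 mix of `x` to `y`, while `a` is closest to it. Other measures, such as entropy or a chi-square statistic, would slot into the same ranking loop.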