923 resultados para Data exploration


Relevância:

100.00% 100.00%

Publicador:

Resumo:

As technological capabilities for capturing, aggregating, and processing large quantities of data continue to improve, the question becomes how to effectively utilise these resources. Whenever automatic methods fail, it is necessary to rely on human background knowledge, intuition, and deliberation. This creates demand for data exploration interfaces that support the analytical process, allowing users to absorb and derive knowledge from data. Such interfaces have historically been designed for experts. However, existing research has shown promise in involving a broader range of users that act as citizen scientists, placing high demands in terms of usability. Visualisation is one of the most effective analytical tools for humans to process abstract information. Our research focuses on the development of interfaces to support collaborative, community-led inquiry into data, which we refer to as Participatory Data Analytics. The development of data exploration interfaces to support independent investigations by local communities around topics of their interest presents a unique set of challenges, which we discuss in this paper. We present our preliminary work towards suitable high-level abstractions and interaction concepts to allow users to construct and tailor visualisations to their own needs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Version 1 of the Global Charcoal Database is now available for regional fire history reconstructions, data exploration, hypothesis testing, and evaluation of coupled climate–vegetation–fire model simulations. The charcoal database contains over 400 radiocarbon-dated records that document changes in charcoal abundance during the Late Quaternary. The aim of this public database is to stimulate cross-disciplinary research in fire sciences targeted at an increased understanding of the controls and impacts of natural and anthropogenic fire regimes on centennial-to-orbital timescales. We describe here the data standardization techniques for comparing multiple types of sedimentary charcoal records. Version 1 of the Global Charcoal Database has been used to characterize global and regional patterns in fire activity since the last glacial maximum. Recent studies using the charcoal database have explored the relation between climate and fire during periods of rapid climate change, including evidence of fire activity during the Younger Dryas Chronozone, and during the past two millennia.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Interactive visual representations complement traditional statistical and machine learning techniques for data analysis, allowing users to play a more active role in a knowledge discovery process and making the whole process more understandable. Though visual representations are applicable to several stages of the knowledge discovery process, a common use of visualization is in the initial stages to explore and organize a sometimes unknown and complex data set. In this context, the integrated and coordinated - that is, user actions should be capable of affecting multiple visualizations when desired - use of multiple graphical representations allows data to be observed from several perspectives and offers richer information than isolated representations. In this paper we propose an underlying model for an extensible and adaptable environment that allows independently developed visualization components to be gradually integrated into a user configured knowledge discovery application. Because a major requirement when using multiple visual techniques is the ability to link amongst them, so that user actions executed on a representation propagate to others if desired, the model also allows runtime configuration of coordinated user actions over different visual representations. We illustrate how this environment is being used to assist data exploration and organization in a climate classification problem.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes how the statistical technique of cluster analysis and the machine learning technique of rule induction can be combined to explore a database. The ways in which such an approach alleviates the problems associated with other techniques for data analysis are discussed. We report the results of experiments carried out on a database from the medical diagnosis domain. Finally we describe the future developments which we plan to carry out to build on our current work.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Complementary to automatic extraction processes, Virtual Reality technologies provide an adequate framework to integrate human perception in the exploration of large data sets. In such multisensory system, thanks to intuitive interactions, a user can take advantage of all his perceptual abilities in the exploration task. In this context the haptic perception, coupled to visual rendering, has been investigated for the last two decades, with significant achievements. In this paper, we present a survey related to exploitation of the haptic feedback in exploration of large data sets. For each haptic technique introduced, we describe its principles and its effectiveness.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The Queensland Department of Main Roads uses Weigh-in-Motion (WiM) devices to covertly monitor (at highway speed) axle mass, axle configurations and speed of heavy vehicles on the road network. Such data is critical for the planning and design of the road network. Some of the data appears excessively variable. The current work considers the nature, magnitude and possible causes of WiM data variability. Over fifty possible causes of variation in WiM data have been identified in the literature. Data exploration has highlighted five basic types of variability specifically: ----- • cycling, both diurnal and annual;----- • consistent but unreasonable data;----- • data jumps;----- • variations between data from opposite sides of the one road; and ----- • non-systematic variations.----- This work is part of wider research into procedures to eliminate or mitigate the influence of WiM data variability.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper reports an empirical study on measuring transit service reliability using the data from a Web-based passenger survey on a major transit corridor in Brisbane, Australia. After an introduction of transit service reliability measures, the paper presents the results from the case study including study area, data collection, and reliability measures obtained. This includes data exploration of boarding/arrival lateness, in-vehicle time variation, waiting time variation, and headway adherence. Impacts of peak-period effects and separate operation on service reliability are examined. Relationships between transit service characteristics and passenger waiting time are also discussed. A summary of key findings and an agenda of future research are offered in conclusions.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We introduce a variation density function that profiles the relationship between multiple scalar fields over isosurfaces of a given scalar field. This profile serves as a valuable tool for multifield data exploration because it provides the user with cues to identify interesting isovalues of scalar fields. Existing isosurface-based techniques for scalar data exploration like Reeb graphs, contour spectra, isosurface statistics, etc., study a scalar field in isolation. We argue that the identification of interesting isovalues in a multifield data set should necessarily be based on the interaction between the different fields. We demonstrate the effectiveness of our approach by applying it to explore data from a wide variety of applications.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Contém resumo

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Study of symmetric or repeating patterns in scalar fields is important in scientific data analysis because it gives deep insights into the properties of the underlying phenomenon. Though geometric symmetry has been well studied within areas like shape processing, identifying symmetry in scalar fields has remained largely unexplored due to the high computational cost of the associated algorithms. We propose a computationally efficient algorithm for detecting symmetric patterns in a scalar field distribution by analysing the topology of level sets of the scalar field. Our algorithm computes the contour tree of a given scalar field and identifies subtrees that are similar. We define a robust similarity measure for comparing subtrees of the contour tree and use it to group similar subtrees together. Regions of the domain corresponding to subtrees that belong to a common group are extracted and reported to be symmetric. Identifying symmetry in scalar fields finds applications in visualization, data exploration, and feature detection. We describe two applications in detail: symmetry-aware transfer function design and symmetry-aware isosurface extraction.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Clustering has been the most popular method for data exploration. Clustering is partitioning the data set into sub-partitions based on some measures say the distance measure, each partition has its own significant information. There are a number of algorithms explored for this purpose, one such algorithm is the Particle Swarm Optimization(PSO) which is a population based heuristic search technique derived from swarm intelligence. In this paper we present an improved version of the Particle Swarm Optimization where, each feature of the data set is given significance accordingly by adding some random weights, which also minimizes the distortions in the dataset if any. The performance of the above proposed algorithm is evaluated using some benchmark datasets from Machine Learning Repository. The experimental results shows that our proposed methodology performs significantly better than the previously performed experiments.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In matters of social research sociologists and other social scientists have tended to view documents primarily as sources of evidence and as receptacles of inert content.The key strategies for data exploration have consequently been associated with various styles of content or thematic analysis. Even when discourse analysis has been recommended, there has been a marked tendency to deal with records, files, and the like, primarily as containers - things to be read, understood, and categorized. In this article, however, the author seeks to demonstrate that by focussing on the functioning of documents instead of content, sociology can embrace a much wider range of approaches to both data collection and analysis. Indeed, the adoption of such a programme encourages researchers to see documents as active agents in the world, and to view documentation as a key component of dynamic networks rather than as a set of static and immutable 'things'.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Physical Access Control Systems are commonly used to secure doors in buildings such as airports, hospitals, government buildings and offices. These systems are designed primarily to provide an authentication mechanism, but they also log each door access as a transaction in a database. Unsupervised learning techniques can be used to detect inconsistencies or anomalies in the mobility data, such as a cloned or forged Access Badge, or unusual behaviour by staff members. In this paper, we present an overview of our method of inferring directed graphs to represent a physical building network and the flows of mobility within it. We demonstrate how the graphs can be used for Visual Data Exploration, and outline how to apply algorithms based on Information Theory to the graph data in order to detect inconsistent or abnormal behaviour.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In real world applications sequential algorithms of data mining and data exploration are often unsuitable for datasets with enormous size, high-dimensionality and complex data structure. Grid computing promises unprecedented opportunities for unlimited computing and storage resources. In this context there is the necessity to develop high performance distributed data mining algorithms. However, the computational complexity of the problem and the large amount of data to be explored often make the design of large scale applications particularly challenging. In this paper we present the first distributed formulation of a frequent subgraph mining algorithm for discriminative fragments of molecular compounds. Two distributed approaches have been developed and compared on the well known National Cancer Institute’s HIV-screening dataset. We present experimental results on a small-scale computing environment.