989 resultados para Information visualization
Resumo:
Heterogeneous and incomplete datasets are common in many real-world visualisation applications. The probabilistic nature of the Generative Topographic Mapping (GTM), which was originally developed for complete continuous data, can be extended to model heterogeneous (i.e. containing both continuous and discrete values) and missing data. This paper describes and assesses the resulting model on both synthetic and real-world heterogeneous data with missing values.
Resumo:
Sequences of timestamped events are currently being generated across nearly every domain of data analytics, from e-commerce web logging to electronic health records used by doctors and medical researchers. Every day, this data type is reviewed by humans who apply statistical tests, hoping to learn everything they can about how these processes work, why they break, and how they can be improved upon. To further uncover how these processes work the way they do, researchers often compare two groups, or cohorts, of event sequences to find the differences and similarities between outcomes and processes. With temporal event sequence data, this task is complex because of the variety of ways single events and sequences of events can differ between the two cohorts of records: the structure of the event sequences (e.g., event order, co-occurring events, or frequencies of events), the attributes about the events and records (e.g., gender of a patient), or metrics about the timestamps themselves (e.g., duration of an event). Running statistical tests to cover all these cases and determining which results are significant becomes cumbersome. Current visual analytics tools for comparing groups of event sequences emphasize a purely statistical or purely visual approach for comparison. Visual analytics tools leverage humans' ability to easily see patterns and anomalies that they were not expecting, but is limited by uncertainty in findings. Statistical tools emphasize finding significant differences in the data, but often requires researchers have a concrete question and doesn't facilitate more general exploration of the data. Combining visual analytics tools with statistical methods leverages the benefits of both approaches for quicker and easier insight discovery. Integrating statistics into a visualization tool presents many challenges on the frontend (e.g., displaying the results of many different metrics concisely) and in the backend (e.g., scalability challenges with running various metrics on multi-dimensional data at once). I begin by exploring the problem of comparing cohorts of event sequences and understanding the questions that analysts commonly ask in this task. From there, I demonstrate that combining automated statistics with an interactive user interface amplifies the benefits of both types of tools, thereby enabling analysts to conduct quicker and easier data exploration, hypothesis generation, and insight discovery. The direct contributions of this dissertation are: (1) a taxonomy of metrics for comparing cohorts of temporal event sequences, (2) a statistical framework for exploratory data analysis with a method I refer to as high-volume hypothesis testing (HVHT), (3) a family of visualizations and guidelines for interaction techniques that are useful for understanding and parsing the results, and (4) a user study, five long-term case studies, and five short-term case studies which demonstrate the utility and impact of these methods in various domains: four in the medical domain, one in web log analysis, two in education, and one each in social networks, sports analytics, and security. My dissertation contributes an understanding of how cohorts of temporal event sequences are commonly compared and the difficulties associated with applying and parsing the results of these metrics. It also contributes a set of visualizations, algorithms, and design guidelines for balancing automated statistics with user-driven analysis to guide users to significant, distinguishing features between cohorts. This work opens avenues for future research in comparing two or more groups of temporal event sequences, opening traditional machine learning and data mining techniques to user interaction, and extending the principles found in this dissertation to data types beyond temporal event sequences.
Resumo:
We describe a novel approach to explore DNA nucleotide sequence data, aiming to produce high-level categorical and structural information about the underlying chromosomes, genomes and species. The article starts by analyzing chromosomal data through histograms using fixed length DNA sequences. After creating the DNA-related histograms, a correlation between pairs of histograms is computed, producing a global correlation matrix. These data are then used as input to several data processing methods for information extraction and tabular/graphical output generation. A set of 18 species is processed and the extensive results reveal that the proposed method is able to generate significant and diversified outputs, in good accordance with current scientific knowledge in domains such as genomics and phylogenetics.
Resumo:
This paper analyzes the DNA code of several species in the perspective of information content. For that purpose several concepts and mathematical tools are selected towards establishing a quantitative method without a priori distorting the alphabet represented by the sequence of DNA bases. The synergies of associating Gray code, histogram characterization and multidimensional scaling visualization lead to a collection of plots with a categorical representation of species and chromosomes.
Resumo:
Seismic data is difficult to analyze and classical mathematical tools reveal strong limitations in exposing hidden relationships between earthquakes. In this paper, we study earthquake phenomena in the perspective of complex systems. Global seismic data, covering the period from 1962 up to 2011 is analyzed. The events, characterized by their magnitude, geographic location and time of occurrence, are divided into groups, either according to the Flinn-Engdahl (F-E) seismic regions of Earth or using a rectangular grid based in latitude and longitude coordinates. Two methods of analysis are considered and compared in this study. In a first method, the distributions of magnitudes are approximated by Gutenberg-Richter (G-R) distributions and the parameters used to reveal the relationships among regions. In the second method, the mutual information is calculated and adopted as a measure of similarity between regions. In both cases, using clustering analysis, visualization maps are generated, providing an intuitive and useful representation of the complex relationships that are present among seismic data. Such relationships might not be perceived on classical geographic maps. Therefore, the generated charts are a valid alternative to other visualization tools, for understanding the global behavior of earthquakes.
Resumo:
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Resumo:
EMAp - Escola de Matemática Aplicada
Resumo:
[EN]This paper describes a wildfi re forecasting application based on a 3D virtual environment and a fi re simulation engine. A novel open source framework is presented for the development of 3D graphics applications over large geographic areas, off ering high performance 3D visualization and powerful interaction tools for the Geographic Information Systems (GIS) community. The application includes a remote module that allows simultaneous connection of several users for monitoring a real wildfi re event.
Resumo:
Three-dimensional flow visualization plays an essential role in many areas of science and engineering, such as aero- and hydro-dynamical systems which dominate various physical and natural phenomena. For popular methods such as the streamline visualization to be effective, they should capture the underlying flow features while facilitating user observation and understanding of the flow field in a clear manner. My research mainly focuses on the analysis and visualization of flow fields using various techniques, e.g. information-theoretic techniques and graph-based representations. Since the streamline visualization is a popular technique in flow field visualization, how to select good streamlines to capture flow patterns and how to pick good viewpoints to observe flow fields become critical. We treat streamline selection and viewpoint selection as symmetric problems and solve them simultaneously using the dual information channel [81]. To the best of my knowledge, this is the first attempt in flow visualization to combine these two selection problems in a unified approach. This work selects streamline in a view-independent manner and the selected streamlines will not change for all viewpoints. My another work [56] uses an information-theoretic approach to evaluate the importance of each streamline under various sample viewpoints and presents a solution for view-dependent streamline selection that guarantees coherent streamline update when the view changes gradually. When projecting 3D streamlines to 2D images for viewing, occlusion and clutter become inevitable. To address this challenge, we design FlowGraph [57, 58], a novel compound graph representation that organizes field line clusters and spatiotemporal regions hierarchically for occlusion-free and controllable visual exploration. We enable observation and exploration of the relationships among field line clusters, spatiotemporal regions and their interconnection in the transformed space. Most viewpoint selection methods only consider the external viewpoints outside of the flow field. This will not convey a clear observation when the flow field is clutter on the boundary side. Therefore, we propose a new way to explore flow fields by selecting several internal viewpoints around the flow features inside of the flow field and then generating a B-Spline curve path traversing these viewpoints to provide users with closeup views of the flow field for detailed observation of hidden or occluded internal flow features [54]. This work is also extended to deal with unsteady flow fields. Besides flow field visualization, some other topics relevant to visualization also attract my attention. In iGraph [31], we leverage a distributed system along with a tiled display wall to provide users with high-resolution visual analytics of big image and text collections in real time. Developing pedagogical visualization tools forms my other research focus. Since most cryptography algorithms use sophisticated mathematics, it is difficult for beginners to understand both what the algorithm does and how the algorithm does that. Therefore, we develop a set of visualization tools to provide users with an intuitive way to learn and understand these algorithms.
Resumo:
Objective: The present study offers a novel methodological contribution to the study of the configuration and dynamics of research groups, through a comparative perspective of the projects funded (inputs) and publication co-authorships (output). Method: A combination of bibliometric techniques and social network analysis was applied to a case study: the Departmento de Bibliotecología (DHUBI), Universidad Nacional de La Plata, Argentina, for the period 2000-2009. The results were interpreted statistically and staff members of the department, were interviewed. Results: The method makes it possible to distinguish groups, identify their members and reflect group make-up through an analytical strategy that involves the categorization of actors and the interdisciplinary and national or international projection of the networks that they configure. The integration of these two aspects (input and output) at different points in time over the analyzed period leads to inferences about group profiles and the roles of actors. Conclusions: The methodology presented is conducive to micro-level interpretations in a given area of study, regarding individual researchers or research groups. Because the comparative input-output analysis broadens the base of information and makes it possible to follow up, over time, individual and group trends, it may prove very useful for the management, promotion and evaluation of science
Resumo:
Objective: The present study offers a novel methodological contribution to the study of the configuration and dynamics of research groups, through a comparative perspective of the projects funded (inputs) and publication co-authorships (output). Method: A combination of bibliometric techniques and social network analysis was applied to a case study: the Departmento de Bibliotecología (DHUBI), Universidad Nacional de La Plata, Argentina, for the period 2000-2009. The results were interpreted statistically and staff members of the department, were interviewed. Results: The method makes it possible to distinguish groups, identify their members and reflect group make-up through an analytical strategy that involves the categorization of actors and the interdisciplinary and national or international projection of the networks that they configure. The integration of these two aspects (input and output) at different points in time over the analyzed period leads to inferences about group profiles and the roles of actors. Conclusions: The methodology presented is conducive to micro-level interpretations in a given area of study, regarding individual researchers or research groups. Because the comparative input-output analysis broadens the base of information and makes it possible to follow up, over time, individual and group trends, it may prove very useful for the management, promotion and evaluation of science
Resumo:
Objective: The present study offers a novel methodological contribution to the study of the configuration and dynamics of research groups, through a comparative perspective of the projects funded (inputs) and publication co-authorships (output). Method: A combination of bibliometric techniques and social network analysis was applied to a case study: the Departmento de Bibliotecología (DHUBI), Universidad Nacional de La Plata, Argentina, for the period 2000-2009. The results were interpreted statistically and staff members of the department, were interviewed. Results: The method makes it possible to distinguish groups, identify their members and reflect group make-up through an analytical strategy that involves the categorization of actors and the interdisciplinary and national or international projection of the networks that they configure. The integration of these two aspects (input and output) at different points in time over the analyzed period leads to inferences about group profiles and the roles of actors. Conclusions: The methodology presented is conducive to micro-level interpretations in a given area of study, regarding individual researchers or research groups. Because the comparative input-output analysis broadens the base of information and makes it possible to follow up, over time, individual and group trends, it may prove very useful for the management, promotion and evaluation of science
Resumo:
Cultural content on the Web is available in various domains (cultural objects, datasets, geospatial data, moving images, scholarly texts and visual resources), concerns various topics, is written in different languages, targeted to both laymen and experts, and provided by different communities (libraries, archives museums and information industry) and individuals (Figure 1). The integration of information technologies and cultural heritage content on the Web is expected to have an impact on everyday life from the point of view of institutions, communities and individuals. In particular, collaborative environment scan recreate 3D navigable worlds that can offer new insights into our cultural heritage (Chan 2007). However, the main barrier is to find and relate cultural heritage information by end-users of cultural contents, as well as by organisations and communities managing and producing them. In this paper, we explore several visualisation techniques for supporting cultural interfaces, where the role of metadata is essential for supporting the search and communication among end-users (Figure 2). A conceptual framework was developed to integrate the data, purpose, technology, impact, and form components of a collaborative environment, Our preliminary results show that collaborative environments can help with cultural heritage information sharing and communication tasks because of the way in which they provide a visual context to end-users. They can be regarded as distributed virtual reality systems that offer graphically realised, potentially infinite, digital information landscapes. Moreover, collaborative environments also provide a new way of interaction between an end-user and a cultural heritage data set. Finally, the visualisation of metadata of a dataset plays an important role in helping end-users in their search for heritage contents on the Web.
Resumo:
Arabidopsis thaliana, a small annual plant belonging to the mustard family, is the subject of study by an estimated 7000 researchers around the world. In addition to the large body of genetic, physiological and biochemical data gathered for this plant, it will be the first higher plant genome to be completely sequenced, with completion expected at the end of the year 2000. The sequencing effort has been coordinated by an international collaboration, the Arabidopsis Genome Initiative (AGI). The rationale for intensive investigation of Arabidopsis is that it is an excellent model for higher plants. In order to maximize use of the knowledge gained about this plant, there is a need for a comprehensive database and information retrieval and analysis system that will provide user-friendly access to Arabidopsis information. This paper describes the initial steps we have taken toward realizing these goals in a project called The Arabidopsis Information Resource (TAIR) (www.arabidopsis.org).