970 resultados para Data exploration


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Advancements in information technology have made it possible for organizations to gather and store vast amounts of data of their customers. Information stored in databases can be highly valuable for organizations. However, analyzing large databases has proven to be difficult in practice. For companies in the retail industry, customer intelligence can be used to identify profitable customers, their characteristics, and behavior. By clustering customers into homogeneous groups, companies can more effectively manage their customer base and target profitable customer segments. This thesis will study the use of the self-organizing map (SOM) as a method for analyzing large customer datasets, clustering customers, and discovering information about customer behavior. Aim of the thesis is to find out whether the SOM could be a practical tool for retail companies to analyze their customer data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Version 1 of the Global Charcoal Database is now available for regional fire history reconstructions, data exploration, hypothesis testing, and evaluation of coupled climate–vegetation–fire model simulations. The charcoal database contains over 400 radiocarbon-dated records that document changes in charcoal abundance during the Late Quaternary. The aim of this public database is to stimulate cross-disciplinary research in fire sciences targeted at an increased understanding of the controls and impacts of natural and anthropogenic fire regimes on centennial-to-orbital timescales. We describe here the data standardization techniques for comparing multiple types of sedimentary charcoal records. Version 1 of the Global Charcoal Database has been used to characterize global and regional patterns in fire activity since the last glacial maximum. Recent studies using the charcoal database have explored the relation between climate and fire during periods of rapid climate change, including evidence of fire activity during the Younger Dryas Chronozone, and during the past two millennia.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Interactive visual representations complement traditional statistical and machine learning techniques for data analysis, allowing users to play a more active role in a knowledge discovery process and making the whole process more understandable. Though visual representations are applicable to several stages of the knowledge discovery process, a common use of visualization is in the initial stages to explore and organize a sometimes unknown and complex data set. In this context, the integrated and coordinated - that is, user actions should be capable of affecting multiple visualizations when desired - use of multiple graphical representations allows data to be observed from several perspectives and offers richer information than isolated representations. In this paper we propose an underlying model for an extensible and adaptable environment that allows independently developed visualization components to be gradually integrated into a user configured knowledge discovery application. Because a major requirement when using multiple visual techniques is the ability to link amongst them, so that user actions executed on a representation propagate to others if desired, the model also allows runtime configuration of coordinated user actions over different visual representations. We illustrate how this environment is being used to assist data exploration and organization in a climate classification problem.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes how the statistical technique of cluster analysis and the machine learning technique of rule induction can be combined to explore a database. The ways in which such an approach alleviates the problems associated with other techniques for data analysis are discussed. We report the results of experiments carried out on a database from the medical diagnosis domain. Finally we describe the future developments which we plan to carry out to build on our current work.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We present a participant study that compares biological data exploration tasks using volume renderings of laser confocal microscopy data across three environments that vary in level of immersion: a desktop, fishtank, and cave system. For the tasks, data, and visualization approach used in our study, we found that subjects qualitatively preferred and quantitatively performed better in the cave compared with the fishtank and desktop. Subjects performed real-world biological data analysis tasks that emphasized understanding spatial relationships including characterizing the general features in a volume, identifying colocated features, and reporting geometric relationships such as whether clusters of cells were coplanar. After analyzing data in each environment, subjects were asked to choose which environment they wanted to analyze additional data sets in - subjects uniformly selected the cave environment.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Complementary to automatic extraction processes, Virtual Reality technologies provide an adequate framework to integrate human perception in the exploration of large data sets. In such multisensory system, thanks to intuitive interactions, a user can take advantage of all his perceptual abilities in the exploration task. In this context the haptic perception, coupled to visual rendering, has been investigated for the last two decades, with significant achievements. In this paper, we present a survey related to exploitation of the haptic feedback in exploration of large data sets. For each haptic technique introduced, we describe its principles and its effectiveness.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The principal topic of this work is the application of data mining techniques, in particular of machine learning, to the discovery of knowledge in a protein database. In the first chapter a general background is presented. Namely, in section 1.1 we overview the methodology of a Data Mining project and its main algorithms. In section 1.2 an introduction to the proteins and its supporting file formats is outlined. This chapter is concluded with section 1.3 which defines that main problem we pretend to address with this work: determine if an amino acid is exposed or buried in a protein, in a discrete way (i.e.: not continuous), for five exposition levels: 2%, 10%, 20%, 25% and 30%. In the second chapter, following closely the CRISP-DM methodology, whole the process of construction the database that supported this work is presented. Namely, it is described the process of loading data from the Protein Data Bank, DSSP and SCOP. Then an initial data exploration is performed and a simple prediction model (baseline) of the relative solvent accessibility of an amino acid is introduced. It is also introduced the Data Mining Table Creator, a program developed to produce the data mining tables required for this problem. In the third chapter the results obtained are analyzed with statistical significance tests. Initially the several used classifiers (Neural Networks, C5.0, CART and Chaid) are compared and it is concluded that C5.0 is the most suitable for the problem at stake. It is also compared the influence of parameters like the amino acid information level, the amino acid window size and the SCOP class type in the accuracy of the predictive models. The fourth chapter starts with a brief revision of the literature about amino acid relative solvent accessibility. Then, we overview the main results achieved and finally discuss about possible future work. The fifth and last chapter consists of appendices. Appendix A has the schema of the database that supported this thesis. Appendix B has a set of tables with additional information. Appendix C describes the software provided in the DVD accompanying this thesis that allows the reconstruction of the present work.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Contém resumo

Relevância:

70.00% 70.00%

Publicador:

Resumo:

As huge amounts of data become available in organizations and society, specific data analytics skills and techniques are needed to explore this data and extract from it useful patterns, tendencies, models or other useful knowledge, which could be used to support the decision-making process, to define new strategies or to understand what is happening in a specific field. Only with a deep understanding of a phenomenon it is possible to fight it. In this paper, a data-driven analytics approach is used for the analysis of the increasing incidence of fatalities by pneumonia in the Portuguese population, characterizing the disease and its incidence in terms of fatalities, knowledge that can be used to define appropriate strategies that can aim to reduce this phenomenon, which has increased more than 65% in a decade.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Dissertação de mestrado integrado em Engenharia e Gestão de Sistemas de Informação

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Illustration Watermarks, Image annotation, Virtual data exploration, Interaction techniques

Relevância:

60.00% 60.00%

Publicador:

Resumo:

MOTIVATION: Lipids are a large and diverse group of biological molecules with roles in membrane formation, energy storage and signaling. Cellular lipidomes may contain tens of thousands of structures, a staggering degree of complexity whose significance is not yet fully understood. High-throughput mass spectrometry-based platforms provide a means to study this complexity, but the interpretation of lipidomic data and its integration with prior knowledge of lipid biology suffers from a lack of appropriate tools to manage the data and extract knowledge from it. RESULTS: To facilitate the description and exploration of lipidomic data and its integration with prior biological knowledge, we have developed a knowledge resource for lipids and their biology-SwissLipids. SwissLipids provides curated knowledge of lipid structures and metabolism which is used to generate an in silico library of feasible lipid structures. These are arranged in a hierarchical classification that links mass spectrometry analytical outputs to all possible lipid structures, metabolic reactions and enzymes. SwissLipids provides a reference namespace for lipidomic data publication, data exploration and hypothesis generation. The current version of SwissLipids includes over 244 000 known and theoretically possible lipid structures, over 800 proteins, and curated links to published knowledge from over 620 peer-reviewed publications. We are continually updating the SwissLipids hierarchy with new lipid categories and new expert curated knowledge. AVAILABILITY: SwissLipids is freely available at http://www.swisslipids.org/. CONTACT: alan.bridge@isb-sib.ch SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In real world applications sequential algorithms of data mining and data exploration are often unsuitable for datasets with enormous size, high-dimensionality and complex data structure. Grid computing promises unprecedented opportunities for unlimited computing and storage resources. In this context there is the necessity to develop high performance distributed data mining algorithms. However, the computational complexity of the problem and the large amount of data to be explored often make the design of large scale applications particularly challenging. In this paper we present the first distributed formulation of a frequent subgraph mining algorithm for discriminative fragments of molecular compounds. Two distributed approaches have been developed and compared on the well known National Cancer Institute’s HIV-screening dataset. We present experimental results on a small-scale computing environment.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

O uso combinado de algoritmos para a descoberta de tópicos em coleções de documentos com técnicas orientadas à visualização da evolução daqueles tópicos no tempo permite a exploração de padrões temáticos em corpora extensos a partir de representações visuais compactas. A pesquisa em apresentação investigou os requisitos de visualização do dado sobre composição temática de documentos obtido através da modelagem de tópicos – o qual é esparso e possui multiatributos – em diferentes níveis de detalhe, através do desenvolvimento de uma técnica de visualização própria e pelo uso de uma biblioteca de código aberto para visualização de dados, de forma comparativa. Sobre o problema estudado de visualização do fluxo de tópicos, observou-se a presença de requisitos de visualização conflitantes para diferentes resoluções dos dados, o que levou à investigação detalhada das formas de manipulação e exibição daqueles. Dessa investigação, a hipótese defendida foi a de que o uso integrado de mais de uma técnica de visualização de acordo com a resolução do dado amplia as possibilidades de exploração do objeto em estudo em relação ao que seria obtido através de apenas uma técnica. A exibição dos limites no uso dessas técnicas de acordo com a resolução de exploração do dado é a principal contribuição desse trabalho, no intuito de dar subsídios ao desenvolvimento de novas aplicações.