17 results for Knowledge Discovery Database
in Repositório Institucional UNESP - Universidade Estadual Paulista "Júlio de Mesquita Filho"
Abstract:
This paper describes a data mining environment for knowledge discovery in bioinformatics applications. The system has a generic kernel that implements the mining functions to be applied to input primary databases of biomedical information organized in a warehouse architecture. Both supervised and unsupervised classification can be implemented within the kernel and applied to data extracted from the primary database, with the results stored in a complex object database for knowledge discovery. The kernel also includes a specific high-performance library that allows the mining functions to be designed and applied on parallel machines. The experimental results obtained by applying the kernel functions are reported. © 2003 Elsevier Ltd. All rights reserved.
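The kernel architecture described in this abstract can be illustrated with a minimal sketch: a registry of pluggable mining functions (supervised or unsupervised) applied to records extracted from a primary database. All class, function, and field names below are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a generic mining kernel: mining functions are
# registered by name and applied to records extracted from a primary
# database. All names here are illustrative.

class MiningKernel:
    def __init__(self):
        self._functions = {}

    def register(self, name, fn):
        """Plug a mining function into the kernel."""
        self._functions[name] = fn

    def apply(self, name, records):
        """Apply a registered mining function to the extracted records."""
        return self._functions[name](records)

# An unsupervised example function: split records on a numeric threshold.
def threshold_cluster(records, field="value", cutoff=10):
    return {
        "low": [r for r in records if r[field] < cutoff],
        "high": [r for r in records if r[field] >= cutoff],
    }

kernel = MiningKernel()
kernel.register("threshold_cluster", threshold_cluster)
groups = kernel.apply("threshold_cluster", [{"value": 3}, {"value": 42}])
```

A parallel version would, in the spirit of the paper, distribute `records` across workers before invoking the registered function.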
Abstract:
Aiming to ensure greater reliability and consistency of the data stored in a database, the data cleaning stage comes early in the process of Knowledge Discovery in Databases (KDD) and is responsible for eliminating problems and adjusting the data for the later stages, especially for the data mining stage. Such problems occur at both the instance and schema levels, namely missing values, null values, duplicate tuples, and values outside the domain, among others. Several algorithms have been developed to perform the cleaning step in databases; some of them were designed specifically to work with the phonetics of words, since a word can be written in different ways. Within this perspective, this work presents as its original contribution a multithreaded optimization of a phonetics-based algorithm for detecting duplicate tuples in databases, which requires no training data and provides a language-independent environment. © 2011 IEEE.
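The abstract does not name the phonetic algorithm used. As an illustration only, the sketch below pairs a simplified Soundex encoding (one common phonetic algorithm) with a thread pool, flagging names that encode identically as duplicate candidates; all function names are assumptions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def soundex(word):
    """Simplified Soundex: keep the first letter, encode the remaining
    letters as digits, and collapse consecutive identical codes."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    word = word.lower()
    prev, digits = codes.get(word[0], ""), []
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code
    return (word[0].upper() + "".join(digits) + "000")[:4]

def phonetic_duplicates(names, workers=4):
    """Encode names in parallel threads and group phonetically
    identical ones as duplicate candidates."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        encoded = list(pool.map(soundex, names))
    groups = defaultdict(list)
    for name, code in zip(names, encoded):
        groups[code].append(name)
    return {c: ns for c, ns in groups.items() if len(ns) > 1}
```

For example, `phonetic_duplicates(["Smith", "Smyth", "Jones"])` flags "Smith" and "Smyth" (both encode to S530) as candidate duplicates, while "Jones" stays unmatched.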
Abstract:
Graduate Program in Computer Science - IBILCE
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Interactive visual representations complement traditional statistical and machine learning techniques for data analysis, allowing users to play a more active role in the knowledge discovery process and making the whole process more understandable. Though visual representations are applicable to several stages of the knowledge discovery process, visualization is commonly used in the initial stages to explore and organize a sometimes unknown and complex data set. In this context, the integrated and coordinated use of multiple graphical representations - that is, where user actions can affect multiple visualizations when desired - allows data to be observed from several perspectives and offers richer information than isolated representations. In this paper we propose an underlying model for an extensible and adaptable environment that allows independently developed visualization components to be gradually integrated into a user-configured knowledge discovery application. Because a major requirement when using multiple visual techniques is the ability to link them, so that user actions executed on one representation propagate to others if desired, the model also allows runtime configuration of coordinated user actions over different visual representations. We illustrate how this environment is being used to assist data exploration and organization in a climate classification problem.
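The linking requirement described in this abstract can be sketched as a small coordination layer, assuming (hypothetically) that each visualization component exposes a selection handler: links between views are configured at runtime, and a coordinator propagates a user action on one view to every view linked to it. All names are illustrative, not the paper's actual model.

```python
class View:
    """Stand-in for an independently developed visualization component."""
    def __init__(self, name):
        self.name = name
        self.selection = set()

    def on_select(self, items):
        """Handler invoked on a local or propagated selection."""
        self.selection = set(items)

class Coordinator:
    """Runtime-configurable links between views."""
    def __init__(self):
        self._links = {}

    def link(self, a, b):
        """Coordinate two views in both directions."""
        self._links.setdefault(a, set()).add(b)
        self._links.setdefault(b, set()).add(a)

    def select(self, view, items):
        """A user action on one view propagates to its linked views."""
        view.on_select(items)
        for other in self._links.get(view, ()):
            other.on_select(items)

scatter, histogram = View("scatter"), View("histogram")
coordinator = Coordinator()
coordinator.link(scatter, histogram)                # configured at runtime
coordinator.select(scatter, ["record-7", "record-9"])
```

After the call, both views hold the same selection, mirroring the coordinated behavior the model aims for.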
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Graduate Program in Education - FFC
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
This work identifies and analyzes the literature about knowledge organization (KO) published in the scientific journals of information science (IS). It performs an exploratory study of the Base de Dados Referencial de Artigos de Periódicos em Ciência da Informação (BRAPCI, Reference Database of Journal Articles on Information Science) between the years 2000 and 2010. The descriptors relating to "knowledge organization" are used to retrieve and analyze the corresponding articles and to identify the descriptors and concepts that make up the semantic universe related to KO. Through content analysis based on metrical studies, this article gathers and interprets data relating to documents and authors, demonstrating the development of this field and its research fronts according to the observed characteristics, as well as noting indications of transformation in the production of knowledge. The work describes the influence of Spanish researchers on the Brazilian literature in the fields of knowledge and information organization. As a result, it presents the most cited and most productive authors, the theoretical currents that support them, and the most significant relationships in the Spanish-Brazilian author network. Based on the analysis of the keywords found in the cited articles, the coexistence of the French conceptual current and an incipient Spanish influence in Brazil is observed. In this way, the work contributes to the comprehension of the thematic range relating to KO, stimulating criticism and self-criticism, debate, and knowledge creation, based on studies that have been developed and institutionalized in academic contexts in Spain and Brazil.
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Large amounts of data are better analyzed by humans when represented in a graphical format. Therefore, a new research area called Visual Data Mining is being developed, endeavoring to combine the number-crunching power of computers to prepare data for visualization with the ability of humans to interpret data presented graphically. This work presents the results of applying a visual data mining tool, called FastMapDB, to detect the behavioral pattern exhibited by a dataset of clinical information about hemoglobinopathies known as thalassemia. FastMapDB is a visual data mining tool that takes tabular data stored in a relational database, such as dates, numbers, and texts, and, by considering them as points in a multidimensional space, maps them to a three-dimensional space. The intuitive three-dimensional representation of objects enables a data analyst to see the behavior of the characteristics of abnormal forms of hemoglobin, highlighting the differences when compared to data from a group without alteration.
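FastMapDB itself is not described in detail here, but the mapping of multidimensional points to a low-dimensional space can be illustrated with the classic FastMap projection step, which places each point on an axis defined by two pivot objects using the cosine law. The sketch below shows that single step only, not the actual tool; function names are assumptions.

```python
import math

def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def fastmap_axis(points, pivot_a, pivot_b):
    """One FastMap step: the coordinate of each point on the axis
    through the two pivots, derived from the cosine law:
    x = (d(a,p)^2 + d(a,b)^2 - d(b,p)^2) / (2 * d(a,b))."""
    dab = math.sqrt(sqdist(pivot_a, pivot_b))
    return [(sqdist(pivot_a, p) + dab * dab - sqdist(pivot_b, p)) / (2 * dab)
            for p in points]
```

For pivots (0, 0) and (10, 0), the point (3, 4) projects to 3.0 on the pivot axis, as geometric intuition suggests. Repeating the step on residual distances yields the second and third coordinates of a 3-D embedding.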
Abstract:
The quest for new control strategies for ticks can profit from high throughput genomics. In order to identify genes that are involved in oogenesis and development, in defense, and in hematophagy, the transcriptomes of ovaries, hemocytes, and salivary glands from rapidly ingurgitating females, and of salivary glands from males of Boophilus microplus were PCR amplified, and the expressed sequence tags (EST) of random clones were mass sequenced. So far, more than 1,344 EST have been generated for these tissues, with approximately 30% novelty, depending on the tissue studied. To date approximately 760 nucleotide sequences from B. microplus are deposited in the NCBI database. Mass sequencing of partial cDNAs of parasite genes can build up this scant database and rapidly generate a large quantity of useful information about potential targets for immunobiological or chemical control.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Sickle Cell Disease (SCD) is one of the most prevalent hematological diseases in the world. Despite the immense progress in molecular knowledge about SCD in recent years, few therapeutic options are currently available. Nowadays the treatment is performed mainly with drugs such as hydroxyurea or other fetal hemoglobin inducers, and with chelating agents. This review summarizes current knowledge about the treatment and the advancements in drug design aimed at discovering more effective and safer drugs. Patient monitoring methods in SCD are also discussed. © 2011 Bentham Science Publishers Ltd.
Abstract:
A significant amount of information stored in different databases around the world can be shared through peer-to-peer databases. This yields a large knowledge base without the need for large investments, because existing databases and the infrastructure already in place are reused. However, the structural characteristics of peer-to-peer networks make the process of finding such information complex. Moreover, these databases are often heterogeneous in their schemas but semantically similar in their content. A good peer-to-peer database system should allow the user to access information from databases scattered across the network and receive only the information that actually relates to the topic of interest. This paper proposes using ontologies in peer-to-peer database queries to represent the semantics inherent to the data. The main contributions of this work are enabling integration between heterogeneous databases, improving the performance of such queries, and using the Ant Colony optimization algorithm to solve the problem of locating information on peer-to-peer networks, which yields an 18% improvement in results. © 2011 IEEE.
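The paper's actual algorithm is not reproduced here; the sketch below only illustrates the general ant-colony idea for locating information in a peer-to-peer overlay, with pheromone kept per peer for simplicity. A query walks toward neighbours chosen with probability proportional to pheromone, all pheromone evaporates on each run, and peers on successful paths are reinforced. The graph, parameters, and function names are illustrative assumptions.

```python
import random

def route_query(graph, pheromone, start, target, rng,
                max_hops=10, deposit=1.0, evaporation=0.1):
    """One ant: walk from start toward target, guided by pheromone.
    On success, reinforce every peer on the path; return the path."""
    for peer in pheromone:                      # global evaporation
        pheromone[peer] *= (1.0 - evaporation)
    path, node = [start], start
    for _ in range(max_hops):
        if node == target:
            break
        neighbours = graph[node]
        if not neighbours:                      # dead end: query fails
            return None
        weights = [pheromone[n] for n in neighbours]
        node = rng.choices(neighbours, weights=weights)[0]
        path.append(node)
    if node != target:
        return None
    for peer in path:                           # reinforce successful path
        pheromone[peer] += deposit
    return path

# Toy overlay: peer A can reach peer D holding the data via B or C.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
pheromone = {peer: 1.0 for peer in graph}
path = route_query(graph, pheromone, "A", "D", random.Random(0))
```

Repeated queries would concentrate pheromone on the shorter, more reliable routes, which is the mechanism the paper exploits for query localization.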