786 resultados para Data mining models
Resumo:
Predictive performance evaluation is a fundamental issue in design, development, and deployment of classification systems. As predictive performance evaluation is a multidimensional problem, single scalar summaries such as error rate, although quite convenient due to its simplicity, can seldom evaluate all the aspects that a complete and reliable evaluation must consider. Due to this, various graphical performance evaluation methods are increasingly drawing the attention of machine learning, data mining, and pattern recognition communities. The main advantage of these types of methods resides in their ability to depict the trade-offs between evaluation aspects in a multidimensional space rather than reducing these aspects to an arbitrarily chosen (and often biased) single scalar measure. Furthermore, to appropriately select a suitable graphical method for a given task, it is crucial to identify its strengths and weaknesses. This paper surveys various graphical methods often used for predictive performance evaluation. By presenting these methods in the same framework, we hope this paper may shed some light on deciding which methods are more suitable to use in different situations.
Resumo:
One of the top ten most influential data mining algorithms, k-means, is known for being simple and scalable. However, it is sensitive to initialization of prototypes and requires that the number of clusters be specified in advance. This paper shows that evolutionary techniques conceived to guide the application of k-means can be more computationally efficient than systematic (i.e., repetitive) approaches that try to get around the above-mentioned drawbacks by repeatedly running the algorithm from different configurations for the number of clusters and initial positions of prototypes. To do so, a modified version of a (k-means based) fast evolutionary algorithm for clustering is employed. Theoretical complexity analyses for the systematic and evolutionary algorithms under interest are provided. Computational experiments and statistical analyses of the results are presented for artificial and text mining data sets. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
In this paper, we present different ofrailtyo models to analyze longitudinal data in the presence of covariates. These models incorporate the extra-Poisson variability and the possible correlation among the repeated counting data for each individual. Assuming a CD4 counting data set in HIV-infected patients, we develop a hierarchical Bayesian analysis considering the different proposed models and using Markov Chain Monte Carlo methods. We also discuss some Bayesian discrimination aspects for the choice of the best model.
Resumo:
Multidimensional Visualization techniques are invaluable tools for analysis of structured and unstructured data with variable dimensionality. This paper introduces PEx-Image-Projection Explorer for Images-a tool aimed at supporting analysis of image collections. The tool supports a methodology that employs interactive visualizations to aid user-driven feature detection and classification tasks, thus offering improved analysis and exploration capabilities. The visual mappings employ similarity-based multidimensional projections and point placement to layout the data on a plane for visual exploration. In addition to its application to image databases, we also illustrate how the proposed approach can be successfully employed in simultaneous analysis of different data types, such as text and images, offering a common visual representation for data expressed in different modalities.
Resumo:
Fire is a major management issue in the southwestern United States. Three spatial models of fire risk for Coconino County, Northern Arizona. These models were generated using thematic data layers depicting vegetation, elevation, wind speed and direction, and precipitation for January (winter), June (summer), and July (start of monsoon season). ArcGIS 9.0 was used to weight attributes in raster layers to reflect their influence on fire risk and to interpolate raster data layers from point data. Final models were generated using the raster calculator in the Spatial Analyst extension of ArcGIS 9.0. Ultimately, the unique combinations of variables resulted in three different models illustrating the change in fire risk during the year.
Resumo:
Cada vez mais o tempo acaba sendo o diferencial de uma empresa para outra. As empresas, para serem bem sucedidas, precisam da informação certa, no momento certo e para as pessoas certas. Os dados outrora considerados importantes para a sobrevivência das empresas hoje precisam estar em formato de informações para serem utilizados. Essa é a função das ferramentas de “Business Intelligence”, cuja finalidade é modelar os dados para obter informações, de forma que diferencie as ações das empresas e essas consigam ser mais promissoras que as demais. “Business Intelligence” é um processo de coleta, análise e distribuição de dados para melhorar a decisão de negócios, que leva a informação a um número bem maior de usuários dentro da corporação. Existem vários tipos de ferramentas que se propõe a essa finalidade. Esse trabalho tem como objetivo comparar ferramentas através do estudo das técnicas de modelagem dimensional, fundamentais nos projetos de estruturas informacionais, suporte a “Data Warehouses”, “Data Marts”, “Data Mining” e outros, bem como o mercado, suas vantagens e desvantagens e a arquitetura tecnológica utilizada por estes produtos. Assim sendo, foram selecionados os conjuntos de ferramentas de “Business Intelligence” das empresas Microsoft Corporation e Oracle Corporation, visto as suas magnitudes no mundo da informática.
Resumo:
In this thesis, the basic research of Chase and Simon (1973) is questioned, and we seek new results by analyzing the errors of experts and beginners chess players in experiments to reproduce chess positions. Chess players with different levels of expertise participated in the study. The results were analyzed by a Brazilian grandmaster, and quantitative analysis was performed with the use of statistical methods data mining. The results challenge significantly, the current theories of expertise, memory and decision making in this area, because the present theory predicts piece on square encoding, in which players can recognize the strategic situation reproducing it faithfully, but commit several errors that the theory can¿t explain. The current theory can¿t fully explain the encoding used by players to register a board. The errors of intermediary players preserved fragments of the strategic situation, although they have committed a series of errors in the reconstruction of the positions. The encoding of chunks therefore includes more information than that predicted by current theories. Currently, research on perception, trial and decision is heavily concentrated on the idea of pattern recognition". Based on the results of this research, we explore a change of perspective. The idea of "pattern recognition" presupposes that the processing of relevant information is on "patterns" (or data) that exist independently of any interpretation. We propose that the theory suggests the vision of decision-making via the recognition of experience.
Resumo:
In this thesis, the basic research of Chase and Simon (1973) is questioned, and we seek new results by analyzing the errors of experts and beginners chess players in experiments to reproduce chess positions. Chess players with different levels of expertise participated in the study. The results were analyzed by a Brazilian grandmaster, and quantitative analysis was performed with the use of statistical methods data mining. The results challenge significantly, the current theories of expertise, memory and decision making in this area, because the present theory predicts piece on square encoding, in which players can recognize the strategic situation reproducing it faithfully, but commit several errors that the theory can¿t explain. The current theory can¿t fully explain the encoding used by players to register a board. The errors of intermediary players preserved fragments of the strategic situation, although they have committed a series of errors in the reconstruction of the positions. The encoding of chunks therefore includes more information than that predicted by current theories. Currently, research on perception, trial and decision is heavily concentrated on the idea of 'pattern recognition'. Based on the results of this research, we explore a change of perspective. The idea of 'pattern recognition' presupposes that the processing of relevant information is on 'patterns' (or data) that exist independently of any interpretation. We propose that the theory suggests the vision of decision-making via the recognition of experience.
Resumo:
The domain of Knowledge Discovery (KD) and Data Mining (DM) is of growing importance in a time where more and more data is produced and knowledge is one of the most precious assets. Having explored both the existing underlying theory, the results of the ongoing research in academia and the industry practices in the domain of KD and DM, we have found that this is a domain that still lacks some systematization. We also found that this systematization exists to a greater degree in the Software Engineering and Requirements Engineering domains, probably due to being more mature areas. We believe that it is possible to improve and facilitate the participation of enterprise stakeholders in the requirements engineering for KD projects by systematizing requirements engineering process for such projects. This will, in turn, result in more projects that end successfully, that is, with satisfied stakeholders, including in terms of time and budget constraints. With this in mind and based on all information found in the state-of-the art, we propose SysPRE - Systematized Process for Requirements Engineering in KD projects. We begin by proposing an encompassing generic description of the KD process, where the main focus is on the Requirements Engineering activities. This description is then used as a base for the application of the Design and Engineering Methodology for Organizations (DEMO) so that we can specify a formal ontology for this process. The resulting SysPRE ontology can serve as a base that can be used not only to make enterprises become aware of their own KD process and requirements engineering process in the KD projects, but also to improve such processes in reality, namely in terms of success rate.
Resumo:
Layer mortality due to heat stress is an important economic loss for the producer. The aim of this study was to determine the mortality pattern of layers reared in the region of Bastos, SP, Brazil, according to external environment and bird age. Data mining technique were used based on monthly mortality records of hens in production, 135 poultry houses, from January 2004 to August 2008. The external environment was characterized according maximum and minimum temperatures, obtained monthly at the meteorological station CATI in the city of Tupa, SP, Brazil. Mortality was classified as normal (<= 1.2%) or high (> 1.2%), considering the mortality limits mentioned in literature. Data mining technique produced a decision tree with nine levels and 23 leaves, with 62.6% of overall accuracy. The hit rate for the High class was 64.1% and 59.9% for Normal class. The decision tree allowed finding a pattern in the mortality data, generating a model for estimating mortality based on the thermal environment and bird age.