791 results for Web Data Mining
Predictive models for chronic renal disease using decision trees, naïve bayes and case-based methods
Abstract:
Data mining can be used in the healthcare industry to “mine” clinical data and discover hidden information for intelligent and effective decision making. Hidden patterns and relationships often go undiscovered, and advanced data mining techniques can help remedy this. This thesis deals mainly with the Intelligent Prediction of Chronic Renal Disease (IPCRD). The data cover blood tests, urine tests, and external symptoms used to predict chronic renal disease. Data from the database are first converted for use in Weka (3.6), and the Chi-Square method is used for feature selection. After the data are normalized, three classifiers are applied and their performance is evaluated: Decision Tree, Naïve Bayes, and the K-Nearest Neighbour algorithm. Results show that each technique has its own strength in realizing the objectives of the defined mining goals. The performance of Decision Tree and KNN was almost the same, but Naïve Bayes showed a comparative edge over the others. Sensitivity and specificity are further used as statistical measures of binary classification performance. Sensitivity (also called recall in some fields) measures the proportion of actual positives that are correctly identified, while specificity measures the proportion of actual negatives that are correctly identified. The CRISP-DM methodology is applied to build the mining models. It consists of six major phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
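The two measures defined above can be illustrated with a minimal Python sketch. The function name `sensitivity_specificity` and the sample labels are hypothetical and chosen only for illustration; they are not taken from the thesis or from Weka.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a binary classification.

    y_true and y_pred are equal-length sequences of labels; any label
    other than `positive` is treated as negative.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # actual positive, predicted positive
            else:
                fn += 1  # actual positive, predicted negative
        else:
            if pred == positive:
                fp += 1  # actual negative, predicted positive
            else:
                tn += 1  # actual negative, predicted negative
    # Sensitivity: share of actual positives that were identified.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of actual negatives that were identified.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up labels: 1 = disease present, 0 = disease absent.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(actual, predicted)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

With these made-up labels, three of the four actual positives and four of the six actual negatives are identified correctly.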
Abstract:
This article highlights the potential benefits of the Kohonen method for classifying rivers with similar characteristics when determining regional ecological flows using the ELOHA (Ecological Limits of Hydrologic Alteration) methodology. Many methodologies for river classification currently exist; however, none of them offers the characteristics of the Kohonen method: (i) it provides the number of groups that actually underlie the information presented, (ii) it can be used for variable importance analysis, (iii) it can display the classification process in two dimensions, and (iv) the clustering structure remains regardless of the parameters used in the model. To evaluate these potential benefits, 174 flow stations distributed along the great river basin “Magdalena-Cauca” (Colombia) were analyzed, with 73 variables obtained for the classification process in each case. Six trials were run using different combinations of variables, and the results were validated against the reference classification obtained by Ingfocol in 2010, whose results were also framed using ELOHA guidelines. Validation showed that two of the tested models reproduced more than 80% of the reference classification with the first trial, meaning that more than 80% of the flow stations analyzed in both models formed invariant groups of streams.
Abstract:
An underwater gas pipeline is the portion of a pipeline that crosses a river beneath its bed. Underwater gas pipelines are subject to increasing dangers as time goes by. An accident at an underwater gas pipeline can lead to a technological and environmental disaster on the scale of an entire region. Timely troubleshooting of all underwater gas pipelines to prevent potential accidents will therefore remain a pressing task for the industry. The most important aspect of meeting this challenge is the quality of the automated system in question. The industry currently has no automated system that fully meets the needs of the experts who maintain underwater gas pipelines. Principal Aim of this Research: This work aims to develop a new automated monitoring system that simplifies evaluating the technical condition of an underwater gas pipeline and making decisions on planning, preventive maintenance, and repair work. Objectives: Creation of a shared model for the new automated system via IDEF3; Development of a new database system to store all information about underwater gas pipelines; Development of a new application that works with database servers and explains the results obtained from the server; Calculation of MTBF values for specified pipelines based on quantitative data obtained from tests of this system. Conclusion: The new automated system PodvodGazExpert has been developed for timely and qualitative determination of the physical condition of an underwater gas pipeline; The mathematical analysis in this new automated system is based on the principal component analysis method; Determining the physical condition of an underwater gas pipeline with this new automated system increases the MTBF by a factor of 8.18 over the existing system used in the industry today.
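The abstract does not say how the MTBF values were calculated. As a rough sketch, assuming the common definition of MTBF as total operating time divided by the number of failures, a comparison between two systems could look like the following. The function names and all numbers are hypothetical and are not data from this work.

```python
def mtbf(total_operating_hours, failures):
    """Mean time between failures: operating time divided by failure count."""
    if failures <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return total_operating_hours / failures


def improvement_factor(mtbf_new, mtbf_old):
    """How many times larger the new MTBF is than the old one."""
    return mtbf_new / mtbf_old


# Hypothetical observation window and failure counts, for illustration only.
old_system = mtbf(total_operating_hours=10_000, failures=20)
new_system = mtbf(total_operating_hours=10_000, failures=4)
print(old_system, new_system, improvement_factor(new_system, old_system))
```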
Abstract:
Increasingly, time is what sets one company apart from another. To be successful, companies need the right information, at the right moment, for the right people. Data once considered important for the survival of companies must today be in the form of information in order to be used. This is the role of “Business Intelligence” tools, whose purpose is to model data to obtain information in a way that differentiates companies' actions and lets them be more promising than the rest. “Business Intelligence” is a process of collecting, analyzing, and distributing data to improve business decisions, bringing information to a much larger number of users within the corporation. There are several types of tools that serve this purpose. This work aims to compare tools by studying dimensional modeling techniques, which are fundamental in the design of informational structures and in supporting “Data Warehouses”, “Data Marts”, “Data Mining” and others, as well as the market, the tools' advantages and disadvantages, and the technological architecture used by these products. Accordingly, the “Business Intelligence” tool sets of Microsoft Corporation and Oracle Corporation were selected, given their magnitude in the computing world.
Abstract:
In this thesis, the basic research of Chase and Simon (1973) is questioned, and we seek new results by analyzing the errors of expert and beginner chess players in experiments on reproducing chess positions. Chess players with different levels of expertise participated in the study. The results were analyzed by a Brazilian grandmaster, and quantitative analysis was performed using statistical data mining methods. The results significantly challenge the current theories of expertise, memory, and decision making in this area: the present theory predicts piece-on-square encoding, yet players can recognize the strategic situation and reproduce it faithfully while committing several errors that the theory can't explain. The current theory can't fully explain the encoding used by players to register a board. The errors of intermediate players preserved fragments of the strategic situation, even though the players committed a series of errors in reconstructing the positions. The encoding of chunks therefore includes more information than current theories predict. Currently, research on perception, judgment, and decision is heavily concentrated on the idea of "pattern recognition". Based on the results of this research, we explore a change of perspective. The idea of "pattern recognition" presupposes that the processing of relevant information operates on "patterns" (or data) that exist independently of any interpretation. We propose that the theory adopt a view of decision making via the recognition of experience.
Abstract:
The objective of this work is to test the application of a probabilistic graphical model, generically called Bayesian Networks, to develop computational models that can be used to aid the understanding of problems and/or the prediction of variables of an economic nature. To this end, a problem widely addressed in the literature was chosen, and the already consolidated theoretical and experimental results were compared with those obtained using the proposed technique. For this purpose, a model was built to classify the trend of "country risk" for Brazil from a database composed of macroeconomic and financial variables. The EMBI+ (Emerging Markets Bond Index Plus) was adopted as the measure of risk, as it is an indicator widely used by the market.
Abstract:
The domain of Knowledge Discovery (KD) and Data Mining (DM) is of growing importance at a time when more and more data is produced and knowledge is one of the most precious assets. Having explored the existing underlying theory, the results of ongoing research in academia, and the industry practices in the domain of KD and DM, we have found that this domain still lacks some systematization. We also found that this systematization exists to a greater degree in the Software Engineering and Requirements Engineering domains, probably because they are more mature areas. We believe that it is possible to improve and facilitate the participation of enterprise stakeholders in requirements engineering for KD projects by systematizing the requirements engineering process for such projects. This will, in turn, result in more projects that end successfully, that is, with satisfied stakeholders, including in terms of time and budget constraints. With this in mind, and based on the information found in the state of the art, we propose SysPRE - Systematized Process for Requirements Engineering in KD projects. We begin by proposing an encompassing generic description of the KD process, where the main focus is on the Requirements Engineering activities. This description is then used as a base for applying the Design and Engineering Methodology for Organizations (DEMO) so that we can specify a formal ontology for this process. The resulting SysPRE ontology can serve as a base not only for making enterprises aware of their own KD process and of the requirements engineering process in their KD projects, but also for improving those processes in practice, namely in terms of success rate.
Abstract:
Layer mortality due to heat stress is an important economic loss for the producer. The aim of this study was to determine the mortality pattern of layers reared in the region of Bastos, SP, Brazil, according to external environment and bird age. Data mining techniques were applied to monthly mortality records of hens in production from 135 poultry houses, from January 2004 to August 2008. The external environment was characterized by maximum and minimum temperatures, obtained monthly at the CATI meteorological station in the city of Tupa, SP, Brazil. Mortality was classified as normal (<= 1.2%) or high (> 1.2%), considering the mortality limits mentioned in the literature. The data mining technique produced a decision tree with nine levels and 23 leaves, with an overall accuracy of 62.6%. The hit rate was 64.1% for the High class and 59.9% for the Normal class. The decision tree made it possible to find a pattern in the mortality data, generating a model for estimating mortality based on the thermal environment and bird age.
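The labeling rule and the reported metrics can be sketched in Python as follows. Only the 1.2% threshold comes from the abstract. The function names and sample values are hypothetical, and the sketch assumes that "hit rate" means the share of records of a class that the model labels correctly.

```python
THRESHOLD_PERCENT = 1.2  # mortality limit stated in the abstract


def label_mortality(rate_percent):
    """Label a monthly mortality rate as 'Normal' (<= 1.2%) or 'High' (> 1.2%)."""
    return "Normal" if rate_percent <= THRESHOLD_PERCENT else "High"


def hit_rates(actual, predicted):
    """Return (per-class hit rate, overall accuracy) for two label sequences."""
    totals, hits = {}, {}
    for truth, pred in zip(actual, predicted):
        totals[truth] = totals.get(truth, 0) + 1
        if truth == pred:
            hits[truth] = hits.get(truth, 0) + 1
    # Per-class hit rate: correct predictions divided by records of that class.
    per_class = {label: hits.get(label, 0) / n for label, n in totals.items()}
    overall = sum(hits.values()) / len(actual)
    return per_class, overall


# Made-up monthly mortality rates (%) and made-up model predictions.
rates = [0.8, 1.2, 1.3, 2.5]
actual = [label_mortality(r) for r in rates]
predicted = ["Normal", "High", "High", "High"]
print(hit_rates(actual, predicted))
```

Reporting the per-class rates next to the overall accuracy shows whether a model favors one class, which a single accuracy figure can hide.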
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)