782 results for Spatial Data mining
Abstract:
Wikipedia is a free, web-based, collaborative, multilingual encyclopedia project supported by the non-profit Wikimedia Foundation. Because Wikipedia is free and open for anyone to edit, the quality of its articles can suffer. Not all contributors have the same level of knowledge, and different people hold different opinions about a topic, so contributions made by different authors can vary considerably. It is therefore important to classify the articles so that good-quality articles can be separated from poor-quality ones, which can then be removed from the database. The aim of this study is to classify Wikipedia articles into two classes, class 0 (poor quality) and class 1 (good quality), using the Adaptive Neuro Fuzzy Inference System (ANFIS) and data mining techniques. Two ANFIS models are built using the Fuzzy Logic Toolbox [1] available in Matlab. The first ANFIS is based on the rules obtained from the J48 classifier in WEKA, while the other was built using expert knowledge. The data used for this research contains records of 226 articles taken from the German version of Wikipedia. The dataset consists of 19 inputs and one output. The data was preprocessed to remove similar (redundant) attributes. The input variables relate to the editors, contributors, article length, and the lifecycle of the articles. Finally, the different methods implemented in this research are compared to assess the performance of each classification method used.
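A minimal illustrative sketch of the rule-extraction step described above, using scikit-learn's CART decision tree as a stand-in for WEKA's J48 (C4.5); the feature names and synthetic labels are invented and are not the 19 inputs used in the thesis.

```python
# Illustrative sketch only: scikit-learn's CART tree stands in for WEKA's J48 (C4.5),
# and the feature names are invented, not the 19 inputs used in the thesis.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 226  # same order of magnitude as the Wikipedia sample described above

# Hypothetical editor/length features for each article.
X = np.column_stack([
    rng.integers(1, 200, n),      # number of distinct editors
    rng.integers(100, 20000, n),  # article length in characters
    rng.integers(0, 500, n),      # number of revisions
])
# Hypothetical quality label: class 1 (good) vs class 0 (poor).
y = (X[:, 0] > 50).astype(int) & (X[:, 1] > 5000).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# The printed if/then rules are the kind of output that could seed
# an ANFIS rule base, as the abstract describes for the J48-derived model.
print(export_text(tree, feature_names=["editors", "length", "revisions"]))
```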
Abstract:
Parkinson's disease (PD) is the second most common neurodegenerative disorder (after Alzheimer's disease) and directly affects up to 5 million people worldwide. The stages (Hoehn and Yahr) of the disease have been predicted by many methods, which helps doctors adjust the dosage accordingly. These methods were developed on a data set covering about seventy patients at nine clinics in Sweden. The purpose of this work is to compare an unsupervised technique with supervised neural network techniques in order to verify that the collected data sets are reliable for decision making. The available data were preprocessed before the features were calculated. A complex but efficient feature, wavelets, was calculated to present the data set to the network. The dimension of the final feature set was reduced using principal component analysis. For unsupervised learning, k-means gives the closest result, around 76%, when compared with the supervised techniques. Backpropagation and J4 were used as supervised models to classify the stages of Parkinson's disease, where backpropagation gives a variance percentage of 76-82%. The results of both models were analyzed, showing that the collected data are reliable for predicting the disease stages of Parkinson's disease.
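A minimal sketch of the PCA-plus-k-means portion of the pipeline described above, on synthetic data; the feature matrix and the number of clusters are assumptions, not the wavelet features used in the study.

```python
# Minimal sketch of the pipeline described above: PCA for dimensionality
# reduction followed by k-means clustering. The data are synthetic; the
# real study used wavelet features from about seventy patients.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
features = rng.normal(size=(70, 32))   # stand-in for a wavelet feature matrix

# Keep enough components to explain 95% of the variance.
reduced = PCA(n_components=0.95).fit_transform(features)

# Cluster patients; 5 clusters stands in for Hoehn and Yahr stages 1-5.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(reduced)
print(reduced.shape, np.bincount(labels))
```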
Predictive models for chronic renal disease using decision trees, Naïve Bayes and case-based methods
Abstract:
Data mining can be used in the healthcare industry to "mine" clinical data and discover hidden information for intelligent and effective decision making. The discovery of hidden patterns and relationships often goes unexploited, yet advanced data mining techniques can help remedy this scenario. This thesis mainly deals with the Intelligent Prediction of Chronic Renal Disease (IPCRD). The data cover blood tests, urine tests, and external symptoms used to predict chronic renal disease. Data from the database are first imported into Weka (3.6), and the Chi-Square method is used for feature selection. After normalizing the data, three classifiers were applied and the efficiency of the output was evaluated. The three classifiers analyzed are the Decision Tree, Naïve Bayes, and the K-Nearest Neighbour algorithm. Results show that each technique has its unique strength in realizing the objectives of the defined mining goals. The efficiency of the Decision Tree and KNN was almost the same, but Naïve Bayes showed a comparative edge over the others. Sensitivity and specificity tests are further used as statistical measures to examine the performance of the binary classification. Sensitivity (also called recall rate in some fields) measures the proportion of actual positives that are correctly identified, while specificity measures the proportion of negatives that are correctly identified. The CRISP-DM methodology is applied to build the mining models. It consists of six major phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
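The sensitivity and specificity measures defined above can be computed directly from a confusion matrix; the sketch below uses hypothetical predictions, not the thesis's data.

```python
# Sketch of the sensitivity/specificity evaluation described above,
# using scikit-learn's confusion matrix on hypothetical binary predictions.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]   # 1 = chronic renal disease present
y_pred = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]   # classifier output

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # recall: proportion of actual positives found
specificity = tn / (tn + fp)   # proportion of actual negatives found
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```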
Abstract:
World Heritage sites provide a glimpse into the stories and civilizations of the past. There are currently 1007 unique World Heritage properties, with 779 classified as cultural sites, 197 as natural sites, and 31 falling into both the cultural and natural categories (UNESCO & World Heritage Centre, 1992-2015). However, of these 1007 World Heritage sites, at least 46 are categorized as in danger, and this number continues to grow. These unique and irreplaceable sites are exceptional because of their universality. Consequently, since World Heritage sites belong to all the people of the world and provide inspiration and admiration to all who visit them, it is our responsibility to help preserve these sites. The key form of preservation involves the individual monitoring of each site over time. While traditional methods are still extremely valuable, more recent advances in the field of geographic and spatial technologies, including geographic information systems (GIS), laser scanning, and remote sensing, are becoming more beneficial for the monitoring and overall safeguarding of World Heritage sites. Through the employment and analysis of more accurately detailed spatial data, World Heritage sites can be better managed. There is a strong urgency to protect these sites. The purpose of this thesis is to describe the importance of taking care of World Heritage sites and to depict a way in which spatial technologies can be used to monitor and, in effect, preserve World Heritage sites through the utilization of remote sensing imagery. The research conducted in this thesis centers on the Everglades National Park, a World Heritage site that is continually affected by changes in vegetation. Data used include Landsat satellite imagery dating from 2001-2003, the Everglades' boundaries shapefile, and Google Earth imagery. In order to conduct the in-depth analysis of vegetation change within the selected World Heritage site, three main techniques were performed to study changes found within the imagery. These techniques consist of conducting a supervised classification for each image, incorporating a vegetation index known as the Normalized Difference Vegetation Index (NDVI), and utilizing the change detection tool available in the Environment for Visualizing Images (ENVI) software. The research and analysis conducted throughout this thesis show that within the three-year time span (2001-2003) there was an overall increase in both areas of barren soil (5.760%) and areas of vegetation (1.263%), with a decrease in the percentage of areas classified as sparsely vegetated (-6.987%). These results were gathered through the use of the maximum likelihood classification process available in the ENVI software. The results produced by the change detection tool, which further analyzed vegetation change, correlate with the results produced by the classification method. In addition, by utilizing the NDVI method, one is able to locate changes by selecting a specific area and comparing the vegetation index generated for each date. It has been found that through the utilization of remote sensing technology it is possible to monitor and observe changes featured within a World Heritage site. Remote sensing is an extraordinary tool that can and should be used by all site managers and organizations whose goal is to preserve and protect World Heritage sites. Remote sensing can be used not only to observe changes over time, but also to pinpoint threats within a World Heritage site.
World Heritage sites are irreplaceable sources of beauty, culture, and inspiration. It is our responsibility, as citizens of this world, to guard these treasures.
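The NDVI used in the thesis is the standard band ratio (NIR - Red) / (NIR + Red); the sketch below computes it on synthetic reflectance arrays rather than the actual Landsat scenes.

```python
# Minimal NDVI sketch: NDVI = (NIR - Red) / (NIR + Red), computed per pixel.
# Synthetic reflectance arrays stand in for the Landsat bands used in the thesis
# (for Landsat 5/7 imagery, Red is band 3 and NIR is band 4).
import numpy as np

rng = np.random.default_rng(1)
red = rng.uniform(0.02, 0.30, size=(100, 100))   # red reflectance
nir = rng.uniform(0.05, 0.60, size=(100, 100))   # near-infrared reflectance

ndvi = (nir - red) / (nir + red)

# Dense vegetation tends toward +1, bare soil stays near 0, water falls below 0;
# differencing NDVI between two dates highlights vegetation change.
print(ndvi.min(), ndvi.mean(), ndvi.max())
```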
Abstract:
This article highlights the potential benefits of the Kohonen method for classifying rivers with similar characteristics when determining regional ecological flows using the ELOHA (Ecological Limits of Hydrologic Alteration) methodology. Many methodologies currently exist for classifying rivers; however, none of them offers the features of the Kohonen method, namely that (i) it provides the number of groups that actually underlie the information presented, (ii) it can be used for variable importance analysis, (iii) it can display the classification process in two dimensions, and (iv) the clustering structure remains stable regardless of the parameters used in the model. In order to evaluate the potential benefits of the Kohonen method, 174 flow stations distributed along the great "Magdalena-Cauca" river basin (Colombia) were analyzed, and 73 variables were obtained for the classification process in each case. Six trials were run using different combinations of variables, and the results were validated against a reference classification obtained by Ingfocol in 2010, whose results were also framed using ELOHA guidelines. In the validation process it was found that two of the tested models reproduced more than 80% of the reference classification in the first trial, meaning that more than 80% of the flow stations analyzed in both models formed invariant groups of streams.
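A highly simplified Kohonen self-organizing map, written from scratch in NumPy on synthetic data, to illustrate the clustering idea discussed above; the grid size, learning schedule, and data are assumptions, not the configuration used in the study.

```python
# Highly simplified Kohonen self-organizing map (SOM) sketch in plain NumPy.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(174, 73))          # 174 stations x 73 variables (synthetic)
data = (data - data.mean(0)) / data.std(0) # standardize each variable

rows, cols, dim = 6, 6, data.shape[1]      # 6x6 output grid (an assumption)
weights = rng.normal(size=(rows, cols, dim))
grid = np.dstack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"))

for epoch in range(2000):
    lr = 0.5 * np.exp(-epoch / 1000)        # decaying learning rate
    sigma = 3.0 * np.exp(-epoch / 1000)     # decaying neighbourhood radius
    x = data[rng.integers(len(data))]
    # Best-matching unit: the node whose weight vector is closest to x.
    dists = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(dists.argmin(), dists.shape)
    # Pull the BMU and its grid neighbours toward x.
    influence = np.exp(-np.sum((grid - np.array(bmu)) ** 2, axis=2) / (2 * sigma**2))
    weights += lr * influence[..., None] * (x - weights)

# Each station is assigned to the node it maps to; occupied nodes act as river groups.
bmus = [np.unravel_index(np.linalg.norm(weights - s, axis=2).argmin(), (rows, cols))
        for s in data]
print(len(set(bmus)), "occupied nodes")
```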
Abstract:
An underwater gas pipeline is the portion of a pipeline that crosses a river beneath the riverbed. Underwater gas pipelines are subject to increasing dangers as time goes by. An accident at an underwater gas pipeline can lead to a technological and environmental disaster on the scale of an entire region. Therefore, timely troubleshooting of all underwater gas pipelines in order to prevent any potential accidents remains a pressing task for the industry. The most important aspect of resolving this challenge is the quality of the automated system in question. At present, the industry does not have any automated system that fully meets the needs of the experts working in the field maintaining underwater gas pipelines. Principal aim of this research: this work aims to develop a new automated monitoring system that would simplify the process of evaluating the technical condition and of making decisions on planning preventive maintenance and repair work on underwater gas pipelines. Objectives: creation of a shared model for the new automated system via IDEF3; development of a new database system to store all information about underwater gas pipelines; development of a new application that works with database servers and provides an explanation of the results obtained from the server; calculation of MTBF values for specified pipelines based on quantitative data obtained from tests of this system. Conclusion: the new automated system PodvodGazExpert has been developed for the timely and qualitative determination of the physical condition of underwater gas pipelines; the mathematical analysis of this new automated system is based on the principal component analysis method; determining the physical condition of an underwater gas pipeline with this new automated system increases the MTBF by a factor of 8.18 over the existing system used in the industry today.
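MTBF (mean time between failures) is the cumulative operating time divided by the number of failures; the sketch below uses invented figures, since the abstract reports only the 8.18 improvement factor.

```python
# Back-of-the-envelope MTBF sketch with invented numbers; the abstract reports
# an 8.18x MTBF improvement but not the underlying operating data.
def mtbf(total_operating_hours: float, failures: int) -> float:
    """MTBF = cumulative operating time divided by the number of failures."""
    return total_operating_hours / failures

baseline = mtbf(total_operating_hours=87_600, failures=10)   # hypothetical legacy figures
improved = baseline * 8.18                                    # factor reported above
print(f"baseline MTBF: {baseline:.0f} h, with the new system: {improved:.0f} h")
```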
Abstract:
Increasingly, time is what sets one company apart from another. To succeed, companies need the right information, at the right moment, for the right people. Data once considered essential for the survival of companies now need to be turned into information in order to be used. That is the role of "Business Intelligence" tools, whose purpose is to model data into information in a way that differentiates a company's actions and allows it to outperform its competitors. "Business Intelligence" is a process of collecting, analyzing, and distributing data to improve business decisions, bringing information to a much larger number of users within the corporation. Several types of tools are intended for this purpose. The goal of this work is to compare such tools through the study of dimensional modeling techniques, which are fundamental in informational structure projects and in the support of "Data Warehouses", "Data Marts", "Data Mining", and others, as well as the market, their advantages and disadvantages, and the technological architecture used by these products. Accordingly, the "Business Intelligence" tool suites from Microsoft Corporation and Oracle Corporation were selected, given their significance in the computing world.
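As a minimal illustration of the dimensional (star-schema) modeling the work examines, the sketch below joins an invented fact table to two dimension tables with pandas; it does not represent either vendor's tooling.

```python
# Tiny illustration of dimensional (star-schema) modeling: a fact table joined
# to dimension tables and aggregated. Table and column names are invented.
import pandas as pd

dim_product = pd.DataFrame({"product_id": [1, 2], "category": ["Books", "Games"]})
dim_date = pd.DataFrame({"date_id": [10, 11], "year": [2023, 2024]})
fact_sales = pd.DataFrame({
    "product_id": [1, 1, 2, 2],
    "date_id":    [10, 11, 10, 11],
    "revenue":    [120.0, 90.0, 200.0, 310.0],
})

# A typical BI query: revenue by category and year.
report = (fact_sales
          .merge(dim_product, on="product_id")
          .merge(dim_date, on="date_id")
          .groupby(["category", "year"], as_index=False)["revenue"].sum())
print(report)
```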
Abstract:
Two experiments and a sampling survey were analyzed in the context of spatial data. The experiments were laid out in randomized complete blocks: in experiment one (EXP 1), eight white clover cultivars were evaluated and the variables Total Dry Matter (MST) and Grass Dry Matter (MSGRAM) were studied; in experiment two (EXP 2), 20 cultivars of forage species were evaluated and the variable Establishment Percentage (%IMPL) was studied. The variables were analyzed within a mixed-model framework, with spatial variability modeled through exponential, spherical, and Gaussian semivariograms. On average, reductions of 19% and 14% in the Coefficient of Variation (CV) of the cultivar means were observed, along with average reductions of 24.6% and 33.3% in the standard errors of the proposed orthogonal contrasts for MST and MSGRAM, respectively. In the sampling survey, spatial association was studied in Aristida laevis (Nees) Kunth, Paspalum notatum Fl and Desmodium incanum DC, sampled along a fixed transect of contiguous quadrats at four sampling-unit sizes (0.1x0.1 m; 0.1x0.3 m; 0.1x0.5 m; and 0.1x1.0 m). For Aristida laevis (Nees) Kunth and Paspalum notatum Fl, the semivariograms fitted well at the smaller sampling-unit sizes, with the fit deteriorating as the sampling unit grew. Desmodium incanum DC showed the opposite behavior, with the semivariograms fitting better at the larger sampling-unit sizes.
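For reference, one common textbook parameterization of the three semivariogram models mentioned above (nugget c0, partial sill c, range a) is sketched below; conventions for the practical range vary between texts and software.

```python
# Exponential, spherical, and Gaussian semivariogram models in one common
# textbook parameterization (nugget c0, partial sill c, range a).
import numpy as np

def spherical(h, c0, c, a):
    h = np.asarray(h, dtype=float)
    gamma = c0 + c * (1.5 * h / a - 0.5 * (h / a) ** 3)
    return np.where(h <= a, gamma, c0 + c)   # flat at the sill beyond the range

def exponential(h, c0, c, a):
    return c0 + c * (1.0 - np.exp(-np.asarray(h, dtype=float) / a))

def gaussian(h, c0, c, a):
    return c0 + c * (1.0 - np.exp(-(np.asarray(h, dtype=float) / a) ** 2))

lags = np.linspace(0, 30, 7)
print(spherical(lags, c0=0.1, c=0.9, a=15))
```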
Abstract:
In this thesis, the basic research of Chase and Simon (1973) is questioned, and we seek new results by analyzing the errors made by expert and beginner chess players in experiments in which they reproduce chess positions. Chess players with different levels of expertise participated in the study. The results were analyzed by a Brazilian grandmaster, and quantitative analysis was performed using statistical and data mining methods. The results significantly challenge the current theories of expertise, memory, and decision making in this area, because the present theory predicts piece-on-square encoding, in which players recognize the strategic situation and reproduce it faithfully, yet the players commit several errors that the theory cannot explain. The current theory cannot fully explain the encoding players use to register a board. The errors of intermediate players preserved fragments of the strategic situation, even though they committed a series of errors in the reconstruction of the positions. The encoding of chunks therefore includes more information than that predicted by current theories. Currently, research on perception, judgment, and decision making is heavily concentrated on the idea of "pattern recognition". Based on the results of this research, we explore a change of perspective. The idea of "pattern recognition" presupposes that the processing of relevant information operates on "patterns" (or data) that exist independently of any interpretation. We propose instead a view of decision making via the recognition of experience.
Abstract:
The aim of this work is to test the application of a probabilistic graphical model, generically known as Bayesian Networks, to develop computational models that can be used to support the understanding of problems and/or the prediction of variables of an economic nature. For this purpose, a problem widely discussed in the literature was chosen, and the already consolidated theoretical and experimental results were compared with those obtained using the proposed technique. To that end, a model was built to classify the trend of Brazil's "country risk" from a database of macroeconomic and financial variables. The EMBI+ (Emerging Markets Bond Index Plus) was adopted as the risk measure, as it is an indicator widely used by the market.
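A greatly simplified, counting-based stand-in for the Bayesian network classifier described above: conditional probabilities of the EMBI+ trend are estimated over discretized macro variables. The variable names and data are invented.

```python
# Greatly simplified stand-in for a Bayesian-network classifier: conditional
# probabilities of the EMBI+ trend are estimated by counting over discretized
# macro variables. Data and variable names are invented.
import pandas as pd

data = pd.DataFrame({
    "exchange_rate": ["up", "up", "down", "down", "up", "down", "up", "down"],
    "interest_rate": ["high", "high", "low", "low", "low", "high", "high", "low"],
    "embi_trend":    ["up", "up", "down", "down", "up", "up", "up", "down"],
})

# P(embi_trend | exchange_rate, interest_rate), estimated from the sample.
cpt = (data.groupby(["exchange_rate", "interest_rate"])["embi_trend"]
           .value_counts(normalize=True)
           .rename("probability"))
print(cpt)

# "Inference": look up the most probable trend for observed evidence.
print(cpt.loc[("up", "high")].idxmax())
```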
Abstract:
Over the last 30 years, the study of the cluster phenomenon, under countless labels, has attracted growing attention both from academia and from those responsible for developing public policy. Its development cycle, however, remains a little-explored aspect. The need to investigate and understand the development cycle of clusters is highlighted by several authors. From a public policy standpoint, the importance of identifying the evolution cycle of clusters, as well as each of its stages, lies in the assertiveness it lends to policy actions. The thesis is presented as three scientific articles. The first seeks to determine the nature and dimensions of the notion of cluster by exploring the theoretical-conceptual foundations of the three fields of knowledge in which the concept is most commonly employed: economic geography, strategic management, and operations management. Among other findings, the results provide a classification of the research themes around the concept, as well as of the theoretical-conceptual lines that support them. The second deals with the question of how clusters are identified. Starting from some limitations of regional-economics statistics, in particular the Locational Quotient (LQ), the author proposes an approach that jointly applies these statistics with those offered by Exploratory Spatial Data Analysis (ESDA). The results show the complementarity of the approaches, since part of the limitations of the former can be resolved by the latter, but, more importantly, they reveal the value provided by the ESDA statistics regarding regional dynamics. Finally, the third and last article proposes an instrument for assessing the development cycle of clusters, involving 6 dimensions and 11 assessment elements. Based on the potential clusters identified in the second article, the instrument is applied to three of them. The result is a simple, conceptually coherent, practical, and low-cost approach.
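The Locational Quotient mentioned above compares an industry's share of regional employment with that industry's share of national employment; the sketch below computes it on invented figures.

```python
# Locational Quotient (LQ) sketch: the share of an industry in a region's
# employment divided by that industry's share of national employment.
# LQ > 1 suggests regional specialization, one signal of a potential cluster.
# The employment figures below are invented.
import pandas as pd

emp = pd.DataFrame({
    "region":   ["A", "A", "B", "B"],
    "industry": ["footwear", "other", "footwear", "other"],
    "jobs":     [4_000, 16_000, 1_000, 79_000],
})

region_total = emp.groupby("region")["jobs"].transform("sum")
industry_share_region = emp["jobs"] / region_total          # industry share within region

national_industry = emp.groupby("industry")["jobs"].transform("sum")
industry_share_national = national_industry / emp["jobs"].sum()  # industry share nationally

emp["LQ"] = industry_share_region / industry_share_national
print(emp)
```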