782 resultados para Spatial Data mining
Resumo:
The domain of Knowledge Discovery (KD) and Data Mining (DM) is of growing importance in a time where more and more data is produced and knowledge is one of the most precious assets. Having explored both the existing underlying theory, the results of the ongoing research in academia and the industry practices in the domain of KD and DM, we have found that this is a domain that still lacks some systematization. We also found that this systematization exists to a greater degree in the Software Engineering and Requirements Engineering domains, probably due to being more mature areas. We believe that it is possible to improve and facilitate the participation of enterprise stakeholders in the requirements engineering for KD projects by systematizing requirements engineering process for such projects. This will, in turn, result in more projects that end successfully, that is, with satisfied stakeholders, including in terms of time and budget constraints. With this in mind and based on all information found in the state-of-the art, we propose SysPRE - Systematized Process for Requirements Engineering in KD projects. We begin by proposing an encompassing generic description of the KD process, where the main focus is on the Requirements Engineering activities. This description is then used as a base for the application of the Design and Engineering Methodology for Organizations (DEMO) so that we can specify a formal ontology for this process. The resulting SysPRE ontology can serve as a base that can be used not only to make enterprises become aware of their own KD process and requirements engineering process in the KD projects, but also to improve such processes in reality, namely in terms of success rate.
Resumo:
In recent years, the DFA introduced by Peng, was established as an important tool capable of detecting long-range autocorrelation in time series with non-stationary. This technique has been successfully applied to various areas such as: Econophysics, Biophysics, Medicine, Physics and Climatology. In this study, we used the DFA technique to obtain the Hurst exponent (H) of the profile of electric density profile (RHOB) of 53 wells resulting from the Field School of Namorados. In this work we want to know if we can or not use H to spatially characterize the spatial data field. Two cases arise: In the first a set of H reflects the local geology, with wells that are geographically closer showing similar H, and then one can use H in geostatistical procedures. In the second case each well has its proper H and the information of the well are uncorrelated, the profiles show only random fluctuations in H that do not show any spatial structure. Cluster analysis is a method widely used in carrying out statistical analysis. In this work we use the non-hierarchy method of k-means. In order to verify whether a set of data generated by the k-means method shows spatial patterns, we create the parameter Ω (index of neighborhood). High Ω shows more aggregated data, low Ω indicates dispersed or data without spatial correlation. With help of this index and the method of Monte Carlo. Using Ω index we verify that random cluster data shows a distribution of Ω that is lower than actual cluster Ω. Thus we conclude that the data of H obtained in 53 wells are grouped and can be used to characterize space patterns. The analysis of curves level confirmed the results of the k-means
Resumo:
The study of complex systems has become a prestigious area of science, although relatively young . Its importance was demonstrated by the diversity of applications that several studies have already provided to various fields such as biology , economics and Climatology . In physics , the approach of complex systems is creating paradigms that influence markedly the new methods , bringing to Statistical Physics problems macroscopic level no longer restricted to classical studies such as those of thermodynamics . The present work aims to make a comparison and verification of statistical data on clusters of profiles Sonic ( DT ) , Gamma Ray ( GR ) , induction ( ILD ) , neutron ( NPHI ) and density ( RHOB ) to be physical measured quantities during exploratory drilling of fundamental importance to locate , identify and characterize oil reservoirs . Software were used : Statistica , Matlab R2006a , Origin 6.1 and Fortran for comparison and verification of the data profiles of oil wells ceded the field Namorado School by ANP ( National Petroleum Agency ) . It was possible to demonstrate the importance of the DFA method and that it proved quite satisfactory in that work, coming to the conclusion that the data H ( Hurst exponent ) produce spatial data with greater congestion . Therefore , we find that it is possible to find spatial pattern using the Hurst coefficient . The profiles of 56 wells have confirmed the existence of spatial patterns of Hurst exponents , ie parameter B. The profile does not directly assessed catalogs verification of geological lithology , but reveals a non-random spatial distribution
Resumo:
O objetivo deste trabalho foi avaliar cenários de níveis freáticos extremos, em bacia hidrográfica, por meio de métodos de análise espacial de dados geográficos. Avaliou-se a dinâmica espaço‑temporal dos recursos hídricos subterrâneos em área de afloramento do Sistema Aquífero Guarani. As alturas do lençol freático foram estimadas por meio do monitoramento de níveis em 23 piezômetros e da modelagem das séries temporais disponíveis de abril de 2004 a abril de 2011. Para a geração de cenários espaciais, foram utilizadas técnicas geoestatísticas que incorporaram informações auxiliares relativas a padrões geomorfológicos da bacia, por meio de modelo digital de terreno. Esse procedimento melhorou as estimativas, em razão da alta correlação entre altura do lençol e elevação, e agregou sentido físico às predições. Os cenários apresentaram diferenças quanto aos níveis considerados extremos - muito profundos ou muito superficiais - e podem subsidiar o planejamento, o uso eficiente da água e a gestão sustentável dos recursos hídricos na bacia.
Resumo:
This work demonstrates the importance of using tools used in geographic information systems (GIS) and spatial data analysis (SDA) for the study of infectious diseases. Analysis methods were used to describe more fully the spatial distribution of a particular disease by incorporating the geographical element in the analysis. In Chapter 1, we report the historical evolution of these techniques in the field of human health and use Hansen s disease (leprosy) in Rio Grande do Norte as an example. In Chapter 2, we introduced a few basic theoretical concepts on the methodology and classified the types of spatial data commonly treated. Chapters 3 and 4 defined and demonstrated the use of the two most important techniques for analysis of health data, which are data point processes and data area. We modelled the case distribution of Hansen s disease in the city of Mossoró - RN. In the analysis, we used R scripts and made available routines and analitical procedures developed by the author. This approach can be easily used by researchers in several areas. As practical results, major risk areas in Mossoró leprosy were detected, and its association with the socioeconomic profile of the population at risk was found. Moreover, it is clearly shown that his approach could be of great help to be used continuously in data analysis and processing, allowing the development of new strategies to work might increase the use of such techniques in data analysis in health care
Resumo:
Layer mortality due to heat stress is an important economic loss for the producer. The aim of this study was to determine the mortality pattern of layers reared in the region of Bastos, SP, Brazil, according to external environment and bird age. Data mining technique were used based on monthly mortality records of hens in production, 135 poultry houses, from January 2004 to August 2008. The external environment was characterized according maximum and minimum temperatures, obtained monthly at the meteorological station CATI in the city of Tupa, SP, Brazil. Mortality was classified as normal (<= 1.2%) or high (> 1.2%), considering the mortality limits mentioned in literature. Data mining technique produced a decision tree with nine levels and 23 leaves, with 62.6% of overall accuracy. The hit rate for the High class was 64.1% and 59.9% for Normal class. The decision tree allowed finding a pattern in the mortality data, generating a model for estimating mortality based on the thermal environment and bird age.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
This work aims to study the problem of the formal job in the Brazilian Northeast region and its effect in the social inclusion, taking for base the analysis of variables defined in the Atlas of Social Exclusion, which is based on the 2000 Brazilian Census, choosing the county as unit of analysis. As methodological options, an exploratory data analysis was performed, followed by multivariate statistical techniques, such as weighted multiple regression analysis, cluster analysis and exploratory analysis of spatial data. The results pointed out to low rates of formal job for the active age population as well as low indexes of social inclusion in the Northeast region of Brazil. A strong association of the formal job with the indicators of social inclusion under investigation, was evidenced (schooling, inequality, poverty, youth and income form government transfers), as well as a strong association of the formal job with the new index of social inclusion (IIS), modified from the IES. At the Federative Units, in which better levels of formal job had been found, good indexes of social inclusion are also observed. Highlights for the state of the Rio Grande do Norte, with the best conditions of life, and for the states of the Maranhão and Piauí, with the worst conditions. The situation of the Northeast region, facing the indicators under study, is very precarious, claiming for the necessity of emphasizing programs and governmental actions, specially directed to the raise of formal job levels of the region, reflecting, thus, in improvements on the income inequality, as well as in the social inclusion of the population of Northeastern natives.
Resumo:
The relevance of rising healthcare costs is a main topic in complementary health companies in Brazil. In 2011, these expenses consumed more than 80% of the monthly health insurance in Brazil. Considering the administrative costs, it is observed that the companies operating in this market work, on average, at the threshold between profit and loss. This paper presents results after an investigation of the welfare costs of a health plan company in Brazil. It was based on the KDD process and explorative Data Mining. A diversity of results is presented, such as data summarization, providing compact descriptions of the data, revealing common features and intrinsic observations. Among the key findings was observed that a small portion of the population is responsible for the most demanding of resources devoted to health care
Resumo:
Currently, one of the biggest challenges for the field of data mining is to perform cluster analysis on complex data. Several techniques have been proposed but, in general, they can only achieve good results within specific areas providing no consensus of what would be the best way to group this kind of data. In general, these techniques fail due to non-realistic assumptions about the true probability distribution of the data. Based on this, this thesis proposes a new measure based on Cross Information Potential that uses representative points of the dataset and statistics extracted directly from data to measure the interaction between groups. The proposed approach allows us to use all advantages of this information-theoretic descriptor and solves the limitations imposed on it by its own nature. From this, two cost functions and three algorithms have been proposed to perform cluster analysis. As the use of Information Theory captures the relationship between different patterns, regardless of assumptions about the nature of this relationship, the proposed approach was able to achieve a better performance than the main algorithms in literature. These results apply to the context of synthetic data designed to test the algorithms in specific situations and to real data extracted from problems of different fields
Resumo:
The use of Geographic Information Systems (GIS) has becoming very important in fields where detailed and precise study of earth surface features is required. Applications in environmental protection are such an example that requires the use of GIS tools for analysis and decision by managers and enrolled community of protected areas. In this specific field, a challenge that remains is to build a GIS that can be dynamically fed with data, allowing researchers and other agents to recover actual and up to date information. In some cases, data is acquired in several ways and come from different sources. To solve this problem, some tools were implemented that includes a model for spatial data treatment on the Web. The research issues involved start with the feeding and processing of environmental control data collected in-loco as biotic and geological variables and finishes with the presentation of all information on theWeb. For this dynamic processing, it was developed some tools that make MapServer more flexible and dynamic, allowing data uploading by the proper users. Furthermore, it was also developed a module that uses interpolation to aiming spatial data analysis. A complex application that has validated this research is to feed the system with data coming from coral reef regions located in northeast of Brazil. The system was implemented using the best interactivity concept provided by the AJAX model and resulted in a substantial contribution for efficiently accessing information, being an essential mechanism for controlling events in the environmental monitoring
Resumo:
The opening of the Brazilian market of electricity and competitiveness between companies in the energy sector make the search for useful information and tools that will assist in decision making activities, increase by the concessionaires. An important source of knowledge for these utilities is the time series of energy demand. The identification of behavior patterns and description of events become important for the planning execution, seeking improvements in service quality and financial benefits. This dissertation presents a methodology based on mining and representation tools of time series, in order to extract knowledge that relate series of electricity demand in various substations connected of a electric utility. The method exploits the relationship of duration, coincidence and partial order of events in multi-dimensionals time series. To represent the knowledge is used the language proposed by Mörchen (2005) called Time Series Knowledge Representation (TSKR). We conducted a case study using time series of energy demand of 8 substations interconnected by a ring system, which feeds the metropolitan area of Goiânia-GO, provided by CELG (Companhia Energética de Goiás), responsible for the service of power distribution in the state of Goiás (Brazil). Using the proposed methodology were extracted three levels of knowledge that describe the behavior of the system studied, representing clearly the system dynamics, becoming a tool to assist planning activities
Resumo:
Self-organizing maps (SOM) are artificial neural networks widely used in the data mining field, mainly because they constitute a dimensionality reduction technique given the fixed grid of neurons associated with the network. In order to properly the partition and visualize the SOM network, the various methods available in the literature must be applied in a post-processing stage, that consists of inferring, through its neurons, relevant characteristics of the data set. In general, such processing applied to the network neurons, instead of the entire database, reduces the computational costs due to vector quantization. This work proposes a post-processing of the SOM neurons in the input and output spaces, combining visualization techniques with algorithms based on gravitational forces and the search for the shortest path with the greatest reward. Such methods take into account the connection strength between neighbouring neurons and characteristics of pattern density and distances among neurons, both associated with the position that the neurons occupy in the data space after training the network. Thus, the goal consists of defining more clearly the arrangement of the clusters present in the data. Experiments were carried out so as to evaluate the proposed methods using various artificially generated data sets, as well as real world data sets. The results obtained were compared with those from a number of well-known methods existent in the literature