10 results for Spatial Data mining
at Universidade Federal do Rio Grande do Norte (UFRN)
Abstract:
In recent years, detrended fluctuation analysis (DFA), introduced by Peng, has become established as an important tool for detecting long-range autocorrelation in non-stationary time series. The technique has been applied successfully in areas such as econophysics, biophysics, medicine, physics and climatology. In this study, we use DFA to obtain the Hurst exponent (H) of the electric density profile (RHOB) of 53 wells from the Namorado School Field. We ask whether H can be used to characterize the field spatially. Two cases arise. In the first, the set of H values reflects the local geology: wells that are geographically closer show similar H, and H can then be used in geostatistical procedures. In the second, each well has its own H and the information from different wells is uncorrelated: the profiles show only random fluctuations in H, with no spatial structure. Cluster analysis is a widely used statistical method; in this work we use the non-hierarchical k-means method. To verify whether the clusters generated by k-means show spatial patterns, we define the parameter Ω (neighborhood index). High Ω indicates more aggregated data; low Ω indicates dispersed data or data without spatial correlation. Using this index together with the Monte Carlo method, we verify that randomly clustered data show a distribution of Ω that is lower than the Ω of the actual clusters. We therefore conclude that the H values obtained from the 53 wells are spatially grouped and can be used to characterize spatial patterns. Contour-line analysis confirmed the k-means results.
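The Monte Carlo check described in this abstract can be illustrated with a small sketch. The abstract does not give the formula for Ω, so the index below (the fraction of wells whose nearest neighbour belongs to the same cluster) is only an assumed stand-in, and the function names and toy coordinates are hypothetical. The idea is to compare the index of the observed clustering with its distribution under random reassignment of the cluster labels.

```python
import math
import random

def euclid(a, b):
    """Euclidean distance between two points given as coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neighborhood_index(coords, labels):
    """Assumed stand-in for Ω: share of points whose nearest neighbour
    carries the same cluster label (the abstract does not define Ω)."""
    n = len(coords)
    same = 0
    for i in range(n):
        nearest = min((j for j in range(n) if j != i),
                      key=lambda j: euclid(coords[i], coords[j]))
        same += labels[i] == labels[nearest]
    return same / n

def monte_carlo_p_value(coords, labels, n_shuffles=999, seed=0):
    """Compare the observed index with indices of randomly shuffled labels."""
    rng = random.Random(seed)
    observed = neighborhood_index(coords, labels)
    shuffled = list(labels)
    at_least = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if neighborhood_index(coords, shuffled) >= observed:
            at_least += 1
    # Share of random labelings that look at least as aggregated as the data.
    return observed, (at_least + 1) / (n_shuffles + 1)

# Toy example: two well-separated groups of "wells" with matching labels.
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
omega, p_value = monte_carlo_p_value(coords, labels)
print(omega, p_value)
```

A small p-value means that random labelings rarely reach the observed index, which is the sense in which the actual clusters are more aggregated than random ones.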
Abstract:
The study of complex systems has become a prestigious area of science, although it is relatively young. Its importance is shown by the diversity of applications that studies have already provided in fields such as biology, economics and climatology. In physics, the complex-systems approach is creating paradigms that markedly influence new methods, bringing to statistical physics macroscopic-level problems no longer restricted to classical studies such as those of thermodynamics. The present work compares and verifies statistical data on clusters of sonic (DT), gamma ray (GR), induction (ILD), neutron (NPHI) and density (RHOB) profiles, physical quantities measured during exploratory drilling that are of fundamental importance for locating, identifying and characterizing oil reservoirs. The software used was Statistica, Matlab R2006a, Origin 6.1 and Fortran, for comparison and verification of the oil-well profile data from the Namorado School Field, provided by the ANP (National Petroleum Agency). The DFA method proved important and quite satisfactory in this work, leading to the conclusion that the H (Hurst exponent) data produce spatial data with greater aggregation. We therefore find that spatial patterns can be identified using the Hurst coefficient. The profiles of 56 wells confirmed the existence of spatial patterns of Hurst exponents, i.e. parameter B. The profiles do not directly verify the cataloged geological lithology, but they reveal a non-random spatial distribution.
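As a rough illustration of the DFA method referred to in this abstract, the sketch below implements first-order DFA using only the Python standard library. It is a minimal version under stated assumptions, not the procedure used in the thesis: the window sizes, the synthetic test series and all function names are hypothetical. The slope of log F(n) against log n is the scaling exponent, which is commonly read as the Hurst exponent when it lies between 0 and 1.

```python
import math
import random

def linear_fit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def dfa_exponent(series, window_sizes):
    """First-order DFA: slope of log F(n) versus log n."""
    mean = sum(series) / len(series)
    # Step 1: integrate the mean-removed series to get the profile.
    profile, total = [], 0.0
    for value in series:
        total += value - mean
        profile.append(total)
    log_n, log_f = [], []
    for n in window_sizes:
        n_windows = len(profile) // n
        ts = list(range(n))
        squared = 0.0
        # Step 2: remove a linear trend in each non-overlapping window.
        for w in range(n_windows):
            segment = profile[w * n:(w + 1) * n]
            slope, intercept = linear_fit(ts, segment)
            squared += sum((y - (slope * t + intercept)) ** 2
                           for t, y in zip(ts, segment))
        # Step 3: root-mean-square fluctuation at this window size.
        fluctuation = math.sqrt(squared / (n_windows * n))
        log_n.append(math.log(n))
        log_f.append(math.log(fluctuation))
    exponent, _ = linear_fit(log_n, log_f)
    return exponent

# Synthetic check: uncorrelated noise versus its running sum (a random walk).
rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
walk, running = [], 0.0
for value in noise:
    running += value
    walk.append(running)
sizes = [8, 16, 32, 64, 128]
alpha_noise = dfa_exponent(noise, sizes)
alpha_walk = dfa_exponent(walk, sizes)
print(alpha_noise, alpha_walk)
```

Uncorrelated noise should give an exponent near 0.5 and a random walk a clearly larger one, so the two synthetic series serve as a sanity check before applying the function to a well log.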
Abstract:
This work demonstrates the importance of geographic information system (GIS) and spatial data analysis (SDA) tools for the study of infectious diseases. Analysis methods were used to describe the spatial distribution of a particular disease more fully by incorporating the geographical element into the analysis. In Chapter 1, we report the historical evolution of these techniques in the field of human health, using Hansen's disease (leprosy) in Rio Grande do Norte as an example. In Chapter 2, we introduce a few basic theoretical concepts of the methodology and classify the types of spatial data commonly treated. Chapters 3 and 4 define and demonstrate the two most important techniques for the analysis of health data: point process data and area data. We model the case distribution of Hansen's disease in the city of Mossoró - RN. The analysis uses R scripts, and the routines and analytical procedures developed by the author are made available. This approach can easily be used by researchers in several areas. As practical results, the major risk areas for leprosy in Mossoró were detected, and an association with the socioeconomic profile of the population at risk was found. Moreover, it is clearly shown that this approach could be of great help if used continuously in data analysis and processing, allowing the development of new strategies that might increase the use of such techniques in health care data analysis.
Abstract:
This work studies the problem of formal employment in the Brazilian Northeast region and its effect on social inclusion, based on the analysis of variables defined in the Atlas of Social Exclusion, which draws on the 2000 Brazilian Census, with the county as the unit of analysis. Methodologically, an exploratory data analysis was performed, followed by multivariate statistical techniques such as weighted multiple regression analysis, cluster analysis and exploratory spatial data analysis. The results pointed to low rates of formal employment among the working-age population, as well as low indexes of social inclusion in the Northeast region of Brazil. A strong association was found between formal employment and the social inclusion indicators under investigation (schooling, inequality, poverty, youth and income from government transfers), as well as between formal employment and the new index of social inclusion (IIS), modified from the IES. In the Federative Units where better levels of formal employment were found, good indexes of social inclusion are also observed. The state of Rio Grande do Norte stands out with the best living conditions, and the states of Maranhão and Piauí with the worst. The situation of the Northeast region with respect to the indicators under study is very precarious, which calls for emphasis on governmental programs and actions specifically directed at raising the region's formal employment levels, and thereby at reducing income inequality and improving the social inclusion of the Northeastern population.
Abstract:
Rising healthcare costs are a main concern for complementary health companies in Brazil. In 2011, these expenses consumed more than 80% of monthly health insurance payments in Brazil. Once administrative costs are considered, the companies operating in this market work, on average, at the threshold between profit and loss. This paper presents the results of an investigation into the healthcare assistance costs of a health plan company in Brazil, based on the KDD process and exploratory data mining. A variety of results is presented, such as data summarization, which provides compact descriptions of the data and reveals common features and intrinsic observations. A key finding is that a small portion of the population accounts for most of the demand on the resources devoted to health care.
Abstract:
Currently, one of the biggest challenges in the field of data mining is to perform cluster analysis on complex data. Several techniques have been proposed but, in general, they achieve good results only within specific areas, and there is no consensus on the best way to group this kind of data. In general, these techniques fail because of unrealistic assumptions about the true probability distribution of the data. This thesis therefore proposes a new measure based on the Cross Information Potential that uses representative points of the dataset and statistics extracted directly from the data to measure the interaction between groups. The proposed approach retains all the advantages of this information-theoretic descriptor while overcoming the limitations imposed on it by its own nature. From this, two cost functions and three algorithms are proposed to perform cluster analysis. Because Information Theory captures the relationship between different patterns regardless of assumptions about the nature of that relationship, the proposed approach achieved better performance than the main algorithms in the literature. These results hold both for synthetic data designed to test the algorithms in specific situations and for real data drawn from problems in different fields.
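For orientation, the plain Cross Information Potential between two groups of points can be estimated with Gaussian kernels, as in the sketch below. This is the standard pairwise estimator from information-theoretic learning and not the new measure proposed in the thesis, whose representative points and statistics are not described in the abstract. The kernel width `sigma`, the function name and the toy data are assumptions.

```python
import math

def cross_information_potential(xs, ys, sigma):
    """Pairwise Gaussian-kernel estimate of the Cross Information Potential.

    Each pair (x, y) contributes a Gaussian of variance 2 * sigma**2 evaluated
    at x - y, which is what results from convolving two kernels of width sigma.
    """
    dim = len(xs[0])
    variance = 2.0 * sigma ** 2
    norm = (2.0 * math.pi * variance) ** (-dim / 2.0)
    total = 0.0
    for x in xs:
        for y in ys:
            squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
            total += math.exp(-squared_distance / (2.0 * variance))
    return norm * total / (len(xs) * len(ys))

# Toy groups: one reference group, one nearby group, one distant group.
group_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
group_near = [(0.2, 0.2), (0.3, 0.2)]
group_far = [(5.0, 5.0), (5.1, 5.0)]
cip_near = cross_information_potential(group_a, group_near, sigma=0.5)
cip_far = cross_information_potential(group_a, group_far, sigma=0.5)
print(cip_near, cip_far)
```

Groups that overlap strongly give a large value and well-separated groups a value near zero, which is why the quantity can serve as a measure of interaction between clusters. The double loop costs one kernel evaluation per pair of points, which is one practical reason to work with a reduced set of representative points.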
Abstract:
The use of Geographic Information Systems (GIS) has become very important in fields that require detailed and precise study of earth surface features. Environmental protection is one such field, where GIS tools are needed for analysis and decision making by managers and the engaged community of protected areas. A remaining challenge in this field is to build a GIS that can be fed with data dynamically, allowing researchers and other agents to retrieve current, up-to-date information. In some cases, data are acquired in several ways and come from different sources. To solve this problem, tools were implemented that include a model for spatial data treatment on the Web. The research issues start with the feeding and processing of environmental control data collected in loco, such as biotic and geological variables, and finish with the presentation of all information on the Web. For this dynamic processing, tools were developed that make MapServer more flexible and dynamic, allowing users themselves to upload data. A module that uses interpolation for spatial data analysis was also developed. A complex application that validated this research feeds the system with data from coral reef regions in the northeast of Brazil. The system was implemented using the interactivity concepts provided by the AJAX model and resulted in a substantial contribution to efficient access to information, serving as an essential mechanism for controlling events in environmental monitoring.
Abstract:
The opening of the Brazilian electricity market and the competition among companies in the energy sector have increased the concessionaires' search for useful information and tools to assist decision making. An important source of knowledge for these utilities is the time series of energy demand. Identifying behavior patterns and describing events become important for planning, in pursuit of improved service quality and financial benefits. This dissertation presents a methodology based on time series mining and representation tools to extract knowledge relating the electricity demand series of several connected substations of an electric utility. The method exploits the relationships of duration, coincidence and partial order of events in multidimensional time series. The knowledge is represented in the language proposed by Mörchen (2005), called Time Series Knowledge Representation (TSKR). We conducted a case study using the energy demand time series of 8 substations interconnected by a ring system that feeds the metropolitan area of Goiânia-GO, provided by CELG (Companhia Energética de Goiás), which is responsible for power distribution in the state of Goiás (Brazil). The proposed methodology extracted three levels of knowledge that describe the behavior of the system studied, clearly representing the system dynamics and serving as a tool to assist planning activities.
Abstract:
Self-organizing maps (SOM) are artificial neural networks widely used in the data mining field, mainly because they constitute a dimensionality reduction technique given the fixed grid of neurons associated with the network. In order to properly partition and visualize the SOM network, the various methods available in the literature must be applied in a post-processing stage that consists of inferring, through its neurons, relevant characteristics of the data set. In general, applying such processing to the network neurons instead of the entire database reduces the computational cost, thanks to vector quantization. This work proposes a post-processing of the SOM neurons in the input and output spaces, combining visualization techniques with algorithms based on gravitational forces and on the search for the shortest path with the greatest reward. These methods take into account the connection strength between neighbouring neurons and characteristics of pattern density and distances among neurons, both associated with the position that the neurons occupy in the data space after the network is trained. The goal is thus to define more clearly the arrangement of the clusters present in the data. Experiments were carried out to evaluate the proposed methods using various artificially generated data sets, as well as real-world data sets. The results obtained were compared with those of a number of well-known methods in the literature.
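To make the setting concrete, the sketch below trains a very small SOM (a one-dimensional chain of neurons) and then computes the distances between neighbouring neurons, one of the quantities that post-processing methods of this kind rely on. It does not implement the gravitational or shortest-path algorithms proposed in the work; the chain topology, the learning-rate and radius schedules, the toy data and all names are assumptions chosen for brevity.

```python
import math
import random

def euclid(a, b):
    """Euclidean distance between two coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train_som_chain(data, n_neurons=6, n_steps=2000, seed=0):
    """Online SOM training on a 1-D chain of neurons (illustrative schedules)."""
    rng = random.Random(seed)
    weights = [list(rng.choice(data)) for _ in range(n_neurons)]
    for step in range(n_steps):
        frac = step / n_steps
        rate = 0.5 * (1.0 - frac) + 0.01    # learning rate decays over time
        radius = 2.0 * (1.0 - frac) + 0.5   # neighbourhood width decays too
        x = rng.choice(data)
        winner = min(range(n_neurons), key=lambda k: euclid(weights[k], x))
        for k in range(n_neurons):
            influence = math.exp(-((k - winner) ** 2) / (2.0 * radius ** 2))
            for d in range(len(x)):
                weights[k][d] += rate * influence * (x[d] - weights[k][d])
    return weights

def neighbour_distances(weights):
    """Distance in data space between each pair of adjacent chain neurons."""
    return [euclid(weights[k], weights[k + 1]) for k in range(len(weights) - 1)]

def quantization_error(data, weights):
    """Mean distance from each sample to its closest neuron."""
    return sum(min(euclid(w, x) for w in weights) for x in data) / len(data)

# Toy data: two compact blobs around (0, 0) and (5, 5).
rng = random.Random(1)
data = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(20)]
data += [(rng.gauss(5.0, 0.3), rng.gauss(5.0, 0.3)) for _ in range(20)]
weights = train_som_chain(data)
gaps = neighbour_distances(weights)
error = quantization_error(data, weights)
print(gaps, error)
```

Large gaps between adjacent neurons suggest a boundary between clusters, while small gaps suggest neurons covering the same dense region. Working on the handful of neurons rather than on every sample is the computational saving from vector quantization that the abstract mentions.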
Abstract:
The objective of this work is to identify, map and explain the evolution of land occupation and the environmental vulnerability of the areas of Canto do Amaro and Alto da Pedra, in the city of Mossoró-RN, based on multitemporal analysis of images from orbital remote sensors and on extensive field work integrated into a Geographic Information System (GIS). Spatial analysis techniques built into the GIS, combined with the interpretation and analysis of Remote Sensing (RS) products, made it possible to reach significant results toward the objectives of this work. The data set obtained from a wide variety of sources and stored in a digital environment supports the management of the information and constitutes the geographic database of this research. Prior knowledge of the spectral behavior of natural and artificial targets, together with the use of Digital Image Processing (DIP) algorithms, considerably facilitates interpretation and the search for new information at the spectral level. With these data as background, a varied thematic cartography was generated: maps of Geology, Geomorphological Units, Soils, Vegetation, and Land Use and Occupation. Crossing these maps in the GIS environment generated the maps of Natural and Environmental Vulnerability of the oil fields of Canto do Amaro and Alto da Pedra-RN, in an environment centered on the management of water and solid waste as well as on the analysis of spatial data, thus making possible a more complex analysis of the studied area.