924 results for Spatial data analysis
Abstract:
The first discussion of compositional data analysis is attributable to Karl Pearson, in 1897. However, notwithstanding recent developments on the algebraic structure of the simplex, and more than twenty years after Aitchison's idea of log-transformations of closed data, the scientific literature is again full of statistical treatments of this type of data using traditional methodologies. This is particularly true in environmental geochemistry where, besides the problem of closure, the spatial structure (dependence) of the data has to be considered. In this work we propose the use of log-contrast values, obtained by a simplicial principal component analysis, as indicators of given environmental conditions. The investigation of the log-contrast frequency distributions allows pointing out the statistical laws able to generate the values and to govern their variability. The changes, if compared, for example, with the mean values of the random variables assumed as models, or other reference parameters, allow defining monitors to be used to assess the extent of possible environmental contamination. A case study on running and ground waters from Chiavenna Valley (Northern Italy), using Na+, K+, Ca2+, Mg2+, HCO3-, SO4 2- and Cl- concentrations, will be illustrated.
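As a rough illustration of the idea above, the sketch below applies the centred log-ratio (clr) transform to closed data and takes the first principal component of the clr coordinates as a log-contrast. The sample values are randomly generated placeholders, not the Chiavenna Valley data, and the function names are assumptions made for this example.

```python
import numpy as np

def closure(x):
    """Rescale each row so its parts sum to 1 (closed data)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def clr(x):
    """Centred log-ratio transform: log of each part minus the row mean of logs."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)

# Hypothetical data: 20 samples, 7 strictly positive parts (e.g. ion concentrations)
rng = np.random.default_rng(0)
samples = rng.lognormal(mean=1.0, sigma=0.5, size=(20, 7))

coords = clr(samples)
centered = coords - coords.mean(axis=0)

# PCA on clr coordinates via SVD; each row of vt is a set of log-contrast coefficients
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
loadings = vt[0]              # coefficients of the first log-contrast
scores = centered @ loadings  # one log-contrast value per sample
```

The coefficients in `loadings` sum to zero, which is what makes the resulting score a log-contrast, and the clr coordinates are unchanged if every sample is multiplied by a constant.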
Abstract:
In an earlier investigation (Burger et al., 2000) five sediment cores near the Rodrigues Triple Junction in the Indian Ocean were studied applying classical statistical methods (fuzzy c-means clustering, linear mixing model, principal component analysis) for the extraction of endmembers and evaluating the spatial and temporal variation of geochemical signals. Three main factors of sedimentation were expected by the marine geologists: a volcano-genetic, a hydro-hydrothermal and an ultra-basic factor. The display of fuzzy membership values and/or factor scores versus depth provided consistent results for two factors only; the ultra-basic component could not be identified. The reason for this may be that only traditional statistical methods were applied, i.e. the untransformed components were used, with the cosine-theta coefficient as similarity measure. During the last decade considerable progress in compositional data analysis was made and many case studies were published using new tools for exploratory analysis of these data. Therefore it makes sense to check whether the application of suitable data transformations, reduction of the D-part simplex to two or three factors and visual interpretation of the factor scores would lead to a revision of earlier results and to answers to open questions. In this paper we follow the lines of a paper by R. Tolosana-Delgado et al. (2005), starting with a problem-oriented interpretation of the biplot scattergram, extracting compositional factors, ilr-transformation of the components and visualization of the factor scores in a spatial context: the compositional factors will be plotted versus depth (time) of the core samples in order to facilitate the identification of the expected sources of the sedimentary process. Key words: compositional data analysis, biplot, deep sea sediments
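The ilr-transformation mentioned above can be sketched as follows. The abstract does not say which orthonormal basis is used, so this example assumes the common "pivot coordinates" basis; the input composition is a made-up three-part example.

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform of a single composition."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def ilr_pivot(x):
    """Isometric log-ratio transform using pivot coordinates (assumed basis).

    Coordinate i compares part i with the geometric mean of the parts after it.
    A D-part composition yields D-1 coordinates.
    """
    x = np.asarray(x, dtype=float)
    logx = np.log(x)
    d = x.shape[-1]
    z = np.empty(x.shape[:-1] + (d - 1,))
    for i in range(d - 1):
        r = d - i - 1  # number of parts remaining after part i
        z[..., i] = np.sqrt(r / (r + 1)) * (logx[..., i] - logx[..., i + 1:].mean(axis=-1))
    return z

# Hypothetical 3-part composition
x = np.array([1.0, 2.0, 4.0])
z = ilr_pivot(x)
```

Because the transform is an isometry, the Euclidean norm of `z` equals the norm of the clr coordinates of `x`, so distances between samples are preserved when factor scores are plotted against depth.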
Abstract:
The increase in the volume of spatial data collected has motivated the development of geovisualisation techniques, which aim to provide an important resource to support knowledge extraction and decision making. One of these techniques is the 3D graph, which offers a dynamic and flexible way to enrich the analysis of results obtained by spatial data mining algorithms, particularly when several georeferenced objects occur at the same location. As an original contribution, this work presents the enhancement of visual resources in a computational environment for spatial data mining; the efficiency of these techniques is then demonstrated on a real database. The application proved very useful for interpreting the results obtained, such as patterns that occurred at the same location, and for supporting activities that could be carried out based on the visualisation of results. © 2013 Springer-Verlag.
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
The growing number of new electronic devices has led to a considerable increase in the spatial data being collected; hence these data are becoming more and more widely used. As with conventional data, spatial data need to be analyzed so that interesting information can be retrieved from them. Data clustering techniques can therefore be used to extract clusters from a set of spatial data. However, current approaches do not consider the implicit semantics that exist between a region and an object's attributes. This paper presents an approach that enhances the spatial data mining process so that it can use the semantics that exist within a region. A framework, OntoSDM, was developed, which enables spatial data mining algorithms to communicate with ontologies in order to enhance the algorithm's result. The experiments demonstrated a semantically improved result, generating more interesting clusters and therefore reducing the manual analysis work of an expert.
Abstract:
Background: In a classical study, Durkheim noted a direct relation between suicide rates and wealth in 19th-century France. Since that time, several studies have verified this relationship. It is known that suicide rates are associated with income, although the direction of this association varies worldwide. Brazil presents a heterogeneous distribution of income and suicide across its territory; however, evaluation of an association between these variables has shown mixed results. We aimed to evaluate the relationship between suicide rates and income in Brazil, the State of São Paulo (SP), and the City of SP, considering geographical area and temporal trends. Methods: Data were extracted from the National and State official statistics departments. Three socioeconomic areas were considered according to income, from the wealthiest (area 1) to the poorest (area 3). We also considered three regions: country-wide (27 Brazilian States and 558 Brazilian micro-regions), state-wide (645 counties of SP State), and city-wide (96 districts of SP city). Relative risks (RR) were calculated among areas 1, 2, and 3 for all regions, in a cross-sectional approach. Then, we used Joinpoint analysis to explore the temporal trends of suicide rates and SaTScan to investigate geographical clusters of high/low suicide rates across the territory. Results: Suicide rates in Brazil, the State of SP, and the city of SP were 6.2, 6.6, and 5.4 per 100,000, respectively. Taking suicide rates of the poorest area (3) as reference, the RR for the wealthiest area was 1.64, 0.88, and 1.65 for Brazil, State of SP, and city of SP, respectively (p for trend <0.05 for all analyses). Spatial clusters of high suicide rates were identified in the Brazilian southern (RR = 2.37), state of SP western (RR = 1.32), and city of SP central (RR = 1.65) regions.
A direct association between income and suicide was found for Brazil (OR = 2.59) and the city of SP (OR = 1.07), and an inverse association for the state of SP (OR = 0.49). Conclusions: Temporospatial analyses revealed higher suicide rates in wealthier areas in Brazil and the city of SP and in poorer areas in the State of SP. We further discuss the role of socioeconomic characteristics in explaining these discrepancies and the importance of our findings for public health policies. Similar studies in other Brazilian States and developing countries are warranted.
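The relative-risk comparison described in the Methods can be illustrated with a small calculation. The counts below are invented for the example and are not the study's data; the poorest area (area 3) is taken as the reference, as in the abstract.

```python
def rate_per_100k(deaths, population):
    """Crude rate per 100,000 inhabitants."""
    return deaths / population * 100_000

def relative_risk(rate, reference_rate):
    """Ratio of a group's rate to the reference group's rate."""
    return rate / reference_rate

# Hypothetical (deaths, population) per socioeconomic area
areas = {
    "area 1": (90, 1_500_000),   # wealthiest
    "area 2": (55, 1_000_000),
    "area 3": (50, 1_000_000),   # poorest, used as reference
}

rates = {name: rate_per_100k(d, p) for name, (d, p) in areas.items()}
rr = {name: relative_risk(r, rates["area 3"]) for name, r in rates.items()}
```

An RR above 1 for area 1 would indicate a higher rate in the wealthiest area than in the poorest one; an RR below 1 indicates the opposite direction.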
Abstract:
Spatial data warehouses (SDWs) allow for spatial analysis together with analytical multidimensional queries over huge volumes of data. The challenge is to retrieve data related to ad hoc spatial query windows according to spatial predicates, avoiding the high cost of joining large tables. Therefore, mechanisms to provide efficient query processing over SDWs are essential. In this paper, we propose two efficient indices for SDWs: the SB-index and the HSB-index. The proposed indices share the following characteristics. They enable multidimensional queries with spatial predicates for SDWs and also support predefined spatial hierarchies. Furthermore, they compute the spatial predicate and transform it into a conventional one, which can be evaluated together with other conventional predicates by accessing a star-join Bitmap index. While the SB-index has a sequential data structure, the HSB-index uses a hierarchical data structure to enable clustering of spatial objects and a specialized buffer-pool to decrease the number of disk accesses. The advantages of the SB-index and the HSB-index over the DBMS resources for SDW indexing (i.e. star-join computation and materialized views) were investigated through performance tests, which issued roll-up operations extended with containment and intersection range queries. The performance results showed that improvements ranged from 68% up to 99% over both the star-join computation and the materialized view. Furthermore, the proposed indices proved to be very compact, adding less than 1% to the storage requirements. Therefore, both the SB-index and the HSB-index are excellent choices for SDW indexing. Choosing between the SB-index and the HSB-index mainly depends on the query selectivity of spatial predicates. While low query selectivity benefits the HSB-index, the SB-index provides better performance for higher query selectivity.
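The general idea of turning a spatial predicate into a conventional one can be sketched as below. This is a toy illustration under stated assumptions, not the authors' SB-index: the entry layout, the bounding-box test, the bitmap encoding (Python integers, one bit per fact row) and all data are hypothetical.

```python
from functools import reduce

# Sequential structure (assumed): one (key, bounding box) entry per spatial object.
# Boxes are (xmin, ymin, xmax, ymax).
entries = [
    (1, (0, 0, 2, 2)),
    (2, (3, 3, 5, 5)),
    (3, (1, 1, 4, 4)),
]

# Hypothetical bitmap join index: key -> bitmap over fact rows
# (bit i set means fact row i references that spatial object).
bitmaps = {1: 0b000011, 2: 0b001100, 3: 0b110000}

def intersects(a, b):
    """True if two axis-aligned boxes overlap or touch."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2

def spatial_keys(window):
    """Scan the sequential structure and keep keys whose box meets the window."""
    return [key for key, box in entries if intersects(box, window)]

def query(window, other_predicate_bitmap):
    """Rewrite the spatial predicate as 'key IN (...)' and combine it with
    another conventional predicate using bitwise operations."""
    keys = spatial_keys(window)
    spatial_bitmap = reduce(lambda acc, k: acc | bitmaps[k], keys, 0)
    return spatial_bitmap & other_predicate_bitmap

# Fact rows satisfying some non-spatial predicate (hypothetical)
other = 0b010101
result = query((0, 0, 1.5, 1.5), other)
```

Here the window touches objects 1 and 3, so their bitmaps are OR-ed and then AND-ed with the non-spatial predicate, leaving fact rows 0 and 4.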
Abstract:
Large amounts of information can be overwhelming and costly to process, especially when transmitting data over a network. A typical modern Geographical Information System (GIS) brings all types of data together based on the geographic component of the data and provides simple point-and-click query capabilities as well as complex analysis tools. Querying a Geographical Information System, however, can be prohibitively expensive due to the large amounts of data which may need to be processed. Since the use of GIS technology has grown dramatically in the past few years, there is now a need, more than ever, to provide users with the fastest and least expensive query capabilities, especially since an estimated 80% of data stored in corporate databases has a geographical component. However, not every application requires the same high-quality data for its processing. In this paper we address the issue of reducing the cost and response time of GIS queries by pre-aggregating data at the expense of data accuracy and precision. We present computational issues in the generation of multi-level resolutions of spatial data and show that the problem of finding the best approximation for a given region and a real-valued function on this region, under a predictable error, is in general NP-complete.
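The trade-off between resolution and accuracy can be illustrated with a simple block-averaging sketch. This is not the paper's algorithm; it only shows how a coarser, pre-aggregated level of a raster is cheaper to store and query while introducing a measurable error. The grid and function names are assumptions for the example.

```python
import numpy as np

def aggregate(grid, factor):
    """Average non-overlapping factor x factor blocks to get a coarser level."""
    h, w = grid.shape
    if h % factor or w % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    return grid.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def max_error(grid, factor):
    """Largest absolute difference between the original cells and the
    coarse level expanded back to the original resolution."""
    coarse = aggregate(grid, factor)
    restored = np.kron(coarse, np.ones((factor, factor)))
    return np.abs(grid - restored).max()

# Hypothetical 4 x 4 raster of a real-valued attribute
grid = np.arange(16, dtype=float).reshape(4, 4)
coarse = aggregate(grid, 2)   # 2 x 2 level: a quarter of the cells
error = max_error(grid, 2)    # worst-case error paid for the reduction
```

An application that tolerates the reported error can be answered from `coarse` instead of `grid`, which is the kind of compromise the abstract describes.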
Abstract:
Wind-generated waves in the Kara, Laptev, and East-Siberian Seas are investigated using altimeter data from Envisat RA-2 and SARAL-AltiKa. Only isolated ice-free zones were selected for analysis, so that wind seas can be treated as pure wind-generated waves without any contamination by ambient swell. Such zones were identified using ice concentration data from microwave radiometers. Altimeter data, both significant wave height (SWH) and wind speed, for these areas were further obtained for the period 2002-2012 using Envisat RA-2 measurements, and for 2013 using SARAL-AltiKa. Dependencies of dimensionless SWH and wavelength on dimensionless wave generation spatial scale are compared to known empirical dependencies for fetch-limited wind wave development. We further check the sensitivity of the Ka- and Ku-bands and discuss new possibilities that AltiKa's higher resolution can open.
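The dimensionless quantities mentioned above can be computed as in the sketch below. It assumes the common wind-speed scaling (g·Hs/U² for wave height and g·X/U² for fetch); the abstract does not state which scaling the authors use, and the input values are illustrative only.

```python
G = 9.81  # gravitational acceleration, m/s^2

def dimensionless_swh(swh_m, wind_speed_ms):
    """Dimensionless significant wave height: g * Hs / U^2 (assumed scaling)."""
    return G * swh_m / wind_speed_ms ** 2

def dimensionless_fetch(fetch_m, wind_speed_ms):
    """Dimensionless wave generation scale (fetch): g * X / U^2 (assumed scaling)."""
    return G * fetch_m / wind_speed_ms ** 2

# Hypothetical altimeter sample: Hs = 2 m, U = 10 m/s, fetch = 100 km
h_tilde = dimensionless_swh(2.0, 10.0)
x_tilde = dimensionless_fetch(100_000.0, 10.0)
```

Pairs of `x_tilde` and `h_tilde` from many samples can then be plotted against an empirical fetch-limited growth law of one's choice.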
Abstract:
At national and European levels, in various projects, data products are developed to provide end-users and stakeholders with homogeneously qualified observation compilations or analyses. Ifremer has developed a spatial data infrastructure for the marine environment, called Sextant, in order to manage, share and retrieve these products for its partners and the general public. Thanks to compliance with OGC and ISO standards and with INSPIRE, the infrastructure provides a unique framework to federate homogeneous descriptions of, and access to, marine data products processed in various contexts, at national level or at European level for DG research (SeaDataNet), DG Mare (EMODNET) and DG Growth (Copernicus MEMS). The discovery service of Sextant is based on the metadata catalogue. The data description is normalized according to ISO 191XX series standards and INSPIRE recommendations. Access to the catalogue is provided by the standard OGC service, Catalogue Service for the Web (CSW 2.0.2). Data visualization and data downloading are available through standard OGC services, Web Map Services (WMS) and Web Feature Services (WFS). Several OGC services are provided within Sextant, according to marine themes, regions and projects. Depending on the file format, WMTS services are used for large images, such as hyperspectral images, or NcWMS services for gridded data, such as climatology models. New functions are being developed to improve the visualization, analysis and access to data, e.g. data filtering, online spatial processing with WPS services and access to sensor data with SOS services.
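For readers unfamiliar with the OGC services named above, the sketch below builds a standard WMS 1.3.0 GetMap request URL. The endpoint and layer name are placeholders invented for the example; they are not Sextant's actual addresses or layers.

```python
from urllib.parse import urlencode

def wms_getmap_url(endpoint, layer, bbox, width, height,
                   crs="CRS:84", fmt="image/png"):
    """Build a WMS 1.3.0 GetMap URL.

    bbox is (min_lon, min_lat, max_lon, max_lat) when crs is CRS:84.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return f"{endpoint}?{urlencode(params)}"

# Hypothetical endpoint and layer name
url = wms_getmap_url("https://example.org/wms", "sea_surface_temperature",
                     (-6.0, 43.0, 0.0, 49.0), 600, 600)
```

The same pattern applies to the other services (CSW, WFS), with the request name and parameters changed according to the respective standard.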
Abstract:
With the exponential growth in the usage of web-based map services, web GIS applications have become more and more popular. Spatial data indexing, search, analysis, visualization and the resource management of such services are becoming increasingly important to deliver the user-desired Quality of Service. First, spatial indexing is typically time-consuming and is not available to end-users. To address this, we introduce TerraFly sksOpen, an open-source Online Indexing and Querying System for Big Geospatial Data. Integrated with the TerraFly Geospatial database [1-9], sksOpen is an efficient indexing and query engine for processing Top-k Spatial Boolean Queries. Further, we provide ergonomic visualization of query results on interactive maps to facilitate the user's data analysis. Second, due to the highly complex and dynamic nature of GIS systems, it is quite challenging for end users to quickly understand and analyze the spatial data, and to efficiently share their own data and analysis results with others. Built on the TerraFly Geospatial database, TerraFly GeoCloud is an extra layer running upon the TerraFly map that can efficiently support many different visualization functions and spatial data analysis models. Furthermore, users can create unique URLs to visualize and share the analysis results. TerraFly GeoCloud also enables the MapQL technology to customize map visualization using SQL-like statements [10]. Third, map systems often serve dynamic web workloads and involve multiple CPU- and I/O-intensive tiers, which makes it challenging to meet the response time targets of map requests while using the resources efficiently. Virtualization facilitates the deployment of web map services and improves their resource utilization through encapsulation and consolidation. Autonomic resource management allows resources to be automatically provisioned to a map service and its internal tiers on demand.
v-TerraFly is a set of techniques to predict the demand of map workloads online and optimize resource allocations, considering both response time and data freshness as the QoS target. The proposed v-TerraFly system is prototyped on TerraFly, a production web map service, and evaluated using real TerraFly workloads. The results show that v-TerraFly can accurately predict the workload demands (18.91% more accurate) and efficiently allocate resources to meet the QoS target, improving the QoS by 26.19% and reducing resource usage by 20.83% compared to traditional peak load-based resource allocation.
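A Top-k Spatial Boolean Query, as mentioned above, can be read as "the k objects nearest to a point that also satisfy a Boolean keyword condition". The brute-force sketch below illustrates that reading only; it is not sksOpen's index or API, and the places, tags and function name are hypothetical.

```python
import heapq
import math

# Hypothetical dataset: each place has a name, planar coordinates and keyword tags
places = [
    {"name": "A", "xy": (0.0, 0.0), "tags": {"cafe", "wifi"}},
    {"name": "B", "xy": (1.0, 1.0), "tags": {"cafe"}},
    {"name": "C", "xy": (2.0, 0.0), "tags": {"cafe", "wifi"}},
    {"name": "D", "xy": (5.0, 5.0), "tags": {"cafe", "wifi"}},
]

def top_k_spatial_boolean(query_xy, required_tags, k):
    """Return the k places nearest to query_xy whose tags include all required tags.

    Brute force: filter by the Boolean predicate, then rank by Euclidean distance.
    """
    matches = (p for p in places if required_tags <= p["tags"])
    return heapq.nsmallest(k, matches, key=lambda p: math.dist(query_xy, p["xy"]))

nearest = top_k_spatial_boolean((0.5, 0.0), {"cafe", "wifi"}, k=2)
names = [p["name"] for p in nearest]
```

A dedicated index avoids scanning every object, which is the cost this naive version pays on large datasets.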
Abstract:
Background: The inherent complexity of statistical methods and clinical phenomena compels researchers with diverse domains of expertise to work in interdisciplinary teams, where none of them has complete knowledge of their counterpart's field. As a result, knowledge exchange may often be characterized by miscommunication leading to misinterpretation, ultimately resulting in errors in research and even clinical practice. Although communication has a central role in interdisciplinary collaboration and miscommunication can have a negative impact on research processes, to the best of our knowledge, no study has yet explored how data analysis specialists and clinical researchers communicate over time. Methods/Principal Findings: We conducted a qualitative analysis of encounters between clinical researchers and data analysis specialists (epidemiologist, clinical epidemiologist, and data mining specialist). These encounters were recorded and systematically analyzed using a grounded theory methodology for extraction of emerging themes, followed by data triangulation and analysis of negative cases for validation. A policy analysis was then performed using a system dynamics methodology, looking for potential interventions to improve this process. Four major emerging themes were found. Definitions using lay language were frequently employed as a way to bridge the language gap between the specialties. Thought experiments presented a series of "what if" situations that helped clarify how the method or information from the other field would behave if exposed to alternative situations, ultimately aiding in explaining their main objective. Metaphors and analogies were used to translate concepts across fields, from the unfamiliar to the familiar. Prolepsis was used to anticipate study outcomes, thus helping specialists understand the current context based on an understanding of their final goal.
Conclusion/Significance: The communication between clinical researchers and data analysis specialists presents multiple challenges that can lead to errors.
Abstract:
A combination of deductive reasoning, clustering, and inductive learning is given as an example of a hybrid system for exploratory data analysis. Visualization is replaced by a dialogue with the data.