919 results for exploratory spatial data analysis


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Data visualization techniques are powerful in the handling and analysis of multivariate systems. One such technique, known as parallel coordinates, was used to support the diagnosis of an event, detected by a neural network-based monitoring system, in a boiler at a Brazilian Kraft pulp mill. Its appeal lies in the ability to visualize several variables simultaneously. The diagnostic procedure was carried out step by step, going through exploratory, explanatory, confirmatory, and communicative goals. The tool made the boiler dynamics easier to visualize than the commonly used univariate trend plots. In addition, it facilitated analysis of other aspects, namely relationships among process variables, distinct modes of operation, and discrepant data. The whole analysis revealed, first, that the period involving the detected event was associated with a transition between two distinct normal modes of operation and, second, the presence of unusual changes in process variables at this time.
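To make the parallel-coordinates idea concrete, here is a minimal sketch of the data preparation step only: each variable is min-max scaled onto its own vertical axis, and each observation becomes one polyline across the axes. The variable names and readings are hypothetical and are not taken from the study.

```python
def to_parallel_coordinates(rows):
    """Scale every column to [0, 1]; each row becomes one polyline."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column is drawn at mid-height to avoid dividing by zero.
            (v - lo) / (hi - lo) if hi > lo else 0.5
            for v, lo, hi in zip(row, lows, highs)
        ])
    return scaled


# Hypothetical boiler readings: (steam flow, drum pressure, flue-gas O2).
readings = [(100.0, 60.0, 3.0), (120.0, 62.0, 2.5), (140.0, 64.0, 2.0)]
polylines = to_parallel_coordinates(readings)
```

Plotting each entry of `polylines` against the axis index gives the usual picture; lines that cross between two axes indicate an inverse relationship between those variables.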

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: In a classical study, Durkheim noted a direct relation between suicide rates and wealth in nineteenth-century France. Since then, several studies have verified this relationship. Suicide rates are known to be associated with income, although the direction of this association varies worldwide. Brazil presents a heterogeneous distribution of income and suicide across its territory; however, evaluations of an association between these variables have shown mixed results. We aimed to evaluate the relationship between suicide rates and income in Brazil, the State of São Paulo (SP), and the City of SP, considering geographical area and temporal trends. Methods: Data were extracted from the National and State official statistics departments. Three socioeconomic areas were considered according to income, from the wealthiest (area 1) to the poorest (area 3). We also considered three regions: country-wide (27 Brazilian States and 558 Brazilian micro-regions), state-wide (645 counties of SP State), and city-wide (96 districts of SP city). Relative risks (RR) were calculated among areas 1, 2, and 3 for all regions, in a cross-sectional approach. Then, we used Joinpoint analysis to explore the temporal trends of suicide rates and SaTScan to investigate geographical clusters of high/low suicide rates across the territory. Results: Suicide rates in Brazil, the State of SP, and the city of SP were 6.2, 6.6, and 5.4 per 100,000, respectively. Taking suicide rates of the poorest area (3) as reference, the RR for the wealthiest area was 1.64, 0.88, and 1.65 for Brazil, the State of SP, and the city of SP, respectively (p for trend <0.05 for all analyses). Spatial clusters of high suicide rates were identified in the southern region of Brazil (RR = 2.37), the western region of the State of SP (RR = 1.32), and the central region of the city of SP (RR = 1.65). A direct association between income and suicide was found for Brazil (OR = 2.59) and the city of SP (OR = 1.07), and an inverse association for the State of SP (OR = 0.49). Conclusions: Temporospatial analyses revealed higher suicide rates in wealthier areas in Brazil and the city of SP and in poorer areas in the State of SP. We further discuss the role of socioeconomic characteristics in explaining these discrepancies and the importance of our findings for public health policies. Similar studies in other Brazilian States and developing countries are warranted.
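The relative risks quoted above compare the rate in one area with the rate in the reference area. A minimal sketch of that arithmetic follows; the death counts and populations are invented for illustration and are not the study's data.

```python
def rate_per_100k(deaths, population):
    """Crude rate per 100,000 inhabitants."""
    return deaths / population * 100_000


def relative_risk(rate, reference_rate):
    """Ratio of a rate to the rate in the reference group."""
    return rate / reference_rate


# Hypothetical counts, chosen only to show the calculation.
wealthiest_rate = rate_per_100k(60, 1_000_000)
poorest_rate = rate_per_100k(40, 1_000_000)   # reference area
rr = relative_risk(wealthiest_rate, poorest_rate)
```

An RR above 1 means the rate is higher than in the reference area, and an RR below 1 means it is lower, which is how the abstract's 1.64 and 0.88 point in opposite directions.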

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Spatial data warehouses (SDWs) allow for spatial analysis together with analytical multidimensional queries over huge volumes of data. The challenge is to retrieve data related to ad hoc spatial query windows according to spatial predicates, avoiding the high cost of joining large tables. Therefore, mechanisms to provide efficient query processing over SDWs are essential. In this paper, we propose two efficient indices for SDWs: the SB-index and the HSB-index. The proposed indices share the following characteristics. They enable multidimensional queries with spatial predicates for SDWs and also support predefined spatial hierarchies. Furthermore, they compute the spatial predicate and transform it into a conventional one, which can be evaluated together with other conventional predicates by accessing a star-join Bitmap index. While the SB-index has a sequential data structure, the HSB-index uses a hierarchical data structure to enable clustering of spatial objects and a specialized buffer pool to decrease the number of disk accesses. The advantages of the SB-index and the HSB-index over the DBMS resources for SDW indexing (i.e., star-join computation and materialized views) were investigated through performance tests, which issued roll-up operations extended with containment and intersection range queries. The performance results showed improvements ranging from 68% to 99% over both the star-join computation and the materialized view. Furthermore, the proposed indices proved to be very compact, adding less than 1% to the storage requirements. Therefore, both the SB-index and the HSB-index are excellent choices for SDW indexing. Choosing between the SB-index and the HSB-index mainly depends on the query selectivity of spatial predicates. While low query selectivity benefits the HSB-index, the SB-index provides better performance for higher query selectivity.
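The step of turning a spatial predicate into a conventional one can be sketched as follows. This is an illustrative toy under stated assumptions, not the SB-index or HSB-index themselves: the bounding boxes, keys, and bitmaps are hypothetical, and Python integers stand in for bit vectors over fact-table rows.

```python
def intersects(a, b):
    """Axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Hypothetical spatial dimension: surrogate key -> bounding box.
city_boxes = {1: (0, 0, 2, 2), 2: (3, 3, 5, 5), 3: (1, 1, 4, 4)}
# Hypothetical bitmap join index: key -> bit vector (bit i set = fact row i).
bitmaps = {1: 0b000011, 2: 0b001100, 3: 0b110000}


def rows_for_window(window):
    # Step 1: evaluate the spatial predicate on the small dimension only.
    keys = [k for k, box in city_boxes.items() if intersects(box, window)]
    # Step 2: the predicate is now "key IN keys"; OR the matching bitmaps.
    result = 0
    for k in keys:
        result |= bitmaps[k]
    return result
```

The returned bit vector can then be ANDed with bitmaps for any conventional predicates, so the fact table is never joined against the spatial dimension.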

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Large amounts of information can be overwhelming and costly to process, especially when transmitting data over a network. A typical modern Geographical Information System (GIS) brings all types of data together based on the geographic component of the data and provides simple point-and-click query capabilities as well as complex analysis tools. Querying a GIS, however, can be prohibitively expensive because of the large amounts of data that may need to be processed. Since the use of GIS technology has grown dramatically in the past few years, there is now a greater need than ever to provide users with the fastest and least expensive query capabilities, especially since an estimated 80% of data stored in corporate databases has a geographical component. However, not every application requires the same high-quality data for its processing. In this paper we address the issue of reducing the cost and response time of GIS queries by pre-aggregating data at the expense of data accuracy and precision. We present computational issues in the generation of multi-level resolutions of spatial data and show that the problem of finding the best approximation for a given region and a real-valued function on this region, under a predictable error, is in general NP-complete.
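The trade-off between resolution and accuracy can be illustrated with a simple block-averaging scheme. This is a minimal sketch of one possible pre-aggregation, assuming a regular grid and fixed square blocks; it is not the approximation problem the paper proves hard, and the grid values are hypothetical.

```python
def coarsen(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2D grid."""
    rows, cols = len(grid), len(grid[0])
    coarse_grid = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, min(r + factor, rows))
                     for j in range(c, min(c + factor, cols))]
            coarse_row.append(sum(block) / len(block))
        coarse_grid.append(coarse_row)
    return coarse_grid


def max_error(grid, coarse_grid, factor):
    """Largest absolute difference between a cell and its block average."""
    return max(abs(grid[i][j] - coarse_grid[i // factor][j // factor])
               for i in range(len(grid)) for j in range(len(grid[0])))


# Hypothetical 4 x 4 raster of a real-valued attribute.
grid = [[1, 3, 10, 10], [1, 3, 10, 10], [5, 5, 0, 4], [5, 5, 0, 4]]
coarse = coarsen(grid, 2)
err = max_error(grid, coarse, 2)
```

Here the coarse level stores 4 values instead of 16, and `err` bounds what any query answered from the coarse level can lose per cell.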

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Multiple regression analysis is a complex statistical method with many potential uses. It has also become one of the most abused of all statistical procedures, since anyone with a database and suitable software can carry it out. An investigator should always have a clear hypothesis in mind before carrying out such a procedure, together with knowledge of the limitations of each aspect of the analysis. In addition, multiple regression is probably best used in an exploratory context, identifying variables that might profitably be examined by more detailed studies. Where there are many variables potentially influencing Y, they are likely to be intercorrelated and to account for relatively small amounts of the variance. Any analysis in which R squared is less than 50% should be treated as suspect, as it probably does not indicate the presence of significant variables. A further problem relates to sample size. It is often stated that the number of subjects or patients must be at least 5-10 times the number of variables included in the study [5]. This advice should be taken only as a rough guide, but it does indicate that the variables included should be selected with great care, as inclusion of an obviously unimportant variable may have a significant impact on the sample size required.
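The two rules of thumb mentioned above are easy to check numerically. The sketch below assumes the usual definition of R squared as one minus the ratio of residual to total sum of squares; the function names and example values are illustrative only.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_res / ss_tot


def subject_range(n_variables, per_variable=(5, 10)):
    """Rough sample-size range from the 5-10 subjects-per-variable guide."""
    low, high = per_variable
    return n_variables * low, n_variables * high


# Hypothetical observed values and model predictions.
r2 = r_squared([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5])
needed = subject_range(6)
```

With six predictors the guide suggests roughly 30 to 60 subjects, which shows how quickly one extra, unimportant variable raises the requirement.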

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Wind-generated waves in the Kara, Laptev, and East-Siberian Seas are investigated using altimeter data from Envisat RA-2 and SARAL-AltiKa. Only isolated ice-free zones were selected for analysis, where wind seas can be treated as pure wind-generated waves without any contamination by ambient swell. Such zones were identified using ice concentration data from microwave radiometers. Altimeter data, both significant wave height (SWH) and wind speed, were then obtained for these areas for the period 2002-2012 using Envisat RA-2 measurements, and for 2013 using SARAL-AltiKa. Dependencies of dimensionless SWH and wavelength on the dimensionless spatial scale of wave generation are compared to known empirical dependencies for fetch-limited wind wave development. We further check the sensitivity of the Ka- and Ku-bands and discuss new possibilities that AltiKa's higher resolution can open.
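The abstract does not spell out its scaling, so the sketch below assumes the conventional one for fetch-limited growth, in which heights and distances are multiplied by g and divided by the squared wind speed. The input values are hypothetical.

```python
G = 9.81  # gravitational acceleration, m/s^2


def dimensionless_height(swh_m, wind_ms):
    """g * Hs / U^2, with Hs in metres and U in m/s."""
    return G * swh_m / wind_ms ** 2


def dimensionless_fetch(fetch_m, wind_ms):
    """g * X / U^2, with the fetch X in metres and U in m/s."""
    return G * fetch_m / wind_ms ** 2


# Hypothetical altimeter sample: 2 m waves, 10 m/s wind, 100 km of open water.
h_tilde = dimensionless_height(2.0, 10.0)
x_tilde = dimensionless_fetch(100_000.0, 10.0)
```

Plotting `h_tilde` against `x_tilde` for many samples is what allows measurements taken under different wind speeds to be compared with a single empirical growth curve.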

Relevance:

100.00% 100.00%

Publisher:

Abstract:

At national and European levels, data products are developed in various projects to provide end-users and stakeholders with homogeneously qualified observation compilations or analyses. Ifremer has developed a spatial data infrastructure for the marine environment, called Sextant, in order to manage, share, and retrieve these products for its partners and the general public. Thanks to its compliance with OGC and ISO standards and with INSPIRE, the infrastructure provides a unique framework to federate homogeneous descriptions of, and access to, marine data products processed in various contexts, at national level or at European level for DG Research (SeaDataNet), DG Mare (EMODNET), and DG Growth (Copernicus MEMS). The discovery service of Sextant is based on the metadata catalogue. The data description is normalized according to the ISO 191XX series of standards and INSPIRE recommendations. Access to the catalogue is provided by the standard OGC service, Catalogue Service for the Web (CSW 2.0.2). Data visualization and data downloading are available through standard OGC services, Web Map Services (WMS) and Web Feature Services (WFS). Several OGC services are provided within Sextant, according to marine themes, regions, and projects. Depending on the file format, WMTS services are used for large images, such as hyperspectral images, or NcWMS services for gridded data, such as climatology models. New functions are being developed to improve the visualization and analysis of, and access to, data, e.g., data filtering, online spatial processing with WPS services, and access to sensor data with SOS services.
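Because the catalogue and map services follow OGC standards, a client can discover what each one offers with a standard key-value GetCapabilities request. The sketch below only builds such request URLs; the base addresses are placeholders on example.org, not Sextant's real endpoints.

```python
from urllib.parse import urlencode


def capabilities_url(base, service, version):
    """Build an OGC key-value GetCapabilities request for a service endpoint."""
    query = urlencode({
        "service": service,
        "version": version,
        "request": "GetCapabilities",
    })
    return base + "?" + query


# Placeholder endpoints; a real deployment publishes its own addresses.
csw_url = capabilities_url("https://example.org/csw", "CSW", "2.0.2")
wms_url = capabilities_url("https://example.org/wms", "WMS", "1.3.0")
```

Fetching either URL from a real endpoint returns an XML document listing the supported operations and, for WMS, the available layers.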

Relevance:

100.00% 100.00%

Publisher:

Abstract:

With the exponential growth in the usage of web-based map services, web GIS applications have become more and more popular. Spatial data indexing, search, analysis, visualization, and resource management for such services are becoming increasingly important to deliver the Quality of Service (QoS) that users expect. First, spatial indexing is typically time-consuming and is not available to end-users. To address this, we introduce TerraFly sksOpen, an open-source Online Indexing and Querying System for Big Geospatial Data. Integrated with the TerraFly Geospatial database [1-9], sksOpen is an efficient indexing and query engine for processing Top-k Spatial Boolean Queries. Further, we provide ergonomic visualization of query results on interactive maps to facilitate the user's data analysis. Second, due to the highly complex and dynamic nature of GIS systems, it is quite challenging for end users to quickly understand and analyze the spatial data, and to efficiently share their own data and analysis results with others. Built on the TerraFly Geospatial database, TerraFly GeoCloud is an extra layer running on top of the TerraFly map that can efficiently support many different visualization functions and spatial data analysis models. Furthermore, users can create unique URLs to visualize and share the analysis results. TerraFly GeoCloud also enables the MapQL technology to customize map visualization using SQL-like statements [10]. Third, map systems often serve dynamic web workloads and involve multiple CPU- and I/O-intensive tiers, which makes it challenging to meet the response time targets of map requests while using the resources efficiently. Virtualization facilitates the deployment of web map services and improves their resource utilization through encapsulation and consolidation. Autonomic resource management allows resources to be automatically provisioned to a map service and its internal tiers on demand. v-TerraFly comprises techniques to predict the demand of map workloads online and optimize resource allocations, considering both response time and data freshness as the QoS target. The proposed v-TerraFly system is prototyped on TerraFly, a production web map service, and evaluated using real TerraFly workloads. The results show that v-TerraFly can accurately predict the workload demands (18.91% more accurate) and efficiently allocate resources to meet the QoS target (improving the QoS by 26.19% and reducing resource usage by 20.83%) compared to traditional peak load-based resource allocation.
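A Top-k Spatial Boolean Query keeps only the objects that satisfy a keyword condition and returns the k nearest of them to a query point. The brute-force sketch below shows the query semantics only; it assumes a conjunctive keyword condition and Euclidean distance, uses invented objects, and does not reflect how sksOpen indexes data.

```python
import heapq
import math


def top_k(objects, query_point, required_keywords, k):
    """Return the k closest objects whose keywords include all required ones."""
    matches = [o for o in objects if required_keywords <= o["keywords"]]
    return heapq.nsmallest(
        k, matches, key=lambda o: math.dist(o["xy"], query_point))


# Hypothetical points of interest.
places = [
    {"name": "A", "xy": (0, 0), "keywords": {"hotel", "pool"}},
    {"name": "B", "xy": (1, 1), "keywords": {"hotel"}},
    {"name": "C", "xy": (5, 5), "keywords": {"hotel", "pool"}},
    {"name": "D", "xy": (2, 0), "keywords": {"hotel", "pool", "wifi"}},
]
nearest = [o["name"] for o in top_k(places, (0.5, 0), {"hotel", "pool"}, 2)]
```

A scan like this touches every object on every query, which is the cost a dedicated spatial-keyword index is meant to avoid.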

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: The inherent complexity of statistical methods and clinical phenomena compels researchers with diverse domains of expertise to work in interdisciplinary teams, where none of them has complete knowledge of their counterpart's field. As a result, knowledge exchange may often be characterized by miscommunication leading to misinterpretation, ultimately resulting in errors in research and even clinical practice. Although communication has a central role in interdisciplinary collaboration and miscommunication can have a negative impact on research processes, to the best of our knowledge, no study has yet explored how data analysis specialists and clinical researchers communicate over time. Methods/Principal Findings: We conducted a qualitative analysis of encounters between clinical researchers and data analysis specialists (epidemiologist, clinical epidemiologist, and data mining specialist). These encounters were recorded and systematically analyzed using a grounded theory methodology for extraction of emerging themes, followed by data triangulation and analysis of negative cases for validation. A policy analysis was then performed using a system dynamics methodology, looking for potential interventions to improve this process. Four major emerging themes were found. Definitions using lay language were frequently employed as a way to bridge the language gap between the specialties. Thought experiments presented a series of "what if" situations that helped clarify how the method or information from the other field would behave if exposed to alternative situations, ultimately aiding in explaining their main objective. Metaphors and analogies were used to translate concepts across fields, from the unfamiliar to the familiar. Prolepsis was used to anticipate study outcomes, thus helping specialists understand the current context based on an understanding of their final goal. Conclusion/Significance: The communication between clinical researchers and data analysis specialists presents multiple challenges that can lead to errors.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The performance of three analytical methods for multiple-frequency bioelectrical impedance analysis (MFBIA) data was assessed. The methods were the established method of Cole and Cole, the newly proposed method of Siconolfi and co-workers, and a modification of this procedure. Method performance was assessed from the adequacy of the curve-fitting techniques, as judged by the correlation coefficient and standard error of the estimate, and the accuracy of the different methods in determining the theoretical values of impedance parameters describing a set of model electrical circuits. The experimental data were well fitted by all curve-fitting procedures (r = 0.9 with SEE 0.3 to 3.5% or better for most circuit-procedure combinations). Cole-Cole modelling provided the most accurate estimates of circuit impedance values, generally within 1-2% of the theoretical values, followed by the Siconolfi procedure using a sixth-order polynomial regression (1-6% variation). None of the methods, however, accurately estimated circuit parameters when the measured impedances were low (<20 Ω), reflecting the electronic limits of the impedance meter used. These data suggest that Cole-Cole modelling remains the preferred method for the analysis of MFBIA data.
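For readers unfamiliar with the Cole model, the sketch below evaluates its impedance curve. It assumes one common parameterisation, Z = R_inf + (R0 - R_inf) / (1 + (j*omega*tau)^(1 - alpha)); conventions for the exponent differ between authors, and the parameter values are hypothetical rather than taken from the study's circuits.

```python
import math


def cole_impedance(freq_hz, r0, r_inf, tau, alpha):
    """Complex impedance of the Cole model at one frequency.

    r0 and r_inf are the resistances at zero and infinite frequency,
    tau is the characteristic time constant, and alpha (0 <= alpha < 1)
    flattens the arc; alpha = 0 gives a single ideal relaxation.
    """
    jwt = 1j * 2 * math.pi * freq_hz * tau
    return r_inf + (r0 - r_inf) / (1 + jwt ** (1 - alpha))


# Hypothetical circuit: 500 ohm at DC, 300 ohm at very high frequency.
z_low = cole_impedance(1e-6, 500.0, 300.0, 1e-5, 0.1)
z_high = cole_impedance(1e12, 500.0, 300.0, 1e-5, 0.1)
z_mid = cole_impedance(1 / (2 * math.pi * 1e-5), 500.0, 300.0, 1e-5, 0.0)
```

Fitting measured impedances to this curve yields estimates of `r0` and `r_inf`, which is the kind of circuit parameter the methods above are being compared on.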

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In an investigation intended to determine training needs of night crews, Bowers et al. (1998, this issue) report two studies showing that the patterning of communication is a better discriminator of good and poor crews than is the content of communication. Bowers et al. characterize their studies as intended to generate hypotheses for training needs and draw connections with Exploratory Sequential Data Analysis (ESDA). Although we applaud the intentions of Bowers et al., we point out some concerns with their characterization and implementation of ESDA. Our principal concern is that the Bowers et al. exploration of the data does not convincingly lead them back to a better fundamental understanding of the original phenomena they are investigating.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The principle of using induction rules based on spatial environmental data to model a soil map has previously been demonstrated. Whilst the general pattern of classes of large spatial extent and those with close association with geology were delineated, small classes and the detailed spatial pattern of the map were less well rendered. Here we examine several strategies to improve the quality of the soil map models generated by rule induction. Terrain attributes that are better suited to landscape description at a resolution of 250 m are introduced as predictors of soil type. A map sampling strategy is developed. Classification error is reduced by using boosting rather than cross validation to improve the model. Further, the benefit of incorporating the local spatial context for each environmental variable into the rule induction is examined. The best model was achieved by sampling in proportion to the spatial extent of the mapped classes, boosting the decision trees, and using spatial contextual information extracted from the environmental variables.
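Sampling in proportion to the spatial extent of each mapped class amounts to a proportional allocation of the training sample. A minimal sketch follows, with hypothetical class names and areas; it assumes simple rounding, so the allocated total can differ slightly from the requested one in general.

```python
def allocate_samples(class_areas, total_samples):
    """Split a sample budget across classes in proportion to mapped area."""
    area_sum = sum(class_areas.values())
    return {
        name: round(total_samples * area / area_sum)
        for name, area in class_areas.items()
    }


# Hypothetical mapped extents in square kilometres.
areas = {"class_a": 600.0, "class_b": 300.0, "class_c": 100.0}
allocation = allocate_samples(areas, 50)
```

Compared with taking the same number of points from every class, this keeps the class frequencies in the training data close to those on the map.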

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The identification, modeling, and analysis of interactions between nodes of neural systems in the human brain have become the focus of many studies in neuroscience. The complex neural network structure and its correlations with brain functions have played a role in all areas of neuroscience, including the comprehension of cognitive and emotional processing. Indeed, understanding how information is stored, retrieved, processed, and transmitted is one of the ultimate challenges in brain research. In this context, in functional neuroimaging, connectivity analysis is a major tool for the exploration and characterization of the information flow between specialized brain regions. In most functional magnetic resonance imaging (fMRI) studies, connectivity analysis is carried out by first selecting regions of interest (ROIs) and then calculating an average BOLD time series (across the voxels in each cluster). Some studies have shown that the average may not be a good choice and have suggested, as an alternative, the use of principal component analysis (PCA) to extract the principal eigen-time series from the ROI(s). In this paper, we introduce a novel approach called cluster Granger analysis (CGA) to study connectivity between ROIs. The main aim of this method is to employ multiple eigen-time series in each ROI to avoid temporal information loss during identification of Granger causality. Such information loss is inherent in averaging (e.g., to yield a single "representative" time series per ROI). This, in turn, may lead to a lack of power in detecting connections. The proposed approach is based on multivariate statistical analysis and integrates PCA and partial canonical correlation in a framework of Granger causality for clusters (sets) of time series. We also describe an algorithm for statistical significance testing based on bootstrapping. By using Monte Carlo simulations, we show that the proposed approach outperforms conventional Granger causality analysis (i.e., using representative time series extracted by signal averaging or first principal components estimation from ROIs). The usefulness of the CGA approach in real fMRI data is illustrated in an experiment using human faces expressing emotions. With this data set, the proposed approach suggested the presence of significantly more connections between the ROIs than were detected using a single representative time series in each ROI. (c) 2010 Elsevier Inc. All rights reserved.
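As background for the comparison above, the sketch below implements the conventional bivariate idea that CGA extends, not CGA itself: x is said to Granger-cause y if adding the past of x reduces the error of predicting y from its own past. It assumes zero-mean series, a single lag, and no intercept, and the simulated data are synthetic.

```python
import math
import random


def rss_own_past(y):
    """Residual sum of squares of y[t] = a * y[t-1] + error."""
    steps = range(1, len(y))
    a = sum(y[t] * y[t - 1] for t in steps) / sum(y[t - 1] ** 2 for t in steps)
    return sum((y[t] - a * y[t - 1]) ** 2 for t in steps)


def rss_with_other(y, x):
    """Residual sum of squares of y[t] = a * y[t-1] + b * x[t-1] + error."""
    steps = range(1, len(y))
    syy = sum(y[t - 1] ** 2 for t in steps)
    sxx = sum(x[t - 1] ** 2 for t in steps)
    sxy = sum(x[t - 1] * y[t - 1] for t in steps)
    ty = sum(y[t] * y[t - 1] for t in steps)
    tx = sum(y[t] * x[t - 1] for t in steps)
    det = syy * sxx - sxy ** 2           # 2 x 2 normal equations
    a = (ty * sxx - tx * sxy) / det
    b = (tx * syy - ty * sxy) / det
    return sum((y[t] - a * y[t - 1] - b * x[t - 1]) ** 2 for t in steps)


def granger_index(y, x):
    """Log ratio of prediction errors; larger means x helps predict y."""
    return math.log(rss_own_past(y) / rss_with_other(y, x))


# Synthetic example in which x drives y with a one-step delay.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(500)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * random.gauss(0, 1) for t in range(1, 500)]
x_to_y = granger_index(y, x)
y_to_x = granger_index(x, y)
```

In practice the index is compared against a null distribution, for example one obtained by bootstrapping as the abstract describes, rather than judged by its raw size.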