919 resultados para exploratory spatial data analysis


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Factor analysis as frequent technique for multivariate data inspection is widely used also for compositional data analysis. The usual way is to use a centered logratio (clr)transformation to obtain the random vector y of dimension D. The factor model istheny = Λf + e (1)with the factors f of dimension k & D, the error term e, and the loadings matrix Λ.Using the usual model assumptions (see, e.g., Basilevsky, 1994), the factor analysismodel (1) can be written asCov(y) = ΛΛT + ψ (2)where ψ = Cov(e) has a diagonal form. The diagonal elements of ψ as well as theloadings matrix Λ are estimated from an estimation of Cov(y).Given observed clr transformed data Y as realizations of the random vectory. Outliers or deviations from the idealized model assumptions of factor analysiscan severely effect the parameter estimation. As a way out, robust estimation ofthe covariance matrix of Y will lead to robust estimates of Λ and ψ in (2), seePison et al. (2003). Well known robust covariance estimators with good statisticalproperties, like the MCD or the S-estimators (see, e.g. Maronna et al., 2006), relyon a full-rank data matrix Y which is not the case for clr transformed data (see,e.g., Aitchison, 1986).The isometric logratio (ilr) transformation (Egozcue et al., 2003) solves thissingularity problem. The data matrix Y is transformed to a matrix Z by usingan orthonormal basis of lower dimension. Using the ilr transformed data, a robustcovariance matrix C(Z) can be estimated. The result can be back-transformed tothe clr space byC(Y ) = V C(Z)V Twhere the matrix V with orthonormal columns comes from the relation betweenthe clr and the ilr transformation. Now the parameters in the model (2) can beestimated (Basilevsky, 1994) and the results have a direct interpretation since thelinks to the original variables are still preserved.The above procedure will be applied to data from geochemistry. Our specialinterest is on comparing the results with those of Reimann et al. (2002) for the Kolaproject data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Teaching and research are organised differently between subject domains: attempts to construct typologies of higher education institutions, however, often do not include quantitative indicators concerning subject mix which would allow systematic comparisons of large numbers of higher education institutions among different countries, as the availability of data for such indicators is limited. In this paper, we present an exploratory approach for the construction of such indicators. The database constructed in the AQUAMETH project, which includes also data disaggregated at the disciplinary level, is explored with the aim of understanding patterns of subject mix. For six European countries, an exploratory and descriptive analysis of staff composition divided in four large domains (medical sciences, engineering and technology, natural sciences and social sciences and humanities) is performed, which leads to a classification distinguishing between specialist and generalist institutions. Among the latter, a further distinction is made based on the presence or absence of a medical department. Preliminary exploration of this classification and its comparison with other indicators show the influence of long term dynamics on the subject mix of individual higher education institutions, but also underline disciplinary differences, for example regarding student to staff ratios, as well as national patterns, for example regarding the number of PhD degrees per 100 undergraduate students. Despite its many limitations, this exploratory approach allows defining a classification of higher education institutions that accounts for a large share of differences between the analysed higher education institutions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A methodology of exploratory data analysis investigating the phenomenon of orographic precipitation enhancement is proposed. The precipitation observations obtained from three Swiss Doppler weather radars are analysed for the major precipitation event of August 2005 in the Alps. Image processing techniques are used to detect significant precipitation cells/pixels from radar images while filtering out spurious effects due to ground clutter. The contribution of topography to precipitation patterns is described by an extensive set of topographical descriptors computed from the digital elevation model at multiple spatial scales. Additionally, the motion vector field is derived from subsequent radar images and integrated into a set of topographic features to highlight the slopes exposed to main flows. Following the exploratory data analysis with a recent algorithm of spectral clustering, it is shown that orographic precipitation cells are generated under specific flow and topographic conditions. Repeatability of precipitation patterns in particular spatial locations is found to be linked to specific local terrain shapes, e.g. at the top of hills and on the upwind side of the mountains. This methodology and our empirical findings for the Alpine region provide a basis for building computational data-driven models of orographic enhancement and triggering of precipitation. Copyright (C) 2011 Royal Meteorological Society .

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Several eco-toxicological studies have shown that insectivorous mammals, due to theirfeeding habits, easily accumulate high amounts of pollutants in relation to other mammal species. To assess the bio-accumulation levels of toxic metals and their in°uenceon essential metals, we quantified the concentration of 19 elements (Ca, K, Fe, B, P,S, Na, Al, Zn, Ba, Rb, Sr, Cu, Mn, Hg, Cd, Mo, Cr and Pb) in bones of 105 greaterwhite-toothed shrews (Crocidura russula) from a polluted (Ebro Delta) and a control(Medas Islands) area. Since chemical contents of a bio-indicator are mainly compositional data, conventional statistical analyses currently used in eco-toxicology can givemisleading results. Therefore, to improve the interpretation of the data obtained, weused statistical techniques for compositional data analysis to define groups of metalsand to evaluate the relationships between them, from an inter-population viewpoint.Hypothesis testing on the adequate balance-coordinates allow us to confirm intuitionbased hypothesis and some previous results. The main statistical goal was to test equalmeans of balance-coordinates for the two defined populations. After checking normality,one-way ANOVA or Mann-Whitney tests were carried out for the inter-group balances

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper deals with the problem of spatial data mapping. A new method based on wavelet interpolation and geostatistical prediction (kriging) is proposed. The method - wavelet analysis residual kriging (WARK) - is developed in order to assess the problems rising for highly variable data in presence of spatial trends. In these cases stationary prediction models have very limited application. Wavelet analysis is used to model large-scale structures and kriging of the remaining residuals focuses on small-scale peculiarities. WARK is able to model spatial pattern which features multiscale structure. In the present work WARK is applied to the rainfall data and the results of validation are compared with the ones obtained from neural network residual kriging (NNRK). NNRK is also a residual-based method, which uses artificial neural network to model large-scale non-linear trends. The comparison of the results demonstrates the high quality performance of WARK in predicting hot spots, reproducing global statistical characteristics of the distribution and spatial correlation structure.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyse the ratios of the data values. The usual approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property. This weighted log-ratio analysis is theoretically equivalent to spectral mapping , a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modelling. The weighted log-ratio methodology is applied here to frequency data in linguistics and to chemical compositional data in archaeology.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Polistine wasps are important in Neotropical ecosystems due to their ubiquity and diversity. Inventories have not adequately considered spatial attributes of collected specimens. Spatial data on biodiversity are important for study and mitigation of anthropogenic impacts over natural ecosystems and for protecting species. We described and analyzed local-scale spatial patterns of collecting records of wasp species, as well as spatial variation of diversity descriptors in a 2500-hectare area of an Amazon forest in Brazil. Rare species comprised the largest fraction of the fauna. Close range spatial effects were detected for most of the more common species, with clustering of presence-data at short distances. Larger spatial lag effects could also be identified in some species, constituting probably cases of exogenous autocorrelation and candidates for explanations based on environmental factors. In a few cases, significant or near significant correlations were found between five species (of Agelaia, Angiopolybia, and Mischocyttarus) and three studied environmental variables: distance to nearest stream, terrain altitude, and the type of forest canopy. However, association between these factors and biodiversity variables were generally low. When used as predictors of polistine richness in a linear multiple regression, only the coefficient for the forest canopy variable resulted significant. Some level of prediction of wasp diversity variables can be attained based on environmental variables, especially vegetation structure. Large-scale landscape and regional studies should be scheduled to address this issue.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Whether for investigative or intelligence aims, crime analysts often face up the necessity to analyse the spatiotemporal distribution of crimes or traces left by suspects. This article presents a visualisation methodology supporting recurrent practical analytical tasks such as the detection of crime series or the analysis of traces left by digital devices like mobile phone or GPS devices. The proposed approach has led to the development of a dedicated tool that has proven its effectiveness in real inquiries and intelligence practices. It supports a more fluent visual analysis of the collected data and may provide critical clues to support police operations as exemplified by the presented case studies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The paper deals with the development and application of the generic methodology for automatic processing (mapping and classification) of environmental data. General Regression Neural Network (GRNN) is considered in detail and is proposed as an efficient tool to solve the problem of spatial data mapping (regression). The Probabilistic Neural Network (PNN) is considered as an automatic tool for spatial classifications. The automatic tuning of isotropic and anisotropic GRNN/PNN models using cross-validation procedure is presented. Results are compared with the k-Nearest-Neighbours (k-NN) interpolation algorithm using independent validation data set. Real case studies are based on decision-oriented mapping and classification of radioactively contaminated territories.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Quantitative information from magnetic resonance imaging (MRI) may substantiate clinical findings and provide additional insight into the mechanism of clinical interventions in therapeutic stroke trials. The PERFORM study is exploring the efficacy of terutroban versus aspirin for secondary prevention in patients with a history of ischemic stroke. We report on the design of an exploratory longitudinal MRI follow-up study that was performed in a subgroup of the PERFORM trial. An international multi-centre longitudinal follow-up MRI study was designed for different MR systems employing safety and efficacy readouts: new T2 lesions, new DWI lesions, whole brain volume change, hippocampal volume change, changes in tissue microstructure as depicted by mean diffusivity and fractional anisotropy, vessel patency on MR angiography, and the presence of and development of new microbleeds. A total of 1,056 patients (men and women ≥ 55 years) were included. The data analysis included 3D reformation, image registration of different contrasts, tissue segmentation, and automated lesion detection. This large international multi-centre study demonstrates how new MRI readouts can be used to provide key information on the evolution of cerebral tissue lesions and within the macrovasculature after atherothrombotic stroke in a large sample of patients.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Office of Special Investigations at Iowa Department of Transportation (DOT) collects FWD data on regular basis to evaluate pavement structural conditions. The primary objective of this study was to develop a fully-automated software system for rapid processing of the FWD data along with a user manual. The software system automatically reads the FWD raw data collected by the JILS-20 type FWD machine that Iowa DOT owns, processes and analyzes the collected data with the rapid prediction algorithms developed during the phase I study. This system smoothly integrates the FWD data analysis algorithms and the computer program being used to collect the pavement deflection data. This system can be used to assess pavement condition, estimate remaining pavement life, and eventually help assess pavement rehabilitation strategies by the Iowa DOT pavement management team. This report describes the developed software in detail and can also be used as a user-manual for conducting simulation studies and detailed analyses. *********************** Large File ***********************

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Geophysical techniques can help to bridge the inherent gap with regard to spatial resolution and the range of coverage that plagues classical hydrological methods. This has lead to the emergence of the new and rapidly growing field of hydrogeophysics. Given the differing sensitivities of various geophysical techniques to hydrologically relevant parameters and their inherent trade-off between resolution and range the fundamental usefulness of multi-method hydrogeophysical surveys for reducing uncertainties in data analysis and interpretation is widely accepted. A major challenge arising from such endeavors is the quantitative integration of the resulting vast and diverse database in order to obtain a unified model of the probed subsurface region that is internally consistent with all available data. To address this problem, we have developed a strategy towards hydrogeophysical data integration based on Monte-Carlo-type conditional stochastic simulation that we consider to be particularly suitable for local-scale studies characterized by high-resolution and high-quality datasets. Monte-Carlo-based optimization techniques are flexible and versatile, allow for accounting for a wide variety of data and constraints of differing resolution and hardness and thus have the potential of providing, in a geostatistical sense, highly detailed and realistic models of the pertinent target parameter distributions. Compared to more conventional approaches of this kind, our approach provides significant advancements in the way that the larger-scale deterministic information resolved by the hydrogeophysical data can be accounted for, which represents an inherently problematic, and as of yet unresolved, aspect of Monte-Carlo-type conditional simulation techniques. We present the results of applying our algorithm to the integration of porosity log and tomographic crosshole georadar data to generate stochastic realizations of the local-scale porosity structure. Our procedure is first tested on pertinent synthetic data and then applied to corresponding field data collected at the Boise Hydrogeophysical Research Site near Boise, Idaho, USA.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The book presents the state of the art in machine learning algorithms (artificial neural networks of different architectures, support vector machines, etc.) as applied to the classification and mapping of spatially distributed environmental data. Basic geostatistical algorithms are presented as well. New trends in machine learning and their application to spatial data are given, and real case studies based on environmental and pollution data are carried out. The book provides a CD-ROM with the Machine Learning Office software, including sample sets of data, that will allow both students and researchers to put the concepts rapidly to practice.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Global positioning systems (GPS) offer a cost-effective and efficient method to input and update transportation data. The spatial location of objects provided by GPS is easily integrated into geographic information systems (GIS). The storage, manipulation, and analysis of spatial data are also relatively simple in a GIS. However, many data storage and reporting methods at transportation agencies rely on linear referencing methods (LRMs); consequently, GPS data must be able to link with linear referencing. Unfortunately, the two systems are fundamentally incompatible in the way data are collected, integrated, and manipulated. In order for the spatial data collected using GPS to be integrated into a linear referencing system or shared among LRMs, a number of issues need to be addressed. This report documents and evaluates several of those issues and offers recommendations. In order to evaluate the issues associated with integrating GPS data with a LRM, a pilot study was created. To perform the pilot study, point features, a linear datum, and a spatial representation of a LRM were created for six test roadway segments that were located within the boundaries of the pilot study conducted by the Iowa Department of Transportation linear referencing system project team. Various issues in integrating point features with a LRM or between LRMs are discussed and recommendations provided. The accuracy of the GPS is discussed, including issues such as point features mapping to the wrong segment. Another topic is the loss of spatial information that occurs when a three-dimensional or two-dimensional spatial point feature is converted to a one-dimensional representation on a LRM. Recommendations such as storing point features as spatial objects if necessary or preserving information such as coordinates and elevation are suggested. The lack of spatial accuracy characteristic of most cartography, on which LRM are often based, is another topic discussed. The associated issues include linear and horizontal offset error. The final topic discussed is some of the issues in transferring point feature data between LRMs.