39 resultados para exploratory spatial data analysis
Resumo:
A wireless sensor network (WSN) is a group of sensors linked by wireless medium to perform distributed sensing tasks. WSNs have attracted a wide interest from academia and industry alike due to their diversity of applications, including home automation, smart environment, and emergency services, in various buildings. The primary goal of a WSN is to collect data sensed by sensors. These data are characteristic of being heavily noisy, exhibiting temporal and spatial correlation. In order to extract useful information from such data, as this paper will demonstrate, people need to utilise various techniques to analyse the data. Data mining is a process in which a wide spectrum of data analysis methods is used. It is applied in the paper to analyse data collected from WSNs monitoring an indoor environment in a building. A case study is given to demonstrate how data mining can be used to optimise the use of the office space in a building.
Resumo:
The organization of non-crystalline polymeric materials at a local level, namely on a spatial scale between a few and 100 a, is still unclear in many respects. The determination of the local structure in terms of the configuration and conformation of the polymer chain and of the packing characteristics of the chain in the bulk material represents a challenging problem. Data from wide-angle diffraction experiments are very difficult to interpret due to the very large amount of information that they carry, that is the large number of correlations present in the diffraction patterns.We describe new approaches that permit a detailed analysis of the complex neutron diffraction patterns characterizing polymer melts and glasses. The coupling of different computer modelling strategies with neutron scattering data over a wide Q range allows the extraction of detailed quantitative information on the structural arrangements of the materials of interest. Proceeding from modelling routes as diverse as force field calculations, single-chain modelling and reverse Monte Carlo, we show the successes and pitfalls of each approach in describing model systems, which illustrate the need to attack the data analysis problem simultaneously from several fronts.
Resumo:
Social network has gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook LinkedIn and Google+ through the internet and the web 2.0 technologies has become more affordable. People are becoming more interested in and relying on social network for information, news and opinion of other users on diverse subject matters. The heavy reliance on social network sites causes them to generate massive data characterised by three computational issues namely; size, noise and dynamism. These issues often make social network data very complex to analyse manually, resulting in the pertinent use of computational means of analysing them. Data mining provides a wide range of techniques for detecting useful knowledge from massive datasets like trends, patterns and rules [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of the social network over decades going from the historical techniques to the up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in the Table.1 including the tools employed as well as names of their authors.
Resumo:
Virtual globe technology holds many exciting possibilities for environmental science. These easy-to-use, intuitive systems provide means for simultaneously visualizing four-dimensional environmental data from many different sources, enabling the generation of new hypotheses and driving greater understanding of the Earth system. Through the use of simple markup languages, scientists can publish and consume data in interoperable formats without the need for technical assistance. In this paper we give, with examples from our own work, a number of scientific uses for virtual globes, demonstrating their particular advantages. We explain how we have used Web Services to connect virtual globes with diverse data sources and enable more sophisticated usage such as data analysis and collaborative visualization. We also discuss the current limitations of the technology, with particular regard to the visualization of subsurface data and vertical sections.
Resumo:
Soil data and reliable soil maps are imperative for environmental management. conservation and policy. Data from historical point surveys, e.g. experiment site data and farmers fields can serve this purpose. However, legacy soil information is not necessarily collected for spatial analysis and mapping such that the data may not have immediately useful geo-references. Methods are required to utilise these historical soil databases so that we can produce quantitative maps of soil propel-ties to assess spatial and temporal trends but also to assess where future sampling is required. This paper discusses two such databases: the Representative Soil Sampling Scheme which has monitored the agricultural soil in England and Wales from 1969 to 2003 (between 400 and 900 bulked soil samples were taken annually from different agricultural fields); and the former State Chemistry Laboratory, Victoria, Australia where between 1973 and 1994 approximately 80,000 soil samples were submitted for analysis by farmers. Previous statistical analyses have been performed using administrative regions (with sharp boundaries) for both databases, which are largely unrelated to natural features. For a more detailed spatial analysis that call be linked to climate and terrain attributes, gradual variation of these soil properties should be described. Geostatistical techniques such as ordinary kriging are suited to this. This paper describes the format of the databases and initial approaches as to how they can be used for digital soil mapping. For this paper we have selected soil pH to illustrate the analyses for both databases.
Resumo:
While over-dispersion in capture–recapture studies is well known to lead to poor estimation of population size, current diagnostic tools to detect the presence of heterogeneity have not been specifically developed for capture–recapture studies. To address this, a simple and efficient method of testing for over-dispersion in zero-truncated count data is developed and evaluated. The proposed method generalizes an over-dispersion test previously suggested for un-truncated count data and may also be used for testing residual over-dispersion in zero-inflation data. Simulations suggest that the asymptotic distribution of the test statistic is standard normal and that this approximation is also reasonable for small sample sizes. The method is also shown to be more efficient than an existing test for over-dispersion adapted for the capture–recapture setting. Studies with zero-truncated and zero-inflated count data are used to illustrate the test procedures.
Resumo:
In survival analysis frailty is often used to model heterogeneity between individuals or correlation within clusters. Typically frailty is taken to be a continuous random effect, yielding a continuous mixture distribution for survival times. A Bayesian analysis of a correlated frailty model is discussed in the context of inverse Gaussian frailty. An MCMC approach is adopted and the deviance information criterion is used to compare models. As an illustration of the approach a bivariate data set of corneal graft survival times is analysed. (C) 2006 Elsevier B.V. All rights reserved.
Resumo:
In previous empirical and modelling studies of rare species and weeds, evidence of fractal behaviour has been found. We propose that weeds in modern agricultural systems may be managed close to critical population dynamic thresholds, below which their rates of increase will be negative and where scale-invariance may be expected as a consequence. We collected detailed spatial data on five contrasting species over a period of three years in a primarily arable field. Counts in 20×20 cm contiguous quadrats, 225,000 in 1998 and 84,375 thereafter, could be re-structured into a wide range of larger quadrat sizes. These were analysed using three methods based on correlation sum, incidence and conditional incidence. We found non-trivial scale invariance for species occurring at low mean densities and where they were strongly aggregated. The fact that the scale-invariance was not found for widespread species occurring at higher densities suggests that the scaling in agricultural weed populations may, indeed, be related to critical phenomena.
Resumo:
A fully automated procedure to extract and to image local fibre orientation in biological tissues from scanning X-ray diffraction is presented. The preferred chitin fibre orientation in the flow sensing system of crickets is determined with high spatial resolution by applying synchrotron radiation based X-ray microbeam diffraction in conjunction with advanced sample sectioning using a UV micro-laser. The data analysis is based on an automated detection of azimuthal diffraction maxima after 2D convolution filtering (smoothing) of the 2D diffraction patterns. Under the assumption of crystallographic fibre symmetry around the morphological fibre axis, the evaluation method allows mapping the three-dimensional orientation of the fibre axes in space. The resulting two-dimensional maps of the local fibre orientations - together with the complex shape of the flow sensing system - may be useful for a better understanding of the mechanical optimization of such tissues.
Resumo:
Event-related functional magnetic resonance imaging (efMRI) has emerged as a powerful technique for detecting brains' responses to presented stimuli. A primary goal in efMRI data analysis is to estimate the Hemodynamic Response Function (HRF) and to locate activated regions in human brains when specific tasks are performed. This paper develops new methodologies that are important improvements not only to parametric but also to nonparametric estimation and hypothesis testing of the HRF. First, an effective and computationally fast scheme for estimating the error covariance matrix for efMRI is proposed. Second, methodologies for estimation and hypothesis testing of the HRF are developed. Simulations support the effectiveness of our proposed methods. When applied to an efMRI dataset from an emotional control study, our method reveals more meaningful findings than the popular methods offered by AFNI and FSL. (C) 2008 Elsevier B.V. All rights reserved.
Resumo:
The ability to display and inspect powder diffraction data quickly and efficiently is a central part of the data analysis process. Whilst many computer programs are capable of displaying powder data, their focus is typically on advanced operations such as structure solution or Rietveld refinement. This article describes a lightweight software package, Jpowder, whose focus is fast and convenient visualization and comparison of powder data sets in a variety of formats from computers with network access. Jpowder is written in Java and uses its associated Web Start technology to allow ‘single-click deployment’ from a web page, http://www.jpowder.org. Jpowder is open source, free and available for use by anyone.
Resumo:
Although extensively studied within the lidar community, the multiple scattering phenomenon has always been considered a rare curiosity by radar meteorologists. Up to few years ago its appearance has only been associated with two- or three-body-scattering features (e.g. hail flares and mirror images) involving highly reflective surfaces. Recent atmospheric research aimed at better understanding of the water cycle and the role played by clouds and precipitation in affecting the Earth's climate has driven the deployment of high frequency radars in space. Examples are the TRMM 13.5 GHz, the CloudSat 94 GHz, the upcoming EarthCARE 94 GHz, and the GPM dual 13-35 GHz radars. These systems are able to detect the vertical distribution of hydrometeors and thus provide crucial feedbacks for radiation and climate studies. The shift towards higher frequencies increases the sensitivity to hydrometeors, improves the spatial resolution and reduces the size and weight of the radar systems. On the other hand, higher frequency radars are affected by stronger extinction, especially in the presence of large precipitating particles (e.g. raindrops or hail particles), which may eventually drive the signal below the minimum detection threshold. In such circumstances the interpretation of the radar equation via the single scattering approximation may be problematic. Errors will be large when the radiation emitted from the radar after interacting more than once with the medium still contributes substantially to the received power. This is the case if the transport mean-free-path becomes comparable with the instrument footprint (determined by the antenna beam-width and the platform altitude). This situation resembles to what has already been experienced in lidar observations, but with a predominance of wide- versus small-angle scattering events. At millimeter wavelengths, hydrometeors diffuse radiation rather isotropically compared to the visible or near infrared region where scattering is predominantly in the forward direction. A complete understanding of radiation transport modeling and data analysis methods under wide-angle multiple scattering conditions is mandatory for a correct interpretation of echoes observed by space-borne millimeter radars. This paper reviews the status of research in this field. Different numerical techniques currently implemented to account for higher order scattering are reviewed and their weaknesses and strengths highlighted. Examples of simulated radar backscattering profiles are provided with particular emphasis given to situations in which the multiple scattering contributions become comparable or overwhelm the single scattering signal. We show evidences of multiple scattering effects from air-borne and from CloudSat observations, i.e. unique signatures which cannot be explained by single scattering theory. Ideas how to identify and tackle the multiple scattering effects are discussed. Finally perspectives and suggestions for future work are outlined. This work represents a reference-guide for studies focused at modeling the radiation transport and at interpreting data from high frequency space-borne radar systems that probe highly opaque scattering media such as thick ice clouds or precipitating clouds.
Resumo:
Background: Microarray based comparative genomic hybridisation (CGH) experiments have been used to study numerous biological problems including understanding genome plasticity in pathogenic bacteria. Typically such experiments produce large data sets that are difficult for biologists to handle. Although there are some programmes available for interpretation of bacterial transcriptomics data and CGH microarray data for looking at genetic stability in oncogenes, there are none specifically to understand the mosaic nature of bacterial genomes. Consequently a bottle neck still persists in accurate processing and mathematical analysis of these data. To address this shortfall we have produced a simple and robust CGH microarray data analysis process that may be automated in the future to understand bacterial genomic diversity. Results: The process involves five steps: cleaning, normalisation, estimating gene presence and absence or divergence, validation, and analysis of data from test against three reference strains simultaneously. Each stage of the process is described and we have compared a number of methods available for characterising bacterial genomic diversity, for calculating the cut-off between gene presence and absence or divergence, and shown that a simple dynamic approach using a kernel density estimator performed better than both established, as well as a more sophisticated mixture modelling technique. We have also shown that current methods commonly used for CGH microarray analysis in tumour and cancer cell lines are not appropriate for analysing our data. Conclusion: After carrying out the analysis and validation for three sequenced Escherichia coli strains, CGH microarray data from 19 E. coli O157 pathogenic test strains were used to demonstrate the benefits of applying this simple and robust process to CGH microarray studies using bacterial genomes.