111 resultados para statistical spatial analysis
Resumo:
Regardless of technology benefits, safety planners still face difficulties explaining errors related to the use of different technologies and evaluating how the errors impact the performance of safety decision making. This paper presents a preliminary error impact analysis testbed to model object identification and tracking errors caused by image-based devices and algorithms and to analyze the impact of the errors for spatial safety assessment of earthmoving and surface mining activities. More specifically, this research designed a testbed to model workspaces for earthmoving operations, to simulate safety-related violations, and to apply different object identification and tracking errors on the data collected and processed for spatial safety assessment. Three different cases were analyzed based on actual earthmoving operations conducted at a limestone quarry. Using the testbed, the impacts of the errors were investigated for the safety planning purpose.
Resumo:
This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals’ seldomly studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry; consumption behaviour (as oppose to purchasing behaviour) is behaviour that has been performed so frequently that it become habitual and involves minimal intentions or decision making. Key variables investigated are the activity initialised timestamp and cell tower location as well as the activity type and usage quantity (e.g., voice call with duration in seconds); and the research focuses are on customers’ spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs) which are fitted with the use of the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demandingMarkov chainMonte Carlo (MCMC) methods. The standard VBGMMalgorithm is extended by allowing component splitting such that it is robust to initial parameter choices and can automatically and efficiently determine the number of components. The new algorithm we propose allows more effective modelling of individuals’ highly heterogeneous and spiky spatial usage behaviour, or more generally human mobility patterns; the term spiky describes data patterns with large areas of low probability mixed with small areas of high probability. Customers are then characterised and segmented based on the fitted GMM which corresponds to how each of them uses the products/services spatially in their daily lives; this is essentially their likely lifestyle and occupational traits. Other significant research contributions include fitting GMMs using VB to circular data i.e., the temporal usage behaviour, and developing clustering algorithms suitable for high dimensional data based on the use of VB-GMM.
Resumo:
The research objectives of this thesis were to contribute to Bayesian statistical methodology by contributing to risk assessment statistical methodology, and to spatial and spatio-temporal methodology, by modelling error structures using complex hierarchical models. Specifically, I hoped to consider two applied areas, and use these applications as a springboard for developing new statistical methods as well as undertaking analyses which might give answers to particular applied questions. Thus, this thesis considers a series of models, firstly in the context of risk assessments for recycled water, and secondly in the context of water usage by crops. The research objective was to model error structures using hierarchical models in two problems, namely risk assessment analyses for wastewater, and secondly, in a four dimensional dataset, assessing differences between cropping systems over time and over three spatial dimensions. The aim was to use the simplicity and insight afforded by Bayesian networks to develop appropriate models for risk scenarios, and again to use Bayesian hierarchical models to explore the necessarily complex modelling of four dimensional agricultural data. The specific objectives of the research were to develop a method for the calculation of credible intervals for the point estimates of Bayesian networks; to develop a model structure to incorporate all the experimental uncertainty associated with various constants thereby allowing the calculation of more credible credible intervals for a risk assessment; to model a single day’s data from the agricultural dataset which satisfactorily captured the complexities of the data; to build a model for several days’ data, in order to consider how the full data might be modelled; and finally to build a model for the full four dimensional dataset and to consider the timevarying nature of the contrast of interest, having satisfactorily accounted for possible spatial and temporal autocorrelations. This work forms five papers, two of which have been published, with two submitted, and the final paper still in draft. The first two objectives were met by recasting the risk assessments as directed, acyclic graphs (DAGs). In the first case, we elicited uncertainty for the conditional probabilities needed by the Bayesian net, incorporated these into a corresponding DAG, and used Markov chain Monte Carlo (MCMC) to find credible intervals, for all the scenarios and outcomes of interest. In the second case, we incorporated the experimental data underlying the risk assessment constants into the DAG, and also treated some of that data as needing to be modelled as an ‘errors-invariables’ problem [Fuller, 1987]. This illustrated a simple method for the incorporation of experimental error into risk assessments. In considering one day of the three-dimensional agricultural data, it became clear that geostatistical models or conditional autoregressive (CAR) models over the three dimensions were not the best way to approach the data. Instead CAR models are used with neighbours only in the same depth layer. This gave flexibility to the model, allowing both the spatially structured and non-structured variances to differ at all depths. We call this model the CAR layered model. Given the experimental design, the fixed part of the model could have been modelled as a set of means by treatment and by depth, but doing so allows little insight into how the treatment effects vary with depth. Hence, a number of essentially non-parametric approaches were taken to see the effects of depth on treatment, with the model of choice incorporating an errors-in-variables approach for depth in addition to a non-parametric smooth. The statistical contribution here was the introduction of the CAR layered model, the applied contribution the analysis of moisture over depth and estimation of the contrast of interest together with its credible intervals. These models were fitted using WinBUGS [Lunn et al., 2000]. The work in the fifth paper deals with the fact that with large datasets, the use of WinBUGS becomes more problematic because of its highly correlated term by term updating. In this work, we introduce a Gibbs sampler with block updating for the CAR layered model. The Gibbs sampler was implemented by Chris Strickland using pyMCMC [Strickland, 2010]. This framework is then used to consider five days data, and we show that moisture in the soil for all the various treatments reaches levels particular to each treatment at a depth of 200 cm and thereafter stays constant, albeit with increasing variances with depth. In an analysis across three spatial dimensions and across time, there are many interactions of time and the spatial dimensions to be considered. Hence, we chose to use a daily model and to repeat the analysis at all time points, effectively creating an interaction model of time by the daily model. Such an approach allows great flexibility. However, this approach does not allow insight into the way in which the parameter of interest varies over time. Hence, a two-stage approach was also used, with estimates from the first-stage being analysed as a set of time series. We see this spatio-temporal interaction model as being a useful approach to data measured across three spatial dimensions and time, since it does not assume additivity of the random spatial or temporal effects.
Resumo:
Modern technology now has the ability to generate large datasets over space and time. Such data typically exhibit high autocorrelations over all dimensions. The field trial data motivating the methods of this paper were collected to examine the behaviour of traditional cropping and to determine a cropping system which could maximise water use for grain production while minimising leakage below the crop root zone. They consist of moisture measurements made at 15 depths across 3 rows and 18 columns, in the lattice framework of an agricultural field. Bayesian conditional autoregressive (CAR) models are used to account for local site correlations. Conditional autoregressive models have not been widely used in analyses of agricultural data. This paper serves to illustrate the usefulness of these models in this field, along with the ease of implementation in WinBUGS, a freely available software package. The innovation is the fitting of separate conditional autoregressive models for each depth layer, the ‘layered CAR model’, while simultaneously estimating depth profile functions for each site treatment. Modelling interest also lay in how best to model the treatment effect depth profiles, and in the choice of neighbourhood structure for the spatial autocorrelation model. The favoured model fitted the treatment effects as splines over depth, and treated depth, the basis for the regression model, as measured with error, while fitting CAR neighbourhood models by depth layer. It is hierarchical, with separate onditional autoregressive spatial variance components at each depth, and the fixed terms which involve an errors-in-measurement model treat depth errors as interval-censored measurement error. The Bayesian framework permits transparent specification and easy comparison of the various complex models compared.
Resumo:
In this paper, spatially offset Raman spectroscopy (SORS) is demonstrated for non-invasively investigating the composition of drug mixtures inside an opaque plastic container. The mixtures consisted of three components including a target drug (acetaminophen or phenylephrine hydrochloride) and two diluents (glucose and caffeine). The target drug concentrations ranged from 5% to 100%. After conducting SORS analysis to ascertain the Raman spectra of the concealed mixtures, principal component analysis (PCA) was performed on the SORS spectra to reveal trends within the data. Partial least squares (PLS) regression was used to construct models that predicted the concentration of each target drug, in the presence of the other two diluents. The PLS models were able to predict the concentration of acetaminophen in the validation samples with a root-mean-square error of prediction (RMSEP) of 3.8% and the concentration of phenylephrine hydrochloride with an RMSEP of 4.6%. This work demonstrates the potential of SORS, used in conjunction with multivariate statistical techniques, to perform non-invasive, quantitative analysis on mixtures inside opaque containers. This has applications for pharmaceutical analysis, such as monitoring the degradation of pharmaceutical products on the shelf, in forensic investigations of counterfeit drugs, and for the analysis of illicit drug mixtures which may contain multiple components.
Resumo:
Concerns regarding groundwater contamination with nitrate and the long-term sustainability of groundwater resources have prompted the development of a multi-layered three dimensional (3D) geological model to characterise the aquifer geometry of the Wairau Plain, Marlborough District, New Zealand. The 3D geological model which consists of eight litho-stratigraphic units has been subsequently used to synthesise hydrogeological and hydrogeochemical data for different aquifers in an approach that aims to demonstrate how integration of water chemistry data within the physical framework of a 3D geological model can help to better understand and conceptualise groundwater systems in complex geological settings. Multivariate statistical techniques(e.g. Principal Component Analysis and Hierarchical Cluster Analysis) were applied to groundwater chemistry data to identify hydrochemical facies which are characteristic of distinct evolutionary pathways and a common hydrologic history of groundwaters. Principal Component Analysis on hydrochemical data demonstrated that natural water-rock interactions, redox potential and human agricultural impact are the key controls of groundwater quality in the Wairau Plain. Hierarchical Cluster Analysis revealed distinct hydrochemical water quality groups in the Wairau Plain groundwater system. Visualisation of the results of the multivariate statistical analyses and distribution of groundwater nitrate concentrations in the context of aquifer lithology highlighted the link between groundwater chemistry and the lithology of host aquifers. The methodology followed in this study can be applied in a variety of hydrogeological settings to synthesise geological, hydrogeological and hydrochemical data and present them in a format readily understood by a wide range of stakeholders. This enables a more efficient communication of the results of scientific studies to the wider community.
Resumo:
With increasing rate of shipping traffic, the risk of collisions in busy and congested port waters is likely to rise. However, due to low collision frequencies in port waters, it is difficult to analyze such risk in a sound statistical manner. A convenient approach of investigating navigational collision risk is the application of the traffic conflict techniques, which have potential to overcome the difficulty of obtaining statistical soundness. This study aims at examining port water conflicts in order to understand the characteristics of collision risk with regard to vessels involved, conflict locations, traffic and kinematic conditions. A hierarchical binomial logit model, which considers the potential correlations between observation-units, i.e., vessels, involved in the same conflicts, is employed to evaluate the association of explanatory variables with conflict severity levels. Results show higher likelihood of serious conflicts for vessels of small gross tonnage or small overall length. The probability of serious conflict also increases at locations where vessels have more varied headings, such as traffic intersections and anchorages; becoming more critical at night time. Findings from this research should assist both navigators operating in port waters as well as port authorities overseeing navigational management.
Resumo:
Objective To identify the spatial and temporal clusters of Barmah Forest virus (BFV) disease in Queensland in Australia, using geographical information systems (GIS) and spatial scan statistic (SaTScan). Methods We obtained BFV disease cases, population and statistical local areas boundary data from Queensland Health and Australian Bureau of Statistics respectively during 1992-2008 for Queensland. A retrospective Poisson-based analysis using SaTScan software and method was conducted in order to identify both purely spatial and space-time BFV disease high-rate clusters. A spatial cluster size of a proportion of the population and a 200km circle radius and varying time windows from 1 month to 12 months were chosen (for the space-time analysis). Results The spatial scan statistic detected a most likely significant purely spatial cluster (including 23 SLAs) and a most likely significant space-time cluster (including 24 SLAs) in approximately the same location. Significant secondary clusters were also identified from both the analyses in several locations. Conclusions This study provides evidence of the existence of statistically significant BFV disease clusters in Queensland, Australia. The study also demonstrated the relevance and applicability of SaTScan in analysing on-going surveillance data to identify clusters to facilitate the development of effective BFV disease prevention and control strategies in Queensland, Australia.
Resumo:
Crop simulation models have the potential to assess the risk associated with the selection of a specific N fertilizer rate, by integrating the effects of soil-crop interactions on crop growth under different pedo-climatic and management conditions. The objective of this study was to simulate the environmental and economic impact (nitrate leaching and N2O emissions) of a spatially variable N fertilizer application in an irrigated maize field in Italy. The validated SALUS model was run with 5 nitrogen rates scenarios, 50, 100, 150, 200, and 250 kg N ha−1, with the latter being the N fertilization adopted by the farmer. The long-term (25 years) simulations were performed on two previously identified spatially and temporally stable zones, a high yielding and low yielding zone. The simulation results showed that N fertilizer rate can be reduced without affecting yield and net return. The marginal net return was on average higher for the high yield zone, with values ranging from 1550 to 2650 € ha−1 for the 200 N and 1485 to 2875 € ha−1 for the 250 N. N leaching varied between 16.4 and 19.3 kg N ha−1 for the 200 N and the 250 N in the high yield zone. In the low yield zone, the 250 N had a significantly higher N leaching. N2O emissions varied between 0.28 kg N2O ha−1 for the 50 kg N ha−1 rate to a maximum of 1.41 kg N2O ha−1 for the 250 kg N ha−1 rate.
Resumo:
Treatment plans for conformal radiotherapy are based on an initial CT scan. The aim is to deliver the prescribed dose to the tumour, while minimising exposure to nearby organs. Recent advances make it possible to also obtain a Cone-Beam CT (CBCT) scan, once the patient has been positioned for treatment. A statistical model will be developed to compare these CBCT scans with the initial CT scan. Changes in the size, shape and position of the tumour and organs will be detected and quantified. Some progress has already been made in segmentation of prostate CBCT scans [1],[2],[3]. However, none of the existing approaches have taken full advantage of the prior information that is available. The planning CT scan is expertly annotated with contours of the tumour and nearby sensitive objects. This data is specific to the individual patient and can be viewed as a snapshot of spatial information at a point in time. There is an abundance of studies in the radiotherapy literature that describe the amount of variation in the relevant organs between treatments. The findings from these studies can form a basis for estimating the degree of uncertainty. All of this information can be incorporated as an informative prior into a Bayesian statistical model. This model will be developed using scans of CT phantoms, which are objects with known geometry. Thus, the accuracy of the model can be evaluated objectively. This will also enable comparison between alternative models.
Resumo:
Certain statistic and scientometric features of articles published in the journal “International Research in Geographical and Environmental Education” are examined in this paper, for the period 1992-2009, by applying nonparametric statistics and Shannon’s entropy (diversity) formula. The main findings of this analysis are: a) after 2004 the research priorities of researchers in geographical and environmental education seem to have changed, b) “teacher education” has been the most recurrent theme throughout these 18 years, followed by “values & attitudes” and “inquiry & problem solving” c) the themes “GIS” and “Sustainability” were the most “stable” throughout the 18 years, meaning that they maintained their ranks as publication priorities more than other themes, d) citations of IRGEE increase annually, e) the average thematic diversity of articles published during the period 1992-2009 is 82.7% of the maximum thematic diversity (very high), meaning that the Journal has the capacity to attract a wide readership for the 10 themes it has successfully covered throughout the 18 years of its publication.
Resumo:
Matched case–control research designs can be useful because matching can increase power due to reduced variability between subjects. However, inappropriate statistical analysis of matched data could result in a change in the strength of association between the dependent and independent variables or a change in the significance of the findings. We sought to ascertain whether matched case–control studies published in the nursing literature utilized appropriate statistical analyses. Of 41 articles identified that met the inclusion criteria, 31 (76%) used an inappropriate statistical test for comparing data derived from case subjects and their matched controls. In response to this finding, we developed an algorithm to support decision-making regarding statistical tests for matched case–control studies.
Resumo:
We define a pair-correlation function that can be used to characterize spatiotemporal patterning in experimental images and snapshots from discrete simulations. Unlike previous pair-correlation functions, the pair-correlation functions developed here depend on the location and size of objects. The pair-correlation function can be used to indicate complete spatial randomness, aggregation or segregation over a range of length scales, and quantifies spatial structures such as the shape, size and distribution of clusters. Comparing pair-correlation data for various experimental and simulation images illustrates their potential use as a summary statistic for calibrating discrete models of various physical processes.
Resumo:
Operational modal analysis (OMA) is prevalent in modal identifi cation of civil structures. It asks for response measurements of the underlying structure under ambient loads. A valid OMA method requires the excitation be white noise in time and space. Although there are numerous applications of OMA in the literature, few have investigated the statistical distribution of a measurement and the infl uence of such randomness to modal identifi cation. This research has attempted modifi ed kurtosis to evaluate the statistical distribution of raw measurement data. In addition, a windowing strategy employing this index has been proposed to select quality datasets. In order to demonstrate how the data selection strategy works, the ambient vibration measurements of a laboratory bridge model and a real cable-stayed bridge have been respectively considered. The analysis incorporated with frequency domain decomposition (FDD) as the target OMA approach for modal identifi cation. The modal identifi cation results using the data segments with different randomness have been compared. The discrepancy in FDD spectra of the results indicates that, in order to fulfi l the assumption of an OMA method, special care shall be taken in processing a long vibration measurement data. The proposed data selection strategy is easy-to-apply and verifi ed effective in modal analysis.