910 resultados para Missing data
Resumo:
This dissertation develops the model of a prototype system for the digital lodgement of spatial data sets with statutory bodies responsible for the registration and approval of land related actions under the Torrens Title system. Spatial data pertain to the location of geographical entities together with their spatial dimensions and are classified as point, line, area or surface. This dissertation deals with a sub-set of spatial data, land boundary data that result from the activities performed by surveying and mapping organisations for the development of land parcels. The prototype system has been developed, utilising an event-driven paradigm for the user-interface, to exploit the potential of digital spatial data being generated from the utilisation of electronic techniques. The system provides for the creation of a digital model of the cadastral network and dependent data sets for an area of interest from hard copy records. This initial model is calibrated on registered control and updated by field survey to produce an amended model. The field-calibrated model then is electronically validated to ensure it complies with standards of format and content. The prototype system was designed specifically to create a database of land boundary data for subsequent retrieval by land professionals for surveying, mapping and related activities. Data extracted from this database are utilised for subsequent field survey operations without the need to create an initial digital model of an area of interest. Statistical reporting of differences resulting when subsequent initial and calibrated models are compared, replaces the traditional checking operations of spatial data performed by a land registry office. Digital lodgement of survey data is fundamental to the creation of the database of accurate land boundary data. This creation of the database is fundamental also to the efficient integration of accurate spatial data about land being generated by modem technology such as global positioning systems, and remote sensing and imaging, with land boundary information and other information held in Government databases. The prototype system developed provides for the delivery of accurate, digital land boundary data for the land registration process to ensure the continued maintenance of the integrity of the cadastre. Such data should meet also the more general and encompassing requirements of, and prove to be of tangible, longer term benefit to the developing, electronic land information industry.
Resumo:
This dissertation is primarily an applied statistical modelling investigation, motivated by a case study comprising real data and real questions. Theoretical questions on modelling and computation of normalization constants arose from pursuit of these data analytic questions. The essence of the thesis can be described as follows. Consider binary data observed on a two-dimensional lattice. A common problem with such data is the ambiguity of zeroes recorded. These may represent zero response given some threshold (presence) or that the threshold has not been triggered (absence). Suppose that the researcher wishes to estimate the effects of covariates on the binary responses, whilst taking into account underlying spatial variation, which is itself of some interest. This situation arises in many contexts and the dingo, cypress and toad case studies described in the motivation chapter are examples of this. Two main approaches to modelling and inference are investigated in this thesis. The first is frequentist and based on generalized linear models, with spatial variation modelled by using a block structure or by smoothing the residuals spatially. The EM algorithm can be used to obtain point estimates, coupled with bootstrapping or asymptotic MLE estimates for standard errors. The second approach is Bayesian and based on a three- or four-tier hierarchical model, comprising a logistic regression with covariates for the data layer, a binary Markov Random field (MRF) for the underlying spatial process, and suitable priors for parameters in these main models. The three-parameter autologistic model is a particular MRF of interest. Markov chain Monte Carlo (MCMC) methods comprising hybrid Metropolis/Gibbs samplers is suitable for computation in this situation. Model performance can be gauged by MCMC diagnostics. Model choice can be assessed by incorporating another tier in the modelling hierarchy. This requires evaluation of a normalization constant, a notoriously difficult problem. Difficulty with estimating the normalization constant for the MRF can be overcome by using a path integral approach, although this is a highly computationally intensive method. Different methods of estimating ratios of normalization constants (N Cs) are investigated, including importance sampling Monte Carlo (ISMC), dependent Monte Carlo based on MCMC simulations (MCMC), and reverse logistic regression (RLR). I develop an idea present though not fully developed in the literature, and propose the Integrated mean canonical statistic (IMCS) method for estimating log NC ratios for binary MRFs. The IMCS method falls within the framework of the newly identified path sampling methods of Gelman & Meng (1998) and outperforms ISMC, MCMC and RLR. It also does not rely on simplifying assumptions, such as ignoring spatio-temporal dependence in the process. A thorough investigation is made of the application of IMCS to the three-parameter Autologistic model. This work introduces background computations required for the full implementation of the four-tier model in Chapter 7. Two different extensions of the three-tier model to a four-tier version are investigated. The first extension incorporates temporal dependence in the underlying spatio-temporal process. The second extensions allows the successes and failures in the data layer to depend on time. The MCMC computational method is extended to incorporate the extra layer. A major contribution of the thesis is the development of a fully Bayesian approach to inference for these hierarchical models for the first time. Note: The author of this thesis has agreed to make it open access but invites people downloading the thesis to send her an email via the 'Contact Author' function.