951 results for Spatial Data Quality
Abstract:
Online geographic information systems provide the means to extract a subset of desired spatial information from a larger remote repository. Data retrieved representing real-world geographic phenomena are then manipulated to suit the specific needs of an end-user. Often this extraction requires the derivation of representations of objects specific to a particular resolution or scale from a single original stored version. Currently, standard spatial data handling techniques cannot support the multi-resolution representation of such features in a database. In this paper, a methodology to store and retrieve versions of spatial objects at different resolutions with respect to scale using standard database primitives and SQL is presented. The technique involves heavy fragmentation of spatial features that allows dynamic simplification into scale-specific object representations customised to the display resolution of the end-user's device. Experimental results comparing the new approach to traditional R-Tree indexing and external object simplification reveal that the new approach performs notably better for mobile and WWW applications where client-side resources are limited and retrieved data loads are kept relatively small.
Abstract:
This paper develops an Internet geographical information system (GIS) and spatial model application that provides socio-economic information and exploratory spatial data analysis for local government authorities (LGAs) in Queensland, Australia. The application aims to improve the means by which large quantities of data may be analysed, manipulated and displayed in order to highlight trends and patterns as well as provide performance benchmarking that is readily understandable and easily accessible for decision-makers. Measures of attribute similarity and spatial proximity are combined in a clustering model with a spatial autocorrelation index for exploratory spatial data analysis to support the identification of spatial patterns of change. Analysis of socio-economic changes in Queensland is presented. The results demonstrate the usefulness and potential appeal of the Internet GIS applications as a tool to inform the process of regional analysis, planning and policy.
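The abstract above does not say which spatial autocorrelation index the application uses. As a rough illustration only, the sketch below assumes global Moran's I, a common choice, computed over a hypothetical binary neighbour matrix; the function name and the toy data are not from the paper.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weight matrix.

    Assumption: Moran's I stands in for the unnamed index in the abstract.
    weights[i][j] > 0 means regions i and j are treated as neighbours.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between every pair of regions.
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


# Hypothetical example: four regions in a row, each adjacent to the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

smooth_trend = morans_i([1, 2, 3, 4], chain)    # positive: similar neighbours
checkerboard = morans_i([1, -1, 1, -1], chain)  # negative: dissimilar neighbours
```

Values above zero suggest that similar attribute values cluster in space, and values below zero suggest that neighbouring regions tend to differ.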
Abstract:
We combine spatial data on home ranges of individuals and microsatellite markers to examine patterns of fine-scale spatial genetic structure and dispersal within a brush-tailed rock-wallaby (Petrogale penicillata) colony at Hurdle Creek Valley, Queensland. Brush-tailed rock-wallabies were once abundant and widespread throughout the rocky terrain of southeastern Australia; however, populations are nearly extinct in the south of their range and in decline elsewhere. We use pairwise relatedness measures and a recent multilocus spatial autocorrelation analysis to test the hypotheses that in this species, within-colony dispersal is male-biased and that female philopatry results in spatial clusters of related females within the colony. We provide clear evidence for strong female philopatry and male-biased dispersal within this rock-wallaby colony. There was a strong, significant negative correlation between pairwise relatedness and geographical distance of individual females along only 800 m of cliff line. Spatial genetic autocorrelation analyses showed significant positive correlation for females in close proximity to each other and revealed a genetic neighbourhood size of only 600 m for females. Our study is the first to report on the fine-scale spatial genetic structure within a rock-wallaby colony and we provide the first robust evidence for strong female philopatry and spatial clustering of related females within this taxon. We discuss the ecological and conservation implications of our findings for rock-wallabies, as well as the importance of fine-scale spatial genetic patterns in studies of dispersal behaviour.
Abstract:
The data structure of an information system can significantly impact the ability of end users to efficiently and effectively retrieve the information they need. This research develops a methodology for evaluating, ex ante, the relative desirability of alternative data structures for end user queries. This research theorizes that the data structure that yields the lowest weighted average complexity for a representative sample of information requests is the most desirable data structure for end user queries. The theory was tested in an experiment that compared queries from two different relational database schemas. As theorized, end users querying the data structure associated with the less complex queries performed better. Complexity was measured using three different Halstead metrics. Each of the three metrics provided excellent predictions of end user performance. This research supplies strong evidence that organizations can use complexity metrics to evaluate, ex ante, the desirability of alternative data structures. Organizations can use these evaluations to enhance the efficient and effective retrieval of information by creating data structures that minimize end user query complexity.
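The abstract above does not state which three Halstead metrics were used or how queries were tokenized. The sketch below uses the standard Halstead definitions (length, vocabulary, volume, difficulty, effort) and assumes, purely for illustration, that each query has already been split into operator and operand tokens; the example query and weights are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Halstead:
    length: int
    vocabulary: int
    volume: float
    difficulty: float
    effort: float


def halstead(operators, operands):
    """Standard Halstead measures from lists of operator and operand tokens."""
    n1, n2 = len(set(operators)), len(set(operands))  # distinct counts
    big_n1, big_n2 = len(operators), len(operands)    # total counts
    vocabulary = n1 + n2
    length = big_n1 + big_n2
    volume = length * math.log2(vocabulary) if vocabulary > 0 else 0.0
    difficulty = (n1 / 2) * (big_n2 / n2) if n2 else 0.0
    return Halstead(length, vocabulary, volume, difficulty, difficulty * volume)


def weighted_average_volume(queries):
    """Weighted average Halstead volume over (weight, operators, operands) triples."""
    total_weight = sum(weight for weight, _, _ in queries)
    return sum(
        weight * halstead(ops, opnds).volume for weight, ops, opnds in queries
    ) / total_weight


# Hypothetical tokenization of: SELECT name FROM customer WHERE id = 42
ops = ["SELECT", "FROM", "WHERE", "="]
opnds = ["name", "customer", "id", "42"]
metrics = halstead(ops, opnds)

# Two sample information requests against one schema, weighted 1 and 3.
schema_score = weighted_average_volume(
    [(1, ops, opnds), (3, ["SELECT", "FROM"], ["a", "t"])]
)
```

Under the theory described in the abstract, the schema with the lower weighted average score over a representative sample of requests would be the preferred one.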
Abstract:
X-ray crystallography is the most powerful method for determining the three-dimensional structure of biological macromolecules. One of the major obstacles in the process is the production of high-quality crystals for structure determination. All too often, crystals are produced that are of poor quality and are unsuitable for diffraction studies. This review provides a compilation of post-crystallization methods that can convert poorly diffracting crystals into data-quality crystals. Protocols for annealing, dehydration, soaking and cross-linking are outlined and examples of some spectacular changes in crystal quality are provided. The protocols are easily incorporated into the structure-determination pipeline and a practical guide is provided that shows how and when to use the different post-crystallization treatments for improving crystal quality.
Abstract:
We have performed a systematic temporal and spatial expression profiling of the developing mouse kidney using Compugen long-oligonucleotide microarrays. The activity of 18,000 genes was monitored at 24-h intervals from 10.5-day-postcoitum (dpc) metanephric mesenchyme (MM) through to neonatal kidney, and a cohort of 3,600 dynamically expressed genes was identified. Early metanephric development was further surveyed by directly comparing RNA from 10.5 vs. 11.5 vs. 13.5dpc kidneys. These data showed high concordance with the previously published dynamic profile of rat kidney development (Stuart RO, Bush KT, and Nigam SK. Proc Natl Acad Sci USA 98: 5649-5654, 2001) and our own temporal data. Cluster analyses were used to identify gene ontological terms, functional annotations, and pathways associated with temporal expression profiles. Genetic network analysis was also used to identify biological networks that have maximal transcriptional activity during early metanephric development, highlighting the involvement of proliferation and differentiation. Differential gene expression was validated using whole mount and section in situ hybridization of staged embryonic kidneys. Two spatial profiling experiments were also undertaken. MM (10.5dpc) was compared with adjacent intermediate mesenchyme to further define metanephric commitment. To define the genes involved in branching and in the induction of nephrogenesis, expression profiling was performed on ureteric bud (GFP+) FACS sorted from HoxB7-GFP transgenic mice at 15.5dpc vs. the GFP- mesenchymal derivatives. Comparisons between temporal and spatial data enhanced the ability to predict function for genes and networks. This study provides the most comprehensive temporal and spatial survey of kidney development to date, and the compilation of these transcriptional surveys provides important insights into metanephric development that can now be functionally tested.
Abstract:
Spatial data has now been used extensively in the Web environment, providing online customized maps and supporting map-based applications. The full potential of Web-based spatial applications, however, has yet to be achieved due to performance issues related to the large sizes and high complexity of spatial data. In this paper, we introduce a multiresolution approach to spatial data management and query processing such that the database server can choose spatial data at the right resolution level for different Web applications. One highly desirable property of the proposed approach is that the server-side processing cost and network traffic can be reduced when the level of resolution required by applications is low. Another advantage is that our approach pushes complex multiresolution structures and algorithms into the spatial database engine. That is, the developer of spatial Web applications need not be concerned with such complexity. This paper explains the basic idea, technical feasibility and applications of multiresolution spatial databases.
Abstract:
Spatial data are particularly useful in mobile environments. However, due to the low bandwidth of most wireless networks, developing large spatial database applications becomes a challenging process. In this paper, we make the first attempt to combine two important techniques, multiresolution spatial data structure and semantic caching, towards efficient spatial query processing in mobile environments. Based on the study of the characteristics of multiresolution spatial data (MSD) and multiresolution spatial query, we propose a new semantic caching model called Multiresolution Semantic Caching (MSC) for caching MSD in mobile environments. MSC enriches the traditional three-category query processing in semantic cache to five categories, thus improving the performance in three ways: 1) a reduction in the amount and complexity of the remainder queries; 2) avoidance of redundant transmission of spatial data already residing in a cache; 3) provision of satisfactory answers before 100% of the query results have been transmitted to the client side. Our extensive experiments on a very large and complex real spatial database show that MSC significantly outperforms the traditional semantic caching models.
Abstract:
Client-side caching of spatial data is an important yet very much under-investigated issue. Effective caching of vector spatial data has the potential to greatly improve the performance of spatial applications in the Web and wireless environments. In this paper, we study the problem of semantic spatial caching, focusing on effective organization of spatial data and spatial query trimming to take advantage of cached data. Semantic caching for spatial data is a much more complex problem than semantic caching for aspatial data. Several novel ideas are proposed in this paper for spatial applications. A number of typical spatial application scenarios are used to generate spatial query sequences. An extensive experimental performance study is conducted based on these scenarios using real spatial data. We demonstrate a significant performance improvement using our ideas.
Abstract:
Even when data repositories exhibit near perfect data quality, users may formulate queries that do not correspond to the information requested. Users’ poor information retrieval performance may arise either from problems understanding the data models that represent real-world systems or from their query skills. This research focuses on users’ understanding of the data structures, i.e., their ability to map the information request to the data model. The Bunge-Wand-Weber ontology was used to formulate three sets of hypotheses. Two laboratory experiments (one using a small data model and one using a larger data model) tested the effect of ontological clarity on users’ performance when undertaking component, record, and aggregate level tasks. For the hypotheses associated with different representations but equivalent semantics, the results indicate that participants using the parsimonious data model performed better on component level tasks, whereas participants using the ontologically clearer data model performed better on record and aggregate level tasks.
Abstract:
Spatial data mining has recently emerged from a number of real applications, such as real-estate marketing, urban planning, weather forecasting, medical image analysis, road traffic accident analysis, etc. It demands efficient solutions for many new, expensive, and complicated problems. In this paper, we investigate the problem of evaluating the top k distinguished “features” for a “cluster” based on weighted proximity relationships between the cluster and features. We measure proximity in an average fashion to address possible nonuniform data distribution in a cluster. Combining a standard multi-step paradigm with new lower and upper proximity bounds, we present an efficient algorithm to solve the problem. The algorithm is implemented in several different modes. Our experimental results not only give a comparison among them but also illustrate the efficiency of the algorithm.
Abstract:
Existing theories of semantic cognition propose models of cognitive processing occurring in a conceptual space, where ‘meaning’ is derived from the spatial relationships between concepts’ mapped locations within the space. Information visualisation is a growing area of research within the field of information retrieval, and methods for presenting database contents visually in the form of spatial data management systems (SDMSs) are being developed. This thesis combined these two areas of research to investigate the benefits associated with employing spatial-semantic mapping (documents represented as objects in two- and three-dimensional virtual environments are proximally mapped dependent on the semantic similarity of their content) as a tool for improving retrieval performance and navigational efficiency when browsing for information within such systems. Positive effects associated with the quality of document mapping were observed; improved retrieval performance and browsing behaviour were witnessed when mapping was optimal. It was also shown that using a third dimension for virtual environment (VE) presentation provides sufficient additional information regarding the semantic structure of the environment that performance is increased in comparison to using two dimensions for mapping. A model that describes the relationship between retrieval performance and browsing behaviour was proposed on the basis of the findings. Individual differences were not found to have any observable influence on retrieval performance or browsing behaviour when mapping quality was good. The findings from this work have implications both for cognitive modelling of semantic information and for designing and testing information visualisation systems. These implications are discussed in the conclusions of this work.
Abstract:
Geospatial data have become a crucial input for the scientific community for understanding the environment and developing environmental management policies. The Global Earth Observation System of Systems (GEOSS) Clearinghouse is a catalogue and search engine that provides access to the Earth Observation metadata. However, metadata are often not easily understood by users, especially when presented in ISO XML encoding. Data quality information included in the metadata is essential for users to select datasets suitable for them. This work aims to help users to understand the quality information held in metadata records and to provide the results to geospatial users in an understandable and comparable way. Thus, we have developed an enhanced tool (Rubric-Q) for visually assessing the metadata quality information and quantifying the degree of metadata population. Rubric-Q is an extension of a previous NOAA Rubric tool used as a metadata training and improvement instrument. The paper also presents a thorough assessment of the quality information by applying the Rubric-Q to all dataset metadata records available in the GEOSS Clearinghouse. The results reveal that just 8.7% of the datasets have some quality element described in the metadata, 63.4% have some lineage element documented, and merely 1.2% have some usage element described. © 2013 IEEE.
Abstract:
The purpose of this work is to argue that engineers can be motivated to study statistical concepts through applications from their own experience that are connected with statistical ideas. The main idea is to choose data from a manufacturing facility (for example, output from a CMM machine) and explain that even if the parts used do not meet exact specifications, they are used in production. By graphing the data, one can show that the error is random but follows a distribution; that is, there is regularity in the data in a statistical sense. As the error distribution is continuous, we advocate that the concept of randomness be introduced starting with continuous random variables, with probabilities connected with areas under the density. Discrete random variables are then introduced in terms of decisions connected with the size of the errors, before generalizing to the abstract concept of probability. Using software, engineers can then be motivated to study statistical analysis of the data they encounter and to use this analysis to make engineering and management decisions.
Abstract:
As massive data sets become increasingly available, people are facing the problem of how to effectively process and understand these data. Traditional sequential computing models are giving way to parallel and distributed computing models, such as MapReduce, due both to the large size of the data sets and to their high dimensionality. This dissertation, in line with other research based on MapReduce, tries to develop effective techniques and applications using MapReduce that can help people solve large-scale problems. Three different problems are tackled in the dissertation. The first one deals with processing terabytes of raster data in a spatial data management system. Aerial imagery files are broken into tiles to enable data parallel computation. The second and third problems deal with dimension reduction techniques that can be used to handle data sets of high dimensionality. Three variants of the nonnegative matrix factorization technique are scaled up to factorize matrices with dimensions on the order of millions in MapReduce based on different matrix multiplication implementations. Two algorithms, which compute CANDECOMP/PARAFAC and Tucker tensor decompositions respectively, are parallelized in MapReduce based on carefully partitioning the data and arranging the computation to maximize data locality and parallelism.
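The tiling step mentioned in the abstract above can be pictured with a small in-memory sketch. This is not the dissertation's implementation and does not use a MapReduce framework: the function names, the tile size, and the toy raster are assumptions, and a plain list comprehension plays the role of the map phase.

```python
def tile_windows(width, height, tile_size):
    """Yield (col_off, row_off, tile_width, tile_height) windows covering a raster.

    Tiles on the right and bottom edges are clipped to the raster bounds.
    """
    for row_off in range(0, height, tile_size):
        for col_off in range(0, width, tile_size):
            yield (
                col_off,
                row_off,
                min(tile_size, width - col_off),
                min(tile_size, height - row_off),
            )


def tile_sum(raster, window):
    """Per-tile work (the 'map' step here): sum the pixel values in one window."""
    col, row, w, h = window
    return sum(sum(raster[r][col:col + w]) for r in range(row, row + h))


# Hypothetical 5 x 3 raster stored as a list of rows.
raster = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
]

windows = list(tile_windows(width=5, height=3, tile_size=2))
partial_sums = [tile_sum(raster, w) for w in windows]  # independent per tile
total = sum(partial_sums)                              # 'reduce' step
```

Because each window is processed independently, the per-tile calls could in principle be distributed across workers, with only the small partial results combined at the end.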