889 results for Spatial data warehouse
Abstract:
Spatial data mining has recently emerged from a number of real applications, such as real-estate marketing, urban planning, weather forecasting, medical image analysis, and road traffic accident analysis. It demands efficient solutions for many new, expensive, and complicated problems. In this paper, we investigate the problem of evaluating the top k distinguished “features” for a “cluster” based on weighted proximity relationships between the cluster and features. We measure proximity in an average fashion to address possible nonuniform data distribution in a cluster. Combining a standard multi-step paradigm with new lower and upper proximity bounds, we present an efficient algorithm to solve the problem. The algorithm is implemented in several different modes. Our experimental results not only compare these modes but also illustrate the efficiency of the algorithm.
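The multi-step (filter-and-refine) idea in this abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not the paper's algorithm: features are weighted 2-D points, proximity is the weight times the average Euclidean distance from the cluster's points (smaller means closer), and the lower bound is the weight times the distance from the cluster centroid, which never exceeds the average distance because the Euclidean norm is convex. The names `top_k_features` and `avg_proximity` are hypothetical.

```python
import heapq
import math


def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def avg_proximity(cluster, location):
    """Average distance from every cluster point to one feature location."""
    return sum(dist(p, location) for p in cluster) / len(cluster)


def top_k_features(cluster, features, k):
    """Return (ranked, refined): the k closest features and the refine count.

    features is a list of (name, (x, y), weight) with weight > 0 (assumed).
    """
    if k <= 0 or not cluster:
        return [], 0
    n = len(cluster)
    centroid = (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)

    # Filter step: a cheap lower bound on each feature's exact score.
    candidates = sorted(
        (w * dist(centroid, loc), name, loc, w) for name, loc, w in features
    )

    best = []  # max-heap (negated scores) holding the k smallest exact scores
    refined = 0
    for lower, name, loc, w in candidates:
        # Candidates are sorted by lower bound, so once the bound reaches the
        # current k-th best exact score no later candidate can improve on it.
        if len(best) == k and lower >= -best[0][0]:
            break
        exact = w * avg_proximity(cluster, loc)  # refine step
        refined += 1
        if len(best) < k:
            heapq.heappush(best, (-exact, name))
        elif exact < -best[0][0]:
            heapq.heapreplace(best, (-exact, name))

    ranked = sorted((-neg, name) for neg, name in best)
    return ranked, refined
```

With a compact cluster and a few distant features, the refine step is skipped for most candidates, which is the saving that the bounds are meant to provide; the upper bounds mentioned in the abstract are not modelled here.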
Abstract:
Indicators which summarise the characteristics of spatiotemporal data coverages significantly simplify quality evaluation, decision making and justification processes by providing a number of quality cues that are easy to manage and avoiding information overflow. Criteria which are commonly prioritised in evaluating spatial data quality and assessing a dataset’s fitness for use include lineage, completeness, logical consistency, positional accuracy, and temporal and attribute accuracy. However, user requirements may go far beyond these broadly accepted spatial quality metrics, to incorporate specific and complex factors which are less easily measured. This paper discusses the results of a study of high-level user requirements in geospatial data selection and data quality evaluation. It reports on the geospatial data quality indicators which were identified as user priorities, and which can potentially be standardised to enable intercomparison of datasets against user requirements. We briefly describe the implications for tools and standards to support the communication and intercomparison of data quality, and the ways in which these can contribute to the generation of a GEO label.
Abstract:
This thesis makes a contribution to the Change Data Capture (CDC) field by providing an empirical evaluation of the performance of CDC architectures in the context of real-time data warehousing. CDC is a mechanism for providing data warehouse architectures with fresh data from Online Transaction Processing (OLTP) databases. There are two types of CDC architectures: pull architectures and push architectures. There is little data on the performance of CDC architectures in a real-time environment. Performance data is required to determine the real-time viability of the two architectures. We propose that push CDC architectures are optimal for real-time CDC. However, push CDC architectures are seldom implemented because they are highly intrusive towards existing systems and arduous to maintain. As part of our contribution, we pragmatically develop a service-based push CDC solution, which addresses the issues of intrusiveness and maintainability. Our solution uses Data Access Services (DAS) to decouple CDC logic from the applications. A requirement for the DAS is to place minimal overhead on a transaction in an OLTP environment. We synthesize DAS literature and pragmatically develop DAS that efficiently execute transactions in an OLTP environment. Essentially, we develop efficient RESTful DAS, which expose Transactions As A Resource (TAAR). We evaluate the TAAR solution and three pull CDC mechanisms in a real-time environment, using the industry-recognised TPC-C benchmark. The optimal CDC mechanism in a real-time environment will capture change data with minimal latency and will have a negligible effect on the database's transactional throughput. Capture latency is the time it takes a CDC mechanism to capture a data change that has been applied to an OLTP database. A standard definition for capture latency and how to measure it does not exist in the field. We create this definition and extend the TPC-C benchmark to make the capture latency measurement.
The results from our evaluation show that pull CDC is capable of real-time CDC at low levels of user concurrency. However, as the level of user concurrency scales upwards, pull CDC has a significant impact on the database's transaction rate, which affirms the theory that pull CDC architectures are not viable in a real-time architecture. TAAR CDC on the other hand is capable of real-time CDC, and places a minimal overhead on the transaction rate, although this performance is at the expense of CPU resources.
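Capture latency, described in this abstract as the time a CDC mechanism takes to capture a change already applied to the OLTP database, can be illustrated with a short Python sketch. This is only one plausible reading, not the thesis's formal definition or its TPC-C extension: it assumes each change has an identifier, a commit timestamp and, once captured, a capture timestamp, all in seconds. The function names are hypothetical.

```python
from statistics import mean


def capture_latencies(commit_times, capture_times):
    """Map change id -> seconds between commit and capture.

    Changes that have not been captured yet are skipped (assumed behaviour).
    """
    latencies = {}
    for change_id, committed_at in commit_times.items():
        captured_at = capture_times.get(change_id)
        if captured_at is None:
            continue  # change committed but not yet seen by the CDC mechanism
        latencies[change_id] = captured_at - committed_at
    return latencies


def summarise(latencies):
    """Return (mean, max) latency, or (None, None) when nothing was captured."""
    if not latencies:
        return None, None
    values = list(latencies.values())
    return mean(values), max(values)
```

A pull mechanism that polls periodically would be expected to show latencies bounded by its polling interval, while a push mechanism would report changes as they commit; the sketch makes no claim about either beyond the subtraction itself.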
Abstract:
As massive data sets become increasingly available, people are facing the problem of how to effectively process and understand these data. Traditional sequential computing models are giving way to parallel and distributed computing models, such as MapReduce, both due to the large size of the data sets and their high dimensionality. This dissertation, in the same direction as other research based on MapReduce, tries to develop effective techniques and applications using MapReduce that can help people solve large-scale problems. Three different problems are tackled in the dissertation. The first one deals with processing terabytes of raster data in a spatial data management system. Aerial imagery files are broken into tiles to enable data-parallel computation. The second and third problems deal with dimension reduction techniques that can be used to handle data sets of high dimensionality. Three variants of the nonnegative matrix factorization technique are scaled up to factorize matrices of dimensions in the order of millions in MapReduce based on different matrix multiplication implementations. Two algorithms, which compute CANDECOMP/PARAFAC and Tucker tensor decompositions respectively, are parallelized in MapReduce based on carefully partitioning the data and arranging the computation to maximize data locality and parallelism.
Abstract:
This dissertation documents the everyday lives and spaces of a population of youth typically constructed as out of place, and the broader urban context in which they are rendered as such. Thirty-three female and transgender street youth participated in the development of this youth-based participatory action research (YPAR) project utilizing geo-ethnographic methods, auto-photography, and archival research throughout a six-phase, eighteen-month research process in Bogotá, Colombia. This dissertation details the participatory writing process that enabled the YPAR research team to destabilize dominant representations of both street girls and urban space and the participatory mapping process that enabled the development of a youth vision of the city through cartographic images. The maps display individual and aggregate spatial data indicating trends within and making comparisons between three subgroups of the research population according to nine spatial variables. These spatial data, coupled with photographic and ethnographic data, substantiate that street girls’ mobilities and activity spaces intersect with and are altered by state-sponsored urban renewal projects and paramilitary-led social cleansing killings, both efforts to clean up Bogotá by purging the city center of deviant populations and places. Advancing an ethical approach to conducting research with excluded populations, this dissertation argues for the enactment of critical field praxis and care ethics within a YPAR framework to incorporate young people as principal research actors rather than merely voices represented in adultist academic discourse. Interjection of considerations of space, gender, and participation into the study of street youth produces new ways of envisioning the city and the role of young people in research.
Instead of seeing the city from a panoptic view, Bogotá is revealed through the eyes of street youth who participated in the construction and feminist visualization of a new cartography and counter-map of the city grounded in embodied, situated praxis. This dissertation presents a socially responsible approach to conducting action-research with high-risk youth by documenting how street girls reclaim their right to the city on paper and in practice: through maps of their everyday exclusion in Bogotá followed by activism to fight against it.
Abstract:
Modern geographical databases, which are at the core of geographic information systems (GIS), store a rich set of aspatial attributes in addition to geographic data. Typically, aspatial information comes in textual and numeric format. Retrieving information constrained on spatial and aspatial data from geodatabases gives GIS users the ability to perform more interesting spatial analyses, and allows applications to support composite location-aware searches; for example, in a real estate database: “Find the nearest homes for sale to my current location that have backyard and whose prices are between $50,000 and $80,000”. Efficient processing of such queries requires combined indexing strategies for multiple types of data. Existing spatial query engines commonly apply a two-filter approach (spatial filter followed by nonspatial filter, or vice versa), which can incur large performance overheads. On the other hand, more recently, the amount of geolocation data in databases has grown rapidly, due in part to advances in geolocation technologies (e.g., GPS-enabled smartphones) that allow users to associate location data with objects or events. This growth poses potential data ingestion challenges of large data volumes for practical GIS databases. In this dissertation, we first show how indexing spatial data with R-trees (a typical data pre-processing task) can be scaled in MapReduce, a widely adopted parallel programming model for data-intensive problems. The evaluation of our algorithms in a Hadoop cluster showed close to linear scalability in building R-tree indexes. Subsequently, we develop efficient algorithms for processing spatial queries with aspatial conditions. Novel techniques for simultaneously indexing spatial with textual and numeric data are developed to that end.
Experimental evaluations with real-world, large spatial datasets measured query response times within the sub-second range for most cases, and up to a few seconds for a small number of cases, which is reasonable for interactive applications. Overall, the previous results show that the MapReduce parallel model is suitable for indexing tasks in spatial databases, and the adequate combination of spatial and aspatial attribute indexes can attain acceptable response times for interactive spatial queries with constraints on aspatial data.
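The two-filter baseline that this abstract contrasts with combined indexing can be sketched in Python using the real-estate query as the example. This is an illustrative assumption, not the dissertation's engine: a linear scan stands in for an R-tree lookup, and the record layout (`id`, `location`, `price`, `backyard`), the search radius and the function name `two_filter_search` are all hypothetical.

```python
import math


def two_filter_search(homes, here, radius, min_price, max_price, limit=3):
    """Spatial filter first, aspatial filter second, then rank by distance."""
    # Filter 1 (spatial): keep homes within `radius` of the query location.
    # A real engine would use a spatial index here instead of a full scan.
    nearby = []
    for home in homes:
        x, y = home["location"]
        d = math.hypot(x - here[0], y - here[1])
        if d <= radius:
            nearby.append((d, home))

    # Filter 2 (aspatial): apply the attribute predicates to the survivors.
    matches = [
        (d, home)
        for d, home in nearby
        if home["backyard"] and min_price <= home["price"] <= max_price
    ]

    # Nearest first; return only the identifiers of the top `limit` homes.
    matches.sort(key=lambda pair: pair[0])
    return [home["id"] for _, home in matches[:limit]]
```

The overhead the abstract mentions shows up when the first filter passes many candidates that the second then discards; an index built over spatial and aspatial attributes together aims to avoid producing that intermediate set.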
Abstract:
The work presented in this thesis covers the development of an alerting system that proactively monitors one or more corporate data sources and reports any irregular conditions it detects; the system will be included within existing systems dedicated to data analysis and planning, the so-called Decision Support Systems. A decision support system can provide clear information for the entire management of a company, measuring its performance and providing projections of future trends. These systems fall within the broader field of Business Intelligence, which denotes the set of methodologies capable of transforming business data into information useful to the decision-making process. The entire thesis work was carried out during an internship at Iconsulting S.p.A., a Bologna-based IT System Integrator specialising mainly in the development of Business Intelligence, Enterprise Data Warehouse and Corporate Performance Management projects. The software described in this thesis was built to be placed within a broader context, to meet the requirements of a multinational client that is a leader in the mobile and fixed-line telephony sector.
Abstract:
During the SINOPS project, an optimal state of the art simulation of the marine silicon cycle is attempted employing a biogeochemical ocean general circulation model (BOGCM) through three particular time steps relevant for global (paleo-) climate. In order to tune the model optimally, results of the simulations are compared to a comprehensive data set of 'real' observations. SINOPS' scientific data management ensures that data structure becomes homogeneous throughout the project. Practical work routine comprises systematic progress from data acquisition, through preparation, processing, quality check and archiving, up to the presentation of data to the scientific community. Meta-information and analytical data are mapped by an n-dimensional catalogue in order to itemize the analytical value and to serve as an unambiguous identifier. In practice, data management is carried out by means of the online-accessible information system PANGAEA, which offers a tool set comprising a data warehouse, Graphical Information System (GIS), 2-D plot, cross-section plot, etc. and whose multidimensional data model promotes scientific data mining. Besides scientific and technical aspects, this alliance between scientific project team and data management crew serves to integrate the participants and allows them to gain mutual respect and appreciation.
Abstract:
By providing vehicle-to-vehicle and vehicle-to-infrastructure wireless communications, vehicular ad hoc networks (VANETs), also known as the “networks on wheels”, can greatly enhance traffic safety, traffic efficiency and driving experience in intelligent transportation systems (ITS). However, the unique features of VANETs, such as high mobility and uneven distribution of vehicular nodes, impose critical challenges of high efficiency and reliability for the implementation of VANETs. This dissertation is motivated by the great application potential of VANETs in the design of efficient in-network data processing and dissemination. Considering the significance of message aggregation, data dissemination and data collection, this dissertation research aims to enhance traffic safety and traffic efficiency, as well as to develop novel commercial applications, based on VANETs, along four aspects: 1) accurate and efficient message aggregation to detect on-road safety-relevant events, 2) reliable data dissemination to notify remote vehicles, 3) efficient and reliable spatial data collection from vehicular sensors, and 4) novel promising applications to exploit the commercial potential of VANETs. Specifically, to enable cooperative detection of safety-relevant events on the roads, the structure-less message aggregation (SLMA) scheme is proposed to improve communication efficiency and message accuracy. The scheme of relative position based message dissemination (RPB-MD) is proposed to reliably and efficiently disseminate messages to all intended vehicles in the zone-of-relevance in varying traffic density. Given the large volume of vehicular sensor data available through VANETs, the scheme of compressive sampling based data collection (CS-DC) is proposed to efficiently collect spatially relevant data on a large scale, especially in dense traffic.
In addition, with novel and efficient solutions proposed for the application specific issues of data dissemination and data collection, several appealing value-added applications for VANETs are developed to exploit the commercial potentials of VANETs, namely general purpose automatic survey (GPAS), VANET-based ambient ad dissemination (VAAD) and VANET based vehicle performance monitoring and analysis (VehicleView). Thus, by improving the efficiency and reliability in in-network data processing and dissemination, including message aggregation, data dissemination and data collection, together with the development of novel promising applications, this dissertation will help push VANETs further to the stage of massive deployment.
Abstract:
With the exponential growth in the usage of web-based map services, web GIS applications have become more and more popular. Spatial data indexing, search, analysis, visualization and the resource management of such services are becoming increasingly important to deliver user-desired Quality of Service. First, spatial indexing is typically time-consuming and is not available to end-users. To address this, we introduce TerraFly sksOpen, an open-source Online Indexing and Querying System for Big Geospatial Data. Integrated with the TerraFly Geospatial database [1-9], sksOpen is an efficient indexing and query engine for processing Top-k Spatial Boolean Queries. Further, we provide ergonomic visualization of query results on interactive maps to facilitate the user’s data analysis. Second, due to the highly complex and dynamic nature of GIS systems, it is quite challenging for end users to quickly understand and analyze the spatial data, and to efficiently share their own data and analysis results with others. Built on the TerraFly Geospatial database, TerraFly GeoCloud is an extra layer running upon the TerraFly map and can efficiently support many different visualization functions and spatial data analysis models. Furthermore, users can create unique URLs to visualize and share the analysis results. TerraFly GeoCloud also enables the MapQL technology to customize map visualization using SQL-like statements [10]. Third, map systems often serve dynamic web workloads and involve multiple CPU- and I/O-intensive tiers, which makes it challenging to meet the response time targets of map requests while using the resources efficiently. Virtualization facilitates the deployment of web map services and improves their resource utilization through encapsulation and consolidation. Autonomic resource management allows resources to be automatically provisioned to a map service and its internal tiers on demand.
v-TerraFly are techniques to predict the demand of map workloads online and optimize resource allocations, considering both response time and data freshness as the QoS target. The proposed v-TerraFly system is prototyped on TerraFly, a production web map service, and evaluated using real TerraFly workloads. The results show that v-TerraFly can accurately predict the workload demands: 18.91% more accurate; and efficiently allocate resources to meet the QoS target: improves the QoS by 26.19% and saves resource usages by 20.83% compared to traditional peak load-based resource allocation.
Abstract:
The aim is to develop a Data Warehouse for a business group made up of four companies, with the primary goal of consolidating information. Consolidating information is extremely useful, since the companies may share common data, such as products or customers. The main goal of analytical systems is to allow the data in the organization's transactional systems to be analysed, so that users who know nothing about those systems can get decision-making support in a simple and effective way. Using the Data Warehouse is useful for decision support, since it makes users autonomous in carrying out analyses. Users no longer depend on IT specialists to run their queries and can run them themselves. As a result, a query through the Data Warehouse takes only a few seconds to execute, unlike the queries previously created by specialists, which sometimes took hours to run.
Abstract:
With the application of GIS methodologies to spatial data, researchers can now identify patterns of occurrence for many social problems, including health issues and crime. Furthermore, since this type of data also contains clues as to the underlying causes of social problems, it can be used to make well-educated and, consequently, more effective policy decisions.
Abstract:
Decision support systems (DSS) have evolved rapidly during the last decade from stand-alone or limited networked solutions to online participatory solutions. One of the major enablers of this change is one of the fastest growing areas of geographical information system (GIS) technology development, which relates to the use of the Internet as a means to access, display, and analyze geospatial data remotely. Worldwide, many federal, state, and particularly local governments are designing systems to facilitate data sharing using interactive Internet map servers. This new generation of DSS or planning support systems (PSS), the interactive Internet map server, is the solution for delivering dynamic maps and GIS data and services via the World Wide Web, and providing public participatory GIS (PPGIS) opportunities to a wider community (Carver, 2001; Jankowski & Nyerges, 2001). It provides a highly scalable framework for GIS Web publishing, Web-based public participatory GIS (WPPGIS), which meets the needs of corporate intranets and the demands of worldwide Internet access (Craig, 2002). The establishment of WPPGIS provides spatial data access through a support centre or a GIS portal to facilitate efficient access to and sharing of related geospatial data (Yigitcanlar, Baum, & Stimson, 2003). As more and more public and private entities adopt WPPGIS technology, the importance and complexity of facilitating geospatial data sharing is growing rapidly (Carver, 2003). Therefore, this article focuses on the online public participation dimension of GIS technology. The article provides an overview of recent literature on GIS and WPPGIS, and includes a discussion on the potential use of these technologies in providing a democratic platform for the public in decision-making.
Abstract:
The broad definition of sustainable development at the early stage of its introduction has caused confusion and hesitation among local authorities and planning professionals. The main difficulties are experienced in employing loosely defined principles of sustainable development in setting policies and goals. The question of how this theory/rhetoric-practice gap could be filled will be the theme of this study. One of the sustainability accounting approaches widely employed by governmental organisations, the triple bottom line, and the applicability of this approach to sustainable urban development policies will be examined. By incorporating triple bottom line considerations into environmental impact assessment techniques, the framework of a GIS-based decision support system that helps decision-makers select policy options according to their economic, environmental and social impacts will be introduced. In order to embrace sustainable urban development policy considerations, the relationship between urban form, travel pattern and socio-economic attributes should be clarified. This clarification, together with other input decision support systems, will picture the holistic state of the urban settings in terms of sustainability. In this study, a grid-based indexing methodology will be employed to visualise the degree of compatibility of selected scenarios with the designated sustainable urban future. In addition, this tool will provide valuable knowledge about the spatial dimension of sustainable development. It will also give fine details about the possible impacts of urban development proposals by employing disaggregated spatial data analysis (e.g. land-use, transportation, urban services, population density, pollution, etc.). The visualisation capacity of this tool will help decision makers and other stakeholders compare and select among alternative future urban developments.
Abstract:
Ecologically sustainable development has become a major feature of legal systems at the international, national and local levels throughout the world. In Australia, governments have responded to environmental crises by enacting legislation imposing obligations and restrictions over privately-owned land. Whilst these obligations and restrictions may well be necessary to achieve sustainability, the approach to management of information concerning these instruments is problematic. For example, management of information concerning obligations and restrictions in Queensland is fragmented, with some instruments registered or recorded on the land title register, some on external registers, and some information only available in the legislation itself. This approach is used in most Australian jurisdictions. This fragmented approach has led to two separate but interconnected problems. First, the Torrens system is no longer meeting its goal of providing a complete and accurate picture of title. Second, this uncoordinated approach to the management of land titles, and obligations and restrictions on land use, has created a barrier to sustainable management of natural resources. This is because compliance with environmental laws is impaired in the absence of easily accessible and accurate information. These problems demonstrate a clear need for reform in this area. To determine how information concerning these obligations and restrictions may be most effectively managed, this thesis will apply a comparative methodology and consider three case studies, which each utilise different models for management of this information. These jurisdictions will be assessed according to a set of guidelines for comparison to identify which features of their systems provide for effective management of information concerning obligations and restrictions on title and use. 
Based on this comparison, this thesis will devise a series of recommendations for an effective system for the management of information concerning obligations and restrictions on land title and use, taking into account any potential legal issues and barriers to implementation. This series of recommendations for reform will be supplemented by suggested draft legislative provisions.