980 results for Database systems
Abstract:
With hundreds of millions of users reporting locations and embracing mobile technologies, Location Based Services (LBSs) are raising new challenges. In this dissertation, we address three emerging problems in location services, where geolocation data plays a central role. First, to handle the unprecedented growth of generated geolocation data, existing location services rely on geospatial database systems. However, their inability to leverage combined geographical and textual information in analytical queries (e.g., spatial similarity joins) remains an open problem. To address this, we introduce SpsJoin, a framework for computing spatial set-similarity joins. SpsJoin handles combined similarity queries that involve textual and spatial constraints simultaneously. LBSs use this system to tackle different types of problems, such as deduplication, geolocation enhancement and record linkage. We define the spatial set-similarity join problem in the general case and propose an algorithm for its efficient computation. Our solution uses parallel computing with MapReduce to handle scalability issues in large geospatial databases. Second, applications that use geolocation data are seldom concerned with ensuring the privacy of participating users. To motivate participation and address privacy concerns, we propose iSafe, a privacy-preserving algorithm for computing safety snapshots of co-located mobile devices as well as geosocial network users. iSafe combines geolocation data extracted from crime datasets and geosocial networks such as Yelp. To enhance iSafe's ability to compute safety recommendations even when crime information is incomplete or sparse, we need to identify relationships between Yelp venues and crime indices at their locations. To achieve this, we use SpsJoin on two datasets (Yelp venues and geolocated businesses) to find venues that have not been reviewed and to further compute the crime indices of their locations. Our results show a statistically significant dependence between location crime indices and Yelp features. Third, review-centered LBSs (e.g., Yelp) are increasingly becoming targets of malicious campaigns that aim to bias the public image of represented businesses. Although Yelp actively attempts to detect and filter fraudulent reviews, our experiments showed that Yelp is still vulnerable. Fraudulent LBS information also impairs the ability of iSafe to provide correct safety values. We take steps toward addressing this problem by proposing SpiDeR, an algorithm that takes advantage of the richness of information available in Yelp to detect abnormal review patterns. We propose a fake venue detection solution that applies SpsJoin to Yelp and U.S. housing datasets. We validate the proposed solutions using ground truth data extracted by our experiments and reviews filtered by Yelp.
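The combined textual and spatial constraint described in this abstract can be illustrated with a minimal Python sketch. It is not the SpsJoin algorithm itself, which the abstract says runs on MapReduce. It is a quadratic nested-loop join under assumed choices: Jaccard similarity for the textual part, great-circle distance for the spatial part, and a hypothetical record layout of (id, tokens, (lat, lon)). All names and sample records are illustrative.

```python
import math
from itertools import product

def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def spatial_set_similarity_join(left, right, min_jaccard, max_km):
    """Return id pairs that satisfy both the textual and the spatial threshold.

    Each record is a hypothetical (id, tokens, (lat, lon)) tuple.
    Every pair is compared, so this is for illustration only.
    """
    matches = []
    for (lid, ltok, lpos), (rid, rtok, rpos) in product(left, right):
        close_enough = haversine_km(lpos, rpos) <= max_km
        if close_enough and jaccard(ltok, rtok) >= min_jaccard:
            matches.append((lid, rid))
    return matches

# Made-up sample records: one review-site venue, two business listings.
venues = [("y1", {"joes", "pizza"}, (25.7617, -80.1918))]
businesses = [
    ("b1", {"joes", "pizza", "inc"}, (25.7618, -80.1919)),  # similar name, nearby
    ("b2", {"joes", "pizza"}, (40.7128, -74.0060)),         # same name, far away
]
pairs = spatial_set_similarity_join(venues, businesses, min_jaccard=0.6, max_km=0.5)
```

Only the first business passes both thresholds. The second matches textually but fails the distance constraint, which is the case a purely textual join would get wrong.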
Abstract:
Moving objects database systems are the most challenging sub-category of spatio-temporal database systems. A database system that updates the location information of GPS-equipped moving vehicles in real time has to meet even stricter requirements. Existing data storage models and indexing mechanisms work well only when the number of moving objects in the system is relatively small. This dissertation research addresses the real-time tracking and history retrieval of massive numbers of vehicles moving on road networks. It provides a complete solution for real-time updates of the vehicles' location and motion information, range queries on current and historical data, and prediction of the vehicles' movement in the near future. To achieve these goals, a new approach called Segmented Time Associated to Partitioned Space (STAPS) is proposed for building and manipulating the indexing structures of moving objects databases. Applying the STAPS approach, an indexing structure that associates a time interval tree with each road segment was developed for real-time database systems of vehicles moving on road networks. The indexing structure uses affordable storage to support real-time data updates and efficient query processing. Its update and query processing performance is consistent and does not depend on restrictions such as a time window or an assumption of linear trajectories. An application system design based on a distributed architecture with centralized organization was developed to fully support the proposed data and indexing structures. The suggested system architecture is highly scalable and flexible. Finally, based on a real-world application model of vehicles moving across a region, the main issues in implementing such a system are addressed.
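The idea of attaching a time index to each road segment can be sketched in Python as follows. This is an assumption-laden illustration, not the STAPS structure from the dissertation: a plain list scanned linearly stands in for the time interval tree, and the class name, method names and integer timestamps are all hypothetical.

```python
from collections import defaultdict

class SegmentTimeIndex:
    """Per-road-segment index of the time intervals vehicles spent on it."""

    def __init__(self):
        # segment_id -> list of (enter_time, leave_time, vehicle) for finished stays
        self._closed = defaultdict(list)
        # vehicle -> (segment_id, enter_time) for the stay still in progress
        self._current = {}

    def update(self, vehicle, segment_id, t):
        """Record a position report: `vehicle` is on `segment_id` at time `t`."""
        prev = self._current.get(vehicle)
        if prev is not None and prev[0] == segment_id:
            return  # still on the same segment, nothing to close
        if prev is not None:
            # The vehicle changed segment: close the previous stay at time t.
            self._closed[prev[0]].append((prev[1], t, vehicle))
        self._current[vehicle] = (segment_id, t)

    def query(self, segment_id, t_start, t_end):
        """Vehicles whose stay on the segment overlaps [t_start, t_end]."""
        hits = {v for enter, leave, v in self._closed.get(segment_id, ())
                if enter <= t_end and leave >= t_start}
        # Stays still in progress have no leave time yet.
        hits |= {v for v, (seg, enter) in self._current.items()
                 if seg == segment_id and enter <= t_end}
        return hits

idx = SegmentTimeIndex()
idx.update("car1", "A", 0)
idx.update("car1", "B", 10)   # car1 leaves segment A at t=10
idx.update("car2", "A", 5)
on_a_early = idx.query("A", 0, 4)
on_a_late = idx.query("A", 6, 20)
```

Because each update touches only the segments the vehicle leaves and enters, current and historical range queries share one structure. A real interval tree would replace the linear scan in `query`.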
Abstract:
This paper will propose that, rather than sitting on silos of data, historians who use quantitative methods should endeavour to make their data accessible through databases, and treat this as a new form of bibliographic entry. Of course, in many instances historical data does not lend itself easily to the creation of such data sets. With this in mind, some of the issues in normalising raw historical data will be examined with reference to current work on nineteenth-century Irish trade. These issues encompass (but are not limited to) measurement systems, geographic locations, and potential problems that may arise in attempting to unify disaggregated sources. The paper will discuss the need for a concerted effort by historians to define what is required of digital resources for them to be considered accurate, and to what extent the normalisation requirements of database systems may conflict with the desire for accuracy. Many of the issues that a historian may encounter when engaging with databases will be common to all historians, and there would be merit in having defined standards for referencing items such as people, places, locations, and measurements.
Abstract:
The topic of this dissertation arose from a need identified by the members of project FCOMP-01-0124-FEDER-007360, Inquirir da Honra: Comissários do Santo Ofício e das Ordens Militares em Portugal (1570 - 1773) (To Inquire Honor: Inquisition Commissioners and the Military Orders in Portugal), who had specific goals related to the genealogy of the network of Commissioners. The manual system used until now held a considerable amount of complex information, describing in detail the characteristics not only of individuals but also of everything associated with them, including who was related to whom and how. The main goal was therefore to answer the question: "How can all the genealogical information collected on paper be managed and analysed on a computer, using technologies that are both efficient and easy for users to learn?" Answering it required first-hand knowledge of the field of genealogy and how it operates, so that an application could then be designed and shaped around user needs. The application is not limited to data management, backed by the MySQL database management system, and to "traditional" genealogical analysis in programs such as Personal Ancestral File. Above all, it is intended to let users ask and answer present-day questions, in the hope that the application itself will prompt new ones, through the integration of XML technology and the Google Earth geographic information system, which allows genealogical information to be analysed on the world map.
Abstract:
Models of plant architecture allow us to explore how genotype-environment interactions affect the development of plant phenotypes. Such models generate masses of data organised in complex hierarchies. This paper presents a generic system for creating and automatically populating a relational database from data generated by the widely used L-system approach to modelling plant morphogenesis. Techniques from compiler technology are applied to generate attributes (new fields) in the database, to simplify query development for the recursively structured branching relationship. Use of biological terminology in an interactive query builder contributes towards making the system biologist-friendly. (C) 2002 Elsevier Science Ireland Ltd. All rights reserved.
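As a rough illustration of how a derived attribute can simplify queries over a branching structure, the Python sketch below loads a bracketed L-system string into an SQLite table. It is not the system described in this abstract. The table layout, the `branch_order` field and the sample string are assumptions: each letter becomes a row, `[` and `]` open and close a branch, and other symbols are ignored.

```python
import sqlite3

def load_lstring(conn, lstring):
    """Store each module of a bracketed L-system string as a row.

    `parent_id` records the branching relationship. `branch_order` is a
    derived attribute (nesting depth of the branch) computed during the
    parse, so depth queries need no recursive SQL.
    """
    conn.execute(
        "CREATE TABLE module ("
        "id INTEGER PRIMARY KEY, symbol TEXT, parent_id INTEGER, branch_order INTEGER)"
    )
    stack = []               # saved (parent, order) at each open bracket
    parent, order = None, 0
    next_id = 1
    for ch in lstring:
        if ch == "[":
            stack.append((parent, order))
            order += 1       # modules inside the bracket are one order higher
        elif ch == "]":
            parent, order = stack.pop()
        elif ch.isalpha():
            conn.execute(
                "INSERT INTO module VALUES (?, ?, ?, ?)",
                (next_id, ch, parent, order),
            )
            parent = next_id
            next_id += 1
        # Turtle commands such as + and - carry no structure and are skipped.
    conn.commit()

conn = sqlite3.connect(":memory:")
load_lstring(conn, "F[+F]F[-F[+F]]")
rows = conn.execute(
    "SELECT id, symbol, parent_id, branch_order FROM module ORDER BY id"
).fetchall()
main_axis = conn.execute(
    "SELECT COUNT(*) FROM module WHERE branch_order = 0"
).fetchone()[0]
```

With `branch_order` stored on each row, a question such as "how many modules lie on the main axis" becomes a flat `WHERE` clause instead of a walk up the `parent_id` chain.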
Abstract:
The vast majority of eukaryotic organisms reproduce sexually, yet the nature of the sexual system and the mechanism of sex determination often vary remarkably, even among closely related species. Some species of animals and plants change sex across their lifespan, some contain hermaphrodites as well as males and females, some determine sex with highly differentiated chromosomes, while others determine sex according to their environment. Testing evolutionary hypotheses regarding the causes and consequences of this diversity requires interspecific data placed in a phylogenetic context. Such comparative studies have been hampered by the lack of accessible data listing sexual systems and sex determination mechanisms across the eukaryotic tree of life. Here, we describe a database developed to facilitate access to sexual system and sex chromosome information, with data on sexual systems from 11,038 plant, 705 fish, 173 amphibian, 593 non-avian reptilian, 195 avian, 479 mammalian, and 11,556 invertebrate species.
Abstract:
Our ability to identify, acquire, store, query and analyse data is increasing as never before, especially in the GIS field. Technologies are becoming available to manage a wider variety of data and to make intelligent inferences from that data. The mainstream arrival of large-scale database engines is not far away. Experience with the first such products suggests that they will radically change data management in the GIS field.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
One of the most demanding needs in cloud computing is that of having scalable and highly available databases. One way to meet these needs is to leverage the scalable replication techniques developed in the last decade. These techniques increase both the availability and the scalability of databases. Many replication protocols have been proposed during the last decade. The main research challenge was how to scale under the eager replication model, the one that provides consistency across replicas. In this paper, we examine three eager database replication systems available today, Middle-R, C-JDBC and MySQL Cluster, using the TPC-W benchmark. We analyze their architectures and replication protocols, and compare their performance both in the absence of failures and when failures occur.
Abstract:
One of the most demanding needs in cloud computing and big data is that of having scalable and highly available databases. One way to meet these needs is to leverage the scalable replication techniques developed in the last decade. These techniques increase both the availability and the scalability of databases. Many replication protocols have been proposed during the last decade. The main research challenge was how to scale under the eager replication model, the one that provides consistency across replicas. This thesis provides an in-depth study of three eager database replication systems based on relational systems (Middle-R, C-JDBC and MySQL Cluster) and three systems based on In-Memory Data Grids (JBoss Data Grid, Oracle Coherence and Terracotta Ehcache). The thesis explores these systems in terms of their architecture, replication protocols, fault tolerance and various other functionalities. It also provides an experimental analysis of these systems using state-of-the-art benchmarks: TPC-C and TPC-W for the relational systems and the Yahoo! Cloud Serving Benchmark for the In-Memory Data Grids. The thesis also discusses three graph databases, Neo4j, Titan and Sparksee, in terms of their architecture and transactional capabilities, and highlights the weaker transactional consistency these systems provide. Finally, it discusses an implementation of snapshot isolation in the Neo4j graph database to provide stronger isolation guarantees for transactions.
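Snapshot isolation, mentioned at the end of this abstract, can be sketched in Python with a small in-memory multi-version store. This is a generic textbook illustration, not the Neo4j implementation the thesis describes. It assumes a single-threaded store, a logical commit counter as the timestamp, and a first-committer-wins rule for write conflicts. All class and method names are hypothetical.

```python
class WriteConflict(Exception):
    """Raised when a concurrent transaction already committed a write to the same key."""

class SnapshotStore:
    def __init__(self):
        self._versions = {}   # key -> list of (commit_ts, value), oldest first
        self._clock = 0       # logical timestamp of the latest commit

    def begin(self):
        # A transaction sees the database as of the latest commit at its start.
        return Transaction(self, self._clock)

class Transaction:
    def __init__(self, store, start_ts):
        self._store = store
        self._start_ts = start_ts
        self._writes = {}     # private write buffer, invisible to others until commit

    def read(self, key):
        if key in self._writes:
            return self._writes[key]
        # Newest version committed at or before this transaction's snapshot.
        for ts, value in reversed(self._store._versions.get(key, [])):
            if ts <= self._start_ts:
                return value
        return None

    def write(self, key, value):
        self._writes[key] = value

    def commit(self):
        # First committer wins: abort if any written key has a newer version.
        for key in self._writes:
            versions = self._store._versions.get(key, [])
            if versions and versions[-1][0] > self._start_ts:
                raise WriteConflict(key)
        self._store._clock += 1
        for key, value in self._writes.items():
            self._store._versions.setdefault(key, []).append((self._store._clock, value))

store = SnapshotStore()
setup = store.begin()
setup.write("x", 1)
setup.commit()

t1, t2 = store.begin(), store.begin()   # two concurrent transactions
t1.write("x", 2)
t1.commit()
seen_by_t2 = t2.read("x")               # t2 still reads from its own snapshot
```

Here `t2` keeps reading the value that was committed before it began, and a later attempt by `t2` to commit its own write to `"x"` raises `WriteConflict` because `t1` committed first.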