860 resultados para Distributed database systems
Resumo:
This thesis presents a study of the Grid data access patterns in distributed analysis in the CMS experiment at the LHC accelerator. This study ranges from the deep analysis of the historical patterns of access to the most relevant data types in CMS, to the exploitation of a supervised Machine Learning classification system to set-up a machinery able to eventually predict future data access patterns - i.e. the so-called dataset “popularity” of the CMS datasets on the Grid - with focus on specific data types. All the CMS workflows run on the Worldwide LHC Computing Grid (WCG) computing centers (Tiers), and in particular the distributed analysis systems sustains hundreds of users and applications submitted every day. These applications (or “jobs”) access different data types hosted on disk storage systems at a large set of WLCG Tiers. The detailed study of how this data is accessed, in terms of data types, hosting Tiers, and different time periods, allows to gain precious insight on storage occupancy over time and different access patterns, and ultimately to extract suggested actions based on this information (e.g. targetted disk clean-up and/or data replication). In this sense, the application of Machine Learning techniques allows to learn from past data and to gain predictability potential for the future CMS data access patterns. Chapter 1 provides an introduction to High Energy Physics at the LHC. Chapter 2 describes the CMS Computing Model, with special focus on the data management sector, also discussing the concept of dataset popularity. Chapter 3 describes the study of CMS data access patterns with different depth levels. Chapter 4 offers a brief introduction to basic machine learning concepts and gives an introduction to its application in CMS and discuss the results obtained by using this approach in the context of this thesis.
Resumo:
With hundreds of millions of users reporting locations and embracing mobile technologies, Location Based Services (LBSs) are raising new challenges. In this dissertation, we address three emerging problems in location services, where geolocation data plays a central role. First, to handle the unprecedented growth of generated geolocation data, existing location services rely on geospatial database systems. However, their inability to leverage combined geographical and textual information in analytical queries (e.g. spatial similarity joins) remains an open problem. To address this, we introduce SpsJoin, a framework for computing spatial set-similarity joins. SpsJoin handles combined similarity queries that involve textual and spatial constraints simultaneously. LBSs use this system to tackle different types of problems, such as deduplication, geolocation enhancement and record linkage. We define the spatial set-similarity join problem in a general case and propose an algorithm for its efficient computation. Our solution utilizes parallel computing with MapReduce to handle scalability issues in large geospatial databases. Second, applications that use geolocation data are seldom concerned with ensuring the privacy of participating users. To motivate participation and address privacy concerns, we propose iSafe, a privacy preserving algorithm for computing safety snapshots of co-located mobile devices as well as geosocial network users. iSafe combines geolocation data extracted from crime datasets and geosocial networks such as Yelp. In order to enhance iSafe's ability to compute safety recommendations, even when crime information is incomplete or sparse, we need to identify relationships between Yelp venues and crime indices at their locations. To achieve this, we use SpsJoin on two datasets (Yelp venues and geolocated businesses) to find venues that have not been reviewed and to further compute the crime indices of their locations. Our results show a statistically significant dependence between location crime indices and Yelp features. Third, review centered LBSs (e.g., Yelp) are increasingly becoming targets of malicious campaigns that aim to bias the public image of represented businesses. Although Yelp actively attempts to detect and filter fraudulent reviews, our experiments showed that Yelp is still vulnerable. Fraudulent LBS information also impacts the ability of iSafe to provide correct safety values. We take steps toward addressing this problem by proposing SpiDeR, an algorithm that takes advantage of the richness of information available in Yelp to detect abnormal review patterns. We propose a fake venue detection solution that applies SpsJoin on Yelp and U.S. housing datasets. We validate the proposed solutions using ground truth data extracted by our experiments and reviews filtered by Yelp.
Resumo:
This paper will propose that, rather than sitting on silos of data, historians that utilise quantitative methods should endeavour to make their data accessible through databases, and treat this as a new form of bibliographic entry. Of course in many instances historical data does not lend itself easily to the creation of such data sets. With this in mind some of the issues regarding normalising raw historical data will be looked at with reference to current work on nineteenth century Irish trade. These issues encompass (but are not limited to) measurement systems, geographic locations, and potential problems that may arise in attempting to unify disaggregated sources. It will discuss the need for a concerted effort by historians to define what is required from digital resources for them to be considered accurate, and to what extent the normalisation requirements for database systems may conflict with the desire for accuracy. Many of the issues that the historian may encounter engaging with databases will be common to all historians, and there would be merit in having defined standards for referencing items, such as people, places, locations, and measurements.
Resumo:
A selecção do tema e consequente trabalho de que emerge o titulo desta dissertação decorreu do facto de se ter tomado conhecimento da necessidade que os membros do projecto FCOMP-01-0124-FEDER-007360 - Inquirir da Honra: Comissários do Santo Oficio e das Ordens Militares em Portugal (1570 - 1773) tiveram para satisfazer alguns objectivos em particular relacionados com a Genealogia da rede de Comissários. O sistema de trabalho manual que até aqui era utilizado, continha uma quantidade considerável de informação complexa, descrevendo ao pormenor as características não só dos indivíduos, mas também do que estava associado ao mesmo, incluindo quem e como se relacionava com as demais figuras. O principal objectivo consistiu assim em responder à pergunta: "Como será possível efectuar uma gestão de toda a informação genealógica recolhida no papel e permitir a sua análise no computador, recorrendo a tecnologias que, por um lado sejam eficientes, e por outro, fáceis de aprender pelos utilizadores?". Para conseguir responder à questão, foi necessário conhecer em primeira mão, o universo da Genealogia e a forma como opera, para que posteriormente, se desenhasse e moldasse toda uma aplicação às necessidades do utilizador. No entanto, a aplicação não se centra apenas em permitir ao utilizador efectuar uma gestão, recorrendo a um sistema de gestão de bases de dados MySQL e permitir a análise genealógica "tradicional" em programas como o Personal Ancestral File. Pretende-se sobretudo, que o utilizador faça uso e responda às perguntas "do presente" esperando que a própria aplicação sirva de motivação para novas perguntas, com a integração da tecnologia XML e do Sistema de Informação Geográfico, Google Earth, permitindo assim a análise de informação genealógica no mapa-mundo. ABSTRACT: The choice of this essay's work subject is set on the need to accomplish determinate goals related with the Genealogy of the network lnquisition Commissioners on behalf of the project FCOMP-01-0124-FEDER-007360 members - Inquirir da Honra: Comissários do Santo Ofício e das Ordens Militares em Portugal (1570 - 1773)- To Inquire Honor: Inquisition Commissioners and the Military Orders in Portugal. The manual work system used till now presented a considerable amount of complex information, describing in detail characteristics not only of individuals but also of what is associated to it, including whoandhow. The main goal aimed at thus responding to: «How could it be possible to select and examine all the genealogical data registered on paper and allow it to be analyzed on computer, by means of technology that on one hand are efficient and on other hand easy to learn by its users? ». ln order to get to the answer to that matter, it was necessary to acknowledge firstly the Genealogy's universe so afterwards it could be possible to outline and shape an entire application to user needs. Nevertheless, the application does not only focus on allowing the user to carry out the system’s management, using MySQL database management system and allowing the "traditional" genealogical management in programs such as the Personal Ancestral File. Above all the user should get involved with it and answer the key questions of 'the present’ hoping that the application serves by itself as motivation to arouse new questions with the integration of XML technology and Geographic Information System, Google Earth, thus allowing the analysis of genealogical information worldwide.
Resumo:
This thesis attempts to investigate the problems associated with such schemes and suggests a software architecture, which is aimed towards achieving a meaningful discovery. Usage of information elements as a modelling base for efficient information discovery in distributed systems is demonstrated with the aid of a novel conceptual entity called infotron. The investigations are focused on distributed systems and their associated problems. The study was directed towards identifying suitable software architecture and incorporating the same in an environment where information growth is phenomenal and a proper mechanism for carrying out information discovery becomes feasible. An empirical study undertaken with the aid of an election database of constituencies distributed geographically, provided the insights required. This is manifested in the Election Counting and Reporting Software (ECRS) System. ECRS system is a software system, which is essentially distributed in nature designed to prepare reports to district administrators about the election counting process and to generate other miscellaneous statutory reports.
Resumo:
This research presents several components encompassing the scope of the objective of Data Partitioning and Replication Management in Distributed GIS Database. Modern Geographic Information Systems (GIS) databases are often large and complicated. Therefore data partitioning and replication management problems need to be addresses in development of an efficient and scalable solution. ^ Part of the research is to study the patterns of geographical raster data processing and to propose the algorithms to improve availability of such data. These algorithms and approaches are targeting granularity of geographic data objects as well as data partitioning in geographic databases to achieve high data availability and Quality of Service(QoS) considering distributed data delivery and processing. To achieve this goal a dynamic, real-time approach for mosaicking digital images of different temporal and spatial characteristics into tiles is proposed. This dynamic approach reuses digital images upon demand and generates mosaicked tiles only for the required region according to user's requirements such as resolution, temporal range, and target bands to reduce redundancy in storage and to utilize available computing and storage resources more efficiently. ^ Another part of the research pursued methods for efficient acquiring of GIS data from external heterogeneous databases and Web services as well as end-user GIS data delivery enhancements, automation and 3D virtual reality presentation. ^ There are vast numbers of computing, network, and storage resources idling or not fully utilized available on the Internet. Proposed "Crawling Distributed Operating System "(CDOS) approach employs such resources and creates benefits for the hosts that lend their CPU, network, and storage resources to be used in GIS database context. ^ The results of this dissertation demonstrate effective ways to develop a highly scalable GIS database. The approach developed in this dissertation has resulted in creation of TerraFly GIS database that is used by US government, researchers, and general public to facilitate Web access to remotely-sensed imagery and GIS vector information. ^
Resumo:
With the recent explosion in the complexity and amount of digital multimedia data, there has been a huge impact on the operations of various organizations in distinct areas, such as government services, education, medical care, business, entertainment, etc. To satisfy the growing demand of multimedia data management systems, an integrated framework called DIMUSE is proposed and deployed for distributed multimedia applications to offer a full scope of multimedia related tools and provide appealing experiences for the users. This research mainly focuses on video database modeling and retrieval by addressing a set of core challenges. First, a comprehensive multimedia database modeling mechanism called Hierarchical Markov Model Mediator (HMMM) is proposed to model high dimensional media data including video objects, low-level visual/audio features, as well as historical access patterns and frequencies. The associated retrieval and ranking algorithms are designed to support not only the general queries, but also the complicated temporal event pattern queries. Second, system training and learning methodologies are incorporated such that user interests are mined efficiently to improve the retrieval performance. Third, video clustering techniques are proposed to continuously increase the searching speed and accuracy by architecting a more efficient multimedia database structure. A distributed video management and retrieval system is designed and implemented to demonstrate the overall performance. The proposed approach is further customized for a mobile-based video retrieval system to solve the perception subjectivity issue by considering individual user's profile. Moreover, to deal with security and privacy issues and concerns in distributed multimedia applications, DIMUSE also incorporates a practical framework called SMARXO, which supports multilevel multimedia security control. SMARXO efficiently combines role-based access control (RBAC), XML and object-relational database management system (ORDBMS) to achieve the target of proficient security control. A distributed multimedia management system named DMMManager (Distributed MultiMedia Manager) is developed with the proposed framework DEMUR; to support multimedia capturing, analysis, retrieval, authoring and presentation in one single framework.
Resumo:
In database applications, access control security layers are mostly developed from tools provided by vendors of database management systems and deployed in the same servers containing the data to be protected. This solution conveys several drawbacks. Among them we emphasize: 1) if policies are complex, their enforcement can lead to performance decay of database servers; 2) when modifications in the established policies implies modifications in the business logic (usually deployed at the client-side), there is no other possibility than modify the business logic in advance and, finally, 3) malicious users can issue CRUD expressions systematically against the DBMS expecting to identify any security gap. In order to overcome these drawbacks, in this paper we propose an access control stack characterized by: most of the mechanisms are deployed at the client-side; whenever security policies evolve, the security mechanisms are automatically updated at runtime and, finally, client-side applications do not handle CRUD expressions directly. We also present an implementation of the proposed stack to prove its feasibility. This paper presents a new approach to enforce access control in database applications, this way expecting to contribute positively to the state of the art in the field.
Resumo:
In database applications, access control security layers are mostly developed from tools provided by vendors of database management systems and deployed in the same servers containing the data to be protected. This solution conveys several drawbacks. Among them we emphasize: (1) if policies are complex, their enforcement can lead to performance decay of database servers; (2) when modifications in the established policies implies modifications in the business logic (usually deployed at the client-side), there is no other possibility than modify the business logic in advance and, finally, 3) malicious users can issue CRUD expressions systematically against the DBMS expecting to identify any security gap. In order to overcome these drawbacks, in this paper we propose an access control stack characterized by: most of the mechanisms are deployed at the client-side; whenever security policies evolve, the security mechanisms are automatically updated at runtime and, finally, client-side applications do not handle CRUD expressions directly. We also present an implementation of the proposed stack to prove its feasibility. This paper presents a new approach to enforce access control in database applications, this way expecting to contribute positively to the state of the art in the field.
Resumo:
In this paper the continuous Verhulst dynamic model is used to synthesize a new distributed power control algorithm (DPCA) for use in direct sequence code division multiple access (DS-CDMA) systems. The Verhulst model was initially designed to describe the population growth of biological species under food and physical space restrictions. The discretization of the corresponding differential equation is accomplished via the Euler numeric integration (ENI) method. Analytical convergence conditions for the proposed DPCA are also established. Several properties of the proposed recursive algorithm, such as Euclidean distance from optimum vector after convergence, convergence speed, normalized mean squared error (NSE), average power consumption per user, performance under dynamics channels, and implementation complexity aspects, are analyzed through simulations. The simulation results are compared with two other DPCAs: the classic algorithm derived by Foschini and Miljanic and the sigmoidal of Uykan and Koivo. Under estimated errors conditions, the proposed DPCA exhibits smaller discrepancy from the optimum power vector solution and better convergence (under fixed and adaptive convergence factor) than the classic and sigmoidal DPCAs. (C) 2010 Elsevier GmbH. All rights reserved.
Resumo:
Models of plant architecture allow us to explore how genotype environment interactions effect the development of plant phenotypes. Such models generate masses of data organised in complex hierarchies. This paper presents a generic system for creating and automatically populating a relational database from data generated by the widely used L-system approach to modelling plant morphogenesis. Techniques from compiler technology are applied to generate attributes (new fields) in the database, to simplify query development for the recursively-structured branching relationship. Use of biological terminology in an interactive query builder contributes towards making the system biologist-friendly. (C) 2002 Elsevier Science Ireland Ltd. All rights reserved.
Resumo:
A number of characteristics are boosting the eagerness of extending Ethernet to also cover factory-floor distributed real-time applications. Full-duplex links, non-blocking and priority-based switching, bandwidth availability, just to mention a few, are characteristics upon which that eagerness is building up. But, will Ethernet technologies really manage to replace traditional Fieldbus networks? Ethernet technology, by itself, does not include features above the lower layers of the OSI communication model. In the past few years, it is particularly significant the considerable amount of work that has been devoted to the timing analysis of Ethernet-based technologies. It happens, however, that the majority of those works are restricted to the analysis of sub-sets of the overall computing and communication system, thus without addressing timeliness at a holistic level. To this end, we are addressing a few inter-linked research topics with the purpose of setting a framework for the development of tools suitable to extract temporal properties of Commercial-Off-The-Shelf (COTS) Ethernet-based factory-floor distributed systems. This framework is being applied to a specific COTS technology, Ethernet/IP. In this paper, we reason about the modelling and simulation of Ethernet/IP-based systems, and on the use of statistical analysis techniques to provide usable results. Discrete event simulation models of a distributed system can be a powerful tool for the timeliness evaluation of the overall system, but particular care must be taken with the results provided by traditional statistical analysis techniques.
Resumo:
Fieldbus communication networks aim to interconnect sensors, actuators and controllers within process control applications. Therefore, they constitute the foundation upon which real-time distributed computer-controlled systems can be implemented. P-NET is a fieldbus communication standard, which uses a virtual token-passing medium-access-control mechanism. In this paper pre-run-time schedulability conditions for supporting real-time traffic with P-NET networks are established. Essentially, formulae to evaluate the upper bound of the end-to-end communication delay in P-NET messages are provided. Using this upper bound, a feasibility test is then provided to check the timing requirements for accessing remote process variables. This paper also shows how P-NET network segmentation can significantly reduce the end-to-end communication delays for messages with stringent timing requirements.
Resumo:
In the past few years, a significant amount of work has been devoted to the timing analysis of Ethernet-based technologies. However, none of these address the problem of timeliness evaluation at a holistic level. This paper describes a research framework embracing this objective. It is advocated that, simulation models can be a powerful tool, not only for timeliness evaluation, but also to enable the introduction of less pessimistic assumptions in an analytical response time approach, which, most often, are afflicted with simplifications leading to pessimistic assumptions and, therefore, delusive results. To this end, we address a few inter-linked research topics with the purpose of setting a framework for developing tools suitable to extract temporal properties of commercial-off-the-shelf (COTS) factory-floor communication systems.