973 resultados para Data replication


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Dissertação de Mestrado em Engenharia Informática

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Dissertação para obtenção do Grau de Mestre em Engenharia Informática

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Nowadays, data available and used by companies is growing very fast creating the need to use and manage this data in the most efficient way. To this end, data is replicated overmultiple datacenters and use different replication protocols, according to their needs, like more availability or stronger consistency level. The costs associated with full data replication can be very high, and most of the times, full replication is not needed since information can be logically partitioned. Another problem, is that by using datacenters to store and process information clients become heavily dependent on them. We propose a partial replication protocol called ParTree, which replicates data to clients, and organizes clients in a hierarchy, using communication between them to propagate information. This solution addresses some of these problems, namely by supporting partial data replication and offline execution mode. Given the complexity of the protocol, the use of formal verification is crucial to ensure the protocol two correctness properties: causal consistency and preservation of data. The use of TLA+ language and tools to formally specificity and verify the proposed protocol are also described.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Abstract This thesis proposes a set of adaptive broadcast solutions and an adaptive data replication solution to support the deployment of P2P applications. P2P applications are an emerging type of distributed applications that are running on top of P2P networks. Typical P2P applications are video streaming, file sharing, etc. While interesting because they are fully distributed, P2P applications suffer from several deployment problems, due to the nature of the environment on which they perform. Indeed, defining an application on top of a P2P network often means defining an application where peers contribute resources in exchange for their ability to use the P2P application. For example, in P2P file sharing application, while the user is downloading some file, the P2P application is in parallel serving that file to other users. Such peers could have limited hardware resources, e.g., CPU, bandwidth and memory or the end-user could decide to limit the resources it dedicates to the P2P application a priori. In addition, a P2P network is typically emerged into an unreliable environment, where communication links and processes are subject to message losses and crashes, respectively. To support P2P applications, this thesis proposes a set of services that address some underlying constraints related to the nature of P2P networks. The proposed services include a set of adaptive broadcast solutions and an adaptive data replication solution that can be used as the basis of several P2P applications. Our data replication solution permits to increase availability and to reduce the communication overhead. The broadcast solutions aim, at providing a communication substrate encapsulating one of the key communication paradigms used by P2P applications: broadcast. Our broadcast solutions typically aim at offering reliability and scalability to some upper layer, be it an end-to-end P2P application or another system-level layer, such as a data replication layer. Our contributions are organized in a protocol stack made of three layers. In each layer, we propose a set of adaptive protocols that address specific constraints imposed by the environment. Each protocol is evaluated through a set of simulations. The adaptiveness aspect of our solutions relies on the fact that they take into account the constraints of the underlying system in a proactive manner. To model these constraints, we define an environment approximation algorithm allowing us to obtain an approximated view about the system or part of it. This approximated view includes the topology and the components reliability expressed in probabilistic terms. To adapt to the underlying system constraints, the proposed broadcast solutions route messages through tree overlays permitting to maximize the broadcast reliability. Here, the broadcast reliability is expressed as a function of the selected paths reliability and of the use of available resources. These resources are modeled in terms of quotas of messages translating the receiving and sending capacities at each node. To allow a deployment in a large-scale system, we take into account the available memory at processes by limiting the view they have to maintain about the system. Using this partial view, we propose three scalable broadcast algorithms, which are based on a propagation overlay that tends to the global tree overlay and adapts to some constraints of the underlying system. At a higher level, this thesis also proposes a data replication solution that is adaptive both in terms of replica placement and in terms of request routing. At the routing level, this solution takes the unreliability of the environment into account, in order to maximize reliable delivery of requests. At the replica placement level, the dynamically changing origin and frequency of read/write requests are analyzed, in order to define a set of replica that minimizes communication cost.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Currently, many museums, botanic gardens and herbariums keep data of biological collections and using computational tools researchers digitalize and provide access to their data using data portals. The replication of databases in portals can be accomplished through the use of protocols and data schema. However, the implementation of this solution demands a large amount of time, concerning both the transfer of fragments of data and processing data within the portal. With the growth of data digitalization in institutions, this scenario tends to be increasingly exacerbated, making it hard to maintain the records updated on the portals. As an original contribution, this research proposes analysing the data replication process to evaluate the performance of portals. The Inter-American Biodiversity Information Network (IABIN) biodiversity data portal of pollinators was used as a study case, which supports both situations: conventional data replication of records of specimen occurrences and interactions between them. With the results of this research, it is possible to simulate a situation before its implementation, thus predicting the performance of replication operations. Additionally, these results may contribute to future improvements to this process, in order to decrease the time required to make the data available in portals. © Rinton Press.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Current scientific applications have been producing large amounts of data. The processing, handling and analysis of such data require large-scale computing infrastructures such as clusters and grids. In this area, studies aim at improving the performance of data-intensive applications by optimizing data accesses. In order to achieve this goal, distributed storage systems have been considering techniques of data replication, migration, distribution, and access parallelism. However, the main drawback of those studies is that they do not take into account application behavior to perform data access optimization. This limitation motivated this paper which applies strategies to support the online prediction of application behavior in order to optimize data access operations on distributed systems, without requiring any information on past executions. In order to accomplish such a goal, this approach organizes application behaviors as time series and, then, analyzes and classifies those series according to their properties. By knowing properties, the approach selects modeling techniques to represent series and perform predictions, which are, later on, used to optimize data access operations. This new approach was implemented and evaluated using the OptorSim simulator, sponsored by the LHC-CERN project and widely employed by the scientific community. Experiments confirm this new approach reduces application execution time in about 50 percent, specially when handling large amounts of data.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This thesis presents a study of the Grid data access patterns in distributed analysis in the CMS experiment at the LHC accelerator. This study ranges from the deep analysis of the historical patterns of access to the most relevant data types in CMS, to the exploitation of a supervised Machine Learning classification system to set-up a machinery able to eventually predict future data access patterns - i.e. the so-called dataset “popularity” of the CMS datasets on the Grid - with focus on specific data types. All the CMS workflows run on the Worldwide LHC Computing Grid (WCG) computing centers (Tiers), and in particular the distributed analysis systems sustains hundreds of users and applications submitted every day. These applications (or “jobs”) access different data types hosted on disk storage systems at a large set of WLCG Tiers. The detailed study of how this data is accessed, in terms of data types, hosting Tiers, and different time periods, allows to gain precious insight on storage occupancy over time and different access patterns, and ultimately to extract suggested actions based on this information (e.g. targetted disk clean-up and/or data replication). In this sense, the application of Machine Learning techniques allows to learn from past data and to gain predictability potential for the future CMS data access patterns. Chapter 1 provides an introduction to High Energy Physics at the LHC. Chapter 2 describes the CMS Computing Model, with special focus on the data management sector, also discussing the concept of dataset popularity. Chapter 3 describes the study of CMS data access patterns with different depth levels. Chapter 4 offers a brief introduction to basic machine learning concepts and gives an introduction to its application in CMS and discuss the results obtained by using this approach in the context of this thesis.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

O presente projecto tem como objectivo a disponibilização de uma plataforma de serviços para gestão e contabilização de tempo remunerável, através da marcação de horas de trabalho, férias e faltas (com ou sem justificação). Pretende-se a disponibilização de relatórios com base nesta informação e a possibilidade de análise automática dos dados, como por exemplo excesso de faltas e férias sobrepostas de trabalhadores. A ênfase do projecto está na disponibilização de uma arquitectura que facilite a inclusão destas funcionalidades. O projecto está implementado sobre a plataforma Google App Engine (i.e. GAE), de forma a disponibilizar uma solução sob o paradigma de Software as a Service, com garantia de disponibilidade e replicação de dados. A plataforma foi escolhida a partir da análise das principais plataformas cloud existentes: Google App Engine, Windows Azure e Amazon Web Services. Foram analisadas as características de cada plataforma, nomeadamente os modelos de programação, os modelos de dados disponibilizados, os serviços existentes e respectivos custos. A escolha da plataforma foi realizada com base nas suas características à data de iniciação do presente projecto. A solução está estruturada em camadas, com as seguintes componentes: interface da plataforma, lógica de negócio e lógica de acesso a dados. A interface disponibilizada está concebida com observação dos princípios arquitecturais REST, suportando dados nos formatos JSON e XML. A esta arquitectura base foi acrescentada uma componente de autorização, suportada em Spring-Security, sendo a autenticação delegada para os serviços Google Acounts. De forma a permitir o desacoplamento entre as várias camadas foi utilizado o padrão Dependency Injection. A utilização deste padrão reduz a dependência das tecnologias utilizadas nas diversas camadas. Foi implementado um protótipo, para a demonstração do trabalho realizado, que permite interagir com as funcionalidades do serviço implementadas, via pedidos AJAX. Neste protótipo tirou-se partido de várias bibliotecas javascript e padrões que simplificaram a sua realização, tal como o model-view-viewmodel através de data binding. Para dar suporte ao desenvolvimento do projecto foi adoptada uma abordagem de desenvolvimento ágil, baseada em Scrum, de forma a implementar os requisitos do sistema, expressos em user stories. De forma a garantir a qualidade da implementação do serviço foram realizados testes unitários, sendo também feita previamente a análise da funcionalidade e posteriormente produzida a documentação recorrendo a diagramas UML.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Trabalho apresentado no âmbito do Mestrado em Engenharia Informática, como requisito parcial para obtenção do grau de Mestre em Engenharia Informática

Relevância:

60.00% 60.00%

Publicador:

Resumo:

BACKGROUND AND PURPOSE: Beyond the Framingham Stroke Risk Score, prediction of future stroke may improve with a genetic risk score (GRS) based on single-nucleotide polymorphisms associated with stroke and its risk factors. METHODS: The study includes 4 population-based cohorts with 2047 first incident strokes from 22,720 initially stroke-free European origin participants aged ≥55 years, who were followed for up to 20 years. GRSs were constructed with 324 single-nucleotide polymorphisms implicated in stroke and 9 risk factors. The association of the GRS to first incident stroke was tested using Cox regression; the GRS predictive properties were assessed with area under the curve statistics comparing the GRS with age and sex, Framingham Stroke Risk Score models, and reclassification statistics. These analyses were performed per cohort and in a meta-analysis of pooled data. Replication was sought in a case-control study of ischemic stroke. RESULTS: In the meta-analysis, adding the GRS to the Framingham Stroke Risk Score, age and sex model resulted in a significant improvement in discrimination (all stroke: Δjoint area under the curve=0.016, P=2.3×10(-6); ischemic stroke: Δjoint area under the curve=0.021, P=3.7×10(-7)), although the overall area under the curve remained low. In all the studies, there was a highly significantly improved net reclassification index (P<10(-4)). CONCLUSIONS: The single-nucleotide polymorphisms associated with stroke and its risk factors result only in a small improvement in prediction of future stroke compared with the classical epidemiological risk factors for stroke.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Research focus of this thesis is to explore options for building systems for business critical web applications. Business criticality here includes requirements for data protection and system availability. The focus is on open source software. Goals are to identify robust technologies and engineering practices to implement such systems. Research methods include experiments made with sample systems built around chosen software packages that represent certain technologies. The main research focused on finding a good method for database data replication, a key functionality for high-availability, database-driven web applications. Research included also finding engineering best practices from books written by administrators of high traffic web applications. Experiment with database replication showed, that block level synchronous replication offered by DRBD replication software offered considerably more robust data protection and high-availability functionality compared to leading open source database product MySQL, and its built-in asynchronous replication. For master-master database setups, block level replication is more recommended way to build high-availability into the system. Based on thesis research, building high-availability web applications is possible using a combination of open source software and engineering best practices for data protection, availability planning and scaling.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevância:

60.00% 60.00%

Publicador:

Resumo:

3D geographic information system (GIS) is data and computation intensive in nature. Internet users are usually equipped with low-end personal computers and network connections of limited bandwidth. Data reduction and performance optimization techniques are of critical importance in quality of service (QoS) management for online 3D GIS. In this research, QoS management issues regarding distributed 3D GIS presentation were studied to develop 3D TerraFly, an interactive 3D GIS that supports high quality online terrain visualization and navigation. ^ To tackle the QoS management challenges, multi-resolution rendering model, adaptive level of detail (LOD) control and mesh simplification algorithms were proposed to effectively reduce the terrain model complexity. The rendering model is adaptively decomposed into sub-regions of up-to-three detail levels according to viewing distance and other dynamic quality measurements. The mesh simplification algorithm was designed as a hybrid algorithm that combines edge straightening and quad-tree compression to reduce the mesh complexity by removing geometrically redundant vertices. The main advantage of this mesh simplification algorithm is that grid mesh can be directly processed in parallel without triangulation overhead. Algorithms facilitating remote accessing and distributed processing of volumetric GIS data, such as data replication, directory service, request scheduling, predictive data retrieving and caching were also proposed. ^ A prototype of the proposed 3D TerraFly implemented in this research demonstrates the effectiveness of our proposed QoS management framework in handling interactive online 3D GIS. The system implementation details and future directions of this research are also addressed in this thesis. ^